Your customer has asked you to report the Cpk of the product you are sending. You know that to compute the Cpk, you need to have the product specifications, and that you need to have the mean and sigma. As you gather the information, someone asks, "Which sigma do they want?" You know that Cpk is calculated by dividing by 3 sigma. But which sigma should you use, estimated or calculated? Which is correct? Which would you report? Naturally, most of us would use the sigma that makes the Cpk look the best. But the sigma that makes the Cpk look best may not accurately reflect what you or your customer need to know about the process. Confusion over calculating Cpk by two different methods is one reason that a new index, Ppk, was developed. Ppk uses the calculated sigma from the individual data.
Estimated sigma: est. sigma = R-bar / d2 (the average range divided by the control-chart constant d2)
Given that Ppk uses the calculated sigma, it is no longer necessary to use the calculated sigma in Cpk. The only acceptable formula for Cpk uses the estimated sigma. In 1991, the ASQC/AIAG Task Force published the "Fundamental Statistical Process Control" reference manual, which shows the calculations for both Cpk and Ppk. These should be used to eliminate confusion about calculating Cpk. So which value is best to report, Cpk or Ppk? Although they show similar information, they have slightly different uses. Estimated sigma and the related capability indices (Cp, Cpk, and Cr) measure the potential capability of a system to meet customer needs; use them when you want to analyze a system's aptitude to perform. Actual or calculated sigma (the sigma of the individuals) and the related indices (Pp, Ppk, and Pr) measure the performance of a system in meeting customer needs; use them when you want to measure how the process has actually performed. Once you determine which capability index you will use, it can easily be calculated using software such as SQCpack or CHARTrunner.
Matt Savage
You, of course, provide the specifications. Now that you have these three pieces of information, the Cpk can be easily calculated. For example, let's say your process average is closer to the upper specification. Then Cpk is calculated by the following: Cpk = (USL - Mean) / (3 * Est. sigma). As you can see, the raw data is only used indirectly: it is used to determine the mean and the average range, but it does not appear in the Cpk calculation itself. Here is an example that might serve to clarify. Suppose you have the following 14 subgroups with a subgroup size of 2:

Sample No.   Value 1   Value 2   Average   Range
     1        0.03      0.06      0.045    0.030
     2        0.10      0.20      0.150    0.100
     3        0.05      0.10      0.075    0.050
     4        1.00      0.00      0.500    1.000
     5        1.50      1.50      1.500    0.000
     6        1.10      1.50      1.300    0.400
     7        1.10      1.00      1.050    0.100
     8        1.10      1.01      1.055    0.090
     9        1.25      1.20      1.225    0.050
    10        1.00      0.30      0.650    0.700
    11        0.75      0.76      0.755    0.010
    12        0.75      0.50      0.625    0.250
    13        1.00      1.10      1.050    0.100
    14        1.20      1.40      1.300    0.200
Average                           0.8057   0.220
The mean, X-bar, is 0.8057 and the average range, R-bar, is 0.220. For this example, the upper specification is 2.12, the target value is 1.12, and the lower specification is 0.12. In the data shown above, more than 21% of the data is outside the specifications, so you would expect Cpk to be low, right? As it turns out, Cpk is relatively healthy at 1.17. (Yes, for this example, we have ignored the first cardinal rule: before one looks at Cpk, the process must be in control.) Before we go on, let's check the math:

Mean = 0.8057
Average range (R-bar) = 0.2200
Est. sigma = R-bar / d2 = 0.2200 / 1.128 = 0.1950
Cpk = smaller of (Zupper, Zlower) / 3
Zlower = (Mean - LSL) / Est. sigma = (0.8057 - 0.12) / 0.1950 = 0.6857 / 0.1950 = 3.516

Zupper is larger, so in this example:

Cpk = Zlower / 3 = 3.516 / 3 = 1.172

So what gives? Here is an example where Cpk is good, yet the process is not centered and data falls outside at least one of the specifications. The reason Cpk looks good is that the average range is understated; when you divide by the estimated sigma (which is based on the average range), Cpk is over-inflated. The reason the average range is understated will be discussed in a future article. One last note: if you look at this data on a control chart, you will quickly see that it is not in control. Therefore, the Cpk statistic should be ignored when the process is not in control.
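The arithmetic above can be checked with a short script. This is a sketch in Python, using the d2 = 1.128 constant for subgroups of size 2 given above:

```python
# Cpk from the 14 subgroups of size 2 shown above.
x1 = [0.03, 0.10, 0.05, 1.00, 1.50, 1.10, 1.10, 1.10, 1.25, 1.00, 0.75, 0.75, 1.00, 1.20]
x2 = [0.06, 0.20, 0.10, 0.00, 1.50, 1.50, 1.00, 1.01, 1.20, 0.30, 0.76, 0.50, 1.10, 1.40]

USL, LSL = 2.12, 0.12
d2 = 1.128  # control-chart constant for subgroup size 2

mean = sum(x1 + x2) / (2 * len(x1))                        # X-bar
r_bar = sum(abs(a - b) for a, b in zip(x1, x2)) / len(x1)  # average range
sigma_est = r_bar / d2                                     # estimated sigma

z_upper = (USL - mean) / sigma_est
z_lower = (mean - LSL) / sigma_est
cpk = min(z_upper, z_lower) / 3

print(f"mean={mean:.4f} r_bar={r_bar:.3f} sigma_est={sigma_est:.4f} Cpk={cpk:.2f}")
# prints mean=0.8057 r_bar=0.220 sigma_est=0.1950 Cpk=1.17
```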
Matt Savage
Ppk, on the other hand, uses the standard deviation of all of the data. We can call this the sigma of the individual values, or sigma-i. Sigma of the individual values reflects variation both within and between subgroups. For a process that exhibits drifting, the estimated sigma would not pick up the total variation in the process, and thus Cpk becomes a cloudy statistic: one cannot be sure it is valid. In contrast to Cpk, Ppk, which uses the sigma of the individual values, picks up all the variation in the process. Again, sigma-i captures both between- and within-subgroup variation. So if there is drifting in the process, sigma-i will typically be larger than the estimated sigma, sigma-e, and thus Ppk will, as it should, be lower than Cpk. Here is a quick review of the formulae for Cpk and Ppk:

Cpk = Zmin / 3, where Zmin is the smaller of (USL - Mean) / est. sigma and (Mean - LSL) / est. sigma
Ppk = Zmin / 3, where Zmin is the smaller of (USL - Mean) / sigma-i and (Mean - LSL) / sigma-i
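As a sketch of the contrast, sigma-i can be computed with the ordinary n-1 standard deviation and compared against R-bar/d2, here using the same 14 subgroups from the earlier Cpk example:

```python
import statistics

# Same 14 subgroups of size 2 used in the earlier Cpk example.
x1 = [0.03, 0.10, 0.05, 1.00, 1.50, 1.10, 1.10, 1.10, 1.25, 1.00, 0.75, 0.75, 1.00, 1.20]
x2 = [0.06, 0.20, 0.10, 0.00, 1.50, 1.50, 1.00, 1.01, 1.20, 0.30, 0.76, 0.50, 1.10, 1.40]
individuals = x1 + x2

USL, LSL = 2.12, 0.12
mean = statistics.fmean(individuals)

sigma_e = (sum(abs(a - b) for a, b in zip(x1, x2)) / len(x1)) / 1.128  # R-bar / d2
sigma_i = statistics.stdev(individuals)                                # sigma of the individuals

cpk = min(USL - mean, mean - LSL) / (3 * sigma_e)
ppk = min(USL - mean, mean - LSL) / (3 * sigma_i)

# The subgroup ranges understate the total variation here, so sigma_i > sigma_e
# and Ppk comes out well below Cpk.
print(f"Cpk={cpk:.2f} Ppk={ppk:.2f}")
# prints Cpk=1.17 Ppk=0.45
```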
We should be concerned with how well the process is behaving, therefore Ppk might be preferred over Cpk. Ppk is a more conservative approach to answering the question, "How good is my process?" Watch for a future article discussing the relatively new capability index, Cpm, and how it stacks up against Cpk and Ppk.
Chart produced using SQCpack

A hypothetical example might clarify the point. Suppose I have 100 pieces of data that are grouped into units of 5 each. The chart above shows how the control chart and histogram of the data might look. In this example, the process is in control, and the Cpk = 1.252. Suppose I am evaluating another process whose mean and specifications are the same, but the Cpk = 1.803. Most of us would want the second process, with the higher Cpk, but is it necessarily better? Unless you determine whether the process is in statistical control, you cannot fairly answer this question.
As it turns out, the data is exactly the same, but what has changed is the order in which the data was grouped in the samples. This caused the range of the subgroups and R-bar (the average range) to be different. In the second data set, the data was rearranged so that the data within the sample is similar. The sigma of the individuals does not change, but the estimated sigma, which is used in the control limits and Cpk calculations, changes between the two distributions. With this example, determining if the process is in control before looking at Cpk pays off. Since the control chart in the second example, shown below, is not in control, you cannot be sure that its Cpk is a good representation of process capability. The first process, on the other hand, is in control and thus its Cpk is a good predictor of process capability.
Chart produced using SQCpack If you do not have the control chart to evaluate for process control, you might be tempted to select the second process as being "better" on the basis of the higher Cpk. As this example illustrates, you cannot fairly evaluate Cpk without first establishing process control. You can use software such as SQCpack or CHARTrunner to create control charts and calculate Cpk.
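The regrouping effect is easy to demonstrate. The sketch below reuses the 28 values from the earlier Cpk example (the 100-point dataset charted above is not reproduced here) and subgroups them two ways: in their original order, and sorted so that similar values share a subgroup:

```python
import statistics

x1 = [0.03, 0.10, 0.05, 1.00, 1.50, 1.10, 1.10, 1.10, 1.25, 1.00, 0.75, 0.75, 1.00, 1.20]
x2 = [0.06, 0.20, 0.10, 0.00, 1.50, 1.50, 1.00, 1.01, 1.20, 0.30, 0.76, 0.50, 1.10, 1.40]
USL, LSL, d2 = 2.12, 0.12, 1.128

def cpk(pairs):
    flat = [v for pair in pairs for v in pair]
    mean = statistics.fmean(flat)
    r_bar = statistics.fmean([abs(a - b) for a, b in pairs])
    return min(USL - mean, mean - LSL) / (3 * r_bar / d2)

original = list(zip(x1, x2))
ordered = sorted(x1 + x2)
regrouped = list(zip(ordered[0::2], ordered[1::2]))  # similar values paired together

# Same individuals, same sigma of the individuals, but the subgroup ranges shrink
# when similar values are grouped, so the estimated sigma drops and Cpk inflates.
print(f"Cpk, original grouping: {cpk(original):.2f}")
print(f"Cpk, regrouped:         {cpk(regrouped):.2f}")
```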
Cpm divides the specification width by six times a target-centered sigma: Cpm = (USL - LSL) / (6 * sqrt(sum((xi - T)^2) / (n - 1))), where T is the target. The term under the square root grows with the difference between the data and the target. As this difference increases, so does the denominator, and the Cpm gets smaller. If the difference between the data and the target is small, so too is this sigma, and the Cpm index becomes larger. The higher the Cpm index, the better the process, as shown in the diagrams below. In these 3 charts the process spread is the same, but as the process becomes more centered on the target, the Cpm gets better.
In these 3 charts, the process stays centered about the target, but as the variation is reduced, the Cpm gets better.
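The two effects pictured in those charts can be sketched in Python, using the usual sample form of the Cpm denominator and hypothetical data (the specs, target, and values below are made up for illustration):

```python
import math

def cpm(data, usl, lsl, target):
    # Cpm penalizes deviation from the target, not just spread.
    n = len(data)
    tau = math.sqrt(sum((x - target) ** 2 for x in data) / (n - 1))
    return (usl - lsl) / (6 * tau)

# Hypothetical measurements centered on the target...
centered = [9.8, 10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 9.7]
# ...and the same values with the mean shifted off target (same spread).
shifted = [x + 0.5 for x in centered]

USL, LSL, T = 12.0, 8.0, 10.0
print(f"centered Cpm = {cpm(centered, USL, LSL, T):.2f}")
print(f"shifted  Cpm = {cpm(shifted, USL, LSL, T):.2f}")
```

Shifting the same data away from the target inflates the target-centered sigma, so Cpm drops even though the ordinary standard deviation is unchanged.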
We can use Lori's raw data to provide an example of how Cpm is calculated:
[Table of Lori's nine samples; the raw values are not preserved in this copy.]
In a process with both upper and lower specifications, the target is typically the midpoint of these. When such a high degree of capability exists, one may want to ask the customer whether the target value is ideal. Lori should check with her customer to determine if he or she wants a small shift toward one of the specifications. Regardless of the target in relation to the specifications, the focus should always be on making the product to target with minimum variation. Cpm is the capability index that accurately depicts this.

Reference: L.K. Chan, S.W. Cheng, and F.A. Spiring, "A New Measure of Process Capability: Cpm," Journal of Quality Technology, Vol. 20, No. 3, July 1988, pp. 162-175.
An example may help to illustrate the outcome of each option. Let's assume you are making plastic pellets, and your customer has specified that the pellets should have a low moisture content: the lower the moisture content, the better, and no more than .5 is allowed. If the product has too much moisture, it will cause manufacturing problems. The process is in statistical control.
It is not likely your customer would be happy if you went with option A and decided not to calculate a Cpk at all. Going with option B, you might argue that the lower specification limit (LSL) is 0, since it is impossible to have a moisture level below 0. So with USL = .5 and LSL = 0, Cpk is calculated as follows. If USL = .5, X-bar = .0025, and estimated sigma = .15, then:

Zupper = (.5 - .0025) / .15 = 3.316
Zlower = (.0025 - 0) / .15 = .01667
Zmin = .01667
Cpk = .01667 / 3 = .005

Your customer will probably not be happy with a Cpk of .005, and this number is not representative of the process. Option C treats the lower specification as missing. Since you do not have an LSL, Zlower is missing or nonexistent. Zmin therefore becomes Zupper, and Cpk is Zupper / 3:

Zupper = 3.316 (from above)
Cpk = 3.316 / 3 = 1.10

A Cpk of 1.10 is more realistic than .005 for the data given in this example and is representative of the process. As this example illustrates, setting the lower specification equal to 0 results in a lower Cpk. In fact, as the process improves (moisture content decreases), the Cpk will decrease; when the process improves, Cpk should increase. Therefore, when you only have one specification, you should enter only that specification and treat the other specification as missing.

An interesting debate (well, about as interesting as statistics gets) occurs over what to do with Cp (or Pp). Most textbooks show Cp as the difference between both specifications (USL - LSL) divided by 6 sigma. Because only one specification exists, some suggest that Cp cannot be calculated. Another suggestion is to look at half of the Cp: instead of evaluating [(USL - Mean) + (Mean - LSL)] / (6 * sigma), think of Cp as (USL - Mean) / (3 * sigma) or (Mean - LSL) / (3 * sigma). You might note that when you only have one specification, this becomes the same formula as Cpk.

Example capability analysis from the free Cpk calculator: The following custom analysis is based on 24 data points.
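Options B and C can be sketched side by side, using the numbers from the moisture example (note that the truncated values quoted in the text, .005 and 1.10, round here to .006 and 1.11):

```python
# Moisture example: USL = .5, X-bar = .0025, estimated sigma = .15.
USL, mean, sigma_est = 0.5, 0.0025, 0.15

z_upper = (USL - mean) / sigma_est

# Option B: pretend LSL = 0. A near-zero moisture average produces a tiny
# Zlower, so Cpk collapses even though the process is excellent.
z_lower_b = (mean - 0.0) / sigma_est
cpk_b = min(z_upper, z_lower_b) / 3

# Option C: treat the LSL as missing, so Cpk = Zupper / 3.
cpk_c = z_upper / 3

print(f"Option B Cpk = {cpk_b:.3f}, Option C Cpk = {cpk_c:.3f}")
```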
The upper specification is 15. The lower specification is 5. Subgroup size is 1. The mean is 10.57. The minimum is 7.4. The maximum is 16.2. The estimated standard deviation is 1.16. Cpk is 1.27. This Cpk is considered fair based on the following scale:

0 to less than 1.0: unacceptable (sometimes called "not capable")
Greater than 1.0 to 1.33: fair
Greater than 1.33 to 1.66: acceptable
Greater than 1.66: exceptional

Ppk is 0.88. This is considered unacceptable based on the scale above. Ppk is a measure of capability similar to Cpk, except that the actual standard deviation is used in the calculation rather than an estimate of the standard deviation.
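The scale above maps naturally onto a small helper. This is a sketch; the function name and the handling of the boundary values 1.0, 1.33, and 1.66 are my own choices, since the source leaves the edges ambiguous:

```python
def rate_capability(index: float) -> str:
    """Rate a Cpk or Ppk value using the scale quoted above."""
    if index < 1.0:
        return "unacceptable"  # sometimes called "not capable"
    if index <= 1.33:
        return "fair"
    if index <= 1.66:
        return "acceptable"
    return "exceptional"

print(rate_capability(1.27))  # the example's Cpk -> "fair"
print(rate_capability(0.88))  # the example's Ppk -> "unacceptable"
```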
Cp is 1.44. When Cp is greater than 1.0, the (six sigma) spread of your data is smaller than the width of your specification limits. Cp by itself does not tell you where the process average is relative to the specification limits. Instead, it tells you the maximum your Cpk can become if the process average is centered between the upper and lower specification limits, assuming the same variation. When Cp is greater than Cpk, the process should be adjusted so that the process average is centered between the upper and lower specifications to achieve the maximum Cpk. If no changes are made, this process will produce approximately 0.007% outside of specifications, or about 67 defects per million.

Capability analysis is based on these important assumptions:

1) Your data is normally distributed, meaning a histogram of the data shows a normal bell curve. If your data is not normally distributed, software such as CHARTrunner or SQCpack can help with your capability analysis.
2) A control chart of the data shows no out-of-control conditions. When out-of-control conditions exist, the capability information is not reliable because the process is not predictable. Software such as CHARTrunner or SQCpack can help with your control charts.
3) The measurement system can demonstrate a %R&R of less than 30%. Software such as GAGEpack can help you perform a measurement systems analysis.

Review other links on this page to learn more about capability analysis. Analysis provided by PQ Systems, Inc.
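The defects-per-million figure follows from the normality assumption. This sketch plugs the example's mean (10.57), estimated standard deviation (1.16), and specifications (5 and 15) into the normal distribution and lands near the quoted 67 defects per million:

```python
from statistics import NormalDist

USL, LSL = 15.0, 5.0
dist = NormalDist(mu=10.57, sigma=1.16)

# Fraction expected beyond each specification limit, assuming normality.
frac_out = dist.cdf(LSL) + (1.0 - dist.cdf(USL))
ppm = frac_out * 1_000_000

print(f"expected nonconforming: {frac_out:.5%} (~{ppm:.0f} per million)")
```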