The hypergeometric distribution model estimates the number of faults initially resident in a program at the start of the test or debugging process. Let $C_{i-1}$ be the cumulative number of faults detected by test instances $t_1, t_2, \ldots, t_{i-1}$, and let $N_i$ be the number of faults newly detected by test instance $t_i$. The model makes the following assumptions:
- A program initially contains m faults when the test phase starts.
- A test is a set of test instances, each a pair of input data and output data. In practice, the collection of test operations performed in a day or a week is called a test instance. The test instances are denoted by $t_i$ for $i = 1, 2, \ldots, n$.
- Detected faults are not removed between test instances.
By the last assumption, the same fault can be experienced in several test instances. Let $W_i$ be the number of faults experienced by test instance $t_i$. Some of these $W_i$ faults may already be counted in $C_{i-1}$; the remaining ones are the newly detected faults.
If $n_i$ is an observed value of $N_i$, then $n_i \le W_i$. Each fault experienced by $t_i$ falls into one of two categories:
- Newly discovered faults
- Rediscovered faults
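The bookkeeping described above can be sketched as a small simulation. This is only an illustration, not part of the model's definition: it assumes that each test instance experiences $W_i$ distinct faults chosen uniformly at random from the $m$ faults, and the function name `simulate_test_instances` and its parameters are hypothetical.

```python
import random


def simulate_test_instances(m, w, seed=0):
    """Simulate fault detection when detected faults are NOT removed.

    m    : number of faults initially in the program (assumed known here)
    w    : list of W_i, the number of faults experienced by each test instance
    seed : seed for the random generator (illustrative assumption)

    Returns a list of tuples (n_i, rediscovered_i, C_i) per test instance.
    """
    rng = random.Random(seed)
    detected = set()          # faults counted so far, so len(detected) == C_{i-1}
    history = []
    for w_i in w:
        # Assumption: the W_i experienced faults are a uniform random subset.
        experienced = set(rng.sample(range(m), w_i))
        new = experienced - detected            # newly discovered faults
        rediscovered = experienced & detected   # rediscovered faults
        detected |= new
        history.append((len(new), len(rediscovered), len(detected)))
    return history


for i, (n_i, r_i, c_i) in enumerate(simulate_test_instances(50, [10, 10, 15]), 1):
    print(f"t_{i}: new={n_i} rediscovered={r_i} cumulative={c_i}")
```

In every test instance the two categories add up to $W_i$, and the first instance can only produce newly discovered faults because nothing has been detected before it.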
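If we assume that the number of newly detected faults follows a hypergeometric distribution, then the probability of obtaining exactly $n_i$ newly detected faults among the $W_i$ faults experienced is

$$P(N_i = n_i) = \frac{\binom{m - C_{i-1}}{n_i}\binom{C_{i-1}}{W_i - n_i}}{\binom{m}{W_i}},$$

where $C_{i-1} = \sum_{k=1}^{i-1} n_k$, $C_0 = 0$, and $\max(0, W_i - C_{i-1}) \le n_i \le \min(W_i, m - C_{i-1})$ for all $i$. Since $N_i$ is assumed to be hypergeometrically distributed, the expected number of newly detected faults during the interval $[t_{i-1}, t_i]$ is

$$E(N_i) = \frac{(m - C_{i-1})\,W_i}{m},$$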
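and the expected value of $C_i$ is given by

$$E(C_i) = m\left[1 - \prod_{j=1}^{i}\left(1 - p_j\right)\right], \qquad p_j = \frac{W_j}{m}.$$

This follows from applying the hypergeometric mean at each step: $m - E(C_i) = \left(m - E(C_{i-1})\right)(1 - p_i)$ with $C_0 = 0$.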
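As a numerical check of the quantities in this section, the following minimal sketch evaluates the hypergeometric probability of $n_i$ newly detected faults and the two expected values, using the standard hypergeometric formulas with population $m$, $m - C_{i-1}$ undetected faults, and sample size $W_i$. The function names and the example numbers are illustrative assumptions, not taken from the source.

```python
from math import comb, prod


def pmf_new_faults(n_i, m, c_prev, w_i):
    """P(N_i = n_i): n_i of the w_i experienced faults are new,
    given m total faults and c_prev = C_{i-1} already detected."""
    lo = max(0, w_i - c_prev)
    hi = min(w_i, m - c_prev)
    if not lo <= n_i <= hi:
        return 0.0  # outside the support of the distribution
    return comb(m - c_prev, n_i) * comb(c_prev, w_i - n_i) / comb(m, w_i)


def expected_new_faults(m, c_prev, w_i):
    """E(N_i) = (m - C_{i-1}) * W_i / m, the hypergeometric mean."""
    return (m - c_prev) * w_i / m


def expected_cumulative(m, w):
    """E(C_i) = m * (1 - prod_j (1 - W_j / m)) over the instances in w."""
    return m * (1 - prod(1 - w_j / m for w_j in w))


# Hypothetical numbers: 100 faults, 20 already detected, 10 experienced now.
m, c_prev, w_i = 100, 20, 10
total = sum(pmf_new_faults(n, m, c_prev, w_i) for n in range(w_i + 1))
mean = sum(n * pmf_new_faults(n, m, c_prev, w_i) for n in range(w_i + 1))
print(total)                                   # probabilities sum to about 1
print(mean, expected_new_faults(m, c_prev, w_i))  # both about 8
print(expected_cumulative(m, [10, 10]))        # about 19
```

Summing $n_i \cdot P(N_i = n_i)$ over the support reproduces the closed-form mean, which is a quick way to confirm that the probability and the expectation are consistent with each other.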