If a problem only occurs randomly once in every N times on average, how many tests do I have to perform to be...













40

















I know that you can never be 100% sure, but is there a method to determine an appropriate number of tests?


































  • 7





    @whoever voted to close - would you say this is primarily opinion based? or answerable with some math and probability?

    – trashpanda
    May 28 at 9:04






  • 7





    This is certainly not opinion based but related to statistical analysis. Although the question can be, and was, asked in math related SEs it is very relevant here and the context might be a bit different.

    – Rsf
    May 28 at 9:50






  • 2





    @JoãoFarias Of course, controlling said randomness can indeed be tricky. I like to look at the example of CHESS, which goes to great lengths to mock the scheduling algorithm of the OS to find multithreading bugs. Sometimes a statistical approach, while less satisfactory, can be more valuable from a business perspective.

    – Cort Ammon
    May 28 at 16:18






  • 1





    Possible duplicate of How can I be sure that rarely reproduced issue is fixed?

    – Kevin McKenzie
    May 28 at 17:51






  • 6





    As a developer, "this fails randomly" actually means "I haven't yet been able to pinpoint the set of circumstances that makes this fail, and I cannot spend more time investigating it". Likewise, "I've solved this random error" means "The error still happens, but you won't notice anymore because I've added logic to capture it and fix any wrong data or actions it caused".

    – walen
    May 29 at 8:43

























manual-testing intermittent-failures














edited May 28 at 8:51









jonrsharpe











asked May 28 at 7:03









Sam Hall






















6 Answers


















36


















I'm going to take a different approach than statistics (though I think the other response answers your actual question more directly). Any time I've encountered "a problem that only happens some of the time" in either a QA or a support role, it's been an investigative exercise in narrowing down why the event happens irregularly or in what situations it occurs.



Some (but certainly not all) points of investigation may be:




  • Specific accounts or data.

  • Differences in hosts/environments the applications or services are running on.

  • Different versions of the application or service running on different hosts

  • Certain days, dates, times of day or time zones.

  • Certain users and their specific means of accessing the application (physical device, browser, network connection)


This sort of situation is where reproduction steps and other details from the person reporting the problem can be so valuable in resolving their issues. Telling a customer "your problem is fixed" when you're just making an educated guess can spiral in a negative direction if your experiment is based on incorrect assumptions. In my experience it's better to try to coach them about what information will help resolve their problem and how they can help you get it.




























  • 20





    Can I add a really obscure one that troubled one company I worked at? "Whether the logged-in user has an odd or an even number of characters in their username". (Which meant that the developer didn't have a problem, and the first test user did.)

    – Martin Bonner
    May 28 at 17:12











  • That's definitely an interesting one. It's always a good challenge when it gets to that granular of a difference.

    – Cherree
    May 28 at 18:09






  • 7





    Back before the turn of the century, we had an issue with phone calls counting double when transit calls (routed through the country but originating and terminating in other countries) for some specific pair of countries had happened on exactly 20 days of a calendar month. That one was completely deterministic but hell to track down. (Root cause was a buggy database driver fetching rows in batches of 20)

    – JollyJoker
    May 29 at 7:38






  • 1





    So much this; apart from hardware failures (and buggy RAM does exist) most problems are deterministic. Even a race condition is just a fancy name for "A occurs before B some of the time". If you have flaky behavior, it means you have not determined the conditions in which it happens in sufficient detail.

    – Matthieu M.
    May 29 at 11:22






  • 1





    Just another anecdotal example - we had a problem once that the customer reported happened sometimes and we couldn't reproduce. One amazing QA person we had managed to narrow it down to...network latency. She played with the network throttling options in Chrome (it's under the dev tools) and found that the problem was almost exclusively happening between some network speeds. On a fast network, you wouldn't see the problem, nor would you on a very slow network. It happened in a narrow band. It was still irregular but she found the optimum speeds where it happened around 80% of the time.

    – VLAZ
    May 30 at 11:22



















20


















You must perform the same test an equal number of times on both the unfixed version and the supposedly fixed version. You have to show that:




  1. The test fails once in every N times randomly, on the unfixed version.

  2. The same test passes every time, or at least fails less often, on the fixed version.


You have to show that the only difference between 1 and 2 is the "fix" itself, not any external or environmental factors.



If you only perform the test on the new, fixed version, it could very well be the case that the bug was caused by an unrelated environmental factor that simply doesn't exist in your test now.
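As a rough illustration of running the same test an equal number of times against both versions, here is a minimal Python sketch. The names `count_failures`, `compare_versions` and `make_flaky_stub` are hypothetical, and the stub merely stands in for a real test that returns True on a pass and False on a failure.

```python
from typing import Callable, Tuple


def count_failures(run_test: Callable[[], bool], runs: int) -> int:
    """Call run_test `runs` times and count the calls that return False (a failure)."""
    return sum(1 for _ in range(runs) if not run_test())


def compare_versions(
    run_unfixed: Callable[[], bool],
    run_fixed: Callable[[], bool],
    runs: int,
) -> Tuple[int, int]:
    """Run both versions the same number of times.

    Returns (failures on the unfixed version, failures on the fixed version).
    """
    return count_failures(run_unfixed, runs), count_failures(run_fixed, runs)


def make_flaky_stub(fail_every: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a real test: fails on every `fail_every`-th call."""
    calls = 0

    def run() -> bool:
        nonlocal calls
        calls += 1
        return calls % fail_every != 0

    return run


if __name__ == "__main__":
    unfixed = make_flaky_stub(fail_every=4)  # stand-in for the old build
    fixed = lambda: True                     # stand-in for the fixed build
    print(compare_versions(unfixed, fixed, runs=40))
```

If the unfixed version no longer fails in your environment either, the comparison tells you nothing about the fix, which is the point of running both.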




























  • 9





    +1 for the principle of verifying that the old version still fails. It's way too easy for random bugs to randomly go away for days due to some external changes such as network speed.

    – jpa
    May 29 at 6:53











  • This is really nasty on less strict platforms like JavaScript. Depending on how fast your computer processes the scripts, race conditions between overriding functions can occur, and the code can randomly call a different function depending on the network latency and a particular script's load time.

    – Nelson
    May 29 at 8:58



















10


















I suppose this answer could help you

You need to decide first at what probability you want to "detect" the problem.

This is a nice example of why theoretical knowledge is necessary even for testers.

The simplified version:

  • p is the probability of failure on a single try, 1/N in our case

  • then the probability of success on a single try is 1-p

  • and the probability of n successful tries in a row is (1-p)^n

  • so the probability of at least one failure in n tries is 1-(1-p)^n

  • setting this equal to x, the probability at which you want to detect the problem, then extracting n and simplifying a bit, assuming a big enough N, gives the number of tries needed:

  • n ≈ −log(1−x)⋅N

(*) "log" here is the natural logarithm, which is sometimes labelled (for example on calculators) ln(x) or loge(x)
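To make the large-N approximation concrete, here is a small Python sketch. It assumes x denotes the probability with which you want to see the failure at least once (the symbol used in the comments on this answer), and that the tries are independent. The function names are hypothetical.

```python
import math


def tries_needed_approx(N: float, x: float) -> int:
    """Large-N approximation: n is about -ln(1 - x) * N, rounded up."""
    if not 0 < x < 1:
        raise ValueError("x must be strictly between 0 and 1")
    return math.ceil(-math.log(1 - x) * N)


def tries_needed_exact(N: float, x: float) -> int:
    """Exact solution of 1 - (1 - 1/N)**n = x for n, rounded up."""
    if not 0 < x < 1:
        raise ValueError("x must be strictly between 0 and 1")
    if N <= 1:
        raise ValueError("N must be greater than 1")
    return math.ceil(math.log(1 - x) / math.log(1 - 1 / N))


if __name__ == "__main__":
    for x in (0.90, 0.95, 0.99):
        print(x, tries_needed_approx(100, x), tries_needed_exact(100, x))
```

The approximation slightly overestimates the exact count, which errs on the safe side, and both grow without bound as x approaches 1.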






























  • 5





    I would replace "probability" with "confidence". How confident do you want to be? Note that to be 100% confident you would need to execute an infinite number of tests, because -log(1-x) goes to infinity as x goes to 1.

    – dzieciou
    May 28 at 18:20






  • 1





    And all this is based on the assumption that test executions are independent. For instance, it assumes the error does not build up over successive runs and only manifest after n executions.

    – dzieciou
    May 28 at 18:21











  • This is actually a valid point @Makyen, I edited the answer

    – Rsf
    May 29 at 8:07











  • This answer would be greatly improved by defining x, which just shows up in the last bullet point out of nowhere.

    – Gregor
    May 31 at 1:54



















9


















While I agree with the other answers saying "dig deeper", to answer the actual math question in the title:



If the issue occurs completely at random with probability p, then the chance of it occurring at least once in n trials is 1-(1-p)^n. Setting this to x (your confidence that the issue has been fixed) and solving for n gives you



n = log(1-x)/log(1-p)




So for example, if your issue occurs 1 out of 4 times, and you want to be 95% sure it's fixed (meaning you'll incorrectly identify it as fixed 1 out of 20 times!!), then



p = 0.25
x = 0.95
n = log(0.05)/log(0.75) ≈ 10.4


so you'd need to run 11 trials
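The worked example above can be checked with a few lines of Python. This is only a sketch of the formula n = log(1-x)/log(1-p) under the same assumption of independent trials. The function name `trials_needed` is made up for illustration.

```python
import math


def trials_needed(p: float, x: float) -> int:
    """Smallest whole n with 1 - (1 - p)**n >= x, assuming independent trials."""
    if not (0 < p < 1 and 0 < x < 1):
        raise ValueError("p and x must be strictly between 0 and 1")
    return math.ceil(math.log(1 - x) / math.log(1 - p))


if __name__ == "__main__":
    n = trials_needed(p=0.25, x=0.95)
    print(n)              # 11
    print(0.75 ** n)      # chance that an unfixed issue passes all n trials
```

With 11 trials the chance of an unfixed issue slipping through is about 4.2%, while 10 trials would leave it at about 5.6%, which is why the result is rounded up.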





The difference between my answer and @emrys57's is that mine assumes you know the probability, while theirs assumes you know some initial sequence of results. Presumably they should both give the same answer with a large enough initial sequence.










































    4


















    A problem has been observed that sometimes, in testing, produces errors. We don’t actually know the probability that it will produce an error on any given test run, because we can only do a finite number of tests on the broken system. If 1 represents a test pass, and 0 represents a test fail, we might have a sequence of results from multiple tests that looks like this:



    011001010011


    This represents 6 passes and 6 fails in 12 tests, which gives us an estimate Pf of the probability that a test of the broken system fails. We assume that this probability isn’t changing with time. However, there will be a large uncertainty in the actual value of Pf, since we cannot do very many tests.



    Following these tests, we make a repair. We hope that this fixes the problem, but we’re not sure. We run more tests on the repaired system. If we have fixed the problem, the value of Pf after the repair should be 0. If we have not fixed it, Pf should be unchanged from its value before the repair.



    If we run some tests after the repair and one fails, we immediately know the repair failed. The question is, if no tests fail after repair, does that mean the repair worked?



    Consider the case where, counting every test before and after the repair, we have results
    a zeroes (test fails)
    b ones (test passes)
    of which the last c ones (test passes) came after the repair.



    The number of ways of arranging all a + b results is



    Ntot = (a + b)! /(b! * a!)


    In the case where Pf does not change yet the last c results are all ones, we need to arrange b - c ones among the first a + b - c results. The number of different ways of arranging these first a + b - c results is



    Nsuc = (a + b - c)! / ( (b - c)! * a! )


    These patterns are those from all the Ntot possible patterns where the last c results are all ones.



    If the change didn’t actually repair the system, but really left its state the same, the results we observe are a random collection of zeroes and ones produced by chance, governed by the probability Pf that a single experiment returns a zero. Given this, the chance that we will observe a pattern of a zeroes and b ones with the last c results all ones is



    Cran = Nsuc/Ntot


    Or



    Cran = (a + b - c)! * b! * a! / ((a +b)! * (b - c)!  * a!)


    Or



    Cran = (a + b - c)! * b! / ((a + b)! * (b - c)!)


    Cran is the chance that we observe c test passes after we make a repair to the system, out of a total of a fails and b passes, but in fact we have changed nothing, and we see a sequence of passes after repair by happenstance.



    As an example, with the pattern above before repair, and 6 ones (passes) after repair, Cran is slightly less than 5%. To reach a confidence of 99% that the repair has succeeded, you need 11 passes after repair.
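For readers who prefer code to a spreadsheet, the Cran formula above can be sketched in Python. This assumes, as in the example, that b counts every pass including the c passes after the repair. The function names are hypothetical.

```python
from math import comb


def chance_of_lucky_passes(a: int, b: int, c: int) -> float:
    """Cran = Nsuc / Ntot for a fails and b passes in total, the last c being passes.

    Nsuc = (a + b - c)! / ((b - c)! * a!) is comb(a + b - c, a),
    Ntot = (a + b)! / (b! * a!) is comb(a + b, a).
    """
    if a < 0 or c < 0 or c > b:
        raise ValueError("need a >= 0 and 0 <= c <= b")
    return comb(a + b - c, a) / comb(a + b, a)


def passes_needed(fails_before: int, passes_before: int, max_chance: float) -> int:
    """Smallest number of consecutive passes after repair with Cran <= max_chance."""
    if fails_before < 1:
        raise ValueError("need at least one observed failure before the repair")
    c = 0
    while chance_of_lucky_passes(fails_before, passes_before + c, c) > max_chance:
        c += 1
    return c


if __name__ == "__main__":
    # 6 fails and 6 passes before repair, then 6 passes after repair.
    print(chance_of_lucky_passes(a=6, b=12, c=6))   # slightly under 0.05
    print(passes_needed(6, 6, max_chance=0.01))     # 11
```

Using `math.comb` keeps the arithmetic in exact integers until the final division, which avoids overflow from large factorials.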



    There's a spreadsheet that can be copied to make this computation. I hope I have it all right, it is 15 years since I last worked this out. And, back then, it took me 15 years to first find the answer.










































      0


















      Let B mean "broken", F mean "fixed", and E_k mean "error observed in k trials" (so ~E_k means no error observed in k trials). You are saying that P(E_1|B)=1/N; the probability of seeing an error in a single observation, given that it's broken, is 1/N. Now, that figure will itself have some uncertainty, since your only way of measuring it is probably to see how often the system fails and make an empirical estimate. However, if we take it as given, then Bayes' rule tells us that if our prior probability of being broken is P(B), then the posterior probability after k error-free trials is P(B|~E_k) = P(~E_k|B)P(B)/P(~E_k).



      For P(~E_k|B), we have P(~E_k|B)= P(~E_1|B)^k = (1-P(E_1|B))^k = (1-1/N)^k = ((N-1)/N)^k. For large N and k, we can approximate that as e^(-k/N).



      For P(~E_k), we have P(~E_k) = P(~E_k|B)P(B)+P(~E_k|F)P(F). P(~E_k|F) =1 (If we've fixed it, we're guaranteed to see no errors). And P(F) is just 1-P(B).



      Plugging that all in, we have



      P(B|~E_k) ~= e^(-k/N)P(B)/(e^(-k/N)P(B)+1-P(B))



      This can be made easier to read by setting X = e^(-k/N), Y = P(B). Then we have



      XY/(XY+1-Y)



      or



      1-(1-Y)/(XY+1-Y)
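A minimal Python sketch of this posterior, assuming independent trials and using the exact likelihood ((N-1)/N)^k rather than the e^(-k/N) approximation. The function name and the 50% prior in the usage example are illustrative assumptions, not values from the question.

```python
def posterior_broken(prior_broken: float, N: float, k: int) -> float:
    """P(B | no error in k trials) via Bayes' rule.

    prior_broken is P(B); a broken system errors with probability 1/N per trial,
    and a fixed system never errors, so P(~E_k | F) = 1.
    """
    if not 0 <= prior_broken <= 1:
        raise ValueError("prior_broken must be between 0 and 1")
    if N < 1 or k < 0:
        raise ValueError("need N >= 1 and k >= 0")
    likelihood = (1 - 1 / N) ** k                  # P(~E_k | B)
    evidence = likelihood * prior_broken + (1 - prior_broken)  # P(~E_k)
    return likelihood * prior_broken / evidence


if __name__ == "__main__":
    # Assumed example: failure 1 time in 4, and a 50% prior that the fix did not work.
    for k in (0, 5, 11, 20):
        print(k, round(posterior_broken(0.5, 4, k), 4))
```

With no trials the posterior equals the prior, and each additional error-free trial lowers it, though how quickly depends heavily on the prior you choose.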






        answered May 28 at 13:32









        Cherree

        1,124 reputation, 6 silver badges, 12 bronze badges











        • 20





          Can I add a really obscure one that troubled one company I worked at? "Whether the logged-in user has an odd or an even number of characters in their username". (Which meant that the developer didn't have a problem, and the first test user did.)

          – Martin Bonner
          May 28 at 17:12











        • That's definitely an interesting one. It's always a good challenge when it gets to that granular of a difference.

          – Cherree
          May 28 at 18:09






        • 7





          Back before the turn of the century, we had an issue with phone calls counting double when transit calls (routed through the country but originating and terminating in other countries) for some specific pair of countries had happened on exactly 20 days of a calendar month. That one was completely deterministic but hell to track down. (Root cause was a buggy database driver fetching rows in batches of 20)

          – JollyJoker
          May 29 at 7:38






        • 1





          So much this; apart from hardware failures (and buggy RAM does exist) most problems are deterministic. Even a race condition is just a fancy name for "A occurs before B some of the time". If you have a flaky behavior, it means you have not yet determined the conditions in which it happens in sufficient detail.

          – Matthieu M.
          May 29 at 11:22






        • 1





          Just another anecdotal example - we had a problem once that the customer reported happened sometimes and we couldn't reproduce. One amazing QA person we had managed to narrow it down to...network latency. She played with the network throttling options in Chrome (it's under the dev tools) and found that the problem was almost exclusively happening between some network speeds. On a fast network, you wouldn't see the problem, nor would you on a very slow network. It happened in a narrow band. It was still irregular but she found the optimum speeds where it happened around 80% of the time.

          – VLAZ
          May 30 at 11:22














        20


















        You must run the same test an equal number of times on both the unfixed version and the supposedly fixed version. You have to show that:




        1. The test fails once in every N times randomly, on the unfixed version.

        2. The same test passes every time, or at least fails less often, on the fixed version.


        You have to show that the only difference between 1 and 2 is the "fix" itself, not any external or environmental factors.



        If you only perform the test on the new, fixed version, it could very well be the case that the bug was caused by an unrelated environmental factor that simply doesn't exist in your test now.
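Once both versions have been run the same number of times, the two failure counts can be compared with a one-sided Fisher exact test. This is a swapped-in statistical technique, not something the answer itself prescribes, and the function name and the example counts below are assumptions chosen for illustration.

```python
from math import comb


def p_value_fewer_failures(fail_old, runs_old, fail_new, runs_new):
    """One-sided Fisher exact test.

    Returns the chance of seeing fail_new or fewer failures in the new
    build if both builds actually fail at the same underlying rate.
    """
    total_runs = runs_old + runs_new
    total_fail = fail_old + fail_new
    total_pass = total_runs - total_fail
    ways_all = comb(total_runs, runs_new)
    ways = sum(
        comb(total_fail, k) * comb(total_pass, runs_new - k)
        for k in range(fail_new + 1)
    )
    return ways / ways_all


# Hypothetical counts: 5 failures in 20 runs before the fix, 0 in 20 after.
print(p_value_fewer_failures(5, 20, 0, 20))
```

For these made-up counts the result is about 0.024, so a clean run of 20 on the fixed build would be fairly unlikely if nothing had changed. The test only speaks to chance; it cannot rule out an environmental difference between the two sets of runs.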






        answered May 29 at 1:12









        Double Vision Stout Fat Heavy

        3012 bronze badges











        • 9





          +1 for the principle of verifying that the old version still fails. It's way too easy for random bugs to randomly go away for days due to some external changes such as network speed.

          – jpa
          May 29 at 6:53











        • This is really nasty on less strict platforms like JavaScript. Depending on how fast your computer processes the scripts, race conditions between overriding functions can occur, and the code can randomly call a different function depending on the network latency and a particular script's load time.

          – Nelson
          May 29 at 8:58














        10


















        I suppose this answer could help you.

        You need to decide first at what probability you want to "detect" the problem.

        This is a nice example of why theoretical knowledge is necessary even for testers.

        The simplified version:

        • p is the probability of failure on a single try, 1/N in our case

        • then the probability of success on a single try is 1-p

        • and the probability of n successful tries in a row is (1-p)^n

        • so the probability of seeing at least one failure in n tries is 1-(1-p)^n

        • setting this equal to x, the probability at which you want to detect the problem, solving for n, and simplifying a bit assuming big enough N gives:

        • n ≈ −log(1−x)⋅N

        (*) "log" here is the natural logarithm, sometimes written (for example on calculators) as ln(x) or loge(x)
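To see how good the large-N simplification is, the sketch below compares the exact number of runs with the approximation. The names are assumptions for illustration: `n_rate` is N (the problem shows up once in every N runs on average) and `x` is the probability with which you want to see at least one failure.

```python
import math


def runs_exact(n_rate, x):
    """Solve 1 - (1 - 1/N)^n = x for n."""
    p = 1.0 / n_rate
    return math.log(1 - x) / math.log(1 - p)


def runs_approx(n_rate, x):
    """Large-N simplification: log(1 - 1/N) is close to -1/N."""
    return -math.log(1 - x) * n_rate


for n_rate in (4, 20, 100):
    exact = runs_exact(n_rate, 0.95)
    approx = runs_approx(n_rate, 0.95)
    print(n_rate, math.ceil(exact), math.ceil(approx))
```

The approximation always asks for slightly more runs than the exact formula, so it errs on the safe side, and the gap shrinks relative to the total as N grows.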






        edited May 31 at 4:28

























        answered May 28 at 8:27









        Rsf

        4,725 reputation, 1 gold badge, 15 silver badges, 29 bronze badges











        • 5





          I would replace "probability" with "confidence". How confident do you want to be? Note that to be 100% confident you would need to execute an infinite number of tests, because -log(1-x) goes to infinity when x goes to 1.

          – dzieciou
          May 28 at 18:20






        • 1





          And all this is based on the assumption that test executions are independent: for instance, that the error does not accumulate and only manifest as a result of n executions.

          – dzieciou
          May 28 at 18:21











        • This is actually a valid point @Makyen, I edited the answer

          – Rsf
          May 29 at 8:07











        • This answer would be greatly improved by defining x, which just shows up in the last bullet point out of nowhere.

          – Gregor
          May 31 at 1:54














        9


















        While I agree with the other answers saying "dig deeper", to answer the actual math question in the title:



        If the issue occurs completely at random with probability p, then the chance of it occurring at least once in n trials is 1-(1-p)^n. Setting this to x (your confidence that the issue has been fixed) and solving for n gives you



        n = log(1-x)/log(1-p)




        So for example, if your issue occurs 1 out of 4 times, and you want to be 95% sure it's fixed (meaning you'll incorrectly identify it as fixed 1 out of 20 times!!), then



        p = 0.25
        x = 0.95
        n = log(0.05)/log(0.75) ≈ 10.4


        so you'd need to run 11 trials (rounding up, since you can't run a fraction of a trial).
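The same calculation as a small function, in case you want to try other values. The function name `trials_needed` is just an illustrative choice; the formula is the one above, rounded up to a whole number of trials.

```python
import math


def trials_needed(p, x):
    """Smallest whole n with 1 - (1 - p)^n >= x.

    p: probability that the issue shows up in a single trial (0 < p < 1)
    x: desired confidence that the issue is fixed (0 < x < 1)
    """
    if not (0 < p < 1 and 0 < x < 1):
        raise ValueError("p and x must both be strictly between 0 and 1")
    return math.ceil(math.log(1 - x) / math.log(1 - p))


print(trials_needed(0.25, 0.95))  # the example above: 11
```

Like the formula, this assumes that p is known and that the trials are independent of each other.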





        The difference between my answer and @emrys57's is that mine assumes you know the probability, while theirs assumes you know some initial sequence of results. Presumably they should both give the same answer with a large enough initial sequence.






            edited May 28 at 20:41

























            answered May 28 at 20:34









            BlueRaja - Danny Pflughoeft

            1914 bronze badges


























                4


















                A problem has been observed that sometimes, in testing, produces errors. We don’t actually know the probability that it will produce an error on any given test run, because we can only do a finite number of tests on the broken system. If 1 represents a test pass, and 0 represents a test fail, we might have a sequence of results from multiple tests that looks like this:



                011001010011


                This represents 6 passes and 6 fails in 12 tests, which gives us an estimate Pf of the probability that a single test fails on the broken system. We assume that this probability isn’t changing with time. However, there will be a large uncertainty in the actual value of Pf since we cannot do very many tests.



                Following these tests, we make a repair. We hope that this fixes the problem, but we’re not sure. We make more tests of the repaired system. If we have fixed the problem, the value of Pf measured after the repair should be 0. If we have not fixed the system, Pf should be unchanged from its value before the repair.



                If we run some tests after the repair and one fails, we immediately know the repair failed. The question is, if no tests fail after repair, does that mean the repair worked?



                Consider the case where we have, over all tests before and after the repair:
                a zeroes (test fails)
                b ones (test passes)
                of which c ones (test passes) came after the repair



                The number of ways of arranging the a + b initial results is



                Ntot = (a + b)! /(b! * a!)


                In the case where Pf does not change yet all the last c results are ones, we need to arrange b - c ones in the first a + b - c results. The number of different ways of arranging these first a + b - c results is



                Nsuc = (a + b - c)! / ( (b - c)! * a! )


                These patterns are those from all the Ntot possible patterns where the last c results are all ones.



                If the change didn’t actually repair the system, but really left its state the same, the results we observe are a random sequence of zeroes and ones produced by chance, governed by the probability Pf that a single test returns a zero. Given that the sequence contains a zeroes and b ones in total, every ordering is then equally likely, so the chance that we will observe a pattern with the last c results all ones is



                Cran = Nsuc/Ntot


                Or



                Cran = (a + b - c)! * b! * a! / ((a + b)! * (b - c)! * a!)


                Or



                Cran = (a + b - c)! * b! / ((a + b)! * (b - c)!)


                Cran is the chance that, out of a total of a fails and b passes, the c tests run after the repair all pass by happenstance even though the repair has in fact changed nothing.



                As an example, with the pattern above before repair (a = 6) and 6 ones (passes) after repair (c = 6, so b = 12), Cran = 924/18564, which is slightly less than 5%. To reach a confidence of 99% that the repair has succeeded, that is, to bring Cran below 1%, you need 11 passes after repair.
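The arithmetic above can be checked with a short script. This is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function names `cran` and `passes_needed` are made up for illustration and are not taken from the spreadsheet mentioned in the answer.

```python
from math import comb

def cran(a, b, c):
    """Chance that the last c of a + b results are all passes if nothing changed.

    a = total fails, b = total passes (including the c passes after repair).
    Nsuc = C(a + b - c, a) and Ntot = C(a + b, a), so Cran = Nsuc / Ntot.
    """
    return comb(a + b - c, a) / comb(a + b, a)

def passes_needed(fails, passes_before, confidence):
    """Smallest number of consecutive passes after repair giving Cran <= 1 - confidence."""
    c = 1
    while cran(fails, passes_before + c, c) > 1 - confidence:
        c += 1
    return c

# Pattern 011001010011 before repair: 6 fails and 6 passes.
print(f"Cran with 6 passes after repair: {cran(6, 12, 6):.4f}")
print("passes needed for 95%:", passes_needed(6, 6, 0.95))
print("passes needed for 99%:", passes_needed(6, 6, 0.99))
```

Writing Cran as a ratio of binomial coefficients avoids computing large factorials directly and gives the same value as the factorial formula.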



                There's a spreadsheet that can be copied to make this computation. I hope I have it all right; it is 15 years since I last worked this out. And, back then, it took me 15 years to first find the answer.






                    edited May 28 at 17:44

























                    answered May 28 at 17:05









                    emrys57













































                        Let B mean "broken", F mean "fixed", and E_k mean "error observed in k trials", so that ~E_k means "no error observed in k trials". You are saying that P(E_1|B) = 1/N: the probability of seeing an error in a single observation, given that it's broken, is 1/N. That figure will itself have some uncertainty, since your only way of measuring it is likely to be seeing how often it fails and making an empirical estimate. However, if we take it as given, then applying Bayes' rule tells us that if our prior probability of the system being broken is P(B), the posterior probability after k error-free trials is P(B|~E_k) = P(~E_k|B)P(B)/P(~E_k).



                        For P(~E_k|B), assuming the trials are independent, we have P(~E_k|B) = P(~E_1|B)^k = (1-P(E_1|B))^k = (1-1/N)^k = ((N-1)/N)^k. For large N and k, we can approximate that as e^(-k/N).
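To see how good that approximation is, the following minimal sketch compares the exact value ((N-1)/N)^k with e^(-k/N). The function names and the sample values of N and k are arbitrary choices for illustration.

```python
import math

def p_no_error_exact(n, k):
    """Exact probability of k error-free trials when each trial fails with probability 1/n."""
    return (1 - 1 / n) ** k

def p_no_error_approx(n, k):
    """Exponential approximation e^(-k/n)."""
    return math.exp(-k / n)

for n, k in [(10, 10), (100, 300), (1000, 3000)]:
    exact = p_no_error_exact(n, k)
    approx = p_no_error_approx(n, k)
    print(f"N={n:5d} k={k:5d} exact={exact:.5f} approx={approx:.5f}")
```

The approximation always slightly overestimates the exact value, and the gap shrinks as N grows, so for small N it is safer to use the exact expression.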



                        For P(~E_k), we have P(~E_k) = P(~E_k|B)P(B)+P(~E_k|F)P(F). P(~E_k|F) = 1 (if we've fixed it, we're guaranteed to see no errors), and P(F) is just 1-P(B).



                        Plugging that all in, we have



                        P(B|~E_k) ~= e^(-k/N)P(B)/(e^(-k/N)P(B)+1-P(B))



                        This can be made easier to read by setting X = e^(-k/N), Y = P(B). Then we have



                        XY/(XY+1-Y)



                        or



                        1-(1-Y)/(XY+1-Y)
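The formula can be turned around to answer the original question: how many error-free trials are needed before the posterior probability of "still broken" drops below a chosen target. This is a minimal sketch, using the exact X = (1-1/N)^k rather than the exponential approximation; the function names, the prior of 0.5 and the 1% target are all assumptions chosen for illustration.

```python
import math

def posterior_broken(n, k, prior_broken):
    """P(B | no error in k trials), with X = (1 - 1/n)^k and Y = prior_broken."""
    x = (1 - 1 / n) ** k
    y = prior_broken
    return x * y / (x * y + 1 - y)

def trials_needed(n, prior_broken, target):
    """Smallest k for which posterior_broken(n, k, prior_broken) <= target.

    Assumes n > 1 and 0 < prior_broken, target < 1.
    Rearranging XY/(XY + 1 - Y) <= target gives X <= target*(1 - Y) / (Y*(1 - target)).
    """
    x_max = target * (1 - prior_broken) / (prior_broken * (1 - target))
    if x_max >= 1:
        return 0  # the prior already meets the target
    return math.ceil(math.log(x_max) / math.log(1 - 1 / n))

# Hypothetical inputs: error seen once in 10 runs, 50/50 prior that the fix worked.
k = trials_needed(10, 0.5, 0.01)
print("error-free trials needed:", k)
print("posterior still broken:", posterior_broken(10, k, 0.5))
```

The result depends strongly on the prior P(B): the more doubtful you are that the fix worked, the more clean runs you need.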






                            answered May 28 at 20:56









                            Acccumulation


































