Lecture 5: Conditional probability
Statistics 101
Mine Çetinkaya-Rundel
January 31, 2012
Announcements
Teams:
- a note about the team assignment survey
- sit with your teams in lab and (at least) today in lecture
Clickers:
- make sure you have registered to start collecting points
- daily clicker scores will be posted on Sakai:
  0: wasn't here / didn't answer at least 75% of the questions
  1: answered at least 75% of the questions
  2: answered at least 75% of the questions and correctly answered the review question
Due: HW1, at the beginning of class on Thursday
- end-of-chapter exercises from the textbook and the "On your own" section of the lab
- typed is OK
- stapled
- late work policy applies (see syllabus)
Recap
Online quiz 1: commonly missed questions
Great performance overall...
Question 5:
Recap
From lab...
Variable assignment:
# run the function and return the result
> calcstreak(kobe$basket)
 [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0 0 1 0 0 0 1 1 0 1 0
[46] 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0 1 1 0 0 0 1
# run the function and save the result
> kobe_streak <- calcstreak(kobe$basket)
# the result can later be referenced and used in other functions
> kobe_streak
 [1] 1 0 2 0 0 0 3 2 0 3 0 1 3 0 0 0 0 0 1 1 0 4 1 0 1 0 1 0 1 2 0 1 2 1 0 0 1 0 0 0 1 1 0 1 0
[46] 2 0 0 0 3 0 1 0 1 2 1 0 1 0 0 1 3 3 1 1 0 0 0 0 0 1 1 0 0 0 1
> mean(kobe_streak)
[1] 0.7631579
Saving your code in an R script:
- useful for referring back to your code later; much cleaner than looking for lines of code in your workspace
- will be very useful when working on projects
- saved R scripts (files that contain the code) can be found under the Files tab
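As a minimal sketch of the same idea outside the lab session, the streak lengths printed above can be typed in by hand and saved in a variable; the lab's `kobe` data and `calcstreak()` function are not assumed to be loaded here. Once saved, the vector can be handed to any other function:

```r
# Streak lengths exactly as printed above (76 values), entered by hand
# because the lab's kobe data and calcstreak() are not loaded here
kobe_streak <- c(1, 0, 2, 0, 0, 0, 3, 2, 0, 3, 0, 1, 3, 0, 0, 0, 0, 0, 1, 1,
                 0, 4, 1, 0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 1, 0, 0, 1, 0, 0, 0,
                 1, 1, 0, 1, 0, 2, 0, 0, 0, 3, 0, 1, 0, 1, 2, 1, 0, 1, 0, 0,
                 1, 3, 3, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1)

# The saved result can be reused by other functions
length(kobe_streak)   # number of streaks: 76
mean(kobe_streak)     # average streak length: 0.7631579, as in the lab output
table(kobe_streak)    # how often each streak length occurs
```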
Recap
Review question
Researchers randomly assigned 72 chronic users of cocaine to three groups: desipramine (an antidepressant), lithium (the standard treatment for cocaine addiction), and placebo. The results of the study are shown below. Also shown is the distribution of the number of patients who took desipramine and did not relapse, based on 100 simulations of this experiment under the assumption of independence.
            relapse  no relapse  total
desipramine      10          14     24
lithium          18           6     24
placebo          20           4     24
total            48          24     72
Desipramine appears to be an ____ treatment for cocaine addiction.
(a) effective
(b) ineffective
http:// www.oswego.edu/ srp/ stats/ 2 way tbl 1.htm
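The simulation described in the review question can be sketched as a randomization (permutation) test. This is an assumed re-creation, not the code behind the original plot: if treatment and outcome are independent, shuffling the 72 outcomes across patients should not matter, so each shuffle gives one simulated count of desipramine patients who did not relapse. The seed is arbitrary, and variable names are illustrative.

```r
# Rebuild the 72 patients from the table: one group label and one outcome each
group <- rep(c("desipramine", "lithium", "placebo"), each = 24)
outcome <- c(rep("relapse", 10), rep("no relapse", 14),   # desipramine
             rep("relapse", 18), rep("no relapse", 6),    # lithium
             rep("relapse", 20), rep("no relapse", 4))    # placebo

# Observed number of desipramine patients who did not relapse
observed <- sum(group == "desipramine" & outcome == "no relapse")

# Under independence, the expected count is 24 * 24 / 72
expected <- 24 * 24 / 72

set.seed(101)  # arbitrary seed so the sketch is reproducible
sim_counts <- replicate(100, {
  shuffled <- sample(outcome)  # reassign outcomes at random, ignoring group
  sum(group == "desipramine" & shuffled == "no relapse")
})

# Proportion of simulations at least as extreme as the observed count
p_value <- mean(sim_counts >= observed)
table(sim_counts)
```

Comparing `observed` with the spread of `sim_counts` is the same comparison the slide's plot invites.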
1 Conditional probability
Florida voters and illegal immigration
1,000 randomly sampled Florida voters were asked whether they thought workers who have illegally entered the US should be allowed to keep their jobs and apply for US citizenship, be allowed to keep their jobs as temporary guest workers but not be allowed to apply for US citizenship, or lose their jobs and have to leave the country. The results of the survey by political ideology are shown below.
                      conservative  moderate  liberal  total
apply for citizenship           57       120      101    278
guest worker                   121       113       28    262
leave the country              179       126       45    350
not sure                        15         4        1     20
total                          372       363      175    910
Note: 910 respondents answered both questions.
SurveyUSA poll from Jan 27-29, 2012, http://www.surveyusa.com/client/PollReport.aspx?g=60d6fa81-2698-4c51-a5f8-714f40976df2.
Marginal probability
What is the probability that a Florida voter
- is conservative?
- is in favor of illegal immigrants working in the US staying and applying for citizenship?
(Counts are from the table above.)
$P(\text{conservative}) = \frac{372}{910} \approx 0.41$
$P(\text{apply for citizenship}) = \frac{278}{910} \approx 0.31$
These are marginal probabilities: each is a row or column total divided by the grand total.
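A minimal sketch of the marginal calculations, assuming the Florida table is entered as an R matrix (the names `counts`, `p_ideology`, and `p_opinion` are our own). Marginal probabilities come from the row and column totals:

```r
# Counts from the Florida survey table (rows: opinion, columns: ideology)
counts <- matrix(c( 57, 120, 101,
                   121, 113,  28,
                   179, 126,  45,
                    15,   4,   1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(opinion = c("apply for citizenship", "guest worker",
                                             "leave the country", "not sure"),
                                 ideology = c("conservative", "moderate", "liberal")))

n <- sum(counts)  # 910 respondents

# Marginal probabilities: row or column totals divided by the grand total
p_ideology <- colSums(counts) / n
p_opinion  <- rowSums(counts) / n

round(p_ideology["conservative"], 2)          # 0.41
round(p_opinion["apply for citizenship"], 2)  # 0.31
```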
Joint probability
What is the probability that a Florida voter is conservative and is in favor of illegal immigrants working in the US staying and applying for citizenship? (Counts are from the table above.)
$P(\text{conservative and apply for citizenship}) = \frac{57}{910} \approx 0.06$
This is a joint probability: the count in a single cell divided by the grand total.
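A sketch of the joint probability under the same assumption (the table typed in as a matrix named `counts`, our own choice). Dividing every cell by the grand total gives the whole table of joint probabilities at once:

```r
counts <- matrix(c( 57, 120, 101,
                   121, 113,  28,
                   179, 126,  45,
                    15,   4,   1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(opinion = c("apply for citizenship", "guest worker",
                                             "leave the country", "not sure"),
                                 ideology = c("conservative", "moderate", "liberal")))

# Joint probabilities: each cell count divided by the grand total
joint <- counts / sum(counts)   # equivalent to prop.table(counts)

round(joint["apply for citizenship", "conservative"], 2)  # 0.06
sum(joint)  # all joint probabilities together add up to 1
```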
Conditional probability
If we know that a randomly selected Florida voter is conservative, what is the probability that s/he is in favor of illegal immigrants working in the US staying and applying for citizenship? (Counts are from the table above.)
$P(\text{apply for citizenship} \mid \text{conservative}) = \frac{57}{372} \approx 0.15$
This is a conditional probability: knowing that the voter is conservative restricts attention to the conservative column, so the denominator is that column's total (372) rather than the grand total (910).
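A sketch of the conditional calculation, again assuming the table is typed in as a matrix named `counts`. `prop.table(x, margin = 2)` divides each column by its own total, which gives the distribution of opinions within each ideology:

```r
counts <- matrix(c( 57, 120, 101,
                   121, 113,  28,
                   179, 126,  45,
                    15,   4,   1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(opinion = c("apply for citizenship", "guest worker",
                                             "leave the country", "not sure"),
                                 ideology = c("conservative", "moderate", "liberal")))

# Conditional distribution of opinion within each ideology (column proportions)
cond_given_ideology <- prop.table(counts, margin = 2)
round(cond_given_ideology["apply for citizenship", "conservative"], 2)  # 0.15

# Each column now sums to 1
colSums(cond_given_ideology)

# The same value as joint / marginal: P(apply and conservative) / P(conservative)
(counts["apply for citizenship", "conservative"] / sum(counts)) /
  (sum(counts[, "conservative"]) / sum(counts))
```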
Conditional probability - another example
The same survey asked voters who are familiar with the DREAM Act whether they support or oppose it. 32% of the respondents are Democrats, 51% of the respondents support the DREAM Act, and 21% of the respondents are Democrats and support the DREAM Act. If we randomly select a respondent who supports the DREAM Act, what is the probability that s/he is a Democrat?
$P(\text{Democrat}) = 0.32$
$P(\text{support}) = 0.51$
$P(\text{Democrat and support}) = 0.21$
$P(\text{Democrat} \mid \text{support}) = ?$
A contingency table of proportions
             support  oppose  total
Democrat        0.21       ?   0.32
non-Democrat       ?       ?      ?
total           0.51       ?      1
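One way to fill in the blank cells, sketched in R: only the three given proportions are used, together with the fact that each row and each column must add up to its total. The names `prop_table`, `p_dem`, and so on are illustrative.

```r
# Given proportions from the survey
p_dem         <- 0.32  # P(Democrat)
p_support     <- 0.51  # P(support)
p_dem_support <- 0.21  # P(Democrat and support)

prop_table <- matrix(NA, nrow = 3, ncol = 3,
                     dimnames = list(c("Democrat", "non-Democrat", "total"),
                                     c("support", "oppose", "total")))

# Each row and column must add up to its total
prop_table["Democrat", ]     <- c(p_dem_support, p_dem - p_dem_support, p_dem)
prop_table["non-Democrat", ] <- c(p_support - p_dem_support,
                                  (1 - p_dem) - (p_support - p_dem_support),
                                  1 - p_dem)
prop_table["total", ]        <- c(p_support, 1 - p_support, 1)

round(prop_table, 2)
```

Conditioning on "support" means looking only at the support column, whose total is 0.51.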
A shortcut - Bayes' Theorem
Conditional probability (Bayes' Theorem):
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
$P(\text{Democrat} \mid \text{support}) = \frac{P(\text{Democrat and support})}{P(\text{support})} = \frac{0.21}{0.51} \approx 0.41$
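The formula above translates directly into a one-line function. `cond_prob()` is a hypothetical helper written for this sketch, not a base R function; it also checks that the inputs are valid probabilities for this formula.

```r
# Hypothetical helper: P(A | B) = P(A and B) / P(B)
cond_prob <- function(p_a_and_b, p_b) {
  # P(B) must be positive, and P(A and B) cannot exceed P(B)
  stopifnot(p_b > 0, p_a_and_b <= p_b)
  p_a_and_b / p_b
}

# P(Democrat | support) = P(Democrat and support) / P(support)
p <- cond_prob(p_a_and_b = 0.21, p_b = 0.51)
round(p, 2)  # 0.41
```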
Clicker question
At a large apartment complex, 58% of the units have a washer and dryer, 32% have double parking, and 20% have both a washer and dryer and double parking. A unit with double parking just became available at this apartment complex. What is the probability that it also has a washer and dryer?
(a) 0.20
(b) 0.345
(c) 0.625
(d) 0.064
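One way to check the arithmetic: the unit is known to have double parking, so we condition on that event. Variable names are illustrative.

```r
p_wd      <- 0.58  # P(washer and dryer)
p_parking <- 0.32  # P(double parking)
p_both    <- 0.20  # P(washer and dryer and double parking)

# Condition on double parking: P(W&D | parking) = P(both) / P(parking)
p_wd_given_parking <- p_both / p_parking
p_wd_given_parking  # 0.625
```

Note that `p_wd` is not needed: the marginal probability of a washer and dryer plays no role once we know the unit has double parking.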
Clicker question
At a large apartment complex, 58% of the units have a washer and dryer, 32% have double parking, and 20% have both a washer and dryer and double parking. What proportion of apartments have neither double parking nor a washer and dryer?
(a) 0.10
(b) 0.1856
(c) 0.20
(d) 0.30
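A sketch of one route to the answer. It assumes two rules that are not stated on these slides: the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), and the complement rule, P(not A) = 1 - P(A). Variable names are illustrative.

```r
p_wd      <- 0.58  # P(washer and dryer)
p_parking <- 0.32  # P(double parking)
p_both    <- 0.20  # P(both)

# General addition rule (assumed from earlier material):
# P(W&D or parking) = P(W&D) + P(parking) - P(both)
p_either <- p_wd + p_parking - p_both    # 0.70

# "Neither" is the complement of "at least one"
p_neither <- 1 - p_either                # 0.30
p_neither
```

Multiplying the two "not" probabilities, (1 - 0.58) * (1 - 0.32) = 0.2856, would only be valid if the features were independent, which they are not here.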
General multiplication rule
Earlier we saw the multiplication rule for independent events:
If A and B are independent, $P(A \text{ and } B) = P(A) \times P(B)$.
Just now we saw Bayes' theorem for calculating conditional probabilities:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
Bayes' theorem doesn't require that the events be independent. Rearranging the formula above gives the following:
General multiplication rule
$P(A \text{ and } B) = P(A \mid B) \times P(B)$
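A quick numerical check of the general multiplication rule, using counts from the Florida survey table (conservatives and "apply for citizenship"). It also shows that the independence shortcut gives a different number for these two events. Variable names are our own.

```r
# Counts from the Florida survey table
n            <- 910  # respondents who answered both questions
n_cons       <- 372  # conservatives
n_cons_apply <- 57   # conservatives in favor of applying for citizenship
n_apply      <- 278  # all respondents in favor of applying for citizenship

p_cons             <- n_cons / n
p_apply_given_cons <- n_cons_apply / n_cons

# General multiplication rule: P(apply and cons) = P(apply | cons) * P(cons)
p_joint <- p_apply_given_cons * p_cons
all.equal(p_joint, n_cons_apply / n)  # TRUE: matches 57 / 910

# The independence shortcut P(apply) * P(cons) gives about 0.12 instead of
# about 0.06, so these two events are not independent
p_apply <- n_apply / n
p_apply * p_cons
```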
Clicker question
If events A and B are independent, which of the following is correct?
(a) $P(A \mid B) = P(A \text{ and } B)$
(b) $P(A \mid B) = P(A)$
(c) $P(A \mid B) = P(B)$
(d) $P(A \mid B) = 0$
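A small simulation can build intuition for this question. The two-dice setup is our own illustration, not from the slides: A = "first die is even" and B = "second die shows a 6" come from separate rolls, so they are independent, and restricting to the rolls where B happened should leave the proportion of A essentially unchanged. The seed and number of rolls are arbitrary.

```r
set.seed(2012)  # arbitrary seed
n_rolls <- 100000
die1 <- sample(1:6, n_rolls, replace = TRUE)
die2 <- sample(1:6, n_rolls, replace = TRUE)  # rolled separately from die1

A <- die1 %% 2 == 0   # event A: first die is even
B <- die2 == 6        # event B: second die shows a 6

p_A         <- mean(A)     # estimate of P(A), close to 0.5
p_A_given_B <- mean(A[B])  # estimate of P(A | B): only rolls where B happened
c(p_A = p_A, p_A_given_B = p_A_given_B)
```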
What do you think?
Clicker question
If a woman tests positive for breast cancer on a mammogram, what is the probability that she actually has breast cancer? Take an educated guess.
(a) Less than 15%
(b) 15% to 40%
(c) 40% to 60%
(d) 60% to 85%
(e) More than 85%
False positives
In November 2009, the US Preventive Services Task Force changed its recommendations for breast cancer screening.
Background information
The American Cancer Society estimates that about 1.7% of women have breast cancer.
http://www.cancer.org/cancer/cancerbasics/cancer-prevalence
Susan G. Komen For The Cure Foundation states that mammography correctly identifies about 78% of women who truly have breast cancer.
http://ww5.komen.org/BreastCancer/AccuracyofMammograms.html
An article published in 2003 suggests that up to 10% of all mammograms are false positives.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1360940
Note: These percentages are approximate and very difficult to estimate.
Cancer → Mammography, given cancer
cancer, 0.017
  positive, 0.78: $0.017 \times 0.78 = 0.0133$
  negative, 0.22: $0.017 \times 0.22 = 0.0037$
no cancer, 0.983
  positive, 0.1: $0.983 \times 0.1 = 0.0983$
  negative, 0.9: $0.983 \times 0.9 = 0.8847$
Remember: $P(A \text{ and } B) = P(A \mid B) \times P(B)$, the general multiplication rule obtained by rearranging Bayes' theorem.
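The four leaves of the tree can be computed the same way in R. This sketch uses the approximate rates from the background slide; the names are illustrative. Each leaf multiplies a second-level branch by its first-level branch, and because the leaves cover every possible outcome they add up to 1.

```r
p_cancer         <- 0.017  # P(cancer)
p_pos_given_c    <- 0.78   # P(positive | cancer)
p_pos_given_no_c <- 0.10   # P(positive | no cancer)

# Each leaf = P(test result | cancer status) * P(cancer status)
leaves <- c(
  cancer_pos    = p_cancer * p_pos_given_c,
  cancer_neg    = p_cancer * (1 - p_pos_given_c),
  no_cancer_pos = (1 - p_cancer) * p_pos_given_no_c,
  no_cancer_neg = (1 - p_cancer) * (1 - p_pos_given_no_c)
)
round(leaves, 4)  # 0.0133 0.0037 0.0983 0.8847
sum(leaves)       # 1
```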
Clicker question
If a woman tests positive for breast cancer on a mammogram, what is the probability that she actually has breast cancer? Choose the closest answer.
(a) 1.5%
(b) 3.6%
(c) 12%
(d) 78%
(e) 88%
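A sketch of the calculation, following the tree diagram and the approximate background rates (names are illustrative). The probability of a positive test adds the two branches that end in "positive"; the conditional probability then divides the cancer-and-positive branch by that total.

```r
p_cancer         <- 0.017  # P(cancer)
p_pos_given_c    <- 0.78   # P(positive | cancer)
p_pos_given_no_c <- 0.10   # P(positive | no cancer)

# P(positive): sum of the two tree branches that end in a positive test
p_pos <- p_cancer * p_pos_given_c + (1 - p_cancer) * p_pos_given_no_c

# P(cancer | positive) = P(cancer and positive) / P(positive)
p_cancer_given_pos <- p_cancer * p_pos_given_c / p_pos
round(p_cancer_given_pos, 3)  # 0.119
```

Most positive tests come from the much larger no-cancer group, which is why the answer is so much lower than the 78% detection rate.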
Clicker question
Lupus is a medical phenomenon where antibodies that are supposed to attack foreign cells to prevent infections instead see plasma proteins as foreign bodies, leading to a high risk of blood clotting. It is believed that 2% of the population suffers from this disease. The test for lupus is very accurate if the person actually has lupus, but much less accurate if the person does not. More specifically, the test is 98% accurate if a person actually has the disease, and 74% accurate if a person does not have the disease.
There is a line from the Fox television show House, often used after a patient tests positive for lupus: "It's never lupus." Do you think there is truth to this statement? Use appropriate probabilities to support your answer.
(a) yes
(b) no
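A sketch using the same tree logic as the mammogram example. It assumes "98% accurate if a person has the disease" means P(positive | lupus) = 0.98 and "74% accurate if a person does not" means P(negative | no lupus) = 0.74; the names are illustrative.

```r
p_lupus          <- 0.02  # P(lupus)
p_pos_given_l    <- 0.98  # P(positive | lupus)
p_neg_given_no_l <- 0.74  # P(negative | no lupus)
p_pos_given_no_l <- 1 - p_neg_given_no_l  # so 26% of people without lupus test positive

# P(positive): both branches of the tree that end in a positive test
p_pos <- p_lupus * p_pos_given_l + (1 - p_lupus) * p_pos_given_no_l

# P(lupus | positive) = P(lupus and positive) / P(positive)
p_lupus_given_pos <- p_lupus * p_pos_given_l / p_pos
round(p_lupus_given_pos, 3)  # 0.071
```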