Chapter 4 Describing the Relation between Two Variables
2010 Pearson Prentice Hall. All rights reserved.
Section 4.1 Scatter Diagrams and Correlation
EXAMPLE Drawing and Interpreting a Scatter Diagram
The data shown to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the explanatory variable, x, and time (in minutes) to drill five feet is the response variable, y. Draw a scatter diagram of the data.
Source: Penner, R., and Watts, D.G. Mining Information. The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.
Various Types of Relations in a Scatter Diagram
Determine the type of correlation between the variables y and x.
A. Positive linear correlation
B. Negative linear correlation
C. Nonlinear correlation
Copyright 2010 Pearson Education, Inc.
EXAMPLE Determining the Linear Correlation Coefficient
Determine the linear correlation coefficient of the drilling data.
Calculate the linear correlation coefficient, r, for temperature (x) and number of ice cream cones sold per hour (y).
x   65  70  75  80  85  90  95  100  105
y    8  10  11  13  12  16  19   22   23
A. 0.946 B. 0.973 C. 17.694 D. 0.383
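As a check on the ice cream question, the linear correlation coefficient can be computed directly from its definition. The following is a minimal Python sketch; the function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient of paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Data from the slide
temperature = [65, 70, 75, 80, 85, 90, 95, 100, 105]
cones_sold = [8, 10, 11, 13, 12, 16, 19, 22, 23]

r = pearson_r(temperature, cones_sold)
print(round(r, 3))  # 0.973
```

The result corresponds to choice B.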
EXAMPLE Does a Linear Relation Exist?
Determine whether a linear relation exists between time to drill five feet and depth at which drilling begins. Comment on the type of relation that appears to exist between time to drill five feet and depth at which drilling begins.
The correlation between drilling depth and time to drill is 0.773. The critical value for n = 12 observations is 0.576. Since 0.773 > 0.576, there is a positive linear relation between time to drill five feet and depth at which drilling begins.
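The comparison in this example can be written as a small decision rule. This is only a sketch: the function name `linear_relation` is hypothetical, and the critical value is taken as an input from the textbook's table rather than computed.

```python
def linear_relation(r, critical_value):
    """Classify a relation by comparing |r| with a tabulated critical value."""
    if abs(r) <= critical_value:
        return "no linear relation"
    return "positive linear relation" if r > 0 else "negative linear relation"

# Values quoted in the drilling example: r = 0.773, critical value 0.576 for n = 12
print(linear_relation(0.773, 0.576))  # positive linear relation
```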
According to data obtained from the Statistical Abstract of the United States, the correlation between the percentage of the female population with a bachelor's degree and the percentage of births to unmarried mothers since 1990 is 0.940. Does this mean that a higher percentage of females with bachelor's degrees causes a higher percentage of births to unmarried mothers?
Certainly not! The correlation exists only because both percentages have been increasing since 1990. It is this relation that causes the high correlation. In general, time series data (data collected over time) will have high correlations because each variable is moving in a specific direction over time (both going up or down over time; one increasing, while the other is decreasing over time).
When data are observational, we cannot claim a causal relation exists between two variables. We can only claim causality when the data are collected through a designed experiment.
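The effect of a shared time trend can be illustrated with made-up numbers. The sketch below uses two synthetic series (not real data) that each rise over time but are otherwise unrelated. Their levels are highly correlated, while their year-to-year changes, which remove the trend, are not. All names and series here are assumptions chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient of paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Synthetic series: each has its own upward trend plus an unrelated wiggle
years = range(20)
series_a = [t + math.sin(t) for t in years]
series_b = [2 * t + math.cos(1.7 * t) for t in years]

r_levels = pearson_r(series_a, series_b)

# Year-to-year changes remove the shared upward trend
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_changes = pearson_r(changes_a, changes_b)

print(f"levels:  r = {r_levels:.3f}")
print(f"changes: r = {r_changes:.3f}")
```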
Another way that two variables can be related even though there is not a causal relation is through a lurking variable. A lurking variable is related to both the explanatory and response variable. For example, ice cream sales and crime rates have a very high correlation. Does this mean that local governments should shut down all ice cream shops? No! The lurking variable is temperature. As air temperatures rise, both ice cream sales and crime rates rise.
This study is a prospective cohort study, which is an observational study. Therefore, the researchers cannot claim that increased cola consumption causes a decrease in bone mineral density. Some lurking variables in the study that could confound the results are: body mass index, height, smoking, alcohol consumption, calcium intake, and physical activity.
Section 4.2 Least-squares Regression
Using the following sample data:
(a) Find a linear equation that relates x (the explanatory variable) and y (the response variable) by selecting two points and finding the equation of the line containing the points.
Using (2, 5.7) and (6, 1.9): the slope is $m = (1.9 - 5.7)/(6 - 2) = -0.95$, so $y - 5.7 = -0.95(x - 2)$, that is, $y = -0.95x + 7.6$.
(b) Graph the equation on the scatter diagram. [Figure: scatter diagram with both axes running from 0 to 7]
(c) Use the equation to predict y if x = 3.
The difference between the observed value of y and the predicted value of y is the error, or residual. Using the line from the last example and the predicted value at x = 3, for the observed point (3, 5.2):
residual = observed y - predicted y = 5.2 - 4.75 = 0.45
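The two-point line, its prediction at x = 3, and the residual at the observed point (3, 5.2) can be reproduced in a few lines. This is a sketch only; the helper name `line_through` is hypothetical.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# The two points selected in the example
slope, intercept = line_through((2, 5.7), (6, 1.9))

predicted = slope * 3 + intercept  # predicted y at x = 3
residual = 5.2 - predicted         # observed y minus predicted y

print(round(slope, 2), round(intercept, 2))     # -0.95 7.6
print(round(predicted, 2), round(residual, 2))  # 4.75 0.45
```

A positive residual means the observed point lies above the line.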
EXAMPLE Finding the Least-squares Regression Line
Using the drilling data:
(a) Find the least-squares regression line.
(b) Predict the drilling time if drilling starts at 130 feet.
(c) Is the observed drilling time at 130 feet above or below average?
(d) Draw the least-squares regression line on the scatter diagram of the data.
(a) We agree to round the estimates of the slope and intercept to four decimal places: $\hat{y} = 0.0116x + 5.5273$.
(b) $\hat{y} = 0.0116(130) + 5.5273 \approx 7.035$ minutes.
(c) The observed drilling time is 6.93 minutes. The predicted drilling time is 7.035 minutes. The drilling time of 6.93 minutes is below average.
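Parts (b) and (c) amount to evaluating the fitted line and checking the sign of the residual. The sketch below assumes the rounded slope 0.0116 and intercept 5.5273 that the slides quote for the drilling data; the constant and function names are illustrative.

```python
SLOPE = 0.0116      # minutes per foot of starting depth (value quoted in the slides)
INTERCEPT = 5.5273  # minutes (value quoted in the slides)

def predicted_time(depth_ft):
    """Predicted time (minutes) to drill five feet starting at depth_ft."""
    return SLOPE * depth_ft + INTERCEPT

def compare_to_average(observed, predicted):
    """Describe an observation relative to the regression prediction."""
    if observed < predicted:
        return "below average"
    if observed > predicted:
        return "above average"
    return "average"

predicted = predicted_time(130)
print(round(predicted, 3))                  # 7.035
print(compare_to_average(6.93, predicted))  # below average
```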
(d) [Figure: scatter diagram of Time to Drill 5 Feet against Depth Drilling Begins, with the least-squares regression line drawn]
Find the least-squares regression line for temperature (x) and number of ice cream cones sold per hour (y).
x   65  70  75  80  85  90  95  100  105
y    8  10  11  13  12  16  19   22   23
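The slope and intercept for the ice cream data can be obtained from the usual least-squares formulas, slope = S_xy / S_xx and intercept = y-bar minus slope times x-bar. The following is a minimal sketch; `least_squares_line` is an illustrative name, not something defined in the slides.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Data from the slide
temperature = [65, 70, 75, 80, 85, 90, 95, 100, 105]
cones_sold = [8, 10, 11, 13, 12, 16, 19, 22, 23]

b1, b0 = least_squares_line(temperature, cones_sold)
print(round(b1, 4), round(b0, 4))  # 0.3833 -17.6944
```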
Interpretation of Slope: The slope of the regression line is 0.0116. For each additional foot of depth at which we start drilling, the time to drill five feet increases by 0.0116 minutes, on average.
Interpretation of the y-Intercept: The y-intercept of the regression line is 5.5273. To interpret the y-intercept, we must first ask two questions:
1. Is 0 a reasonable value for the explanatory variable?
2. Do any observations near x = 0 exist in the data set?
A value of 0 is reasonable for the drilling data (this indicates that drilling begins at the surface of Earth). The smallest observation in the data set is x = 35 feet, which is reasonably close to 0. So, interpretation of the y-intercept is reasonable. The time to drill five feet when we begin drilling at the surface of Earth is 5.5273 minutes.
If the least-squares regression line is used to make predictions based on values of the explanatory variable that are much larger or much smaller than the observed values, we say the researcher is working outside the scope of the model. Never use a least-squares regression line to make predictions outside the scope of the model because we cannot be sure the linear relation continues to exist.
The least-squares regression line for temperature (x) and number of ice cream cones sold per hour (y) is $\hat{y} = 0.383x - 17.694$. Predict the number of ice cream cones sold per hour when the temperature is 88°.
A. 51.4 B. 10.1 C. 16.0 D. 14.2
The data for temperature (x) and number of ice cream cones sold per hour (y) are shown.
x   65  70  75  80  85  90  95  100  105
y    8  10  11  13  12  16  19   22   23
It would be reasonable to use the least-squares regression line to predict the number of ice cream cones sold when it is 50 degrees.
A. True B. False
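One way to guard against predicting outside the scope of the model is to refuse any prediction whose x-value falls outside the observed range. The sketch below makes that simplifying assumption (the slides say only "much larger or much smaller" than the observed values), and the function names are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

def predict_in_scope(x_new, x, y):
    """Predict y at x_new, or return None if x_new is outside the observed x range."""
    if not min(x) <= x_new <= max(x):
        return None  # outside the scope of the model
    slope, intercept = least_squares_line(x, y)
    return slope * x_new + intercept

# Data from the slide
temperature = [65, 70, 75, 80, 85, 90, 95, 100, 105]
cones_sold = [8, 10, 11, 13, 12, 16, 19, 22, 23]

print(round(predict_in_scope(88, temperature, cones_sold), 1))  # 16.0
print(predict_in_scope(50, temperature, cones_sold))            # None
```

A temperature of 88 lies within the observed range of 65 to 105, so the prediction is reasonable. A temperature of 50 does not, so the statement on the slide is false.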
To illustrate the fact that the sum of squared residuals for a least-squares regression line is less than the sum of squared residuals for any other line, use the regression by eye applet.
Section 4.3 The Coefficient of Determination
The coefficient of determination, $R^2$, measures the proportion of total variation in the response variable that is explained by the least-squares regression line.
The coefficient of determination is a number between 0 and 1, inclusive. That is, $0 \le R^2 \le 1$.
If $R^2 = 0$, the line has no explanatory value.
If $R^2 = 1$, the line explains 100% of the variation in the response variable.
The data to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the predictor variable, x, and time (in minutes) to drill five feet is the response variable, y.
Source: Penner, R., and Watts, D.G. Mining Information. The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.
Sample Statistics
         Mean    Standard Deviation
Depth    126.2   52.2
Time     6.99    0.781
Correlation Between Depth and Time: 0.773
Regression Analysis
The regression equation is Time = 5.53 + 0.0116 Depth
Suppose we were asked to predict the time to drill an additional 5 feet, but we did not know the current depth of the drill. What would be our best guess?
ANSWER: The mean time to drill an additional 5 feet: 6.99 minutes
Now suppose that we are asked to predict the time to drill an additional 5 feet if the current depth of the drill is 160 feet.
ANSWER: The regression equation gives 5.53 + 0.0116(160) ≈ 7.39 minutes. Our guess increased from 6.99 minutes to 7.39 minutes based on the knowledge that drill depth is positively associated with drill time.
The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to $y - \bar{y}$.
The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to $\hat{y} - \bar{y}$.
The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to $y - \hat{y}$.
$$\text{Total Variation} = \text{Unexplained Variation} + \text{Explained Variation}$$
Dividing both sides by the total variation:
$$1 = \frac{\text{Unexplained Variation}}{\text{Total Variation}} + \frac{\text{Explained Variation}}{\text{Total Variation}}$$
$$\frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}$$
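The decomposition can be checked numerically. The drilling data are not reproduced in these slides, so the sketch below uses the ice cream data from the quiz slides instead. It sums the squared total, explained, and unexplained deviations and compares the two expressions for the explained share. Function and variable names are illustrative.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

def variation_decomposition(x, y):
    """Return (total, explained, unexplained) variation for the fitted line."""
    slope, intercept = least_squares_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [slope * xi + intercept for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, explained, unexplained

# Ice cream data from the quiz slides
temperature = [65, 70, 75, 80, 85, 90, 95, 100, 105]
cones_sold = [8, 10, 11, 13, 12, 16, 19, 22, 23]

total, explained, unexplained = variation_decomposition(temperature, cones_sold)
print(round(total, 2), round(explained, 2), round(unexplained, 2))     # 232.89 220.42 12.47
print(round(explained / total, 3), round(1 - unexplained / total, 3))  # 0.946 0.946
```

Both ratios agree, which is the identity on the slide.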
To determine $R^2$ for the linear regression model, simply square the value of the linear correlation coefficient.
EXAMPLE Determining the Coefficient of Determination
Find and interpret the coefficient of determination for the drilling data.
Because the linear correlation coefficient, r, is 0.773, we have that $R^2 = 0.773^2 = 0.5975 = 59.75\%$. So, 59.75% of the variability in drilling time is explained by the least-squares regression line.
Calculate the coefficient of determination, $r^2$, for temperature (x) and number of ice cream cones sold per hour (y).
x   65  70  75  80  85  90  95  100  105
y    8  10  11  13  12  16  19   22   23
A. 0.946 B. 0.973 C. 0.923 D. 0.986
Draw a scatter diagram for each of these data sets. For each data set, the variance of y is 17.49.
[Tables: Data Set A, Data Set B, Data Set C]