Chap 13-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 13 Multiple Regression Statistics for Business and Economics.

Transcript:

Chapter 13: Multiple Regression. Statistics for Business and Economics, 6th Edition. © 2007 Pearson Education, Inc.

Chapter Goals
After completing this chapter, you should be able to:
- Apply multiple regression analysis to business decision-making situations
- Analyze and interpret the computer output for a multiple regression model
- Perform a hypothesis test for all regression coefficients or for a subset of coefficients
- Fit and interpret nonlinear regression models
- Incorporate qualitative variables into the regression model by using dummy variables
- Discuss model specification and analyze residuals

The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (\(X_j\)).
Multiple regression model with K independent variables:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon \]
where \(\beta_0\) is the Y-intercept, \(\beta_1, \dots, \beta_K\) are the population slopes, and \(\varepsilon\) is the random error.

Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with K independent variables:
\[ \hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_K x_{Ki} \]
where \(\hat{y}_i\) is the estimated (or predicted) value of y, \(b_0\) is the estimated intercept, and \(b_1, \dots, b_K\) are the estimated slope coefficients.
In this chapter we will always use a computer to obtain the regression slope coefficients and other regression summary measures.
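The estimation step that the slides delegate to a computer can be sketched in a few lines. The following is a minimal illustration, not the procedure Excel or PHStat uses internally: it solves the least-squares normal equations with plain Python. The function name `fit_ols` and the five data points are hypothetical and were generated exactly from y = 1 + 2·x1 + 3·x2 so the result is easy to check.

```python
def fit_ols(xs, y):
    """Return least-squares coefficients [b0, b1, ..., bK] via the normal equations."""
    # Design matrix: a leading column of ones carries the intercept b0.
    rows = [[1.0] + [float(v) for v in row] for row in xs]
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("x variables are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical data (not from the slides), generated from y = 1 + 2*x1 + 3*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
b0, b1, b2 = fit_ols(xs, y)
```

The `ValueError` branch corresponds to the assumption that no exact linear relation exists among the x variables; without it the normal equations have no unique solution.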

Multiple Regression Equation (continued)
[Figure: two-variable model with axes y, \(x_1\), and \(x_2\); the slope for variable \(x_1\) and the slope for variable \(x_2\) are marked.]

Standard Multiple Regression Assumptions
- The values \(x_i\) and the error terms \(\varepsilon_i\) are independent
- The error terms are random variables with mean 0 and a constant variance \(\sigma^2\), that is, \(E[\varepsilon_i] = 0\) and \(E[\varepsilon_i^2] = \sigma^2\) (the constant variance property is called homoscedasticity)

Standard Multiple Regression Assumptions (continued)
- The random error terms \(\varepsilon_i\) are not correlated with one another, so that \(E[\varepsilon_i \varepsilon_j] = 0\) for all \(i \neq j\)
- It is not possible to find a set of numbers \(c_0, c_1, \dots, c_K\), not all zero, such that \(c_0 + c_1 x_{1i} + c_2 x_{2i} + \dots + c_K x_{Ki} = 0\) for every observation i (this is the property of no linear relation for the \(X_j\)s)

Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising (in $100s)
Data are collected for 15 weeks.

Pie Sales Example
Multiple regression equation: Sales = \(b_0\) + \(b_1\)(Price) + \(b_2\)(Advertising)
[Table: Week, Pie Sales, Price ($), Advertising ($100s); 15 weekly observations]

Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression.
- Excel: Tools / Data Analysis... / Regression
- PHStat: PHStat / Regression / Multiple Regression…

Multiple Regression Output
[Excel regression output: Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 15); ANOVA table (df, SS, MS, F, Significance F for Regression, Residual, Total); coefficient table (Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95% for Intercept, Price, Advertising)]

The Multiple Regression Equation
Sales = \(b_0\) + \(b_1\)(Price) + \(b_2\)(Advertising), where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
- \(b_1\) is negative: sales will decrease, on average, by \(|b_1|\) pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
- \(b_2\) is positive: sales will increase, on average, by \(b_2\) pies per week for each $100 increase in advertising, net of the effects of changes due to price

Coefficient of Determination, \(R^2\)
Reports the proportion of total variation in y explained by all x variables taken together. This is the ratio of the explained variability to total sample variability:
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
where SSR is the regression sum of squares, SSE is the error sum of squares, and SST is the total sum of squares.
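As a small sketch of the ratio just described, the function below computes the coefficient of determination from observed and fitted values. The name `r_squared` and the four data points are hypothetical; the fitted values are the two group means, which is what a least-squares fit on a single 0/1 variable would produce.

```python
def r_squared(y, y_hat):
    """Proportion of total variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variability
    ssr = sst - sse  # explained part; valid for least-squares fits with an intercept
    return ssr / sst


# Hypothetical observations and fitted values (group means 3 and 7).
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [3.0, 3.0, 7.0, 7.0]
r2 = r_squared(y, y_hat)  # SST = 20, SSE = 4, so R^2 = 16 / 20
```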

Coefficient of Determination, \(R^2\) (continued)
[Excel regression output repeated: Regression Statistics, ANOVA table, and coefficient table for Intercept, Price, Advertising]
The R Square entry, read as a percentage, is the share of the variation in pie sales that is explained by the variation in price and advertising.

Estimation of Error Variance
Consider the population regression model
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_K x_{Ki} + \varepsilon_i \]
The unbiased estimate of the variance of the errors is
\[ s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - K - 1} = \frac{SSE}{n - K - 1} \]
where \(e_i = y_i - \hat{y}_i\). The square root of the variance, \(s_e\), is called the standard error of the estimate.
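The standard error of the estimate can be sketched directly from the residuals. The function name and the data below are hypothetical; K is the number of independent variables, so the divisor is the residual degrees of freedom n - K - 1.

```python
import math


def standard_error_of_estimate(y, y_hat, K):
    """s_e = sqrt(SSE / (n - K - 1)) for a model with K independent variables."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - K - 1))


# Hypothetical observations and fitted values from a one-variable model (K = 1).
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [3.0, 3.0, 7.0, 7.0]
s_e = standard_error_of_estimate(y, y_hat, K=1)  # SSE = 4, df = 2
```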

Standard Error, \(s_e\)
[Excel regression output repeated: Regression Statistics, ANOVA table, and coefficient table for Intercept, Price, Advertising]
The magnitude of the Standard Error entry can be compared to the average y value.

Adjusted Coefficient of Determination, \(\bar{R}^2\)
- \(R^2\) never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable
- This can be a disadvantage when comparing models
- What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added
- Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted Coefficient of Determination, \(\bar{R}^2\) (continued)
Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares:
\[ \bar{R}^2 = 1 - \frac{SSE / (n - K - 1)}{SST / (n - 1)} \]
(where n = sample size, K = number of independent variables)
- Adjusted \(R^2\) provides a better comparison between multiple regression models with different numbers of independent variables
- It penalizes excessive use of unimportant independent variables
- It is smaller than \(R^2\)
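The penalty can be illustrated with a short sketch. The sample size n = 15 and K = 2 match the pie sales example, but the two R-squared values below are hypothetical, chosen only to show that a small gain in R-squared from a third variable can still lower the adjusted figure.

```python
def adjusted_r_squared(r2, n, K):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - K - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - K - 1)


# Hypothetical R^2 values: 0.50 with two variables, 0.51 after adding a third.
adj_two_vars = adjusted_r_squared(0.50, n=15, K=2)
adj_three_vars = adjusted_r_squared(0.51, n=15, K=3)
```

The identity used here follows from the definition above because SSE/SST = 1 - R^2.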

Adjusted Coefficient of Determination, \(\bar{R}^2\) (continued)
[Excel regression output repeated: Regression Statistics, ANOVA table, and coefficient table for Intercept, Price, Advertising]
The Adjusted R Square entry, read as a percentage, is the share of the variation in pie sales that is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Coefficient of Multiple Correlation
The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable:
\[ R = r(\hat{y}, y) = \sqrt{R^2} \]
- It is the square root of the multiple coefficient of determination
- It is used as another measure of the strength of the linear relationship between the dependent variable and the independent variables
- It is comparable to the correlation between Y and X in simple regression

Evaluating Individual Regression Coefficients
Use t-tests for individual coefficients. A t-test shows whether a specific independent variable is conditionally important.
Hypotheses:
- \(H_0: \beta_j = 0\) (no linear relationship)
- \(H_1: \beta_j \neq 0\) (a linear relationship does exist between \(x_j\) and y)

Evaluating Individual Regression Coefficients (continued)
- \(H_0: \beta_j = 0\) (no linear relationship)
- \(H_1: \beta_j \neq 0\) (a linear relationship does exist between \(x_j\) and y)
Test statistic:
\[ t = \frac{b_j - 0}{s_{b_j}} \qquad (df = n - K - 1) \]

Evaluating Individual Regression Coefficients (continued)
[Excel regression output repeated: Regression Statistics, ANOVA table, and coefficient table for Intercept, Price, Advertising]
- The t-value for Price has p-value .0398
- The t-value for Advertising is t = 2.855, with p-value .0145

Example: Evaluating Individual Regression Coefficients
\(H_0: \beta_j = 0\), \(H_1: \beta_j \neq 0\)
d.f. = 15 - 2 - 1 = 12, \(\alpha = .05\), \(t_{12, .025} = 2.1788\)
[Figure: two-tailed rejection regions with \(\alpha/2 = .025\) in each tail; reject \(H_0\) below \(-t_{\alpha/2}\) or above \(t_{\alpha/2}\), do not reject \(H_0\) in between]
[Excel output excerpt: Coefficients, Standard Error, t Stat, P-value for Price and Advertising]
Decision: reject \(H_0\) for each variable, because the test statistic for each variable falls in the rejection region (p-values < .05).
Conclusion: there is evidence that both Price and Advertising affect pie sales at \(\alpha = .05\).
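The two-tailed decision rule can be sketched as follows. The critical value 2.1788 (12 degrees of freedom, .025 in each tail) and the Advertising t-value 2.855 are taken from the slides; the function names and the coefficient and standard error passed to `t_statistic` are hypothetical.

```python
def t_statistic(b_j, s_bj):
    """Test statistic for H0: beta_j = 0."""
    return (b_j - 0) / s_bj


def reject_null(t_stat, t_crit):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


T_CRIT = 2.1788  # critical t for 12 d.f. and alpha/2 = .025, as used in the slides

reject_advertising = reject_null(2.855, T_CRIT)  # t-value reported for Advertising
example_t = t_statistic(5.0, 2.0)                # hypothetical coefficient and standard error
```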

Confidence Interval Estimate for the Slope
Confidence interval limits for the population slope \(\beta_j\):
\[ b_j \pm t_{n-K-1,\,\alpha/2}\, s_{b_j} \]
where t has (n - K - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.
[Excel output excerpt: Coefficients and Standard Error for Intercept, Price, Advertising]
Example: form a 95% confidence interval for the effect of changes in price (\(x_1\)) on pie sales: \(b_1 \pm (2.1788)(10.832)\), so the interval is \(b_1 - 23.60 < \beta_1 < b_1 + 23.60\).
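A short sketch of the interval arithmetic follows. The critical value 2.1788 and the standard error 10.832 come from the slide. The Price coefficient itself is not given in the text, so the value -24.97 used here is an assumption, backed out from the endpoint of 1.37 pies quoted on the following slide; treat it as illustrative only.

```python
def confidence_interval(b_j, s_bj, t_crit):
    """Return (lower, upper) limits b_j -/+ t_crit * s_bj for the slope beta_j."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width


T_CRIT = 2.1788    # t with 12 d.f., alpha/2 = .025 (from the slide)
SE_PRICE = 10.832  # standard error of the Price coefficient (from the slide)
B_PRICE = -24.97   # ASSUMPTION: inferred from the quoted endpoint, not stated in the text

margin = T_CRIT * SE_PRICE
lower, upper = confidence_interval(B_PRICE, SE_PRICE, T_CRIT)
```

Because both limits are negative, the interval excludes zero, which agrees with rejecting the null hypothesis for Price at the .05 level.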

Confidence Interval Estimate for the Slope (continued)
Confidence interval for the population slope \(\beta_j\).
[Excel output excerpt: Coefficients, Standard Error, …, Lower 95%, Upper 95% for Intercept, Price, Advertising]
Example: Excel output also reports these interval endpoints. Weekly sales are estimated to be reduced by between 1.37 and about 48.6 pies for each increase of $1 in the selling price (the far endpoint is 1.37 plus twice the margin \((2.1788)(10.832) \approx 23.60\)).

Test on All Coefficients
F-test for overall significance of the model: shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F test statistic.
Hypotheses:
- \(H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0\) (no linear relationship)
- \(H_1\): at least one \(\beta_j \neq 0\) (at least one independent variable affects Y)

F-Test for Overall Significance
Test statistic:
\[ F = \frac{MSR}{s_e^2} = \frac{SSR / K}{SSE / (n - K - 1)} \]
where F has K (numerator) and (n - K - 1) (denominator) degrees of freedom.
The decision rule is: reject \(H_0\) if \(F > F_{K,\,n-K-1,\,\alpha}\).
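The test statistic can be sketched from the ANOVA sums of squares. The sums of squares below are hypothetical; n = 15 and K = 2 mirror the pie sales setting. The second function computes the same quantity from R-squared, which is an equivalent algebraic form rather than something stated on the slide.

```python
def f_statistic(ssr, sse, n, K):
    """F = (SSR / K) / (SSE / (n - K - 1))."""
    msr = ssr / K              # mean square regression
    s2_e = sse / (n - K - 1)   # estimated error variance
    return msr / s2_e


def f_from_r_squared(r2, n, K):
    """Equivalent form: F = (R^2 / K) / ((1 - R^2) / (n - K - 1))."""
    return (r2 / K) / ((1 - r2) / (n - K - 1))


# Hypothetical sums of squares: SSR = 300, SSE = 600, so SST = 900.
f_value = f_statistic(ssr=300.0, sse=600.0, n=15, K=2)
f_check = f_from_r_squared(300.0 / 900.0, n=15, K=2)
```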

F-Test for Overall Significance (continued)
[Excel regression output repeated: Regression Statistics, ANOVA table, and coefficient table for Intercept, Price, Advertising]
The F entry of the ANOVA table is the test statistic, with 2 and 12 degrees of freedom; the Significance F entry is the p-value for the F-test.

F-Test for Overall Significance (continued)
\(H_0: \beta_1 = \beta_2 = 0\), \(H_1\): \(\beta_1\) and \(\beta_2\) not both zero
\(\alpha = .05\), \(df_1 = 2\), \(df_2 = 12\)
[Figure: F distribution with the critical value \(F_{.05}\) separating the "do not reject \(H_0\)" region from the "reject \(H_0\)" region of area \(\alpha = .05\)]
Decision: since the F test statistic is in the rejection region (p-value < .05), reject \(H_0\).
Conclusion: there is evidence that at least one independent variable affects Y.

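Tests on a Subset of Regression Coefficients
Consider a multiple regression model involving variables \(x_j\) and \(z_j\), and the null hypothesis that the z variable coefficients are all zero:
\[ y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + \alpha_1 z_1 + \dots + \alpha_r z_r + \varepsilon \]
\[ H_0: \alpha_1 = \alpha_2 = \dots = \alpha_r = 0 \qquad H_1: \text{at least one } \alpha_j \neq 0 \]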

Tests on a Subset of Regression Coefficients (continued)
Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model.
- First run a regression for the complete model and obtain SSE
- Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r)
- Compute the F statistic and apply the decision rule for a significance level \(\alpha\):
\[ F = \frac{(SSE(r) - SSE) / r}{s_e^2}, \qquad \text{reject } H_0 \text{ if } F > F_{r,\,n-K-r-1,\,\alpha} \]
where \(s_e^2 = SSE / (n - K - r - 1)\) is the error variance estimate from the complete model.
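The three steps above can be sketched as one function. All numbers are hypothetical; the parameter names follow the slide notation, with K the number of x variables kept in both models and r the number of z variables dropped in the restricted model.

```python
def subset_f_statistic(sse_restricted, sse_complete, r, n, K):
    """F = ((SSE(r) - SSE) / r) / s_e^2, with s_e^2 from the complete model."""
    s2_e = sse_complete / (n - K - r - 1)  # complete model has K + r variables
    return ((sse_restricted - sse_complete) / r) / s2_e


# Hypothetical values: dropping r = 2 variables raises SSE from 600 to 800.
f_subset = subset_f_statistic(sse_restricted=800.0, sse_complete=600.0,
                              r=2, n=20, K=2)
```

The result would then be compared with the critical F value for r and n - K - r - 1 degrees of freedom, which has to be looked up separately.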

Prediction
Given a population regression model
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_K x_{Ki} + \varepsilon_i \]
then, given a new observation of a data point \((x_{1,n+1}, x_{2,n+1}, \dots, x_{K,n+1})\), the best linear unbiased forecast of \(y_{n+1}\) is
\[ \hat{y}_{n+1} = b_0 + b_1 x_{1,n+1} + b_2 x_{2,n+1} + \dots + b_K x_{K,n+1} \]
It is risky to forecast for new X values outside the range of the data used to estimate the model coefficients, because we do not have data to support that the linear model extends beyond the observed range.

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Predicted sales = \(b_0 + b_1(5.50) + b_2(3.5)\) pies
Note that Advertising is in $100s, so $350 means that \(x_2 = 3.5\).
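The unit conversion is the easy part to get wrong, so here is a minimal sketch of the prediction step. The price of $5.50 and advertising of $350 are from the slide; the three coefficients are placeholders chosen for round arithmetic and are NOT the estimates from the pie sales regression.

```python
def predict_sales(b0, b1, b2, price, advertising_dollars):
    """Predicted weekly pie sales; advertising is converted from $ to $100s."""
    x2 = advertising_dollars / 100.0  # the model measures advertising in $100s
    return b0 + b1 * price + b2 * x2


# HYPOTHETICAL coefficients, for illustration only.
B0, B1, B2 = 300.0, -20.0, 70.0
predicted = predict_sales(B0, B1, B2, price=5.50, advertising_dollars=350.0)
```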

Predictions in PHStat
PHStat | Regression | Multiple Regression…
Check the "confidence and prediction interval estimates" box.

Predictions in PHStat (continued)
[Screenshot of PHStat output showing: the input values; the predicted \(\hat{y}\) value; the confidence interval for the mean y value, given these x values; the prediction interval for an individual y value, given these x values]

Residuals in Multiple Regression
[Figure: two-variable model with axes y, \(x_1\), and \(x_2\), showing a sample observation \(y_i\) at \((x_{1i}, x_{2i})\) and its fitted value \(\hat{y}_i\)]
Residual: \(e_i = y_i - \hat{y}_i\)

Nonlinear Regression Models
- The relationship between the dependent variable and an independent variable may not be linear
- Review the scatter diagram to check for nonlinear relationships
- Example: quadratic model, in which the second independent variable is the square of the first variable

Quadratic Regression Model
Model form:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i \]
where:
- \(\beta_0\) = Y intercept
- \(\beta_1\) = regression coefficient for the linear effect of X on Y
- \(\beta_2\) = regression coefficient for the quadratic effect on Y
- \(\varepsilon_i\) = random error in Y for observation i

Linear vs. Nonlinear Fit
[Figure: two panels, each with a scatter plot of Y against X and a plot of residuals against X. Left: a linear fit does not give random residuals. Right: a nonlinear fit gives random residuals.]
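The non-random residual pattern is easy to reproduce. The sketch below fits a straight line to hypothetical data generated exactly from y = x squared and lists the residuals; the function name and the data are illustrative only.

```python
def simple_linear_fit(x, y):
    """Least-squares intercept and slope for a one-variable linear model."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical curved data: y = x^2 for x = 0, ..., 4.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [xi ** 2 for xi in x]

b0, b1 = simple_linear_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Residuals are positive at both ends and negative in the middle: a
# systematic U shape rather than random scatter around zero.
```

A quadratic term would absorb this pattern, which is the motivation for the quadratic model.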

Quadratic Regression Model
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four plots of Y against \(X_1\), one for each sign combination: \(\beta_1 < 0\) or \(\beta_1 > 0\) with \(\beta_2 > 0\), and \(\beta_1 < 0\) or \(\beta_1 > 0\) with \(\beta_2 < 0\)]
\(\beta_1\) = the coefficient of the linear term; \(\beta_2\) = the coefficient of the squared term.

Testing for Significance: Quadratic Effect
Testing the quadratic effect: compare the linear regression estimate with the quadratic regression estimate.
Hypotheses:
- \(H_0: \beta_2 = 0\) (the quadratic term does not improve the model)
- \(H_1: \beta_2 \neq 0\) (the quadratic term improves the model)

Testing for Significance: Quadratic Effect (continued)
Hypotheses:
- \(H_0: \beta_2 = 0\) (the quadratic term does not improve the model)
- \(H_1: \beta_2 \neq 0\) (the quadratic term improves the model)
The test statistic is
\[ t = \frac{b_2 - \beta_2}{s_{b_2}} \]
where \(b_2\) = squared term slope coefficient, \(\beta_2\) = hypothesized slope (zero), and \(s_{b_2}\) = standard error of the slope.

Testing for Significance: Quadratic Effect (continued)
Compare \(R^2\) from the simple regression to \(R^2\) from the quadratic model. If \(R^2\) from the quadratic model is larger than \(R^2\) from the simple model, then the quadratic model is a better model. Because \(R^2\) never decreases when a term is added, the adjusted \(R^2\) is the fairer basis for this comparison.

Example: Quadratic Model
Purity increases as filter time increases.
[Table and scatter plot: Purity against Filter Time]

Example: Quadratic Model (continued)
Simple regression results: \(\hat{y} = b_0 + b_1\,\text{Time}\)
[Excel output: Regression Statistics (R Square, Adjusted R Square, Standard Error); F and Significance F; Coefficients, Standard Error, t Stat, P-value for Intercept and Time]
The t statistic, F statistic, and \(R^2\) are all high, but the residuals are not random.
[Figure: residual plot for the simple regression]

Example: Quadratic Model (continued)
Quadratic regression results: \(\hat{y} = b_0 + b_1\,\text{Time} + b_2\,(\text{Time})^2\)
[Excel output: Regression Statistics (R Square, Adjusted R Square, Standard Error); F and Significance F; Coefficients, Standard Error, t Stat, P-value for Intercept, Time, and Time-squared]
The quadratic term is significant and improves the model: \(R^2\) is higher and \(s_e\) is lower, and the residuals are now random.

The Log Transformation
The multiplicative model:
- Original multiplicative model: \(Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \varepsilon\)
- Transformed multiplicative model: \(\log(Y) = \log(\beta_0) + \beta_1 \log(X_1) + \beta_2 \log(X_2) + \log(\varepsilon)\)

Interpretation of Coefficients
For the multiplicative model, when both dependent and independent variables are logged:
- The coefficient of the independent variable \(X_k\) can be interpreted as follows: a 1 percent change in \(X_k\) leads to an estimated \(b_k\) percent change in the average value of Y
- \(b_k\) is the elasticity of Y with respect to a change in \(X_k\)
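The elasticity reading can be checked numerically. In this sketch the constant 50 and the exponent -1.5 are hypothetical; the point is only that, under a multiplicative model with no error term, a 1 percent rise in X moves Y by roughly the exponent in percent, and that the slope in logs equals the exponent.

```python
import math


def multiplicative_model(x, b0, b1):
    """Y = b0 * X^b1, a one-variable multiplicative model without the error term."""
    return b0 * x ** b1


# HYPOTHETICAL parameters, for illustration only.
B0, B1 = 50.0, -1.5

x_old = 8.0
x_new = x_old * 1.01  # a 1 percent increase in X
y_old = multiplicative_model(x_old, B0, B1)
y_new = multiplicative_model(x_new, B0, B1)

pct_change_y = (y_new / y_old - 1) * 100  # close to B1 percent
log_slope = (math.log(y_new) - math.log(y_old)) / (math.log(x_new) - math.log(x_old))
```

The percent change is only approximately equal to the exponent, while the ratio of log differences recovers it up to floating-point error, which is why the model is estimated after taking logs.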

Dummy Variables
- A dummy variable is a categorical independent variable with two levels (yes or no, on or off, male or female), recorded as 0 or 1
- Regression intercepts are different if the variable is significant
- Assumes equal slopes for other variables
- If there are more than two levels, the number of dummy variables needed is (number of levels - 1)

Dummy Variable Example
Let:
- y = pie sales
- \(x_1\) = price
- \(x_2\) = holiday (\(x_2 = 1\) if a holiday occurred during the week, \(x_2 = 0\) if there was no holiday that week)

Dummy Variable Example (continued)
- Holiday (\(x_2 = 1\)): \(\hat{y} = (b_0 + b_2) + b_1 x_1\)
- No holiday (\(x_2 = 0\)): \(\hat{y} = b_0 + b_1 x_1\)
[Figure: y (sales) against \(x_1\) (price); two parallel lines with the same slope and different intercepts, \(b_0 + b_2\) for Holiday and \(b_0\) for No Holiday]
If \(H_0: \beta_2 = 0\) is rejected, then Holiday has a significant effect on pie sales.

Interpreting the Dummy Variable Coefficient
Example:
- Sales: number of pies sold per week
- Price: pie price in $
- Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
\(b_2 = 15\): on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.
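The intercept shift can be sketched as follows. The holiday coefficient of 15 is from the slide; the intercept and price coefficient are hypothetical placeholders, since the text does not give them.

```python
# b2 = 15 is from the slide; B0 and B1 are HYPOTHETICAL placeholders.
B0, B1, B2 = 300.0, -30.0, 15.0


def predict_sales(price, holiday):
    """Predicted pies per week; holiday is 1 for a holiday week, 0 otherwise."""
    return B0 + B1 * price + B2 * holiday


# At any given price the two predictions differ by exactly b2.
gap_at_6 = predict_sales(6.0, 1) - predict_sales(6.0, 0)
gap_at_8 = predict_sales(8.0, 1) - predict_sales(8.0, 0)
```

The gap does not depend on price, which is the equal-slopes assumption of the dummy variable model.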

Interaction Between Explanatory Variables
- Hypothesizes interaction between pairs of x variables
- The response to one x variable may vary at different levels of another x variable
- Contains two-way cross product terms

Effect of Interaction
Given:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon \]
- Without the interaction term, the effect of \(X_1\) on Y is measured by \(\beta_1\)
- With the interaction term, the effect of \(X_1\) on Y is measured by \(\beta_1 + \beta_3 X_2\)
- The effect changes as \(X_2\) changes

Interaction Example
Suppose \(x_2\) is a dummy variable and the estimated regression equation is \(\hat{y} = 1 + 2x_1 + 3x_2 + 4x_1 x_2\).
- \(x_2 = 1\): \(\hat{y} = 1 + 2x_1 + 3(1) + 4x_1(1) = 4 + 6x_1\)
- \(x_2 = 0\): \(\hat{y} = 1 + 2x_1 + 3(0) + 4x_1(0) = 1 + 2x_1\)
[Figure: \(\hat{y}\) against \(x_1\), one line for each value of \(x_2\)]
Slopes are different if the effect of \(x_1\) on y depends on the \(x_2\) value.
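The two lines in this example can be verified with a few lines of code. The equation is the one given on the slide; the helper name `line_for` is hypothetical.

```python
def y_hat(x1, x2):
    """Estimated equation from the slide: 1 + 2*x1 + 3*x2 + 4*x1*x2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def line_for(x2):
    """Intercept and slope in x1 for a fixed value of the dummy x2."""
    intercept = y_hat(0, x2)
    slope = y_hat(1, x2) - y_hat(0, x2)
    return intercept, slope


with_dummy = line_for(1)     # intercept 4, slope 6
without_dummy = line_for(0)  # intercept 1, slope 2
slope_difference = with_dummy[1] - without_dummy[1]  # equals the interaction coefficient
```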

Significance of Interaction Term
- The coefficient \(b_3\) is an estimate of the difference in the coefficient of \(x_1\) when \(x_2 = 1\) compared to when \(x_2 = 0\)
- The t statistic for \(b_3\) can be used to test the hypothesis \(H_0: \beta_3 = 0\) against \(H_1: \beta_3 \neq 0\)
- If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups

Multiple Regression Assumptions
Errors (residuals) from the regression model: \(e_i = y_i - \hat{y}_i\)
Assumptions:
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent

Analysis of Residuals in Multiple Regression
These residual plots are used in multiple regression:
- Residuals vs. \(\hat{y}_i\)
- Residuals vs. \(x_{1i}\)
- Residuals vs. \(x_{2i}\)
- Residuals vs. time (if time series data)
Use the residual plots to check for violations of regression assumptions.

Chapter Summary
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted \(R^2\) (\(\bar{R}^2\))
- Tested individual regression coefficients
- Tested portions of the regression model
- Used quadratic terms and log transformations in regression models
- Used dummy variables
- Evaluated interaction effects
- Discussed using residual plots to check model assumptions