336 Final Review Notes

Regression
Goal: identify the function that describes the relationship between a continuous dependent variable and one or more independent variable(s)
SLR = Simple Linear Regression, involves 1 independent variable
MLR = Multiple Linear Regression, involves more than 1 independent variable
SLR
We are trying to estimate: Y = B0 + B1X
Where
Y = Dependent Variable
B0 = True y-intercept
B1 = True Slope (the amount by which the line rises/falls per additional unit of X)
X = Independent Variable
Estimated Function: ŷ = b0 + b1X

HOW TO SOLVE:
1. Identify the Dependent Variable (Y) and Independent Variable (X)
2. Create a Scatterplot and look for a general linear relationship
3. SOLVE using regression on a calculator
   a. Calculate and generate the “coefficients” of the best fit line, which are b0 (a) and b1 (b)
      i. a or b0 = estimated y-intercept: math: the value of y when x = 0; or means that if you do not study, on average you will receive 33.41%
      ii. b or b1 = estimated slope: math: the rate of change in y as x increases by 1; or for every additional hour studied, we would expect on average that the final grade would increase by 0.8225%
4. Evaluate the Quality of the model by using:
   a. Coefficient of Correlation (r) – describes how strong the linear relationship is. r always lies in -1 ≤ r ≤ 1; the closer r is to -1 or 1, the stronger the linear relationship, and r near 0 means a weak one
   b. Coefficient of Determination (r²) – describes the goodness of fit. r² always lies in 0 ≤ r² ≤ 1; the closer r² is to 1, the better the fit of the model
      i. r² = ___% of the variability in the (dependent variable) is explained by knowing the (independent variable)

ESS – Error Sum of Squares
ESS = Σ(Yi - Ŷi)², always non-negative, thus ESS ≥ 0
If ESS = 0, then Yi - Ŷi = 0 for every observation, which means a perfect fit, which is RARE
Thus, our goal is to find the b0 and b1 that bring ESS as close to zero as possible, using the method of least squares
TSS = RSS + ESS
Regression Report: RSS is the regression SS, ESS is the residual SS, and TSS is the total SS
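To check calculator output by hand, the least-squares steps above can be sketched in a few lines of Python. This is only an illustration: the data (hours studied vs. final grade) and the function name fit_slr are made up here and are not the lecture example.

```python
import math

# Hypothetical data (not from the lecture): hours studied and final grade in %.
hours = [1, 2, 3, 4, 5]
grades = [52, 55, 61, 64, 68]

def fit_slr(x, y):
    """Least-squares estimates and sums of squares for y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # estimated slope
    b0 = y_bar - b1 * x_bar        # estimated y-intercept
    y_hat = [b0 + b1 * xi for xi in x]
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual (error) SS
    rss = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression SS
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    r2 = rss / tss                                         # coefficient of determination
    r = math.copysign(math.sqrt(r2), b1)                   # r takes the sign of the slope
    return {"b0": b0, "b1": b1, "ESS": ess, "RSS": rss, "TSS": tss, "r": r, "r2": r2}

fit = fit_slr(hours, grades)
print(fit)
```

For this made-up data the slope works out to 4.1 and the intercept to 47.7, and TSS equals RSS + ESS up to rounding, matching the identity in the notes.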
Hypothesis Testing
1. Hypothesis
   a. H0: B1 = 0 (means that there is no linear relationship between x and y)
   b. Ha: B1 ≠ 0 (means that a linear relationship exists between x and y)
   c. α = 0.05 (level of significance)
2. Decision Rule (3 Methods)
   i. T-TEST
      a. We REJECT H0 if ttest < -tα/2 or ttest > tα/2
      b. ttest = (b1 – B1) / Se(b1), where B1 is the value stated in H0 (here 0)
      c. T-chart!! df = n - 2 where n = sample size
      d. *NOTE* two-tailed
   ii. USING P-VALUES
      a. REJECT H0 if p-value < α
      b. We estimate the p-value by using the t-table (the upper tail area probability)
      c. *NOTE* two-tailed, make sure to multiply the tail probability by 2
   iii. 95% CONFIDENCE INTERVAL
      a. CI = b1 ± tα/2 × Se(b1)
      b. REJECT H0 if the Confidence Interval DOES NOT contain 0
3. Draw a Conclusion
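The t-test and confidence-interval rules above can be sketched as follows. Assumptions to note: the data are the same made-up hours/grades numbers (not the lecture example), the function name slope_t_test is hypothetical, Se(b1) is computed with the usual SLR formula sqrt(ESS/(n-2)) / sqrt(Σ(x - x̄)²), and the critical value 3.182 is read from a standard t-table for α/2 = 0.025 and df = 3.

```python
import math

# Hypothetical data (not from the lecture).
hours = [1, 2, 3, 4, 5]
grades = [52, 55, 61, 64, 68]

# t-table value for alpha/2 = 0.025 with df = n - 2 = 3.
T_CRIT = 3.182

def slope_t_test(x, y, t_crit, slope_under_h0=0.0):
    """Two-tailed t-test and confidence interval for the slope of an SLR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ess = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ess / (n - 2)) / math.sqrt(sxx)   # standard error of the slope
    t_test = (b1 - slope_under_h0) / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return {
        "t_test": t_test,
        "ci": ci,
        "reject_by_t": t_test < -t_crit or t_test > t_crit,
        "reject_by_ci": not (ci[0] <= slope_under_h0 <= ci[1]),
    }

result = slope_t_test(hours, grades, T_CRIT)
print(result)
```

Both decision rules use the same critical value, so they always agree; here both reject H0 because the test statistic is far beyond 3.182 and the interval does not contain 0.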
MLR (Multiple Linear Regression)
Assumptions:
Linearity – the relationship between Y and the Xs is linear (best fit line)
Normality – for every value of X, the errors are normally distributed around the regression line
Homoscedasticity – the errors have constant variance, so the residuals show a random pattern
Independence of errors – any error is independent from other errors
b1 represents the change in y for a one-unit increase in x1, when all other variables are held constant
Evaluating Regression Models:
o Adjusted R²
o Hypothesis Testing -> Overall Significance (only used before performing backward multiple regression)
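The notes do not spell out adjusted R², so here is a small sketch using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. The function name and the numbers in the example are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: adding a third variable barely raises R-squared.
two_vars = adjusted_r2(0.900, n=21, k=2)
three_vars = adjusted_r2(0.901, n=21, k=3)
print(round(two_vars, 4), round(three_vars, 4))
```

In this made-up comparison the plain R² goes up slightly with the extra variable while the adjusted R² goes down, which is why adjusted R² is the one used to compare models with different numbers of variables.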
Forward Regression
Determine the “best model” by first fitting the SLR on X1; then fitting X1 + X2 and X1 + X3 and keeping the better of the two; then fitting X1 + X2 + X3; and finally judging which model is best by adjusted R² and the number of violations.
Multicollinearity
We want the y (dependent) and x (independent) variables to be highly correlated; multicollinearity, however, means that the independent variables (Xs) are correlated with each other
CAUSES: regression coefficients to have the “wrong sign”, t-values too small, and p-values too big
DETECT WITH: Overall Significance F-Test (first) and Correlation Matrix (last)
Backward Multiple Regression
1. Overall F-Test to determine if at least 1 independent variable is a significant predictor
   a. Hypothesis:
      H0: B1 = B2 = B3 = … = 0 (meaning no variable is useful for predicting)
      Ha: At least one Bk ≠ 0 (meaning at least one variable is useful for predicting)
   b. Decision Rule: P-value approach: REJECT H0 if p-value < α
   c. Make a conclusion in Business Context – “based on this test, at a 5% level of significance, we can say that AT LEAST one of the independent variables is significant in predicting the dependent variable.”
2. Continue to re-run and remove independent variables that have violations, e.g. if their p-values > α (not significant)
3. Check MULTICOLLINEARITY with Correlation Matrix
   a. Any correlation between two Xs beyond ±0.6 may present a problem
   b. Compare the signs in the correlation matrix with the signs of the coefficients in your regression model to spot “incorrect” signs
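The correlation-matrix check in step 3 can be sketched like this. The ±0.6 cutoff comes from the notes; the variable names, the data, and the helper functions pearson and flag_correlated_pairs are made up for illustration.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def flag_correlated_pairs(columns, threshold=0.6):
    """Return pairs of independent variables whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical independent variables (not from the lecture).
xs = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # moves almost in lockstep with x1
    "x3": [5, 1, 4, 2, 3],
}
flagged = flag_correlated_pairs(xs)
print(flagged)
```

Only the x1/x2 pair is flagged here; in a backward regression that would be the pair to look at when a coefficient shows up with an unexpected sign.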
Dummy Variables
All coefficients are relative to the reference category (the one that does not get its own dummy variable)
o E.g. if the coefficient is -17, that category sells 17 fewer units than the reference category
o SEE LECTURE NOTES EXAMPLE
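A small sketch of how dummy coding and the "relative to the reference" reading fit together. Everything here is hypothetical except the -17 coefficient borrowed from the example above: the store locations, the other coefficient values, and the function names are made up and are not the lecture example.

```python
def dummy_encode(values, reference):
    """Turn a categorical column into 0/1 dummy columns, leaving out the reference category."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

# Hypothetical categorical variable with "Downtown" as the reference category.
locations = ["Downtown", "Mall", "Airport", "Mall"]
rows = dummy_encode(locations, reference="Downtown")

# Hypothetical fitted coefficients; the intercept is the prediction for the reference category.
coefficients = {"intercept": 100.0, "is_Mall": -17.0, "is_Airport": 5.0}

def predict(row):
    """Predicted units sold = intercept + coefficient of whichever dummy equals 1."""
    return coefficients["intercept"] + sum(coefficients[name] * flag for name, flag in row.items())

predictions = [predict(row) for row in rows]
print(predictions)
```

The reference category gets all dummies equal to 0, so its prediction is just the intercept, and a category with coefficient -17 is predicted 17 units below that.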
Linear Programming (LP)
1. A decision to be made with respect to the allocation of resources
2. Constraints
3. An objective function, such as maximizing profits or minimizing costs
Problems with 2 decision variables can be solved graphically
STEPS:
1. Define the decision variables
2. State the Objective Function
3. State the Constraints
4. State the Non-Negativity Constraints
5. FOR 2 DECISION VARIABLES ONLY
   a. Graph the relationships
   b. Find the feasible area
   c. Locate the optimal point by using the objective function
   d. Identify the constraints that meet at the optimal point and solve them simultaneously
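The graphical method in step 5 amounts to checking the corner points of the feasible area, which can be sketched in code. This is a rough illustration under assumptions: a maximization problem with "≤" constraints only, a made-up pair of resource constraints, and a hypothetical function name solve_2var_lp. It does not detect unbounded problems.

```python
from itertools import combinations

def solve_2var_lp(objective, constraints, tol=1e-9):
    """Maximize c1*x1 + c2*x2 subject to a1*x1 + a2*x2 <= rhs and x1, x2 >= 0.

    Tries every intersection of two constraint lines, keeps the feasible ones
    (the corner points), and returns (best value, x1, x2), or None if infeasible.
    """
    c1, c2 = objective
    # Non-negativity written in the same "<=" form: -x1 <= 0 and -x2 <= 0.
    all_constraints = list(constraints) + [(-1, 0, 0), (0, -1, 0)]
    best = None
    for (a1, a2, r1), (b1, b2, r2) in combinations(all_constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines never intersect
        x1 = (r1 * b2 - a2 * r2) / det
        x2 = (a1 * r2 - r1 * b1) / det
        feasible = all(p * x1 + q * x2 <= rhs + tol for p, q, rhs in all_constraints)
        if feasible:
            value = c1 * x1 + c2 * x2
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best

# Hypothetical problem: maximize P = 20*X1 + 30*X2
# subject to X1 + 2*X2 <= 40 and 3*X1 + X2 <= 60.
best_value, best_x1, best_x2 = solve_2var_lp((20, 30), [(1, 2, 40), (3, 1, 60)])
print(best_value, best_x1, best_x2)
```

For this made-up problem the optimal corner is where the two resource constraints cross (X1 = 16, X2 = 12, P = 680), which is what step 5d means by solving the constraints involved at the optimal point.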
Special LP Conditions
1. Alternate Optimal Solutions – more than one “best” solution
2. Redundant Constraints – a constraint that plays NO part in determining the feasible area
3. Unbounded Solutions – the objective can be made infinitely large (or small); the constraints place no limit on the decision variables
4. Infeasibility – No way to satisfy all constraints; no feasible area
100% Rule
Used when 2 or more objective function coefficients change simultaneously, or 2 or more RHS constraints change simultaneously. Divide each change by its allowable increase or decrease and add up the ratios; the sensitivity report is still valid if the total is ≤ 100%
E.g. P = 20X1 + 30X2; X1 drops $2 and X2 increases $5
2/5 + 5/10 = 9/10 = 90%, where 5 is the allowable decrease for X1 and 10 is the allowable increase for X2
THUS, 90% < 100%, so the sensitivity report is valid
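The 100% rule arithmetic can be sketched as a short function. The function name is made up; the numbers are the ones from the example above, and the allowable limits the notes do not give (the increase for X1 and the decrease for X2) are left as None because they are not needed.

```python
def hundred_percent_rule(changes):
    """Sum of |change| / allowable limit over all simultaneous changes.

    Each entry is (change, allowable_increase, allowable_decrease); an increase
    is compared with the allowable increase, a decrease with the allowable decrease.
    """
    total = 0.0
    for change, allowable_increase, allowable_decrease in changes:
        limit = allowable_increase if change >= 0 else allowable_decrease
        total += abs(change) / limit
    return total

# From the notes: X1 drops $2 (allowable decrease 5), X2 rises $5 (allowable increase 10).
total = hundred_percent_rule([(-2, None, 5), (5, 10, None)])
report_still_valid = total <= 1.0
print(f"{total:.0%}", report_still_valid)
```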
Answer Report
Value of Objective Function – in this case, the maximum profit at the optimal solution
Optimal Solution
Slack – the amount of a resource that is not being used (helps determine binding/nonbinding constraints)
Sensitivity Report
Reduced Cost – for each unit of this product that is forced into production, profit is reduced by this amount; the product's objective coefficient must improve by this amount before it enters the optimal solution
Shadow Price – the maximum extra amount worth paying for one more unit of a resource; the amount by which the objective function improves with one more unit of that resource
Max change in objective coefficients without changing the optimal solution (the objective function value still changes, though)
Max change in RHS constraints without changing the optimal solution (objective