## [answered] Module 09 MLR (Contd.): "Higher order" mo


The error diagnostics (from the second regression) show --- outliers.

The correct answer is none. I need to understand the logic behind the answer.

Module 09

MLR (Contd.): "Higher order" models and error diagnostics

WORKSHEET

Module Content:

Chapter 5 (Readings: Sections 1 and 2): Pages 179 to 182; Chapter 6 (Readings: Sections 1 to 3, and 6 to 8): Pages 205 to 209, and 230 to 259. As before, the Text is the primary reading; the previews and other material given here are supplementary.

Concepts:

The concepts listed below are reviewed using these power points.

"Curvilinear" models for "linear" regression

Assumptions underlying regression

Influence statistics

Correlated errors: Durbin-Watson (DW), and a quick fix

Tools:

Excel with Data Analysis, KPK Macros, and the t-, F-, and DW-Tables.

Illustration and assessment work:

The easiest way to model a quadratic in Xi is to create a column of Xi^2 values and include it as another X in the MLR. Even higher-order and other functional relations can be formulated for inclusion in the MLR in a similar manner. This is covered in Chapter 5, and is the first topic covered in this module. Models make best use of the modeler's feel for the process.
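The "extra column" idea can be sketched outside Excel as well. The snippet below is a minimal illustration, not the worksheet's method: the data are hypothetical (generated from y = 1 + 2x + 3x^2 with no noise), and the helper names `solve_linear_system` and `fit_least_squares` are made up for this example. It builds the squared column and fits an ordinary least-squares model through the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(columns, y):
    """Least squares with an intercept: solve (X'X) b = X'y."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: an exact quadratic, so the fit should recover 1, 2, 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

x_squared = [xi ** 2 for xi in x]  # the new column, treated as another X
coefficients = fit_least_squares([x, x_squared], y)
print([round(c, 6) for c in coefficients])
```

The model stays "linear" in the regression sense because it is linear in the coefficients; only the predictor column is a nonlinear function of X.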

The second topic covered in this Module is the conformity of the errors to the assumptions made regarding them. These relate to their Normality and Independence (Text: pages 229-260).

The assignment uses three time series: one (problem 6.16, Text page 266) on the growth of the Byron Nelson Donations (DONATIONS6), a second (problem 6.14, Text page 265) on the overall growth of the United States Population (USPOP6), and a third (problem 5.7, Text page 201; please note this is from Chapter 5) on the growth of the Kentucky Derby bets (DERBY5). We will use the KPK macros to get error diagnostics, mainly standard errors, leverages, Cook's distance measures (Text: pages 239-248), VIFs (Text: pages 161-163), and the Durbin-Watson statistic (Text: pages 254-260) for the first-order autocorrelation of errors, on the first data set. Then you will use a similar approach to model the second time series. The preparatory quiz (PQ009) will be based on this second problem (6.14: parts a through d, and h; predict only for year 2000). Finally, Problem #5.8, using the growth of BETS, with its analysis performed in exactly the same manner (use only the linear trend, and predict the BETS only for year 1993), will be used for TQ009. The other assigned problems (6.5, 9, and 15) should further help you assimilate these concepts.

For problem 6.16, first we run Regression in standard Excel (with the default error diagnostics enabled). [To perform the multiple linear regression (MLR) for DONATION on Time as an index variable, we retrieve the data in Excel and define a new variable taking 1981 as the base. So, start with 1982 coded as 1 and then use increments of one to represent consecutive years. (There are many ways to achieve the same results; this illustration is only one of them.) Define the variable INDEXYR in column C, and fill in the numbers 1 through 21 in rows 2 through 22. Now, we regress DONATION on INDEXYR, making sure we have selected residuals, standardized residuals, residual plots, line-fit plots, and normal probability plots, to obtain the default output that follows.]
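The index construction and the simple regression just described can be sketched in a few lines of Python. This is only an illustration under one assumption: the DONATION values are not listed directly in the worksheet, so they are reconstructed here as predicted value plus residual from the residual output reported later in this worksheet (the raw file is DONATIONS6). The variable names are hypothetical.

```python
# DONATION for 1982..2002, reconstructed as predicted + residual from the
# worksheet's residual output (an assumption about the DONATIONS6 data).
donation = [0.85, 1.20, 1.34, 1.54, 1.52, 1.61, 2.04, 1.74, 2.39, 2.64, 3.05,
            3.63, 3.20, 3.35, 3.83, 4.35, 4.35, 5.92, 6.10, 6.10, 5.77]
years = list(range(1982, 2003))

# Index variable with 1981 as the base year: 1982 -> 1, ..., 2002 -> 21.
index_year = [year - 1981 for year in years]

n = len(donation)
x_mean = sum(index_year) / n
y_mean = sum(donation) / n

# Simple linear regression: slope = Sxy / Sxx, intercept from the means.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(index_year, donation))
s_xx = sum((x - x_mean) ** 2 for x in index_year)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print(f"DONATION = {intercept:.3f} + {slope:.3f} * INDEXYR")
```

With the reconstructed data, the fitted intercept and slope should agree with the coefficients in the Excel summary output to three decimals.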

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.968 |
| R Square | 0.936 |
| Adjusted R Square | 0.933 |
| Standard Error | 0.447 |
| Observations | 21 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 55.820 | 55.820 | 279.989 | 0.000 |
| Residual | 19 | 3.788 | 0.199 | | |
| Total | 20 | 59.608 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.206 | 0.202 | 1.019 | 0.321 | -0.217 | 0.629 |
| INDEXYR | 0.269 | 0.016 | 16.733 | 0.000 | 0.236 | 0.303 |

The residual, standardized residual, and normal probability plot results:

RESIDUAL OUTPUT

| Observation | Predicted DONATION | Residuals | Standard Residuals |
|---|---|---|---|
| 1 | 0.475 | 0.375 | 0.861 |
| 2 | 0.744 | 0.456 | 1.047 |
| 3 | 1.014 | 0.326 | 0.750 |
| 4 | 1.283 | 0.257 | 0.591 |
| 5 | 1.552 | -0.032 | -0.074 |
| 6 | 1.821 | -0.211 | -0.486 |
| 7 | 2.091 | -0.051 | -0.116 |
| 8 | 2.360 | -0.620 | -1.424 |
| 9 | 2.629 | -0.239 | -0.549 |
| 10 | 2.898 | -0.258 | -0.594 |
| 11 | 3.168 | -0.118 | -0.270 |
| 12 | 3.437 | 0.193 | 0.444 |
| 13 | 3.706 | -0.506 | -1.163 |
| 14 | 3.975 | -0.625 | -1.437 |
| 15 | 4.245 | -0.415 | -0.953 |
| 16 | 4.514 | -0.164 | -0.377 |
| 17 | 4.783 | -0.433 | -0.995 |
| 18 | 5.052 | 0.868 | 1.994 |
| 19 | 5.322 | 0.778 | 1.789 |
| 20 | 5.591 | 0.509 | 1.170 |
| 21 | 5.860 | -0.090 | -0.207 |

PROBABILITY OUTPUT

| Percentile | DONATION |
|---|---|
| 2.381 | 0.850 |
| 7.143 | 1.200 |
| 11.905 | 1.340 |
| 16.667 | 1.520 |
| 21.429 | 1.540 |
| 26.190 | 1.610 |
| 30.952 | 1.740 |
| 35.714 | 2.040 |
| 40.476 | 2.390 |
| 45.238 | 2.640 |
| 50.000 | 3.050 |
| 54.762 | 3.200 |
| 59.524 | 3.350 |
| 64.286 | 3.630 |
| 69.048 | 3.830 |
| 73.810 | 4.350 |
| 78.571 | 4.350 |
| 83.333 | 5.770 |
| 88.095 | 5.920 |
| 92.857 | 6.100 |
| 97.619 | 6.100 |

[Figures: residual plot(s), line fit (trend) plot, and normal probability plot.]

Now, to perform some additional diagnostic checks on the errors, we use the KPK macros and regress DONATION on INDEXYR, selecting Leverage, Cook's Distance, and Durbin-Watson from the menu. This generates the following additional output.

Durbin-Watson statistic = 0.942434

RESIDUAL OUTPUT

| Observation | Leverages | Cook's D |
|---|---|---|
| 1 | 0.177 | 0.092 |
| 2 | 0.153 | 0.111 |
| 3 | 0.131 | 0.046 |
| 4 | 0.111 | 0.023 |
| 5 | 0.094 | 0.000 |
| 6 | 0.080 | 0.011 |
| 7 | 0.068 | 0.001 |
| 8 | 0.059 | 0.065 |
| 9 | 0.053 | 0.008 |
| 10 | 0.049 | 0.009 |
| 11 | 0.048 | 0.002 |
| 12 | 0.049 | 0.005 |
| 13 | 0.053 | 0.038 |
| 14 | 0.059 | 0.066 |
| 15 | 0.068 | 0.034 |
| 16 | 0.080 | 0.006 |
| 17 | 0.094 | 0.054 |
| 18 | 0.111 | 0.266 |
| 19 | 0.131 | 0.263 |
| 20 | 0.153 | 0.138 |
| 21 | 0.177 | 0.005 |

We note the clear correlation of errors. [The critical Durbin-Watson values for testing positive autocorrelation (R > 0) at 5% alpha, corresponding to k = 1 and n = 21, are 1.22 and 1.42. The sample DW* = 0.94 falls below 1.22 and therefore shows positive error correlation.] To remedy this and improve the model, we will include the prior year's donation as a predictor for DONATION and run the two-variable regression.

To create a lagged Y variable, please copy the content of column B (rows 2 through 21) to the new column D (rows 3 through 22) and label the variable DONLSTYR or LAG-1-DON. (You may use any label, but note that the first entry (cell D2) will be missing.) We should also have the donation for 2002 in the row for 2003; it can be used to predict the real donation for 2003. Now we can delete row 2 (for any m-lagged model, we will lose "m" observations, but will be able to forecast "m" periods forward), and regress DONATION on INDEXYR and DONLSTYR. We will run the regression again using the KPK macros, choosing the options Leverages, VIFs, Cook's Distance, and Durbin-Watson. The new printouts follow: first the default output, then the additional diagnostics.
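The lag construction and the two-predictor fit can also be sketched in Python. As before, this is an illustration rather than the KPK macro itself: the DONATION values are an assumption (reconstructed as predicted value plus residual from the first regression's residual output), and the helper names are hypothetical. The fit uses the closed-form solution of the normal equations for two predictors, which also yields the VIF and a point forecast for INDEXYR = 22.

```python
# DONATION for INDEXYR 1..21, reconstructed as predicted + residual from the
# first regression's residual output (an assumption about DONATIONS6).
donation = [0.85, 1.20, 1.34, 1.54, 1.52, 1.61, 2.04, 1.74, 2.39, 2.64, 3.05,
            3.63, 3.20, 3.35, 3.83, 4.35, 4.35, 5.92, 6.10, 6.10, 5.77]
index_year = list(range(1, len(donation) + 1))

# Lagging by one year drops the first observation: 20 usable rows remain.
y = donation[1:]      # DONATION for INDEXYR 2..21
x1 = index_year[1:]   # INDEXYR 2..21
x2 = donation[:-1]    # DONLSTYR: the previous year's donation


def mean(values):
    return sum(values) / len(values)


def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))


s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)

# Closed-form solution of the two-predictor normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# With exactly two predictors, both VIFs equal 1 / (1 - r12^2).
vif = s11 * s22 / det

# Point forecast for 2003: INDEXYR = 22, lagged value = the 2002 donation.
forecast_2003 = b0 + b1 * 22 + b2 * donation[-1]

print(f"DONATION = {b0:.3f} + {b1:.3f} INDEXYR + {b2:.3f} DONLSTYR")
print(f"VIF = {vif:.3f}, forecast for 2003 = {forecast_2003:.3f}")
```

The slicing mirrors the spreadsheet steps: `donation[:-1]` is column B shifted down one row, and dropping the first element of the other two lists corresponds to deleting row 2.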

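SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.975 |
| R Square | 0.951 |
| Adjusted R Square | 0.945 |
| Standard Error | 0.394 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 51.332 | 25.666 | 165.491 | 0.000 |
| Residual | 17 | 2.637 | 0.155 | | |
| Total | 19 | 53.968 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | VIF |
|---|---|---|---|---|---|---|---|
| Intercept | 0.157 | 0.197 | 0.799 | 0.435 | -0.258 | 0.573 | |
| INDEXYR | 0.137 | 0.057 | 2.411 | 0.027 | 0.017 | 0.257 | 13.895 |
| DONLSTYR | 0.509 | 0.203 | 2.515 | 0.022 | 0.082 | 0.937 | 13.895 |

And the additional diagnostics are: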

Durbin-Watson statistic = 1.935981

RESIDUAL OUTPUT

| Observation | Leverages | Cook's D |
|---|---|---|
| 1 | 0.225 | 0.090 |
| 2 | 0.215 | 0.019 |
| 3 | 0.164 | 0.011 |
| 4 | 0.132 | 0.004 |
| 5 | 0.096 | 0.005 |
| 6 | 0.092 | 0.002 |
| 7 | 0.069 | 0.053 |
| 8 | 0.161 | 0.006 |
| 9 | 0.069 | 0.002 |
| 10 | 0.069 | 0.000 |
| 11 | 0.054 | 0.010 |
| 12 | 0.063 | 0.054 |
| 13 | 0.129 | 0.047 |
| 14 | 0.175 | 0.005 |
| 15 | 0.128 | 0.001 |
| 16 | 0.104 | 0.035 |
| 17 | 0.166 | 0.595 |
| 18 | 0.327 | 0.157 |
| 19 | 0.312 | 0.011 |
| 20 | 0.250 | 0.136 |

And the predicted donations for 2003 (corresponding to INDXYR = 22), using this model to forecast, are:

| X1 (INDEXYR) | X Variable 2 | Predicted Value | Std Error Prediction | Lower 95% Mean | Upper 95% Mean | Lower 95% Predict | Upper 95% Predict |
|---|---|---|---|---|---|---|---|
| 22 | 5.77 | 6.117 | 0.221 | 5.652 | 6.582 | 5.019 | 7.215 |

Clearly, R-squared is better (93% to 95%); both variables are significant at 5% alpha; and the errors are no longer correlated, with no serious outliers (as indicated by the leverages and Cook's D values). The VIFs indicate that DONATIONs are related to INDEXYRs. (We already knew that; it is the idea behind having proposed the model in the first place. Thus we overlook the VIFs over 10 in this special situation.)
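The "no serious outliers" reading can be made concrete with a small screening sketch. Two cutoffs are used, and both are assumptions for illustration: the two-sigma rule (absolute standardized residual above 2) comes from the wording of the worksheet's own question, while the Cook's D cutoff of 1.0 is a common rule of thumb that may differ from the one the Text uses. The function name `flag_observations` is hypothetical; the numbers are copied from the printouts above.

```python
# Standardized residuals from the first regression (Excel residual output).
standardized_residuals_model1 = [
    0.861, 1.047, 0.750, 0.591, -0.074, -0.486, -0.116, -1.424, -0.549,
    -0.594, -0.270, 0.444, -1.163, -1.437, -0.953, -0.377, -0.995, 1.994,
    1.789, 1.170, -0.207]

# Cook's D values from the KPK output of the first and second regressions.
cooks_d_model1 = [0.092, 0.111, 0.046, 0.023, 0.000, 0.011, 0.001, 0.065,
                  0.008, 0.009, 0.002, 0.005, 0.038, 0.066, 0.034, 0.006,
                  0.054, 0.266, 0.263, 0.138, 0.005]
cooks_d_model2 = [0.090, 0.019, 0.011, 0.004, 0.005, 0.002, 0.053, 0.006,
                  0.002, 0.000, 0.010, 0.054, 0.047, 0.005, 0.001, 0.035,
                  0.595, 0.157, 0.011, 0.136]


def flag_observations(values, cutoff):
    """Return 1-based observation numbers whose absolute value exceeds cutoff."""
    return [i + 1 for i, v in enumerate(values) if abs(v) > cutoff]


two_sigma_outliers = flag_observations(standardized_residuals_model1, 2.0)
influential_model1 = flag_observations(cooks_d_model1, 1.0)  # assumed cutoff
influential_model2 = flag_observations(cooks_d_model2, 1.0)  # assumed cutoff

# The largest Cook's D in the second regression and where it occurs.
largest_d = max(cooks_d_model2)
largest_d_observation = cooks_d_model2.index(largest_d) + 1

print(two_sigma_outliers, influential_model1, influential_model2)
print(largest_d_observation, largest_d)
```

Under these cutoffs nothing is flagged: the largest standardized residual is 1.994, just under 2, and the largest Cook's D in the second regression is 0.595, so any cutoff above that value gives the same answer. Leverage is a separate measure of how unusual an observation's X values are and is not screened here.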

A sample of the questions you may answer (using the above output) may be:

1. What equation presents the relation between DONATION and INDXYR, i.e., DONATION = ...?

A. 0.206 + 0.269 INDXYR

B. 0.232 + 0.127 INDXYR

C. 0.188 + 0.016 INDXYR

D. 2.526 + 16.733 INDXYR

E. None of these

2. Is this relation statistically significant?

A. Yes

B. No

3. What is the coefficient of determination for the model?

A. 0.948

B. 0.509

C. 0.936

D. 0.974

E. 0.447

4. The standardized residuals show there are --- two-sigma outliers in the data set.

A. 1

B. 2

C. 3

D. 4

E. None.

5. The normal probability plot shows that errors are approximately normally distributed.

A. True

B. False

6. The test for serial autocorrelation of errors is done using the Durbin-Watson statistic. What is the critical value (i.e., the Table value; refer to Text: page B-11, Table B-7) below which you will conclude that "a positive correlation exists"?

A. 1.2

B. 1.41

C. 1.22

D. 1.42

E. None of these.

7. If the first model is significant, then the second model (with the added LAG variable) will

always be statistically significant.

A. True

B. False.

8. The R-square for the second model is:

A. 0.951

B. 0.509

C. 0.936

D. 0.974

E. 0.447

9. The computed statistic for the test of the significance of the overall model (including the LAG variable) is:

A. 17

B. 166

C. 280

D. 3

E. 2

10. The error diagnostics (from the second regression) show --- outliers.

A. 1

B. 2

C. 3

D. 4

E. None
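Several of the questions above can be cross-checked from the ANOVA sums of squares alone, since R-square, adjusted R-square, MSE, and the overall F statistic are all simple functions of them. The sketch below uses the rounded SS values from the two printouts, so the results match the reported statistics only up to rounding; the function name `anova_summary` is hypothetical.

```python
def anova_summary(ss_regression, ss_residual, n, k):
    """Derive the headline statistics from the ANOVA sums of squares.

    n is the number of observations and k the number of predictors.
    """
    ss_total = ss_regression + ss_residual
    r_square = ss_regression / ss_total
    adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    mse = ss_residual / (n - k - 1)            # residual mean square
    f_statistic = (ss_regression / k) / mse    # MSR / MSE
    return {"r_square": r_square, "adjusted_r_square": adjusted_r_square,
            "mse": mse, "f": f_statistic}


# First regression: DONATION on INDEXYR (n = 21, one predictor).
model1 = anova_summary(ss_regression=55.820, ss_residual=3.788, n=21, k=1)

# Second regression: DONATION on INDEXYR and DONLSTYR (n = 20, two predictors).
model2 = anova_summary(ss_regression=51.332, ss_residual=2.637, n=20, k=2)

for name, model in (("model 1", model1), ("model 2", model2)):
    print(name, {key: round(value, 3) for key, value in model.items()})
```

The same function also confirms that the unlabeled 0.933 and 0.945 entries in the regression statistics are consistent with adjusted R-square values.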

Solutions key:

1: A

2: A

3: C

4: E. Note that the other metrics in the output (Leverage, Cook's D, etc.) confirm this result (Text: pages 242-245).

5: A. Please note that the errors in the line-fit plot follow an approximately straight line on visual inspection (contrast this with the plot in Figure 6.41, Text: page 238).

6: C. The sample error correlation is 0.54, with the resulting Durbin-Watson statistic being 0.94. As 0.94 < 1.22, we conclude that there is significant correlation, and hence propose the LAG model (Text: page 259).

7: A. Note that the original (significant) variable is still in the regression.

8: A. Note that this is higher than the 0.936 for the base model. Also, note that the MSE has been reduced to 0.155 from 0.199 (a significant reduction, given that the data is in millions of dollars).

9: B

10: E. But please look at the VIFs; the high VIF is to be expected because we know that Y is heavily correlated with X, and so the lagged Y should be as well.

(Please use this link to hear a brief presentation of these solutions.)

Practice work:

Please perform the two regressions for POPULATION (Problem 6.14, page 265). The first regression sought is on year; do not forget to create an index for the year, INDXYR. The second regression sought is on year and the lagged value of POPULATION. Note that this takes into account the possible changes occurring from year to year due to immigration, etc. Please use the KPK macros to get both model outputs (complete with VIFs, Durbin-Watson, standardized residuals, etc.). Please answer ONLY questions (a) through (d) and (h), predicting only for year 2000, and then proceed to take PQ009.

The Lead:

In real life, not all the variables that we use are continuous and/or quantitative. Qualitative and discrete variables are modeled as "Dummy" variables, and they often enhance or reduce the effects of other variables through "Interactions". Indicator variables are created and used like index variables.

Competency assessment:

Please use the data provided in the file named DERBY5 (problem 5.7, Text page 201) and create two models, the first using INDEXYR only, and the second using the lagged Y (BETS) and INDEXYR. Use the better model to "forecast" the BETS for 1993. Please use the 5% significance level wherever needed for the output. The Module's competency assessment quiz will use this problem.

Please take the module assessment quiz (TQ009):

The quiz covers the module material, and should be taken in the date/time window posted in the syllabus. Please be warned that this quiz may NOT be identical to PQ009 and the earlier ones, but will be at the same level of understanding. This file may be printed using this link.

Solution details:

This question was answered on: Sep 18, 2020
