Question Details

[answered] Module 09 MLR (Contd.): "Higher order" mo


The correct answer is Yes-- I just need to understand what numbers are used to come to this conclusion.


Is this relation statistically significant?

Yes or No
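One way to see which numbers lead to "Yes" is to recompute the test statistics from the first regression output in the worksheet (DONATION regressed on INDEXYR). The sketch below uses the ANOVA sums of squares (55.820 on 1 df, 3.788 on 19 df) and the INDEXYR coefficient and standard error (0.269 and 0.016) from that output. The 5% two-tailed critical t for 19 df (2.093) comes from a standard t-table and is an assumption not printed in the worksheet; the variable names are illustrative only.

```python
# Figures copied from the first SUMMARY OUTPUT (DONATION regressed on INDEXYR).
ss_regression, df_regression = 55.820, 1
ss_residual, df_residual = 3.788, 19

# F = MS(regression) / MS(residual)
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# t for the slope = coefficient / standard error (rounded inputs, so approximate).
slope, slope_se = 0.269, 0.016
t_stat = slope / slope_se

# Assumption: two-tailed 5% critical value of t with 19 df, from a standard t-table.
T_CRIT_19_DF = 2.093

is_significant = abs(t_stat) > T_CRIT_19_DF

print(f"F = {f_stat:.1f}")
print(f"t = {t_stat:.1f}")
print(f"significant at 5%: {is_significant}")
```

With a single predictor, F is the square of the slope's t statistic (16.733 squared is about 280), so the F test and the t test agree. The printed Significance F and P-value of 0.000 for INDEXYR express the same result.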


Module 09

MLR (Contd.): "Higher order" models and error diagnostics

WORKSHEET

Module Content:

Chapter 5 (Readings: Sections 1 and 2): Pages 179 to 182; Chapter 6 (Readings: Sections 1 to 3, and 6 to 8): Pages 205 to 209, and 230 to 259. As before, the Text is the primary reading; the previews and other material given here are supplementary.

Concepts:

The concepts listed below are reviewed using these PowerPoint presentations.

 

"Curvilinear" models for "linear" regression

Assumptions underlying regression

Influence statistics

Correlated errors: Durbin-Watson (DW), and a quick fix

Tools:

Excel with Data Analysis, KPK Macros, and the t-, F-, and DW tables.

Illustration and assessment work:

The easiest way to model a quadratic in Xi is to create a column of Xi² values and include it as another X in the MLR. Even higher-order and other functional relations can be formulated for inclusion in the MLR in a similar manner. This is covered in Chapter 5, and is the first topic covered in this module. Models make best use of the modeler's feel for the process.
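The "column of squares" idea can be sketched outside Excel as follows. This is a minimal illustration, assuming made-up data generated from y = 1 + 2x + 3x² (not data from the Text) and a small hand-written least-squares routine; the function names are hypothetical and stand in for whatever regression tool is actually used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_least_squares(columns, y):
    """Fit y = b0 + b1*col1 + b2*col2 + ... via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2x + 3x^2 (not from the Text).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

# The "higher order" step: square X and treat the result as a second predictor.
x_squared = [xi ** 2 for xi in x]

b0, b1, b2 = fit_least_squares([x, x_squared], y)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The model stays linear in its coefficients, which is why an ordinary MLR routine can fit it once the squared column exists.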

 

The second topic covered in this module is the conformity of the errors to the assumptions made regarding them. These relate to their Normality and Independence (Text: pages 229-260).

The assignment uses three time series: one (problem 6.16, Text page 266) on the growth of the Byron Nelson donations (DONATIONS6), a second (problem 6.14, Text page 265) on the overall growth of the United States population (USPOP6), and a third (problem 5.7, Text page 201; please note this is from Chapter 5) on the growth of the Kentucky Derby bets (DERBY5). We will use the KPK macros to get error diagnostics, mainly standard errors, leverages, Cook's distance measures (Text: pages 239-248), VIFs (Text: pages 161-163), and the Durbin-Watson statistic (Text: pages 254-260) for the first-order autocorrelation of errors, on the first data set. Then you will use a similar approach to model the second time series. The preparatory quiz (PQ009) will be based on this second problem (6.14: parts a through d, and h; predict only for year 2000). Finally, Problem #5.8, using the growth of BETS and analyzed in exactly the same manner (use only the linear trend, and predict BETS only for year 1993), will be used for TQ009. The other assigned problems (6.5, 9, and 15) should further help you assimilate these concepts.

 

For problem #6.16, first we run Regression in standard Excel (with the default error diagnostics enabled). [To perform the multiple linear regression (MLR) of DONATION on time as an index variable, we retrieve the data in Excel and define a new variable taking 1981 as the base. So, start with 1982 coded as 1 and then use increments of one to represent consecutive years. (There are many ways to achieve the same result; this illustration is only one of them.) Define the variable INDEXYR in column C, and fill in the numbers 1 through 21 in rows 2 through 22. Now, we regress DONATION on INDEXYR, making sure we have selected residuals, standardized residuals, residual plots, line-fit plots, and normal probability plots, to obtain the following.]

The default output:

 

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.968
R Square            0.936
Adjusted R Square   0.933
Standard Error      0.447
Observations        21

ANOVA
              df       SS       MS        F   Significance F
Regression     1   55.820   55.820  279.989            0.000
Residual      19    3.788    0.199
Total         20   59.608

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept          0.206           0.202   1.019    0.321     -0.217      0.629
INDEXYR            0.269           0.016  16.733    0.000      0.236      0.303

The residual, standardized residual, and normal probability plot results:

 

RESIDUAL OUTPUT

Observation  Predicted DONATION  Residuals  Standard Residuals
          1               0.475      0.375               0.861
          2               0.744      0.456               1.047
          3               1.014      0.326               0.750
          4               1.283      0.257               0.591
          5               1.552     -0.032              -0.074
          6               1.821     -0.211              -0.486
          7               2.091     -0.051              -0.116
          8               2.360     -0.620              -1.424
          9               2.629     -0.239              -0.549
         10               2.898     -0.258              -0.594
         11               3.168     -0.118              -0.270
         12               3.437      0.193               0.444
         13               3.706     -0.506              -1.163
         14               3.975     -0.625              -1.437
         15               4.245     -0.415              -0.953
         16               4.514     -0.164              -0.377
         17               4.783     -0.433              -0.995
         18               5.052      0.868               1.994
         19               5.322      0.778               1.789
         20               5.591      0.509               1.170
         21               5.860     -0.090              -0.207

PROBABILITY OUTPUT

Percentile  DONATION
     2.381     0.850
     7.143     1.200
    11.905     1.340
    16.667     1.520
    21.429     1.540
    26.190     1.610
    30.952     1.740
    35.714     2.040
    40.476     2.390
    45.238     2.640
    50.000     3.050
    54.762     3.200
    59.524     3.350
    64.286     3.630
    69.048     3.830
    73.810     4.350
    78.571     4.350
    83.333     5.770
    88.095     5.920
    92.857     6.100
    97.619     6.100

[Plots not reproduced: residual plot(s), line-fit (trend) plot, and normal probability plot.]

Now, to perform some additional diagnostic checks on the errors, we use the KPK macros and regress DONATION on INDEXYR, selecting Leverage, Cook's Distance, and Durbin-Watson from the menu. This generates the following (additional) output.

Durbin-Watson statistic = 0.942434

 

RESIDUAL OUTPUT

Observation  Leverages  Cook's D
          1      0.177     0.092
          2      0.153     0.111
          3      0.131     0.046
          4      0.111     0.023
          5      0.094     0.000
          6      0.080     0.011
          7      0.068     0.001
          8      0.059     0.065
          9      0.053     0.008
         10      0.049     0.009
         11      0.048     0.002
         12      0.049     0.005
         13      0.053     0.038
         14      0.059     0.066
         15      0.068     0.034
         16      0.080     0.006
         17      0.094     0.054
         18      0.111     0.266
         19      0.131     0.263
         20      0.153     0.138
         21      0.177     0.005

We note the clear correlation of errors. [The critical Durbin-Watson statistics for ρ > 0, at 5% alpha, corresponding to k = 1 and n = 21, are 1.22 and 1.42. The sample DW* = 0.94 shows positive error correlation.] To remedy this and improve the model, we will include the prior year's donation as a predictor for DONATION and run the two-variable regression.
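The diagnostics quoted above can be re-derived from the printed residuals. The sketch below is not the KPK macro itself; it applies the usual textbook formulas for the Durbin-Watson statistic, the leverage in a one-predictor regression, and Cook's distance, with INDEXYR assumed to run from 1 to 21. The lower critical value 1.22 is the one quoted in the worksheet. Because the residuals are rounded to three decimals, the results match the printout only approximately.

```python
# Residuals from the first regression (DONATION on INDEXYR), as printed above.
residuals = [
    0.375, 0.456, 0.326, 0.257, -0.032, -0.211, -0.051,
    -0.620, -0.239, -0.258, -0.118, 0.193, -0.506, -0.625,
    -0.415, -0.164, -0.433, 0.868, 0.778, 0.509, -0.090,
]
index_year = list(range(1, 22))  # INDEXYR = 1..21
n = len(residuals)
p = 2  # estimated coefficients: intercept and slope

sse = sum(e * e for e in residuals)

# Durbin-Watson: squared successive differences over the sum of squared residuals.
dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / sse

# Leverage for one predictor: h_i = 1/n + (x_i - mean)^2 / Sxx.
x_mean = sum(index_year) / n
sxx = sum((x - x_mean) ** 2 for x in index_year)
leverages = [1 / n + (x - x_mean) ** 2 / sxx for x in index_year]

# Cook's distance: D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2.
mse = sse / (n - p)
cooks_d = [
    e * e / (p * mse) * h / (1 - h) ** 2
    for e, h in zip(residuals, leverages)
]

# Lower 5% critical value dL for k = 1, n = 21, as quoted in the worksheet.
D_LOWER = 1.22

print(f"DW = {dw:.3f}, positive autocorrelation: {dw < D_LOWER}")
print(f"largest leverage = {max(leverages):.3f}")
print(f"largest Cook's D = {max(cooks_d):.3f}")
```

A DW value below the lower critical value points to positive first-order autocorrelation, which is what motivates adding the lagged donation as a second predictor.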

 

To create a lagged Y variable, copy the contents of column B (rows 2 through 21) to the new column D (rows 3 through 22) and label the variable DONLSTYR or LAG-1-DON. (You may use any label, but note that the first entry, cell D2, will be missing.) We should also have the donation for 2002 in the row for 2003; it can be used to predict the donation for 2003. Now we can delete row 2 (for any m-lagged model, we lose m observations but are able to forecast m periods forward), and regress DONATION on INDEXYR and DONLSTYR. We run the regression again using the KPK macros, choosing the options Leverages, VIFs, Cook's Distance, and Durbin-Watson. The new printouts follow.
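The index and lagged-column construction described in this section can be sketched in a few lines. The values below are made up for illustration and are not the DONATIONS6 data; the variable names are hypothetical, and the base year 1981 follows the worksheet's coding.

```python
# Hypothetical yearly series (illustrative values only, not the DONATIONS6 data).
years = [1982, 1983, 1984, 1985, 1986]
donation = [10.0, 12.0, 15.0, 14.0, 18.0]

# Index variable: code the first year as 1, using 1981 as the base.
BASE_YEAR = 1981
index_year = [year - BASE_YEAR for year in years]

# Lagged Y: shift the series down by LAG rows; the first LAG entries are missing.
LAG = 1
lagged = [None] * LAG + donation[:-LAG]

# Drop the rows with a missing lag before regressing (we lose LAG observations).
rows = [
    (idx, prev, cur)
    for idx, prev, cur in zip(index_year, lagged, donation)
    if prev is not None
]

# The last observed value becomes the lag input for the next (forecast) period.
forecast_inputs = (index_year[-1] + 1, donation[-1])

for idx, prev, cur in rows:
    print(idx, prev, cur)
print("forecast inputs:", forecast_inputs)
```

Each remaining row pairs a year's value with the previous year's value, which mirrors copying column B one row down into column D and then deleting the first data row.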

 

The default output:

 

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.975
R Square            0.951
Adjusted R Square   0.945
Standard Error      0.394
Observations        20

ANOVA
              df       SS       MS        F   Significance F
Regression     2   51.332   25.666  165.491            0.000
Residual      17    2.637    0.155
Total         19   53.968

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%     VIF
Intercept          0.157           0.197   0.799    0.435     -0.258      0.573
INDEXYR            0.137           0.057   2.411    0.027      0.017      0.257  13.895
DONLSTYR           0.509           0.203   2.515    0.022      0.082      0.937  13.895

And the additional diagnostics are:

 

Durbin-Watson statistic = 1.935981

 

RESIDUAL OUTPUT

Observation  Leverages  Cook's D
          1      0.225     0.090
          2      0.215     0.019
          3      0.164     0.011
          4      0.132     0.004
          5      0.096     0.005
          6      0.092     0.002
          7      0.069     0.053
          8      0.161     0.006
          9      0.069     0.002
         10      0.069     0.000
         11      0.054     0.010
         12      0.063     0.054
         13      0.129     0.047
         14      0.175     0.005
         15      0.128     0.001
         16      0.104     0.035
         17      0.166     0.595
         18      0.327     0.157
         19      0.312     0.011
         20      0.250     0.136

And the predicted donation for 2003 (corresponding to INDEXYR = 22), using this model to forecast, is:

X1 (INDEXYR)             22
X Variable 2 (DONLSTYR)  5.77
Predicted Value          6.117
Std Error Prediction     0.221
Lower 95% Mean           5.652
Upper 95% Mean           6.582
Lower 95% Predict        5.019
Upper 95% Predict        7.215

Clearly, R-squared is better (0.936 to 0.951); both variables are significant at 5% alpha; and the errors are no longer correlated (Durbin-Watson = 1.94), with no serious outliers (as indicated by the leverages and Cook's D values). The VIFs indicate that DONATIONs are related to INDEXYRs. (We already knew that; it is the idea behind having proposed the model in the first place. Thus we overlook the VIFs over 10 in this special situation.)
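The claims in the paragraph above can be checked against the printed figures. The sketch below recomputes the 2003 point forecast from the rounded coefficients, converts the VIF into the implied R-squared between the two predictors, and compares each t statistic with the 5% critical value. The critical t for 17 df (2.110) is taken from a standard t-table and is an assumption not printed in the worksheet. Treating the printed 0.221 as the standard error of the mean response is likewise an inference from the numbers rather than something the worksheet states.

```python
# Coefficients from the second SUMMARY OUTPUT (rounded to three decimals).
intercept, b_indexyr, b_donlstyr = 0.157, 0.137, 0.509

# Forecast inputs for 2003: INDEXYR = 22 and the 2002 donation of 5.77.
predicted_2003 = intercept + b_indexyr * 22 + b_donlstyr * 5.77

# With two predictors, VIF = 1 / (1 - r^2), so the printed VIF implies r^2.
vif = 13.895
implied_r_squared = 1 - 1 / vif

# Assumption: two-tailed 5% critical t for 17 residual df, from a standard t-table.
T_CRIT_17_DF = 2.110
t_stats = {"INDEXYR": 2.411, "DONLSTYR": 2.515}
significant = {name: abs(t) > T_CRIT_17_DF for name, t in t_stats.items()}

# 95% interval for the mean response, reading 0.221 as its standard error (inferred).
printed_prediction, std_error = 6.117, 0.221
mean_low = printed_prediction - T_CRIT_17_DF * std_error
mean_high = printed_prediction + T_CRIT_17_DF * std_error

print(f"forecast from rounded coefficients = {predicted_2003:.3f}")
print(f"implied r^2 between INDEXYR and DONLSTYR = {implied_r_squared:.3f}")
print(f"significant at 5%: {significant}")
print(f"mean interval = ({mean_low:.3f}, {mean_high:.3f})")
```

The forecast from rounded coefficients differs slightly from the printed 6.117 because the macro works with unrounded estimates. The implied R-squared of about 0.93 between the predictors is what the VIF of 13.895 is reporting.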

 

A sample of the questions you may answer (using the above output) may be:

1. What equation presents the relation between DONATION and INDEXYR, i.e., DONATION = ...?
A. 0.206 + 0.269 INDEXYR
B. 0.232 + 0.127 INDEXYR
C. 0.188 + 0.016 INDEXYR
D. 2.526 + 16.733 INDEXYR
E. None of these

2. Is this relation statistically significant?
A. Yes
B. No

3. What is the coefficient of determination for the model?
A. 0.948
B. 0.509
C. 0.936
D. 0.974
E. 0.447

4. The standardized residuals show there are --- two-sigma outliers in the data set.
A. 1
B. 2
C. 3
D. 4
E. None

5. The normal probability plot shows that errors are approximately normally distributed.
A. True
B. False

6. The test for serial autocorrelation of errors is done using the Durbin-Watson statistic. What is the critical value (i.e., table value; refer to Text: page B-11, Table B-7) below which you will conclude that "a positive correlation exists"?
A. 1.2
B. 1.41
C. 1.22
D. 1.42
E. None of these

7. If the first model is significant, then the second model (with the added LAG variable) will always be statistically significant.
A. True
B. False

8. The R-square for the second model is:
A. 0.951
B. 0.509
C. 0.936
D. 0.974
E. 0.447

9. The computed statistic for the test of the significance of the overall model (including the LAG variable) is:
A. 17
B. 166
C. 280
D. 3
E. 2

10. The error diagnostics (from the second regression) show --- outliers.
A. 1
B. 2
C. 3
D. 4
E. None

 

Solutions key:

1: A.
2: A.
3: C.
4: E. Note that the other metrics in the output (leverage, Cook's D, etc.) confirm this result (Text: pages 242-245).
5: A. Please note that the errors in the line-fit plot follow an approximately straight line on visual inspection (contrast this with the plot in Figure 6.41, Text: page 238).
6: C. The sample error correlation is 0.54, with the resulting Durbin-Watson statistic being 0.94. As 0.94 < 1.22, we conclude that there is significant correlation, and hence propose the LAG model (Text: page 259).
7: A. Note that the original (significant) variable is still in the regression.
8: A. Note that this is higher than the 0.936 for the base model. Also, note that the MSE has been reduced to 0.155 from 0.199 (a significant reduction, given that the data is in millions of dollars).
9: B.
10: E. But please look at the VIFs; the high VIF is to be expected because we know that Y is heavily correlated with X, and so should the lagged Y be.

(Please use this link to hear a brief presentation of these solutions.)

Practice work:

 

Please perform the two regressions for POPULATION (Problem 6.14, Text page 265). The first regression sought is on year; do not forget to create an index for the year, INDEXYR. The second regression sought is on year and the lagged value of POPULATION. Note that this takes into account the possible changes occurring from year to year due to immigration, etc. Please use the KPK macros to get both model outputs (complete with VIFs, Durbin-Watson, standardized residuals, etc.). Please answer ONLY questions (a) through (d) and (h) asked in the Text, predicting only for year 2000. Please have a copy of the results with your answers as you proceed to take PQ009.

The Lead:

In real life, not all variables that we use are continuous and/or quantitative. Qualitative and discrete variables are modeled as "dummy" variables, and they often enhance or reduce the effects of other variables through "interactions". Indicator variables are created and used like index variables.

Competency assessment:

Please use the data provided in the file named DERBY5 (problem 5.7, Text page 201) and create two models, the first using INDEXYR only, and the second using the lagged Y (BETS) and INDEXYR. Use the better model to "forecast" the BETS for 1993. Please use the 5% significance level wherever needed for the output. The module's competency assessment quiz will use this problem.

Please take the module assessment quiz (TQ009):

The quiz covers the module material, and should be taken in the date/time window posted in the syllabus. Please be warned that this quiz may NOT be identical to PQ009 and the earlier ones, but will be at the same level of understanding. This file may be printed using this link.

 


Solution details:

This question was answered on: Sep 18, 2020
