The correct answer is Yes; I just need to understand which numbers are used to reach this conclusion.
Is this relation statistically significant?
Yes or No
Indicator/ Dummy variable Regressions, and stepwise procedures
Chapter 7 (Readings: Sections 1 to 3): Pages 273 to 293; Chapter 8 (Readings: Sections 1 to 4): Pages 311 to 328. As before, the Text is the primary reading; previews and other material provided in this worksheet are of a supplementary nature.
Concepts:
The concepts listed below are previewed using these power points.
Coding of seasonal data with/without interactions
Introduction to variable screening procedures
Tools:
Excel with Data Analysis, KPK Macros, usual Tables and Stepwise macros.
Illustration and assessment work:
This assignment is about creating and using Indicator/Dummy variables as predictors in MLRs (Text: pages 273 to 284). The number of possible formulations for these predictor variables multiplies manifold once we propose and include interactions among them. We will look at variable screening procedures to limit their number, only as a first step toward model-building.
As before, we will use three problems/datasets from the Text for this Worksheet. First: problem 7.1 (page 285), utilizing the dataset HARRIS7, to run a multiple regression and answer the questions asked in the Text. Please note that some output generated by SAS is given in the Text. We use Excel to replicate it, and generate additional error diagnostics on the side. Then we run a second model, looking for possible improvement by adding gender-interaction variables. The mock-quiz in this Worksheet is based on these two models. Please use a similar approach for the second problem, 7.7 (page 301), to analyze the work orders data at Texas Christian University (WORKORD7). "BUILDING" type will need to be coded using dummy variables. Additional explanatory variables are NOT required to answer the questions raised in the Text. The preparatory quiz (PQ010) will be based on this second problem. Finally, the third problem is 7.10 (page 302), using the dataset BIGTEX7 to explore possible gender discrimination in wages. Only to ensure uniformity, an exact procedure to follow is laid out later in this Worksheet. The assessment quiz (TQ010) will be based on this problem.
Problem 7.1 uses only one dummy/logical variable. The data relate to salaries of Harris Corp. employees (SALARY) by their education (EDUCAT), prior experience (EXPER), service (MONTHS), and gender (coded as MALES = 1 if male; = 0 if female). [For this coding, "female" is the base or reference; the coefficient of "MALES" thus estimates the difference between male and female salaries, after accounting for the effects of the other three variables (EDUCAT, EXPER, and MONTHS).] First, run the MLR for SALARY using all the given independent variables, at the 5% significance level. The output (given on next page) can be used to answer all the questions (parts a through e). A normal plot of the errors serves to check normality.
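To see why the MALES coefficient reads as a male-versus-female gap, here is a minimal Python sketch. It is an illustration under stated assumptions, not the worksheet's Excel procedure: the data are made up (they are NOT the HARRIS7 values), only EDUCAT and MALES are used for brevity, and the helper names `solve` and `ols` are hypothetical. The fit uses the normal equations directly.

```python
# Minimal OLS via the normal equations (X'X) b = X'y.
# All data below are made up for illustration; they are NOT the HARRIS7 values.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return [intercept, b1, b2, ...] for predictors given as a list of rows."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical (EDUCAT, MALES) pairs; MALES = 1 if male, 0 if female (the base).
rows = [(12, 0), (12, 1), (14, 0), (16, 1), (16, 0), (18, 1)]
# Salaries are built exactly as 3000 + 100*EDUCAT + 500*MALES (no noise),
# so the fitted MALES coefficient should recover the built-in gap of 500.
salary = [3000 + 100 * e + 500 * m for e, m in rows]

b0, b_educat, b_males = ols(rows, salary)
print(round(b0, 6), round(b_educat, 6), round(b_males, 6))
```

Because the dummy is 0 for the base group, the intercept and the EDUCAT slope describe the female line, and the MALES coefficient shifts that line up or down for males at any given EDUCAT.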
Observe the low R-square value. To explore any possible improvement, we will run a second regression including additional variables representing interactions between gender and each of the other three variables. (Note that this model assumes persistent gender discrimination, even after hiring.) While running this second MLR using the KPK macros, please check the VIFs for any multicollinearity. The output for the first part follows:
[Partial regression output; only these fragments survive: Adjusted R Square, 257476.58, t Stat, 6.13]
The error diagnostics part of the output (partially reproduced) should look as follows:
[RESIDUAL OUTPUT and PROBABILITY OUTPUT tables not reproduced.]
The probability plot verifies (visually) the normality of errors (straight-line trend).
[Figure: SALARY Normal Probability Plot; horizontal axis: Sample Percentile]
As was stated earlier, we create the interaction variables as products of MALES with each of the other three variables, viz., EDUCAT, EXPER, and MONTHS. The formulae in the worksheet shown below present a simple way to achieve this. (These relations may be defined in the first row (#2) and copied easily through the spreadsheet.)
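The same product construction can be sketched outside the spreadsheet. The Python snippet below is only an illustration: the rows are made up (not HARRIS7), and the helper `pearson` is a hypothetical name. The column names EDXMA, EXXMA, and MOXMA follow the names used in the quiz. It also computes the correlation between MALES and EDXMA, which hints at why VIFs tend to be large once a dummy and its products sit in the same model.

```python
# Hypothetical rows: (EDUCAT, EXPER, MONTHS, MALES); not the HARRIS7 data.
data = [
    (12, 3.0, 10, 1),
    (16, 0.5, 24, 0),
    (12, 6.0, 15, 1),
    (14, 2.0, 30, 0),
    (16, 4.0, 8, 1),
]

# Each interaction column is the product of a predictor with the MALES dummy,
# mirroring a spreadsheet product formula copied down the rows.
edxma = [educ * males for educ, _, _, males in data]
exxma = [exper * males for _, exper, _, males in data]
moxma = [months * males for _, _, months, males in data]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

males = [row[3] for row in data]
print("EDXMA:", edxma)
print("EXXMA:", exxma)
print("MOXMA:", moxma)
print("corr(MALES, EDXMA):", round(pearson(males, edxma), 3))
```

Each product column is zero for every female row and equals the original predictor for every male row, so it moves closely with MALES whenever the predictor is positive.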
=D4*$E4
The output from this second MLR (shown only in part) follows:
[Partial regression output; only these fragments survive: Adjusted R Square, 13.41992, t Stat, 3.935882]
A sample of the questions you may be expected to answer (using these and similar outputs) follows.
The first five (5) questions in this quiz relate to the model:
SALARY = β0 + β1*EDUCAT + β2*EXPER + β3*MONTHS + β4*MALES
1. What is the minimum mean SALARY one may expect in Harris Corp.?
2. Based on the error diagnostics generated, the assumptions related to the errors appear:
3. What percentage of variation in salaries is explained by the regression?
4. What is the difference between starting male and female salaries?
E. 218
5. The standard error suggests that, using this model, the SALARY could be predicted within (±) ..... dollars (rounded to the nearest 100) at 95% confidence, for this data set. (Please fill in the blank.)
Using both the model outputs, the Interaction variables are to be collectively tested for their significance.
6. What is the improvement in R² observed in using the Interaction model?
7. Collectively, the Interaction variables appear to be statistically ..... in the model.
8. If we look at just the VIFs, which variable is causing the highest (the most) multicollinearity?
D. MALES E. None of these
9. Which two variables do you suspect will exhibit the highest pair-wise correlation?
A. MALES and EDUCAT B. EDUCAT and EDXMA
C. MALES and EDXMA
D. EXXMA and MOXMA E. INTERCEPT and MOXMA.
10. Based on the statistically better model, what salary will you predict for a male with 12 years
of EDUCATion, 6 years of EXPERience, and time from hiring = 15 MONTHS?
A. 3620 B. 3530 C. 4360 D. 5690 E. 10000
Solutions key: 1:B, 2:A, 3:C, 4:D, 5:C, 6:B, 7:B (as none is individually significant, a collective test is superfluous), 8:D (MALES exhibits the highest VIF), 9:C (as these are the only two variables that have high VIFs, they can only be correlated with each other; this implies that males are consistently lower or higher educated in the dataset), 10:D (by substitution into the first equation, the better statistical choice). Please use this link to hear an explanation of these solutions.
Practice work:
For this part, please use WORKORD7. The variable BUILDING type is to be modeled using dummy variables (as done with gender in the earlier example). Create three variables (say, RESID, ATHLT, and ACADM; note that with this coding we hold Administrative buildings as the base). The three variables could be created in columns D, E, and F (to stay adjacent to the other independent variables), with their values defined using the Excel function "IF". The function may be defined in the second row and copied through the spreadsheet. When coded in this manner, the data should look as follows:
[Table of coded data not reproduced.]
As before, a possible view of the formulae version of the spreadsheet (partial) is given below:
[Formula view of the spreadsheet not reproduced; the surviving column header is DAYS.]
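A rough Python analogue of this IF-based coding is sketched below. It is an assumption-laden illustration: the building labels are hypothetical (the actual codes stored in WORKORD7 may differ), and the helper `indicator` is a made-up name standing in for an Excel formula of the form `=IF(test, 1, 0)`.

```python
# Hypothetical building labels; the actual codes used in WORKORD7 may differ.
buildings = ["Residential", "Athletic", "Academic", "Administrative", "Academic"]

def indicator(value, level):
    """Python analogue of an Excel IF that returns 1 on a match and 0 otherwise."""
    return 1 if value == level else 0

resid = [indicator(b, "Residential") for b in buildings]
athlt = [indicator(b, "Athletic") for b in buildings]
acadm = [indicator(b, "Academic") for b in buildings]

# Administrative is the base level: all three indicators are 0 for it.
for b, r, a, c in zip(buildings, resid, athlt, acadm):
    print(f"{b:15s} RESID={r} ATHLT={a} ACADM={c}")
```

Four building types need only three indicators. A fourth column for the base level would be an exact linear combination of the intercept and the other three, which the regression cannot separate.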
Please perform the MLR for DAYS on all the other variables (exclude BUILDING, as it has been coded using indicators). Use the KPK macros with the VIF option. Part of the output is shown below for your guidance.
[Partial regression output; only these fragments survive: 65.337, t Stat, 0.889]
This example will be used for the preparatory quiz, PQ010.
Now, the KPK macros do not perform the STEPWISE procedures illustrated in Chapter 8. So, we will use the two macros Statpro and Regr to perform the stepwise selection of variables. (Alternatively, these and more details on them may be obtained from http://smgpublish.bu.edu/pekoz/statpro.htm.) Please save the routines and open them in Excel; open the dataset MEDICORP8, follow Add-ins > Regressions > Stepwise, choosing default settings where applicable, and get the following output.
Results of stepwise regression for SALES
Step 1 - Entering variable: ADV
Step 2 - Entering variable: BONUS
[The summary measures (including StErr of Est and % Change) and the regression coefficients with their Std Err for each step are not fully reproduced. Surviving values, in order of appearance: 2.7721, 2.1941, 3.3501, 0.9246, -10.6688, 1.8562, 0.3719, 3.3405.]
Note that these results match those in Figures 8.7 (Text page 323, using MINITAB) and 8.10 (Text pages 326 and 327, using SAS). You are expected to have an appreciation for FORWARD SELECTION, BACKWARD ELIMINATION, and their combination, viz., the STEPWISE procedures, though you will not be tested/quizzed on these. Please run the FORWARD and BACKWARD procedures on this dataset, in order to have a feel for them.
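For a feel of what forward selection does at each step, here is a minimal Python sketch. It is not the Statpro or Regr macro and does not claim to match their default settings: the entry rule (a fixed F-to-enter threshold of 4.0), the helper names `solve`, `sse`, and `forward_select`, the variable OTHER, and all the data are assumptions made for illustration. The names ADV, BONUS, and SALES are borrowed from the output above, but the numbers are not from MEDICORP8.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sse(cols, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def forward_select(candidates, y, f_to_enter=4.0):
    """Add, one at a time, the variable that lowers SSE most, while its partial F >= f_to_enter."""
    chosen, current, n = [], sse([], y), len(y)
    while len(chosen) < len(candidates):
        trials = {name: sse([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        best = min(trials, key=trials.get)
        df_resid = n - (len(chosen) + 1) - 1  # n minus predictors minus intercept
        if df_resid <= 0:
            break
        f_stat = (current - trials[best]) / (trials[best] / df_resid)
        if f_stat < f_to_enter:  # assumed entry rule, not a macro default
            break
        chosen.append(best)
        current = trials[best]
    return chosen

# Made-up data: SALES depends on ADV and BONUS plus small fixed disturbances.
adv = [1, 2, 3, 4, 5, 6, 7, 8]
bonus = [1, 0, 1, 0, 1, 0, 1, 0]
other = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical unrelated candidate
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, -0.2, 0.2]
sales = [10 + 3 * a + 2 * b + e for a, b, e in zip(adv, bonus, noise)]

selected = forward_select({"ADV": adv, "BONUS": bonus, "OTHER": other}, sales)
print(selected)
```

Backward elimination runs the same comparison in reverse: start from the full model and drop the variable whose removal raises SSE least, stopping when every remaining variable passes the retention rule.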
Please have a copy of the results for the MLR of DAYS using WORKORD7 data, with your answers to problem 7.7, as you proceed to take PQ010.
The Lead:
The analysis of variance that we have seen here is a more versatile procedure. In the next Module, we will extend its application to more general settings, introducing the General Linear Modeling procedure.
Competency assessment:
Please use the data provided in the file named BIGTEX7 (problem 7.10, Text page 302). Run a regression of SALARY on GENDER. Note that Gender appears to make a significant difference to Salary. (Partial outputs are given here only for your guidance and uniformity of the analysis. Please obtain the full results on your own.)
[Partial output; only these fragments survive: 18.15942, t Stat, 0.000102]
But this is before accounting for the other (likely more significant) variables such as Education and Service life. As Education is correlated with Position (R = 0.8, even with ordinal data; verify using "Correlation" from Data Analysis in Excel, and note also that a graduate degree holder does not perform manual labor, etc.), we need to use only one of EDUCAT and POSITION. So, for the second step, we code Positions 1 to 4 as 4 dummy variables (Position 5 as base) and, along with YEARS and GENDER, run the second regression. This time we note that only the Position variables are significant.
[Partial output; only these fragments survive: 18.58968, t Stat, 0.007081]
So, for the third model we use only the 4 significant Position variables.
[Partial output; only these fragments survive: 27.63752, t Stat, 1.35E-05]
As we look for the significance of Gender, we run a fourth model with the Position variables and GENDER.
[Partial output; only these fragments survive: 22.23095, t Stat, 1.2E-05]
With these four MLRs, we should be able to answer the "Gender Discrimination" question posed in the problem. The Module's competency assessment quiz will use this problem and the full regression outputs.
Please note that this Module utilizes/derives from all previous work, especially the nested models and the ability to test for the significance of a few selected predictor variables, be such variables quantitative or qualitative.
Please take the module assessment quiz (TQ010):
The quiz covers the module material, and should be taken in the date/time window posted in the syllabus. Please be warned that this quiz may NOT be identical to PQ010 and the earlier ones, but will be at the same level of understanding. Please take special note that the "Incremental models" approach is used in testing for GENDER and YEARS in the presence of the Position variables, together and individually.
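The incremental (nested) models comparison is the standard partial F test: F = [(SSE_reduced - SSE_full) / (k_full - k_reduced)] / [SSE_full / (n - k_full - 1)], where k counts the predictors in each model. A small Python sketch of that arithmetic follows; the function name `partial_f` and every number passed to it are made up for illustration and are not results from BIGTEX7.

```python
def partial_f(sse_reduced, sse_full, n, k_full, k_reduced):
    """F statistic for H0: the extra (k_full - k_reduced) coefficients are all zero.

    sse_reduced, sse_full: residual sums of squares of the two nested models
    n: number of observations
    k_full, k_reduced: number of predictors (excluding the intercept) in each model
    """
    numerator = (sse_reduced - sse_full) / (k_full - k_reduced)
    denominator = sse_full / (n - k_full - 1)
    return numerator / denominator

# Made-up values: a reduced model with 4 predictors (e.g. four position dummies)
# against a full model that adds 2 more (e.g. GENDER and YEARS).
f_stat = partial_f(sse_reduced=1200.0, sse_full=1000.0, n=56, k_full=6, k_reduced=4)
print(round(f_stat, 3))
```

The statistic is compared with the F critical value on (k_full - k_reduced) and (n - k_full - 1) degrees of freedom; when only one variable is added, it equals the square of that variable's t statistic in the full model.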