Dummy Variables

Some potential explanatory variables are categorical and cannot be measured on a quantitative scale.

However, we often need to use these variables because they are related to the response variable. The trick is to create dummy variables, also called indicator or 0-1 variables. These are variables that indicate the category a given observation is in.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Dummy Variables -- continued

To create dummy variables we can use an IF statement or we can use StatPro’s Dummy variable procedure.

The Dummy variable procedure is usually easier, particularly when there are multiple categories. Once the dummy variables are created, we can combine categories if we like by simply adding the corresponding columns to get the dummy for the new category.
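As a minimal sketch of the IF-statement approach, the Python below builds 0-1 dummies from a categorical column and then combines two of them by adding the columns. The sample values and the helper name make_dummy are illustrative assumptions and are not taken from the bank data set.

```python
# Hypothetical education levels for six employees (illustrative values only).
educ_lev = [1, 3, 2, 5, 3, 1]

def make_dummy(values, category):
    """Return a 0-1 column: 1 where the observation is in `category`, else 0."""
    return [1 if v == category else 0 for v in values]

# One dummy per category, named in the Ed_1, Ed_2, ... style used in the text.
dummies = {f"Ed_{k}": make_dummy(educ_lev, k) for k in sorted(set(educ_lev))}

# Combining categories: adding two dummy columns gives the dummy for
# "category 1 or category 2", because each observation is in exactly one category.
ed_1_or_2 = [a + b for a, b in zip(dummies["Ed_1"], dummies["Ed_2"])]

print(dummies["Ed_3"])
print(ed_1_or_2)
```

Because the categories are mutually exclusive, the summed column can only contain 0s and 1s, so it is itself a valid dummy.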

Regression Analysis

In this example we create dummy variables for Gender and EducLev. Then we can run a regression analysis with Salary as the response variable, using any combination of numerical and dummy explanatory variables. We must follow two rules:

– We shouldn’t use any of the original categorical variables that the dummies are based on.
– We should use one less dummy than the number of categories for any categorical variable.

Regression Analysis -- continued

This second rule is a technical one. If we violate it, the software will give us an error message. For example, of the dummies Ed_1 through Ed_6, any five can be used. The omitted dummy then corresponds to the reference category. As we will see, the dummy variable coefficients are all interpreted relative to this reference category. To get used to dummy variables in regression analysis, we will proceed in several stages.
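The reason behind the second rule can be seen in a small sketch with made-up data: if every dummy for a categorical variable is included, the dummy columns sum to 1 in every row, which duplicates the constant (intercept) column, so the coefficients cannot be determined uniquely. The job grades below are hypothetical.

```python
# Hypothetical job grades for five employees (illustrative values only).
job_grade = [1, 2, 2, 3, 1]
categories = sorted(set(job_grade))

# Full set of dummies, one column per category.
full_set = [[1 if g == c else 0 for c in categories] for g in job_grade]

# Each row sums to 1, exactly matching the constant column of the regression.
row_sums = [sum(row) for row in full_set]

# Dropping the first category (the reference category) removes the redundancy.
reduced_set = [row[1:] for row in full_set]
reduced_sums = [sum(row) for row in reduced_set]

print(row_sums)      # identical in every row: redundant with the intercept
print(reduced_sums)  # no longer constant across rows
```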

Regression Analysis -- continued

We first estimate a regression equation with only one explanatory variable, Female. The output is shown in this table. The resulting equation is

Predicted Salary = 45.505 - 8.296Female

Regression Analysis -- continued

To interpret this equation, recall that Female has only two possible values, 0 and 1. If we substitute 1, the predicted salary equals 37.209, and if we substitute 0, the predicted salary is 45.505. These are the average salaries of females and males, respectively. Therefore the interpretation of the -8.296 coefficient of the Female dummy variable is straightforward: it is the difference between the average female salary and the average male salary.
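The link between a single-dummy regression and the group averages can be checked numerically. The sketch below uses made-up salaries (not the bank data) and the standard least-squares formulas for one explanatory variable; the intercept comes out equal to the male mean and the intercept plus the slope equals the female mean.

```python
from statistics import mean

# Hypothetical salaries in $1000s (illustrative values, not the bank data).
salary = [40.0, 50.0, 45.0, 36.0, 38.0, 37.0]
female = [0, 0, 0, 1, 1, 1]

# Least-squares slope and intercept for a single explanatory variable.
x_bar, y_bar = mean(female), mean(salary)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, salary))
sxx = sum((x - x_bar) ** 2 for x in female)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group means for comparison.
male_mean = mean(y for x, y in zip(female, salary) if x == 0)
female_mean = mean(y for x, y in zip(female, salary) if x == 1)

print(intercept, male_mean)            # intercept matches the male mean
print(intercept + slope, female_mean)  # intercept + slope matches the female mean
```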

Regression Analysis -- continued

The above equation tells only part of the story because it ignores all information except gender.

We expand this equation by adding the experience variables. The output is shown in this table.

Regression Analysis -- continued

The corresponding equation is

Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior - 8.080Female

It is useful to write two separate equations, one for females and one for males

Female: Predicted Salary = 27.412 + 0.988YrsExper + 0.131YrsPrior
Male: Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior

We interpret the coefficient -8.080 of the Female dummy variable as the average salary disadvantage for females relative to males after controlling for job experience. But there is still more story to tell.

Regression Analysis -- continued

We next add job grade to the equation by including five of the six job grade dummies. Although any five can be used, we use Job_2 through Job_6. The resulting output is shown in this table.

Regression Analysis -- continued

The estimated regression equation is now

Predicted Salary = 30.230 + 0.408YrsExper + 0.149YrsPrior - 1.962Female + 2.57Job_2 + 6.295Job_3 + 10.475Job_4 + 16.011Job_5 + 27.647Job_6

There are now two categorical variables involved, gender and job grade. However, we can still write a separate equation for any combination of categories by setting the dummies to the appropriate values.

Regression Analysis -- continued

For example, the equation for females at the fifth job grade is found by setting Female=1 and Job_5=1 and setting the other job dummies equal to 0. The equation formed is

Predicted Salary = 44.279 + 0.408YrsExper + 0.149YrsPrior
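The substitution of dummy values can be sketched in a few lines of Python. The coefficients are the ones quoted in the estimated equation; the function names category_intercept and predicted_salary are hypothetical helpers introduced only for this illustration.

```python
# Coefficients from the estimated equation above (salary in $1000s).
INTERCEPT = 30.230
SLOPES = {"YrsExper": 0.408, "YrsPrior": 0.149}
DUMMIES = {
    "Female": -1.962,
    "Job_2": 2.57, "Job_3": 6.295, "Job_4": 10.475,
    "Job_5": 16.011, "Job_6": 27.647,
}

def category_intercept(active):
    """Intercept of the equation for the category combination whose
    dummies in `active` are set to 1 (all other dummies are 0)."""
    return INTERCEPT + sum(DUMMIES[name] for name in active)

def predicted_salary(yrs_exper, yrs_prior, active):
    """Predicted salary for given experience and active dummies."""
    return (category_intercept(active)
            + SLOPES["YrsExper"] * yrs_exper
            + SLOPES["YrsPrior"] * yrs_prior)

female_grade5 = category_intercept(["Female", "Job_5"])
male_grade1 = category_intercept([])  # reference category for both variables
print(round(female_grade5, 3), round(male_grade1, 3))
```

Only the intercept changes from one category combination to another; the slopes on YrsExper and YrsPrior are shared by every combination in this equation.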

We interpret this equation as follows:

– For either gender and any job grade, the expected increase in salary for one extra year of experience with Fifth National is $408; the expected salary increase for one extra year of experience with another bank is $149.

Regression Analysis -- continued

– The coefficients of the job dummies indicate the average increase in salary an employee can expect relative to the reference (lowest) job grade.
– The key coefficient, the negative $1962 for females, indicates the average salary disadvantage for females relative to males, given that they have the same experience levels and are in the same job grade.

Although the “penalty” is still substantial, it is less than a fourth of the penalty we saw before.

It appears that females might be getting paid less on average partly because they are in the lower job categories.

Regression Analysis -- continued

We can check whether females are disproportionately in the lower job categories by using a pivot table with JobGrade in the row area, Gender in the column area and the count (expressed as a percentage) of any variable in the data area.
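Outside Excel, the same cross-tabulation can be sketched with a counter. The (JobGrade, Gender) pairs below are hypothetical, and expressing each count as a percentage of its gender column is an assumption about how the percentages are taken.

```python
from collections import Counter

# Hypothetical (JobGrade, Gender) pairs; illustrative values only.
employees = [
    (1, "Female"), (1, "Female"), (1, "Male"),
    (2, "Female"), (2, "Male"),
    (3, "Male"), (3, "Male"), (3, "Female"),
]

counts = Counter(employees)                        # count per (grade, gender) cell
gender_totals = Counter(g for _, g in employees)   # column totals

# Percentage of each gender found in each job grade (column percentages).
pct = {
    (grade, gender): 100 * n / gender_totals[gender]
    for (grade, gender), n in counts.items()
}

for grade in sorted({g for g, _ in employees}):
    row = [f"{pct.get((grade, s), 0.0):5.1f}%" for s in ("Female", "Male")]
    print(grade, *row)
```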

Regression Analysis -- continued

Clearly, females tend to be concentrated at the lower job grades.

This certainly helps to explain why females get lower salaries on average, but it doesn’t explain why females are at the lower job grades in the first place. We won’t be able to provide a thorough analysis of this issue but we can add one more piece to the puzzle now by adding education level, age, and PCJob to the equation.

Regression Analysis -- continued

We don’t provide the whole equation but the resulting output is shown here.

Regression Analysis -- continued

The coefficients can be seen in the output.

These additional variables don’t appear to add much to the previous equation. The “penalty” does, however, go up to $2555, which is slightly greater than the $1962. At face value we can interpret the coefficients of the education dummies as a benefit (or loss if negative) of extra education relative to a high school diploma, the reference category.

Regression Analysis -- continued

The coefficient of PCJob implies that an employee with a computer-related job can expect an extra $4923 in salary relative to an employee without a computer-related job, provided the other variables are the same for each employee.

The age coefficient is quite small and has little effect on salary.

Conclusion

The main conclusion we can draw from the output is that there is still a plausible case to be made for discrimination against females, even after including information on all the variables in the database in the regression equation.

Modeling Possibilities

BANK.XLS

The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is listed in this file. Here is a partial list of the data.

Question

Earlier we estimated an equation for Salary using the numerical explanatory variables YrsExper and YrsPrior and the dummy variable Female.

If we drop the YrsPrior variable from the equation (for simplicity) and rerun the regression, we obtain the equation

Predicted Salary = 35.824 + 0.981YrsExper - 8.012Female

The R² value for this equation is 49.1%. If we decide to include an interaction variable between YrsExper and Female in this equation, what is the effect?

Interaction Terms

An interaction variable algebraically is the product of two variables. Its effect is to allow the effect of one of the variables on Y to depend on the value of the other variable. The interaction term allows the slope of the regression line to differ between the two categories.

Solution

We first need to form an interaction variable that is the product of YrsExper and Female. This can be done two ways in Excel.

– We can do it manually by introducing a new variable that contains the product of the two variables involved, or
– we can use the StatPro/Data Utilities/Create Interaction Variable menu item.

With the latter approach, we select Female and YrsExper as the variables, and we do not check either of the boxes in the dialog box, because neither should be treated as a categorical variable.

Solution -- continued

Once the interaction variable has been created, we include it in the regression equation in addition to the other variables. The multiple regression output is shown here.

Solution -- continued

The estimated regression equation is

Predicted Salary = 30.430 + 1.528YrsExper + 4.098Female - 1.248YrsExper_Female

As we discussed before, it is useful to write this equation as two separate equations, one for females and one for males. The female equation (Female = 1) is

Predicted Salary = 34.528 + 0.280YrsExper

and the male equation (Female = 0) is

Predicted Salary = 30.430 + 1.528YrsExper
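A short sketch makes the nonparallel lines concrete. It starts from the female and male equations quoted above and finds the experience level at which the two lines cross; the helper name predict and the variable names are illustrative assumptions.

```python
# Separate lines from the text (salary in $1000s).
female_line = (34.528, 0.280)  # (intercept, slope on YrsExper)
male_line = (30.430, 1.528)

def predict(line, yrs_exper):
    """Predicted salary on a line given as (intercept, slope)."""
    intercept, slope = line
    return intercept + slope * yrs_exper

# Differences between the two lines: intercept gap and slope gap.
intercept_gap = female_line[0] - male_line[0]
slope_gap = female_line[1] - male_line[1]

# Years of experience at which the two lines give the same prediction.
crossover = -intercept_gap / slope_gap

print(round(crossover, 2))
```

Under these two equations the female line lies above the male line only for employees with less experience than the crossover value; beyond that point the male line is higher and the gap widens with each additional year.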

Next we can show these equations graphically.

Nonparallel Female and Male Salary Lines

Solution -- continued

The Y-intercept for the female line is slightly higher (females with no experience at Fifth National Bank tend to start out slightly higher than males), but the slope of the female line is much lower. That is, males tend to move up the salary ladder much more quickly than females.

Again, this provides another argument, although a somewhat different one, for gender discrimination against females.

The R² value increased from 49.1% to 63.9%. The interaction variable has definitely added to the explanatory power of the equation.

Modeling Possibilities

BANK.XLS

The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is listed in this file. Here is a partial list of the data.

Question

A glance at the distribution of salaries of the 208 employees shows some skewness to the right: a few employees make substantially more than the majority of employees. Therefore, it might make sense to use the natural logarithm of Salary instead of Salary as the response variable.

If we do this, how do we interpret the results?

Solution

All of the analyses we did earlier with this data set could be repeated except with Log_Salary as the response variable.

For the sake of discussion we will look only at the regression equation with Female and YrsExper as explanatory variables. After we create the Log_Salary variable and run the regression, we obtain the output shown here.

Regression Output with Log_Salary as Response Variable

Solution

The estimated regression equation is

Predicted Log_Salary = 3.5829 + 0.0188YrsExper - 0.1616Female

The R² and s_e values are 42.4% and 0.1794. For comparison, with Salary as the response variable these were 49.1% and 8.070. We first note that neither of these values is directly comparable to the corresponding Salary value. The two R² values are percentages of variation explained in different response variables, Log_Salary and Salary. The fact that one is smaller does not mean a “worse” fit. They simply aren’t comparable.

Solution -- continued

The situation for s_e is even worse. Each s_e is a measure of a typical residual, but the residuals in the Log_Salary equation are in log dollars, whereas the residuals in the Salary equation are in dollars. Therefore it is no surprise that the s_e for the Log_Salary equation is much smaller than the s_e for the Salary equation. If we want comparable standard error measures for the two equations, we should take antilogs of the fitted values from the Log_Salary equation to convert them back to dollars, subtract these from the original Salary values, and take the standard deviation of these residuals.
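The back-transformation described here can be sketched as follows. The salaries and fitted log values are made-up numbers used only to show the steps; they are not the bank output.

```python
import math
from statistics import stdev

# Hypothetical salaries ($1000s) and fitted values from a log-salary
# regression; illustrative numbers only, not the bank output.
salary = [32.0, 41.0, 55.0, 38.0, 47.0]
fitted_log_salary = [3.50, 3.70, 3.95, 3.66, 3.80]

# Antilog (exp) converts each fitted log value back to the salary scale.
fitted_salary = [math.exp(v) for v in fitted_log_salary]

# Residuals on the salary scale: actual salary minus back-transformed fit.
residuals = [y - f for y, f in zip(salary, fitted_salary)]

# Standard deviation of these residuals, as described in the text.
# Note: regression software typically divides by n - k - 1 rather than n - 1,
# so this is a comparable measure, not an exact replica of a reported s_e.
comparable_se = stdev(residuals)
print(round(comparable_se, 3))
```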

Solution -- continued

The resulting standard deviation is 7.74. This is somewhat smaller than the s_e from the Salary equation, an indication of a slightly better fit.

Finally we interpret the equation itself. When the response variable is Log_Y and a term on the right hand side of the equation is of the form bX, then whenever X increases by one unit Y-hat changes by a constant percentage, and this percentage is approximately equal to b (written as a percentage).

Solution -- continued

This means that for each year of experience with Fifth National, an employee’s salary can be expected to increase by about 1.88%.

The coefficient of Female implies an expected percentage decrease in salary of about 16.16%. In other words, this equation implies that females can expect to make about 16% less than males with comparable years of experience.
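The "approximately equal to b" rule can be compared with the exact percentage change, which is exp(b) - 1 when the response is a natural log. The sketch below applies this to the two coefficients quoted above; the function name exact_pct_change is a hypothetical helper.

```python
import math

def exact_pct_change(b):
    """Exact percentage change in Y-hat when X rises by one unit
    in an equation whose response is the natural log of Y."""
    return 100 * (math.exp(b) - 1)

yrs_exper_pct = exact_pct_change(0.0188)   # coefficient of YrsExper
female_pct = exact_pct_change(-0.1616)     # coefficient of Female

print(round(yrs_exper_pct, 2), round(female_pct, 2))
```

For a small coefficient such as 0.0188 the approximation is very close. For a larger coefficient such as -0.1616 it is looser: the exact decrease is a little under 15%, so the 16% figure should be read as a rough guide.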

Modeling Possibilities

POWER.XLS

The Public Service Electric Company produces different quantities of electricity each month, depending on the demand.

This file lists the number of units of electricity produced (Units) and the total cost of producing these (Cost) for a 36-month period. The data set appears on the next slide.

How can regression be used to analyze the relationship between Cost and Units?

Data for Electric Power

Solution

A good place to start is with a scatterplot of Cost versus Units.

Solution -- continued

The scatterplot indicates a definite positive relationship and one that is nearly linear. However, there is also some evidence of curvature in the plot. The points increase slightly less rapidly as Units increases from left to right. In economic terms, there may be economies of scale, where the marginal cost of electricity decreases as more units of electricity are produced. Nevertheless, we use regression to estimate a linear relationship between Cost and Units.

Solution -- continued

The resulting regression equation is

Predicted Cost = 23,651 + 30.53 Units

The corresponding R² and s_e are 73.6% and $2734. We also requested a scatterplot of the residuals versus the fitted values. The scatterplot is on the next slide. Obtaining this scatterplot is always a good idea if nonlinearity is suspected. The sign of nonlinearity in this plot is that the residuals to the far left and the far right are all negative, whereas the majority of the residuals in the middle are positive.

Residuals from a Straight-Line Fit

Solution -- continued

Admittedly the pattern is far from perfect - there are a few negatives in the middle - but the plot does hint at nonlinear behavior.

The negative-positive-negative behavior of the residuals suggests a parabola; that is, a quadratic equation with the square of Units included in the equation. We first create a new variable Sqr_Units in the data set. This can be done manually or using StatPro’s Transform Variables menu item.

Solution -- continued

Then we use multiple regression to estimate the equation for Cost with both explanatory variables, Units and Sqr_Units, included.

The resulting equation from the output on the next slide is

Predicted Cost = 5793 + 98.3Units - 0.0600Sqr_Units

Note that R² has increased to 82.2% and s_e has decreased to $2281.

Regression Output with Squared Term Included

Solution -- continued

One way to see how this regression equation fits the scatterplot of Cost versus Units is to use Excel’s trendline option. To do so, activate the scatterplot, click on any point, use the Chart/Add Trendline menu item, click the Type tab, and select the Polynomial type of order 2, that is, a quadratic. A graph of the equation is superimposed on the scatterplot on the following slide. It shows a reasonably good fit, plus an obvious curvature.

Quadratic Fit Scatterplot

Solution -- continued

The main downside to a quadratic regression equation is that there is no easy interpretation of the coefficients of Units and Sqr_Units. All we can say is that the terms in the equation combine to explain the nonlinear relationship between units produced and total cost. A final note about the equation concerns the coefficient of Sqr_Units.

– First, the fact that it is negative makes the parabola bend downward. This produces the decreasing marginal cost behavior, where every extra unit of electricity incurs a smaller additional cost.

Solution -- continued

– Second, we shouldn’t be fooled by the small magnitude of this coefficient. Remember that it is the coefficient of Units squared, which is a large quantity. Therefore, the effect of the product -0.0600Sqr_Units is sizable.
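Both points about the Sqr_Units coefficient can be checked with a little arithmetic. The sketch below uses the quadratic coefficients quoted in the text; the production levels 400, 600, and 800 are hypothetical values chosen only for illustration.

```python
# Coefficients from the quadratic equation in the text.
B0, B1, B2 = 5793, 98.3, -0.0600

def predicted_cost(units):
    """Predicted Cost from the quadratic equation."""
    return B0 + B1 * units + B2 * units ** 2

def marginal_cost(units):
    """Derivative of predicted cost with respect to Units."""
    return B1 + 2 * B2 * units

# Hypothetical production levels, chosen only for illustration.
for units in (400, 600, 800):
    squared_term = B2 * units ** 2
    print(units, round(predicted_cost(units)), round(squared_term),
          round(marginal_cost(units), 1))
```

The squared term contributes a reduction in the tens of thousands of dollars at these levels despite the small coefficient, and the marginal cost falls steadily as Units increases.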

One other possibility we might examine is a logarithmic fit. In this case we create a new variable Log_Units, the natural logarithm of Units, and then regress Cost against the single variable Log_Units.

Solution -- continued

To create the new variable we can again use StatPro’s Transform Variable menu item and then we can superimpose a logarithmic curve on the scatterplot of Cost versus Units by using the trendline feature.

This curve appears in the scatterplot on the next slide. To the naked eye, it appears to be similar, and about as good a fit as the quadratic curve.

Logarithmic Fit Scatterplot

Solution -- continued

The resulting regression equation is

Predicted Cost = -63,993 + 16,654Log_Units

The values of R² and s_e are 79.8% and $2393. These values indicate that the logarithmic fit is not quite as good as the quadratic fit. However, the advantage of the logarithmic equation is that it is easier to interpret.

Solution -- continued

In this case, where the log of the explanatory variable is used, we can interpret its coefficient as follows.

– Suppose Units increases by 1%, for example from 600 to 606. Then the equation implies that the expected Cost will increase by approximately $166.54.
– In words, every 1% increase in Units is accompanied by an expected $166.54 increase in Cost.
– Note that for larger values of Units, a 1% increase represents a larger absolute increase. But each such 1% increase entails the same increase in Cost. This is another way of describing the decreasing marginal cost property.
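These statements can be verified directly from the logarithmic equation. The sketch below compares the cost change for a 1% increase at two production levels with the coefficient-divided-by-100 rule of thumb; the level of 1000 units is a hypothetical value used only for comparison.

```python
import math

# Coefficients from the logarithmic equation in the text.
B0, B1 = -63_993, 16_654

def predicted_cost(units):
    """Predicted Cost from the logarithmic equation."""
    return B0 + B1 * math.log(units)  # math.log is the natural logarithm

# Cost change for a 1% increase in Units at two different production levels.
change_at_600 = predicted_cost(606) - predicted_cost(600)
change_at_1000 = predicted_cost(1010) - predicted_cost(1000)

# Rule of thumb from the text: coefficient divided by 100.
rule_of_thumb = B1 / 100

print(round(change_at_600, 2), round(change_at_1000, 2), rule_of_thumb)
```

The two changes agree because each equals the coefficient times ln(1.01), which is slightly less than the rule-of-thumb value since ln(1.01) is slightly less than 0.01.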

Modeling Possibilities

CARDEMAND.XLS

This file contains annual data (1970-1987) on domestic auto sales in the United States. The data set is shown on the next slide. The variables are defined as

– Quantity: annual domestic auto sales (in number of units)
– Price: real price index of new cars
– Income: real disposable income
– Interest: prime rate of interest

Estimate and interpret a multiplicative (constant elasticity) relationship between Quantity and Price, Income and Interest.

Car Demand Data

Constant Elasticity Relationships

A particular type of nonlinear relationship that has firm grounding in economic theory is called a constant elasticity relationship. It is also called a multiplicative relationship. One property of this type of relationship is that the effect of a change in any explanatory variable Xi on Y depends on the levels of the other X’s in the equation.

Solution

We first take the natural logs of all four variables.

– This can be done in one step using the Transform Variables menu item or we can use Excel’s LN function.

We then use multiple regression, with Log_Quantity as the response variable and Log_Price, Log_Income, and Log_Interest as the explanatory variables. The resulting output is shown on the next slide, and the corresponding equation is

Predicted Log_Quantity = 4.675 - 1.185Log_Price + 2.183Log_Income - 0.191Log_Interest

Regression Output for Multiplicative Relationship

Solution -- continued

If we like we can convert this back to the original variables, that is back to multiplicative form, by taking antilogs. The result is

Predicted Quantity = 107.198 × Price^(-1.185) × Income^(2.183) × Interest^(-0.191)

In either form the equation implies that the elasticities are approximately equal to -1.185, 2.183 and -0.191. When Price increases by 1%, Quantity tends to decrease by about 1.185%; when Income increases by 1%, Quantity tends to increase by about 2.183%; and when Interest increases by 1%, Quantity tends to decrease by about 0.191%.
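The elasticity interpretation can be checked numerically from the multiplicative form. The sketch below uses the constant and exponents quoted above; the baseline values for Price, Income, and Interest are hypothetical and chosen only for illustration.

```python
# Multiplicative form with the constant and elasticities quoted in the text.
CONSTANT = 107.198
ELASTICITY = {"Price": -1.185, "Income": 2.183, "Interest": -0.191}

def predicted_quantity(price, income, interest):
    """Predicted Quantity from the multiplicative equation."""
    return (CONSTANT
            * price ** ELASTICITY["Price"]
            * income ** ELASTICITY["Income"]
            * interest ** ELASTICITY["Interest"])

# Hypothetical baseline values, chosen only for illustration.
base = dict(price=100.0, income=1000.0, interest=8.0)
q0 = predicted_quantity(**base)

# Raise Price by 1% while holding Income and Interest fixed.
q1 = predicted_quantity(base["price"] * 1.01, base["income"], base["interest"])
pct_change = 100 * (q1 / q0 - 1)

print(round(pct_change, 3))
```

The percentage change does not depend on the baseline chosen, because the other factors cancel in the ratio q1 / q0. The absolute change in Quantity, by contrast, does depend on the levels of the other variables.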

Conclusions

Does this multiplicative equation provide a better fit to the automobile data than does an additive relationship?

Without doing considerably more work, it is difficult to answer this question with certainty. As we discussed previously, it is not sufficient to compare R² and s_e values for the two fits.

We will simply state that the multiplicative relationship provides a reasonably good fit, and it makes sense economically.

Modeling Possibilities

LEARNING.XLS

The Presario Company produces a variety of small industrial products. It has just finished producing 22 batches of a new product (new to Presario) for a customer.

This file contains the times (in hours) to produce each batch. These data are in the table on the next slide.

Clearly, the times have tended to decrease as Presario has gained more experience in making the product.

Data for Learning Curve

Does the multiplicative learning model apply to these data, and what does it imply about the learning rate?

Learning Curve Model

A final example of a multiplicative relationship is the learning curve model.

A learning curve relates the unit production time (or cost) to the cumulative volume of output since that production process first began. Empirical studies indicate that production times tend to decrease by a relatively constant percentage every time cumulative output doubles. The constant percentage is called the learning rate.

Solution

One way to check whether the multiplicative learning model is reasonable is to create the log variables Log_Time and Log_Batch in the usual way and then see whether a scatterplot of Log_Time versus Log_Batch is approximately linear.

The multiplicative model implies that it should be. Such a scatterplot is shown on the next slide, along with a superimposed linear trend line. The fit appears to be quite good.

Scatterplot of Log Variables with Linear Trend Superimposed

Solution -- continued

To estimate the relationship, we regress Log_Time on Log_Batch. The resulting equation is

Predicted Log_Time = 4.834 - 0.155Log_Batch

There are a couple of ways of interpreting this equation.

– First, because it is based on a multiplicative relationship, we can interpret the coefficient -0.155 as an elasticity. That is when Batch increases by 1%, Time tends to decrease by approximately 0.155%. Although this is correct it is not as “useful” as the “doubling” interpretation.

Solution -- continued

– Second, we know that the estimated learning rate satisfies -0.155 = ln(learning rate)/ln(2). Solving for the learning rate (multiplying through by ln(2) and then taking antilogs), we find that it is 0.898, or approximately 90%. In other words, whenever cumulative production doubles, the time to produce a batch decreases by about 10%.
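The learning-rate calculation is short enough to reproduce directly. The sketch below takes the slope from the estimated equation and applies the relationship between the slope and the learning rate stated above.

```python
import math

slope = -0.155  # coefficient of Log_Batch from the estimated equation

# slope = ln(learning rate) / ln(2), so
# learning rate = exp(slope * ln 2), which is the same as 2 ** slope.
learning_rate = math.exp(slope * math.log(2))

# Percentage decrease in time each time cumulative production doubles.
pct_decrease = 100 * (1 - learning_rate)

print(round(learning_rate, 3), round(pct_decrease, 1))
```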

Predicting Future Production Times

Presario could use this regression equation to predict future production times.

For example, suppose the customer places an order for 15 more batches of the same product. We can use the equation to predict the log of production time for each batch, then take their antilogs and sum them to obtain the total production time. The calculations are shown in rows 26-42 of the following table. The total predicted time to finish is about 1115 hours.
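The prediction procedure can be sketched as below, using the intercept and slope from the estimated equation. Treating the 15 new batches as batches 23 through 37 is an assumption that follows from the 22 batches already produced.

```python
import math

INTERCEPT, SLOPE = 4.834, -0.155  # from the Predicted Log_Time equation

def predicted_time(batch):
    """Predicted hours for a given batch number (antilog of the log prediction)."""
    return math.exp(INTERCEPT + SLOPE * math.log(batch))

# 22 batches are done, so the next 15 are assumed to be batches 23 through 37.
future_batches = range(23, 38)
total_hours = sum(predicted_time(b) for b in future_batches)

print(round(total_hours))
```

Because the published coefficients are rounded, the total from this sketch may differ slightly from the figure obtained in the spreadsheet.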

Using the Learning Curve Model for Predictions

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Some potential explanatory variables are categorical and cannot be measured on a quantitative scale.

However, we often need to use these variables because they are related to the response variable. The trick is to create dummy variables, also called indicator or 0-1 variables. These are variables that indicate the category a given observation is in.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Dummy Variables -- continued

To create dummy variables we can use an IF statement or we can use StatPro’s Dummy variable procedure.

The Dummy variable procedure is usually easier particularly when there are multiple categories. Once the dummy variables are created, we can combine the variables if we like by simply adding the columns to get the dummy for the new category.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis

In this example we create dummy variables for Gender, and EducLev. Then we can run a regression analysis with Salary as the response variable, using any combination of numerical and dummy explanatory variables. We must follow two rules:

– We shouldn’t use any of the original categorical variables that the dummies are based on. – We should use one less dummy than the number of categories for any categorical variable.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

This second rule is a technical one. If we violate it the software will give us an error message. For example, Ed_1-Ed_6, any five of these variables can be used. The omitted dummy then corresponds to the reference category. As we will see the interpretation of the dummy variable coefficients are all relevant to this reference category. To get used to dummy variables in regression analysis we will proceed in several stages.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

We first estimate a regression equation with only one variable. The output is shown in this table. The resulting equation is Predicated Salary = 45.505 - 8.26Female

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

To interpret this equation recall that Female has only two possible values, 0 and 1. If we substitute 1 then the predicted salary equals 37.209 and if we substitute 0 the predicated salary is 45.505. These are the average salaries of females and males. Therefore the interpretation of the -8.926 coefficient of the Female dummy variable is straightforward.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

The above equation only tells part of the story, it ignores all information except for gender.

We expand this equation by adding the experience variables. The output is shown in this table.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

The corresponding equation is

Predicted Salary = 35.492 + 0.998YrsExper + 0.131YrsPrior - 8.080Female

It is useful to write two separate equations, one for females and one for males

Predicted Salary = 27.412 + 0.988YrsExper + 0.131YrsPrior Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior

We interpret the coefficient -8.080 of the Female dummy variable as the average salary disadvantage for females relative to males after controlling for job experience. But there is still more story to tell.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

We next add job grade to the equation by including five of the six job grade dummies. Although any five can be use we use Job_2-Job_6. The resulting output is shown in this table.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6

Regression Analysis -- continued

The estimated regression equations is now

Predicated Salary=30.230 + 0.408YrsExper + 0.149YrsPrior - 1.962Female + 2.57Job_2 + 6.295Job_3 + 10.475Job_4 +16.011Job_5 + 27.647Job_6

There are no two categorical variables involved, gender and job grade. However, we can still write a separate equation for any combination of categories by setting the dummies to the appropriate values.

Regression Analysis -- continued

For example, the equation for females at the fifth job grade is found by setting Female=1 and Job_5=1 and setting the other job dummies equal to 0. The equation formed is

Predicted Salary = 44.279 + 0.408YrsExper + 0.149YrsPrior
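The intercept of 44.279 is the base intercept plus the coefficients of the dummies that are set to 1 (30.230 - 1.962 + 16.011). The sketch below does this bookkeeping for any combination of categories. The coefficient values are those reported in the estimated equation; the dictionary layout and the function name `group_intercept` are illustrative choices, not part of StatPro.

```python
# Coefficients as reported in the estimated equation (salary in $1000s).
coefs = {
    "Intercept": 30.230, "YrsExper": 0.408, "YrsPrior": 0.149, "Female": -1.962,
    "Job_2": 2.57, "Job_3": 6.295, "Job_4": 10.475, "Job_5": 16.011, "Job_6": 27.647,
}

def group_intercept(coefs, active_dummies):
    """Base intercept plus the coefficient of every dummy set to 1."""
    return coefs["Intercept"] + sum(coefs[name] for name in active_dummies)

female_grade5 = group_intercept(coefs, ["Female", "Job_5"])
male_grade1 = group_intercept(coefs, [])   # reference group: all dummies equal 0

print(f"Females, grade 5: {female_grade5:.3f} + 0.408 YrsExper + 0.149 YrsPrior")
print(f"Males, grade 1:   {male_grade1:.3f} + 0.408 YrsExper + 0.149 YrsPrior")
```

The slopes on YrsExper and YrsPrior are the same for every group; only the intercept shifts.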

We interpret this equation as follows:

– For either gender and any job grade, the expected increase in salary for one extra year of experience with Fifth National is $408; the expected salary increase for one extra year of experience with another bank is $149.

Regression Analysis -- continued

– The coefficients of the job dummies indicate the average increase in salary an employee can expect relative to the reference (lowest) job grade.

– The key coefficient, the negative $1962 for females, indicates the average salary disadvantage for females relative to males, given that they have the same experience levels and are in the same job grade.

Although the “penalty” is still substantial, it is less than a fourth of the penalty we saw before.

It appears that females might be getting paid less on average partly because they are in the lower job categories.

Regression Analysis -- continued

We can check whether females are disproportionately in the lower job categories by using a pivot table with JobGrade in the row area, Gender in the column area and the count (expressed as a percentage) of any variable in the data area.
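Outside Excel, the same table can be built with a few lines of code. The sketch below assumes the percentages are taken within each gender column, so that each column sums to 100%. The eight records are hypothetical and only show the mechanics.

```python
from collections import Counter

# Hypothetical (JobGrade, Gender) records: illustrative only, not the bank's data.
records = [(1, "Female"), (1, "Female"), (1, "Male"), (2, "Female"),
           (2, "Male"), (5, "Male"), (6, "Male"), (6, "Male")]

counts = Counter(records)                        # cell counts
gender_totals = Counter(g for _, g in records)   # column totals
grades = sorted({grade for grade, _ in records})

# Percentage of each gender that falls in each job grade.
pivot = {
    grade: {g: 100 * counts[(grade, g)] / gender_totals[g] for g in gender_totals}
    for grade in grades
}

for grade in grades:
    print(grade, {g: round(p, 1) for g, p in pivot[grade].items()})
```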

Regression Analysis -- continued

Clearly, females tend to be concentrated at the lower job grades.

This certainly helps to explain why females get lower salaries on average, but it doesn’t explain why females are at the lower job grades in the first place. We won’t be able to provide a thorough analysis of this issue but we can add one more piece to the puzzle now by adding education level, age, and PCJob to the equation.

Regression Analysis -- continued

We don’t provide the whole equation but the resulting output is shown here.

Regression Analysis -- continued

The coefficients can be seen in the output.

The new variables don’t appear to add much to the previous equation. The “penalty” does, however, go up to $2555, which is slightly greater than the previous $1962. At face value we can interpret the coefficients of the education dummies as the benefit (or loss, if negative) of extra education relative to a high school diploma, the reference category.

Regression Analysis -- continued

The coefficient of PCJob implies that an employee with a computer-related job can expect an extra $4923 in salary relative to an employee without a computer-related job, provided the other variables are the same for each employee.

The age coefficient is quite small and has little effect on salary.

Conclusion

The main conclusion we can draw from the output is that there is still a plausible case to be made for discrimination against females, even after including information on all the variables in the database in the regression equation.

Modeling Possibilities

BANK.XLS

The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is listed in this file. Here is a partial list of the data.

Question

Earlier we estimated an equation for Salary using the numerical explanatory variables YrsExper and YrsPrior and the dummy variable Female.

If we drop the YrsPrior variable from the equation (for simplicity) and rerun the regression, we obtain the equation

Predicted Salary = 35.824 + 0.981YrsExper - 8.012Female

The R² value for this equation is 49.1%. If we decide to include an interaction variable between YrsExper and Female in this equation, what is the effect?

Interaction Terms

An interaction variable algebraically is the product of two variables. Its effect is to allow the effect of one of the variables on Y to depend on the value of the other variable. The interaction term allows the slope of the regression line to differ between the two categories.

Solution

We first need to form an interaction variable that is the product of YrsExper and Female. This can be done two ways in Excel.

– We can do it manually by introducing a new variable that contains the product of the two variables involved, or

– we can use the StatPro/Data Utilities/Create Interaction Variable menu item.

With the latter method we select Female and YrsExper as the variables, and we do not check either of the boxes in the dialog box, because neither variable should be treated as categorical.

Solution -- continued

Once the interaction variable has been created, we include it in the regression equation in addition to the other variables. The multiple regression output is shown here.

Solution -- continued

The estimated regression equation is

Predicted Salary = 30.430 + 1.528YrsExper + 4.098Female - 1.248YrsExper_Female

As we discussed before, it is useful to write this equation as two separate equations, one for females and one for males. The female equation is

Predicted Salary = 34.528 + 0.280YrsExper

and the male equation is

Predicted Salary = 30.430 + 1.528YrsExper
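The two lines follow from substituting Female = 1 and Female = 0 into the interaction equation. The sketch below does the substitution and also finds where the two lines cross. It takes the Female coefficient as 4.098, the value implied by the difference between the two intercepts (34.528 - 30.430); the function name `line_for` is an illustrative choice.

```python
# Coefficients of the interaction equation (salary in $1000s).
# 4.098 is the Female coefficient implied by the two intercepts, 34.528 - 30.430.
b0, b_exper, b_female, b_inter = 30.430, 1.528, 4.098, -1.248

def line_for(female):
    """Return (intercept, slope) of the salary line for Female = 0 or 1."""
    return b0 + b_female * female, b_exper + b_inter * female

f_int, f_slope = line_for(1)
m_int, m_slope = line_for(0)

# Years of experience at which the two predicted salaries are equal.
crossover = (f_int - m_int) / (m_slope - f_slope)

print(f"Female line: {f_int:.3f} + {f_slope:.3f} YrsExper")
print(f"Male line:   {m_int:.3f} + {m_slope:.3f} YrsExper")
print(f"Lines cross after about {crossover:.1f} years")
```

Under these coefficients the predicted male salary overtakes the predicted female salary a little after three years of experience.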

Next we can show these equations graphically.

Nonparallel Female and Male Salary Lines

Solution -- continued

The Y-intercept for the female line is slightly higher (females with no experience at Fifth National Bank tend to start out slightly higher than males), but the slope of the female line is much lower. That is, males tend to move up the salary ladder much more quickly than females.

Again, this provides another argument, although a somewhat different one, for gender discrimination against females.

The R² value increased from 49.1% to 63.9%. The interaction variable has definitely added to the explanatory power of the equation.

Modeling Possibilities

BANK.XLS

The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is listed in this file. Here is a partial list of the data.

Question

A glance at the distribution of salaries of the 208 employees shows some skewness to the right - a few employees make substantially more than the majority of employees. Therefore, it might make sense to use the natural logarithm of Salary instead of Salary as the response variable.

If we do this, how do we interpret the results?

Solution

All of the analyses we did earlier with this data set could be repeated except with Log_Salary as the response variable.

For the sake of discussion we will look only at the regression equation with Female and YrsExper as explanatory variables. After we create the Log_Salary variable and run the regression, we obtain the output shown here.

Regression Output with Log_Salary as Response Variable

Solution

The estimated regression equation is

Predicted Log_Salary = 3.5829 + 0.0188YrsExper - 0.1616Female

The R² and s_e values are 42.4% and 0.1794. For comparison, the values for the Salary equation were 49.1% and 8.070. We first note that neither of these values is directly comparable to its Salary counterpart. The two R² values are percentages of variation explained in different response variables, Log_Salary and Salary. The fact that one is smaller does not mean a “worse” fit. They simply aren’t comparable.

Solution -- continued

The situation for s_e is even worse. Each s_e is a measure of a typical residual, but the residuals in the Log_Salary equation are in log dollars, whereas the residuals in the Salary equation are in dollars. Therefore it is no surprise that the s_e for the Log_Salary equation is much smaller than the s_e for the Salary equation. If we want comparable standard error measures for the two equations, we should take antilogs of the fitted values from the Log_Salary equation to convert them back to dollars, subtract these from the original Salary values, and take the standard deviation of these residuals.
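The three steps just described (antilog, subtract, standard deviation) can be sketched as follows. The default coefficients are those of the Log_Salary equation; the function name `dollar_scale_se` and the four employees are hypothetical and serve only to show the calculation. The sample standard deviation is used, which is an assumption about which standard deviation is intended.

```python
import math
from statistics import stdev

def dollar_scale_se(salary, yrs_exper, female,
                    b0=3.5829, b_exper=0.0188, b_female=-0.1616):
    """Standard deviation of Salary minus the antilog of fitted Log_Salary."""
    fitted_log = [b0 + b_exper * x + b_female * f
                  for x, f in zip(yrs_exper, female)]
    fitted_dollars = [math.exp(v) for v in fitted_log]   # back to $1000s
    residuals = [y - y_hat for y, y_hat in zip(salary, fitted_dollars)]
    return stdev(residuals)

# Hypothetical employees: illustrative values only, not from BANK.XLS.
salary = [38.0, 52.5, 33.0, 41.0]
yrs_exper = [3, 15, 2, 10]
female = [0, 0, 1, 1]

print(round(dollar_scale_se(salary, yrs_exper, female), 3))
```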

Solution -- continued

The resulting standard deviation is 7.74. This is somewhat smaller than the s_e from the Salary equation, an indication of a slightly better fit.

Finally we interpret the equation itself. When the response variable is Log_Y and a term on the right hand side of the equation is of the form bX, then whenever X increases by one unit Y-hat changes by a constant percentage, and this percentage is approximately equal to b (written as a percentage).

Solution -- continued

This means that for each year of experience with Fifth National, an employee’s salary can be expected to increase by about 1.88%.

The coefficient of Female implies an expected decrease in salary of about 16.16% for females. In other words, this equation implies that females can expect to make about 16% less than men for comparable years of experience.
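The rule "percentage change is approximately b" is a first-order approximation. The exact percentage change in the predicted value for a one-unit increase in X is 100(e^b - 1). The sketch below computes the exact figures for the two coefficients of the Log_Salary equation; the function name `pct_change` is an illustrative choice.

```python
import math

def pct_change(b):
    """Exact % change in Y-hat when X rises by one unit in a Log_Y equation."""
    return 100 * (math.exp(b) - 1)

exper_exact = pct_change(0.0188)     # approximation used in the text: 1.88%
female_exact = pct_change(-0.1616)   # approximation used in the text: -16.16%

print(f"YrsExper: {exper_exact:.2f}%   Female: {female_exact:.2f}%")
```

The approximation is very close for a small coefficient such as 0.0188 and noticeably looser for a larger one such as -0.1616, where the exact decrease is closer to 15% than to 16%.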

Modeling Possibilities

POWER.XLS

The Public Service Electric Company produces different quantities of electricity each month, depending on the demand.

This file lists the number of units of electricity produced (Units) and the total cost of producing these (Cost) for a 36-month period. The data set appears on the next slide.

How can regression be used to analyze the relationship between Cost and Units?

Data for Electric Power

Solution

A good place to start is with a scatterplot of Cost versus Units.

Solution -- continued

The scatterplot indicates a definite positive relationship and one that is nearly linear. However, there is also some evidence of curvature in the plot. The points increase slightly less rapidly as Units increases from left to right. In economic terms, there may be economies of scale, where the marginal cost of electricity decreases as more units of electricity are produced. Nevertheless, we first use regression to estimate a linear relationship between Cost and Units.

Solution -- continued

The resulting regression equation is

Predicted Cost = 23,651 + 30.53 Units

The corresponding R² and s_e are 73.6% and $2734. We also requested a scatterplot of the residuals versus the fitted values. The scatterplot is on the next slide. Obtaining this scatterplot is always a good idea if nonlinearity is suspected. The sign of nonlinearity in this plot is that the residuals to the far left and the far right are all negative, whereas the majority of the residuals in the middle are positive.

Residuals from a Straight-Line Fit

Solution -- continued

Admittedly the pattern is far from perfect - there are a few negatives in the middle - but the plot does hint at nonlinear behavior.

The negative-positive-negative behavior of the residuals suggests a parabola; that is, a quadratic equation with the square of Units included in the equation. We first create a new variable Sqr_Units in the data set. This can be done manually or using StatPro’s Transform Variables menu item.

Solution -- continued

Then we use multiple regression to estimate the equation for Cost with both explanatory variables, Units and Sqr_Units, included.

The resulting equation from the output on the next slide is

Predicted Cost = 5793 + 98.3Units - 0.0600Sqr_Units

Note that R² has increased to 82.2% and s_e has decreased to $2281.

Regression Output with Squared Term Included

Solution -- continued

One way to see how this regression equation fits the scatterplot of Cost versus Units is to use Excel’s trendline option. To do so, activate the scatterplot, click on any point, use the Chart/Add Trendline menu item, click the Type tab, and select the Polynomial type of order 2, that is, a quadratic. A graph of the equation is superimposed on the scatterplot on the following slide. It shows a reasonably good fit, plus an obvious curvature.

Quadratic Fit Scatterplot

Solution -- continued

The main downside to a quadratic regression equation is that there is no easy interpretation of the coefficients of Units and Sqr_Units. All we can say is that the terms in the equation combine to explain the nonlinear relationship between units produced and total cost. A final note about the equation concerns the coefficient of Sqr_Units.

– First, the fact that it is negative makes the parabola bend downward. This produces the decreasing marginal cost behavior, where every extra unit of electricity incurs a smaller additional cost.

Solution -- continued

– Second, we shouldn’t be fooled by the small magnitude of this coefficient. Remember that it is the coefficient of Units squared, which is a large quantity. Therefore, the effect of the product -0.0600Sqr_Units is sizable.
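Both points can be checked numerically. The derivative of the quadratic equation with respect to Units, 98.3 - 2(0.0600)Units, gives the predicted marginal cost, and the squared term can be evaluated directly to see its size. The production levels used below (300, 500, and 700 units) are hypothetical and chosen only to show the pattern.

```python
# Coefficients of the quadratic cost equation.
b0, b1, b2 = 5793, 98.3, -0.0600

def predicted_cost(units):
    """Predicted Cost from the quadratic equation."""
    return b0 + b1 * units + b2 * units ** 2

def marginal_cost(units):
    """Derivative of predicted Cost with respect to Units."""
    return b1 + 2 * b2 * units

# Hypothetical production levels, chosen only to illustrate the pattern.
for units in (300, 500, 700):
    squared_term = b2 * units ** 2
    print(units, round(predicted_cost(units)),
          round(marginal_cost(units), 1), round(squared_term))
```

At 500 units, for example, the squared term contributes -0.0600 × 250,000 = -15,000 to the predicted cost, which is far from negligible.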

One other possibility we might examine is a logarithmic fit. In this case we create a new variable Log_Units, the natural logarithm of Units, and then regress Cost against the single variable Log_Units.

Solution -- continued

To create the new variable we can again use StatPro’s Transform Variables menu item. We can then superimpose a logarithmic curve on the scatterplot of Cost versus Units by using the trendline feature.

This curve appears in the scatterplot on the next slide. To the naked eye, it appears to be similar, and about as good a fit as the quadratic curve.

Logarithmic Fit Scatterplot

Solution -- continued

The resulting regression equation is

Predicted Cost = -63,993 + 16,654Log_Units

The values of R² and s_e are 79.8% and $2393. These values indicate that the logarithmic fit is not quite as good as the quadratic fit. However, the advantage of the logarithmic equation is that it is easier to interpret.

Solution -- continued

In this case, where the log of the explanatory variable is used, we can interpret its coefficient as follows.

– Suppose Units increases by 1%, for example from 600 to 606. Then the equation implies that the expected Cost will increase by approximately $166.54.

– In words, every 1% increase in Units is accompanied by an expected $166.54 increase in Cost.

– Note that for larger values of Units, a 1% increase represents a larger absolute increase. But each such 1% increase entails the same increase in Cost. This is another way of describing the decreasing marginal cost property.
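The figure of $166.54 is the coefficient 16,654 divided by 100, which approximates 16,654 × ln(1.01). The sketch below evaluates the change in predicted Cost directly for two different starting levels to show that a 1% increase always produces the same change. The starting level of 1000 units is a hypothetical value; 600 is the example from the text.

```python
import math

b_log = 16654   # coefficient of Log_Units in the logarithmic equation

def cost_change(units_from, units_to):
    """Change in predicted Cost when Units moves between two levels."""
    return b_log * (math.log(units_to) - math.log(units_from))

print(round(cost_change(600, 606), 2))     # 1% increase starting from 600
print(round(cost_change(1000, 1010), 2))   # 1% increase starting from 1000
```

Both calls give the same result, slightly below $166.54, because ln(1.01) is slightly less than 0.01.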

Modeling Possibilities

CARDEMAND.XLS

This file contains annual data (1970-1987) on domestic auto sales in the United States. The data set is shown on the next slide. The variables are defined as follows:

– Quantity: annual domestic auto sales (in number of units)

– Price: real price index of new cars

– Income: real disposable income

– Interest: prime rate of interest

Estimate and interpret a multiplicative (constant elasticity) relationship between Quantity and Price, Income and Interest.

Car Demand Data

Constant Elasticity Relationships

A particular type of nonlinear relationship that has firm grounding in economic theory is called a constant elasticity relationship. It is also called a multiplicative relationship. One property of this type of relationship is that the effect of a change in any explanatory variable Xi on Y depends on the levels of the other X’s in the equation.

Solution

We first take the natural logs of all four variables.

– This can be done in one step using the Transform Variables menu item or we can use Excel’s LN function.

We then use multiple regression, with Log_Quantity as the response variable and Log_Price, Log_Income, and Log_Interest as the explanatory variables. The resulting output is shown on the next slide, and the corresponding equation is

Predicted Log_Quantity = 4.675 - 1.185Log_Price + 2.183Log_Income - 0.191Log_Interest

Regression Output for Multiplicative Relationship

Solution -- continued

If we like, we can convert this back to the original variables, that is, back to multiplicative form, by taking antilogs. The result is

Predicted Quantity = 107.198 × Price^(-1.185) × Income^(2.183) × Interest^(-0.191)

In either form the equation implies that the elasticities are approximately equal to -1.185, 2.183 and -0.191. When Price increases by 1%, Quantity tends to decrease by about 1.185%; when Income increases by 1%, Quantity tends to increase by about 2.183%; and when Interest increases by 1%, Quantity tends to decrease by about 0.191%.
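The elasticity interpretation can be verified from the multiplicative form. The sketch below rebuilds the equation from the log coefficients, using -0.191 for Interest as stated in the elasticities, and measures the effect of a 1% price increase. The baseline values for Price, Income, and Interest are hypothetical; the percentage effect does not depend on them. The antilog of the rounded intercept 4.675 comes out near 107.2, slightly different from the reported 107.198, which presumably reflects rounding of the intercept.

```python
import math

# Coefficients of the log-log equation; each one is an elasticity.
log_coefs = {"Price": -1.185, "Income": 2.183, "Interest": -0.191}
constant = math.exp(4.675)   # antilog of the (rounded) intercept

def predicted_quantity(price, income, interest):
    """Multiplicative form of the fitted equation."""
    return (constant
            * price ** log_coefs["Price"]
            * income ** log_coefs["Income"]
            * interest ** log_coefs["Interest"])

# Hypothetical baseline values, used only to show the effect of a 1% change.
base = predicted_quantity(100.0, 1000.0, 8.0)
bumped = predicted_quantity(101.0, 1000.0, 8.0)   # Price up by 1%
price_effect = 100 * (bumped / base - 1)

print(round(constant, 1), round(price_effect, 3))
```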

Conclusions

Does this multiplicative equation provide a better fit to the automobile data than does an additive relationship?

Without doing considerably more work, it is difficult to answer this question with certainty. As we discussed previously, it is not sufficient to compare R² and s_e values for the two fits.

We will simply state that the multiplicative relationship provides a reasonably good fit, and it makes sense economically.

Modeling Possibilities

LEARNING.XLS

The Presario Company produces a variety of small industrial products. It has just finished producing 22 batches of a new product (new to Presario) for a customer.

This file contains the times (in hours) to produce each batch. These data are in the table on the next slide.

Clearly, the times have tended to decrease as Presario has gained more experience in making the product.

Data for Learning Curve

Does the multiplicative learning model apply to these data, and what does it imply about the learning rate?

Learning Curve Model

A final example of a multiplicative relationship is the learning curve model.

A learning curve relates the unit production time (or cost) to the cumulative volume of output since that production process first began. Empirical studies indicate that production times tend to decrease by a relatively constant percentage every time cumulative output doubles. The constant percentage is called the learning rate.

Solution

One way to check whether the multiplicative learning model is reasonable is to create the log variables Log_Time and Log_Batch in the usual way and then see whether a scatterplot of Log_Time versus Log_Batch is approximately linear.

The multiplicative model implies that it should be. Such a scatterplot is shown on the next slide, along with a superimposed linear trend line. The fit appears to be quite good.

Scatterplot of Log Variables with Linear Trend Superimposed

Solution -- continued

To estimate the relationship, we regress Log_Time on Log_Batch. The resulting equation is

Predicted Log_Time = 4.834 - 0.155Log_Batch

There are a couple of ways of interpreting this equation.

– First, because it is based on a multiplicative relationship, we can interpret the coefficient -0.155 as an elasticity. That is when Batch increases by 1%, Time tends to decrease by approximately 0.155%. Although this is correct it is not as “useful” as the “doubling” interpretation.

Solution -- continued

– Second, we know that the estimated learning rate satisfies -0.155 = ln(learning rate)/ln(2). Solving for the learning rate (multiply through by ln(2) and then take antilogs), we find that it is 0.898, or approximately 90%. In other words, whenever cumulative production doubles, the time to produce a batch decreases by about 10%.
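The calculation of the learning rate takes two lines. The sketch below follows the steps described: multiply the slope by ln(2), then take the antilog. This is the same as raising 2 to the power of the slope.

```python
import math

b = -0.155   # coefficient of Log_Batch in the estimated equation

# Multiply through by ln(2), then take the antilog; equivalent to 2 ** b.
learning_rate = math.exp(b * math.log(2))
pct_decrease_per_doubling = 100 * (1 - learning_rate)

print(round(learning_rate, 3), round(pct_decrease_per_doubling, 1))
```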

Predicting Future Production Times

Presario could use this regression equation to predict future production times.

For example, suppose the customer places an order for 15 more batches of the same product. We can use the equation to predict the log of production time for each batch, then take their antilogs and sum them to obtain the total production time. The calculations are shown in rows 26-42 of the following table. The total predicted time to finish is about 1115 hours.
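The same prediction can be sketched outside the spreadsheet. The code below assumes the 15 new batches are numbered 23 through 37, continuing from the 22 already produced, and uses the rounded coefficients from the estimated equation, so the total may differ slightly from the spreadsheet value.

```python
import math

a, b = 4.834, -0.155   # intercept and slope of the Log_Time equation

def predicted_time(batch):
    """Antilog of the predicted Log_Time for a given batch number."""
    return math.exp(a + b * math.log(batch))

# Batches 23 through 37: the 15 batches after the 22 already produced.
future_batches = range(23, 38)
total_hours = sum(predicted_time(n) for n in future_batches)

print(round(total_hours))
```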

Using the Learning Curve Model for Predictions
