
UNIT 8 CORRELATION AND REGRESSION ANALYSIS

Structure

8.0 Objectives
8.1 Introduction
8.2 Correlation
    8.2.1 Concept
    8.2.2 Correlation and Independence
    8.2.3 Nonsense Correlation
8.3 Regression
    8.3.1 Concept
    8.3.2 Correlation and Regression
    8.3.3 Simple Regression
    8.3.4 Multiple Regression
8.4 Types of Data
8.5 Let Us Sum Up
8.6 Exercises
8.7 Key Words
8.8 Some Useful Books
8.9 Answers or Hints to Check Your Progress
8.10 Answers or Hints to Exercises

8.0 OBJECTIVES

After going through this unit, you will be able to:

• refresh the concept of linear correlation;
• state that zero correlation does not imply that the variables are independent; on the other hand, independence of variables implies zero correlation;
• appreciate the fact that the presence of a high degree of correlation does not necessarily amount to the existence of a meaningful relationship among the variables under consideration;
• distinguish between the concept of correlation and that of regression;
• refresh the method of least squares in connection with two-variable regression;
• distinguish between direct regression and reverse regression;
• understand how the approach to multiple regression is an extension of the approach followed in two-variable regression; and
• know about the various types of data that can be used in regression analysis.

Quantitative Methods-I

8.1 INTRODUCTION
Quantitative techniques are important tools of analysis in today's research in Economics. These tools can be broadly divided into two classes: mathematical tools and statistical tools. Economic research is often concerned with theorizing about some economic phenomenon. Different mathematical tools are employed to express such a theory in a precise mathematical form. This mathematical form of economic theory is what is generally called a mathematical model. A major purpose of the formulation of a mathematical model is to subject it to further mathematical treatment to gain a deeper understanding of the economic phenomenon that the researcher may be primarily interested in. However, the theory so developed needs to be tested in the real-world situation. In other words, the usefulness of a mathematical model depends on its empirical verification. Thus, in economic research, a researcher is often hard-pressed to put the mathematical model in such a form that it can render itself to empirical verification. For this purpose, various statistical techniques have been found to be extremely useful. We should note here that such techniques have often been appropriately modified to suit the purposes of economists. Consequently, a very rich and powerful area of economic analysis known as Econometrics has grown over the years. We may provide a working definition of Econometrics here: it may be described as the application of statistical tools in the quantitative analysis of economic phenomena. We may mention here that econometricians have not only provided important tools for economic analysis, but their contributions have also significantly enriched the subject matter of Statistical Science in general. Today, no researcher can possibly ignore the need for being familiar with econometric tools for the purpose of serious empirical economic analysis. In the subsequent units, you will learn about regression models of econometric analysis.
The concepts of correlation and regression form the core of regression models. You are already familiar with these two concepts, as you have studied them in the compulsory course on Quantitative Methods (MEC-003). In this unit we are going to put the two concepts in the perspective of empirical research in Economics. Here, our emphasis will be on examining how the applications of these two concepts are important in studying the possibility of a relationship that may exist among economic variables.

8.2 CORRELATION
8.2.1 Concept
In the introduction to this chapter, we have already referred to a mathematical model of some real-world observable economic phenomenon. In general, a model consists of some functional relationships, some equations, some identities and some constraints. Once a model like this is formulated, the next issue is to examine how the model works in the real-world situation, for example, in India. This is what is known as the estimation of an econometric model. It may be mentioned here that Lawrence Klein did some pioneering work in the formulation and estimation of such models. In fact, many complex econometric models consisting of hundreds of functions, equations, identities and constraints have been constructed and estimated for different economies of the world, including India, by using empirical data. The estimation of such complete macro-econometric models, however, involves certain issues that are beyond our scope. As a result, we shall abstract from such a model and focus on a single-equation economic relationship and consider its empirical verification. For example, in the Keynesian model of income determination, the consumption function plays a pivotal role. The essence of this relationship is that

consumption depends on income. We may specify a simple consumption function in the form of a linear equation with two constraints: one, the autonomous part of consumption being positive, and two, the marginal propensity to consume being more than zero but less than one. Thus, our consumption equation is

C = a + bY, with a > 0 and 0 < b < 1
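As a minimal sketch (the parameter values here are hypothetical, not taken from the unit), the consumption equation and its two constraints can be written as:

```python
def consumption(income, a=200.0, b=0.75):
    """Keynesian consumption function C = a + b*Y.

    a: autonomous consumption (must be positive)
    b: marginal propensity to consume (must lie strictly between 0 and 1)
    """
    assert a > 0, "autonomous consumption must be positive"
    assert 0 < b < 1, "MPC must be more than zero but less than one"
    return a + b * income

# With income Y = 1000, consumption is 200 + 0.75 * 1000 = 950
print(consumption(1000.0))  # 950.0
```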


This kind of a single equation and its estimation is commonly known as the regression model in the econometric literature. It may be mentioned here that such a single-equation regression model need not be a part of any econometric model and can be a mathematical formulation of some independently observed economic phenomenon. Any scientific inquiry has to be conducted systematically, and economic inquiry is no exception. In the case of our regression model involving consumption and income, for example, a preliminary step may be to examine whether, in the real-world situation, there exists any relationship between consumption and income at all. This is precisely what we attempt with the help of the concept of correlation. Thus, at the moment, we are not concerned with the issue of the dependence of consumption on income or vice-versa. We are simply interested in the possible co-movement of the two variables. We shall focus on the difference between correlation and regression later. Correlation can be defined as a quantitative measure of the degree or strength of relationship that may exist between two variables. You are already familiar with the concept of Karl Pearson's coefficient of correlation. If X and Y are two variables, we know that this correlation coefficient is given by the ratio of the covariance between X and Y to the product of the standard deviation of X and that of Y. In symbols:

r = cov(X, Y) / (σ_X σ_Y)

The symbols have their usual meaning. Here, the covariance in the numerator is important. This, in fact, gives a measure of the simultaneous change in the two variables. It is divided by the product of the standard deviations of X and Y to make the measure free of any unit, in order to facilitate a comparison between more than one set of bivariate data which may be expressed in different units. It may be noted here that this measure of the correlation coefficient is independent of a shift in the origin and a change of scale. The correlation coefficient lies between +1 and -1. In symbols:

-1 ≤ r ≤ +1

If the two variables tend to move in the same direction, the correlation coefficient is positive. In the event of the two variables tending to move in opposite directions, the correlation coefficient assumes a negative value. In the case of a perfect correlation, the correlation coefficient is either +1 or -1, which is almost impossible in economics. When there does not seem to be any relationship between the two variables on the basis of the available data, the correlation coefficient may assume a value equal to zero.

It should be noted here that Karl Pearson's correlation coefficient measures linear correlation between two variables. This means that there exists a proportional relationship between the two variables, i.e., the two variables change in a fixed proportion. For example, we may find that the correlation coefficient between


disposable income and personal consumption expenditure in India on the basis of some national income data is 0.7. It only means that consumption in relation to income, or income in relation to consumption, changes by a factor of 0.7. We again stress here that at the moment we are not commenting on whether income is the independent variable and consumption is the dependent variable, or it is the other way round. It is important here to comment on what is known as the coefficient of determination. Although it is numerically equal to the square of the correlation coefficient, conceptually it is quite different from the correlation coefficient. We shall discuss this concept in detail in the next unit.
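To make the formula concrete, here is a minimal plain-Python computation of Pearson's r (the income and consumption figures below are invented purely for illustration); it also checks the shift-of-origin and change-of-scale invariance noted above:

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient: r = cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

income = [100, 120, 140, 160, 180]        # hypothetical X
consumption = [80, 95, 105, 120, 130]     # hypothetical Y
r = pearson_r(income, consumption)
print(round(r, 4))  # 0.9976

# Invariance: shifting the origin and changing the scale of X leaves r unchanged
r_transformed = pearson_r([2 * v + 5 for v in income], consumption)
print(abs(r - r_transformed) < 1e-12)     # True
```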

Example 8.1

If three uncorrelated variables x1, x2 and x3 have the same standard deviation, find the correlation coefficient between x1 + x2 and x2 + x3.

Suppose U = x1 + x2 and, similarly, V = x2 + x3. Then, we have to find r_UV. Let σ1, σ2 and σ3 be the standard deviations of x1, x2 and x3 respectively, with σ1 = σ2 = σ3 = σ. Let cov(x1, x2), cov(x2, x3) and cov(x1, x3) be the covariances between the pairs of variables (x1, x2), (x2, x3) and (x1, x3) respectively. Since it is given that the variables are uncorrelated, we have

cov(x1, x2) = cov(x2, x3) = cov(x1, x3) = 0

Then,

cov(U, V) = cov(x1, x2) + cov(x1, x3) + var(x2) + cov(x2, x3) = σ²

var(U) = var(x1) + var(x2) + 2 cov(x1, x2) = 2σ², so σ_U = √2 σ; similarly, σ_V = √2 σ.

Therefore, r_UV = cov(U, V) / (σ_U σ_V) = σ² / 2σ² = 0.5
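The result of Example 8.1 can also be checked by simulation. The sketch below (an added illustration, not part of the original unit) draws three independent standard normal variables, so that all pairwise correlations are zero and the standard deviations are equal, and estimates the correlation between x1 + x2 and x2 + x3:

```python
import random

def corr(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(8)
n = 100_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
u = [a + b for a, b in zip(x1, x2)]   # U = x1 + x2
v = [b + c for b, c in zip(x2, x3)]   # V = x2 + x3
print(corr(u, v))                     # ≈ 0.5, matching the analytical result
```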

8.2.2 Correlation and Independence

We should appreciate that in the real-world situation the relationship between two variables may not be linear in nature. In fact, variables are often involved in all kinds of non-linear relationships. Thus, we should be very clear that even when Karl Pearson's correlation coefficient is found to be zero, the two variables might still be related in a non-linear manner. The frequently quoted statement, "Independence of two variables implies zero correlation coefficient but the converse is not necessarily true," exemplifies this fact. The statistics, and consequently the econometrics, of non-linear relationships are quite involved in nature and beyond the scope of the present discussion. Consequently, at this stage, linearity should be taken as a necessary simplifying assumption. However, we shall see later that a non-linear relationship can sometimes be reduced to a linear relationship through some appropriate transformation, and the tools of linear analysis can still be effectively applied to such transformed relationships. We often employ such techniques as a practical solution to the complexities involved in a non-linear relationship.
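A one-line illustration of this point (an added sketch, not from the original text): let Y = X² with X symmetric around zero. Y is completely determined by X, yet Pearson's r is exactly zero:

```python
def corr(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # Y depends on X exactly, but not linearly
print(corr(x, y))         # 0.0
```

So a zero correlation coefficient rules out only a *linear* association, not dependence as such.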


8.2.3 Nonsense Correlation
Sometimes two variables, even when they seem to be unrelated in any manner, may display a high degree of correlation. Yule called this kind of correlation 'nonsense correlation'. If we measure two variables at regular time intervals, both the variables may display a strong time-trend. As a result, the two variables may display a strong correlation even when they are unrelated. Thus, one should be very careful while using such a source of data. In fact, a new branch of econometrics, known as Time Series econometrics, has been developed exclusively for handling such a situation. Another situation in which two seemingly unrelated variables may display a high degree of correlation is the result of the influence of a third variable on both of them. Thus, the existence of a correlation between two variables does not necessarily imply a relationship between them. It only indicates that the data are not inconsistent with the possibility of such a relationship. The reasonableness of a possible relationship must be established on theoretical considerations first, and then we should proceed with the computation of correlation.

Check Your Progress 1

1) i) Define correlation between two variables.
   ii) How do you measure linear correlation between two variables?
   iii) Why is this measure called a measure of linear correlation?


2) Explain why two independent variables have zero correlation, but why the converse is not true.

3) Does the presence of strong correlation between two variables necessarily imply the existence of a meaningful relationship between them?
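The time-trend mechanism described in Section 8.2.3 can be simulated directly. In this hypothetical sketch, two series share nothing except that each carries its own upward trend; their correlation is nevertheless very high:

```python
import random

def corr(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(1)
# Two otherwise unrelated variables, each measured at 100 regular time
# intervals, and each carrying its own deterministic time trend.
series_a = [0.5 * t + random.gauss(0, 3) for t in range(100)]
series_b = [1.5 * t + random.gauss(0, 3) for t in range(100)]
print(corr(series_a, series_b) > 0.9)   # True: strong "nonsense" correlation
```

The high r here reflects the common trend, not any relationship between the noise components.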

8.3 REGRESSION
8.3.1 Concept

The term regression literally means a backward movement. Francis Galton first used the term in the late nineteenth century. He studied the relationship between the height of parents and that of children. Galton observed that although tall parents had tall children and, similarly, short parents had short children in a statistical sense, in general the children's height tended towards an average value. In other words, the children's height moved backward, or regressed, to the average. However, the term regression in statistics now has nothing to do with its earlier connotation of a backward movement.


Regression analysis can be described as the study of the dependence of one variable on another variable or more variables. In other words, we can use it for examining the relationship that may exist among certain variables. For example, we may be interested in issues like how the aggregate demand for money depends upon the aggregate income level in an economy. We may employ the regression technique to examine this. Here, aggregate demand for money is called the dependent variable and aggregate income level is called the independent variable. Consequently, we have a simple demand for money function. In this context, we present the following table to show some of the terms that are also used in the literature in place of dependent variable and independent variable.

Table 8.1: Classifying Terms for Variables in Regression Analysis

Dependent Variable        Independent Variable
Explained Variable        Explanatory Variable
Regressand                Regressor
Predictand                Predictor
Endogenous Variable       Exogenous Variable
Controlled Variable       Control Variable
Target Variable           Control Variable
Response Variable         Stimulus Variable

Source: Maddala (2002) and Gujarati (2003).

It is now important to clarify that the terms dependent and independent do not necessarily imply a causal connection between the two types of variables. Thus, regression analysis per se is not really concerned with causality analysis. A causal connection has to be established first by some theory that is outside the parlance of regression analysis. In our earlier example of the consumption function and the present example of the demand for money function, we have theories like the Keynesian income hypothesis and the transaction demand for money. On the basis of such theories, perhaps we can employ the regression technique to get some preliminary idea of some causal connection involving certain variables. In fact, causality study is now a highly specialized branch of econometrics and goes far beyond the scope of ordinary regression analysis. A major purpose of regression analysis is to predict the value of one variable given the value of another variable or more variables. Thus, we may be interested in predicting the aggregate demand for money from a given value of aggregate income. We should be clear that, by virtue of the very nature of economics and other branches of social science, the concern is a statistical relationship involving some variables rather than an exact mathematical relationship as we may obtain in natural science. Consequently, if we are able to establish some kind of a relationship between an independent variable X and a dependent variable Y, it can be expected to give us some sort of an average value of Y for a given value of X. This kind of a relationship is known as a statistical or stochastic relationship. The regression method is essentially concerned with the analysis of such a stochastic relationship. From the above discussion, it should be clear that in our context, the dependent variable is assumed to be stochastic or random. In contrast, the independent variables are taken to be non-stochastic or non-random. However, we must mention here that at an advanced level, even the independent variables are assumed to be stochastic. In the next unit, we shall discuss the stochastic nature of regression analysis in detail.

If a regression relationship has just one independent variable, it is called a two-variable or simple regression. On the other hand, if we have more than one independent variable in it, then it is a multiple regression.


8.3.2 Correlation and Regression
Earlier we made a reference to the conceptual difference between correlation and regression. We may discuss it here. In regression analysis, we examine the nature of the relationship between the dependent and the independent variables. Here, as stated earlier, we try to estimate the average value of one variable from the given values of other variables. In correlation, on the other hand, our focus is on the measurement of the strength of such a relationship. Consequently, in regression, we classify the variables into the two classes of dependent and independent variables. In correlation, the treatment of the variables is rather symmetric; we do not have such a classification. Finally, in regression, at our level, we take the dependent variable as random or stochastic and the independent variables as non-random or fixed. In correlation, in contrast, all the variables are implicitly taken to be random in nature.

8.3.3 Simple Regression
Here, we are focusing on just one independent variable. The first thing that we have to do is to specify the relationship between X and Y. Let us assume that there is a linear relationship between the two variables like:

Y = a + bX

The concept of linearity, however, requires some clarification. We are postponing that discussion to the next unit. Moreover, there can be various types of intrinsically non-linear relationships also. The treatment of such relationships is beyond our scope. Our purpose is to estimate the constants a and b from empirical observations on X and Y.

The Method of Least Squares

Usually, we have a sample of observations of a given size, say n. If we plot the n pairs of observations, we obtain a scatter-plot, as it is known in the literature. An example of a scatter-plot is presented below.

Fig. 8.1: Scatter-Plot



A visual inspection of the scatter-plot makes it clear that for different values of X, the corresponding values of Y are not aligned on a straight line. As we have mentioned earlier, in regression we are concerned with an inexact or statistical relationship, and this is the consequence of such a relationship. Now, the constants a and b are respectively the intercept and slope of the straight line described by the above-mentioned linear equation, and several straight lines with different pairs of the values (a, b) can be passed through the above scatter. Our concern is the choice of a particular pair as the estimates of a and b for the regression equation under consideration. Obviously, this calls for an objective criterion.

Such a criterion is provided by the method of least squares. The philosophy behind the least squares method is that we should fit a straight line through the scatter-plot in such a manner that the vertical differences between the observed values of Y and the corresponding values obtained from the straight line for different values of X, called errors, are minimum. The line fitted in such a fashion is called the regression line. The values of a and b obtained from the regression line are taken to be the estimates of the intercept and slope (regression coefficient) of the regression equation. The values of Y obtained from the regression line are called the estimated values of Y. A stylized scatter-plot with a straight line fitted in it is presented below. The method of least squares requires that we should choose our a and b in such a manner that the sum of the squares of the vertical differences between the actual or observed values of Y and the ones obtained from the straight line is minimum. Putting it mathematically, we minimize

Σ (Y - Ŷ)² = Σ (Y - a - bX)²

with respect to a and b, where Ŷ is called the estimated value of Y. The values of a and b so obtained are known as the least-square estimates of a and b and are normally denoted by â and b̂. This is a well-known minimization procedure of calculus and you must have done that in the course on Quantitative Methods (MEC-003). You must also have obtained the normal equations and solved them for obtaining â and b̂. We are leaving that as an exercise for this unit. The earlier shown scatter-plot with a regression line is presented below:

Fig. 8.3: Scatter-Plot with the Regression Line


This regression line, obviously, has a negative intercept. If we recapitulate, the two normal equations that we obtained from the above-mentioned procedure are given by

ΣY = na + bΣX

and

ΣXY = aΣX + bΣX²

After solving the two equations simultaneously, we obtain the least square estimates

b̂ = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²

and

â = Ȳ - b̂X̄

In regression analysis, the slope coefficient assumes special significance. It measures the rate of change of the dependent variable with respect to the independent variable. As a result, it is this constant that indicates whether there exists a relationship between X and Y or not. The regression equation

Y = a + bX

is in fact called the regression of Y on X; the slope b of this equation is termed the regression coefficient of Y on X. It is also denoted by b_YX. A glance at the expression of the regression coefficient of Y on X makes it quite clear that it can also be written as

b_YX = r (σ_Y / σ_X)

Thus, putting in the values of a and b, the regression equation of Y on X can be written as

Y - Ȳ = b_YX (X - X̄)

Reverse Regression

Suppose, in another regression relationship, X acts as the dependent variable and Y as the independent variable. Then that relationship is called the regression of X on Y. Here, we should deliberately avoid the temptation of expressing X in terms of Y from the regression equation of Y on X to obtain that of X on Y, and of trying to mechanically extract the least square estimates of its constants from the already known values of â and b̂. The regression of X on Y is in fact intrinsically different from that of Y on X. Geometrically speaking, in the regression of X on Y we minimize the sum of the squares of the horizontal distances, as against the minimization of the sum of the squares of the vertical distances in Y on X, for obtaining the least square estimates. If our regression equation of X on Y is given by

X = a' + b'Y


then its least square estimates are given by the criterion: minimize

Σ (X - a' - b'Y)²

with respect to a' and b'. By applying the usual minimization procedure, we obtain the following two normal equations:

ΣX = na' + b'ΣY

and

ΣXY = a'ΣY + b'ΣY²

We can simultaneously solve these two equations to get the least square estimates

b̂' = Σ(X - X̄)(Y - Ȳ) / Σ(Y - Ȳ)²

and

â' = X̄ - b̂'Ȳ

The slope b' of the regression of X on Y is called the regression coefficient of X on Y. It measures the rate of change of X with respect to Y. In order to distinguish it clearly from the regression coefficient of Y on X, we also use the symbol b_XY for it. Putting in the values of a' and b', the regression equation of X on Y can be written as

X - X̄ = b_XY (Y - Ȳ)

To highlight the important difference between the two kinds of regression, the regression of Y on X is sometimes termed the direct regression and that of X on Y is called the reverse regression. Maddala (2002) gives an example of direct regression and reverse regression in connection with the issue of gender bias in the offer of emoluments. Let us assume that the variable X represents qualifications and the variable Y represents emoluments. We may be interested in finding out whether males and females with the same qualifications receive the same emoluments or not. We may examine this by running the direct regression of Y on X. Alternatively, we may be curious about whether males and females with the same emoluments possess the same qualifications or not. We may investigate this by running the reverse regression of X on Y. Thus, it is perhaps valid to run both the regressions in order to have a clear insight into the question of gender bias in emoluments.
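The warning against mechanically inverting the direct regression can be verified numerically. In this sketch (the qualification and emolument figures are invented), b_YX · b_XY equals r², so b_XY is not 1/b_YX unless the correlation is perfect:

```python
def sums_of_squares(x, y):
    """Return the cross-product and sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy

x = [1, 2, 3, 4, 5, 6]        # hypothetical qualification scores
y = [2, 3, 3, 5, 6, 6]        # hypothetical emoluments
sxy, sxx, syy = sums_of_squares(x, y)
b_yx = sxy / sxx              # direct regression: Y on X
b_xy = sxy / syy              # reverse regression: X on Y
r_squared = sxy ** 2 / (sxx * syy)
print(abs(b_yx * b_xy - r_squared) < 1e-12)   # True: b_YX * b_XY = r^2
print(abs(b_xy - 1 / b_yx) < 1e-12)           # False: reverse is not the inverted direct line
```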

Properties

Let us now briefly consider some of the properties of regression.

1) The product of the two regression coefficients is always equal to the square of the correlation coefficient:

   b_YX × b_XY = r²

2) The two regression coefficients have the same sign. In fact, the sign of the two coefficients depends upon the sign of the correlation coefficient. Since the standard deviations of both X and Y are, by definition, positive, if the correlation coefficient is positive, both the regression coefficients are positive; similarly, if the correlation coefficient happens to be negative, both the regression coefficients become negative.

3) The two regression lines always intersect each other at the point (X̄, Ȳ).

4) When r = ±1, there is an exact linear relationship between X and Y and, in that case, the two regression lines coincide with each other.

5) When r = 0, the two regression equations reduce to Y = Ȳ and X = X̄. In such a situation, neither Y nor X can be estimated from its respective regression equation.

As mentioned earlier, the coefficient of determination is an important concept in the context of regression analysis. However, the concept will be more contextual if we discuss it in the next unit.

Example 8.2
From the following results, obtain the two regression equations and the estimates of the yield of crop when the rainfall is 22 cm, and of the rainfall when the yield is 600 kg.

                     Yield in kg   Rainfall in cm
Mean                 508.4         26.7
Standard Deviation   36.8          4.6

Coefficient of correlation between yield and rainfall = 0.52.

Let Y be yield and X be rainfall. So, for estimating the yield, we have to run the regression of Y on X, and for the purpose of estimating the rainfall, we have to use the regression of X on Y.

We have X̄ = 26.7, Ȳ = 508.4, σ_X = 4.6, σ_Y = 36.8 and r = 0.52.

The regression coefficients are b_YX = 0.52 × (36.8 / 4.6) = 4.16 and b_XY = 0.52 × (4.6 / 36.8) = 0.065.

Hence, the regression equation of Y on X is

Y - 508.4 = 4.16(X - 26.7), or Y = 4.16X + 397.33

Similarly, the regression equation of X on Y is

X - 26.7 = 0.065(Y - 508.4), or X = 0.065Y - 6.346

When X = 22, Y = 4.16 × 22 + 397.33 = 488.8
When Y = 600, X = 0.065 × 600 - 6.346 = 32.7

Hence, the estimated yield of crop is 488.8 kg and the estimated rainfall is 32.7 cm.


8.4 TYPES OF DATA

We conclude this unit by discussing the types of data that may be used for the purpose of economic analysis in general and regression analysis in particular. We can use three kinds of data for the empirical verification of any economic phenomenon: time series, cross section, and pooled or panel data.

Time Series Data

A time series is a collection of the values of a variable that are observed at different points of time. Generally, the interval between two successive points of time remains fixed. In other words, we collect data at regular time intervals. Such data may be collected daily, weekly, monthly, quarterly or annually. We have, for example, a daily data series for the gold price, weekly money supply figures, a monthly price index, a quarterly GDP series and annual budget data. Sometimes, we may have the same data in more than one time-interval series; for example, both quarterly and annual GDP series may be available. The time interval is generally called the frequency of the time series. It should be clear that the above-mentioned list of time intervals is by no means an exhaustive one. There can be, for example, an hourly time series like that of a stock price sensitivity index. Similarly, we may have decennial population census figures. We should note that, conventionally, if the frequency is one year or more, it is called a low frequency time series. On the other hand, if the frequency is less than one year, it is termed a high frequency time series. A major problem with time series is what is known as non-stationary data. The presence of non-stationarity is the main reason for the nonsense correlation that we talked about in connection with our discussion on correlation.

Cross Section Data

In cross section data, we have observations for a variable for different units at the same point of time. For example, we have the state domestic product figures for different states in India for a particular year. Similarly, we may collect various stock price figures at the same point of time on a particular day. Cross section data are also not free from problems. One main problem with this kind of data is that of the heterogeneity that we shall refer to in the next unit.

Pooled Data

Here, we may have time series observations for various cross-sectional units. For example, we may have a time series of the domestic product of each state of India, and we may have a panel of such series. This is why such a data set is called panel data. Thus, in this kind of data, we combine the element of time series with that of cross section data. One major advantage of such data is that we may have quite a large data set, and the problem of degrees of freedom that mainly arises due to the non-availability of adequate data can largely be overcome. Recently, the treatment of panel data has received much attention in empirical economic analysis.

Check Your Progress 2
1) Explain how regression is not primarily concerned with causality analysis.

2) Bring out the difference between correlation and regression.

3) What is the distinction between Time Series Data and Cross Section Data?

4) Explain the concept of reverse regression.

8.5 LET US SUM UP


Regression models occupy a central place in empirical economic analysis. These models are essentially based on the concepts of correlation and regression. Correlation is a quantitative measure of the strength of the linear relationship that may exist among some variables. The existence of a high degree of correlation, however, is not necessarily evidence of a meaningful relationship. It only suggests that the data are not inconsistent with the possibility of such a relationship. Regression, on the other hand, focuses on the direction of a linear relationship. Here, one is concerned with the dependence of one variable on other variables. Regression, in itself, does not suggest any causal relationship. Correlation and regression are both concerned with a statistical or stochastic relationship, as against a mathematical or exact relationship. In conventional regression analysis, the dependent variable is treated as stochastic or random, whereas the independent variables are taken to be non-stochastic in nature. The constants of a regression equation are estimated from the empirical observations by using the least squares technique. In a two-variable regression equation, there is one dependent variable and one independent variable. The slope coefficient of a regression equation is called the regression coefficient. It measures the rate of change of the dependent variable with respect to the independent variable. The distinction between the concept of direct regression and that of reverse regression is crucial in regression analysis. Sometimes, by running both kinds of regression, important insight can be gained in empirical economic analysis. In multiple regression, there are at least two independent variables. Finally, in regression analysis, three types of data, namely time series, cross section and pooled, can be used.


8.6 EXERCISES
1)	Prove that the correlation coefficient lies between -1 and +1.

2)	Show that the correlation coefficient is unaffected by a shift in the origin and a change of scale.

3)	For the regression equation of Y on X, derive the least square estimators of the parameters. Try and work out the same for the regression equation of X on Y.

4)	From the following data, derive that regression equation which you consider to be economically more meaningful. Give justification for your choice.

	Output           5     7     9     11    13    15
	Profit per unit  1.70  2.40  2.80  3.40  3.70  4.40

5)	To study the effect of rain on the yield of wheat, the following results were obtained:

	                      Mean   Standard Deviation
	Yield in kg per acre  800    50
	Rainfall in inches    12     2

	The correlation coefficient is 0.80. Estimate the yield when rainfall is 80 inches.

8.7 KEY WORDS

Coefficient of Determination	: It is equal to the square of the correlation coefficient.

Correlation	: It is a quantitative measure of the strength of the relationship that may exist among certain variables.

Cross Section Data	: In cross section data, we have observations for a variable for different units at the same point of time.

Econometrics	: It is described as the application of statistical tools in the quantitative analysis of economic phenomena.

Mathematical Model	: The mathematical form of some economic theory is what is generally called a mathematical model.

Method of Least Square	: It is the method of estimating the parameters of a regression equation in such a fashion that the sum of the squares of the differences between the actual or observed values of the dependent variable and their estimated values from the regression equation is minimum.

Multiple Regression	: It is a regression equation with more than one independent variable.

Nonsense Correlation	: The presence of correlation between two variables when there does not exist any meaningful relationship between them is known as nonsense correlation.

Pooled Data	: In pooled data, we have time series observations for various cross sectional units. Here, we combine the element of time series with that of cross section data.

Regression Equation	: It is the equation that specifies the relationship between the dependent and the independent variables for the purpose of estimating the constants or the parameters of the equation with the help of empirical data on the variables.

Regression	: It is a statistical analysis of the nature of the relationship between the dependent and the independent variables.

Reverse Regression	: It is an independent estimation of a new regression equation when the independent variable of the original equation is changed into the dependent variable and the dependent variable of the original equation is changed into the independent variable.

Time Series Data	: It is a series of the values of a variable obtained at different points of time.

Two Variable Regression	: It is a regression equation with one independent variable.
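The distinction between direct and reverse regression discussed in this unit can be illustrated with a short sketch (an editorial addition, not part of the original unit). The two fitted slopes differ in general, and their product equals the coefficient of determination, r squared.

```python
# Illustrative sketch: direct regression (Y on X) versus reverse
# regression (X on Y). The data are the output/profit figures from
# Exercise 4 above; any paired data would do.

def slope(x, y):
    # Least-square slope of the regression of y on x: S_xy / S_xx
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

x = [5, 7, 9, 11, 13, 15]                  # output
y = [1.70, 2.40, 2.80, 3.40, 3.70, 4.40]   # profit per unit

b_yx = slope(x, y)   # direct regression: Y on X
b_xy = slope(y, x)   # reverse regression: X on Y

# The two slopes are not reciprocals of each other unless r**2 = 1;
# their product equals the coefficient of determination.
r_squared = b_yx * b_xy
print(round(b_yx, 3), round(b_xy, 3), round(r_squared, 3))
```

Since the product of the two slopes equals r squared, it is a quick check on how far apart the two fitted lines are: they coincide only under perfect correlation.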

8.8 SOME USEFUL BOOKS
Gujarati, Damodar N. (2003); Basic Econometrics, Fourth Edition, Chapters 2, 3 and 7; McGraw-Hill, New York.

Maddala, G.S. (2002); Introduction to Econometrics, Third Edition, Chapters 3 and 4; John Wiley & Sons Ltd., West Sussex.

Pindyck, Robert S. and Rubinfeld, Daniel L. (1991); Econometric Models and Economic Forecasts, Third Edition, Chapter 1; McGraw-Hill, New York.

Karmel, P.H. and Polasek, M. (1986); Applied Statistics for Economists, Fourth Edition, Chapter 8; Khosla Publishing House, Delhi.

8.9 ANSWERS OR HINTS TO CHECK YOUR PROGRESS
Check Your Progress 1

1)	i) See section 8.2.1.
	ii) See section 8.2.1.
	No, because it is assumed that there exists a linear relationship between the variables.
2)	See section 8.2.2.
3)	See section 8.2.3.

Check Your Progress 2

1)	See section 8.3.1.
2)	See section 8.3.2.
3)	See section 8.4.

8.10 ANSWERS OR HINTS TO EXERCISES
1)	Do Yourself.
2)	Do Yourself.
3)	Do Yourself.
4)	Y = 0.257X + 0.50
5)	944 kg per acre.
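A short sketch (an editorial addition, not part of the original unit) can be used to check the numerical hints for Exercises 4 and 5. Note that with the given summary statistics the fitted line for Exercise 5 is yield = 800 + 20(rainfall - 12), so the quoted hint of 944 kg per acre corresponds to a rainfall of 19.2 inches; at 80 inches the same line would give 2160 kg.

```python
# Sketch: verifying the numerical hints.

# Exercise 4: least-square line of profit per unit (Y) on output (X).
x = [5, 7, 9, 11, 13, 15]
y = [1.70, 2.40, 2.80, 3.40, 3.70, 4.40]

mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx       # slope  = 18 / 70 ≈ 0.257
a = my - b * mx     # intercept       ≈ 0.50
print(f"Y = {b:.3f}X + {a:.2f}")

# Exercise 5: prediction from summary statistics alone.
# Slope of yield on rainfall: b = r * (s_y / s_x) = 0.80 * 50 / 2 = 20.
def predicted_yield(rainfall, r=0.80, mean_y=800, sd_y=50, mean_x=12, sd_x=2):
    b = r * sd_y / sd_x
    return mean_y + b * (rainfall - mean_x)

# The quoted 944 kg per acre is the value of this line at 19.2 inches
# of rainfall; at 80 inches it would give 2160 kg instead.
print(round(predicted_yield(19.2)))   # 944
print(round(predicted_yield(80)))     # 2160
```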
