Statistics

Published on June 2016 | Categories: Documents | Downloads: 36 | Comments: 0 | Views: 657

MEASURES OF CENTRAL TENDENCY

Formula

1. Mean for grouped data, using assumed mean with step deviation method

$$\text{Mean} = A + \frac{\sum fd}{n} \times c$$

Where:
A is the assumed mean
d is the deviation from the assumed mean divided by the common interval
$\sum fd$ is the summation of frequency × deviation
c is the class interval
n is the total frequency

2. Median for grouped data

$$\text{Median} = l + \frac{n/2 - cf}{f} \times c$$

Where:
l is the lower limit of the median class
n is the total frequency
cf is the cumulative frequency before the median class
f is the frequency of the median class
c is the class interval

3. Mode for grouped data

$$\text{Mode} = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$

Where:
l is the lower limit of the modal class
f is the frequency of the modal class
$\Delta_1$ is the frequency of the modal class minus the frequency of the pre-modal class
$\Delta_2$ is the frequency of the modal class minus the frequency of the post-modal class
c is the class interval

4. Five number summary comprises:
a. Smallest observation
b. First quartile (lower quartile)
c. Median
d. Third quartile (upper quartile)
e. Largest observation
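The three grouped-data formulas above can be sketched in Python as follows. This is only an illustration: the function names and the frequency table (classes 0-10 to 40-50) are assumptions made for the example, not data from this handout, and the mode function assumes a single modal class that is higher than at least one of its neighbours.

```python
# Hypothetical exclusive classes 0-10, 10-20, ..., 40-50 (illustrative data only).
lowers = [0, 10, 20, 30, 40]   # lower limit of each class
freqs = [5, 8, 15, 16, 6]      # frequency of each class
c = 10                         # class interval


def grouped_mean(lowers, freqs, c, assumed_index):
    """Mean = A + (sum(f*d) / n) * c, with d = (mid-value - A) / c."""
    mids = [low + c / 2 for low in lowers]
    A = mids[assumed_index]                  # assumed mean
    d = [(m - A) / c for m in mids]          # step deviations
    n = sum(freqs)
    return A + sum(f * di for f, di in zip(freqs, d)) / n * c


def grouped_median(lowers, freqs, c):
    """Median = l + ((n/2 - cf) / f) * c for the first class whose
    cumulative frequency reaches n/2."""
    n = sum(freqs)
    cf = 0                                   # cumulative frequency before the class
    for low, f in zip(lowers, freqs):
        if cf + f >= n / 2:
            return low + (n / 2 - cf) / f * c
        cf += f


def grouped_mode(lowers, freqs, c):
    """Mode = l + (d1 / (d1 + d2)) * c for the class with the highest frequency."""
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # pre-modal frequency
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0    # post-modal frequency
    d1, d2 = f1 - f0, f1 - f2
    return lowers[i] + d1 / (d1 + d2) * c


print(grouped_mean(lowers, freqs, c, assumed_index=2))
print(grouped_median(lowers, freqs, c))
print(grouped_mode(lowers, freqs, c))
```

Any class mid-value can serve as the assumed mean; the result does not depend on the choice, only the size of the intermediate numbers does.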


OBJECTIVE QUESTIONS

Choose the best answer / Fill in the blanks / True or False

1. If the classes are of the form 0 - 10, 10 - 20, etc., they are called _______________ classes
2. If the classes are of the form 1 - 10, 11 - 20, etc., they are called _________________ classes
3. If the classes are of the form 0 - 10, 10 - 20, etc., an item of value 10 will be entered in:
a. Class 0 - 10
b. Class 10 - 20
c. Either of the above
d. None of the above
4. If the classes are of the form 0 - 10, 10 - 20, etc., the class interval is ____________
5. If the classes are of the form 0 - 10, etc., the mid point of the class is ____________
6. Number of observations falling within a class is called Class _____________
7. Ogive means:
a. Cumulative frequency curve
b. Frequency curve
c. Mathematical Average
d. Arithmetic Mean
8. Data can be in ________________________ or _____________________ form.
9. The measures of central tendency are ______________, ______________ & _________________
10. Mean, Median and Mode are known as:
a. Measures of Central Tendency
b. Measures of Dispersion
c. Measures of Middle Values
d. Measures of Mathematical Averages
11. If all the items in a distribution are of the same value, then:
a. Mean = Median = Mode
b. Mean > Median > Mode
c. Mean < Median < Mode
d. Mean + Median = Mode
12. The sum of deviations of all observations from the Arithmetic Mean is ____________
13. In a symmetrical distribution:
a. Mean = Median = Mode
b. Mean > Median > Mode
c. Mean < Median < Mode
d. Mean + Median = Mode
14. Empirical formula about measures of central tendency given by Karl Pearson for an asymmetrical distribution is:
a. Mean - Mode = 3 (Mean - Median)
b. 2 Mode = (Mean + Median)
c. 2 Mean = (Mode + Median)
d. 2 Median = (Mode + Mean)
15. Quartiles are _____________________
16. Percentiles are ____________________
17. Deciles are _____________________
18. True or False: for each measure in the table, state whether it is affected when
a. the highest value in a set of observations is altered
b. the lowest value in a set of observations is altered
c. the highest value and the lowest value in a set of observations are altered
d. each value in a set of observations is increased or decreased by a constant value
e. each value in a set of observations is multiplied or divided by a constant value

Measure    a    b    c    d    e
Mean
Median
Mode


PROBLEMS

CALCULATE THE MEASURES OF CENTRAL TENDENCY AND THE FIVE NUMBER SUMMARY FOR THE FOLLOWING DATA

1. Data pertaining to marks of students and ages of people is given below
a. Marks of students in a test: 48, 60, 59, 67, 66, 78
b. Ages of people in a group: 70, 72, 63, 56, 37, 82, 55, 85, 63

2. Cycle test marks of students are given below
Class A    Class B
55 45 58 35 64 64 70 60 75 58 72

3. Data pertaining to workers and their wages is given below

Wages (Rs)        35   45   55   65   75
No. of Workers    19   12   15   10   14

4. Monthly income of 100 families is given below

Monthly Income (Rs)    No. of Families
Less than 10             5
Less than 20            12
Less than 30            26
Less than 40            44
Less than 50            64
Less than 60            78
Less than 70            87
Less than 80            94
Less than 90           100

5. Data pertaining to students and their marks is given below

Marks       No. of Students
0 - 9         1
10 - 19       3
20 - 29      19
30 - 39      10
40 - 49      15
50 - 59       2


MEASURES OF DISPERSION

Range and Coefficient of Range (individual data, discrete data and grouped data)

$$\text{Range} = L - S$$

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

Explanation
Individual data and discrete data: L is the maximum value and S is the minimum value.
Grouped data: L is the mid-value of the largest class and S is the mid-value of the smallest class. No class should be open-ended. If it is an inclusive class, it should be converted to exclusive classes.

Quartile Deviation and Coefficient of Quartile Deviation (individual data, discrete data and grouped data)

$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$$

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Explanation
Individual data and discrete data:
$Q_1$ is the x value of the $\frac{N + 1}{4}$th item
$Q_3$ is the x value of the $\frac{3(N + 1)}{4}$th item

Grouped data:

$$Q_1 = l + \frac{\frac{n}{4} - cf}{f} \times c$$

Where l = lower limit of Q1 class, n = total no. of observations, cf = cumulative frequency till the class preceding the Q1 class, c = class size, f = frequency of Q1 class

$$Q_3 = l + \frac{\frac{3n}{4} - cf}{f} \times c$$

Where l = lower limit of Q3 class, n = total no. of observations, cf = cumulative frequency till the class preceding the Q3 class, c = class size, f = frequency of Q3 class
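As a worked illustration of the range and quartile deviation formulas for individual data, here is a minimal Python sketch. The ages are the ones listed in problem 1b of the central tendency problems; the helper `item_value` and its linear interpolation between neighbouring items for fractional positions are assumptions of this sketch, since the formula sheet only names the $\frac{N + 1}{4}$th item.

```python
def item_value(sorted_x, pos):
    """Value of the pos-th item (1-based). A fractional position is resolved
    by linear interpolation between the two neighbouring items (an assumption)."""
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return sorted_x[0]
    if whole >= len(sorted_x):
        return sorted_x[-1]
    return sorted_x[whole - 1] + frac * (sorted_x[whole] - sorted_x[whole - 1])


ages = sorted([70, 72, 63, 56, 37, 82, 55, 85, 63])
N = len(ages)

L, S = ages[-1], ages[0]                  # maximum and minimum values
value_range = L - S
coeff_range = (L - S) / (L + S)

q1 = item_value(ages, (N + 1) / 4)        # 2.5th item
q3 = item_value(ages, 3 * (N + 1) / 4)    # 7.5th item
quartile_deviation = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(value_range, round(coeff_range, 4))
print(q1, q3, quartile_deviation, round(coeff_qd, 4))
```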


OBJECTIVE QUESTIONS

Choose the best answer / Fill in the blanks / True or False

1. The measure of degree of scatter of the data from the central value is
a. Dispersion
b. Skewness
c. Average
d. Mean
2. ______________ is the difference between the largest and the smallest value of the variable
3. Quartile deviation is otherwise called:
a. Quartile Range
b. Inter quartile range
c. Intra quartile range
d. Semi inter quartile range
4. Mean deviation is otherwise called:
a. Average deviation
b. Dispersion
c. Difference
d. Zero sum
5. The relative measure of standard deviation is called ___________________________________
6. Square of standard deviation is called _______________________________
7. Sum of squares of deviation is minimum when taken from ___________________
8. Sum of absolute deviation is minimum when taken from _____________
9. Inter quartile range is
a. Q3 - Q1
b. Q1 - Q2
c. Q2 - Q1
d. Q3 - Q2
10. True or False: for each measure in the table, state whether it is affected when
a. the highest value in a set of observations is altered
b. the lowest value in a set of observations is altered
c. the highest value and the lowest value in a set of observations are altered
d. each value in a set of observations is increased or decreased by a constant value
e. each value in a set of observations is multiplied or divided by a constant value

Measure               a    b    c    d    e
Range
Mean Deviation
Quartile Deviation
Standard Deviation
Variance


PROBLEMS

CALCULATE THE MEASURES OF DISPERSION FOR THE FOLLOWING DATA

1. The following are the runs scored by two cricketers in 10 innings.
a. Find which batsman is a better player
b. Find out which batsman is more consistent (more reliable)

Batsman I    Batsman II
16 42 8 56 24 43 56 37 90 31 104 45 48 50 32 29 8 30 14 27

2. Heights of 60 students in a class are as below.

Height (in cms)    152.5   153   153.5   154   155   155.5   157.5   158   159.5
No. of Students      3      9      7     13     8      6       7      5      2

3. A factory produced two types of electric bulbs A and B. In a study about the life of bulbs, the following results were obtained
a. Find which type of bulb is long lasting
b. Find out which type of bulb is more variable

Length of Life (in hours)    60 - 80   80 - 100   100 - 120   120 - 140   140 - 160
A (no. of bulbs)               10        22          52          20          16
B (no. of bulbs)                8        60          24          16          12


CORRELATION AND REGRESSION

1. Correlation measures the degree of relationship between two or more variables
a. The symbol for measuring correlation is 'r'
b. 'r' lies between -1 and +1
c. Correlation is independent of origin and scale
d. Correlation is symmetric with respect to the variables
e. It is independent of units
f. Correlation means relationship and not causation

2. Understanding why association exists
a. Dependency
b. Nature and strength of association
c. Causation
d. Coincidental relationship
e. Influence of other variables

3. Important types of correlation are:
a. Positive and negative correlation
b. Linear and non-linear correlation
c. Simple, partial and multiple correlation

Lag and lead in correlation
a. The difference in periods needed for a cause and effect relationship to be established is known as lag and lead
b. Advertisement and marketing expenses may lead to sales with a lag
c. Additional supply of materials today may lead to reduction in prices after some time
d. Effect of increase in income may lead to increase in expenditure and savings after a period
e. Boom in agricultural produce may lead to increase in industrial output after a gap of time

Regression
a. Regression is a functional relationship between the values of 2 variables
b. With the help of regression lines we can predict the most likely value of one variable given the other
c. If x and y are two variables, then y can be represented as equal to ax + b, or x as equal to cy + d, where a, b, c, and d are constants. These are known as linear regression equations
d. The rate of change of one variable per unit change in the other variable is called the regression coefficient
e. The regression lines intersect at $(\bar{x}, \bar{y})$ where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively
f. If r = 0, then the regression lines will be perpendicular to each other
g. If r = ± 1, then the regression lines will coincide
h. r is the geometric mean of the regression coefficients
i. Both the regression coefficients are either positive or negative
j. At least 1 regression coefficient must be numerically less than unity
k. Regression coefficients are independent of origin but not scale


Formula

1. Methods of Correlation

a. Karl Pearson's Coefficient of Correlation

Assumed mean method

$$r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\,\sqrt{N \sum dy^2 - (\sum dy)^2}}$$

Where dx is (each value of x - assumed mean of x), dy is (each value of y - assumed mean of y) and N is the number of observations

Direct method

$$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{N \sum x^2 - (\sum x)^2}\,\sqrt{N \sum y^2 - (\sum y)^2}}$$

Where x is each value of x, y is each value of y and N is the number of observations
(Note: Karl Pearson's coefficient of correlation is also called product moment correlation)

b. Spearman's Rank Correlation

When ranks are not given, or unequal ranks are given

$$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

Where d is the difference of ranks of the x and y variables and n is the number of observations

When ranks are equal

$$R = 1 - \frac{6\left(\sum d^2 + \sum \frac{1}{12}(m_i^3 - m_i)\right)}{n(n^2 - 1)}$$

Where d is the difference of ranks of the x and y variables, n is the number of observations and $m_i$ is the number of times a rank is repeated in the first or second variable

c. Two way Frequency Table

$$r = \frac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{N \sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N \sum f\,dy^2 - (\sum f\,dy)^2}}$$

Steps:
• Take step-deviations of x and y from the assumed mean and denote them dx and dy
• Multiply dx and dy and the frequency of each cell and note the figure in the upper right hand corner of each cell
• Add all values of fdxdy and obtain ∑fdxdy
• Multiply the frequencies of variable x by the deviations of variable x and obtain ∑fdx
• Take the square of the deviations of variable x and multiply by the frequencies to obtain ∑fdx²
• Multiply the frequencies of variable y by the deviations of variable y and obtain ∑fdy
• Take the square of the deviations of variable y and multiply by the frequencies to obtain ∑fdy²
• Substitute the values in the formula to obtain r

d. Concurrent Deviation Method

$$R = \pm\sqrt{\pm\frac{2C - n}{n}}$$

Where C is the number of concurrent deviations (pairs in which the sign of the change from the previous value is the same for x and y) and n is the number of pairs of deviations compared, which is one less than the number of pairs observed. Both ± signs take the sign of (2C - n).

4. Probable Error

$$PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

Where r is the correlation and n is the number of pairs observed

$$SE = \frac{1 - r^2}{\sqrt{n}}$$

Where SE is the standard error, r is the correlation and n is the number of pairs observed

$\rho$ (Rho) is $r \pm PE$
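The direct-method formula for Karl Pearson's coefficient and the probable error can be checked with a short Python sketch. The function names are assumptions of this example; the data are the age of cars and annual maintenance cost figures from problem 3 of the correlation problems in this handout.

```python
from math import sqrt


def pearson_r(x, y):
    """Direct method: r = (N*Sxy - Sx*Sy) / (sqrt(N*Sxx - Sx^2) * sqrt(N*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


def standard_error(r, n):
    """SE = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)


def probable_error(r, n):
    """PE = 0.6745 * SE."""
    return 0.6745 * standard_error(r, n)


age = [2, 4, 6, 8, 10]                   # age of cars
cost = [1600, 1500, 1800, 1700, 2100]    # annual maintenance cost

r = pearson_r(age, cost)
pe = probable_error(r, len(age))
print(round(r, 4), round(pe, 4))
print("limits for rho:", round(r - pe, 4), "to", round(r + pe, 4))
```

With only five pairs the probable error is large relative to r, so the limits for ρ are wide.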

5. Calculation of Regression Equation

a. Regression equation of x on y

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

Where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively and $r\,\frac{\sigma_x}{\sigma_y}$ is called the regression coefficient of x on y

b. Regression equation of y on x

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

Where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively and $r\,\frac{\sigma_y}{\sigma_x}$ is called the regression coefficient of y on x

c. Fitting a straight line y on x: the equation is Y = a + bX, with normal equations

$$\sum y = na + b \sum x$$

$$\sum xy = a \sum x + b \sum x^2$$

Solving these two equations simultaneously (eliminating a) gives the value of b as below

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{N \sum dx\,dy - \sum dx \sum dy}{N \sum dx^2 - (\sum dx)^2}$$

Where dx is (each value of x - assumed mean of x), dy is (each value of y - assumed mean of y) and N is the number of observations

d. Fitting a straight line x on y

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{N \sum dx\,dy - \sum dx \sum dy}{N \sum dy^2 - (\sum dy)^2}$$

Where dx is (each value of x - assumed mean of x), dy is (each value of y - assumed mean of y) and N is the number of observations

e. Fitting a parabolic curve or a second degree equation: the equation is Y = a + bX + cX², with normal equations

$$\sum y = na + b \sum x + c \sum x^2$$

$$\sum xy = a \sum x + b \sum x^2 + c \sum x^3$$

$$\sum x^2 y = a \sum x^2 + b \sum x^3 + c \sum x^4$$

f. Multiple Regression Equations: for 3 variables, the equation is X = a + bY + cZ, with normal equations

$$\sum x = na + b \sum y + c \sum z$$

$$\sum xy = a \sum y + b \sum y^2 + c \sum yz$$

$$\sum xz = a \sum z + b \sum yz + c \sum z^2$$

Similarly, it can be done for N variables.
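A minimal Python sketch of fitting the two straight regression lines by solving the normal equations is given below. The function name `fit_line` is an assumption of this example, and the data are again the age of cars and annual maintenance cost figures from problem 3 of the correlation problems. The last lines check the property that r is the geometric mean of the two regression coefficients.

```python
from math import sqrt


def fit_line(x, y):
    """Solve the normal equations of Y = a + bX:
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    and return (a, b)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # regression coefficient
    a = (sy - b * sx) / n                           # intercept
    return a, b


age = [2, 4, 6, 8, 10]
cost = [1600, 1500, 1800, 1700, 2100]

a_yx, b_yx = fit_line(age, cost)    # cost on age
a_xy, b_xy = fit_line(cost, age)    # age on cost

print("cost =", a_yx, "+", b_yx, "* age")
print("estimated cost for a 12 year old car:", a_yx + b_yx * 12)

# r is the geometric mean of the two regression coefficients and takes their sign.
r = sqrt(b_yx * b_xy) * (1 if b_yx > 0 else -1)
print("r =", round(r, 4))
```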


OBJECTIVE QUESTIONS

CHOOSE THE BEST ANSWER / FILL IN THE BLANKS / TRUE OR FALSE

1. An analysis of the relationship among two or more variables is called
a. Correlation
b. Skewness
c. Dispersion
d. Kurtosis
2. If x and y are independent, then correlation between them is _________
3. If the decrease in one variable influences the decrease in the other, it is called _______________ correlation
4. If the decrease in one variable influences the increase in the other, it is called _______________ correlation
5. If the ratio between two sets of variables is the same, then it is called _____________________ correlation
6. Curvilinear correlation is
a. Linear correlation
b. Non-linear correlation
c. Simple correlation
d. Special correlation
7. Perfect negative correlation is when r = _________
8. Perfect positive correlation is when r = _________
9. Completely no correlation is when r = ________
10. Change of scale in value of x or y series will
a. Affect the value of 'r' very much
b. Not affect the value of 'r'
c. Affect the value of 'r' slightly
d. Increase or decrease the value of r proportional to the change of scale
11. State the nature of correlation that exists between the following variables
a. The amount of rainfall and the yield of crops
b. The color of an employee's dress and the employee's salary
c. Age of applicants for life insurance and the annual premium payable
d. Sale of raincoats and the sale of umbrellas
12. Correlation value lies between ____________ and ________
13. Coefficient of determination is _________ and coefficient of non-determination is ____________
14. State true or false
a. Correlation coefficient is unaffected by shift in origin
b. Covariance between 2 variables is always positive
c. Rank correlation lies between 0 and 1
d. If one set of values is removed, then the coefficient of correlation for the remaining pairs remains unchanged
e. If the correlation between 2 variables is 0, then the variables are independent
15. Do the following items have positive, negative or zero correlation
a. Price and demand
b. Age and life expectancy
c. Age of husband and wife
d. Income and savings of a person


PROBLEMS

CALCULATE CORRELATION FOR THE FOLLOWING DATA

1. Find the correlation and also regression equations between advertisement expenses and sales of a particular brand of icecream Dippy-Dip

Month                  Jan   Feb   Mar   Apr   May   Jun
Advt. Exp (Rs 000s)     20    25    28    32    36    34
Sales (Rs lakhs)        30    36    40    42    45    40

2. Find correlation and also regression equations between marks in statistics and accounting of a particular group of students

Roll No of student    101   102   103   104   105
Statistics marks       45    66    58    74    81
Accounting marks       79    56    61    48    40

3. Find correlation and regression equations between age of cars and annual maintenance cost

Age of cars                   2      4      6      8     10
Annual maintenance cost    1600   1500   1800   1700   2100

4. Find rank correlation between marks in test and marks in interview of a group of candidates in a job selection procedure

Marks in Test         24   33   33   42   53   60   60   60   71   75
Marks in Interview    38   40   44   50   49   45   52   50   55   68

5. Find correlation between percentage score given by 2 judges

X: Percentage score by Judge A (classes 50 - 60, 60 - 70, 70 - 80, 80 - 90, 90 - 100)
Y: Percentage score by Judge B

Y \ X        Cell frequencies (in the order given)
60 - 70      4 3
70 - 80      2 5 3 3
80 - 90      2 3 3 5 5
90 - 100     3 6 3

6. Excel Pharma has launched a new preventive medicine for the treatment of Swine Flu. The data below is the effect on 100 patients who have taken the medicine against 100 patients who have not taken the medicine and were admitted to the hospital with viral infection. 98% are free from Swine Flu in the first case vs. 21% who are infected with Swine Flu in the second case. Excel Pharma is claiming a very high success rate on use of their medicine. Comment


7. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be the sensex value in Oct 2010, if the gold price increases by 10% for the Diwali purchase season?

Month     24 Ct Gold Price/gm    Sensex
Jan 10          1500             14000
Feb 10          1550             15000
Mar 10          1600              1550
Apr 10          1620             15500
May 10          1700             16000
Jun 10          1750             17000
Jul 10          1800             17500
Aug 10          1850             18000
Sep 10          1900             18500

8. Find the multiple linear regression equation of X on Y and Z from the data given below

X    Y    Z
2    3    4
4    5    6
6    7    8
8    9    10

9. (Please find below an article printed on the front page of Chennai Times)

Chennai: During our recent investigations, it was found that five Chennai cricket players, Sairam, Sandeep, Sankar, Sundar, and Suresh are deeply involved with the betting syndicate. It has been confirmed by our sources that these players willfully underperformed in the recently concluded ODI series against the Bangalore team. In the table below are the batting scores of these five players along with the team score and the result of the matches in the recently concluded Friendship series.

Player          Career Batting Average   1st ODI   2nd ODI   3rd ODI   4th ODI   5th ODI
Sairam                   28                 41        19        12        33        30
Sandeep                  26                 17        19        17        71        10
Sankar                   41                 33        42        39        36        45
Sundar                   85                 89       112        58        90        67
Suresh                   34                  0         3         2         1         1
Team Chennai            224                272       212       171       265       178
Result                60% WON              WON      LOST      LOST       WON      LOST

Further, it was predicted by the paper in a letter to the board that the players will underperform in their matches against Mumbai also, and the prediction factor was given to the Chennai Police much in advance of the actual matches. The table contains scores calculated by the prediction factor vs. actual scores for the five Chennai players in the one-off ODI match against Mumbai

Player             Sairam   Sandeep   Sankar   Sundar   Suresh
Predicted score      36        74       41       87        4
Actual score         35        73       40       90        3

Please give your comments about these investigations and the truth in the allegations against the players.


TIME SERIES

Time Series: It is the arrangement of data according to time of occurrence, in chronological order. Any series of measurements that varies over time is called a time series.

Utility of Time Series
• Analysis
  o Past behavior
  o Effect of factors
  o Help predict future behavior
• Forecasting
  o Help make future plan of action
• Evaluation
  o Evaluation of current achievements
• Comparison
  o Scientific basis for making comparisons
  o Isolating effects of various components

Components of Time Series
• Long term
  o Secular Trend (T): general tendency to increase or decrease over a period of time
  o Cyclic Variations (C): oscillatory movements with periods greater than 1 year, which usually last 7-9 years
• Short term
  o Seasonal Variation (S): movements due to forces which are usually rhythmic in nature and occur within a year
  o Irregular Variations (I): no regular period of occurrence; accidental changes that are purely random, unforeseen and unpredictable

Mathematical Models
• Additive Model
  o Y = T + S + C + I
  o Components are independent of each other
  o Different components are expressed in original units and are residuals
  o S, C & I are expressed as deviations from T
• Multiplicative Model
  o Y = T × S × C × I
  o S, C & I are expressed as ratios or in percentages
  o Components may be dependent on each other
  o Mostly used in real life practice




Preliminary adjustments before Analyzing Time Series
o Time Variation: adjust for the no. of days in a month
o Population Variation: adjust for variables affected by population, like per capita income
o Price Changes: use real values rather than nominal values
o Comparability: make data homogeneous and comparable
o Miscellaneous Changes

Measurement of Trend

Freehand or Graphic Method
• Simplest and most flexible method
• First step is to plot the points on a paper
• Then, draw a freehand smooth curve through the points
• Number of points above the curve and below the curve should be equal
• Total deviations should be zero
• Sum of squares of deviations should be the minimum possible

Merits and Demerits of Graphic Method

Merits
• Simple and time saving
• No mathematical calculation required
• Very flexible

Demerits
• Highly subjective
• Hence, not suitable for forecasting and decision making

Method of Semi Averages
• Semi averages are the averages of two halves of a series
• Whole data is classified into two equal parts with respect to time

Merits and Demerits of Method of Semi Averages

Merits
• Simple method
• Trend figures are objective
• Line can be extended to obtain future estimates

Demerits
• Assumption of linear trend
• Affected by extreme values, because of the use of the arithmetic mean
• Obtained and predicted values are not precise and reliable


Method of Moving Averages
• Method helps to reduce fluctuations and obtain trend values with a fair degree of accuracy
• Method consists of taking the arithmetic mean of the values for a certain time span and placing it at the centre of the time span
• In case of an even number of years, the centered moving average has to be found
• In some cases, weights may be given to the values in the moving average, which is then called a weighted moving average

Merits and Demerits of Method of Moving Averages

Merits
• Simple and objective method
• Flexible to add additional data without affecting calculations
• If the period of the moving average coincides with the period of cyclical fluctuations, then they are automatically eliminated

Demerits
• No trend values for some initial and end periods
• No functional relationship between value and time
• Difficulty in selecting the period of the moving average
• Bias in case the trend is non-linear
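The moving average calculation, including the centering step needed for an even period, can be sketched in Python as follows. The function names are assumptions of this example; the series is the yearly data (1970 to 1983) from problem 1 of the time series problems in this handout, which asks for a 4 yearly cycle.

```python
def moving_average(values, period):
    """Plain moving averages: one arithmetic mean per window of `period` values."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


def centered_moving_average(values, period):
    """Trend values placed at the centre of the time span.
    For an even period, each pair of adjacent moving averages is averaged
    again so that the result lines up with an actual time point.
    In both cases the first trend value belongs to index period // 2."""
    ma = moving_average(values, period)
    if period % 2 == 1:
        return ma
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


years = list(range(1970, 1984))
values = [53, 79, 76, 66, 69, 94, 105, 87, 79, 104, 97, 92, 101, 105]

period = 4
trend = centered_moving_average(values, period)
first = period // 2                      # index of the first year with a trend value
for year, t in zip(years[first:], trend):
    print(year, t)
```

The output has no trend values for the first two and last two years, which is the first demerit listed above.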

Method of Least Squares
• As the sum of deviations from the mean is zero, the sum of deviations from the line of best fit is zero
• The sum of the squares of the deviations from this line is the least; hence, it is called the method of least squares or line of best fit
• Y = a + bX where 'a' and 'b' are constants

Merits and Demerits of Method of Least Squares

Merits
• Trend line for entire period
• Functional relationship between time and value
• Objective method

Demerits
• Requires many calculations and is complicated
• Seasonal, cyclical or irregular variations are ignored
• If even a single data pair is added, a new equation has to be formed

Other Methods of obtaining trends
• Fitting a second degree trend or a parabolic trend: Y = a + bX + cX², where a, b, and c are constants
• Fitting an exponential trend: Y = ab^X, where a and b are constants
• Exponential smoothing average


Selection of type of trend
• If first differences are constant, use the linear method
• If second differences are constant, use the quadratic method
• If first differences of the logarithms are constant, use the exponential curve
• If first differences tend to decrease by a constant percentage, use the modified exponential curve
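The first three selection rules can be illustrated with a small Python sketch. The function names, the tolerance and the three sample series are assumptions made for this example. Real data are never exactly constant in their differences, so in practice the tolerance would have to be chosen by judgement. The series must have at least three values.

```python
from math import log


def differences(values, order=1):
    """Successive differences of the given order (1 = first differences)."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def is_constant(values, tol=1e-9):
    """True when all values agree within the (assumed) tolerance."""
    return max(values) - min(values) <= tol


def suggest_trend(values):
    """Apply the difference rules in order: linear, quadratic, exponential."""
    if is_constant(differences(values, 1)):
        return "linear"
    if is_constant(differences(values, 2)):
        return "quadratic"
    if all(v > 0 for v in values) and is_constant(differences([log(v) for v in values], 1)):
        return "exponential"
    return "undetermined"


print(suggest_trend([3, 5, 7, 9, 11]))       # first differences are all 2
print(suggest_trend([1, 4, 9, 16, 25]))      # second differences are all 2
print(suggest_trend([5, 10, 20, 40, 80]))    # first differences of logs are all log 2
```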

Methods of measuring Seasonal Variations

Method of Simple Averages
• Arrange seasonal data across given periods
• Find the average of the data for the same season
• Find the average of the averages
• Get percentage weights for the various seasons
• It is simple to find, but it assumes that cyclical and irregular variations are absent or of negligible value

Ratio to Trend Method
• Arrange seasonal data across given periods
• Using a suitable method, find trend values for the annual data and then for the seasonal data
• Get percentages for the actual seasonal data by dividing actual data / trend values
• Find the Seasonal Index, which is the average of the percentages
• If the total of the seasonal indices is more or less than 1200 (monthly data) or 400 (quarterly data), adjustment correction factor = 1200 or 400 / (Total SI)

Ratio to Moving Average Method
• First take a centered moving average
• Get percentages for the actual seasonal data by dividing actual data / centered moving average
• Arrange the percentage data seasonally and take the average
• If the total of the seasonal indices is more or less than 1200 (monthly data) or 400 (quarterly data), adjustment correction factor = 1200 or 400 / (Total SI)
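The ratio to moving average steps can be sketched in Python as below. The function name is an assumption of this example, and the sketch assumes an even number of seasons per year (4 or 12) and a series that starts at the first season. The data are the quarterly figures for 2007 to 2009 from problem 6 of the time series problems in this handout.

```python
def seasonal_indices(values, period):
    """Ratio-to-moving-average seasonal indices.
    Assumes `period` is even and `values` starts at season 1."""
    # Step 1: centered moving average
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    centered = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    offset = period // 2              # first centered value belongs to this index

    # Step 2: actual value as a percentage of the centered moving average
    ratios = [[] for _ in range(period)]
    for k, cma in enumerate(centered):
        t = k + offset
        ratios[t % period].append(values[t] / cma * 100)

    # Step 3: average the percentages season by season
    raw = [sum(r) / len(r) for r in ratios]

    # Step 4: correction factor so that the indices total 100 * period (400 or 1200)
    factor = 100 * period / sum(raw)
    return [index * factor for index in raw]


# Quarterly data, 2007 to 2009, in time order (Q1 2007 first)
data = [68, 62, 61, 63,
        65, 58, 66, 61,
        68, 63, 63, 67]

indices = seasonal_indices(data, period=4)
for quarter, index in zip(["Q1", "Q2", "Q3", "Q4"], indices):
    print(quarter, round(index, 2))
```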

De-Seasonalisation of Data
• Elimination of seasonal variation is called de-seasonalisation of data
• Either additive or multiplicative models are used

Measurement of cyclical variations

Residual Method
• Eliminate trend and seasonal variations from the original data using additive or multiplicative models
• Irregular variations are removed from this data by using the method of moving averages of appropriate period
• Cyclical variations are the only variations left and can be measured now

Measurement of irregular variations
• Using additive or multiplicative models by removing trend, seasonal and cyclical variations
• They are found to be of small magnitude

Forecasting of Data

Qualitative Forecasting
• When historical data are not available

Quantitative Forecasting
• When historical data are available
• Causal forecasting methods
• Time series forecasting methods

Forecasting methods using time series
• Mean forecast
• Naive forecast
• Linear trend forecast
• Non-linear trend forecast
• Forecasting with exponential smoothing


OBJECTIVE QUESTIONS

CHOOSE THE BEST ANSWER / FILL IN THE BLANKS / TRUE OR FALSE

1. With which component of time series would you associate the following
a. A fire in the factory delaying production for three weeks
b. Need for increased wheat production due to rise in the population
c. Change in day temperature from winter to summer
d. Increase in employment during harvest time
e. Price hike in petroleum products due to Gulf war

2. Fill in the blanks
a. An overall rise or fall in a time series is called ____________
b. A time series consists of data arranged in _________________ order
c. The additive model is expressed as Y = ________________________
d. The multiplicative model is expressed as Y = ________________________
e. The trend line obtained by the method of least squares is known as line of __________
f. The component of time series useful for long-term forecasting is _____________
g. For annual data, the _______________________ component of time series is missing
h. If the growth rate is constant, the trend line is _____________
i. A polynomial of the form Y = a + bX + cX² is called _______________________
j. Trend is the overall tendency of the time series data to _____________ or _______________ over a long period of time
k. Seasonal variations are variations with periods of _________________ and are mostly caused by _________________

3. Choose the correct answer
a. Trend refers to a long term tendency to
i. Increase only
ii. Decrease only
iii. Increase or decrease
iv. None of the above
b. If trend is absent in a time series, seasonal indices are obtained by using
i. Method of simple averages
ii. Ratio to trend method
iii. Ratio to moving average method
iv. Method of least squares
c. The most widely used method of measuring seasonal variations is
i. Method of simple averages
ii. Ratio to trend method
iii. Ratio to moving average method
iv. Link relative method
d. The method used in the study of cyclical variations is
i. Ratio to trend method
ii. Ratio to moving average method
iii. Link relative method
iv. Residual method


PROBLEMS

Find trend lines for the following data by
a. Semi Averages method
b. Moving Averages method
c. Weighted Moving Averages method
d. Least Squares method

1. Assume a 4 yearly cycle with equal weights

Year    Value
1970      53
1971      79
1972      76
1973      66
1974      69
1975      94
1976     105
1977      87
1978      79
1979     104
1980      97
1981      92
1982     101
1983     105

2. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be the sensex value in Oct 2010, if the gold price increases by 10% for the Diwali purchase season?

Month     24 Ct Gold Price/gm    Sensex
Jan 10          1500             14000
Feb 10          1550             15000
Mar 10          1600              1550
Apr 10          1620             15500
May 10          1700             16000
Jun 10          1750             17000
Jul 10          1800             17500
Aug 10          1850             18000
Sep 10          1900             18500

Find seasonal indices for the following data by
a. Method of simple averages
b. Ratio to trend method
c. Ratio to moving average method
d. Link Relative method

3. Output of Coal in Million Tonnes

Year    2005   2006   2007   2008   2009
Q1       73     70     73     75     65
Q2       67     63     68     64     60
Q3       66     61     68     61     56
Q4       68     66     72     67     63

4. Monthly data pertaining to rice production in lakhs of tonnes for the period of Jan 2007 to Dec 2009

Month   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2007     16    15    14    18    17    19    20    17    16    14    16    19
2008     25    23    25    27    24    25    26    22    22    22    22    23
2009     21    20    21    19    18    17    19    20    21    20    18    16

5. Calculate the seasonal variations by ratio to trend method for the following data from 2005 to 2009

Year     2005   2006   2007   2008   2009
I Q       30     34     40     54     80
II Q      40     52     58     76     92
III Q     36     50     54     68     86
IV Q      34     44     48     62     82

6. Calculate the seasonal variations by ratio to moving average method for the following data from 2007 to 2009

Year     2007   2008   2009
I Q       68     65     68
II Q      62     58     63
III Q     61     66     63
IV Q      63     61     67


PROBABILITY

Concepts

Probability is the mathematics of chance. A probability experiment is a chance process that leads to well defined outcomes or results. An outcome of a probability experiment is the result of a single trial of a probability experiment. Each outcome of a probability experiment occurs at random. Each outcome of the experiment is equally likely. A trial means tossing a coin once, rolling a die or drawing a single card from the deck.

The set of all outcomes of a probability experiment is called a sample space. Sample space can be represented using tree diagrams and tables. An event is one or more outcomes of a sample space. An event with a single outcome is called a simple event, and an event with two or more outcomes is called a compound event.

Rules:
1. The probability of any event will always be from 0 to 1
2. When an event cannot occur (impossible event), the probability will be 0
3. When an event is certain to occur, the probability is 1
4. The sum of the probabilities of all the outcomes in the sample space is 1
5. The probability that an event will not occur = (1 - probability that event will occur)

Sample space can be represented in two ways: tree diagrams and tables. A tree diagram can be used to determine the outcomes of a probability experiment. A tree diagram consists of branches corresponding to the outcomes of two or more probability experiments that are done in sequence. Sample spaces can also be represented using tables. For example, the outcomes when selecting a card from an ordinary deck can be represented by a table. When two dice are rolled, the 36 outcomes can be represented by using a table. Once a sample space is found, probabilities can be computed for specific events.

Addition Rules

Many times in probability, it is necessary to find the probability of two or more events occurring. In these cases, the addition rules are used.

When the events are mutually exclusive, they have no outcome in common.
P (A or B) = P (A) + P (B)

When the two events are not mutually exclusive, they have some common outcomes.
P (A or B) = P (A) + P (B) - P (A and B)

The key word in these problems is "or", and it means add or union.
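Both addition rules can be verified by enumerating the 36-outcome sample space for two dice. The sketch below is illustrative only; the helper `prob` and the chosen events (a double, a sum of 8, a sum of 7) are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))


def is_double(o):
    return o[0] == o[1]


def sum_is_8(o):
    return o[0] + o[1] == 8


def sum_is_7(o):
    return o[0] + o[1] == 7


# Not mutually exclusive: (4, 4) is both a double and a sum of 8.
p_or = prob(lambda o: is_double(o) or sum_is_8(o))
p_rule = prob(is_double) + prob(sum_is_8) - prob(lambda o: is_double(o) and sum_is_8(o))
print(p_or, p_rule)

# Mutually exclusive: a double always has an even sum, so it can never total 7.
p_or_excl = prob(lambda o: is_double(o) or sum_is_7(o))
p_rule_excl = prob(is_double) + prob(sum_is_7)
print(p_or_excl, p_rule_excl)
```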

Multiplication Rules –
When two events occur in sequence, the probability that both events occur can be found by using the multiplication rules. When two events are independent, the probability that the first event occurs does not affect or change the probability of the second event occurring.


P (A and B) = P (A) . P (B)
If the events are dependent, the probability of the second event occurring is changed after the first event occurs.
P (A and B) = P (A) . P (B|A), where P (B|A) = P (A and B) / P (A)
P (B|A) is also known as conditional probability.
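A small sketch of both multiplication rules, assuming a standard 52-card deck with 4 aces. Drawing with replacement is used as the independent case and drawing without replacement as the dependent case. The function names are illustrative only.

```python
from fractions import Fraction

def p_and_independent(p_a, p_b):
    """P(A and B) = P(A) . P(B) for independent events."""
    return p_a * p_b

def p_and_dependent(p_a, p_b_given_a):
    """P(A and B) = P(A) . P(B|A) for dependent events."""
    return p_a * p_b_given_a

p_first_ace = Fraction(4, 52)

# Card replaced before the second draw: the second draw is unaffected.
with_replacement = p_and_independent(p_first_ace, Fraction(4, 52))

# Card not replaced: only 3 aces remain among 51 cards.
without_replacement = p_and_dependent(p_first_ace, Fraction(3, 51))

# Recover the conditional probability from P(A and B) / P(A).
p_second_given_first = without_replacement / p_first_ace

print(with_replacement, without_replacement, p_second_given_first)
```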

The key word for the multiplication rule is “and”, and it means intersection.

Conditional Probability –
Conditional probability is used when additional information is known about the probability of an event.

Odds and Expectations –
Odds are used to determine the payoffs in gambling games. Odds are computed from probabilities; however, probabilities can be computed from odds if the true odds are known.
Odds in favor = P (E) / (1 – P (E))
Odds against = (1 – P (E)) / P (E)
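The conversion between probabilities and odds can be sketched as below, assuming the usual definitions (odds in favor as P (E) to 1 – P (E), odds against as the reverse). The helper names are hypothetical, and a probability of exactly 1 is not handled.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Return odds in favor as a reduced pair (a, b), read 'a to b'."""
    ratio = Fraction(p) / (1 - Fraction(p))
    return ratio.numerator, ratio.denominator

def odds_against(p):
    """Odds against are the odds in favor reversed."""
    a, b = odds_in_favor(p)
    return b, a

def probability_from_odds(a, b):
    """Probability of an event whose odds in favor are a to b."""
    return Fraction(a, a + b)

p_six = Fraction(1, 6)               # rolling a six with one die
print(odds_in_favor(p_six))          # odds in favor
print(odds_against(p_six))           # odds against
print(probability_from_odds(1, 5))   # back to the probability
```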

Expected Value –
Mathematical expectation can be thought of as a long-term average. If a game is played many times, the average of the outcomes or the payouts can be computed using mathematical expectation.
E(x) = ∑ x . P (x)

In order to determine the number of outcomes or events, the fundamental counting rule, the permutation rules and the combination rule can be used. The difference between a permutation and a combination is that for a permutation, the order or arrangement of the objects is important. For example, order is important in phone numbers, identification tags, social security numbers, license plates, dictionary entries etc. For a combination, order is not important, as when selecting objects from a group.

There are three types of probability:
• Classical probability uses sample spaces. A sample space is the set of outcomes of a probability experiment. Classical probability is defined as the number of ways (outcomes) the event can occur divided by the total number of outcomes in the sample space.
• Empirical probability uses frequency distributions, and it is defined as the frequency of an event divided by the sum of the frequencies.
• Subjective probability is based on a person’s knowledge of the situation and is basically an educated guess as to the chance of the event occurring.
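The expectation formula and the permutation/combination distinction can be illustrated as follows. The game (win Rs 6 on a six, lose Rs 1 otherwise) and the choice of 3 objects from 5 are hypothetical examples, not taken from the notes.

```python
from fractions import Fraction
from math import comb, perm

def expected_value(outcomes):
    """E(x) = sum of x * P(x) over (value, probability) pairs."""
    return sum(value * p for value, p in outcomes)

# Hypothetical game: win Rs 6 on a six, lose Rs 1 on any other face.
game = [(6, Fraction(1, 6)), (-1, Fraction(5, 6))]
long_run_average = expected_value(game)

# Choosing 3 objects out of 5:
arrangements = perm(5, 3)   # order matters (permutation)
selections = comb(5, 3)     # order does not matter (combination)

print(long_run_average, arrangements, selections)
```

Each selection of 3 objects can be arranged in 3! = 6 ways, which is why the permutation count is 6 times the combination count.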

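Bayes’ theorem –
$P(A|B) = \frac{P(A) \, P(B|A)}{P(A) \, P(B|A) + P(A') \, P(B|A')}$, where A' is the event that A does not occur.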
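As a sketch of how Bayes’ theorem is applied, the function below weights each prior by its likelihood and divides by the total. The two machines and their defect rates are hypothetical numbers chosen only for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem: P(H|evidence) for each hypothesis H."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())          # P(evidence)
    return {h: j / total for h, j in joint.items()}

# Hypothetical: machine X makes 60% of items with 2% defective,
# machine Y makes 40% of items with 5% defective.
priors = {"X": Fraction(3, 5), "Y": Fraction(2, 5)}
defect_rate = {"X": Fraction(1, 50), "Y": Fraction(1, 20)}

result = posterior(priors, defect_rate)
print(result)   # probability a defective item came from each machine
```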


Probability Distributions –

1. Uniform Distribution – A distribution is said to be uniform if the probability of the variable is equal for all values in the given interval. For example, suppose people arrive at a railway station uniformly over time and a train leaves every 5 minutes. What is the probability that a person arriving at the station will have to wait less than a minute? Arrivals are uniform over the 5-minute interval, so one in five persons arrives during the last minute before departure, and hence probability = 1/5 = 0.2

2. Binomial Distribution –
• Each trial can have only two outcomes
• There are a fixed number of trials
• The outcome of each trial is independent of the others
• The probability of an outcome must be the same for each trial
• $P(r) = {}^{n}C_{r} \, p^{r} (1-p)^{n-r}$, where n is number of trials, r is number of successes, p is probability of success
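The binomial formula can be sketched as a short function, assuming the standard form nCr . p^r . (1 – p)^(n – r). The example of 5 fair coin tosses is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, Fraction(1, 2)                 # hypothetical: 5 fair coin tosses
p_two_heads = binomial_pmf(2, n, p)

# The probabilities over r = 0..n cover the whole sample space.
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))

print(p_two_heads, total)
```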

3. Poisson Distribution –
• It is used when the variable occurs over a period of time, or over an area or volume
• $P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, where e is the mathematical constant, λ is the mean or expected value and x is the number of successes
• Mean λ = np and variance = np
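A minimal numerical check of the Poisson formula, assuming λ = np. The values n = 1000 and p = 0.002 are hypothetical. Summing over the first 100 values of x is enough here because the remaining tail is negligible for λ = 2.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.002        # hypothetical number of trials and success rate
lam = n * p               # mean of the distribution

xs = range(100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum(x * x * poisson_pmf(x, lam) for x in xs) - mean ** 2

print(round(mean, 6), round(variance, 6))   # both should be close to lam
```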


4. Normal Distribution –
• In a standard normal distribution, mean is 0 and variance is 1
• If $z = \frac{x - \mu}{\sigma}$, the standard normal values are called z scores
• It is bell shaped, symmetric about the mean, continuous and asymptotic to the axis
• Area under the curve is 1
• The mean, median and mode are at the centre of the distribution
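The z score and the area under the curve can be sketched with the standard library's `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are hypothetical values, not taken from the problems.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

mean, sd = 100, 15                      # hypothetical population values
z = z_score(130, mean, sd)

# Area to the right of z under the standard normal curve.
p_above = 1 - NormalDist().cdf(z)

# Same area without standardising first.
p_direct = 1 - NormalDist(mean, sd).cdf(130)

print(z, round(p_above, 4))
```

Standardising and then using the standard normal curve gives the same area as working with the original mean and standard deviation, which is why a single z table is sufficient.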


Problems

1. When a die is rolled, what is the probability of getting a number greater than 4?
2. Two dice are rolled. The probability that the sum of spots on the faces will be ‘8’ is?
3. When two coins are tossed, the probability of getting two tails is?
4. When a card is selected from a standard pack, the probability that it is a ‘9’ is?
5. When a card is selected from a standard pack, the probability that it is a diamond or a number card is?
6. In a survey of 180 people, 7s are over 60. If a person is selected at random, what is the probability that the person is over 60?
7. If a letter is selected at random from the word “PROBABILITY”, the probability that it is a vowel is?
8. In a box, there are 6 white marbles, 3 blue marbles and 1 red marble. If a marble is selected at random, what is the probability that it is not white?
9. In a sample of 10 pieces, 4 are defective. If 3 are selected at random and tested, what is the probability that they are not defective?
10. How many different 3 digit codes can be made?
11. If 30% of commuters ride to work on a bus, find the probability that if 8 workers are selected at random, 3 will ride the bus.
12. A survey found that 10% of older people have given up driving. If a sample of 1000 persons is taken, the standard deviation of the sample will be?
13. A board of directors consists of 7 women and 5 men. If 4 directors are selected at random, the probability that exactly 2 directors are men is?
14. The probability that there will be a car accident on a particular road is 0.01. The number of accidents follows a Poisson distribution. If there are 500 cars on the road on a particular day, find the probability that there will be exactly 4 accidents.
15. About 5% of rabbits are brown in color. If the distribution is Poisson, find the probability that in 100 randomly selected rabbits, 7 rabbits are brown in color.
16. In an exam (which is approximately normally distributed), the average marks were 200 and the variance was 400. If a person who took the exam was selected at random, find the probability that the person scores above 230.
17. The average height for adult kangaroos is 64 inches with a variance of 4 inches. Assume normal distribution. If a kangaroo is selected at random, find the probability that its height is between 62 and 66.8 inches.
18. Box 1 contains 2 red balls and 1 blue ball. Box 2 contains 1 red ball and 3 blue balls. One of the two boxes is selected at random and a ball is selected from that box at random. If the ball is red, find the probability that it came from box 1.
19. Two manufacturers supply paper cups to a certain catering service. ‘A’ supplied 100 cups and 5 were damaged. ‘B’ supplied 50 cups and 3 were damaged. If a cup is damaged, find the probability that it came from ‘A’.
20. A street vendor must pay a fine of Rs 50 if caught by the city inspector. Otherwise, the vendor can make Rs 100 at Main Road or Rs 75 at Cross Road. Construct a payoff table, determine the optimal strategy for both locations, and find the value of the game.


HYPOTHESIS TESTING

Procedure in Hypothesis Testing –
1. Formulate a hypothesis
2. Set up a suitable significance level
3. Select the test criterion
4. Compute the statistic
5. Make the decision

               H0 is True          H0 is False
H0 Accepted    Correct decision    Type II error (β)
H0 Rejected    Type I error (α)    Correct decision

Explanations –
• Parameter – Statistical measure based on all units of a population
• Statistic – Statistical measure based on all units of a sample
• Sampling distribution – Distribution of a statistic
• Standard error – Standard deviation of the sampling distribution of the statistic
• Confidence interval – An interval that is expected to include the true value of the parameter with the desired level of confidence
• Significance level (α) – It indicates the percentage of sample data outside certain limits. It is also the probability of committing a type I error
• Acceptance region – Region complementary to the critical region
• Critical region – Rejection region
• One tail test – A hypothesis test with one rejection region.
o Right tail test – H0: µ = µ0 and H1: µ > µ0, or H0: µ ≤ µ0 and H1: µ > µ0
o Left tail test – H0: µ = µ0 and H1: µ < µ0, or H0: µ ≥ µ0 and H1: µ < µ0
• Two tail test – A hypothesis test with two rejection regions. H0: µ = µ0 and H1: µ ≠ µ0
• Null hypothesis (H0) – The hypothesis which is tested for possible rejection under the assumption that it is true. It is also known as the hypothesis of no difference.
• Alternate hypothesis (H1) – A hypothesis which contradicts the null hypothesis. It decides whether the test has to be a one tailed test or a two tailed test
• Type I error – Rejecting a hypothesis when it is true. It is also known as rejecting a good lot or producer’s risk
• Type II error – Accepting a hypothesis when it is false. It is also known as accepting a bad lot or consumer’s risk


Non-Parametric Tests

• K-S test for goodness of fit of one sample (Kolmogorov-Smirnov)
o Find the cumulative frequency of the observed values
o Convert to percentage
o Find the expected values and convert to percentage
o Find the difference between observed and expected values
o The maximum difference value is called the D value
o Degree of freedom is the number of observations
o Compare with the table value of D at those degrees of freedom
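The first five K-S steps above can be sketched as below. The sketch works with cumulative proportions rather than percentages (the same numbers divided by 100), and the observed counts are hypothetical. The comparison with the table value of D is left out.

```python
from itertools import accumulate

def ks_d_value(observed, expected):
    """Largest gap between cumulative observed and expected proportions."""
    cum_obs = [c / sum(observed) for c in accumulate(observed)]
    cum_exp = [c / sum(expected) for c in accumulate(expected)]
    return max(abs(o - e) for o, e in zip(cum_obs, cum_exp))

observed = [10, 20, 30, 25, 15]   # hypothetical counts for 5 categories
expected = [20] * 5               # no difference between categories

d_value = ks_d_value(observed, expected)
print(round(d_value, 4))
```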

• U Test (Mann-Whitney Test for Equality of two means)

$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$ or $U = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$, whichever is lesser

$\mu = \frac{n_1 n_2}{2}$, $\sigma^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$, $Z = \frac{U - \mu}{\sigma}$

Where Ri is sum of ranks of each group and ni = number of observations in each group

• H Test (Kruskal Wallis Rank Sum Test for Equality of several means)

$H = \frac{12}{n(n + 1)} \sum \frac{R_i^2}{n_i} - 3(n + 1)$

Where n = total number of observations, Ri = group sum of ranks
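Both rank tests start from the rank sums Ri of the pooled data, so they can share one helper. The sketch below assumes tied values receive the average of their ranks (a common convention that the notes do not state) and uses small hypothetical samples.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # positions are 0-based
        i = j + 1
    return ranks

def rank_sums(groups):
    """Sum of pooled ranks (Ri) for each group."""
    ranks = average_ranks([v for g in groups for v in g])
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums

def mann_whitney(a, b):
    """Return (U, Z) using the formulas for U, mu and sigma."""
    n1, n2 = len(a), len(b)
    r1, r2 = rank_sums([a, b])
    u = min(n1 * n2 + n1 * (n1 + 1) / 2 - r1,
            n1 * n2 + n2 * (n2 + 1) / 2 - r2)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu) / sigma

def kruskal_wallis(groups):
    """Return H for several groups."""
    n = sum(len(g) for g in groups)
    sums = rank_sums(groups)
    return (12 / (n * (n + 1))
            * sum(r * r / len(g) for r, g in zip(sums, groups))
            - 3 * (n + 1))

u, z = mann_whitney([1, 2, 3], [4, 5, 6])            # hypothetical samples
h = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(u, round(z, 3), round(h, 3))
```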


PROBLEMS –

1. A company surveyed 100 respondents to know about the importance of computers in their life. The respondents indicated as follows. Use the Kolmogorov-Smirnov test (K-S test) to test the hypothesis that there is no difference in ratings amongst the respondents.

Rating                               Respondents
Very Important                       25
Somewhat Important                   30
Neither Important nor Unimportant    10
Somewhat Unimportant                 20
Very Unimportant                     15
Total Respondents                    100

2. The following data indicates the lifetime (in hours) of samples of two kinds of light bulbs in continuous use. Use the Mann-Whitney U test to compare the lifetimes of brand A and brand B light bulbs.

Brand A Brand B 603 620 625 640 641 646 622 620 585 652 593 639 660 590 600 646 633 631 580 669 615 610 648 619

3. A company used three different methods of advertising its product in three cities. It found the increased sales in identical retail outlets in the three cities to be as follows. Use the Kruskal-Wallis method (H test) to test the hypothesis that the increase in sales using different methods in different cities is the same at 5% level of significance.

Chennai Mumbai Kolkata 70 65 53 58 57 59 60 48 71 45 55 70 55 75 63 62 68 60 89 45 58 72 52 75 63


Chi-Square Test

• Chi square distribution for goodness of fit –

$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$

Where Fo = Observed Frequency, Fe = Expected Frequency
DF (degrees of freedom) = (k-1) where k is number of classes

• Chi square distribution for independence of attributes –

$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$, with $F_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$

Where Fo = Observed Frequency, Fe = Expected Frequency
DF (degrees of freedom) = (r-1)(c-1) where r is number of rows and c is number of columns
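Both chi-square statistics can be computed in a few lines. This is a sketch on hypothetical data: the goodness-of-fit helper assumes a uniform expected distribution, and the comparison with the table value of chi-square is not shown.

```python
def chi_square(observed, expected):
    """Sum of (Fo - Fe)^2 / Fe over all classes."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def goodness_of_fit(observed):
    """Statistic and DF against equal expected frequencies (assumption)."""
    fe = sum(observed) / len(observed)
    return chi_square(observed, [fe] * len(observed)), len(observed) - 1

def independence(table):
    """Statistic and DF with Fe = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = sum(
        (fo - r * c / grand) ** 2 / (r * c / grand)
        for row, r in zip(table, row_totals)
        for fo, c in zip(row, col_totals)
    )
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)

gof_stat, gof_df = goodness_of_fit([18, 22, 20, 25, 15])   # hypothetical
ind_stat, ind_df = independence([[10, 20], [30, 40]])      # hypothetical
print(round(gof_stat, 3), gof_df, round(ind_stat, 3), ind_df)
```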


PROBLEMS –

Test for goodness of fit –
1. The following table gives the average number of calls received by an operator on various days of the week in a call centre. Find out whether the calls are uniformly distributed over the week.

Days               Monday   Tuesday   Wednesday   Thursday   Friday
Number of calls    124      120       126         134        146

Test for independence of attributes –
2. The following information is obtained concerning 50 randomly selected students. Can it be inferred that availing of loans is more common among boys?

Educational Loan    Boys   Girls   Total
Taken               14     8       22
Not taken           16     12      28
Total               30     20      50




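• Z test for one sample mean –

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Where $\frac{\sigma}{\sqrt{n}}$ is the standard error. If ‘σ’ is not given, we can use ‘s’.

• Z test for difference between means –

$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Where $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is the standard error and H0: µ1 – µ2 = 0. If σ is not known, we can estimate σ by the formula

$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$

• T test for one sample mean –

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$. Where standard deviation is given directly, use formula $t = \frac{\bar{x} - \mu}{SD / \sqrt{n - 1}}$

Degrees of freedom = n-1

• T test for difference between means –

$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Where $s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$ and n1+n2-2 = degrees of freedom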
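The test statistics above translate directly into small functions. The sketch assumes the reconstructed formulas (standard error σ/√n, and the pooled s with n1 + n2 – 2 in the denominator). All sample figures are hypothetical.

```python
from math import sqrt

def z_one_sample(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

def z_two_sample(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Z for the difference between two means under H0: mu1 - mu2 = 0."""
    return (xbar1 - xbar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def t_one_sample(xbar, mu, s, n):
    """Return (t, degrees of freedom) for one sample mean."""
    return (xbar - mu) / (s / sqrt(n)), n - 1

def t_two_sample(xbar1, xbar2, s1, s2, n1, n2):
    """Return (t, degrees of freedom) using the pooled s."""
    df = n1 + n2 - 2
    s = sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / df)
    return (xbar1 - xbar2) / (s * sqrt(1 / n1 + 1 / n2)), df

# Hypothetical samples.
z1 = z_one_sample(52, 50, 10, 100)
z2 = z_two_sample(10, 8, 2, 2, 10, 10)
t2, df2 = t_two_sample(10, 8, 2, 2, 10, 10)
print(round(z1, 3), round(z2, 3), round(t2, 3), df2)
```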


ANOVA

1. The following table gives the retail prices of a certain commodity in some selected shops in four cities as below. Can we say the prices of the commodities differ in the four cities?

City Chennai Mumbai Delhi Kolkata

Prices 11 7 9 8 7 9 4 12 10 11 7 12 3 8 2 8

2. The sales of 4 salesmen - A, B, C & D of the Company Sellers in three seasons are given below. Can we conclude that overall sales are dependent on seasons? Are the four salesmen equally effective?

Season/Salesman    A    B    C     D
Summer             6    4    8     6
Winter             7    6    6     9
Monsoon            8    5    10    9


DECISION THEORY

DECISION UNDER UNCERTAINTY

1. A retailer has space for up to 4 Kgs of tomato in his store. The cost per Kg is Rs 30 and the selling price per Kg is Rs 50. Any units not sold at the end of the day are wasted. He sells in Kgs only. Construct a payoff and opportunity loss table.

2. A newspaper vendor can stock up to 10 newspapers in his store. There is a guaranteed demand for 5 newspapers. Each newspaper costs Rs 2 per unit and is sold for Rs 4. Unsold newspapers are disposed of for Rs 1 per unit. Construct a payoff and opportunity loss table.

3. A food product company is contemplating three strategies: introducing a new product to replace an existing product at a higher price (S1), modifying the existing product at a moderately increased price (S2), and continuing the same product with new packaging at a nominally increased price (S3). Sales may increase (E1), not change at all (E2) or decrease (E3) with respect to these strategies. The profits estimated by the marketing department for each of these strategies are given below –

        E1         E2         E3
S1      700,000    300,000    150,000
S2      500,000    450,000    0
S3      300,000    300,000    300,000

What strategy should the company choose on the basis of the Maximin criterion, Maximax criterion, Minimax Regret criterion, Laplace criterion and Hurwicz criterion (α=0.8)?
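One way to organise the five criteria is to score each strategy and pick the best score. The sketch below uses the payoff table from problem 3 and assumes the usual textbook definitions. In particular, α in the Hurwicz criterion is treated as the weight on the best payoff, which is a convention the notes do not spell out.

```python
# Payoffs in Rs for strategies S1..S3 under events E1, E2, E3 (problem 3).
payoff = {
    "S1": [700_000, 300_000, 150_000],
    "S2": [500_000, 450_000, 0],
    "S3": [300_000, 300_000, 300_000],
}
ALPHA = 0.8   # assumed to be the coefficient of optimism

def best(score):
    """Strategy with the highest score."""
    return max(payoff, key=score)

maximin = best(lambda s: min(payoff[s]))          # best of the worst payoffs
maximax = best(lambda s: max(payoff[s]))          # best of the best payoffs
laplace = best(lambda s: sum(payoff[s]) / len(payoff[s]))   # equal weights
hurwicz = best(lambda s: ALPHA * max(payoff[s]) + (1 - ALPHA) * min(payoff[s]))

# Regret = best payoff for that event minus the payoff actually obtained.
col_max = [max(col) for col in zip(*payoff.values())]

def max_regret(s):
    return max(m - v for m, v in zip(col_max, payoff[s]))

minimax_regret = min(payoff, key=max_regret)

print(maximin, maximax, laplace, hurwicz, minimax_regret)
```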

DECISION UNDER RISK

4. A milk producer needs to determine how many litres of milk are to be produced on a daily basis to meet demand. Milk is sold in multiples of 5 litres only and there is an assured demand for 15 litres every day. Milk costs Rs 14 per litre and is sold at Rs 20 per litre. Unsold milk is disposed of. Past records of 200 days show the following demand pattern –

Milk (Litres)    15    20    25    30    35    40    45
No. of days      4     16    20    80    40    30    10

Construct a conditional profit table, identify the best course of action for maximum expected profits and calculate EVPI.
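The conditional profit table, expected profits and EVPI for problem 4 can be sketched as follows. The sketch assumes that unsold milk has no salvage value and that probabilities are the relative frequencies over the 200 days. Variable names are illustrative.

```python
from fractions import Fraction

COST, PRICE = 14, 20                      # Rs per litre, from problem 4
demand_days = {15: 4, 20: 16, 25: 20, 30: 80, 35: 40, 40: 30, 45: 10}
total_days = sum(demand_days.values())
prob = {d: Fraction(n, total_days) for d, n in demand_days.items()}

def profit(stock, demand):
    """Conditional profit; unsold milk is assumed to be worth nothing."""
    return PRICE * min(stock, demand) - COST * stock

levels = sorted(demand_days)

# Conditional profit table: rows are stock levels, columns are demands.
table = {s: {d: profit(s, d) for d in levels} for s in levels}

# Expected profit for each stock level.
emv = {s: sum(prob[d] * table[s][d] for d in levels) for s in levels}
best_stock = max(emv, key=emv.get)

# Expected profit with perfect information: stock exactly what is demanded.
eppi = sum(prob[d] * profit(d, d) for d in levels)
evpi = eppi - emv[best_stock]

print(best_stock, float(emv[best_stock]), float(evpi))
```

EVPI is the difference between the expected profit with perfect information and the best expected profit without it.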
