Meaning of Statistics
The term statistics refers both to numerical statements and to statistical methodology. When it is used in the sense of statistical data, it refers to the quantitative aspects of things and is a numerical description. Examples: the income of a family, the production of the automobile industry, the sales of cars. These quantities are numerical. But there are some quantities which are not in themselves numerical but can be made so by counting. The sex of a baby is not a number, but by counting the number of boys we can associate a numerical description with the sex of all newborn babies, for example by saying that 60% of all live-born babies are boys. This information then comes within the realm of statistics.
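As a small illustration of turning a non-numerical attribute into a numerical description by counting, the sketch below tallies a made-up list of recorded sexes. The data and variable names are hypothetical, chosen only so that the proportion comes out at the 60% quoted above.

```python
from collections import Counter

# Hypothetical records: the sex of ten live-born babies (illustrative data only).
births = ["boy", "girl", "boy", "boy", "girl", "boy", "girl", "boy", "boy", "girl"]

counts = Counter(births)                    # count each category
share_boys = counts["boy"] / len(births)    # proportion of boys

print(f"{counts['boy']} of {len(births)} babies are boys ({share_boys:.0%})")
```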
Definition
The word statistics can be used in two senses, singular and plural. In the narrow, plural sense, statistics denotes numerical data (statistical data). In the wide, singular sense, statistics refers to statistical methods. The definitions have therefore been grouped under two heads: "Statistics as data" and "Statistics as methods".
Statistics as Data
Some definitions of statistics as data are:
a) "Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." - A.L. Bowley
b) "By statistics we mean quantitative data affected to a marked extent by multiplicity of causes." - Yule and Kendall
c) "By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other." - H. Secrist
Secrist's definition is more comprehensive and exhaustive. It throws more light on the characteristics of statistics and covers different aspects. The characteristics that statistics should possess according to H. Secrist can be listed as follows.
1. Statistics are aggregates of facts.
2. Statistics are affected to a marked extent by multiplicity of causes.
3. Statistics are numerically expressed.
4. Statistics should be enumerated or estimated.
5. Statistics should be collected with a reasonable standard of accuracy.
6. Statistics should be placed in relation to each other.
Statistics as Methods
Definitions:
a) "Statistics may be called the science of counting." - A.L. Bowley
b) "Statistics is the science of estimates and probabilities." - Boddington
c) Croxton and Cowden have given a clear and concise definition: "Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data."
Croxton and Cowden's definition names four stages: collection, presentation, analysis and interpretation. Organization of the data, which comes between collection and presentation, is described here as a separate stage.
a) Collection of data
The structure of a statistical investigation is based on a systematic collection of data. The data are classified into two groups: internal data and external data. Internal data are obtained from internal records related to the operations of a business organisation, such as production, sources of income and expenditure, inventory, purchases and accounts. External data are collected by, or purchased from, external agencies. External data can be either primary or secondary. Primary data are original and collected for the first time, while secondary data have already been collected and published by some agency.
b) Organization of data
The collected data are a large mass of figures that needs to be organized. The collected data must be edited to rectify any omissions, irrelevant answers and wrong computations. The edited data must then be classified and tabulated to suit further analysis.
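The editing, classifying and tabulating steps can be sketched in a few lines of Python. The raw responses below are hypothetical, and treating anything that is not a non-negative whole number as an invalid answer is an assumption made only for this example.

```python
from collections import Counter

# Hypothetical raw survey answers: number of children per family.
raw_answers = ["2", "3", "", "1", "three", "2", "0", "-1", "3", "2"]

def edit(answers):
    """Editing: keep only answers that are non-negative whole numbers (assumed rule)."""
    return [int(a) for a in answers if a.isdigit()]

def tabulate(values):
    """Classification and tabulation: frequency of each distinct value, in order."""
    return sorted(Counter(values).items())

clean = edit(raw_answers)
table = tabulate(clean)

print("X  f")
for value, frequency in table:
    print(f"{value}  {frequency}")
```

The blank, the word "three" and the negative figure are dropped at the editing step, and the remaining seven answers are condensed into a frequency table.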
c) Presentation of data
Large collections of data cannot be understood and analysed easily and quickly. Therefore, the collected data need to be presented in tabular or graphic form. This systematic ordering and graphical presentation helps further analysis.
d) Analysis of data
Analysis requires establishing the relationship between one or more variables. Analysis of data includes condensation, abstraction, summarization, conclusion etc. It can be done with the help of statistical tools and techniques such as measures of central tendency, measures of dispersion, correlation and analysis of variance.
e) Interpretation of data
Interpretation requires deep insight into the subject. It involves drawing valid conclusions on the basis of the analysis of the data. This work requires good experience and skill. This stage is very important, as the conclusions are drawn on the basis of the interpretation.
Following Seligman, we can define statistics as follows: "Statistics is a science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw light on any sphere of enquiry."
Importance of Statistics
Bowley's Skewness Coefficient
An alternative measure of skewness was proposed by the late Professor Bowley. Bowley's measure is based on quartiles. In a symmetrical distribution the first and third quartiles are equidistant from the median, as can be seen from the following diagram. That is, in a symmetrical distribution the third quartile is the same distance above the median as the first quartile is below it:
$$Q_3 - \text{Med.} = \text{Med.} - Q_1 \quad \text{or} \quad Q_3 + Q_1 - 2\,\text{Med.} = 0$$
If the distribution is positively skewed, the top 25 per cent of the values will tend to be farther from the median than the bottom 25 per cent, i.e. $Q_3$ will be farther from the median than $Q_1$ is from the median, and the reverse holds for negative skewness. Hence a possible measure is
$$Sk_B = \frac{(Q_3 - \text{Med.}) - (\text{Med.} - Q_1)}{(Q_3 - \text{Med.}) + (\text{Med.} - Q_1)} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$
where $Sk_B$ = Bowley's coefficient of skewness.
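A minimal Python sketch of Bowley's coefficient, written directly from the quartile formula above. The function name `bowley_skewness` and the sample quartile values are illustrative assumptions, not part of the original notes.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Med) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 coincide; the coefficient is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Symmetrical case: quartiles equidistant from the median.
print(bowley_skewness(q1=20, median=30, q3=40))   # 0.0

# Positively skewed case: Q3 lies farther from the median than Q1.
print(bowley_skewness(q1=20, median=25, q3=40))   # 0.5
```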
It must be remembered that the results obtained by Bowley's measure and by Karl Pearson's measure are not to be compared with one another. The numerical values are not related to one another, since Bowley's measure, because of its computational basis, is limited to values between -1 and +1, while Pearson's measure has no such limits. Not only do the numerical values obtained from the two formulae bear no necessary relationship to one another but, on rare occasions, with unusually shaped distributions, it is possible for them to emerge with opposite signs.
Illustration: Find Bowley's coefficient of skewness for the following frequency distribution:
No. of children per family   No. of families
0 7
1 10
2 16
3 25
4 18
5 11
6 8
Solution: Calculation of Bowley's coefficient of skewness
No. of children per family (X)   No. of families (f)   c.f.
0                                7                     7
1                                10                    17
2                                16                    33
3                                25                    58
4                                18                    76
5                                11                    87
6                                8                     95
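$$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$
Here $N = 95$. Each item is located in the first class whose cumulative frequency reaches its position.
$Q_1$ = size of the $\frac{N+1}{4}$th item = $\frac{96}{4}$ = 24th item; hence $Q_1 = 2$.
$Q_3$ = size of the $\frac{3(N+1)}{4}$th item = $\frac{3 \times 96}{4}$ = 72nd item; hence $Q_3 = 4$.
Med. = size of the $\frac{N+1}{2}$th item = $\frac{96}{2}$ = 48th item; hence median = 3.
$$Sk_B = \frac{4 + 2 - 2(3)}{4 - 2} = \frac{0}{2} = 0$$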
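The illustration can be checked with a short Python sketch. It locates each quartile as the first value whose cumulative frequency reaches the required item position, which is the rule the solution relies on. The helper name `item_value` is an assumption made for this example.

```python
from itertools import accumulate

children = [0, 1, 2, 3, 4, 5, 6]          # X: number of children per family
families = [7, 10, 16, 25, 18, 11, 8]     # f: number of families

cum_freq = list(accumulate(families))     # c.f. column
n = cum_freq[-1]                          # total number of families

def item_value(position):
    """Return the X value of the item at the given (1-based) position."""
    for x, cf in zip(children, cum_freq):
        if cf >= position:
            return x
    raise ValueError("position exceeds the number of items")

q1 = item_value((n + 1) / 4)              # (N+1)/4 th item
median = item_value((n + 1) / 2)          # (N+1)/2 th item
q3 = item_value(3 * (n + 1) / 4)          # 3(N+1)/4 th item

sk_b = (q3 + q1 - 2 * median) / (q3 - q1)
print(cum_freq, q1, median, q3, sk_b)
```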
Karl Pearson's Coefficient of Skewness
This method of measuring skewness, also known as the Pearsonian coefficient of skewness, was suggested by Karl Pearson, a great British biometrician and statistician. It is based upon the difference between the mean and the mode; this difference is divided by the standard deviation to give a relative measure. The formula thus becomes:
$$Sk_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
where $Sk_p$ = Karl Pearson's coefficient of skewness.
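A minimal sketch of this mean-minus-mode formula in Python, using the standard library's `statistics` module. The sample values are made up for illustration, and the population standard deviation (`pstdev`) is used as an assumption, since the text does not say which form of the standard deviation is intended.

```python
import statistics

def pearson_skewness(values):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)        # assumes a single, well-defined mode
    sd = statistics.pstdev(values)        # population standard deviation (assumed)
    return (mean - mode) / sd

# Hypothetical data with a long right tail, so the mean exceeds the mode.
data = [2, 3, 3, 3, 4, 4, 5, 6, 9, 11]
print(round(pearson_skewness(data), 3))
```

For this sample the mean is 5 and the mode is 3, so the coefficient is positive, in line with the right-hand tail of the data.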
In theory there is no limit to this measure, which is a slight drawback. In practice, however, the value given by this formula is rarely very high and usually lies between ±1. When a distribution is symmetrical, the values of the mean, median and mode coincide and, therefore, the coefficient of skewness is zero. When a distribution is positively skewed, the coefficient of skewness has a plus sign, and when it is negatively skewed, the coefficient has a minus sign. The degree of skewness is given by the numerical value, say 0.8 or 0.2. Thus this formula gives both the direction and the extent of skewness.
The above method of measuring skewness cannot be used where the mode is ill defined; however, in a moderately skewed distribution the averages have the following relationship:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
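Therefore, if this value of the mode is substituted in the above formula, we arrive at another formula for skewness:
$$Sk_p = \frac{\bar{X} - (3\,\text{Med.} - 2\bar{X})}{\sigma} = \frac{\bar{X} - 3\,\text{Med.} + 2\bar{X}}{\sigma} = \frac{3(\bar{X} - \text{Med.})}{\sigma}$$
where $\bar{X}$ denotes the mean and $\sigma$ the standard deviation.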
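The substitution can be verified numerically. The sketch below computes the coefficient both ways, once with the mode estimated as 3 median - 2 mean and once with the simplified median-based form. The function names and the summary figures are hypothetical.

```python
import math

def skewness_via_empirical_mode(mean, median, sd):
    """(mean - mode) / sd with the mode estimated as 3*median - 2*mean."""
    mode = 3 * median - 2 * mean
    return (mean - mode) / sd

def skewness_via_median(mean, median, sd):
    """Simplified form: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

# Hypothetical summary figures for a moderately skewed distribution.
mean, median, sd = 52.0, 50.0, 8.0

a = skewness_via_empirical_mode(mean, median, sd)
b = skewness_via_median(mean, median, sd)
print(a, b, math.isclose(a, b))   # both forms give 0.75
```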
Theoretically the value of this coefficient varies between ±3; however, in practice it is rare for the coefficient of skewness obtained by the above method to exceed ±1.
Illustration: Calculate Karl Pearson's coefficient of skewness from the following data:
Profits ($ 0.1 million) 70-80 80-90 90-100 100-110