Master of Business Administration - MBA Semester 1
MB0040 – Statistics for Management - 4 Credits (Book ID: B1129)
Assignment Set - 1 (60 Marks)

Q1. Define "Statistics". What are the functions of Statistics? Distinguish between Primary data and Secondary data.

Ans. Statistics as a discipline is considered indispensable in almost all spheres of human knowledge. There is hardly any branch of study which does not use statistics. Scientific, social and economic studies use statistics in one form or another. These disciplines make use of observations, facts and figures, enquiries and experiments etc. using statistics and statistical methods. Statistics studies almost all aspects of an enquiry. It mainly aims at simplifying the complexity of information collected in an enquiry. It presents data in a simplified form so as to make them intelligible. It analyses data and facilitates the drawing of conclusions. Some of the important functions of statistics are discussed below.

Presents facts in simple form: Statistics presents facts and figures in a definite form. That makes a statement more logical and convincing than mere description. It condenses a whole mass of figures into a single figure, which makes the problem intelligible.

Reduces the complexity of data: Statistics simplifies the complexity of data. Raw data are unintelligible; we make them simple and intelligible by using different statistical measures, such as graphs, averages, dispersion, skewness, kurtosis, correlation and regression. These measures help in interpretation and in drawing inferences. Statistics thus enlarges the horizon of one's knowledge.

Facilitates comparison: Comparison between different sets of observations is an important function of statistics. Comparison is necessary to draw conclusions, as Professor Boddington rightly points out: "the object of statistics is to enable comparison between past and present results to ascertain the reasons for changes which have taken place and the effect of such changes in future". So, to determine the efficiency of any measure, comparison is necessary. Statistical devices like averages, ratios and coefficients are used for the purpose of comparison.

Testing hypotheses: Formulating and testing hypotheses is an important function of statistics. This helps in developing new theories. Statistics thus examines the truth of claims and helps in innovating new ideas.

Formulation of policies: Statistics helps in formulating plans and policies in different fields. Statistical analysis of data forms the beginning of policy formulation. Hence, statistics is essential for planners, economists, scientists and administrators to prepare different plans and programmes.

Forecasting: The future is uncertain. Statistics helps in forecasting trends and tendencies. Statistical techniques are used for predicting the future values of a variable. For example, a producer forecasts his future production on the basis of present demand conditions and his past experience. Similarly, planners can forecast the future population considering present population trends.

Derives valid inferences: Statistical methods mainly aim at deriving inferences from an enquiry. Statistical techniques are often used by scholars, planners and scientists to evaluate different projects. These techniques are also used to draw inferences regarding population parameters on the basis of sample information.

Statistics is very helpful in the fields of business, research, education etc. Some of the uses of statistics are:
Statistics helps in providing a better understanding and exact description of a phenomenon of nature.
Statistics helps in proper and efficient planning of a statistical inquiry in any field of study.
Statistics helps in collecting appropriate quantitative data.
Statistics helps in presenting complex data in suitable tabular, diagrammatic and graphic form for easy comprehension of the data.
Statistics helps in understanding the nature and pattern of variability of a phenomenon through quantitative observations.
Statistics helps in drawing valid inferences, along with a measure of their reliability, about the population parameters from sample data.

Any statistical data can be classified under two categories depending upon the sources utilised. These categories are:

1. Primary data
2. Secondary data

Primary Data: Primary data is data which is collected by the investigator himself for the purpose of a specific inquiry or study. Such data is original in character and is generated by surveys conducted by individuals, research institutions or other organisations.

1. The collection of data by the method of personal survey is possible only if the area covered by the investigator is small. Collection of data by sending enumerators is bound to be expensive. Care should be taken that the enumerators record the correct information provided by the informants.
2. Collection of primary data by framing schedules or by distributing and collecting questionnaires by post is less expensive and can be completed in a shorter time.
3. If the questions are embarrassing, of a complicated nature, or probe into the personal affairs of individuals, then the schedules may not be filled with accurate and correct information, and hence this method is unsuitable.
4. The information collected as primary data is more reliable than that collected from secondary sources.

The importance of primary data cannot be neglected. A research study can be conducted without secondary data, but a study based only on secondary data is least reliable and may have biases, because secondary data has already been manipulated by human beings. In statistical surveys it is necessary to get information from primary sources and work on primary data: for example, the statistical records of the female population in a country cannot be based on newspapers, magazines and other printed sources. Firstly, such sources are old, and secondly they contain limited information, which can also be misleading and biased.

Secondary Data: Secondary data are data which have already been collected and analysed by some earlier agency for its own use, and which are later used by a different agency.
According to W.A. Neiswanger, 'A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication reporting data which have been gathered by other authorities and for which others are responsible.'

1. Secondary data is cheap to obtain. Many government publications are relatively cheap, and libraries stock quantities of secondary data produced by the government, by companies and by other organisations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the secondary data available has been collected over many years and can therefore be used to plot trends.
4. Secondary data is of value to: the government, helping in making decisions and planning future policy; business and industry, in areas such as marketing and sales, in order to appreciate general economic and social conditions and to provide information on competitors; and research organisations, by providing social, economic and industrial information.

Secondary data can be less valid, but its importance remains. Sometimes it is difficult to obtain primary data; in these cases getting information from secondary sources is easier and possible. Sometimes primary data does not exist; in such a situation one has to confine the research to secondary data. Sometimes primary data exists but the respondents are not willing to reveal it; in such cases too secondary data can suffice: for example, if the research is on the psychology of transsexuals, first it is difficult to find transsexuals and second they may not be willing to give the information you want for your research, so you can collect data from books or other published sources.

Q2. Draw a histogram for the following distribution:

Age            0-10   10-20   20-30   30-40   40-50
No. of People    2      5      10       8       4

Ans. The class intervals are marked on the horizontal axis and the frequencies on the vertical axis. Since the classes are continuous and of equal width, adjacent bars of heights 2, 5, 10, 8 and 4 are drawn over the intervals 0-10, 10-20, 20-30, 30-40 and 40-50 respectively; the tallest bar stands over the class 20-30.
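Since the drawn figure is not reproduced in the text, the shape of the histogram can be sketched with standard-library Python; each row of '#' marks stands in for one bar (a plotting library would draw the actual chart):

```python
# Age classes and their frequencies from the question.
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
freqs = [2, 5, 10, 8, 4]

def text_histogram(classes, freqs):
    """Return one line per class: label, a bar of '#' marks, and the frequency."""
    lines = []
    for label, f in zip(classes, freqs):
        lines.append(f"{label:>6} | {'#' * f} ({f})")
    return lines

for line in text_histogram(classes, freqs):
    print(line)
```

The tallest row of '#' marks corresponds to the modal class 20-30.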

Q3. Find the (i) arithmetic mean and (ii) the median value of the following set of values: 40, 32, 24, 36, 42, 18, 10.

Ans. (i) Arithmetic mean = (40 + 32 + 24 + 36 + 42 + 18 + 10) / 7 = 202 / 7 ≈ 28.86

(ii) Arranging in ascending order: 10, 18, 24, 32, 36, 40, 42. With n = 7 values, the median is the 4th (middle) value. Therefore 32 is the median value.
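Both answers can be checked with Python's standard-library statistics module:

```python
# Verify the arithmetic mean and median of the given values.
from statistics import mean, median

values = [40, 32, 24, 36, 42, 18, 10]

print(mean(values))    # 202/7 ≈ 28.857
print(median(values))  # middle value of the sorted list -> 32
```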

Q4. Calculate the standard deviation of the following data:

Marks            78-80   80-82   82-84   84-86   86-88   88-90
No. of students    3      15      26      23       9       4

Ans. Taking the class midpoints x = 79, 81, 83, 85, 87, 89 with frequencies f = 3, 15, 26, 23, 9, 4:

N = Σf = 80
Σfx = 237 + 1215 + 2158 + 1955 + 783 + 356 = 6704, so the mean = 6704/80 = 83.8
Σfx² = 18723 + 98415 + 179114 + 166175 + 68121 + 31684 = 562232

Variance = Σfx²/N − (Σfx/N)² = 562232/80 − (83.8)² = 7027.9 − 7022.44 = 5.46

Standard deviation = √5.46 ≈ 2.34
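The grouped-data calculation above can be reproduced with a short standard-library sketch, representing each class by its midpoint:

```python
# Grouped standard deviation via the formulas mean = Σfx/N and
# variance = Σfx²/N − mean².
from math import sqrt

midpoints = [79, 81, 83, 85, 87, 89]   # midpoints of classes 78-80, 80-82, ...
freqs     = [3, 15, 26, 23, 9, 4]

N = sum(freqs)                         # 80 students
mean = sum(f * x for f, x in zip(freqs, midpoints)) / N
variance = sum(f * x * x for f, x in zip(freqs, midpoints)) / N - mean ** 2
sd = sqrt(variance)

print(mean)          # 83.8
print(round(sd, 2))  # ≈ 2.34
```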

Q5. Explain the following terms with respect to Statistics: (i) Sample, (ii) Variable, (iii) Population.

Ans. i) Sample

In statistics, a sample is a subset of a population. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample represents a subset of manageable size. Samples are collected and statistics are calculated from the samples so that one can make inferences or extrapolations from the sample to the population. This process of collecting information from a sample is referred to as sampling.

A complete sample is a set of objects from a parent population that includes ALL such objects that satisfy a set of well-defined selection criteria. For example, a complete sample of Australian men taller than 2m would consist of a list of every Australian male taller than 2m. But it wouldn't include German males, or tall Australian females, or people shorter than 2m. So to compile such a complete sample requires a complete list of the parent population, including data on height, gender, and nationality for each member of that parent population. In the case of human populations, such a complete list is unlikely to exist, but such complete samples are often available in other disciplines, such as complete magnitude-limited samples of astronomical objects.

An unbiased sample is a set of objects chosen from a complete sample using a selection process that does not depend on the properties of the objects. For example, an unbiased sample of Australian men taller than 2m might consist of a randomly sampled subset of 1% of Australian males taller than 2m. But one chosen from the electoral register might not be unbiased since, for example, males aged under 18 will not be on the electoral register.
In an astronomical context, an unbiased sample might consist of that fraction of a complete sample for which data are available, provided the data availability is not biased by individual source properties.

The best way to avoid a biased or unrepresentative sample is to select a random sample, also known as a probability sample. A random sample is defined as a sample where each individual member of the population has a known, non-zero chance of being selected as part of the sample. Several types of random samples are simple random samples, systematic samples, stratified random samples, and cluster random samples.

(ii) Variable

A variable is a characteristic that may assume more than one set of values to which a numerical measure can be assigned. Height, age, amount of income, province or country of birth, grades obtained at school and type of housing are all examples of variables. Variables may be classified into various categories, some of which are outlined in this section.

Categorical variables: A categorical variable

(also called a qualitative variable) is one for which each response can be put into a specific category. These categories must be mutually exclusive and exhaustive. Mutually exclusive means that each possible survey response should belong to only one category, whereas exhaustive requires that the categories should cover the entire set of possibilities. Categorical variables can be either nominal or ordinal.

Nominal variables: A nominal variable is one that describes a name or category. Contrary to ordinal variables, there is no 'natural ordering' of the set of possible names or categories.

Ordinal variables: An ordinal variable is a categorical variable for which the possible categories can be placed in a specific order or in some 'natural' way.

Numeric variables: A numeric variable, also known as a quantitative variable, is one that can assume a number of real values, such as age or number of people in a household. However, not all variables described by numbers are considered numeric. For example, when you are asked to assign a value from 1 to 5 to express your level of satisfaction, you use numbers, but the variable (satisfaction) is really an ordinal variable. Numeric variables may be either continuous or discrete.

Continuous variables: A variable is said to be continuous if it can assume an infinite number of real values. Examples of a continuous variable are distance, age and temperature. The measurement of a continuous variable is restricted by the methods used, or by the accuracy of the measuring instruments. For example, the height of a student is a continuous variable because a student may be 1.6321748755... metres tall.

Discrete variables: As opposed to a continuous variable, a discrete variable can only take a finite number of real values. An example of a discrete variable would be the score given by a judge to a gymnast in competition: the range is 0 to 10 and the score is always given to one decimal (e.g., a score of 8.5).
(iii) Population

A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest. Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or will exist in the future. Probably, geography will also constitute a limitation in that our resources for studying crows are also limited.

Population is also used to refer to a set of potential measurements or values, including not only cases actually observed but those that are potentially observable. Suppose, for example, we are interested in the set of all adult crows now alive in the county of

Cambridgeshire, and we want to know the mean weight of these birds. For each bird in the population of crows there is a weight, and the set of these weights is called the population of weights.

A subset of a population is called a subpopulation. If different subpopulations have different properties, the properties and response of the overall population can often be better understood if it is first separated into distinct subpopulations. For instance, a particular medicine may have different effects on different subpopulations, and these effects may be obscured or dismissed if such special subpopulations are not identified and examined in isolation. Similarly, one can often estimate parameters more accurately if one separates out subpopulations: the distribution of heights among people is better modeled by considering men and women as separate subpopulations, for instance.

Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution.

Q6. An unbiased coin is tossed six times. What is the probability that the tosses will result in: (i) at least four heads, and (ii) exactly two heads. Ans.

Let 'A' be the event of getting a head. Given that the coin is unbiased and is tossed n = 6 times, p = P(A) = 1/2 and q = 1 − p = 1/2, so by the binomial distribution P(X = k) = C(6, k)(1/2)^6, with 2^6 = 64 equally likely outcomes.

(i) The probability that the tosses will result in at least four heads is:

P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6) = [C(6, 4) + C(6, 5) + C(6, 6)] / 64 = (15 + 6 + 1) / 64 = 22/64 = 11/32

Therefore, the probability that the tosses will result in at least four heads is 11/32.

(ii) The probability that the tosses will result in exactly two heads is given by:

P(X = 2) = C(6, 2)(1/2)^6 = 15/64

Therefore, the probability that the tosses will result in exactly two heads is 15/64.
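Both binomial probabilities can be verified with math.comb from the standard library:

```python
# P(X = k) = C(6, k)(1/2)^6 for an unbiased coin tossed 6 times.
from math import comb

n = 6
total = 2 ** n                      # 64 equally likely outcomes

at_least_four = sum(comb(n, k) for k in range(4, 7)) / total
exactly_two = comb(n, 2) / total

print(at_least_four)   # (15 + 6 + 1)/64 = 22/64 = 11/32
print(exactly_two)     # 15/64
```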

Master of Business Administration - MBA Semester 1
MB0040 – Statistics for Management - 4 Credits (Book ID: B1129)
Assignment Set - 2 (60 Marks)

Q1. Find Karl Pearson's correlation co-efficient for the data given in the below table:

X: 18, 16, 12, 8, 4
Y: 22, 14, 12, 10, 8

Ans. The required sums are tabulated below.

X       Y       X²      Y²      XY
18      22      324     484     396
16      14      256     196     224
12      12      144     144     144
 8      10       64     100      80
 4       8       16      64      32
ΣX=58   ΣY=66   ΣX²=804 ΣY²=988 ΣXY=876

With n = 5, Karl Pearson's correlation coefficient is:

r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)]
  = (5 × 876 − 58 × 66) / √[(5 × 804 − 58²)(5 × 988 − 66²)]
  = (4380 − 3828) / √[(4020 − 3364)(4940 − 4356)]
  = 552 / √(656 × 584)
  = 552 / √383104
  ≈ 552 / 618.95
  ≈ 0.89

Therefore, X and Y are highly positively correlated.
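The product-moment computation can be checked with a short standard-library sketch built directly on the raw X and Y values:

```python
# Karl Pearson's r from the raw data, using the product-moment formula.
from math import sqrt

X = [18, 16, 12, 8, 4]
Y = [22, 14, 12, 10, 8]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 2))   # ≈ 0.89
```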

Q2. Find the (i) arithmetic mean (ii) range and (iii) median of the following data: 15, 17, 22, 21, 19, 26, 20.

Ans. (i) Arithmetic mean = (15 + 17 + 22 + 21 + 19 + 26 + 20) / 7 = 140 / 7 = 20

(ii) Range = highest value − lowest value = 26 − 15 = 11

(iii) Arranging in ascending order: 15, 17, 19, 20, 21, 22, 26. The middle (4th) value is 20, so the median is 20.

Q3. What is the importance of classification of data? What are the types of classification of data?

Ans. Data classification and identification is all about tagging your data so it can be found quickly and efficiently. But organisations can also gain from de-duplicating their information, which helps to cut storage and backup costs whilst speeding up data searches. Thirdly, classification can help an organisation to meet legal and regulatory requirements for retrieving specific information within a set timeframe, and this is often the motivation behind implementing data classification technology.

However, data strategies differ greatly from one organisation to the next, as each generates different types and volumes of data. The balance may vary greatly from one user to the next between office documents, e-mail correspondence, images, video files, customer and product information, financial data, and so on. It may seem a good idea to classify and tag everything in the databases, but experts warn against it. Andy Whitton, partner in Deloitte's data practice, says: "Full data classification can be a very expensive activity that very few organisations do well. Certified database technologies can tag every data item; however, in our experience only governments do this because of the cost implications."

Instead, says Whitton, companies need to choose certain types of data to classify, such as account data, personal data, or commercially valuable data. He adds that the starting point for most companies is to classify data in line with their confidentiality requirements, adding more security for increasingly confidential data. "If it goes wrong, this could be the most externally damaging and internally sensitive. For example, everyone is very protective over salary data," says Whitton.

As well as the type and confidentiality of the data, organisations should also consider its integrity, as low-quality data cannot be trusted.
Users should also consider its availability, because high data availability requires a resilient storage and networking environment.

Tagging the data in the right way, by using an effective metadata strategy, is essential, says Greg Keller, chief evangelist at software firm Embarcadero. "In other words, the egg must truly precede the chicken. The enterprise is overwhelmed with data, including relational (structured) and non-relational (semi-structured or non-structured), much of which is redundant, stale and of radically varying quality," he explains. "A plan must be put in place, by an enterprise or data architecture team, to first source the desired data, standardising the path to it, documenting the data's structure and general content along with any known business rules, and then ultimately communicating this initial set of information to relevant constituencies."

Once this platform of initial "metadata" has been established and replicated successfully to other information stores, the organisation can implement a "classification taxonomy" to tag the assets of varying types in terms of their business relevance, says Keller. "This set of tags can range from its quality classification, to its encryption/security level, to its volatility," says Keller.

Q4. The data given in the below table shows the production in three shifts and the number of defective goods that turned out in three weeks. Test at 5% level of significance whether the weeks and shifts are independent.

Week        Shift I   Shift II   Shift III   Total
1st Week      15         20         25         60
2nd Week       5         10         15         30
3rd Week      20         20         20         60
Total         40         50         60        150

Ans.

Observed Value (O)   Expected Value (E)    (O − E)²   (O − E)²/E
15                   40 × 60/150 = 16          1        0.0625
20                   50 × 60/150 = 20          0        0.0000
25                   60 × 60/150 = 24          1        0.0417
 5                   40 × 30/150 = 8           9        1.1250
10                   50 × 30/150 = 10          0        0.0000
15                   60 × 30/150 = 12          9        0.7500
20                   40 × 60/150 = 16         16        1.0000
20                   50 × 60/150 = 20          0        0.0000
20                   60 × 60/150 = 24         16        0.6667
                                                   χ² = 3.6459

The steps followed to calculate χ² are described below.
1. Null hypothesis H0: The weeks and shifts are independent. Alternate hypothesis HA: The weeks and shifts are dependent.
2. Level of significance is 5% and d.o.f. = (3 − 1)(3 − 1) = 4.
3. Test statistic: χ² = Σ(O − E)²/E.
4. χ²cal = 3.6459.
5. Conclusion: Since χ²cal (3.6459) < χ²tab (9.49), H0 is accepted. Hence, the attributes 'weeks' and 'shifts' are independent.
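The χ² statistic can be recomputed from the observed table with a short standard-library sketch, where each expected frequency is E = (row total × column total) / grand total:

```python
# Chi-square test of independence for the weeks-by-shifts table.
observed = [
    [15, 20, 25],   # 1st week: shifts I, II, III
    [5, 10, 15],    # 2nd week
    [20, 20, 20],   # 3rd week
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi2 += (o - e) ** 2 / e

print(round(chi2, 4))   # ≈ 3.6458, well below the 5% critical value 9.49 at 4 d.f.
```

The small difference from 3.6459 comes from rounding each term to four decimals before summing in the hand calculation.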

Q5. What is sampling? Explain briefly the types of sampling.

Ans. Sampling refers to the statistical process of selecting and studying the characteristics of a relatively small number of items from a relatively large population of such items, to draw statistically valid inferences about the characteristics of the entire population. There are two broad methods of sampling used by researchers: non-random (or judgement) sampling and random (or probability) sampling.

In judgement sampling, the researcher selects the items to be drawn from the population based on his or her judgement about how well these items represent the whole population. The sample is thus based on someone's knowledge about the population and the characteristics of the individual items within it. The chance of an item being included in the sample is influenced by the characteristics of the item as judged by the expert selecting it. A judgement sampling system is simple and less expensive to use. Also, when very little is known about the population under study, a pilot study based on a judgement sample is often carried out to permit the design of a more rigorous sampling system for a detailed study.

In random sampling, individual judgement plays no part in the selection of the sample. Each item in the population stands an equal chance of being included in the sample. In the case of random sampling, the researcher is required to use specific statistical processes to ensure this equal probability for every item in the population. A random sampling system enables more reliable statistical analysis, with measurable margins of error and degrees of confidence.

The sampling techniques may be broadly classified into:
1. Probability sampling
2. Non-probability sampling

Probability Sampling: Probability sampling provides a scientific technique of drawing samples from the population. Samples are drawn according to a rule under which each unit has a known probability of being included in the sample.

Simple random sampling

Under this technique, sample units are drawn in such a way that each and every unit in the population has an equal and independent chance of being included in the

sample. If a sample unit is replaced before drawing the next unit, then the scheme is known as Simple Random Sampling with Replacement; if the sample unit is not replaced before drawing the next unit, it is known as Simple Random Sampling without Replacement. At any single draw, the probability of drawing a given unit is 1/N, where N is the population size; with replacement, the probability of drawing a particular ordered sample of n units is 1/N^n.

Stratified random sampling

This sampling design is most appropriate if the population is heterogeneous with respect to the characteristic under study or the population distribution is highly skewed.

Table: Merits and demerits of stratified random sampling

Merits
1. The sample is more representative.
2. It provides a more efficient estimate.
3. It is administratively more convenient.
4. It can be applied in situations where different degrees of accuracy are desired for different segments of the population.

Demerits
1. Many times the stratification is not effective.
2. Appropriate sample sizes may not be drawn from each stratum.

Systematic sampling

This design is recommended if we have a complete list of sampling units arranged in some systematic order, such as geographical, chronological or alphabetical order.

Table: Merits and demerits of systematic sampling

Merits
1. Very easy to operate and easy to check.
2. It saves time and labour.
3. More efficient than simple random sampling if we have an up-to-date frame.

Demerits
1. In many cases we do not get an up-to-date list.
2. It gives biased results if periodic features exist in the data.

Cluster sampling

The total population is divided into recognizable sub-divisions, known as clusters, such that the units within each cluster are homogeneous. The units are selected from each cluster by suitable sampling techniques.

Multi-stage sampling

The sampling process is carried out in several stages: at each stage, sample units are selected from within the larger units chosen at the previous stage.

Figure: Multistage sampling

Non-probability sampling: Depending upon the object of inquiry and other considerations, a predetermined number of sampling units is selected purposely so that they represent the true characteristics of the population.

Judgment sampling

The choice of sample items depends exclusively on the judgment of the investigator. The investigator's experience and knowledge about the population help in selecting the sample units. It is the most suitable method if the population size is small.
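Three of the probability designs described above can be sketched with the standard library's random module; the population of 100 numbered units and the two strata are hypothetical examples, not data from the question:

```python
# Minimal sketches of simple random, systematic, and stratified sampling.
import random

population = list(range(1, 101))   # a hypothetical population of 100 units
random.seed(42)                    # fixed seed so the sketch is reproducible

# Simple random sampling (without replacement): every unit has an equal chance.
simple = random.sample(population, 10)

# Systematic sampling: a random start, then every k-th unit from the ordered list.
k = len(population) // 10          # sampling interval for a sample of 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: split the population into strata, then sample each one.
strata = {"low": population[:50], "high": population[50:]}
stratified = [u for s in strata.values() for u in random.sample(s, 5)]

print(len(simple), len(systematic), len(stratified))   # 10 10 10
```

Each design returns a sample of the same size, but stratified sampling guarantees representation from both strata, while systematic sampling only needs one random number.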

Q6. Suppose two houses in a thousand catch fire in a year and there are 2000 houses in a village. What is the probability that: (i) none of the houses catch fire and (ii) at least one house catches fire?

Ans. The probability of a house catching fire is p = 2/1000 = 0.002 and n = 2000. Since p is small and n is large, the Poisson distribution applies, with mean

m = np = 2000 × 0.002 = 4

(i) P(no house catches fire) = P(0) = e^(−m) = e^(−4) ≈ 0.0183

(ii) P(at least one house catches fire) = 1 − P(0) = 1 − e^(−4) ≈ 0.9817
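The Poisson answer can be checked numerically with the standard library:

```python
# Poisson probabilities with mean m = np = 2000 × 0.002 = 4.
from math import exp

m = 2000 * (2 / 1000)          # expected number of fires, m = 4

p_none = exp(-m)               # P(0) = e^(−4)
p_at_least_one = 1 - p_none    # P(X ≥ 1) = 1 − P(0)

print(round(p_none, 4))          # ≈ 0.0183
print(round(p_at_least_one, 4))  # ≈ 0.9817
```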

necessary. Statistical devices likeaverages, ratios, coefficients etc. are used for the purpose of comparison. Testing hypothesis:Formulating and testing of hypothesis is an important function of statistics. This helps indeveloping new theories. So statistics examines the truth and helps in innovating new ideas. Formulation of Policies :Statistics helps in formulating plans and policies in different fields. Statistical analysis of data forms the beginning of policy formulations. Hence, statistics is essential for planners,economists, scientists and administrators to prepare different plans and programmes. Forecasting :The future is uncertain. Statistics helps in forecasting the trend and tendencies. Statisticaltechniques are used for predicting the future values of a variable. For example a producerforecasts his future production on the basis of the present demand conditions and his pastexperiences. Similarly, the planners can forecast the future population etc. considering thepresent population trends. Derives valid inferences :Statistical methods mainly aim at deriving inferences from an enquiry. Statistical techniques are often used by scholars’ planners and scientists to evaluate different projects. These techniques are also used to draw inferences regarding population parameters on the basis of sample information.Statistics is very helpful in the field of business, research, Education etc., some of theuses of Statistics are: Statistics helps in providing a better understanding and exact description of aphenomenon of nature. Statistics helps in proper and efficient planning of a statistical inquiry in any field of study. Statistical helps in collecting an appropriate quantitative data. Statistics helps in presenting complex data in a suitable tabular, diagrammatic andgraphic form for any easy and comprehension of the data. Statistics helps in understanding the nature and pattern of variability of aphenomenon through quantitative observations. 
Statistics helps in drawing valid inference, along with a measure of their reliabilityabout the population parameters from the sample dataAny statistical data can be classified under two categories depending upon the sources utilized.These categories are,

1. Primary data 2. Secondary data Primary Data: Primary data is the one, which is collected by the investigator himself for the purpose of aspecific inquiry or study. Such data is original in character and is generated by surveyconducted by individuals or research institution or any organisation. 1. The collection of data by the method of personal survey is possible only if thearea covered by the investigator is small. Collection of data by sending theenumerator is bound to be expensive. Care should be taken twice that theenumerator record correct information provided by the informants. 2. Collection of primary data by framing a schedules or distributing and collectingquestionnaires by post is less expensive and can be completed in shorter time. 3. Suppose the questions are embarrassing or of complicated nature or the questionsprobe into personnel affairs of individuals, then the schedules may not be filledwith accurate and correct information and hence this method is unsuitable. 4. The information collected for primary data is mere reliable than those collectedfrom the secondary data.Importance of Primary data cannot be neglected. A research can be conducted withoutsecondary data but a research based on only secondary data is least reliable and may have biasesbecause secondary data has already been manipulated by human beings. In statistical surveys it isnecessary to get information from primary sources and work on primary data: for example, thestatistical records of female population in a country cannot be based on newspaper, magazine and other printed sources. One such sources are old and secondly they contain limitedinformation as well as they can be misleading and biased. Secondary Data: Secondary data are those data which have been already collected and analysed bysome earlier agency for its own use; and later the same data are used by a different agency. 
According to W.A.Neiswanger, ‘ A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is apublication, reporting the data which have been gathered by other authorities and for which others are responsible’.

1. Secondary data is cheap to obtain. Many government publications are relatively cheapand libraries stock quantities of secondary data produced by the government, bycompanies and other organizations. 2. Large quantities of secondary data can be got through internet. 3. Much of the secondary data available has been collected for many years and therefore itcan be used to plot trends. 4. Secondary data is of value to: - The government help in making decisions and planningfuture policy. Business and industry in areas such as marketing, and sales in order toappreciate the general economic and social conditions and to provide information oncompetitors. Research organizations by providing social, economical and industrialinformation. Secondary data can be less valid but its importance is still there. Sometimes it is difficult toobtain primary data; in these cases getting information from secondary sources is easier and possible. Sometimes primary data does not exist in such situation one has to confine the researchon secondary data. Sometimes primary data is present but the respondents are not willing toreveal it in such case too secondary data can suffice: for example, if the research is on thepsychology of transsexuals first it is difficult to find out transsexuals and second they may not bewilling to give information you want for your research, so you can collect data from books orother published sources.

Q2. Draw a histogram for the following distribution: Age 0-10 10-20 20-30 30-40 40-50 No. of People 2 5 10 8 4

Ans.

Q3. Find the (i) arithmetic mean and (ii) the median value of the following set of values: 40, 32, 24, 36, 42, 18, 10. Ans. (i) Arithmetic mean = 40+32+24+36+42+18+10 7 = 28.85

(ii) Arranging in Ascending Order 10, 18, 24, 32, 36, 40, 42. Therefore 32 is the median value.

Q4. Calculate the standard deviation of the following data: Marks 78-80 No. of 3 students Ans. 80-82 15 82-84 26 84-86 23 86-88 9 88-90 4

Q5. Explain the following terms with respect to Statistics: (i) Sample, (ii) Variable, (iii) Population. Ans. i) Sample In statistics, a sample is a subset of a population. Typically, the population is verylarge, making a census or a complete enumeration of all the values in the population impracticalor impossible. The sample represents a subset of manageable size. Samples are collected andstatistics are calculated from the samples so that one can make inferences or extrapolations fromthe sample to the population. This process of collecting information from a sample is referred toas sampling.A complete sample is a set of objects from a parent population that includes ALL such objectsthat satisfy a set of well-defined selection criteria. For example, a complete sample of Australianmen taller than 2m would consist of a list of every Australian male taller than 2m. But it wouldn'tinclude German males, or tall Australian females, or people shorter than 2m. So to compile sucha complete sample requires a complete list of the parent population, including data on height,gender, and nationality for each member of that parent population. In the case of humanpopulations, such a complete list is unlikely to exist, but such complete samples are oftenavailable in other disciplines, such as complete magnitude-limited samples of astronomicalobjects.An unbiased sample is a set of objects chosen from a complete sample using a selection processthat does not depend on the properties of the objects. For example, an unbiased sample of Australian men taller than 2m might consist of a randomly sampled subset of 1% of Australianmales taller than 2m. But one chosen from the electoral register might not be unbiased since, forexample, males aged under 18 will not be on the electoral register. 
In an astronomical context, an unbiased sample might consist of that fraction of a complete sample for which data are available, provided the data availability is not biased by individual source properties.

The best way to avoid a biased or unrepresentative sample is to select a random sample, also known as a probability sample. A random sample is defined as a sample in which each individual member of the population has a known, non-zero chance of being selected as part of the sample. Several types of random samples are simple random samples, systematic samples, stratified random samples and cluster random samples.

(ii) Variable
A variable is a characteristic that may assume more than one set of values to which a numerical measure can be assigned. Height, age, amount of income, province or country of birth, grades obtained at school and type of housing are all examples of variables. Variables may be classified into various categories, some of which are outlined in this section.

Categorical variables: A categorical variable

(also called a qualitative variable) is one for which each response can be put into a specific category. These categories must be mutually exclusive and exhaustive. Mutually exclusive means that each possible survey response should belong to only one category, whereas exhaustive requires that the categories cover the entire set of possibilities. Categorical variables can be either nominal or ordinal.

Nominal variables: A nominal variable is one that describes a name or category. Contrary to ordinal variables, there is no 'natural ordering' of the set of possible names or categories.

Ordinal variables: An ordinal variable is a categorical variable for which the possible categories can be placed in a specific order or in some 'natural' way.

Numeric variables: A numeric variable, also known as a quantitative variable, is one that can assume a number of real values, such as age or number of people in a household. However, not all variables described by numbers are considered numeric. For example, when you are asked to assign a value from 1 to 5 to express your level of satisfaction, you use numbers, but the variable (satisfaction) is really an ordinal variable. Numeric variables may be either continuous or discrete.

Continuous variables: A variable is said to be continuous if it can assume an infinite number of real values. Examples of a continuous variable are distance, age and temperature. The measurement of a continuous variable is restricted by the methods used, or by the accuracy of the measuring instruments. For example, the height of a student is a continuous variable because a student may be 1.6321748755... metres tall.

Discrete variables: As opposed to a continuous variable, a discrete variable can take only a finite (or countably infinite) number of distinct values. An example of a discrete variable would be the score given by a judge to a gymnast in competition: the range is 0 to 10 and the score is always given to one decimal (e.g. a score of 8.5).
(iii) Population
A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest. Notice that if we choose a population like all crows, we will be limited to observing crows that exist now or will exist in the future. Probably, geography will also constitute a limitation, in that our resources for studying crows are limited.

Population is also used to refer to a set of potential measurements or values, including not only cases actually observed but those that are potentially observable. Suppose, for example, we are interested in the set of all adult crows now alive in the county of

Cambridgeshire, and we want to know the mean weight of these birds. For each bird in the population of crows there is a weight, and the set of these weights is called the population of weights.

A subset of a population is called a subpopulation. If different subpopulations have different properties, the properties and response of the overall population can often be better understood if it is first separated into distinct subpopulations. For instance, a particular medicine may have different effects on different subpopulations, and these effects may be obscured or dismissed if such special subpopulations are not identified and examined in isolation. Similarly, one can often estimate parameters more accurately if one separates out subpopulations: the distribution of heights among people is better modeled by considering men and women as separate subpopulations, for instance.

Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution.

Q6. An unbiased coin is tossed six times. What is the probability that the tosses will result in: (i) at least four heads, and (ii) exactly two heads. Ans.

Let ‘A’ be the event of getting a head in a single toss. Given that the coin is unbiased and is tossed n = 6 times:

p = P(A) = 1/2 and q = 1 − p = 1/2

so that P(x heads) = 6Cx (1/2)^x (1/2)^(6−x) = 6Cx/64.

(i) The probability that the tosses will result in at least four heads is given by:

P(x ≥ 4) = P(4) + P(5) + P(6) = (6C4 + 6C5 + 6C6)/64 = (15 + 6 + 1)/64 = 22/64 = 11/32

Therefore, the probability that the tosses will result in at least four heads is 11/32.

(ii) The probability that the tosses will result in exactly two heads is given by:

P(x = 2) = 6C2 (1/2)^2 (1/2)^4 = 15/64

Therefore, the probability that the tosses will result in exactly two heads is 15/64.
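Both binomial probabilities can be cross-checked with Python's `math.comb`:

```python
from math import comb

# Binomial check for six tosses of a fair coin: P(X = k) = C(6, k) / 2**6.
n = 6
total = 2 ** n                                   # 64 equally likely outcomes

p_at_least_4 = sum(comb(n, k) for k in range(4, n + 1)) / total
p_exactly_2 = comb(n, 2) / total

print(p_at_least_4, p_exactly_2)                 # 0.34375 (= 11/32) and 0.234375 (= 15/64)
```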

Master of Business Administration - MBA Semester 1 MB0040 – Statistics for Management - 4 Credits (Book ID: B1129) Assignment Set - 2 (60 Marks)

Q1. Find Karl Pearson’s correlation co-efficient for the data given in the below table:

X : 18  16  12   8   4
Y : 22  14  12  10   8

Ans. The required sums are tabulated below (note that the first product is 18 × 22 = 396):

 X    Y    X^2   Y^2    XY
18   22    324   484   396
16   14    256   196   224
12   12    144   144   144
 8   10     64   100    80
 4    8     16    64    32

ΣX = 58, ΣY = 66, ΣX^2 = 804, ΣY^2 = 988, ΣXY = 876

With n = 5, Karl Pearson’s correlation coefficient is:

r = (nΣXY − ΣX·ΣY) / √[(nΣX^2 − (ΣX)^2)(nΣY^2 − (ΣY)^2)]
  = (5 × 876 − 58 × 66) / √[(5 × 804 − 58^2)(5 × 988 − 66^2)]
  = (4380 − 3828) / √(656 × 584)
  = 552 / √383104
  ≈ 0.89

Therefore, X and Y are highly positively correlated.
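As a cross-check, a short script recomputes every sum directly from the raw X and Y values; the first product is 18 × 22 = 396, giving ΣXY = 876 and r ≈ 0.89:

```python
from math import sqrt

# Karl Pearson's r recomputed from the raw data, so each sum can be verified.
X = [18, 16, 12, 8, 4]
Y = [22, 14, 12, 10, 8]
n = len(X)

sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)
sxy = sum(x * y for x, y in zip(X, Y))

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sxy, round(r, 2))              # 876 0.89
```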

Q2. Find the (i) arithmetic mean (ii) range and (iii) median of the following data: 15, 17, 22, 21, 19, 26, 20.

Ans.
(i) Arithmetic mean = (15 + 17 + 22 + 21 + 19 + 26 + 20)/7 = 140/7 = 20
(ii) Range = highest value − lowest value = 26 − 15 = 11
(iii) Arranging in ascending order: 15, 17, 19, 20, 21, 22, 26. The middle (4th) value is 20; therefore the median is 20.
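The three measures can be verified in a few lines:

```python
# Mean, range and median of the Q2 data set.
data = [15, 17, 22, 21, 19, 26, 20]

mean = sum(data) / len(data)
value_range = max(data) - min(data)
median = sorted(data)[len(data) // 2]    # odd count, so the middle element
print(mean, value_range, median)         # 20.0 11 20
```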

Q3. What is the importance of classification of data? What are the types of classification of data?

Ans. Data classification and identification is all about tagging your data so it can be found quickly and efficiently. But organisations can also gain from de-duplicating their information, which helps to cut storage and backup costs, whilst speeding up data searches. Thirdly, classification can help an organisation to meet legal and regulatory requirements for retrieving specific information within a set timeframe, and this is often the motivation behind implementing data classification technology.

However, data strategies differ greatly from one organisation to the next, as each generates different types and volumes of data. The balance may vary greatly from one user to the next between office documents, e-mail correspondence, images, video files, customer and product information, financial data, and so on.

It may seem a good idea to classify and tag everything in the databases, but experts warn against it. Andy Whitton, partner in Deloitte's data practice, says: "Full data classification can be a very expensive activity that very few organisations do well. Certified database technologies can tag every data item; however, in our experience only governments do this because of the cost implications." Instead, Whitton said, companies need to choose certain types of data to classify, such as account data, personal data, or commercially valuable data. He added that the starting point for most companies is to classify data in line with their confidentiality requirements, adding more security for increasingly confidential data. "If it goes wrong, this could be the most externally damaging and internally sensitive. For example, everyone is very protective over salary data," says Whitton.

As well as the type and confidentiality of the data, organisations should also consider its integrity, as low-quality data cannot be trusted. Users should also consider its availability, because high data availability requires a resilient storage and networking environment.

Tagging the data in the right way, by using an effective metadata strategy, is essential, said Greg Keller, chief evangelist at software firm Embarcadero. "In other words, the egg must truly precede the chicken. The enterprise is overwhelmed with data, including relational (structured) and non-relational (semi-structured or non-structured), much of which is redundant, stale and of radically varying quality," he explains. "A plan must be put in place, by an enterprise or data architecture team, to first source the desired data, standardising the path to it, documenting the data's structure and general content along with any known business rules, and then ultimately communicating this initial set of information to relevant constituencies." Once this platform of initial "metadata" has been established and replicated successfully to other information stores, the organisation can implement a "classification taxonomy" to tag the assets of varying types, in terms of their business relevance, said Keller. "This set of tags can range from its quality classification, to its encryption/security level, to its volatility," says Keller.

Q4. The data given in the below table shows the production in three shifts and the number of defective goods that turned out in three weeks. Test at 5% level of significance whether the weeks and shifts are independent.

Shift      I    II   III   Total
1st Week  15    20    25      60
2nd Week   5    10    15      30
3rd Week  20    20    20      60
Total     40    50    60     150

Ans.

Observed (O)   Expected (E)           (O − E)^2   (O − E)^2/E
    15         40 × 60/150 = 16           1          0.0625
    20         50 × 60/150 = 20           0          0.0000
    25         60 × 60/150 = 24           1          0.0417
     5         40 × 30/150 =  8           9          1.1250
    10         50 × 30/150 = 10           0          0.0000
    15         60 × 30/150 = 12           9          0.7500
    20         40 × 60/150 = 16          16          1.0000
    20         50 × 60/150 = 20           0          0.0000
    20         60 × 60/150 = 24          16          0.6667
                                     Total χ²   =    3.6459

The steps followed to calculate χ² are described below.
1. Null hypothesis H0: The weeks and shifts are independent.
   Alternate hypothesis HA: The weeks and shifts are dependent.
2. Level of significance is 5% and D.O.F. = (3 − 1)(3 − 1) = 4.
3. Test statistic: χ² = Σ (O − E)²/E.
4. χ²cal = 3.6459.
5. Conclusion: Since χ²cal (3.6459) < χ²tab (9.49), H0 is accepted. Hence, the attributes ‘weeks’ and ‘shifts’ are independent.
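The χ² statistic can be recomputed from first principles as a check (the tiny difference from 3.6459 comes from rounding each contribution to four decimals in the table):

```python
# Chi-square test of independence for the 3x3 weeks-by-shifts table.
# Expected frequency = row total * column total / grand total.
observed = [
    [15, 20, 25],   # 1st week, shifts I-III
    [5, 10, 15],    # 2nd week
    [20, 20, 20],   # 3rd week
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(3) for j in range(3)
)
print(round(chi2, 4))    # 3.6458, below the 5% critical value 9.488 for 4 d.o.f.
```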

Q5. What is sampling? Explain briefly the types of sampling.

Ans. Sampling refers to the statistical process of selecting and studying the characteristics of a relatively small number of items from a relatively large population of such items, in order to draw statistically valid inferences about the characteristics of the entire population.

There are two broad methods of sampling used by researchers: non-random (or judgement) sampling and random (or probability) sampling. In judgement sampling the researcher selects the items to be drawn from the population based on his or her judgement about how well those items represent the whole population. The sample is thus based on someone's knowledge about the population and the characteristics of the individual items within it. The chances of an item being included in the sample are influenced by the characteristics of the item as judged by the expert selecting it. A judgement sampling system is simple and less expensive to use. Also, when very little is known about the population under study, a pilot study based on a judgement sample is carried out to permit the design of a more rigorous sampling system for a detailed study.

In random sampling, individual judgement plays no part in the selection of the sample. Each item in the population stands an equal chance of being included in the sample. In the case of random sampling, the researcher is required to use specific statistical processes to ensure this equal probability for every item in the population. A random sampling system enables more reliable results of statistical analysis, with measurable margins of error and degrees of confidence.

The sampling techniques may be broadly classified into:
1. Probability sampling
2. Non-probability sampling

Probability sampling: Probability sampling provides a scientific technique of drawing samples from the population. The technique of drawing samples follows a law under which each unit has a known probability of being included in the sample.

Simple random sampling

Under this technique, sample units are drawn in such a way that each and every unit in the population has an equal and independent chance of being included in the sample. If a sample unit is replaced before drawing the next unit, it is known as Simple Random Sampling with Replacement (SRSWR). If the sample unit is not replaced before drawing the next unit, it is known as Simple Random Sampling without Replacement (SRSWOR). In either case, the probability of drawing a particular unit at a given draw is 1/N, where N is the population size; with replacement, the probability of drawing a given ordered sample of n units is 1/N^n.
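As an illustrative sketch (the population, seed and sample size here are our own), Python's standard `random` module can draw both kinds of simple random sample:

```python
import random

# Simple random sampling from a population of N = 10 units, n = 3.
random.seed(42)                  # fixed seed so the draw is repeatable
population = list(range(1, 11))

srswr = random.choices(population, k=3)   # with replacement: duplicates possible
srswor = random.sample(population, k=3)   # without replacement: all units distinct

print(len(srswr), len(srswor), len(set(srswor)))   # 3 3 3
```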

Stratified random sampling

This sampling design is most appropriate if the population is heterogeneous with respect to the characteristic under study, or the population distribution is highly skewed.

Table: Merits and demerits of stratified random sampling

Merits:
1. Sample is more representative.
2. Provides a more efficient estimate.
3. Administratively more convenient.
4. Can be applied in situations where different degrees of accuracy are desired for different segments of the population.

Demerits:
1. Many times the stratification is not effective.
2. Appropriate sample sizes are not drawn from each of the strata.
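The second demerit concerns how many units to draw from each stratum. A minimal sketch of proportional allocation, n_h = n × N_h / N (the stratum names and sizes here are invented for illustration):

```python
# Proportional allocation in stratified random sampling:
# each stratum of size N_h contributes n_h = n * N_h / N units to the sample.
strata_sizes = {"urban": 600, "semi-urban": 300, "rural": 100}   # N_h (hypothetical)
n = 50                                                            # total sample size
N = sum(strata_sizes.values())                                    # population size

# round() is exact here; in general, rounded n_h may need adjusting to sum to n.
allocation = {h: round(n * Nh / N) for h, Nh in strata_sizes.items()}
print(allocation)    # {'urban': 30, 'semi-urban': 15, 'rural': 5}
```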

Systematic sampling

This design is recommended if we have a complete list of sampling units arranged in some systematic order, such as geographical, chronological or alphabetical order.

Table: Merits and demerits of systematic sampling

Merits:
1. Very easy to operate and easy to check.
2. It saves time and labour.
3. More efficient than simple random sampling if we have an up-to-date frame.

Demerits:
1. In many cases we do not get an up-to-date list.
2. It gives biased results if periodic features exist in the data.
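A minimal sketch of the systematic procedure (the frame and sizes are invented for illustration): choose a sampling interval k = N/n, pick a random start within the first interval, then take every k-th unit:

```python
import random

# Systematic sampling: every k-th unit from an ordered frame,
# starting at a random point within the first interval.
random.seed(1)
frame = list(range(1, 101))     # ordered sampling frame, N = 100 (hypothetical)
n = 10                          # desired sample size
k = len(frame) // n             # sampling interval, k = 10

start = random.randrange(k)     # random start in the first interval
sample = frame[start::k]
print(len(sample), sample[1] - sample[0])   # 10 10
```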

Cluster sampling

The total population is divided into recognizable sub-divisions, known as clusters, such that the units within each cluster are homogeneous. The units are selected from each cluster by suitable sampling techniques.

Multi-stage sampling

The total population is divided into several stages. The sampling process is carried out through several stages.

Figure: Multistage sampling

Non-probability sampling: Depending upon the object of inquiry and other considerations, a predetermined number of sampling units is selected purposely so that they represent the true characteristics of the population.

Judgment sampling

The choice of sampling items depends exclusively on the judgement of the investigator. The investigator’s experience and knowledge about the population help in selecting the sample units. It is the most suitable method when the population size is small.

Q6. Suppose two houses in a thousand catch fire in a year and there are 2000 houses in a village. What is the probability that: (i) none of the houses catch fire and (ii) at least one house catches fire?

Ans. Given that the probability of a house catching fire is p = 2/1000 = 0.002 and n = 2000. Since p is small and n is large, the Poisson distribution applies, with mean:

m = np = 2000 × 0.002 = 4

(i) The probability that none of the houses catch fire is:

P(x = 0) = e^(−m) × m^0 / 0! = e^(−4) = 0.0183

(ii) The probability that at least one house catches fire is:

P(x ≥ 1) = 1 − P(x = 0) = 1 − 0.0183 = 0.9817
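The Poisson figures can be checked numerically:

```python
from math import exp

# Poisson check for Q6: n = 2000 houses, p = 0.002, so mean m = np = 4.
m = 2000 * 0.002

p_none = exp(-m)              # P(X = 0) = e^(-m)
p_at_least_one = 1 - p_none   # complement of "no houses catch fire"

print(round(p_none, 4), round(p_at_least_one, 4))   # 0.0183 0.9817
```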