Business Statistics

Published on June 2016 | Categories: Types, School Work | Downloads: 55 | Comments: 0 | Views: 402

Correlation, Data, Standard deviation, Mean, Mode, Median, Business Statistics, Measures of Dispersion,


Data: Data are raw, unorganized facts that need to be processed. Data can seem simple, random, and useless until they are organized.
Or
A good definition of data is "facts or figures from which conclusions can be drawn".
Or
Data are a collection of any number of related observations.
Example:
1. Each student's test score is one piece of data.
2. The daily weight measurements of each individual in your classroom.
3. The number of movie rentals per month for each household in your neighborhood.
4. The city's temperature (measured every hour) for a one-week period.

Information: When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
Or
A good definition of information is "data that have been recorded, classified, organized, related, or interpreted within a framework so that meaning emerges".

Example: The class's average score or the school's average score is information that can be concluded from the given test scores. Other examples:
1. The number of persons in a group in each weight category (20 to 25 kg, 26 to 30 kg, etc.).
2. The total number of households that did not rent a movie during the last month.
3. The number of days during the week where the temperature went above 20°C.

Data Classification:
1. Quantitative
2. Qualitative
3. Numerical

Quantitative: In this classification, data are classified on the basis of characteristics which can be measured, such as height, weight, income, expenditure, production, or sales. Examples of continuous and discrete variables in a data set are shown in Table 1.1.

Qualitative: In qualitative classification, data are classified on the basis of descriptive characteristics or attributes such as sex, literacy, region, caste, or education, which cannot be quantified. This is done in two ways:

Simple classification: Only one attribute is studied and each class is subdivided into two sub-classes, for example male and female, blind and not blind, educated and uneducated, and so on.

Manifold classification: A class is subdivided into more than two sub-classes, which may be sub-divided further.

Numerical Data: Any quantity defined by a number is numerical data. Examples of numerical data include a baseball game's score, your weight, the number of books you own, etc. Virtually any number that corresponds to an amount of something is numerical data.

Summarizing Data
Large amounts of data are often compressed into more easily assimilated summaries, which give the user a sense of the content without overwhelming him or her with too many numbers. There are a number of ways data can be presented. We will consider two here: one is to present the data in a distribution, and the other is to provide summary statistics that capture key aspects of the data.

FREQUENCY DISTRIBUTION
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
Or
A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.
Or
A representation, in either graphical or tabular format, which displays the number of observations within a given interval. The intervals must be mutually exclusive and exhaustive. Frequency distributions are usually used within a statistical context.
Or
Frequency distributions are visual displays that organize and present frequency counts so that the information can be interpreted more easily. Frequency distributions can show either the actual number of observations falling in each range or the percentage of observations.

Example:
A survey was taken on Maple Avenue. In each of 20 homes, people were asked how many cars were registered to their households. The results were recorded as follows: 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
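The tally for this survey can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` and the percentage column are assumptions added here, not part of the original notes.

```python
from collections import Counter

# Survey results from the Maple Avenue example (cars registered per household).
cars_per_household = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1,
                      2, 2, 3, 2, 3, 2, 1, 4, 0, 0]


def frequency_distribution(observations):
    """Return (value, frequency, percent) rows, sorted by value."""
    counts = Counter(observations)
    total = len(observations)
    return [(value, counts[value], 100 * counts[value] / total)
            for value in sorted(counts)]


table = frequency_distribution(cars_per_household)

print("Cars  Frequency  Percent")
for value, freq, pct in table:
    print(f"{value:>4}  {freq:>9}  {pct:>6.1f}%")
```

Each class (0, 1, 2, 3, or 4 cars) appears once, so the classes are mutually exclusive and together they cover all 20 households.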

Number of cars   Frequency (households)
0                4
1                6
2                5
3                3
4                2
Total            20

By looking at this frequency distribution table quickly, we can see that out of 20 households surveyed, 4 households had no cars, 6 households had 1 car, etc.

Importance of Frequency Distribution:
I. Frequency distribution graphs are useful because they show the entire set of scores.
II. At a glance, you can determine the highest score, the lowest score, and where the scores are centered.
III. The graph also shows whether the scores are clustered together or scattered over a wide range.
IV. Frequency distribution helps us to analyze the data.
V. Frequency distribution helps us to estimate the frequencies of the population on the basis of the sample.
VI. Frequency distribution helps us to facilitate the computation of various statistical measures.

Central Tendency: The term central tendency refers to the "middle" value, or perhaps a typical value, of the data and is measured using the mean, mode or median.
Or
Numbers that describe what is average or typical of the distribution.
Or
A measure of central tendency is a single value that describes the way in which a group of data cluster around a central value. In other words, it is a way to describe the center of a data set.

Importance of Central Tendency:
1. It lets us know what is normal or 'average' for a set of data.
2. It condenses the data set down to one representative value, which is useful when you are working with large amounts of data.
3. Central tendency allows you to compare one data set to another. For example, let's say you have a sample of girls and a sample of boys and you are interested in comparing their heights. By calculating the average height for each sample, you could easily draw comparisons between the girls and boys.
4. Central tendency is also useful when you want to compare one piece of data to the entire data set.

Measures of Central Tendency
I. Mode
II. Median
III. Mean

Mode: The category or score with the largest frequency (or percentage) in the distribution. The mode can be calculated for variables with levels of measurement that are nominal, ordinal, or interval-ratio.

Example: Number of Votes for Candidates for Mayor. The mode, in this case, gives you the "central" response of the voters: the most popular candidate.
Candidate A: 11,769 votes
Candidate B: 39,443 votes
Candidate C: 78,331 votes
The mode: "Candidate C"

Let's find the mode for the girls' heights. The heights are 60, 72, 61, 66, 63, 66, 59, 64, 71, and 68. The number 66 appears twice. All of the other numbers only appear once. Therefore, the mode is 66 inches. Now let's look at the boys' heights. The heights are 66, 78, 79, 69, 77, 79, 73, 74, and 62. The number 79 appears twice, more often than any other number in the list. The mode height for the boys is 79 inches.

Median: The score that divides the distribution into two equal parts, so that half the cases are above it and half below it. The median is the middle score, or the average of the middle scores, in a distribution.

Median Exercise #1 (N is odd)

Job Satisfaction   Frequency
Very High          2
High               3
Moderate           5
Low                7
Very Low           4
TOTAL              21

With N = 21, the median is the middle case, the (21 + 1)/2 = 11th. Counting down from Very High, the cumulative frequencies are 2, 5, 10, 17, and 21, so the 11th case falls in the "Low" category. So, the median is "Low".
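Locating the median in an ordered frequency table means walking down the cumulative frequencies until the middle case is reached. The sketch below illustrates that idea under the assumption that the categories are listed in rank order; the function name `median_category` is hypothetical.

```python
def median_category(freq_table):
    """Return the category (or categories) holding the middle case(s).

    freq_table is a list of (category, frequency) pairs in rank order.
    For odd N there is one middle case; for even N there are two.
    """
    n = sum(freq for _, freq in freq_table)
    # 1-based positions of the middle case(s).
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]

    found = []
    for position in positions:
        cumulative = 0
        for category, freq in freq_table:
            cumulative += freq
            if position <= cumulative:
                found.append(category)
                break
    return found


job_satisfaction = [("Very High", 2), ("High", 3), ("Moderate", 5),
                    ("Low", 7), ("Very Low", 4)]

print(median_category(job_satisfaction))
```

When N is even and both middle cases fall in the same category, that category is the median; if they fall in different ordinal categories, the median lies between the two.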

Median Exercise #2 (N is even)
Calculate the median for this hypothetical distribution:

Satisfaction with Health   Frequency
Very High                  5
High                       7
Moderate                   6
Low                        7
Very Low                   3
TOTAL                      28

With N = 28, the middle cases are the 14th and 15th. The cumulative frequencies are 5, 12, 18, 25, and 28, so both middle cases fall in the "Moderate" category. So, the median is "Moderate".

For a list of raw scores, the method for computing the median depends on the number of scores.
First, if you have an odd number of scores, pick the middle score.
1 4 6 7 12 14 18
Median is 7
Second, if you have an even number of scores, take the average of the middle two.
1 4 6 7 8 12 14 16
Median is (7 + 8)/2 = 7.5

Mean: The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.
Formula for the Mean

$$\bar{Y} = \frac{\sum Y}{N}$$

“Y bar” equals the sum of all the scores, Y, divided by the number of scores, N.

Calculating the mean with grouped scores

$$\bar{Y} = \frac{\sum fY}{N}$$

Where: fY = a score multiplied by its frequency, and N = the total number of scores (the sum of the frequencies).
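The three measures can be checked with Python's standard `statistics` module. The sketch below reuses the height lists from the mode example; the grouped frequencies are tallied from the Maple Avenue car survey, and the helper name `grouped_mean` is an assumption made for illustration. `multimode` needs Python 3.8 or later.

```python
from statistics import mean, median, multimode

girls = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]  # heights in inches
boys = [66, 78, 79, 69, 77, 79, 73, 74, 62]


def grouped_mean(freq):
    """Mean of grouped scores: sum of f*Y divided by N (the sum of f)."""
    n = sum(freq.values())
    return sum(score * f for score, f in freq.items()) / n


# Cars per household -> number of households, from the car survey.
cars = {0: 4, 1: 6, 2: 5, 3: 3, 4: 2}

print("Mode (girls, boys):", multimode(girls), multimode(boys))
print("Median (girls, boys):", median(girls), median(boys))
print("Mean (girls):", mean(girls))
print("Grouped mean (cars):", grouped_mean(cars))
```

`multimode` returns a list because a data set can have more than one mode; here each list has a single entry.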

Measures of Dispersion
A measure of dispersion is designed to state the extent to which individual observations (or items) vary from their average.
Or
Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other.

Importance of Measures of Dispersion
1. Summary of variability: A measure of dispersion provides a summary statistic that indicates the magnitude of the dispersion and, like a measure of central tendency, describes the data with a single value.
2. Comparison of two or more series: With the help of measures of dispersion, it becomes easier to compare the variability existing between two or more series. This provides information about the consistency present in the two or more sets of data.
3. Control of variability: One important objective of measures of dispersion is to control the variability existing in the data. With variability under control, the quality of the data is enhanced. For example, governments use measures of dispersion to study the distribution of income and wealth.
4. Inequalities in the distribution of wealth and income can be measured by dispersion.
5. Dispersion is used to compare and measure the concentration of economic power and monopoly in a country.
6. Dispersion is used in output control and price control.
7. Reliability of measures of central tendency: Measures of dispersion help in deciding the extent to which an average is reliable in describing the nature of the data. If the dispersion in the data is small, the average is a good representative of the data; if the dispersion is large, the average is less representative.
8. Helpful in further statistical analysis: Measures of dispersion facilitate the use of other statistical measures such as regression and correlation, which are based on measures of variation.

There are three main measures of dispersion:
I. The range
II. The interquartile range
III. The standard deviation

The Range: Simply the difference between the largest and smallest values in a set of data.
Or
The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, XL - XS.

Example: What is the range of the following data?
4 8 1 6 6 2 9 3 6 9
The largest score (XL) is 9 and the smallest score (XS) is 1, so:
Range = largest observation - smallest observation = XL - XS = 9 - 1 = 8
Useful for: daily temperature fluctuations or share price movement.

Interquartile Range: A measure of variability that overcomes the dependency on extreme values is the interquartile range (IQR). This measure of variability is the difference between the third quartile, Q3, and the first quartile, Q1. In other words, the interquartile range is the range for the middle 50% of the data.
Or
It is defined as the difference between the upper and lower quartiles.
Interquartile range = upper quartile - lower quartile
IQR = Q3 - Q1
For the data on monthly starting salaries, the quartiles are Q3 = 3600 and Q1 = 3465. Thus, the interquartile range is 3600 - 3465 = 135.
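As a rough sketch, the range and interquartile range can be computed with Python's standard library, here using the ten scores from the range example. Textbooks and software use slightly different conventions for locating quartiles, so the default method of `statistics.quantiles` (called "exclusive") is an assumption and may not match a hand calculation exactly.

```python
from statistics import quantiles

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

# Range: largest observation minus smallest observation.
data_range = max(scores) - min(scores)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Because the IQR ignores the lowest and highest quarters of the data, changing the single largest score from 9 to 90 would change the range but leave the IQR nearly unaffected.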

Standard Deviation: The standard deviation is defined to be the positive square root of the variance. Following the notation adopted for a sample variance and a population variance, we use s to denote the sample standard deviation and σ to denote the population standard deviation. The standard deviation is derived from the variance in the following way.

Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Variance: The variance is a measure of variability that utilizes all the data. The variance is based on the difference between the value of each observation ($x_i$) and the mean. The difference between each $x_i$ and the mean ($\bar{x}$ for a sample, $\mu$ for a population) is called a deviation about the mean. For a sample, a deviation about the mean is written $(x_i - \bar{x})$; for a population, it is written $(x_i - \mu)$. In the computation of the variance, the deviations about the mean are squared. If the data are for a population, the average of the squared deviations is called the population variance. The population variance is denoted by the Greek symbol $\sigma^2$. For a population of N observations and with $\mu$ denoting the population mean, the definition of the population variance is as follows.

POPULATION VARIANCE

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

In most statistical applications, the data being analyzed are for a sample. When we compute a sample variance, we are often interested in using it to estimate the population variance $\sigma^2$. Although a detailed explanation is beyond the scope of this text, it can be shown that if the sum of the squared deviations about the sample mean is divided by n - 1, and not n, the resulting sample variance provides an unbiased estimate of the population variance. For this reason, the sample variance, denoted by $s^2$, is defined as follows.

SAMPLE VARIANCE

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

So, for the example data, $s^2 = 64$.

Standard deviation = √(variance)
Variance = (standard deviation)²
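The difference between dividing by N and dividing by n - 1 is easy to see numerically. The sketch below applies both definitions to the ten scores from the range example and cross-checks them against the standard library; the function names are assumptions chosen for illustration.

```python
from statistics import pvariance, variance

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]


def population_variance(values):
    """Average squared deviation about the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


print("Population variance:", population_variance(scores), pvariance(scores))
print("Sample variance:    ", sample_variance(scores), variance(scores))
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as the number of observations grows.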

Importance of Standard Deviation:
I. Takes into account every observation.
II. Measures the 'average deviation' of observations from the mean.
III. Works with squares of residuals, not absolute values, which makes it easier to use in further calculations.

For the population standard deviation, every observation in the population is used.

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

In practice, most populations are very large and it is more common to calculate the sample standard deviation.

Sample standard deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

1. Calculate the mean, $\bar{x}$.
2. Calculate the residual for each x: $x_i - \bar{x}$.
3. Square the residuals: $(x_i - \bar{x})^2$.
4. Calculate the sum of the squares: $\sum (x_i - \bar{x})^2$.
5. Divide the sum in Step 4 by (n - 1): $\frac{\sum (x_i - \bar{x})^2}{n - 1}$.
6. Take the square root of the quantity in Step 5: $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$.
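The six steps translate almost line for line into code. The following sketch applies them to the girls' heights used in the mode example; the function name is an assumption, and the standard library's `statistics.stdev` gives the same result.

```python
import math


def sample_standard_deviation(values):
    n = len(values)
    mean = sum(values) / n                    # Step 1: the mean
    residuals = [x - mean for x in values]    # Step 2: residual for each x
    squares = [r ** 2 for r in residuals]     # Step 3: squared residuals
    total = sum(squares)                      # Step 4: sum of the squares
    variance = total / (n - 1)                # Step 5: divide by (n - 1)
    return math.sqrt(variance)                # Step 6: square root


girls_heights = [60, 72, 61, 66, 63, 66, 59, 64, 71, 68]  # inches
s = sample_standard_deviation(girls_heights)
print(f"Sample standard deviation: {s:.3f} inches")
```

For these heights the mean is 65 inches and the squared residuals sum to 178, so the result is the square root of 178/9.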
If the data are in a frequency distribution:

No. of units (x)   Frequency (f)
1                  85
2                  192
3                  123
Total              400

Calculate the standard deviation using:

$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}, \qquad \bar{x} = \frac{\sum f x}{\sum f}$$

The square of the population standard deviation is called the variance.

$$\text{Variance} = \sigma^2$$
Correlation
The term "correlation" refers to a measure of the strength of association between two variables. If the two variables increase or decrease together, they have a positive correlation. If increases in one variable are associated with decreases in the other, they have a negative correlation. Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
Or
Co-variation or co-relation between two variables: the variables change together.

Correlation coefficient: A statistic that quantifies a relation between two variables. It can be either positive or negative and falls between -1.00 and 1.00. The absolute value of the number (not the sign) indicates the strength of the relation; the sign indicates the direction.

The population correlation coefficient ρ (rho) measures the strength of the association between the variables. The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
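One common form of the sample correlation coefficient is the Pearson product-moment formula, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. The sketch below implements it directly. The notes do not give paired data, so the advertising and sales figures are hypothetical values invented purely for illustration, as is the function name `pearson_r`.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired data (not from the notes).
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 30]

r = pearson_r(advertising, sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one. The function divides by zero if either variable is constant, since correlation is undefined in that case.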

Regression: Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables.

Importance of regression:
1. Trend Line Analysis: Linear regression is used in the creation of trend lines, which use past data to predict future performance or "trends." Usually, trend lines are used in business to show the movement of financial or product attributes over time. Stock prices, oil prices, or product specifications can all be analyzed using trend lines.
2. Risk Analysis for Investments: The capital asset pricing model was developed using linear regression analysis, and a common measure of the volatility of a stock or investment is its beta, which is determined using linear regression. Linear regression is key in assessing the risk associated with most investment vehicles.
3. Sales or Market Forecasts: Multivariate (having more than two variables) linear regression is a sophisticated method for forecasting sales volumes or market movement in order to create comprehensive plans for growth. This method can be more accurate than trend analysis, because trend analysis only looks at how one variable changes with respect to another, whereas this method looks at how one variable will change when several other variables are modified.
4. Total Quality Control: Quality control methods make frequent use of linear regression to analyze key product specifications and other measurable parameters of product or organizational quality (such as the number of customer complaints over time).
5. Linear Regression in Human Resources: Linear regression methods are also used to predict the demographics and types of future work forces for large companies. This helps companies prepare for the needs of the work force by developing good hiring plans and training plans for existing employees.

Regression analysis is a statistical technique that attempts to explore and model the relationship between two or more variables. For example, an analyst may want to know if there is a relationship between road accidents and the age of the driver.

Regression equation

$$\hat{y}_i = b_0 + b_1 x_i$$

Where:
$\hat{y}_i$ = estimated value of quarterly sales ($1000s) for the ith restaurant.
$b_0$ = the y intercept of the estimated regression line.
$b_1$ = the slope of the estimated regression line.
$x_i$ = size of the student population (1000s) for the ith restaurant.

Here, the slope of the estimated regression equation ($b_1 = 5$) is positive, implying that as student population increases, sales increase. In fact, we can conclude (based on sales measured in $1000s and student population in 1000s) that an increase in the student population of 1000 is associated with an increase of $5000 in expected sales; that is, quarterly sales are expected to increase by $5 per student.
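The intercept and slope of an estimated regression line are usually obtained by least squares: $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below shows the calculation. The restaurant data behind the example are not reproduced in these notes, so the numbers used here are hypothetical and were chosen only so that the fitted slope equals 5; the intercept of 10 is arbitrary.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: student population (1000s) and quarterly sales ($1000s).
population = [1, 2, 3, 4, 5]
sales = [15, 20, 25, 30, 35]

b0, b1 = least_squares_line(population, sales)
print(f"y_hat = {b0:.1f} + {b1:.1f} x")

# Predicted quarterly sales ($1000s) for a student population of 6 (1000s).
print("Prediction at x = 6:", b0 + b1 * 6)
```

With a slope of 5, each additional 1000 students adds $5000 to the predicted quarterly sales, which is the interpretation given in the text.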

Sampling: A sample is "a smaller (but hopefully representative) collection of units from a population used to determine truths about that population".
Or
A sample in the very general sense is a set of units observed from all the possible units.
Or
The sampling design is the methodology by which the data are collected.

Types of Samples
1. Probability (Random) Samples
Simple random sample
Systematic random sample
Stratified random sample
Multistage sample
Multiphase sample
Cluster sample
2. Non-Probability Samples
Convenience sample
Purposive sample
Quota sample

Importance of sampling: It is economical and practical; it is faster and cheaper; it can yield more comprehensive information; it can be more accurate; and because of the savings it permits in time and money, the sample survey makes possible the use of much larger and much more varied populations than would be possible for the same expenditure if one were making a complete enumeration.
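Three of the probability sampling designs listed above can be sketched with Python's `random` module. This is a simplified illustration rather than a full sampling procedure: the population of 100 numbered units, the two strata, and all function names are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Every unit has the same chance of selection."""
    return random.Random(seed).sample(population, k)


def systematic_sample(population, k, seed=None):
    """Pick a random start, then take every step-th unit."""
    step = len(population) // k
    start = random.Random(seed).randrange(step)
    return [population[start + i * step] for i in range(k)]


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample


population = list(range(1, 101))  # hypothetical units numbered 1..100
strata = {"group A": population[:60], "group B": population[60:]}

print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
print(stratified_sample(strata, 0.10, seed=1))
```

Passing a `seed` makes a draw repeatable, which is convenient for checking work; leaving it as `None` gives a different sample on each run.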
