Chapter 20
Curved Patterns
Question No. 38: Cellular Phones in Africa
(a) The scatterplot of the two types of subscribers suggests a possible linear trend in the number of
landlines. The plot of landline subscribers appears more linear than that of mobile
subscribers.
[Figure: Mobile & Landline Subscribers vs Year. x-axis: Year; y-axes: Mobile subscribers (000) and Landline subscribers (000). Series: Mobile Subscribers (Sub-Sahara, 000) and Land Line Subscribers (Sub-Sahara, 000), each with a linear trend line.]
(b)
[Figure: Land Line Subscribers vs Year. x-axis: Year; y-axis: Land Line Subscribers (000). Linear trend line: y = 374.51x - 744620, R² = 0.987.]
The linear trend in the number of landline subscribers fits well overall, with r² = 0.987,
but the line does not follow the data closely at the extremes or in the middle of the range.
(c) The regression equation is: number of landline subscribers (in 1000s) = 374.51(year) - 744620.
The slope implies that the number of landline subscribers grows by about 374,510 per year on
average. The negative intercept is an unrealistic extrapolation far outside the data, back to
year 0.
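As a quick check on this interpretation, the fitted line can be evaluated directly. The sketch below is only illustrative: the helper name `predicted_landlines` is made up here, and the two coefficients are the only values taken from the solution.

```python
# Coefficients taken from the fitted line in part (b).
SLOPE = 374.51        # thousands of subscribers per year
INTERCEPT = -744620   # thousands of subscribers at year 0


def predicted_landlines(year):
    """Fitted landline subscribers in thousands (hypothetical helper)."""
    return SLOPE * year + INTERCEPT


# The year-over-year change in the fitted value equals the slope (in thousands).
annual_change = predicted_landlines(2001) - predicted_landlines(2000)

# Evaluating at year 0 returns the intercept, a negative subscriber count,
# which is why the intercept has no practical meaning here.
at_year_zero = predicted_landlines(0)

print(round(annual_change * 1000))  # change in subscribers, not thousands
print(at_year_zero)
```

Multiplying the slope by 1000 converts it from thousands of subscribers to subscribers, which is where the figure of 374,510 per year comes from.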
(d)
[Figure: Residual Plot. x-axis: Year; y-axis: Residuals.]
The residuals are not randomly scattered, which indicates a poor fit and a departure from
linearity. The linear equation under-predicts at the edges of the plot and over-predicts in
the middle of the plot.
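The way a straight line under-predicts at the edges and over-predicts in the middle can be reproduced on a small example. The sketch below uses synthetic, made-up values from a smooth upward-bending curve, not the subscriber data, and fits the least-squares line by hand using only the standard library.

```python
import math

# Synthetic, made-up data (NOT the subscriber data): an upward-bending curve.
t = list(range(11))
y = [math.exp(0.2 * ti) for ti in t]

# Ordinary least-squares line fitted by hand.
n = len(t)
t_mean = sum(t) / n
y_mean = sum(y) / n
sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
sxx = sum((ti - t_mean) ** 2 for ti in t)
slope = sxy / sxx
intercept = y_mean - slope * t_mean

# Residual = observed - fitted. A positive residual means the line under-predicts.
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

for ti, ri in zip(t, residuals):
    print(ti, round(ri, 3))
```

In this toy case the residuals are positive at both ends and negative in the middle, the same bowed shape that signals curvature a straight line cannot capture.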
(e)
[Figure: Log Subscribers vs Year. x-axis: Year; y-axis: log_e landline subscribers (000). Linear trend line: y = 0.0751x - 141.8, R² = 0.9723.]
The log trend line still shows the bending pattern seen in the original plot, although the residuals
from this curve seem to be random. The model ‘Estimated log_e(Number of Subscribers) = b0 + b1 Year’ is
therefore not a better summary of the growth in the use of landlines than the model ‘Number of
Subscribers = b0 + b1 Year’; its R² of 0.9723 is also lower than the 0.987 of the linear model.
(f)
[Figure: Log Subscribers vs Year. x-axis: Year; y-axis: log_e mobile subscribers (000). Linear trend line: y = 0.5819x - 1156, R² = 0.9788.]
The regression equation for the log model is
log_e(number of mobile subscribers) = 0.5819(year) - 1156
[Figure: Log land line and Log mobile subscribers vs Year, each with a linear trend line. Log mobile: y = 0.5819x - 1156, R² = 0.9788. Log land line: y = 0.0751x - 141.8, R² = 0.9723.]
Inverse log_e of the slopes: exp(0.5819) = 1.789, exp(0.0751) = 1.078
This implies a much higher annual rate of growth for mobile subscribers: their number is
multiplied by about 1.789 each year (roughly 78.9% annual growth), whereas the number of
landline subscribers is multiplied by about 1.078 each year (roughly 7.8% annual growth).
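The two growth figures come from exponentiating the slopes of the log fits. A minimal sketch of that arithmetic follows; the variable names are illustrative, and only the two slopes are taken from the fitted equations.

```python
import math

# Slopes of the two log_e fits, taken from parts (e) and (f).
slope_mobile = 0.5819
slope_landline = 0.0751

# A slope b on the log_e scale means the count is multiplied by exp(b) each year.
factor_mobile = math.exp(slope_mobile)
factor_landline = math.exp(slope_landline)

# Percentage growth per year implied by each factor.
pct_mobile = (factor_mobile - 1) * 100
pct_landline = (factor_landline - 1) * 100

print(round(factor_mobile, 3), round(pct_mobile, 1))
print(round(factor_landline, 3), round(pct_landline, 1))
```

Because the response is on a log scale, the exponentiated slope is a multiplier per year and does not need to be scaled by the units of the subscriber counts.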
Question No. 40: CO2
[Figure: CO2 (million tons) vs GDP (billion dollars). x-axis: GDP (billion dollars); y-axis: CO2 (million tons). Linear trend line: y = 0.5094x + 55.537, R² = 0.5553.]
(a) The three prominent outliers are the People’s Republic of China, the US, and Japan.
(b) The plot after removing the outliers
[Figure: CO2 vs GDP, outliers removed. x-axis: GDP (billion dollars); y-axis: CO2 (million tons). Linear trend line: y = 0.4587x + 39.846, R² = 0.4041.]
The plot shows that countries with low GDP have lower levels of CO2 emissions, and the pattern
appears to be exponential.
The regression line that summarizes the variation is:
CO2 (in millions of tons) = 0.4587*GDP (in billion dollars) + 39.846
(c)
[Figure: Log CO2 vs Log GDP. x-axis: Log GDP; y-axis: Log CO2. Linear trend line: y = 0.879x + 0.2104, R² = 0.8043.]
The linear pattern is apparent in the scatterplot.
(d) The fitted equation for the plot is: Log CO2 = 0.879(Log GDP) + 0.2104
[Figure: Residual Plot. x-axis: Log GDP; y-axis: Residuals.]
(e) The fit seems to be appropriate, as no pattern is found in the residuals. The variation
also seems to be roughly equal across the range of log GDP.
Fitted equation: Log CO2 = 0.879(Log GDP) + 0.2104
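A log-log fit of this form can be read as a power relationship on the original scale. The sketch below assumes "Log" here is the natural logarithm (which is consistent with the base-10 intercept reported in part (f)); the helper name `predicted_co2` is made up for illustration.

```python
import math

# Coefficients of the fitted log-log equation.
SLOPE = 0.879
INTERCEPT = 0.2104


def predicted_co2(gdp):
    """Back-transformed fitted CO2 (million tons) for GDP in billion dollars.

    Hypothetical helper; assumes the fit uses natural logs.
    """
    return math.exp(INTERCEPT + SLOPE * math.log(gdp))


# Doubling GDP multiplies the fitted CO2 by 2**SLOPE, whatever the starting GDP.
ratio = predicted_co2(2000) / predicted_co2(1000)
print(round(ratio, 3))
```

Since the slope is below 1, the fitted CO2 grows less than proportionally with GDP: doubling GDP less than doubles the fitted emissions.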
(f)
[Figure: Log 10 CO2 vs Log 10 GDP. x-axis: Log 10 GDP; y-axis: Log10 CO2. Linear trend line: y = 0.879x + 0.0914, R² = 0.8043.]
Yes, the y-intercept changes (from 0.2104 to 0.0914), so the fitted regression line changes; the slope (0.879) and R² (0.8043) stay the same.
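The change in intercept follows from the change of base. A short sketch of the conversion, assuming the earlier fit used natural logs, is given below; the variable names are illustrative.

```python
import math

# Natural-log fit from part (d).
slope_ln = 0.879
intercept_ln = 0.2104

# log10(x) = ln(x) / ln(10). Dividing the whole equation by ln(10) rescales
# the intercept; the slope is unchanged because both sides are rescaled alike.
slope_log10 = slope_ln
intercept_log10 = intercept_ln / math.log(10)

print(slope_log10, round(intercept_log10, 4))
```

The converted intercept agrees with the 0.0914 shown on the base-10 plot, which supports reading the earlier "Log" as a natural log.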