Learning Objectives
After completing this module, the student will be able to
- explain the purpose of calibration
- find a calibration curve using the Excel function trendline
- write a macro in Excel
- explain the meaning of $R^2$
- explain sources of error when estimating the independent
variable value
- find a confidence interval for the independent variable value
Knowledge and Skills
- trendline calculation
- linear regression
- coefficient of determination
- calibration
Prerequisites
- linear equation
- average and standard deviation
- normal distribution
Pre-assessment
Before completing the module, test whether you have mastered the prerequisites.
Linear Equation
1. Find the equation of a horizontal line that goes through the point (2,4).
2. Find the equation of a vertical line that goes through the point (-1,3).
3. Determine the equation of the line passing through (-2,1) and (3,-1/2).
4. Determine the equation of the line passing through (1,-2) and (-2,4).
5. Determine the equation of the line with slope 3 and vertical intercept (0,2).
6. Determine the equation of the line passing through (-1,-1) and parallel to the line passing through
(0,1) and (3,0).
7. Graph the line given by the equation $y = 2x + 1$.
8. Graph the line given by the equation $3x - 4y + 1 = 0$.
Average and Standard Deviation
9. Find the average and sample standard deviation of the following data set: 2,4,5,6,6,7,8
10. Write down the equations for calculating the average and the sample standard deviation of a data set
of size n: $x_1, x_2, \ldots, x_n$.
Normal Distribution
11. Suppose X is normally distributed with mean 2 and standard deviation 1. Find (a) the 75th percentile,
(b) the 95th percentile, and (c) the 99th percentile.
12. Suppose X is normally distributed with mean 3 and variance 4. Find the probability that X is between
1 and 4, that is, find $P(1 \le X \le 4)$.
13. Suppose X is normally distributed with mean -1 and standard deviation 4. Find an interval centered
about the mean so that with probability 0.95 X is contained in that interval.
14. Suppose that the number of seeds a plant produces is normally distributed with mean 142 and
standard deviation 31. Find the probability that a randomly sampled plant will produce more than
200 seeds.
Calibration
According to the NIST handbook
(http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd133.htm), “[t]he goal of calibration is to
quantitatively convert measurements made on one of two measurement scales to the other
measurement scale.” The relationship between two measurements is used to convert one measurement
into the other measurement. You saw one such example in your chemistry lab where you measured
absorbance to find the concentration of an unknown sample. In this case, the relationship between
absorbance and concentration was linear. You derived the relationship by measuring absorbance of
standard samples of known concentration. The resulting line is called a calibration curve. The basis for the
calibration curve is Beer’s Law, which states that there is a direct linear relationship between
absorbance (A) and concentration (c): if we graph absorbance as a function of concentration, a
straight line with positive slope provides a good fit. To illustrate this, the following table lists
absorbance measurements of standard samples:
Concentration [μmol L^-1]    Absorbance
0                            0
20                           0.2356
40                           0.4725
60                           0.7127
80                           0.9507
If we graph the data points and fit a straight line through the points (Figure 1), we find that the equation
of the straight line is $A = 0.0119c - 0.0014$.
This curve is called a standard curve and is used to infer the unknown concentration of a solution. For
instance, if we find that the absorbance A of an unknown solution is 0.6386, we find for the
concentration c
$$c = \frac{0.6386 - (-0.0014)}{0.0119} = 53.8$$
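The inversion of the standard curve can also be written as a small function. The following is a minimal sketch in Python, offered as an illustration alongside the Excel workflow; the function name and its default arguments are our own choices, with the slope and intercept taken from the fitted line above.

```python
def concentration_from_absorbance(absorbance, slope=0.0119, intercept=-0.0014):
    """Invert the calibration line A = slope * c + intercept for c.

    The default slope and intercept are the rounded values of the
    standard curve quoted in the text.
    """
    return (absorbance - intercept) / slope


# Unknown sample from the text: measured absorbance 0.6386
c_unknown = concentration_from_absorbance(0.6386)
print(f"{c_unknown:.1f}")  # prints 53.8
```

Keeping the slope and intercept as arguments makes it easy to reuse the function with a different standard curve.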
The data in our example fits Beer’s Law extremely well. The data was generated using a Virtual Lab on
Spectrophotometry (http://www.chm.davidson.edu/vce/Spectrophotometry/UnknownSolution.html).
When data are obtained in actual lab experiments, measurement errors need to be taken into account.
A Model for Linear Calibration
We assume in the following that we measure a signal y that depends linearly on a quantity x. We call the
quantity x the independent variable and the quantity y the dependent variable. We assume that we
measure x without error and that the quantity y is measured with an error ε that is normally distributed
with mean 0 and standard deviation σ. The relationship between the two quantities is then
$$y = a + bx + \varepsilon$$
To get a sense for the measurement uncertainty when inferring the quantity x from the measurement y,
we begin with simulating an experiment in which we have a set of n standard samples and for each
sample we measure the signal m times.
Figure 2: Screenshot of the simulation. The input parameters are listed in the yellow box; the simulated data are
listed in the gray box; the estimated values of the slope and vertical intercept are listed in the green box together
with the calculation of the unknown quantity x based on the measurement of the unknown sample y. The graph
displays the simulated data (blue symbols), the trendline (black line), and the unknown measurement (red data
point).
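The simulated experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet's implementation: the function name, the chosen standards, and the parameter values are hypothetical, and the error is drawn from a normal distribution with mean 0 and standard deviation sigma as in the model.

```python
import random


def simulate_standards(x_values, a, b, sigma, m, seed=None):
    """Simulate m measurements y = a + b*x + eps for each standard x.

    eps is normally distributed with mean 0 and standard deviation sigma.
    Returns a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    data = []
    for x in x_values:
        for _ in range(m):
            eps = rng.gauss(0.0, sigma)  # measurement error in y only
            data.append((x, a + b * x + eps))
    return data


# Hypothetical inputs: n = 5 standards, each measured m = 3 times
data = simulate_standards([0, 20, 40, 60, 80], a=0.0, b=1.0, sigma=2.0, m=3, seed=1)
for x, y in data[:3]:
    print(x, round(y, 2))
```

Fixing the seed makes a run reproducible; leaving it as None gives a new set of simulated measurements each time.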
Linear Regression
When two quantities are linearly related, such as absorbance and concentration, a straight line provides
a good fit. In Excel, a straight line can be fitted using the Trendline option, which is found on
the Layout tab of the Chart Tools. Clicking on the blue triangle under Trendline and choosing More
Trendline Options opens a window that offers additional options, such as Display Equation on chart
and Display R-squared value on chart. We already know the meaning of the equation. We will now look
at the meaning of R-squared.
Assume a linear model $y = a + bx + \varepsilon$ where the error $\varepsilon$ has mean 0 and standard deviation $\sigma$. We
obtained data points $(x_j, y_j)$, $j = 1, 2, \ldots, n$, and used the Trendline option to fit a straight line. This results
in estimates for the slope and the intercept. We denote the estimated value of the intercept by $\hat{a}$ and
the estimated value of the slope by $\hat{b}$.
How does Excel estimate the slope and the intercept?
The method that Excel uses to estimate the slope and the intercept is called the method of least squares.
The method says: Find $\hat{a}$ and $\hat{b}$ so that the expression
$$\sum_{j=1}^{n} \left[ y_j - (\hat{a} + \hat{b} x_j) \right]^2$$
is as small as possible. We say that the sum of the squared deviations is minimized. Expressions for the
estimated intercept and slope can be given. It is not important to memorize the expressions.
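The least-squares line (or linear regression line) is given by
$$\hat{y} = \hat{a} + \hat{b} x$$
with
$$\hat{b} = \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\bar{x}$$
where $\bar{x}$ and $\bar{y}$ denote the averages of the $x_j$ and the $y_j$.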
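The expressions for the estimated slope and intercept translate directly into code. The following Python sketch is an illustration only (the function and variable names are ours); it applies the formulas to the absorbance data of the standard samples from the Calibration section.

```python
def least_squares_line(x, y):
    """Return (a_hat, b_hat) for the least-squares line y = a_hat + b_hat * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope estimate
    s_xy = sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y))
    s_xx = sum((xj - x_bar) ** 2 for xj in x)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Standard samples: concentration and measured absorbance
conc = [0, 20, 40, 60, 80]
absorbance = [0, 0.2356, 0.4725, 0.7127, 0.9507]

a_hat, b_hat = least_squares_line(conc, absorbance)
print(f"A = {b_hat:.4f} c + ({a_hat:.4f})")  # prints A = 0.0119 c + (-0.0014)
```

The result agrees with the trendline equation reported for the standard curve.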
To measure how good the fit is we calculate a quantity called the coefficient of determination, which is
abbreviated as $R^2$. For each data point $(x_j, y_j)$, we can define $\hat{y}_j = \hat{a} + \hat{b} x_j$. We introduce the deviation of
the measured y-values from their mean, $y_j - \bar{y}$, which we can write as
$$y_j - \bar{y} = (\hat{y}_j - \bar{y}) + (y_j - \hat{y}_j)$$
A somewhat lengthy calculation shows that the total sum of squared deviations $\sum_{j=1}^{n} (y_j - \bar{y})^2$ can be
written as a part that is explained by the linear model (Explained) and a part that reflects the stochastic
errors (Unexplained):
$$\underbrace{\sum_{j=1}^{n} (y_j - \bar{y})^2}_{\text{Total}} = \underbrace{\sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2}_{\text{Explained}} + \underbrace{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}_{\text{Unexplained}}$$
$$R^2 = \frac{\text{Explained}}{\text{Total}} = \frac{\sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}$$
The coefficient of determination $R^2$ is the proportion of variation that is explained by the model.
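To make the decomposition concrete, the following Python sketch computes the three sums of squares and the coefficient of determination for the absorbance data of the standard samples. It is a minimal illustration; the function name is our own, and the fit uses the least-squares expressions given earlier.

```python
def sums_of_squares(x, y):
    """Return (total, explained, unexplained) for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b_hat = (sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y))
             / sum((xj - x_bar) ** 2 for xj in x))
    a_hat = y_bar - b_hat * x_bar
    y_hat = [a_hat + b_hat * xj for xj in x]  # fitted values

    total = sum((yj - y_bar) ** 2 for yj in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yj - yh) ** 2 for yj, yh in zip(y, y_hat))
    return total, explained, unexplained


conc = [0, 20, 40, 60, 80]
absorbance = [0, 0.2356, 0.4725, 0.7127, 0.9507]

total, explained, unexplained = sums_of_squares(conc, absorbance)
r2 = explained / total
print(f"Total = {total:.6f}")
print(f"Explained + Unexplained = {explained + unexplained:.6f}")
print(f"R^2 = {r2:.5f}")
```

Because the virtual-lab data follow Beer's Law closely, the unexplained part is tiny and the coefficient of determination is very close to 1.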
In-class Activity 2
Return to the spreadsheet CalibrationWorkbook. Under the tab “Simulation,” you have already worked
on the simulation of standard samples with values x = 10, 20, 40, 60, 80, and 90, where the intercept
a = 0 and the slope b = 1. Each signal is measured 3 times. The simulated data are in the gray-colored
box. The graph has a small textbox where the equation of the trendline and the coefficient of
determination are listed. You will see that when you increase the standard deviation, the coefficient of
determination decreases. Give a verbal explanation as to why you would expect this.
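The effect can also be observed outside the spreadsheet. The Python sketch below is an illustration under assumptions: it mimics the setup of the activity (standards 10, 20, 40, 60, 80, 90, intercept 0, slope 1, three measurements each), while the function names, the seed, and the chosen standard deviations are hypothetical. It shows the trend numerically but does not replace the verbal explanation asked for.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b_hat = (sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y))
             / sum((xj - x_bar) ** 2 for xj in x))
    a_hat = y_bar - b_hat * x_bar
    explained = sum((a_hat + b_hat * xj - y_bar) ** 2 for xj in x)
    total = sum((yj - y_bar) ** 2 for yj in y)
    return explained / total


def simulated_r_squared(sigma, seed=0):
    """Simulate the activity's standards with noise level sigma and return R^2."""
    rng = random.Random(seed)
    standards = [10, 20, 40, 60, 80, 90]
    x = [s for s in standards for _ in range(3)]  # each signal measured 3 times
    y = [0.0 + 1.0 * xj + rng.gauss(0.0, sigma) for xj in x]  # a = 0, b = 1
    return r_squared(x, y)


for sigma in (1, 5, 10, 30):
    print(sigma, round(simulated_r_squared(sigma), 4))
```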
confidence interval. Cell K26 contains the value of half the length of the confidence interval, which
we denote by $C_x$. We can thus also report the result as $x^* \pm C_x$.
If you want to read more about Linear Calibration, consult the statistics and data analysis paper by
Burke, S. Regression and Calibration. LC GC Europe Online Supplement.
Homework
1. Find a linear regression line through the given points and compute the coefficient of determination.
x -3.0 -2.0 -1.0 0.0 1.0 2.0
y -6.3 -5.6 -3.3 0.1 1.7 2.1
2. To determine whether the frequency of chirping crickets depends on temperature, the following
data were obtained by Pierce, 1949 (The Songs of Insects. Cambridge, Mass. Harvard University
Press):
Temperature (F) 69 70 72 75 81 82 83 84 89 93
Chirps/sec 15 15 16 16 17 17 16 18 20 29
Fit a linear trendline and find the coefficient of determination.
3. To determine the glucose content of a wine sample, an enzyme spectroscopy method is used. The calibration
curve is obtained from the following data:
Added glucose, [glucose] (mM)   0.000   0.050   0.100   0.200   0.300   0.400
Absorbance                      0.231   0.279   0.314   0.423   0.540   0.665
(a) Find the equation of the calibration curve and the coefficient of determination.
(b) Suppose the absorbance of an unknown sample is measured as 0.356. Use the calibration curve
to estimate the glucose level.