PHYS111Q : Labs


B. Linear Regressions

Is your data linear?

So you want to fit a straight line to a set of measurements... First, make sure you really want to do this. That is, see if you can convince yourself that a plot of your data, a series of $(x,y)$ pairs, is compatible with a linear model
\begin{displaymath}
y = mx + b
\end{displaymath} (83)

where $m$ is the slope and $b$ is the $y$ intercept. Fig. 11 shows a plot of a set of data that does not appear to be linear. We could apply a linear fit to these data, but it is unclear what the results would mean.

Figure 11: An example of a set of data which does not appear to be compatible with a linear model. Instead, the slope of the graph appears to increase with $x$.
\includegraphics{lsf-fig1.eps}

A sample set of data compatible with a linear model is given in Table 1 and plotted in Fig. 12. Note that none of the data points fall along the straight line shown in the figure. Measurements compatible with a linear model generally do not all fall exactly on a single straight line, because measurements include uncertainties. Although the data do not fall on a single line, the diamonds in Fig. 12 appear to be scattered randomly about a common line.

Figure 12: The data (diamonds) and the best linear fit (dashed line).
\includegraphics{lsf-fig2.eps}


Table 1: Data which appear to be compatible with a linear model.
\begin{tabular}{cc}
$x$ values & $y$ values \\
\hline
0 & 7.28 \\
1 & 7.51 \\
2 & 7.16 \\
3 & 7.96 \\
4 & 8.64 \\
5 & 9.46 \\
6 & 8.39 \\
7 & 9.14 \\
\end{tabular}


The Method of Least Squares

Once you are convinced that your data are compatible with a linear model, it is reasonable to apply the method of least squares, also called a linear regression, to your data. This is a method of finding the slope and intercept, with associated uncertainties, of the line giving the ``best fit'' to your data. The method is implemented as a function in the Excel spreadsheet and as an analysis tool in Logger Pro. Detailed instructions for using them are given below.

Figure 13: The solid vertical bars are the differences $y_i - (mx_i + b)$ between the data and the linear fit. The sum of the squares of these differences is minimized by the fitting procedure.
\includegraphics{lsf-fig3.eps}

A derivation of the method of least squares is beyond the scope of this course, and we won't need to use the associated equations, since the method is automated in the software tools we use. Instead of working through a derivation, we will consider the general idea behind the process. In Fig. 13, the solid bars show the differences

\begin{displaymath}
y_i - (mx_i + b)
\end{displaymath} (84)

between the data and the fit function (Eq. 83). The method of least squares gives the line which minimizes the sum of the squares of these differences,
\begin{displaymath}
\sum_i [y_i - (mx_i + b)]^2
\end{displaymath} (85)

These are the ``least squares'' referred to in the name of the method. Try sketching a different line on Fig. 13, and add your own ``difference bars,'' and you should be able to convince yourself that they would give a larger value of the sum in Eq. 85.
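The same comparison can be made numerically. The Python sketch below evaluates the sum in Eq. 85 for the data in Table 1, once for the best-fit line and once for two other trial lines. The function name `sum_sq` and the trial slopes and intercepts are illustrative choices, and the best-fit slope and intercept are rounded from the fit results reported for Table 1 in this appendix.

```python
# Data from Table 1.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [7.28, 7.51, 7.16, 7.96, 8.64, 9.46, 8.39, 9.14]


def sum_sq(m, b, xs, ys):
    """Sum of squared differences between the data and y = m*x + b (Eq. 85)."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(xs, ys))


# Best-fit line, rounded from the reported fit results (assumed values).
best = sum_sq(0.2969, 7.154, x, y)

# Two trial lines chosen only for illustration:
# a steeper line, and a horizontal line through the mean of the y values.
steeper = sum_sq(0.35, 7.0, x, y)
flat = sum_sq(0.0, sum(y) / len(y), x, y)

print(f"best fit: {best:.3f}")
print(f"steeper : {steeper:.3f}")
print(f"flat    : {flat:.3f}")
```

Both trial lines should give a larger sum than the best-fit line, and any other line you try should as well.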

The method of least squares is analogous to calculating the mean and the associated standard deviation of a set of measurements of a single quantity. The uncertainties in the measurements are assumed to be random, and the resulting slope and intercept are estimates of the most probable true values. The dashed line shown in Fig. 12 is the result of a least squares fit to the data in Table 1. The fit results are

\begin{eqnarray*}
m &=& 0.296901449\\
\sigma_m &=& 0.075575599\\
b &=& 7.15400\ldots\\
\sigma_b &=& 0.316155415\\
\sigma_y &=& 0.489785862\\
r^2 &=& 0.720062996
\end{eqnarray*}

where $m$ is the slope, $b$ is the $y$ intercept, $\sigma_m$ and $\sigma_b$ are the standard deviations of the slope and intercept, $\sigma _y$ is an estimate of the uncertainty of individual $y$ measurements, called the root mean square error or the standard deviation of the $y$ estimate, and $r$ is the correlation coefficient.
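For readers who want to see where such numbers come from, the following Python sketch computes the same six quantities from the data in Table 1. It uses the standard closed-form expressions for an unweighted straight-line fit; that Excel and Logger Pro use exactly these expressions is an assumption, and the function name `linear_fit` is hypothetical. Applied to Table 1 as printed, it gives roughly $m \approx 0.298$ and $b \approx 7.15$, close to but not identical to the values quoted above, presumably because the table lists rounded measurements.

```python
import math


def linear_fit(xs, ys):
    """Unweighted least-squares fit of y = m*x + b with uncertainties."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sums of squared deviations and cross products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    syy = sum((yi - y_mean) ** 2 for yi in ys)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))

    # Slope and intercept that minimize the sum in Eq. 85.
    m = sxy / sxx
    b = y_mean - m * x_mean

    # Sum of squared differences between the data and the fitted line.
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(xs, ys))

    # Two parameters are fitted, so n - 2 degrees of freedom remain.
    sigma_y = math.sqrt(ss_res / (n - 2))
    sigma_m = sigma_y / math.sqrt(sxx)
    sigma_b = sigma_y * math.sqrt(sum(xi ** 2 for xi in xs) / (n * sxx))
    r2 = 1.0 - ss_res / syy

    return {"m": m, "sigma_m": sigma_m, "b": b,
            "sigma_b": sigma_b, "sigma_y": sigma_y, "r2": r2}


# Data from Table 1.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [7.28, 7.51, 7.16, 7.96, 8.64, 9.46, 8.39, 9.14]

fit = linear_fit(x, y)
for name, value in fit.items():
    print(f"{name:8s} = {value:.6f}")
```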

Interpreting the Fit Results

Figure 14: Error bars on the individual $y$ measurements are $\sigma _y$. The shaded region reflects the uncertainty ranges $\pm \sigma _m$ and $\pm \sigma _b$ of the best fit slope and intercept.
\includegraphics{lsf-fig4.eps}

Zero Intercept

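In some cases, you may have a reason (a theoretical model, e.g.) to test whether your data are consistent with a linear model with a $y$ intercept $b$ of zero. If a linear fit to your data yields a standard deviation of the $y$ intercept $\sigma_b$ greater than the magnitude $|b|$ of the intercept itself, then you may conclude that your data are consistent with a zero $y$ intercept. For the fit to Table 1, by contrast, $b \approx 7.15$ is many times larger than $\sigma_b \approx 0.32$, so those data are not consistent with a zero intercept.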

If your theoretical model predicts a zero $y$ intercept, and you find that a linear fit yields an intercept consistent within uncertainty with zero, you may want to perform a fit in which you fix the value of $b$ at zero in order to find a best value of the slope compatible with your model. Instructions for performing linear fits with $b$ fixed at zero with Logger Pro and Excel are included below.
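As a rough illustration of what a fit with $b$ fixed at zero does, the Python sketch below minimizes $\sum_i (y_i - mx_i)^2$, which gives the slope $m = \sum_i x_i y_i / \sum_i x_i^2$. The function name `fit_through_origin` and the data are hypothetical, made up only for this example, and the uncertainty estimate assumes one fitted parameter ($n - 1$ degrees of freedom). The software tools may report their results differently.

```python
import math


def fit_through_origin(xs, ys):
    """Least-squares slope for y = m*x, with the intercept fixed at zero."""
    n = len(xs)
    sum_xx = sum(xi ** 2 for xi in xs)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))

    # Slope that minimizes the sum of squared differences y_i - m*x_i.
    m = sum_xy / sum_xx

    # Only the slope is fitted, so n - 1 degrees of freedom remain.
    ss_res = sum((yi - m * xi) ** 2 for xi, yi in zip(xs, ys))
    sigma_y = math.sqrt(ss_res / (n - 1))
    sigma_m = sigma_y / math.sqrt(sum_xx)
    return m, sigma_m


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, sigma_m = fit_through_origin(x, y)
print(f"m = {m:.4f} +/- {sigma_m:.4f}")
```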

Fitting with Logger Pro

Fitting with Excel


Copyright © 2003-2007, Lewis A. Riley Updated Tue Nov 30 13:48:34 2004

This work is licensed under a Creative Commons License.