Least Squares Fit (Table window, Columns menu)
For each pair of values in two input columns, the least squares fit operation calculates an output value in an output column, according to a user-selected function that is assumed to describe the relation between the values in the two input columns. The coefficients of this function are calculated in such a way that the sum of the squared errors is as small as possible (best fit).
The least squares fit operation provides a tool to describe the best fitting relation between two columns in a table. The most common application of this concept is a linear regression which means the two table columns are known or expected to have some linear relationship.
This may be described with the following equation:
Y = a * X + b
where X and Y are known variables (records from the two columns in the table) and where a and b are unknown coefficients.
Usually there is some experimental error present or other deviation from a line describing the relation between X and Y. This means that no values for a and b are possible for a perfect fit. The best fit is found when the sum of the errors is as small as possible. This is the case when the sum of squares of the deviations of all points from the line, indicating the relationship, is as small as possible (see figures below). In other words the sum of squares of the errors on each data point is minimized.
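The minimization described above can be sketched in a few lines of Python for the straight-line case Y = a * X + b. This is only an illustration using the standard closed-form solution. The function name fit_line and the sample values are hypothetical and are not part of the operation itself.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize the sum of squared errors of Y = a * X + b."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # Sum of squared deviations of X, and of the X-Y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Hypothetical sample data: four slightly noisy points near a line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)

# Values on the best-fit line, comparable to what an output column would hold.
fitted = [a * x + b for x in xs]
```

The list fitted plays the role of the output column: every value lies exactly on the fitted line, while the input Y-values scatter around it.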
The model used provides a method for finding the least squares approximation to a set of data points (X, Y). The approximation must be a linear combination of a set of basis vectors. The functional form of the approximation (polynomial, exponential, etc.) is determined by the user.
To find the least squares fit, X and Y can be represented graphically using one of the five predefined function types: exponential, logarithmic, power law, polynomial or trigonometric.
Then the number of terms can be specified:
Y = a * X + b
Y = a * X^2 + b * X + c
By calculating a least squares fit through the Columns menu of a table window, you obtain values in an output column which, if plotted in a graph, lie exactly on the best-fitting curve through all input points.
Tip:
It might be easier to directly calculate a least squares fit in a graph window.
The results of the least squares fit will then be directly shown as a continuous line in your graph window. The resulting formula can be viewed in the Graph Options - Least Squares Fit dialog box; in this dialog box you can also modify the calculation by choosing another function, etc.
Examples:
The examples below show the relations between column colX and column colY1 and between column colX and column colY2 as a result of a least squares fit.
When using the least squares fit operation on the Columns menu of a table window, the output column shows the Y-values which are exactly on the line of the best-fit curve.
Input column requirements:
The least squares fit operation provides a tool to describe the best fitting relation between two table columns. Usually there is some experimental error present or other deviation from a line describing the relation between X and Y. This means that no values for a and b are possible for a perfect fit. The best fit is found when the sum of the errors (least squares) is as small as possible.
Dialog box options:
X - column:
Select the name of the column representing the x-values (independent variable). For the input column requirements, see Functionality above.
Y - column:
Select the name of the column representing the y-values (dependent variable).
Function:
Select the type of function: Exponential, Logarithmic, Polynomial, Power Law, or Trigonometric. For more information, see Functionality and Algorithm.
Nr. of terms:
Type the number of terms to be used in the formula describing the best fit. The text box is only available for function types polynomial (minimum 2 terms) and trigonometric (minimum 3 terms).
Output column:
Type the name of the output column that will contain the fitted Y-values (obtained by applying the fitting function to the independent (X) variable, skipping undefined values). A dependent column is created.
The Column Properties dialog box appears.
Tips:
To display the input values as points:
A point graph will be directly shown.
To display the results of the least-squares fit as a line:
Initially, a point graph will be displayed.
Now, a line will be drawn through the calculated values.
The results of the least squares fit will then be directly shown as a line in your graph window.
The Least Squares Fit operation can be directly executed by typing one of the following expressions on the command line of the table window:
OUTCOL = ColumnLeastSquaresFit(Col1, Col2, Function)
OUTCOL = ColumnLeastSquaresFit(Col1, Col2, Function, n)
where:
OUTCOL is the name of your output column.
ColumnLeastSquaresFit is the command to start the Least Squares Fit operation.
Col1 is the name of your X-column, i.e. an independent variable with a value domain.
Col2 is the name of your Y-column, i.e. a dependent variable with a value domain.
Function is the function to be used in the description of the relationship between X and Y. For the function, type either: exponential | logarithmic | polynomial | power | trigonometric
n
When using a polynomial or trigonometric function:
When using an exponential, logarithmic or power function:
When the definition symbol = is used, a dependent output column is created; when the assignment symbol := is used, the dependency link is immediately broken after the column is calculated.
Given a set of m data points (X, Y) and n unknown coefficients in the fitting function, an m * n matrix (m ≥ n) is constructed. In this matrix, n is the number of basis vectors in the approximation. Each basis vector is itself a function, defined by the fit type selected by the user. The accuracy of the fit (goodness of fit) is indicated by the standard deviation:
S.D. = √( Σ (Yi - F(Xi))^2 / (m - n) )
where:
F(Xi) is the least squares solution at the point Xi.
(Yi - F(Xi)) is the residual.
(m - n) is the number of degrees of freedom of the fit.
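As an illustration of the standard deviation formula, the sketch below computes S.D. for an arbitrary fitted function. The name fit_standard_deviation, the example line and the sample data are hypothetical. The sketch additionally assumes m > n, because the formula divides by (m - n).

```python
import math


def fit_standard_deviation(xs, ys, f, n):
    """Return sqrt(sum((Yi - F(Xi))^2) / (m - n)) for the fitted function f."""
    m = len(xs)
    if m <= n:
        # Assumption of this sketch: with m == n there are no degrees of freedom left.
        raise ValueError("need more data points than coefficients")
    sum_sq_residuals = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sum_sq_residuals / (m - n))


# Hypothetical data and an example line Y = 1.94 * X + 1.15 (n = 2 coefficients).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
sd = fit_standard_deviation(xs, ys, lambda x: 1.94 * x + 1.15, n=2)
```

A smaller value of sd indicates that the data points lie closer to the fitted curve.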
Exponential:
This module fits the function: Y = a * e^(b * X)
to the data points, where a and b are real numbers.
A linear equation is obtained by taking the natural logarithm of both sides:
ln(Y) = ln(a) + b X
Note:
The Y-values of the data points must all have the same sign (either all positive or all negative).
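The linearization can be sketched as follows: take the logarithm of the Y-values, fit a straight line, and transform the intercept back. This is a minimal sketch, not the module's actual code. The name fit_exponential is hypothetical, and handling negative Y-values by fitting ln(-Y) and restoring the sign afterwards is an assumption of this example.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * e^(b * X) through a straight-line fit of ln|Y| against X."""
    sign = 1.0 if ys[0] > 0 else -1.0
    if any(sign * y <= 0 for y in ys):
        raise ValueError("Y-values must be non-zero and all have the same sign")
    logs = [math.log(sign * y) for y in ys]    # ln|Y| = ln|a| + b * X
    m = len(xs)
    mean_x = sum(xs) / m
    mean_l = sum(logs) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                                # slope of the linearized fit
    a = sign * math.exp(mean_l - b * mean_x)     # intercept is ln|a|
    return a, b


# Hypothetical noise-free data generated from Y = 2 * e^(0.5 * X).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Note that a fit of this kind minimizes the squared errors of ln(Y), not of Y itself, so large Y-values carry less weight than in a direct non-linear fit.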
Logarithmic:
This module fits the function: Y = a * ln(b * X)
to the data points, where a and b are real numbers.
A linear equation is obtained by rewriting the equation as:
Y = a ln(b) + a ln(X)
Note:
The X-values of the data points must all have the same sign (either all positive or all negative) and none of them equals 0.
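In the rewritten form, Y is a straight line in ln(X) with slope a and intercept a ln(b), so b can be recovered as e^(intercept / a). The sketch below illustrates this. The name fit_logarithmic is hypothetical, and treating negative X-values by fitting against ln(-X) with a negative b is an assumption of this example.

```python
import math


def fit_logarithmic(xs, ys):
    """Fit Y = a * ln(b * X) through a straight-line fit of Y against ln|X|."""
    sign = 1.0 if xs[0] > 0 else -1.0
    if any(sign * x <= 0 for x in xs):
        raise ValueError("X-values must be non-zero and all have the same sign")
    lx = [math.log(sign * x) for x in xs]
    m = len(lx)
    mean_l = sum(lx) / m
    mean_y = sum(ys) / m
    sll = sum((l - mean_l) ** 2 for l in lx)
    sly = sum((l - mean_l) * (y - mean_y) for l, y in zip(lx, ys))
    a = sly / sll                  # slope of Y against ln|X|
    c = mean_y - a * mean_l        # intercept, equal to a * ln|b|
    b = sign * math.exp(c / a)     # undefined when a == 0 (constant Y)
    return a, b


# Hypothetical noise-free data generated from Y = 3 * ln(2 * X).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * math.log(2.0 * x) for x in xs]
a, b = fit_logarithmic(xs, ys)
```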
Polynomial:
This module uses Chebyshev polynomials to fit a polynomial to the data points.
The Nr of terms must be one greater than the degree of the polynomial. A straight-line least squares fit needs only two terms. The elements of the solution vector S will be as follows:
Sj = aj
where:
aj is the coefficient of X^(j - 1)
1 <= j <= Nr of terms
2 <= Nr of terms <= Nr data points
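The relation between the number of terms, the m * n matrix and the solution vector S can be illustrated with the sketch below. It deliberately swaps in a simpler technique than the module uses: plain powers of X and the normal equations instead of Chebyshev polynomials. The names solve_linear and fit_polynomial are hypothetical.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_polynomial(xs, ys, nr_terms):
    """Return [a1, ..., an], where aj is the coefficient of X^(j - 1)."""
    if not 2 <= nr_terms <= len(xs):
        raise ValueError("need 2 <= Nr of terms <= Nr data points")
    # m x n design matrix: one row per data point, one column per basis function.
    design = [[x ** j for j in range(nr_terms)] for x in xs]
    # Normal equations (A^T A) s = A^T y. Simple, but less stable than
    # an orthogonal basis such as Chebyshev polynomials.
    normal = [[sum(row[i] * row[k] for row in design) for k in range(nr_terms)]
              for i in range(nr_terms)]
    rhs = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(nr_terms)]
    return solve_linear(normal, rhs)


# Hypothetical noise-free data from Y = 1 - 2 * X + 0.5 * X^2 (3 terms, degree 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, 3)
```

With 3 terms the result holds the constant, linear and quadratic coefficients in that order, matching the ordering of the solution vector S.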
Power law:
This module fits the function: Y = a * X^b
to the data points, where a and b are real numbers.
A linear equation is obtained by taking the natural logarithm of both sides:
ln(Y) = ln(a) + b * ln(X)
Note:
All X-values of the data points must be positive and the Y-values of the data points must all have the same sign (either all positive or all negative) and none of them equals 0.
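For the power law, both columns are transformed before the straight-line fit. The sketch below shows the idea under the input restrictions listed in the note. The name fit_power_law is hypothetical, and restoring the sign of a for negative Y-values is an assumption of this example.

```python
import math


def fit_power_law(xs, ys):
    """Fit Y = a * X^b through a straight-line fit of ln|Y| against ln(X)."""
    if any(x <= 0 for x in xs):
        raise ValueError("all X-values must be positive")
    sign = 1.0 if ys[0] > 0 else -1.0
    if any(sign * y <= 0 for y in ys):
        raise ValueError("Y-values must be non-zero and all have the same sign")
    lx = [math.log(x) for x in xs]
    ly = [math.log(sign * y) for y in ys]      # ln|Y| = ln|a| + b * ln(X)
    m = len(lx)
    mean_lx = sum(lx) / m
    mean_ly = sum(ly) / m
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx                               # exponent
    a = sign * math.exp(mean_ly - b * mean_lx)  # scale factor
    return a, b


# Hypothetical noise-free data generated from Y = 1.5 * X^2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5 * x ** 2 for x in xs]
a, b = fit_power_law(xs, ys)
```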
Trigonometric:
This module fits a finite Fourier series to the data points.
The elements of the solution vector S will be as follows:
Sj = F_(j-1)
where:
F_(j-1) is the (j - 1)th term in the Fourier series.
1 <= j <= Nr of terms
3 <= Nr of terms <= Nr data points
The first few terms in the Fourier series are:
F0 = 1
F1 = cos(X)
F2 = sin(X)
F3 = cos(2X)
F4 = sin(2X)
F5 = cos(3X)
F6 = sin(3X)
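The pattern of the terms listed above (a constant, then alternating cosine and sine of increasing multiples of X) can be written as a small function. The sketch below only builds the basis and evaluates a series for given coefficients; it does not perform the fit itself. The names fourier_term, design_row and evaluate_series are hypothetical.

```python
import math


def fourier_term(k, x):
    """Return F_k(x): F0 = 1, F1 = cos(X), F2 = sin(X), F3 = cos(2X), ..."""
    if k == 0:
        return 1.0
    harmonic = (k + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
    if k % 2 == 1:
        return math.cos(harmonic * x)
    return math.sin(harmonic * x)


def design_row(x, nr_terms):
    """One row of the m x n matrix: all basis functions evaluated at x."""
    return [fourier_term(k, x) for k in range(nr_terms)]


def evaluate_series(coeffs, x):
    """Evaluate the fitted series sum(S_j * F_(j-1)(x)) for given coefficients."""
    return sum(s * fourier_term(k, x) for k, s in enumerate(coeffs))


row = design_row(0.0, 5)
value = evaluate_series([2.0, 1.0, 0.0], 0.0)
```

Stacking design_row for every X-value gives the matrix from which the least squares solution is computed.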