Table window

Least Squares Fit

 

This topic consists of the following sections:

1. Functionality

2. Dialog box

3. Command line

4. Algorithm

5. Description of functions

Functionality

The least squares fit operation calculates, for each pair of values in two input columns, an output value in an output column according to a user-selected function that is assumed to describe the relation between the values in the two input columns. The coefficients of this function are calculated such that the sum of the squared errors is as small as possible (best fit).

The least squares fit operation provides a tool to describe the best fitting relation between two columns in a table. The most common application of this concept is a linear regression which means the two table columns are known or expected to have some linear relationship.

This may be described with the following equation:

Y = a * X + b

where X and Y are known variables (records from the two columns in the table) and where a and b are unknown coefficients.

Usually some experimental error or other deviation from a line describing the relation between X and Y is present. This means that no values for a and b give a perfect fit. The best fit is found when the sum of squares of the deviations of all points from the line that indicates the relationship is as small as possible (see figures below). In other words, the sum of squares of the errors at each data point is minimized.
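The straight-line case described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the code used by the operation itself; the function names fit_line and sum_of_squares and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) such that sum((y - (a * x + b)) ** 2) is minimal."""
    m = len(xs)
    if m < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # Spread of X and co-variation of X and Y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared deviations of the points from the line Y = a * X + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data with some scatter around a line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)
```

Any other choice of a and b, for example a slightly steeper line, gives a larger sum of squares than the fitted pair.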

The model used provides a method for finding the least squares approximation to a set of data points (X, Y). The approximation must be a linear combination of a set of basis vectors. The functional form of the approximation (polynomial, exponential, etc.) is determined by the user.

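To find the least squares fit, the relation between X and Y is modelled with one of five predefined function types: exponential, logarithmic, power law, polynomial or trigonometric.

Then the number of terms can be specified for a polynomial function (minimum 2 terms) or a trigonometric function (minimum 3 terms). Exponential, logarithmic and power law functions always use 2 terms.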

By calculating a least squares fit through the Columns menu of a table window, you obtain values in an output column which, if plotted in a graph, lie exactly on the best fitting curve through all input points.

Tip:

It might be easier to directly calculate a least squares fit in a graph window.

The results of the least squares fit will then be directly shown as a continuous line in your graph window. The resulting formula can be viewed in the Graph Options - Least Squares Fit dialog box; in this dialog box you can also modify the calculation by choosing another function, etc.

Examples:

The examples below show the relations between column colX and column colY1 and between column colX and column colY2 as a result of a least squares fit.

  

Fig. 1: The best fit between colX and colY1 using a polynomial function with 2 terms (a straight line) defined by the line:
Y = 49.101 + 1.276 * X

 

Fig. 2: The best fit between colX and colY2 using an exponential function (always 2 terms) defined by the line:
Y = 42.341302790409 * exp(0.007566547732 * X)

When using the least squares fit operation on the Columns menu of a table window, the output column shows the Y-values which are exactly on the line of the best-fit curve.

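Input column requirements:

Both the X-column and the Y-column must be columns with a value domain. Depending on the selected function, the values must meet additional conditions; see Description of functions.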

Dialog box

The least squares fit operation provides a tool to describe the best fitting relation between two table columns. Usually some experimental error or other deviation from the curve describing the relation between X and Y is present, so no set of coefficients gives a perfect fit. The best fit is found when the sum of the squared errors (least squares) is as small as possible.

Dialog box options:

X - column:

Select the name of the column representing the x-values (independent variable). For the input column requirements, see Functionality above.

Y - column:

Select the name of the column representing the y-values (dependent variable).

Function:

Select the type of function: Exponential, Logarithmic, Polynomial, Power Law, or Trigonometric. For more information, see Functionality and Algorithm.

Nr. of terms:

Type the number of terms to be used in the formula describing the best fit. The text box is only available for the function types polynomial (minimum 2 terms) and trigonometric (minimum 3 terms).

Output column:

Type the name of the output column that will contain the transformed Y-values (obtained by applying the fitting function to the independent (X) variable, skipping undefined values). A dependent column is created.

The Column Properties dialog box appears.

Tips:

  1. To view the resulting least squares fit formula: double-click the output column name in the table. The Column Properties dialog box appears; then click the Additional Info button in that dialog box.
  2. To view the results in a graph, choose the Create Graph command from the File menu in the table window.

    To display the input values as points:

    A point graph will be directly shown.

    To display the results of the least squares fit as a line:

    Initially, a point graph will be displayed.

    Now, a line will be drawn through the calculated values.

  3. It is probably easier to perform a least squares fit directly in a graph window. The results of the least squares fit will then be shown directly as a line in your graph window.

Command line

The Least Squares Fit operation can be directly executed by typing one of the following expressions on the command line of the table window:

  

OUTCOL = ColumnLeastSquaresFit(Col1, Col2, Function)

OUTCOL = ColumnLeastSquaresFit(Col1, Col2, Function, n)

where:

OUTCOL

is the name of your output column.

ColumnLeastSquaresFit

is the command to start the Least Squares Fit operation.

Col1

is the name of your X-column, i.e. an independent variable with a value domain.

Col2

is the name of your Y-column, i.e. a dependent variable with a value domain.

Function

is the function to be used in the description of the relationship between X and Y. For the function, type either:

exponential | logarithmic | polynomial | power | trigonometric

n

When using a polynomial or trigonometric function:

  • Optional parameter to specify the number of terms to be used in the function.
  • n may not be greater than the number of valid data points, excluding points with undefined X or Y values.
  • If the parameter is omitted, 2 terms will be used for a polynomial function, and 3 terms will be used for a trigonometric function.

When using an exponential, logarithmic or power function:

  • Parameter is not required and can be omitted: always 2 terms will be used.
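The rules for n listed above can be summarized in a small Python sketch. The helper names resolve_terms and count_valid_points are hypothetical, undefined values are represented here by None as an assumption, and raising an error for an invalid n is also an assumption: this page does not state how the operation itself reacts.

```python
# Functions that always use 2 terms; a supplied n is ignored.
TWO_TERM_FUNCTIONS = {"exponential", "logarithmic", "power"}
# Default (and minimum) number of terms for the other functions.
DEFAULT_TERMS = {"polynomial": 2, "trigonometric": 3}


def count_valid_points(xs, ys):
    """Count the points where neither X nor Y is undefined (None here)."""
    return sum(1 for x, y in zip(xs, ys) if x is not None and y is not None)


def resolve_terms(function, n=None, valid_points=None):
    """Return the number of terms that a fit with this function would use."""
    if function in TWO_TERM_FUNCTIONS:
        return 2
    if function not in DEFAULT_TERMS:
        raise ValueError("unknown function: " + function)
    minimum = DEFAULT_TERMS[function]
    terms = minimum if n is None else n
    if terms < minimum:
        raise ValueError("too few terms for a %s function" % function)
    if valid_points is not None and terms > valid_points:
        raise ValueError("n may not exceed the number of valid data points")
    return terms
```

For example, resolve_terms("polynomial") gives 2, and resolve_terms("exponential", 7) also gives 2 because the parameter is not used for that function.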

When the definition symbol = is used, a dependent output column is created; when the assignment symbol := is used, the dependency link is immediately broken after the column is calculated.

Algorithm

Given a set of m data points (X, Y) and n unknown coefficients in the fitting function, an m * n matrix (mn) is constructed. In this matrix, n is the number of basis vectors in the approximation. Each basis vector is itself a function, defined by the fit type that the user selected. The accuracy of the fit (goodness of fit) is indicated by the standard deviation:

  

S.D. = sqrt( Σ (Yi - F(Xi))^2 / (m - n) )

where:

F(Xi)

is the least squares solution at the point Xi.

(Yi - F(Xi))

is the residual at the point Xi.

(m - n)

is the number of degrees of freedom of the fit.
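The construction above can be sketched in Python as follows. The sketch builds the m * n matrix from a list of basis functions, solves the normal equations with Gaussian elimination, and computes the standard deviation. Solving through the normal equations is an assumption made for brevity; this page does not state which solution method the operation uses, and all function names are hypothetical.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def least_squares(xs, ys, basis):
    """Return the n coefficients of the best fitting combination of basis."""
    n = len(basis)
    design = [[f(x) for f in basis] for x in xs]  # the m * n matrix
    ata = [[sum(row[i] * row[j] for row in design) for j in range(n)]
           for i in range(n)]
    aty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(n)]
    return solve_linear_system(ata, aty)


def standard_deviation(xs, ys, basis, coeffs):
    """S.D. = sqrt(sum of squared residuals / (m - n))."""
    m, n = len(xs), len(basis)
    if m <= n:
        raise ValueError("need more data points than coefficients")
    squares = sum(
        (y - sum(c * f(x) for c, f in zip(coeffs, basis))) ** 2
        for x, y in zip(xs, ys)
    )
    return math.sqrt(squares / (m - n))


# Two basis functions (1 and X) give a straight line; the data is hypothetical.
basis = [lambda x: 1.0, lambda x: x]
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
coeffs = least_squares(xs, ys, basis)
sd = standard_deviation(xs, ys, basis, coeffs)
```

Replacing the list of basis functions by, for example, 1, X and X * X turns the same code into a polynomial fit with 3 terms.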

Description of functions

Exponential:

This module fits the function Y = a * exp(b * X) to the data points, where a and b are real numbers.

A linear equation is obtained by taking the natural logarithm of both sides:

ln(Y) = ln(a) + b * X

Note:

The Y-values of the data points must all have the same sign (either all positive or all negative).
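A minimal Python sketch of this linearization is given below. The names fit_line and fit_exponential are hypothetical. Handling negative Y-values by fitting ln(|Y|) and restoring the sign afterwards is an assumption about how the same-sign rule might be applied. Note that a fit obtained this way minimizes the squared errors in ln(Y) rather than in Y itself.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Return (a, b) for Y = a * exp(b * X), fitted on ln(|Y|)."""
    if all(y > 0 for y in ys):
        sign = 1.0
    elif all(y < 0 for y in ys):
        sign = -1.0
    else:
        raise ValueError("Y values must be all positive or all negative")
    # ln(|Y|) = ln(|a|) + b * X is a straight line in X.
    b, ln_a = fit_line(xs, [math.log(abs(y)) for y in ys])
    return sign * math.exp(ln_a), b


# Hypothetical data lying exactly on Y = 2 * exp(0.5 * X).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```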

Logarithmic:

This module fits the function Y = a * ln(b * X) to the data points, where a and b are real numbers.

A linear equation is obtained by rewriting the equation as:

Y = a * ln(b) + a * ln(X)

Note:

The X-values of the data points must all have the same sign (either all positive or all negative) and none of them may equal 0.
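The rewritten equation is a straight line in ln(X) with slope a and intercept a * ln(b), which the following Python sketch uses to recover a and b. The function names are hypothetical. For negative X-values the sketch fits on ln(|X|) and returns a negative b, so that b * X stays positive; this is an assumption about how the same-sign rule might be applied.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_logarithmic(xs, ys):
    """Return (a, b) for Y = a * ln(b * X)."""
    if all(x > 0 for x in xs):
        sign = 1.0
    elif all(x < 0 for x in xs):
        sign = -1.0
    else:
        raise ValueError("X values must be all positive or all negative")
    # Y = a * ln(|b|) + a * ln(|X|): slope is a, intercept is a * ln(|b|).
    a, intercept = fit_line([math.log(abs(x)) for x in xs], ys)
    if a == 0:
        raise ValueError("b cannot be recovered when a is 0")
    return a, sign * math.exp(intercept / a)


# Hypothetical data lying exactly on Y = 3 * ln(2 * X).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * math.log(2.0 * x) for x in xs]
a, b = fit_logarithmic(xs, ys)
```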

Polynomial:

This module uses Chebyshev polynomials to fit a polynomial to the data points.

The Nr of terms must be one greater than the degree of the polynomial. A straight-line least squares fit needs only two terms. The elements of the solution vector S will be as follows:

Sj = aj

where:

aj

is the coefficient of X^(j - 1)

1 <= j <= Nr of terms

2 <= Nr of terms <= Nr data points
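To illustrate how the solution vector is read, the Python sketch below evaluates a polynomial from its coefficients, where the first element belongs to X^0, the second to X^1, and so on. The function name evaluate_polynomial is hypothetical; the coefficients are those of Fig. 1. The sketch only evaluates the result and says nothing about how the module computes the coefficients internally.

```python
def evaluate_polynomial(solution, x):
    """Evaluate sum(solution[j - 1] * x ** (j - 1)) using Horner's scheme."""
    result = 0.0
    for coefficient in reversed(solution):
        result = result * x + coefficient
    return result


# Solution vector of Fig. 1 (2 terms): Y = 49.101 + 1.276 * X
fig1_solution = [49.101, 1.276]
y_at_10 = evaluate_polynomial(fig1_solution, 10.0)
```

With 2 terms the polynomial has degree 1, so y_at_10 is simply 49.101 + 1.276 * 10.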

Power law:

This module fits the function Y = a * X^b to the data points, where a and b are real numbers.

A linear equation is obtained by taking the natural logarithm of both sides:

ln(Y) = ln(a) + b * ln(X)

Note:

All X-values of the data points must be positive. The Y-values of the data points must all have the same sign (either all positive or all negative) and none of them may equal 0.
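The log-log linearization can be sketched in Python as shown below. The function names are hypothetical, and fitting on ln(|Y|) with the sign restored afterwards is an assumption about how negative Y-values might be handled. As with the exponential function, such a fit minimizes the squared errors in ln(Y) rather than in Y.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power(xs, ys):
    """Return (a, b) for Y = a * X ** b, fitted on ln(X) and ln(|Y|)."""
    if not all(x > 0 for x in xs):
        raise ValueError("all X values must be positive")
    if all(y > 0 for y in ys):
        sign = 1.0
    elif all(y < 0 for y in ys):
        sign = -1.0
    else:
        raise ValueError("Y values must be all positive or all negative")
    # ln(|Y|) = ln(|a|) + b * ln(X) is a straight line in ln(X).
    b, ln_a = fit_line(
        [math.log(x) for x in xs],
        [math.log(abs(y)) for y in ys],
    )
    return sign * math.exp(ln_a), b


# Hypothetical data lying exactly on Y = 1.5 * X ** 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5 * x ** 2 for x in xs]
a, b = fit_power(xs, ys)
```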

Trigonometric:

This module fits a finite Fourier series to the data points.

The elements of the solution vector S will be as follows:

Sj = Fk, with k = j - 1

where:

Fk

is the kth term in the Fourier series, counting from F0.

1 <= j <= Nr of terms

3 <= Nr of terms <= Nr data points

The first few terms in the Fourier series are:

F0 = 1

F1 = cos(X)

F2 = sin(X)

F3 = cos(2X)

F4 = sin(2X)

F5 = cos(3X)

F6 = sin(3X)
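The pattern of the terms listed above can be expressed in a short Python sketch. The function names fourier_term and evaluate_series are hypothetical. Continuing the cos/sin alternation beyond F6 and taking X in radians without any rescaling are assumptions; the sketch also treats each element of the solution vector as the weight of the corresponding term.

```python
import math


def fourier_term(k, x):
    """Return Fk(x): F0 = 1, odd k gives a cosine, even k gives a sine."""
    if k == 0:
        return 1.0
    harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
    if k % 2 == 1:
        return math.cos(harmonic * x)
    return math.sin(harmonic * x)


def evaluate_series(solution, x):
    """Sum the terms, weighting F0, F1, F2, ... by solution[0], [1], [2], ..."""
    return sum(s * fourier_term(k, x) for k, s in enumerate(solution))


# Hypothetical solution vector with 3 terms: 1 + 2 * cos(X) + 3 * sin(X)
value_at_zero = evaluate_series([1.0, 2.0, 3.0], 0.0)
```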
