
regress

PURPOSE ^

Notes on Regression Methods for Linear Models

SYNOPSIS ^

This is a script file.

DESCRIPTION ^

          Notes on Regression Methods for Linear Models
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  A general problem in data analysis is fitting a linear model of the
  form

             y = a + b*x

  to a set of N [x,y] data points. The statistical problem is to first 
  determine the parameters (intercept a, slope b) which provide the
  'best' fit of the data to the model, and then estimate the confidence
  intervals for [a,b]. The 'best' fit is determined by least squares,
  i.e., choosing [a,b] to minimize an appropriate merit function chi
  built from the squared misfits between the data and the model. The
  resulting confidence intervals for [a,b] follow the Student-t
  distribution.

  In the usual case (Case A), x is considered the independent variable
  with no measurement error and y the dependent variable with a random
  error given by N(0, stdy), where N is the normal distribution function
  with zero mean and standard deviation stdy. In this case, the merit
  function chi is defined as the sum of the squared residuals

         chi = sum( r(n)^2 ) = sum( (y(n)-a-b*x(n))^2 )

  where r(n) is the residual or vertical distance between y(n) and the
  'best' fit line for the nth point, and the sum is over [1:N].
  Differentiation of chi with respect to a and b and setting the
  resulting equations to zero yields two normal equations which are
  linear in a and b, thus allowing an exact solution.  LINREG(x,y) finds
  this solution and returns the 95% confidence intervals for [a,b].
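  The closed-form solution and its Student-t intervals are easy to
  reproduce. The sketch below is an illustration in Python (NumPy/SciPy)
  rather than the actual MATLAB LINREG, whose code is not shown here:

```python
import numpy as np
from scipy import stats

def linreg(x, y, alpha=0.05):
    """Case A: least-squares fit of y = a + b*x, returning (1-alpha)
    confidence half-widths for [a, b] from the Student-t distribution."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b = np.sum((x - xbar) * (y - ybar)) / sxx      # slope (normal equations)
    a = ybar - b * xbar                            # intercept
    r = y - a - b * x                              # vertical residuals
    s2 = np.sum(r ** 2) / (n - 2)                  # residual variance, n-2 dof
    se_b = np.sqrt(s2 / sxx)                       # standard error of b
    se_a = np.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))   # standard error of a
    t = stats.t.ppf(1.0 - alpha / 2.0, n - 2)      # two-sided t critical value
    return (a, t * se_a), (b, t * se_b)
```

  For noise-free data lying exactly on y = 2 + 3*x, the fit recovers
  a = 2 and b = 3 with zero-width intervals.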

  In the case when both x and y are considered random variables (Case B),
  i.e., when x also contains random measurement error characterized by
  N(0, stdx), then the merit function chi is defined as the sum of the
  squared perpendicular distances between the data points and the 'best'
  fit line, so that

      chi = sum( (y(n)-a-b*x(n))^2/(stdy^2+b^2*stdx^2) )

  where the denominator can be considered to convert the vertical
  distance into the perpendicular distance. In this case, stdx and stdy
  are assumed known a priori. Since b appears in the denominator of chi,
  differentiation of chi with respect to b leads to a nonlinear normal
  equation for b (the normal equation for a is unchanged from Case A).
  For the special case considered here, namely that the uncertainties in
  x (respectively y) are independent of the value of x (respectively y),
  the nonlinear normal equation for b reduces to a quadratic whose root
  can be found with FZERO given an initial guess.
  LINREG1(x,y,stdx,stdy,b0) determines the best [a,b] and returns their
  95% confidence intervals given the input [stdx,stdy].  In the limit as
  the uncertainty in (say) x becomes very small (stdx << stdy), the
  results of LINREG1 and LINREG converge.
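  A minimal numerical sketch of Case B, again in Python. For brevity it
  minimizes chi directly with a scalar optimizer instead of applying
  FZERO to the quadratic normal equation, so it illustrates the merit
  function rather than the LINREG1 algorithm itself:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def linreg1(x, y, stdx, stdy, b0):
    """Case B: fit y = a + b*x when both x and y carry known errors,
    by minimizing the effective-variance merit function chi over b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()

    def chi(b):
        a = ybar - b * xbar   # a's normal equation is unchanged from Case A
        return np.sum((y - a - b * x) ** 2 / (stdy**2 + b**2 * stdx**2))

    # Search for the minimizing slope starting near the initial guess b0.
    b = minimize_scalar(chi, bracket=(b0 - 1.0, b0 + 1.0)).x
    return ybar - b * xbar, b
```

  As noted above, when stdx << stdy the result converges to the Case A
  (LINREG) slope.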

  SUMMARY:
  Case A: y is a random variable. Use LINREG(x,y).
  Case B: x and y are random variables. Use LINREG1(x,y,stdx,stdy,b0).
          LINREG1 uses QUADRAT.

  For a general discussion of both cases, see Sections 15.1-15.3 of
  Press et al. (1992), Numerical Recipes: The Art of Scientific
  Computing, 2nd ed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 ver. 1: 12/1/96 (RB)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Generated on Thu 01-Dec-2005 07:46:03 by m2html © 2003