% Notes on Regression Methods for Linear Models %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % A general problem in data analysis is fitting a linear model of the % form % % y = a + b*x % % to a set of N [x,y] data points. The statistical problem is to first % determine the parameters (intercept a, slope b) which provide the % 'best' fit of the data to the model, and then estimate the confidence % intervals for [a,b]. The 'best' fit is determined by least-squares, % i.e., chosing [a,b] to minimize an appropriate merit function chi which % involves squares of [x,y]. This leads to [a,b] having confidence % intervals given by the student-t distribution. % % In the usual case (Case A), x is considered the independent variable % with no measurement error and y the dependent variable with a random % error given by N(0, stdy), where N is the normal distrubution function % with zero mean and standard deviation stdy. In this case, the merit % function chi is defined as the sum of the squared residuals % % chi = sum( r(n)^2 ) = sum( y(n)-a-b*x(n))^2 ) % % where r(n) is the residual or vertical distance between y(n) and the % 'best' fit line for the nth point, and the sum is over [1:N]. % Differentiation of chi with respect with a and b and setting the % resulting equations to zero yield two normal equations which are linear % in a and b, thus allowing an exact solution. LINREG(x,y) finds this % solution and returns the 95% confidence intervals for [a,b]. % % In the case when both x and y are considered random variables (Case B), % i.e., when x also contains random measurement error characterized by % N(0, stdx), then the merit function chi is defined as the sum of the % squared perpendicular distances between the data points and the 'best' % fit line, so that % % chi = sum( (y(n)-a-b*x(n))^2/(stdy^2+b^2*stdx^2) ) % % where the denominator can be considered to convert the vertical % distance into the perpendicular distance. In this case, stdx and stdy % are assumed known apriori. Since b appears in the denominator of chi, % differentiation of chi with respect to b leads to a nonlinear normal % equation for b (the normal equation for a is unchanged from case A). % For the special case considered here, namely that the uncertainities in % x(y) are independent of the value of x(y), then the nonlinear normal % equation for b reduces to a quadratic which can be solved 'exactly' % using FZERO and an initial guess. LINREG1(x,y,stdx,stdy,b0) determines % the best [a,b] and returns their 95% confidence intervals given the % input [stdx,stdy]. In the limit as the uncertainity in (say) x gets % very small (stdx<