ST3131: Regression Analysis
AY2014/2015, Semester 1, Lecturer: Anthony Kuk
Course Coverage:
1. Analysis of Variance (ANOVA)
2. Simple Linear Regression
3. Multiple Linear Regression
4. Variable Selection
5. Residuals, Influence & Outliers
6. Departures from Assumptions
7. Indicator Variables
8. Nonlinear Regression
This module studies the assumptions, computations, and tests involved in regression analysis.
The module begins by deriving preliminary results such as the F test and its role in hypothesis testing. The lectures then move on to ANOVA, a collection of statistical models used to analyze the differences between group means, together with the associated procedures. An important concept from this chapter is the confidence interval and its extensions (simultaneous confidence intervals, Bonferroni intervals, etc.).
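To make the ANOVA and Bonferroni ideas concrete, here is a minimal sketch (not course material) of a one-way ANOVA F test with Bonferroni simultaneous intervals for all pairwise mean differences; the three groups are made-up illustrative data and the 5% level is assumed.

```python
# One-way ANOVA F test plus Bonferroni pairwise intervals (illustrative data).
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([6.2, 5.9, 6.5, 6.1]),
          np.array([5.6, 5.4, 5.9, 5.7])]

k = len(groups)                        # number of groups
n = sum(len(g) for g in groups)        # total sample size
grand_mean = np.mean(np.concatenate(groups))

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)

F = ms_between / ms_within
p_value = stats.f.sf(F, k - 1, n - k)
print(f"F = {F:.3f}, p = {p_value:.4f}")

# Bonferroni simultaneous CIs for all pairwise mean differences
m = k * (k - 1) // 2                   # number of comparisons
t_crit = stats.t.ppf(1 - 0.05 / (2 * m), n - k)
for i in range(k):
    for j in range(i + 1, k):
        diff = groups[i].mean() - groups[j].mean()
        se = np.sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
        print(f"mu{i+1}-mu{j+1}: {diff:.2f} +/- {t_crit * se:.2f}")
```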
Then comes the most widely used statistical method of all: linear regression. Simple linear regression is introduced and then generalized to multiple linear regression. Some linear algebra knowledge is very helpful here. Alongside these models, we also covered the related confidence intervals, prediction intervals, and hypothesis tests.
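As a rough illustration of the computations involved, the sketch below fits a simple linear regression on simulated data with statsmodels and extracts coefficient confidence intervals plus a confidence interval for the mean response and a prediction interval at a new point; the data and the choice of x = 5 are purely illustrative.

```python
# Simple linear regression with confidence and prediction intervals (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)   # true line: y = 2 + 0.5x

X = sm.add_constant(x)             # design matrix with an intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                  # estimated intercept and slope
print(fit.conf_int(alpha=0.05))    # 95% CIs for the coefficients

# Mean-response CI and prediction interval at x = 5
x_new = np.array([[1.0, 5.0]])     # [intercept, x]
pred = fit.get_prediction(x_new)
print(pred.summary_frame(alpha=0.05))   # mean_ci_* and obs_ci_* columns
```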
Next, we examine variable selection, which is vital in the modeling process, as well as the analysis of residuals, the influence of individual observations, and the presence of outliers in the model. Subsequently, we look at the issues that arise when the four main assumptions of linear regression (linearity, equal variance, independence, and normality) are violated or dropped.
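For the diagnostics side of this chapter, here is a minimal sketch (on simulated data, with one deliberately extreme point added) of the standard residual and influence measures: leverage, externally studentized residuals, and Cook's distance, flagged with the usual rules of thumb.

```python
# Residual and influence diagnostics with statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=30)
x = np.append(x, 20.0); y = np.append(y, 5.0)    # one high-leverage outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

leverage = infl.hat_matrix_diag                   # diagonal of the hat matrix
student_resid = infl.resid_studentized_external   # externally studentized residuals
cooks_d, _ = infl.cooks_distance                  # Cook's distance per observation

# Flag observations that are unusual by the common rules of thumb
n, p = fit.nobs, fit.df_model + 1
for i in range(int(n)):
    if leverage[i] > 2 * p / n or abs(student_resid[i]) > 2 or cooks_d[i] > 4 / n:
        print(f"obs {i}: leverage={leverage[i]:.2f}, "
              f"t_resid={student_resid[i]:.2f}, CooksD={cooks_d[i]:.2f}")
```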
Linearity is violated when the mean function is mis-specified. This leads to bias in the estimation of the mean response given the values of the predictor variables, and the confidence and prediction intervals will also be off. The usual remedies are transforming the predictor variable, the response variable, or both. Also covered is the lack-of-fit test.
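As one concrete example of the lack-of-fit test, the sketch below (using made-up data with replicate observations at each x level) partitions the residual sum of squares into pure error and lack of fit and forms the F statistic.

```python
# Lack-of-fit F test for simple linear regression with replicates (illustrative data).
import numpy as np
from scipy import stats

x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 3)     # 5 distinct x levels, 3 replicates each
y = 2 + 1.5 * x + 0.4 * x**2 + np.random.default_rng(2).normal(scale=0.3, size=x.size)

# Fit the straight-line model by least squares
b1, b0 = np.polyfit(x, y, 1)                    # slope, intercept
resid = y - (b0 + b1 * x)
sse = (resid ** 2).sum()

# Pure error: variation of replicates around their level means
levels = np.unique(x)
sspe = sum(((y[x == v] - y[x == v].mean()) ** 2).sum() for v in levels)
sslf = sse - sspe                               # lack-of-fit sum of squares

c, n = len(levels), x.size
F = (sslf / (c - 2)) / (sspe / (n - c))
p_value = stats.f.sf(F, c - 2, n - c)
print(f"lack-of-fit F = {F:.2f}, p = {p_value:.4f}")   # small p => linear mean is inadequate
```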
When the variances of the observations are unequal, the standard error estimates are invalid, and the usual confidence and prediction intervals will be too wide at some values of the predictors and too narrow at others. These effects can be overcome by weighted least squares estimation or by transformation.
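A minimal weighted least squares sketch, assuming (for illustration only) that the error standard deviation grows proportionally with x, so the weights are taken as 1/x^2:

```python
# Weighted least squares versus ordinary least squares under heteroscedasticity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=60)
y = 3 + 2 * x + rng.normal(scale=0.5 * x)         # error SD proportional to x

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()  # weights proportional to 1/Var(error)

print(ols_fit.bse)   # standard errors ignoring the unequal variances
print(wls_fit.bse)   # standard errors accounting for them
```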
Often, the data we obtain are correlated. In such cases, the standard error estimates of the coefficients are no longer valid, and neither are the associated t-statistics and p-values. Detection and remedies depend on the type of correlation (over time or across space). For temporally correlated data, one can include omitted variables, use the Cochrane-Orcutt procedure, and include time trends and seasonal effects if present. For spatially correlated data, an iterative procedure similar to the Cochrane-Orcutt procedure may work.
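The Cochrane-Orcutt iteration itself is simple enough to sketch by hand; the example below uses simulated AR(1) errors, and the simulated parameters and stopping tolerance are arbitrary choices.

```python
# Cochrane-Orcutt iteration for AR(1)-correlated errors (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, true_rho = 100, 0.7
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):                        # generate AR(1) errors
    e[t] = true_rho * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

rho = 0.0
for _ in range(10):                          # iterate until rho stabilises
    # Transform to remove the currently estimated AR(1) correlation
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    fit = sm.OLS(y_star, sm.add_constant(x_star)).fit()
    b0_star, b1 = fit.params
    b0 = b0_star / (1 - rho)                 # recover the original intercept
    resid = y - (b0 + b1 * x)                # residuals on the original scale
    new_rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
    if abs(new_rho - rho) < 1e-4:            # rho has stabilised
        break
    rho = new_rho

print(f"estimated rho = {rho:.3f}, slope = {b1:.3f}, intercept = {b0:.3f}")
```

For what it is worth, statsmodels also provides a GLSAR class whose iterative_fit method performs a similar feasible GLS iteration, though the module taught the procedure by hand.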
Non-normal errors pose less of an issue. The expectations and variances do not change. Subject to suitable conditions, the coefficient estimates are asymptotically normal and the F tests are approximately valid. Transformation of the response variable is also a possible remedy.
The last part of the module moves on to extensions of linear regression. Indicator variables can be used to formulate ANOVA problems as linear regressions. Analysis of covariance (ANCOVA) is also briefly introduced. Nonlinear regression was covered at the end, but it was not part of the final exam scope.
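To illustrate how indicator variables turn an ANOVA or ANCOVA problem into a regression, here is a small sketch using the statsmodels formula interface on a made-up data frame; C(group) expands the grouping factor into dummy columns.

```python
# ANOVA and ANCOVA expressed as linear regressions via indicator variables.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 20),
    "x": rng.uniform(0, 10, 60),                 # covariate for the ANCOVA part
})
effects = {"A": 0.0, "B": 1.5, "C": -1.0}
df["y"] = 5 + df["group"].map(effects) + 0.4 * df["x"] + rng.normal(size=60)

# ANOVA as a regression: C(group) expands into indicator variables
anova_fit = smf.ols("y ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(anova_fit))

# ANCOVA: add the continuous covariate alongside the indicators
ancova_fit = smf.ols("y ~ C(group) + x", data=df).fit()
print(ancova_fit.params)     # intercept, two group indicators, slope for x
```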
This module equips undergraduates with statistical skills vital to the engineering, natural, and social science disciplines. It is highly recommended for anyone who wishes to gain a deeper understanding of linear regression, its assumptions, the associated procedures, and its extensions.
Workload: Heavy
Difficulty: Difficult
Grade: B+