Title: Forecasting the Recovery: A Capstone Analysis of US Housing Data (2005-2013): Business Statistics and Analysis Capstone Project
Introduction: From Data to Dollars
For my Business Statistics and Analysis Capstone, I tackled the complex dynamics of the US housing market using HUD's THADS data (2005–2013). The goal wasn't just to report what happened, but to build a robust model capable of forecasting future market values. This project showcases the power of statistical rigor in transforming messy, skewed real-world data into actionable predictive insights.
Key Analysis Highlights
Measuring the Crisis Impact on Rent: I used paired t-tests (2007 vs. 2009) on Fair Market Rent (FMR) and found that the 2008 Subprime Crisis did not decrease rents; instead, the crisis accelerated the rise in mean FMR, as foreclosures moved people into the rental market. This finding directly challenges assumptions about market deflation.
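To make the paired comparison concrete, here is a minimal sketch of a paired t-test in plain Python. It is an illustration rather than the original analysis: the FMR figures are made up, the function name `paired_t` is hypothetical, and it assumes each unit can be matched across the two survey years so that the differences are taken unit by unit.

```python
import math
import statistics

def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # A paired test works on the per-unit differences, not on the two groups separately.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std. dev. of the differences
    return mean_diff, mean_diff / std_err, n - 1

# Made-up monthly FMR values (dollars) for the same five units in each year.
fmr_2007 = [850, 920, 1100, 760, 1005]
fmr_2009 = [905, 980, 1150, 800, 1080]

mean_diff, t_stat, dof = paired_t(fmr_2007, fmr_2009)
print(f"mean change = {mean_diff:.2f}, t = {t_stat:.2f}, df = {dof}")
```

A positive mean difference with a large t statistic would point to a rise in rents between the two years. The p-value would then come from a t distribution with `dof` degrees of freedom, for example via `scipy.stats.ttest_rel` if SciPy is available.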
Modeling Market Value Drivers: I developed a Multiple Linear Regression model for single-family home values, employing log transformations (LN(VALUE), LN(FMR), LN(UTILITY)) to correct for extreme skewness. The model confirmed that local FMR is the single strongest predictor of a home's value, and quantified the penalty associated with a unit being vacant (approx. 11.5% reduction).
Building a Time-Lagged Forecasting Engine: The final step involved building a predictive model using 2011 features (Predictors_2011) to forecast 2013 values (VALUE_2013). This required merging datasets on the CONTROL variable and obtaining a set of beta coefficients.
Validation and Risk Quantification: I validated the model's performance on a 1,000-unit holdout sample. The final measure of accuracy, the Mean Absolute Deviation (MAD), was calculated at $136,966. This single metric quantifies the average forecasting error, allowing a business analyst to understand the financial risk when utilizing the model for strategic planning.
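The merge, fit, and validation steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the capstone model itself: it uses a single predictor (ln FMR) instead of the full set of regressors, the records and CONTROL keys are invented, and the helper names `fit_log_log` and `mean_absolute_deviation` are hypothetical.

```python
import math

def fit_log_log(xs, ys):
    """Ordinary least squares fit of ln(y) = a + b * ln(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    return my - b * mx, b

def mean_absolute_deviation(actual, predicted):
    """Average absolute forecasting error, in the units of the target (dollars here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Invented records keyed by CONTROL: 2011 predictor and 2013 target.
fmr_2011 = {"C01": 900, "C02": 1200, "C03": 1500, "C04": 1100,
            "C05": 1300, "C06": 1000, "C07": 1400, "C08": 950}
value_2013 = {"C01": 150000, "C02": 210000, "C03": 280000, "C04": 190000,
              "C05": 230000, "C06": 170000, "C07": 255000, "C09": 300000}

# Inner join on CONTROL: keep only units present in both years.
matched = sorted(fmr_2011.keys() & value_2013.keys())
train, holdout = matched[:5], matched[5:]

# Estimate the coefficients on the training units only.
a, b = fit_log_log([fmr_2011[c] for c in train], [value_2013[c] for c in train])

# Forecast the holdout units and convert back from log dollars to dollars.
predicted = [math.exp(a + b * math.log(fmr_2011[c])) for c in holdout]
actual = [value_2013[c] for c in holdout]
mad = mean_absolute_deviation(actual, predicted)
print(f"slope b = {b:.3f}, holdout MAD = ${mad:,.0f}")
```

Because the model is fitted in logs, the slope `b` reads as an elasticity: a 1% higher FMR goes with roughly a `b`% higher predicted value. The MAD is computed only after exponentiating the predictions, so the error is expressed in dollars rather than log points.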
Conclusion
This capstone demonstrated end-to-end analytical capability: from cleaning and transforming complex real-estate data to running predictive validation. The result is a statistically grounded, highly interpretable model that quantifies market drivers and provides a measure of confidence for future forecasts.