Predicting

HOUSING PRICES

ETL, Machine Learning, Python, SQL, AWS, Tableau


GitHub Repository

Project Description


Overview:
Housing prices in the United States (US) continue to increase as incomes rise, unemployment drops, and industries grow. Our team selected this topic in order to predict how housing prices will change over the years as we decide where we want to relocate long-term.

Objective:
By analyzing housing market data and trends from 2015 to 2019, the Housing Price Prediction Tool predicts whether the housing market will rise or drop in the 50 largest cities in the US. For example, someone who works in the technology sector can compare the income, housing prices, and population demographics of San Francisco, Austin, and Seattle while applying for jobs. This could help them understand the similarities and differences between cities and aid their decision-making process.


Background: This was the final project in my Data Analytics Boot Camp. The goal of the project was to implement many of the skills we learned



Questions To Answer:



Presentation




Technologies & Tools Used


Technologies Used


Data Exploration Phase


ETL Process


Data Analysis Phase


Detailed descriptions of our data analysis can be found in our presentation.

Here are the housing price trends of New York (top) and Los Angeles (bottom), after we cleaned null values from our data. We found that housing prices in Los Angeles rose in a more linear and predictable fashion than in New York, where they were more sporadic.

Los Angeles and New York
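
As a rough illustration of the comparison above, the sketch below drops null prices from a yearly series and then uses the R² of a straight-line fit as one possible measure of how "linear" a trend is. The helper names (`drop_nulls`, `r_squared`) and all numbers are hypothetical; they are not the project's code or data, and the project may have judged linearity differently.

```python
from statistics import mean

def drop_nulls(rows):
    """Keep only (year, price) pairs where the price is present."""
    return [(year, price) for year, price in rows if price is not None]

def r_squared(rows):
    """R^2 of a least-squares straight line fitted to (year, price) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up illustrative values (thousands of dollars), not the project's data.
steady = [(2015, 500.0), (2016, 520.0), (2017, None), (2018, 560.0), (2019, 580.0)]
erratic = [(2015, 500.0), (2016, 560.0), (2017, 510.0), (2018, None), (2019, 590.0)]

steady_r2 = r_squared(drop_nulls(steady))
erratic_r2 = r_squared(drop_nulls(erratic))
```

An R² close to 1 means a straight line explains almost all of the year-to-year variation; a lower value corresponds to the more sporadic pattern described above.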


Data Sources




Database




Machine Learning


Preliminary Data Processing

After our preliminary processing, we were able to perform an initial unsupervised clustering. We obtained the following 3D Principal Component Analysis (PCA) plot:

Initial 3D PCA
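
For readers unfamiliar with how a 3D PCA plot is produced, here is a minimal sketch of the projection step, assuming NumPy is available. The function name `pca_3d` and the random input are placeholders rather than the project's code or features, and the clustering step itself is not shown.

```python
import numpy as np

def pca_3d(features):
    """Project rows of `features` onto their first three principal components."""
    X = np.asarray(features, dtype=float)
    # Standardize each column so features on large scales do not dominate.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:3]
    # Share of total variance captured by each principal component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return X @ components.T, explained[:3]

rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 5))  # random placeholder data, 50 rows x 5 features
points, explained = pca_3d(demo)
```

Each row of `points` gives the three coordinates that would be drawn in a 3D scatter plot, and `explained` shows how much of the variance those three axes retain.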
Preliminary Feature Engineering, Feature Selection, & the Decision-making Process: Splitting Data Into Testing & Training Sets

For our final linear regression model, we used an 80/20 training/testing split to achieve our results. The training/testing splits we tried with other methods are shown in the table below.
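
To make the split concrete, the following sketch shuffles a dataset, holds out 20% for testing, fits a one-feature least-squares line on the remaining 80%, and scores it with RMSE on the held-out rows. It assumes a single feature and uses synthetic data; the helpers `split_rows`, `fit_line`, and `rmse` are hypothetical and are not the project's actual pipeline.

```python
import random
from math import sqrt

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into training and testing lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(rows)
    x_bar = sum(x for x, _ in rows) / n
    y_bar = sum(y for _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    return sqrt(squared / len(rows))

# Synthetic (feature, price) pairs for illustration only.
data = [(float(x), 3.0 * x + 10.0) for x in range(100)]
train, test = split_rows(data)
slope, intercept = fit_line(train)
test_rmse = rmse(test, slope, intercept)
```

Scoring on the held-out rows rather than the training rows is what makes the RMSE an estimate of how the model behaves on data it has not seen.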

Explanation of Model Choice (Including Limitations & Benefits)

Here are the models we tried, along with results we got:

ML Trials


Dashboard


We used Tableau to create and host our dashboard. It will be tied to our Postgres database, hosted on AWS, via a direct connection.




Analysis Results


After completing the project and viewing the predictions, we can see that not all housing prices will increase in the next year. The machine learning model we selected achieved an RMSE of less than $200, which suggests a strong prediction from the data provided. In a city like Honolulu, for example, other factors may indicate a housing market decline. The unemployment rate dropped from 2018 to 2019, but the percentage decrease was much smaller than in years past, which may indicate that the unemployment rate will begin to either level out or increase. This can in turn impact the housing market as more people become unable to purchase homes. New York shows a similar scenario. We also noticed that some cities' housing prices are not increasing at as high a rate as in years past. Boston, for instance, is beginning to level out.
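
The unemployment reasoning above rests on comparing year-over-year percentage changes. A small sketch of that arithmetic follows; the rates are invented for illustration and are not Honolulu's actual figures, and `yearly_changes` is a hypothetical helper.

```python
def yearly_changes(rates):
    """Percent change between consecutive years, keyed by the later year."""
    years = sorted(rates)
    return {
        later: (rates[later] - rates[earlier]) / rates[earlier] * 100
        for earlier, later in zip(years, years[1:])
    }

# Hypothetical unemployment rates (percent), not the project's data.
rates = {2015: 5.0, 2016: 4.0, 2017: 3.4, 2018: 3.06, 2019: 3.0}
changes = yearly_changes(rates)

# True when the latest decrease is smaller in magnitude than the previous one.
slowing = abs(changes[2019]) < abs(changes[2018])
```

In this made-up series the rate still falls every year, but each drop is smaller than the last, which is the pattern the paragraph above reads as a sign of leveling out.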

In conclusion, the data points we provided can be correlated with increases or decreases in the housing market. We also believe there are many other data points we should look at to get a better picture: for example, viewing by zip code instead of city, looking at political party majority in the area, weather, etc.



Recommendations For Future Analysis




Improvements We Would Have Made


One major area where we feel we could have improved our project is taking more time to discover additional data sets and factors that may influence housing prices. There are likely many variables for which we could not readily find data, and that would probably be the best place to improve our project.


Contact Me