Creating A Database Of

CRAIGSLIST VEHICLES

Python, ETL & SQL


GitHub Repository


Background & Overview


The goal of this project is to practice my SQL and Python skills on a real-world dataset. As a car enthusiast who has spent a lot of time on Craigslist buying and selling bicycles, I designed this project around a Kaggle dataset of automotive classified listings from Craigslist.

Follow along in the Jupyter Notebook embedded below!


Notebook




ETL Summary


Through the ETL process, the dataset was reduced from 423,857 listings to 137,212 (-68%). The cleaning removed outliers, null values, and listings with messy inputs.
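As a rough illustration of this kind of pruning, here is a minimal sketch in plain Python. The column names (`price`, `year`, `manufacturer`), the price bounds, and the helper names `clean_listings` and `percent_change` are assumptions made for the example, not the rules used in the notebook. The last line only checks the arithmetic behind the reduction quoted above.

```python
def clean_listings(rows, required=("price", "year", "manufacturer"),
                   price_range=(500, 100_000)):
    """Keep rows that have every required field and a plausible price.

    The required columns and the price bounds are illustrative assumptions.
    """
    low, high = price_range
    cleaned = []
    for row in rows:
        # Drop rows where a required field is null or empty.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Drop price outliers outside the assumed bounds.
        if not (low <= row["price"] <= high):
            continue
        cleaned.append(row)
    return cleaned


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


sample = [
    {"price": 12000, "year": 2015, "manufacturer": "honda"},
    {"price": 1, "year": 2010, "manufacturer": "ford"},       # outlier price
    {"price": 8000, "year": None, "manufacturer": "toyota"},  # null year
]
kept = clean_listings(sample)
print(len(kept))                                # 1 row survives
print(round(percent_change(423_857, 137_212)))  # -68
```

In a pandas workflow the same idea would typically be expressed with boolean masks on a DataFrame; the loop version is used here only to keep the sketch dependency-free.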

In the end, the dataset was loaded into a SQLite database. The database has 2 tables: one with the 137,212 cleaned listings, and the other with the locations, kept at their original number. The listings table houses most of the relevant information we expect to use.

The location table includes all the location information for these listings.
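The load step could look something like the sketch below, which uses Python's built-in `sqlite3` module. The table names (`listings`, `locations`), the columns other than the ID, and the sample rows are hypothetical; an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path would persist the database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the shared id column is taken from the write-up.
conn.executescript("""
    CREATE TABLE listings (
        id INTEGER PRIMARY KEY,
        price INTEGER,
        year INTEGER,
        manufacturer TEXT
    );
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        region TEXT,
        state TEXT
    );
""")

# Made-up rows standing in for the cleaned listings and the location records.
listings = [(1, 12000, 2015, "honda"), (2, 8000, 2012, "toyota")]
locations = [(1, "denver", "co"), (2, "boulder", "co"), (3, "austin", "tx")]

conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", listings)
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", locations)
conn.commit()

listing_count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
location_count = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
print(listing_count, location_count)
```

The location table in this toy example has more rows than the listings table, mirroring the fact that the locations were not pruned along with the listings.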

Here is a Sankey chart depicting how the data was pruned:


Sankey Chart depicting the data transformation process.


The two tables are linked by the ID column, which serves as the primary key:


Database diagram.
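To show how the shared ID column can be used, here is a small sketch of a join between the two tables. The table names, the `price` and `state` columns, and the sample rows are assumptions for illustration only; the real schema is the one in the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tiny stand-in tables; only the id link reflects the write-up.
conn.executescript("""
    CREATE TABLE listings (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE locations (id INTEGER PRIMARY KEY, state TEXT);
    INSERT INTO listings VALUES (1, 12000), (2, 8000);
    INSERT INTO locations VALUES (1, 'co'), (2, 'co'), (3, 'tx');
""")

# Join each listing to its location via id, then summarize by state.
query = """
    SELECT loc.state, COUNT(*) AS n_listings, AVG(l.price) AS avg_price
    FROM listings AS l
    JOIN locations AS loc ON loc.id = l.id
    GROUP BY loc.state
    ORDER BY loc.state
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('co', 2, 10000.0)]
```

Because this is an inner join, location rows without a matching listing (id 3 here) drop out of the result, which is usually what a dashboard query over the cleaned listings would want.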




For the next part of this project, I used this database to create a dashboard to visualize the data.

Check Out the Dashboard!


