Objective

This supervised machine learning project addresses class imbalance in loan application data used to assess credit risk by resampling the data with several techniques.

Follow along with the Notebook below to see the analysis!

Data

The dataset used in this project came from LendingTree and can be found in the Resources folder of my GitHub repository for this project.

Resampling Techniques Used

Naive Random Oversampling
SMOTE Oversampling
Undersampling
Combination Sampling (SMOTEENN)
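The techniques differ mainly in how they change the class counts: oversampling grows the minority class, undersampling shrinks the majority class, and SMOTE creates new synthetic minority rows instead of copying existing ones. The following is a minimal pure-Python sketch of those ideas, not the notebook's code. The function names (`random_oversample`, `random_undersample`, `smote_like`) and the toy data are hypothetical. Libraries such as imbalanced-learn provide full implementations (`RandomOverSampler`, `SMOTE`, `RandomUnderSampler` or `ClusterCentroids`, and `SMOTEENN`), and the notebook's undersampler may differ from the random one shown here. SMOTEENN, which follows SMOTE with an Edited Nearest Neighbours cleaning pass, is not sketched.

```python
import random
from collections import Counter


def rows_of(X, y, label):
    """Return the feature rows whose label equals `label`."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, seed=0):
    """Naive random oversampling: duplicate randomly chosen rows (with
    replacement) until every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = rows_of(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Random undersampling: keep a random subset of every class, sized to
    match the minority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(rows_of(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic row lies on the line segment
    between a random minority row and one of its k nearest minority neighbours.
    Assumes numeric features and at least two minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_rows = rows_of(X, y, minority)
    X_out, y_out = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        others = [r for j, r in enumerate(minority_rows) if j != i]
        # k nearest minority neighbours by squared Euclidean distance
        neighbours = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        X_out.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: five low-risk rows and two high-risk rows.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0),
     (5.0, 5.0), (5.5, 5.0)]
y = ["low_risk"] * 5 + ["high_risk"] * 2

print(dict(Counter(random_oversample(X, y)[1])))        # {'low_risk': 5, 'high_risk': 5}
print(dict(Counter(random_undersample(X, y)[1])))       # {'low_risk': 2, 'high_risk': 2}
print(dict(Counter(smote_like(X, y, "high_risk")[1])))  # {'low_risk': 5, 'high_risk': 5}
```

Oversampling and SMOTE both end with 5 rows per class, but only SMOTE adds rows that did not exist before (here, points between (5.0, 5.0) and (5.5, 5.0)). Undersampling instead discards most of the majority class, which throws information away.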

Notebook

Summary, Analysis, and Recommendations

The F1 scores for the high-risk class are very poor across all four resampling models, ranging from only 0.01 to 0.02.
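F1 is the harmonic mean of precision and recall, so a very low precision drags it toward zero even when recall is reasonable. This minimal sketch uses hypothetical counts chosen so that recall equals the 0.7 high-risk recall reported for SMOTEENN while precision is only 0.01. The notebook's actual precision values and confusion matrices are not reproduced here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent form using true positives, false positives, false negatives."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts: 70 of 100 high-risk loans caught (recall 0.7),
# but 6,930 low-risk loans also flagged (precision 70 / 7,000 = 0.01).
print(round(f1_score(0.01, 0.7), 3))             # 0.02
print(round(f1_from_counts(70, 6930, 30), 3))    # 0.02
```

Even with 70% of high-risk loans detected, the flood of false positives keeps F1 near 0.02, which matches the range reported above.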

Balanced accuracy scores are also similar for the Naive Random Oversampling, SMOTE Oversampling, and SMOTEENN Combination Sampling methods, ranging from 0.622 to 0.624. The Undersampling method performed worse, with a balanced accuracy score of only 0.488, below the 0.5 that a model predicting the same class for every application would achieve.
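Balanced accuracy is the mean of the per-class recalls, which is why it is more informative than plain accuracy on imbalanced data. This minimal sketch uses hypothetical confusion-matrix counts (not the notebook's) to show that a model labelling every application low-risk reaches about 99% plain accuracy but only 0.5 balanced accuracy, the baseline that the Undersampling score of 0.488 falls below.

```python
def balanced_accuracy(confusion):
    """Mean per-class recall. confusion[i][j] counts rows of true class i
    that were predicted as class j."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return sum(recalls) / len(recalls)


def accuracy(confusion):
    """Fraction of all rows predicted correctly (the diagonal)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Hypothetical counts; rows and columns are [high_risk, low_risk].
always_low = [[0, 100], [0, 17000]]
print(round(accuracy(always_low), 3))   # 0.994
print(balanced_accuracy(always_low))    # 0.5
```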

I would not recommend any of the methods used in this module: their scores are not strong enough to inspire confidence in the model's ability to identify high-risk credit applications. If one of these models had to be chosen, however, I would suggest the SMOTEENN Combination model, since it has the highest recall for high-risk applications (0.7) and a balanced accuracy score (0.622) comparable to our better-performing models.

Analyzing Credit Risk

Supervised Machine Learning
