This supervised machine learning project attempts to address class imbalance by resampling the loan application data used to assess credit risk.
Follow along in the notebook below to check out the analysis!
The dataset used in this project came from LendingTree and can be found in the Resources folder of my GitHub repository for this project.
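To make the idea of resampling concrete, here is a minimal sketch of naive random oversampling in plain Python. It is not the code from the notebook: the `random_oversample` helper, the toy rows, and the `low_risk`/`high_risk` label strings are assumptions made up for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

res_rows, res_labels = random_oversample(rows, labels)
print(Counter(res_labels))
```

Resampling like this is normally applied to the training split only, so that the test set keeps the original class distribution.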
The F1 scores for all four resampling models are very poor, at 0.01 to 0.02 for the high-risk class.
Balanced accuracy scores are also similar for the Random Naive Oversampling, SMOTE Oversampling, and SMOTEENN Combination Sampling methods (ranging from 0.622 to 0.624). The undersampling method underperformed in this regard, with a balanced accuracy score of only 0.488.
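To show how a model can have a reasonable balanced accuracy and still a very low high-risk F1 score, here is a small sketch that computes both from confusion-matrix counts. The counts are hypothetical and are not taken from the notebook, and `high_risk_metrics` is an illustrative helper name.

```python
def high_risk_metrics(tp, fp, fn, tn):
    """Compute metrics with high-risk treated as the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # high-risk recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # low-risk recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Balanced accuracy is the mean of the per-class recalls
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: 100 high-risk and 17,000 low-risk applications
metrics = high_risk_metrics(tp=70, fp=7000, fn=30, tn=10000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, most true high-risk applications are caught, but they are swamped by low-risk applications flagged as high-risk. Precision collapses and drags F1 down with it, while balanced accuracy stays moderate because it averages only the two recall values.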
I would not recommend any of the methods used in this module, as the accuracy scores are not strong enough to inspire confidence in the model's ability to correctly identify a high-risk credit application. However, if one of these models had to be chosen, I would suggest the SMOTEENN Combination model, since it has the highest recall (0.7) for high-risk applications and a balanced accuracy score (0.622) similar to our better-performing models.