Predicting Defaults on Credit Card Payments
This model will predict the probability that a credit card holder will default on their payment given their payment history and demographic information.
Load libraries:
library(readxl)
library(xgboost)
library(caTools)
library(config)
Load config file:
model_config <- get("model_config")
This notebook refers to model-b.
Load data
Load the data and view its contents:
# Data from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
df <- read_excel("data/default-of-credit-card clients.xls", skip = 1)
df
Split data into training and testing sets
Split the data set into 80% training and 20% testing portions:
set.seed(123)
sample <- sample.split(df, SplitRatio = 0.80)
train <- as.matrix(subset(df, sample == TRUE))
x_train <- train[,-25]
y_train <- train[, 25, drop=FALSE]
test <- as.matrix(subset(df, sample == FALSE))
x_test <- test[,-25]
y_test <- test[, 25, drop=FALSE]
Train model
Train the model using a binary classification algoritm:
bst <- xgboost(data = x_train,
label = y_train,
max.depth = 2,
eta = 1,
nthread = 2,
nrounds = 2,
booster = "gblinear",
objective = "binary:logistic")
[1] train-error:0.211000
[2] train-error:0.186708
Show the most important factors and their weights:
imp_matrix <- xgb.importance(feature_names = colnames(x_train), model = bst)
xgb.plot.importance(importance_matrix = imp_matrix)
Serialize model
Serialize the trained model to a file for later use:
model_save_path = paste0(model_config$path_prefix, model_config$id, model_config$path_suffix)
xgb.save(bst, model_save_path)
[1] TRUE
Model Testing
Generate model predictions on the test data and calculate the accuracy of the model:
pred <- predict(bst, x_test)
err <- mean(as.numeric(pred > 0.5) != y_test)
print(paste("Test Accuracy =", 1-err))
[1] "Test Accuracy = 0.814833333333333"
Test model predictions
Generate a prediction for an account that we know has a good payment history:
test_data <- matrix(c(35, 500000, 1, 1, 1, 58, -2, -2, -2, -2, -2, -2, 13709, 5006, 31130, 3180, 0, 5293, 5006, 31178, 3180, 0, 5293, 768), nrow = 1)
pred_good <- predict(bst, test_data)
pred_good
[1] 0.03666128
This account has a 0.0366613 probability of defaulting on their payment.
Generate a prediction for an account that we know defaults on their payment:
test_data <- matrix(c(1, 20000, 2, 2, 1, 24, 2, 2, -1, -1, -2, -2, 3913, 3102, 689, 0, 0, 0, 0, 689, 0, 0, 0, 0), nrow = 1)
pred_bad <- predict(bst, test_data)
pred_bad
[1] 0.5630285
This account has a 0.5630285 probability of defaulting on their payment.
LS0tCnRpdGxlOiAiQ3JlZGl0IFJpc2sgTW9kZWxpbmcgaW4gUlN0dWRpbyIKcmVzb3VyY2VfZmlsZXM6Ci0gY29uZmlnLnltbApvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyMgUHJlZGljdGluZyBEZWZhdWx0cyBvbiBDcmVkaXQgQ2FyZCBQYXltZW50cwoKVGhpcyBtb2RlbCB3aWxsIHByZWRpY3QgdGhlIHByb2JhYmlsaXR5IHRoYXQgYSBjcmVkaXQgY2FyZCBob2xkZXIgd2lsbCBkZWZhdWx0IG9uIHRoZWlyIHBheW1lbnQgZ2l2ZW4gdGhlaXIgcGF5bWVudCBoaXN0b3J5IGFuZCBkZW1vZ3JhcGhpYyBpbmZvcm1hdGlvbi4KCkxvYWQgbGlicmFyaWVzOgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KbGlicmFyeShyZWFkeGwpCmxpYnJhcnkoeGdib29zdCkKbGlicmFyeShjYVRvb2xzKQpsaWJyYXJ5KGNvbmZpZykKYGBgCgpMb2FkIGNvbmZpZyBmaWxlOgoKYGBge3J9Cm1vZGVsX2NvbmZpZyA8LSBnZXQoIm1vZGVsX2NvbmZpZyIpCmBgYAoKVGhpcyBub3RlYm9vayByZWZlcnMgdG8gKipgciBtb2RlbF9jb25maWckaWRgKiouCgojIyMgTG9hZCBkYXRhCgpMb2FkIHRoZSBkYXRhIGFuZCB2aWV3IGl0cyBjb250ZW50czoKCmBgYHtyfQojIERhdGEgZnJvbSBodHRwczovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvZGF0YXNldHMvZGVmYXVsdCtvZitjcmVkaXQrY2FyZCtjbGllbnRzCmRmIDwtIHJlYWRfZXhjZWwoImRhdGEvZGVmYXVsdC1vZi1jcmVkaXQtY2FyZCBjbGllbnRzLnhscyIsIHNraXAgPSAxKQpkZgpgYGAKCiMjIyBTcGxpdCBkYXRhIGludG8gdHJhaW5pbmcgYW5kIHRlc3Rpbmcgc2V0cwoKU3BsaXQgdGhlIGRhdGEgc2V0IGludG8gODAlIHRyYWluaW5nIGFuZCAyMCUgdGVzdGluZyBwb3J0aW9uczoKCmBgYHtyIHdhcm5pbmc9RkFMU0V9CnNldC5zZWVkKDEyMykKc2FtcGxlIDwtIHNhbXBsZS5zcGxpdChkZiwgU3BsaXRSYXRpbyA9IDAuODApCgp0cmFpbiA8LSBhcy5tYXRyaXgoc3Vic2V0KGRmLCBzYW1wbGUgPT0gVFJVRSkpCnhfdHJhaW4gPC0gdHJhaW5bLC0yNV0KeV90cmFpbiAgPC0gdHJhaW5bLCAyNSwgZHJvcD1GQUxTRV0KCnRlc3QgPC0gYXMubWF0cml4KHN1YnNldChkZiwgc2FtcGxlID09IEZBTFNFKSkKeF90ZXN0IDwtIHRlc3RbLC0yNV0KeV90ZXN0ICA8LSB0ZXN0WywgMjUsIGRyb3A9RkFMU0VdCmBgYAoKIyMjIFRyYWluIG1vZGVsCgpUcmFpbiB0aGUgbW9kZWwgdXNpbmcgYSBiaW5hcnkgY2xhc3NpZmljYXRpb24gYWxnb3JpdG06CgpgYGB7cn0KYnN0IDwtIHhnYm9vc3QoZGF0YSA9IHhfdHJhaW4sCiAgICAgICAgICAgICAgIGxhYmVsID0geV90cmFpbiwKICAgICAgICAgICAgICAgbWF4LmRlcHRoID0gMiwKICAgICAgICAgICAgICAgZXRhID0gMSwKICAgICAgICAgICAgICAgbnRocmVhZCA9IDIsCiAgICAgICAgICAgICAgIG5yb3VuZHMgPSAyLAogICAgICAgICAgICAgICBib29zdGVyID0gImdibGluZWFyIiwKICAgICAgICAgICAgICAgb2JqZWN0aXZlID0gImJpbmFyeTpsb2dpc3RpYyIpCmBgYAoKU2hvdyB0aGUgbW9zdCBpbXBvcnRhbnQgZmFjdG9ycyBhbmQgdGhlaXIgd2VpZ2h0czoKCmBgYHtyfQppbXBfbWF0cml4IDwtIHhnYi5pbXBvcnRhbmNlKGZlYXR1cmVfbmFtZXMgPSBjb2xuYW1lcyh4X3RyYWluKSwgbW9kZWwgPSBic3QpCnhnYi5wbG90LmltcG9ydGFuY2UoaW1wb3J0YW5jZV9tYXRyaXggPSBpbXBfbWF0cml4KQpgYGAKCiMjIyBTZXJpYWxpemUgbW9kZWwKClNlcmlhbGl6ZSB0aGUgdHJhaW5lZCBtb2RlbCB0byBhIGZpbGUgZm9yIGxhdGVyIHVzZToKCmBgYHtyfQptb2RlbF9zYXZlX3BhdGggPSBwYXN0ZTAobW9kZWxfY29uZmlnJHBhdGhfcHJlZml4LCBtb2RlbF9jb25maWckaWQsIG1vZGVsX2NvbmZpZyRwYXRoX3N1ZmZpeCkKeGdiLnNhdmUoYnN0LCBtb2RlbF9zYXZlX3BhdGgpCmBgYAoKIyMjIE1vZGVsIFRlc3RpbmcKCkdlbmVyYXRlIG1vZGVsIHByZWRpY3Rpb25zIG9uIHRoZSB0ZXN0IGRhdGEgYW5kIGNhbGN1bGF0ZSB0aGUgYWNjdXJhY3kgb2YgdGhlCm1vZGVsOgoKYGBge3J9CnByZWQgPC0gcHJlZGljdChic3QsIHhfdGVzdCkKZXJyIDwtIG1lYW4oYXMubnVtZXJpYyhwcmVkID4gMC41KSAhPSB5X3Rlc3QpCnByaW50KHBhc3RlKCJUZXN0IEFjY3VyYWN5ID0iLCAxLWVycikpCmBgYAoKIyMjIFRlc3QgbW9kZWwgcHJlZGljdGlvbnMKCkdlbmVyYXRlIGEgcHJlZGljdGlvbiBmb3IgYW4gYWNjb3VudCB0aGF0IHdlIGtub3cgaGFzIGEgZ29vZCBwYXltZW50IGhpc3Rvcnk6CgpgYGB7cn0KdGVzdF9kYXRhIDwtIG1hdHJpeChjKDM1LCA1MDAwMDAsIDEsIDEsIDEsIDU4LCAtMiwgLTIsIC0yLCAtMiwgLTIsIC0yLCAxMzcwOSwgNTAwNiwgMzExMzAsIDMxODAsIDAsIDUyOTMsIDUwMDYsIDMxMTc4LCAzMTgwLCAwLCA1MjkzLCA3NjgpLCBucm93ID0gMSkKcHJlZF9nb29kIDwtIHByZWRpY3QoYnN0LCB0ZXN0X2RhdGEpCnByZWRfZ29vZApgYGAKClRoaXMgYWNjb3VudCBoYXMgYSBgciBwcmVkX2dvb2RgIHByb2JhYmlsaXR5IG9mIGRlZmF1bHRpbmcgb24gdGhlaXIgcGF5bWVudC4KCi0tLQoKR2VuZXJhdGUgYSBwcmVkaWN0aW9uIGZvciBhbiBhY2NvdW50IHRoYXQgd2Uga25vdyBkZWZhdWx0cyBvbiB0aGVpciBwYXltZW50OgoKYGBge3J9CnRlc3RfZGF0YSA8LSBtYXRyaXgoYygxLCAyMDAwMCwgMiwgMiwgMSwgMjQsIDIsIDIsIC0xLCAtMSwgLTIsIC0yLCAzOTEzLCAzMTAyLCA2ODksIDAsIDAsIDAsIDAsIDY4OSwgMCwgMCwgMCwgMCksIG5yb3cgPSAxKQpwcmVkX2JhZCA8LSBwcmVkaWN0KGJzdCwgdGVzdF9kYXRhKQpwcmVkX2JhZApgYGAKClRoaXMgYWNjb3VudCBoYXMgYSBgciBwcmVkX2JhZGAgcHJvYmFiaWxpdHkgb2YgZGVmYXVsdGluZyBvbiB0aGVpciBwYXltZW50Lgo=