Predicting Defaults on Credit Card Payments

This model will predict the probability that a credit card holder will default on their payment given their payment history and demographic information.

Load libraries:

library(readxl)
library(xgboost)
library(caTools)
library(config)

Load config file:

model_config <- get("model_config")

This notebook refers to model-b.

Load data

Load the data and view its contents:

# Data from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
df <- read_excel("data/default-of-credit-card clients.xls", skip = 1)
df

Split data into training and testing sets

Split the data set into 80% training and 20% testing portions:

set.seed(123)
sample <- sample.split(df, SplitRatio = 0.80)

train <- as.matrix(subset(df, sample == TRUE))
x_train <- train[,-25]
y_train  <- train[, 25, drop=FALSE]

test <- as.matrix(subset(df, sample == FALSE))
x_test <- test[,-25]
y_test  <- test[, 25, drop=FALSE]

Train model

Train the model using a binary classification algoritm:

bst <- xgboost(data = x_train,
               label = y_train,
               max.depth = 2,
               eta = 1,
               nthread = 2,
               nrounds = 2,
               booster = "gblinear",
               objective = "binary:logistic")
[1] train-error:0.211000 
[2] train-error:0.186708 

Show the most important factors and their weights:

imp_matrix <- xgb.importance(feature_names = colnames(x_train), model = bst)
xgb.plot.importance(importance_matrix = imp_matrix)

Serialize model

Serialize the trained model to a file for later use:

model_save_path = paste0(model_config$path_prefix, model_config$id, model_config$path_suffix)
xgb.save(bst, model_save_path)
[1] TRUE

Model Testing

Generate model predictions on the test data and calculate the accuracy of the model:

pred <- predict(bst, x_test)
err <- mean(as.numeric(pred > 0.5) != y_test)
print(paste("Test Accuracy =", 1-err))
[1] "Test Accuracy = 0.814833333333333"

Test model predictions

Generate a prediction for an account that we know has a good payment history:

test_data <- matrix(c(35, 500000, 1, 1, 1, 58, -2, -2, -2, -2, -2, -2, 13709, 5006, 31130, 3180, 0, 5293, 5006, 31178, 3180, 0, 5293, 768), nrow = 1)
pred_good <- predict(bst, test_data)
pred_good
[1] 0.03666128

This account has a 0.0366613 probability of defaulting on their payment.


Generate a prediction for an account that we know defaults on their payment:

test_data <- matrix(c(1, 20000, 2, 2, 1, 24, 2, 2, -1, -1, -2, -2, 3913, 3102, 689, 0, 0, 0, 0, 689, 0, 0, 0, 0), nrow = 1)
pred_bad <- predict(bst, test_data)
pred_bad
[1] 0.5630285

This account has a 0.5630285 probability of defaulting on their payment.

LS0tCnRpdGxlOiAiQ3JlZGl0IFJpc2sgTW9kZWxpbmcgaW4gUlN0dWRpbyIKcmVzb3VyY2VfZmlsZXM6Ci0gY29uZmlnLnltbApvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyMgUHJlZGljdGluZyBEZWZhdWx0cyBvbiBDcmVkaXQgQ2FyZCBQYXltZW50cwoKVGhpcyBtb2RlbCB3aWxsIHByZWRpY3QgdGhlIHByb2JhYmlsaXR5IHRoYXQgYSBjcmVkaXQgY2FyZCBob2xkZXIgd2lsbCBkZWZhdWx0IG9uIHRoZWlyIHBheW1lbnQgZ2l2ZW4gdGhlaXIgcGF5bWVudCBoaXN0b3J5IGFuZCBkZW1vZ3JhcGhpYyBpbmZvcm1hdGlvbi4KCkxvYWQgbGlicmFyaWVzOgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KbGlicmFyeShyZWFkeGwpCmxpYnJhcnkoeGdib29zdCkKbGlicmFyeShjYVRvb2xzKQpsaWJyYXJ5KGNvbmZpZykKYGBgCgpMb2FkIGNvbmZpZyBmaWxlOgoKYGBge3J9Cm1vZGVsX2NvbmZpZyA8LSBnZXQoIm1vZGVsX2NvbmZpZyIpCmBgYAoKVGhpcyBub3RlYm9vayByZWZlcnMgdG8gKipgciBtb2RlbF9jb25maWckaWRgKiouCgojIyMgTG9hZCBkYXRhCgpMb2FkIHRoZSBkYXRhIGFuZCB2aWV3IGl0cyBjb250ZW50czoKCmBgYHtyfQojIERhdGEgZnJvbSBodHRwczovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvZGF0YXNldHMvZGVmYXVsdCtvZitjcmVkaXQrY2FyZCtjbGllbnRzCmRmIDwtIHJlYWRfZXhjZWwoImRhdGEvZGVmYXVsdC1vZi1jcmVkaXQtY2FyZCBjbGllbnRzLnhscyIsIHNraXAgPSAxKQpkZgpgYGAKCiMjIyBTcGxpdCBkYXRhIGludG8gdHJhaW5pbmcgYW5kIHRlc3Rpbmcgc2V0cwoKU3BsaXQgdGhlIGRhdGEgc2V0IGludG8gODAlIHRyYWluaW5nIGFuZCAyMCUgdGVzdGluZyBwb3J0aW9uczoKCmBgYHtyIHdhcm5pbmc9RkFMU0V9CnNldC5zZWVkKDEyMykKc2FtcGxlIDwtIHNhbXBsZS5zcGxpdChkZiwgU3BsaXRSYXRpbyA9IDAuODApCgp0cmFpbiA8LSBhcy5tYXRyaXgoc3Vic2V0KGRmLCBzYW1wbGUgPT0gVFJVRSkpCnhfdHJhaW4gPC0gdHJhaW5bLC0yNV0KeV90cmFpbiAgPC0gdHJhaW5bLCAyNSwgZHJvcD1GQUxTRV0KCnRlc3QgPC0gYXMubWF0cml4KHN1YnNldChkZiwgc2FtcGxlID09IEZBTFNFKSkKeF90ZXN0IDwtIHRlc3RbLC0yNV0KeV90ZXN0ICA8LSB0ZXN0WywgMjUsIGRyb3A9RkFMU0VdCmBgYAoKIyMjIFRyYWluIG1vZGVsCgpUcmFpbiB0aGUgbW9kZWwgdXNpbmcgYSBiaW5hcnkgY2xhc3NpZmljYXRpb24gYWxnb3JpdG06CgpgYGB7cn0KYnN0IDwtIHhnYm9vc3QoZGF0YSA9IHhfdHJhaW4sCiAgICAgICAgICAgICAgIGxhYmVsID0geV90cmFpbiwKICAgICAgICAgICAgICAgbWF4LmRlcHRoID0gMiwKICAgICAgICAgICAgICAgZXRhID0gMSwKICAgICAgICAgICAgICAgbnRocmVhZCA9IDIsCiAgICAgICAgICAgICAgIG5yb3VuZHMgPSAyLAogICAgICAgICAgICAgICBib29zdGVyID0gImdibGluZWFyIiwKICAgICAgICAgICAgICAgb2JqZWN0aXZlID0gImJpbmFyeTpsb2dpc3RpYyIpCmBgYAoKU2hvdyB0aGUgbW9zdCBpbXBvcnRhbnQgZmFjdG9ycyBhbmQgdGhlaXIgd2VpZ2h0czoKCmBgYHtyfQppbXBfbWF0cml4IDwtIHhnYi5pbXBvcnRhbmNlKGZlYXR1cmVfbmFtZXMgPSBjb2xuYW1lcyh4X3RyYWluKSwgbW9kZWwgPSBic3QpCnhnYi5wbG90LmltcG9ydGFuY2UoaW1wb3J0YW5jZV9tYXRyaXggPSBpbXBfbWF0cml4KQpgYGAKCiMjIyBTZXJpYWxpemUgbW9kZWwKClNlcmlhbGl6ZSB0aGUgdHJhaW5lZCBtb2RlbCB0byBhIGZpbGUgZm9yIGxhdGVyIHVzZToKCmBgYHtyfQptb2RlbF9zYXZlX3BhdGggPSBwYXN0ZTAobW9kZWxfY29uZmlnJHBhdGhfcHJlZml4LCBtb2RlbF9jb25maWckaWQsIG1vZGVsX2NvbmZpZyRwYXRoX3N1ZmZpeCkKeGdiLnNhdmUoYnN0LCBtb2RlbF9zYXZlX3BhdGgpCmBgYAoKIyMjIE1vZGVsIFRlc3RpbmcKCkdlbmVyYXRlIG1vZGVsIHByZWRpY3Rpb25zIG9uIHRoZSB0ZXN0IGRhdGEgYW5kIGNhbGN1bGF0ZSB0aGUgYWNjdXJhY3kgb2YgdGhlCm1vZGVsOgoKYGBge3J9CnByZWQgPC0gcHJlZGljdChic3QsIHhfdGVzdCkKZXJyIDwtIG1lYW4oYXMubnVtZXJpYyhwcmVkID4gMC41KSAhPSB5X3Rlc3QpCnByaW50KHBhc3RlKCJUZXN0IEFjY3VyYWN5ID0iLCAxLWVycikpCmBgYAoKIyMjIFRlc3QgbW9kZWwgcHJlZGljdGlvbnMKCkdlbmVyYXRlIGEgcHJlZGljdGlvbiBmb3IgYW4gYWNjb3VudCB0aGF0IHdlIGtub3cgaGFzIGEgZ29vZCBwYXltZW50IGhpc3Rvcnk6CgpgYGB7cn0KdGVzdF9kYXRhIDwtIG1hdHJpeChjKDM1LCA1MDAwMDAsIDEsIDEsIDEsIDU4LCAtMiwgLTIsIC0yLCAtMiwgLTIsIC0yLCAxMzcwOSwgNTAwNiwgMzExMzAsIDMxODAsIDAsIDUyOTMsIDUwMDYsIDMxMTc4LCAzMTgwLCAwLCA1MjkzLCA3NjgpLCBucm93ID0gMSkKcHJlZF9nb29kIDwtIHByZWRpY3QoYnN0LCB0ZXN0X2RhdGEpCnByZWRfZ29vZApgYGAKClRoaXMgYWNjb3VudCBoYXMgYSBgciBwcmVkX2dvb2RgIHByb2JhYmlsaXR5IG9mIGRlZmF1bHRpbmcgb24gdGhlaXIgcGF5bWVudC4KCi0tLQoKR2VuZXJhdGUgYSBwcmVkaWN0aW9uIGZvciBhbiBhY2NvdW50IHRoYXQgd2Uga25vdyBkZWZhdWx0cyBvbiB0aGVpciBwYXltZW50OgoKYGBge3J9CnRlc3RfZGF0YSA8LSBtYXRyaXgoYygxLCAyMDAwMCwgMiwgMiwgMSwgMjQsIDIsIDIsIC0xLCAtMSwgLTIsIC0yLCAzOTEzLCAzMTAyLCA2ODksIDAsIDAsIDAsIDAsIDY4OSwgMCwgMCwgMCwgMCksIG5yb3cgPSAxKQpwcmVkX2JhZCA8LSBwcmVkaWN0KGJzdCwgdGVzdF9kYXRhKQpwcmVkX2JhZApgYGAKClRoaXMgYWNjb3VudCBoYXMgYSBgciBwcmVkX2JhZGAgcHJvYmFiaWxpdHkgb2YgZGVmYXVsdGluZyBvbiB0aGVpciBwYXltZW50Lgo=