Fintech · Predictive Analytics · TVS Motor Capstone

Forecasting Credit Card Spend at the Individual Level

Predicting average credit card consumption for the next 3 months using customer demographics, transaction history, and financial behaviour to enable banks to personalise offers and anticipate customer needs.

Training customers: 32,820 rows
RMSLE × 100: 116.2 (threshold: 125)
Leaderboard rank: #1 in batch
Models evaluated: 14

Why predict individual credit card spend?

Understanding consumption at the individual level lets banks move from reactive to proactive CRM, offering the right product to the right person at the right time.

Personalised Marketing

Knowing who will spend more next quarter lets banks target high-value segments with tailored rewards and credit limit upgrades.

Risk Management

Predicting a sharp drop in spending can signal churn risk or financial stress, enabling early intervention before the customer leaves.

Revenue Forecasting

Aggregated individual-level predictions give finance teams a bottom-up forecast of interchange income and card portfolio performance.

44 features across three months of behaviour

XYZ Bank provided a snapshot of 32,820 customers with April to June transaction history and demographic data. The target is their average credit card spend in July to September.

Training customers: 32,820
Test customers: 14,067
Raw features: 44
Engineered features: 92
Mean target spend: ₹6,824
Missing CC spend data: 0%

44 Features

  • CC Spend: Monthly consumption & transaction count (Apr, May, Jun)
  • Debit Card: Monthly debit consumption & count (Apr, May, Jun)
  • Bank Account: Credit/debit amounts, max credit, transaction counts
  • Investments: Demat, FD, insurance holdings (investment_1 to 4)
  • Loans: Personal & vehicle loans (active + closed), EMI amount
  • Demographics: Age, gender, region code, account type, card limit

What the data tells us

EDA revealed a consistent downward spending trend and meaningful differences across customer segments.

Monthly CC Spend Trend

Average credit card consumption, Apr to predicted Jul-Sep

Spending declined 59% from April to June. The most recent month is the strongest predictor of future behaviour, especially in log space.

Average Spend by Segment

Target variable broken down by key demographic features

Female cardholders and savings account holders consistently show higher average spend. Both are meaningful predictors after controlling for other variables.

Base Model Performance (RMSLE × 100)

All 14 regressors evaluated individually. Lower is better. Threshold = 125.

Only Huber Regressor beats the threshold alone. Stacking the models pushes the score down to 116.2.
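The metric itself is not defined on this page. A minimal sketch, assuming the standard RMSLE definition (root mean squared difference of `log1p` values) multiplied by 100; the function name `rmsle_x100` and the toy values are illustrative, not taken from the project code:

```python
import math


def rmsle_x100(y_true, y_pred):
    """Root mean squared logarithmic error, scaled by 100 (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so zero-spend customers are handled safely.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return 100.0 * math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Toy values, made up for illustration (not from the dataset).
actual = [1000.0, 5000.0, 20000.0]
perfect = [1000.0, 5000.0, 20000.0]
print(rmsle_x100(actual, perfect))  # 0.0 for a perfect forecast
```

Under this assumed definition, a score of 116.2 corresponds to an RMSLE of 1.162, which is a typical multiplicative error of about e^1.162 ≈ 3.2 on (1 + spend).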

Feature Importance (Gain)

Top predictors from the final CatBoost model

By gain, log-transformed average CC spend is 4× more important than any other feature. The log scale captures the proportional nature of financial behaviour.
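As a rough illustration of what log-space spend features can look like, the sketch below derives a log average and a log trend from three monthly spend values. The feature names and the exact transforms are assumptions made for illustration; the page does not list the project's actual feature definitions.

```python
import math


def log_space_cc_features(apr, may, jun):
    """Hypothetical log-space features from three months of CC spend."""
    avg = (apr + may + jun) / 3.0
    return {
        # log1p keeps zero-spend months finite.
        "log_avg_cc_spend": math.log1p(avg),
        "log_jun_cc_spend": math.log1p(jun),
        # A difference of logs is the log of a ratio, so this measures
        # proportional change rather than change in rupees.
        "log_trend_apr_to_jun": math.log1p(jun) - math.log1p(apr),
    }


# Example monthly spends in INR, chosen only for illustration.
features = log_space_cc_features(4000.0, 4500.0, 5000.0)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Because the trend is a difference of logs, a customer going from ₹400 to ₹500 and one going from ₹4,000 to ₹5,000 get almost the same trend value, which is the sense in which the log scale is proportional.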

Multi-level stacking ensemble

Three gradient-boosted tree models generate out-of-fold predictions, which are combined by a Ridge meta-learner trained on those OOF outputs.
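The out-of-fold stacking idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: scikit-learn's `GradientBoostingRegressor` stands in for LightGBM, XGBoost and CatBoost, the data is synthetic, and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Synthetic stand-in data; the real inputs would be the engineered features.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Three differently configured boosters as stand-ins for the three base models.
base_models = [
    GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0),
    GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=1),
    GradientBoostingRegressor(n_estimators=100, max_depth=2,
                              learning_rate=0.05, random_state=2),
]

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
oof = np.zeros((X.shape[0], len(base_models)))

for train_idx, valid_idx in kfold.split(X):
    for j, model in enumerate(base_models):
        # Fit a fresh copy on the training folds only, then predict the
        # held-out fold, so no row is predicted by a model that saw it.
        fold_model = clone(model).fit(X[train_idx], y[train_idx])
        oof[valid_idx, j] = fold_model.predict(X[valid_idx])

# The meta-learner sees only out-of-fold predictions as its inputs.
meta = Ridge(alpha=1.0).fit(oof, y)
stacked = meta.predict(oof)
print("meta-weights:", np.round(meta.coef_, 3))
```

Training the meta-learner on out-of-fold predictions rather than in-sample ones keeps it from rewarding whichever base model overfits the most.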

  • Feature Engineering: 92 features including log-space CC trends, utilisation, bank flows
  • 5-Fold CV Split: OOF predictions, leak-free region target encoding per fold
  • Base Models: LightGBM · XGBoost · CatBoost
  • Ridge Meta-Learner: Learns optimal blend of base model predictions
  • RMSLE × 100: 116.2 (vs threshold of 125, #1 in batch)

  • LightGBM: 116.89 (meta-weight: 0.10)
  • XGBoost: 116.62 (meta-weight: 0.08)
  • CatBoost (best): 116.30 (meta-weight: 0.82)
  • Ridge Meta (final): 116.28 (OOF stacked prediction)
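The leak-free region target encoding mentioned in the 5-fold CV step can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: the function name, the index-modulo fold assignment, and the fallback to the training-fold mean for unseen categories are all assumptions.

```python
from collections import defaultdict


def oof_target_encode(categories, targets, n_splits=5):
    """Encode each row with the mean target of its category, computed
    only on rows from the other folds, so a row never sees its own target."""
    n = len(categories)
    if n_splits < 2 or n < n_splits:
        raise ValueError("need at least 2 folds and at least one row per fold")
    encoded = [0.0] * n
    for fold in range(n_splits):
        # Deterministic fold assignment by index; a real pipeline would shuffle.
        valid_idx = [i for i in range(n) if i % n_splits == fold]
        train_idx = [i for i in range(n) if i % n_splits != fold]
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train_idx:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        # Fallback for categories absent from the training folds.
        train_mean = sum(targets[i] for i in train_idx) / len(train_idx)
        for i in valid_idx:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else train_mean
    return encoded


# Toy region codes and targets, made up for illustration.
regions = ["A", "A", "B", "B", "A", "B"]
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(oof_target_encode(regions, spend, n_splits=3))
# [3.5, 1.0, 4.0, 4.5, 1.0, 4.0]
```

Row 0 (region A, target 1.0) is encoded as 3.5, the mean of the other-fold A rows (2.0 and 5.0), so its own target does not leak into its feature.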

How the model segments customers

Pre-computed outputs from the full 92-feature model across five representative customer profiles.

Why no interactive demo? The model's 92 input features include monthly debit card spend, number of debit transactions, bank account inflows and outflows, investment balances, and loan enquiry history. None of these can be collected from a web form without direct access to a bank's transaction ledger. In a production deployment, these would come via an open-banking API. What's shown below uses the full feature set with real model outputs.
| Customer profile | Apr spend | May spend | Jun spend | Card limit | Utilisation | Predicted Q3 avg | Signal |
|---|---|---|---|---|---|---|---|
| Budget saver (Age 28, savings, no loans) | ₹1,200 | ₹1,000 | ₹1,000 | ₹25,000 | 4% | ₹1,056 | Consistent low spend; model predicts continuation |
| Steady spender (Age 34, current, no loans) | ₹4,000 | ₹4,500 | ₹5,000 | ₹50,000 | 9% | ₹4,820 | Upward trend; model forecasts continued growth |
| Active borrower (Age 38, current, personal loan + ₹3k EMI) | ₹4,000 | ₹4,500 | ₹5,000 | ₹50,000 | 9% | ₹3,940 | EMI burden moderately reduces discretionary spend |
| High utilisation (Age 32F, savings, no loans) | ₹18,000 | ₹21,000 | ₹24,000 | ₹25,000 | 84% | ₹8,200 | Near-limit usage flagged; high engagement customer |
| Premium customer (Age 42, current, vehicle loan + ₹5k EMI) | ₹15,000 | ₹17,000 | ₹20,000 | ₹2,00,000 | 9% | ₹14,500 | High limit signals creditworthiness; strong Q3 forecast |

All figures in INR. Predictions are average monthly spend for July–August–September. The training median target is ₹3,141/month; the mean is ₹6,825/month (right-skewed distribution). Model trained on 32,820 customers.
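The utilisation column in the table is consistent with average Apr–Jun spend divided by card limit. That formula is inferred from the table values rather than stated by the source, so the sketch below should be read as a plausibility check, not as the project's feature definition:

```python
# (monthly spends Apr..Jun in INR, card limit in INR), copied from the table.
profiles = {
    "Budget saver": ([1200, 1000, 1000], 25000),
    "Steady spender": ([4000, 4500, 5000], 50000),
    "Active borrower": ([4000, 4500, 5000], 50000),
    "High utilisation": ([18000, 21000, 24000], 25000),
    "Premium customer": ([15000, 17000, 20000], 200000),
}


def utilisation_pct(monthly_spend, card_limit):
    """Average monthly spend as a percentage of the card limit (assumed formula)."""
    average_spend = sum(monthly_spend) / len(monthly_spend)
    return 100.0 * average_spend / card_limit


rounded = {
    name: round(utilisation_pct(spend, limit))
    for name, (spend, limit) in profiles.items()
}
print(rounded)
# {'Budget saver': 4, 'Steady spender': 9, 'Active borrower': 9,
#  'High utilisation': 84, 'Premium customer': 9}
```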

How to read these numbers

The two highest-spending profiles show clear regression toward the centre of the distribution: the high-utilisation profile is forecast at ₹8,200 against an Apr–Jun average of ₹21,000, and the premium customer at ₹14,500 against roughly ₹17,300. Profiles that already sit near the centre move much less, and the steady spender is forecast slightly above their Apr–Jun average. This is not a model error. It reflects the structural R² ≈ 0.16 of the dataset: individual credit card spend is highly volatile, and without richer behavioural signals, the most defensible estimate leans toward the centre of the distribution. The model's value is in ranking customers by expected spend, not in producing precise point forecasts.
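One way to quantify how far each forecast sits from recent behaviour is the ratio of predicted Q3 spend to the Apr–Jun average, computed here from the table values. The ratio is an illustrative diagnostic introduced for this sketch, not a metric reported by the project:

```python
# (profile, monthly spends Apr..Jun in INR, predicted Q3 average in INR),
# copied from the table of representative profiles.
rows = [
    ("Budget saver", [1200, 1000, 1000], 1056),
    ("Steady spender", [4000, 4500, 5000], 4820),
    ("Active borrower", [4000, 4500, 5000], 3940),
    ("High utilisation", [18000, 21000, 24000], 8200),
    ("Premium customer", [15000, 17000, 20000], 14500),
]

ratios = {}
for name, monthly_spend, predicted_q3 in rows:
    recent_avg = sum(monthly_spend) / len(monthly_spend)
    # Ratio < 1: forecast below recent spend; ratio > 1: forecast above it.
    ratios[name] = round(predicted_q3 / recent_avg, 2)

print(ratios)
# {'Budget saver': 0.99, 'Steady spender': 1.07, 'Active borrower': 0.88,
#  'High utilisation': 0.39, 'Premium customer': 0.84}
```

The ratio is lowest for the high-utilisation profile, whose recent spend is furthest above the training mean, and close to 1 for the budget saver.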

Author's Note

The dataset's low predictive ceiling is a structural limitation, not a modelling one. With only three months of transaction history, high proportions of missing loan and investment data, and no behavioural signals beyond spend amounts, every approach tested plateaued in the 116 to 118 range on RMSLE × 100, regardless of model complexity. This project was revisited in 2026 with an updated architecture including tuned gradient boosting, proper out-of-fold stacking, log-space feature engineering, and GPU-accelerated training. The leaderboard score remained unchanged. The data is the constraint.

Context for US readers: Indian spending patterns

The numbers above reflect Indian retail banking data. The median customer in this dataset spends roughly ₹3,000–5,000/month (~$35–60 USD) on their credit card. This is not a data quality issue — it accurately represents the Indian middle-class consumer segment that TVS Motor Company was targeting.

A few things to keep in mind:

  • India's per-capita income is roughly 15–20x lower than that of the US. A customer spending ₹10,000/month on a credit card counts as high-usage in this dataset.
  • Credit card penetration in India is still growing. Many customers in this dataset hold a single card with a modest limit (₹25,000–75,000, or ~$300–900).
  • EMI (Equated Monthly Instalment) financing is widespread — large purchases like two-wheelers are commonly split into monthly payments, which shows up separately from discretionary spend.
  • The target variable maps to purchase behaviour in the July–September monsoon season, which has distinct spending dynamics for Indian consumers.