Fintech · Predictive Analytics · TVS Motor Capstone

Forecasting Credit Card
Spend at the Individual Level

Predicting average credit card consumption for the next 3 months using customer demographics, transaction history, and financial behaviour to enable banks to personalise offers and anticipate customer needs.

32,820 rows

Training customers

116.2

RMSLE × 100 (threshold: 125)

#1 / batch

Leaderboard rank

Models evaluated

The Problem

Why predict individual credit card spend?

Understanding consumption at the individual level lets banks move from reactive to proactive CRM, offering the right product to the right person at the right time.

Personalised Marketing

Knowing who will spend more next quarter lets banks target high-value segments with tailored rewards and credit limit upgrades.

Risk Management

Predicting a sharp drop in spending can signal churn risk or financial stress, enabling early intervention before the customer leaves.

Revenue Forecasting

Aggregated individual-level predictions give finance teams a bottom-up forecast of interchange income and card portfolio performance.

The Dataset

44 features across three months of behaviour

XYZ Bank provided a snapshot of 32,820 customers with April to June transaction history and demographic data. The target is their average credit card spend in July to September.

32,820

Training customers

14,067

Test customers

44

Raw features

Engineered features

₹6,824

Mean target spend

Missing CC spend data

44 Features

CC Spend

Monthly consumption & transaction count (Apr, May, Jun)

Debit Card

Monthly debit consumption & count (Apr, May, Jun)

Bank Account

Credit/debit amounts, max credit, transaction counts

Investments

Demat, FD, insurance holdings (investment_1 to 4)

Loans

Personal & vehicle loans (active + closed), EMI amount

Demographics

Age, gender, region code, account type, card limit

Key Insights

What the data tells us

EDA revealed a consistent downward spending trend and meaningful differences across customer segments.

Monthly CC Spend Trend

Average credit card consumption, Apr to predicted Jul-Sep

Spending declined 59% from April to June. The most recent month is the strongest predictor of future behaviour, especially in log space.

Average Spend by Segment

Target variable broken down by key demographic features

Female cardholders and savings account holders consistently show higher average spend. Both are meaningful predictors after controlling for other variables.

Base Model Performance (RMSLE × 100)

All 14 regressors evaluated individually. Lower is better. Threshold = 125.

Only the Huber Regressor beats the threshold on its own. Stacking the models pushes the score down to 116.2.
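For readers unfamiliar with the metric, here is a minimal sketch of how a score like this could be computed. It assumes the standard RMSLE definition (difference of log1p values); the page does not spell out the formula, and `rmsle_x100` is a hypothetical helper name.

```python
import math

def rmsle_x100(y_true, y_pred):
    """Root mean squared log error, multiplied by 100.

    Assumes the standard definition: errors are measured between
    log(1 + prediction) and log(1 + target), so a miss by a factor of 2
    costs the same at Rs 1,000 as at Rs 100,000.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return 100.0 * math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Perfect predictions score zero.
perfect = rmsle_x100([1000.0, 5000.0], [1000.0, 5000.0])

# A single prediction that is off by exactly 1.0 in log space scores 100.
one_log_unit = rmsle_x100([math.e - 1], [math.e ** 2 - 1])
```

Under this definition, a score of 116.2 corresponds to a root-mean-square error of about 1.16 in log space, a typical multiplicative miss of roughly exp(1.162) ≈ 3.2×.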

Feature Importance (Gain)

Top predictors from the final CatBoost model

Log-transformed average CC spend is 4× more important than any other feature — the log scale captures the proportional nature of financial behaviour.
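A minimal sketch of what log-space spend features might look like. The function and feature names (`log_space_features`, `log_avg_cc`, `log_trend_apr_jun`) are hypothetical and are not taken from the project's code; only the idea of log-transforming average spend and trends comes from the text.

```python
import math

def log_space_features(cc_apr, cc_may, cc_jun):
    """Build illustrative log-space features from three months of CC spend.

    log1p is used so that customers with zero spend stay well defined.
    """
    avg_spend = (cc_apr + cc_may + cc_jun) / 3.0
    return {
        "log_avg_cc": math.log1p(avg_spend),
        "log_cc_jun": math.log1p(cc_jun),
        # Positive when June spend is above April spend, negative when below.
        "log_trend_apr_jun": math.log1p(cc_jun) - math.log1p(cc_apr),
    }

# Doubling spend produces nearly the same log-space change at any level,
# which is the "proportional" behaviour the log scale captures.
small_doubling = log_space_features(1_000, 1_500, 2_000)["log_trend_apr_jun"]
large_doubling = log_space_features(10_000, 15_000, 20_000)["log_trend_apr_jun"]
```

Both `small_doubling` and `large_doubling` come out close to log(2) ≈ 0.693, whereas the raw rupee differences (₹1,000 versus ₹10,000) differ by a factor of ten.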

The Model

Multi-level stacking ensemble

Three gradient-boosted tree models generate out-of-fold predictions, which are combined by a Ridge meta-learner trained on those OOF outputs.
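The out-of-fold stacking idea can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: two toy learners (a mean predictor and a one-feature line) stand in for LightGBM, XGBoost, and CatBoost, the data is synthetic, and the ridge meta-learner is a hand-rolled two-column solve with no intercept. All function names are hypothetical.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k disjoint validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Toy base learner 2: one-feature least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def oof_predictions(fit, xs, ys, folds):
    """Predict every row with a model that never saw that row in training."""
    oof = [0.0] * len(xs)
    for val in folds:
        held_out = set(val)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in val:
            oof[i] = model(xs[i])
    return oof

def ridge_two_columns(p1, p2, ys, alpha=1.0):
    """Solve (P^T P + alpha * I) w = P^T y for two OOF columns, no intercept."""
    a = sum(u * u for u in p1) + alpha
    b = sum(u * v for u, v in zip(p1, p2))
    d = sum(v * v for v in p2) + alpha
    r1 = sum(u * y for u, y in zip(p1, ys))
    r2 = sum(v * y for v, y in zip(p2, ys))
    det = a * d - b * b  # strictly positive because alpha > 0
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det

# Synthetic log-space target that depends linearly on one feature plus noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]

folds = kfold_indices(len(xs), k=5)
oof_mean = oof_predictions(fit_mean, xs, ys, folds)
oof_line = oof_predictions(fit_line, xs, ys, folds)
w_mean, w_line = ridge_two_columns(oof_mean, oof_line, ys)
```

Because the meta-learner only ever sees predictions made on held-out rows, its weights reflect how each base model generalises rather than how well it memorises. Here the line model is the better learner, so it receives the larger weight.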

Feature Engineering

92 features incl. log-space CC trends, utilisation, bank flows

→

5-Fold CV Split

OOF predictions, leak-free region target encoding per fold

→

Base Models

LightGBM · XGBoost · CatBoost

→

Ridge Meta-Learner

Learns optimal blend of base model predictions

→

RMSLE × 100: 116.2

vs threshold of 125, #1 in batch
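The "leak-free region target encoding per fold" step in the pipeline can be sketched as follows. This is a simplified illustration: the region codes, targets, and fold assignment are made up, smoothing is omitted, and `oof_target_encode` is a hypothetical name. The fallback to the training-fold global mean for unseen regions is also an assumption.

```python
from collections import defaultdict

def oof_target_encode(regions, targets, folds):
    """Replace each row's region with the mean target of that region,
    computed only from rows outside the row's own fold."""
    n = len(regions)
    encoded = [0.0] * n
    for val in folds:
        held_out = set(val)
        sums, counts = defaultdict(float), defaultdict(int)
        total, seen = 0.0, 0
        for i in range(n):
            if i in held_out:
                continue  # held-out targets never influence their own encoding
            sums[regions[i]] += targets[i]
            counts[regions[i]] += 1
            total += targets[i]
            seen += 1
        global_mean = total / seen
        for i in val:
            region = regions[i]
            # Regions absent from the training folds fall back to the global mean.
            encoded[i] = sums[region] / counts[region] if counts[region] else global_mean
    return encoded

# Tiny made-up example: five customers, three region codes, three folds.
regions = ["A", "A", "B", "B", "C"]
targets = [1.0, 3.0, 10.0, 20.0, 7.0]
folds = [[0, 2], [1, 3], [4]]
encoded = oof_target_encode(regions, targets, folds)
```

Row 0 has target 1.0 but is encoded as 3.0, the mean of the other region-A row, so its own label never reaches the feature. The single region-C row has no training examples when it is held out and receives the global mean of the remaining rows.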

LightGBM

116.89

Meta-weight: 0.10

XGBoost

116.62

Meta-weight: 0.08

CatBoost Best

116.30

Meta-weight: 0.82

Ridge Meta (final)

116.28

OOF stacked prediction
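To make the meta-weights concrete, the sketch below blends three base-model predictions using the weights listed above. It assumes the blend is a plain weighted sum in log1p space with no intercept, which the page does not confirm; the base predictions are invented for illustration and the helper names are hypothetical.

```python
import math

# Meta-weights as reported for each base model.
META_WEIGHTS = {"lightgbm": 0.10, "xgboost": 0.08, "catboost": 0.82}

def blend_log_predictions(log_preds, weights=META_WEIGHTS, intercept=0.0):
    """Weighted sum of base-model predictions (assumed to be in log1p space)."""
    return intercept + sum(weights[name] * log_preds[name] for name in weights)

def to_rupees(log_pred):
    """Invert the log1p transform to get back to monthly spend in INR."""
    return math.expm1(log_pred)

# Hypothetical base-model outputs for one customer, in log1p(rupees).
example_log_preds = {"lightgbm": 8.30, "xgboost": 8.35, "catboost": 8.40}
blended_log = blend_log_predictions(example_log_preds)
blended_rupees = to_rupees(blended_log)
```

With CatBoost carrying a weight of 0.82, the blended value sits close to the CatBoost prediction, which is consistent with the stacked score (116.28) being only marginally better than CatBoost alone (116.30).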

Model Scenarios

How the model segments customers

Pre-computed outputs from the full 92-feature model across five representative customer profiles.

Why no interactive demo? The model's 92 input features include monthly debit card spend, number of debit transactions, bank account inflows and outflows, investment balances, and loan enquiry history. None of these can be collected from a web form without direct access to a bank's transaction ledger. In a production deployment, these would come via an open-banking API. What's shown below uses the full feature set with real model outputs.

Customer profile	Apr spend	May spend	Jun spend	Card limit	Utilisation	Predicted Q3 avg	Signal
Budget saver (age 28, savings, no loans)	₹1,200	₹1,000	₹1,000	₹25,000	4%	₹1,056	Consistent low spend; model predicts continuation
Steady spender (age 34, current, no loans)	₹4,000	₹4,500	₹5,000	₹50,000	9%	₹4,820	Upward trend; model forecasts continued growth
Active borrower (age 38, current, personal loan + ₹3k EMI)	₹4,000	₹4,500	₹5,000	₹50,000	9%	₹3,940	EMI burden moderately reduces discretionary spend
High utilisation (age 32, female, savings, no loans)	₹18,000	₹21,000	₹24,000	₹25,000	84%	₹8,200	Near-limit usage flagged; high engagement customer
Premium customer (age 42, current, vehicle loan + ₹5k EMI)	₹15,000	₹17,000	₹20,000	₹2,00,000	9%	₹14,500	High limit signals creditworthiness; strong Q3 forecast

All figures in INR. Predictions are average monthly spend for July–August–September. The training median target is ₹3,141/month; the mean is ₹6,825/month (right-skewed distribution). Model trained on 32,820 customers.
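The Utilisation column in the table above is consistent with average Apr–Jun spend divided by card limit. That formula is inferred from the numbers, not stated by the author. The sketch below recomputes it and also compares each forecast with the recent average; `summarise` is a hypothetical helper.

```python
# (name, apr, may, jun, card_limit, predicted_q3_avg), copied from the table.
PROFILES = [
    ("Budget saver", 1_200, 1_000, 1_000, 25_000, 1_056),
    ("Steady spender", 4_000, 4_500, 5_000, 50_000, 4_820),
    ("Active borrower", 4_000, 4_500, 5_000, 50_000, 3_940),
    ("High utilisation", 18_000, 21_000, 24_000, 25_000, 8_200),
    ("Premium customer", 15_000, 17_000, 20_000, 200_000, 14_500),
]

def summarise(profile):
    """Recompute utilisation and compare the forecast with recent spend."""
    name, apr, may, jun, limit, predicted = profile
    recent_avg = (apr + may + jun) / 3
    return {
        "name": name,
        # Assumed formula: average monthly spend as a share of the card limit.
        "utilisation_pct": round(100 * recent_avg / limit),
        # Below 1.0 means the forecast sits under the Apr-Jun average.
        "predicted_vs_recent": predicted / recent_avg,
    }

summaries = [summarise(p) for p in PROFILES]
for s in summaries:
    print(f'{s["name"]:<18} {s["utilisation_pct"]:>3}%  {s["predicted_vs_recent"]:.2f}')
```

The recomputed utilisation values match the table for all five rows, and `predicted_vs_recent` gives a quick per-profile view of how far each forecast sits from recent behaviour.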

How to read these numbers

Four of the five profiles have a predicted Q3 spend below their Apr–Jun average, and the gap is widest where recent spend is highest: the High utilisation profile averaged ₹21,000/month but is forecast at ₹8,200. The Steady spender is the exception, forecast at ₹4,820 against a ₹4,500 average. This pull toward the centre of the distribution is not a model error. It reflects the structural R² ≈ 0.16 of the dataset: individual credit card spend is highly volatile, and without richer behavioural signals, the most defensible estimate leans toward the centre of the distribution. The model's value is in ranking customers by expected spend, not in producing precise point forecasts.

Author's Note

The dataset's low predictive ceiling is a structural limitation, not a modelling one. With only three months of transaction history, high proportions of missing loan and investment data, and no behavioural signals beyond spend amounts, every approach tested plateaued between 116 and 118 on RMSLE × 100, regardless of model complexity. This project was revisited in 2026 with an updated architecture including tuned gradient boosting, proper out-of-fold stacking, log-space feature engineering, and GPU-accelerated training. The leaderboard score remained unchanged. The data is the constraint.

Context for US readers: Indian spending patterns

The numbers above reflect Indian retail banking data. The median customer in this dataset spends roughly ₹3,000–5,000/month (~$35–60 USD) on their credit card. This is not a data quality issue — it accurately represents the Indian middle-class consumer segment that TVS Motor Company was targeting.

A few things to keep in mind:

India's per-capita income is roughly 15–20x lower than the US. A ₹10,000/month credit card spend is a high-usage customer here.
Credit card penetration in India is still growing. Many customers in this dataset hold a single card with a modest limit (₹25,000–75,000, or ~$300–900).
EMI (Equated Monthly Instalment) financing is widespread — large purchases like two-wheelers are commonly split into monthly payments, which shows up separately from discretionary spend.
The target variable maps to purchase behaviour in the July–September monsoon season, which has distinct spending dynamics for Indian consumers.
