Predicting average credit card consumption for the next 3 months using customer demographics, transaction history, and financial behaviour to enable banks to personalise offers and anticipate customer needs.
Understanding consumption at the individual level lets banks move from reactive to proactive CRM, offering the right product to the right person at the right time.
Knowing who will spend more next quarter lets banks target high-value segments with tailored rewards and credit limit upgrades.
Predicting a sharp drop in spending can signal churn risk or financial stress, enabling early intervention before the customer leaves.
Aggregated individual-level predictions give finance teams a bottom-up forecast of interchange income and card portfolio performance.
XYZ Bank provided a snapshot of 32,820 customers with April to June transaction history and demographic data. The target is their average credit card spend in July to September.
EDA revealed a consistent downward spending trend and meaningful differences across customer segments.
Average credit card consumption from April through the predicted July–September period
Average spending declined 59% from April to June. The most recent month is the strongest predictor of future behaviour, especially in log space.
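The post does not list its 92 features, but the idea of log-space recency and trend features can be sketched as follows. The function and feature names below are hypothetical and chosen only for illustration; `log1p` is assumed so that zero-spend months stay finite.

```python
import math


def log_spend_features(apr: float, may: float, jun: float) -> dict:
    """Hypothetical log-space features from three months of card spend.

    Names are illustrative only; they are not the project's actual features.
    """
    # log1p keeps zero-spend months finite (log1p(0) == 0.0)
    l_apr, l_may, l_jun = (math.log1p(x) for x in (apr, may, jun))
    return {
        "log_jun": l_jun,                                # most recent month, log scale
        "log_avg": math.log1p((apr + may + jun) / 3),    # log of the 3-month mean
        "log_trend": l_jun - l_apr,                      # > 0 means spend grew Apr -> Jun
        "log_accel": (l_jun - l_may) - (l_may - l_apr),  # change in month-on-month growth
    }


feats = log_spend_features(4000, 4500, 5000)
```

Because the trend is a difference of logs, it measures relative growth: a customer going from ₹1,000 to ₹1,250 gets the same `log_trend` (up to the small `+1` offset) as one going from ₹40,000 to ₹50,000.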
Target variable broken down by key demographic features
Female cardholders and savings account holders consistently show higher average spend. Both are meaningful predictors after controlling for other variables.
All 14 regressors were evaluated individually on the leaderboard score (RMSLE-based; lower is better) against a threshold of 125.
Only the Huber Regressor beats the threshold on its own. Stacking the models lowers the score to 116.2.
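For reference, a minimal RMSLE implementation is sketched below. The leaderboard values quoted here (125, 116.2) look like a scaled version of the metric, and the post does not state the scaling, so this sketch computes plain, unscaled RMSLE as an assumption.

```python
import math
from typing import Sequence


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error (unscaled); lower is better."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Squared error between log(1 + prediction) and log(1 + actual)
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

Because errors are measured on the log scale, `rmsle([999], [99])` and `rmsle([9999], [999])` are both exactly log 10 ≈ 2.303: a tenfold miss costs the same whether the customer spends ₹1,000 or ₹10,000.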
Top predictors from the final CatBoost model
Log-transformed average credit card spend ranks about 4× higher in importance than any other feature, because the log scale captures the proportional nature of financial behaviour.
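The proportional argument can be made concrete. Assuming the transform is log(1 + x) (the post does not specify the exact variant), multiplying spend by a factor k shifts the feature by roughly the same amount at any spend level:

$$
\log(1 + kx) - \log(1 + x) = \log\frac{1 + kx}{1 + x} \approx \log k \quad \text{for } x \gg 1
$$

So doubling spend from ₹1,000 to ₹2,000 and from ₹10,000 to ₹20,000 both move the feature by about log 2 ≈ 0.69, whereas on the raw scale the second step is ten times larger.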
Three gradient-boosted tree models generate out-of-fold predictions, which are combined by a Ridge meta-learner trained on those OOF outputs.
Pre-computed outputs from the full 92-feature model across five representative customer profiles.
| Customer profile | Apr spend | May spend | Jun spend | Card limit | Utilisation | Predicted Q3 avg | Signal |
|---|---|---|---|---|---|---|---|
| Budget saver (age 28, savings account, no loans) | ₹1,200 | ₹1,000 | ₹1,000 | ₹25,000 | 4% | ₹1,056 | Consistent low spend; model predicts continuation |
| Steady spender (age 34, current account, no loans) | ₹4,000 | ₹4,500 | ₹5,000 | ₹50,000 | 9% | ₹4,820 | Upward trend; model forecasts continued growth |
| Active borrower (age 38, current account, personal loan + ₹3k EMI) | ₹4,000 | ₹4,500 | ₹5,000 | ₹50,000 | 9% | ₹3,940 | EMI burden moderately reduces discretionary spend |
| High utilisation (age 32, female, savings account, no loans) | ₹18,000 | ₹21,000 | ₹24,000 | ₹25,000 | 84% | ₹8,200 | Near-limit usage flagged; high-engagement customer |
| Premium customer (age 42, current account, vehicle loan + ₹5k EMI) | ₹15,000 | ₹17,000 | ₹20,000 | ₹2,00,000 | 9% | ₹14,500 | High limit signals creditworthiness; strong Q3 forecast |
All figures in INR. Predictions are average monthly spend for July–August–September. The training median target is ₹3,141/month; the mean is ₹6,825/month (right-skewed distribution). Model trained on 32,820 customers.
The high-spend profiles show strong regression toward the centre of the distribution: the high-utilisation customer's predicted Q3 spend (₹8,200) is under 40% of their Apr–Jun average of ₹21,000, and the premium customer's falls from about ₹17,333 to ₹14,500. Low and mid-range spenders stay within about 12% of their recent levels, and the steady spender's forecast even rises slightly. This shrinkage is not a model error. It reflects the structural R² ≈ 0.16 of the dataset: individual credit card spend is highly volatile, and without richer behavioural signals, the most probabilistically defensible estimate leans toward the centre of the distribution. The model's value lies in ranking customers by expected spend, not in producing precise point forecasts.
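The amount of shrinkage per profile can be checked directly from the table. The sketch below copies the five profiles' figures and computes each one's Apr–Jun average and the ratio of predicted Q3 spend to that average. It also assumes that the Utilisation column is the Apr–Jun average divided by the card limit; the post does not define it, but this assumption reproduces all five values.

```python
# Figures copied from the profile table: (Apr, May, Jun, card limit, predicted Q3 avg)
profiles = {
    "Budget saver": (1_200, 1_000, 1_000, 25_000, 1_056),
    "Steady spender": (4_000, 4_500, 5_000, 50_000, 4_820),
    "Active borrower": (4_000, 4_500, 5_000, 50_000, 3_940),
    "High utilisation": (18_000, 21_000, 24_000, 25_000, 8_200),
    "Premium customer": (15_000, 17_000, 20_000, 200_000, 14_500),
}


def summarise(apr, may, jun, limit, pred):
    avg = (apr + may + jun) / 3
    return {
        "avg": avg,
        "utilisation": avg / limit,  # assumed definition of the Utilisation column
        "pred_to_avg": pred / avg,   # < 1 means the forecast is below recent spend
    }


summary = {name: summarise(*vals) for name, vals in profiles.items()}
for name, s in summary.items():
    print(f"{name:<17} avg ₹{s['avg']:>9,.0f}  util {s['utilisation']:.0%}  "
          f"pred/avg {s['pred_to_avg']:.2f}")
```

The ratios come out at roughly 0.99, 1.07, 0.88, 0.39 and 0.84, so the shrinkage is concentrated in the highest spenders.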
The dataset's low predictive ceiling is a structural limitation, not a modelling one. With only three months of transaction history, high proportions of missing loan and investment data, and no behavioural signals beyond spend amounts, the best approaches tested all plateaued at a leaderboard score of 116 to 118, regardless of model complexity. This project was revisited in 2026 with an updated architecture including tuned gradient boosting, proper out-of-fold stacking, log-space feature engineering, and GPU-accelerated training. The leaderboard score remained unchanged. The data is the constraint.
The numbers above reflect Indian retail banking data. The median customer in this dataset spends roughly ₹3,000–5,000/month (~$35–60 USD) on their credit card. This is not a data quality issue — it accurately represents the Indian middle-class consumer segment that TVS Motor Company was targeting.
A few things to keep in mind: