We will segment customers based on their purchasing behavior patterns using a modified LRFM metric to create more targeted and personalized promotions.

SatriaImawan12/Customer-Segmentation-and-Profiling-by-Modified-LRFM-Analysis-using-CLARA-Algorithm-

Customer Segmentation and Profiling by Modified LRFM Analysis using CLARA Algorithm to Optimize Marketing Strategy

Introduction

This project aims to enhance the marketing strategy of a company by understanding customer behavior more deeply through customer segmentation and profiling. We utilize a modified LRFM (Length, Recency, Frequency, Monetary) analysis and the CLARA (Clustering Large Applications) algorithm to achieve this.

Problem Analysis

Retention Issues

[Figures: cohort_diskon.png (retention cohort, discount users); cohort_non_diskon.png (retention cohort, non-discount users)]

  • Users whose first transaction used a discount show similar or worse retention than users whose first transaction did not.
  • The existing discount promotions are therefore not optimal for attracting new customers and retaining them in the long term.
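
A comparison like the one above can be approximated with a short retention calculation. The sketch below is a simplification and not the project's cohort analysis: it reports a single "returned in a later calendar month" rate per group rather than a month-by-month cohort matrix. The tuple layout `(user_id, transaction_date, discount_amount)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def retention_by_first_discount(transactions):
    """Share of users who buy again in a later calendar month,
    split by whether their first transaction used a discount.

    `transactions` is an iterable of (user_id, transaction_date, discount_amount)
    tuples; this layout is hypothetical.
    """
    by_user = defaultdict(list)
    for user_id, tx_date, discount in transactions:
        by_user[user_id].append((tx_date, discount))

    counts = {"discount": [0, 0], "non_discount": [0, 0]}  # [retained, total]
    for txs in by_user.values():
        txs.sort(key=lambda t: t[0])  # order each user's transactions by date
        first_date, first_discount = txs[0]
        group = "discount" if first_discount > 0 else "non_discount"
        first_month = (first_date.year, first_date.month)
        # Retained = at least one transaction in a later calendar month.
        retained = any((d.year, d.month) > first_month for d, _ in txs[1:])
        counts[group][0] += int(retained)
        counts[group][1] += 1
    return {g: (r / n if n else 0.0) for g, (r, n) in counts.items()}


example = [
    ("u1", date(2024, 1, 5), 10.0), ("u1", date(2024, 2, 9), 0.0),
    ("u2", date(2024, 1, 7), 5.0),
    ("u3", date(2024, 1, 8), 0.0), ("u3", date(2024, 3, 1), 0.0),
]
rates = retention_by_first_discount(example)
```

In this toy data one of the two discount-first users returns, while the single non-discount user does, which mirrors the kind of gap the cohort charts are meant to reveal.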

Goals

  • Understand customer purchasing behavior to optimize promotional strategies for better targeting.
  • Increase customer retention and maintain existing customer loyalty.

Solution

  • Build a more personalized promotional strategy by segmenting customers on their purchasing behavior patterns using the modified LRFM metrics.

Data Source

Workflow

  1. Business & Data Understanding
  2. Data Preprocessing
  3. Feature Engineering
  4. Modelling (CLARA)
  5. Model Evaluation
  6. Profiling
  7. Business Recommendation & Solution

Data Preprocessing

  • Data Cleaning: Handling missing values, invalid data, and outliers.
  • Data Merging: Standardizing column names and formatting data.
  • Feature Engineering: Creating and modifying features for clustering.

Data Cleaning

  • Removed transactions with zero gross amount (0.57%).
  • Replaced missing discount values with 0 (69.83%).
  • Deleted rows with invalid data (4,991 rows).
  • Kept outliers to preserve valuable transaction information.
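
The cleaning rules above can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The field names (`user_id`, `gross_amount`, `discount`) and the definition of an "invalid" row (missing user ID, missing amount, or negative amount) are assumptions, since the README does not describe the schema or the validity checks actually used.

```python
def clean_transactions(rows):
    """Apply the cleaning rules to a list of transaction dicts.

    Field names and the validity rule are hypothetical.
    Outliers are intentionally kept: no amount-based trimming is applied.
    """
    cleaned = []
    for row in rows:
        amount = row.get("gross_amount")
        # Assumed validity rule: a row needs a user ID and a non-negative amount.
        if row.get("user_id") is None or amount is None or amount < 0:
            continue
        # Remove transactions with zero gross amount.
        if amount == 0:
            continue
        # Treat a missing discount as "no discount" (0).
        discount = row.get("discount")
        cleaned.append({**row, "discount": 0 if discount is None else discount})
    return cleaned


sample = [
    {"user_id": "u1", "gross_amount": 120.0, "discount": None},
    {"user_id": "u2", "gross_amount": 0.0, "discount": 10.0},
    {"user_id": None, "gross_amount": 50.0, "discount": 5.0},
    {"user_id": "u3", "gross_amount": 80.0, "discount": 8.0},
]
result = clean_transactions(sample)
```

Of the four sample rows, the zero-amount row and the row without a user ID are dropped, and the missing discount on the first row becomes 0.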

Feature Engineering

  • Length: Difference between the date of the last transaction and the first transaction for each user.
  • Recency Score (1/R): Inverse of the time elapsed between the last transaction date and the reference date (2025-01-01), so more recent customers receive a higher score.
  • Monetary per Frequency (M/F): Average amount spent per transaction.
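
As a rough illustration of the three features, the sketch below derives them per user from raw transactions. The input layout `(user_id, transaction_date, amount)`, the use of days as the time unit, and the one-day floor on recency (to avoid dividing by zero) are assumptions, not details taken from the project.

```python
from collections import defaultdict
from datetime import date

REFERENCE_DATE = date(2025, 1, 1)  # reference date stated in the README


def lrfm_features(transactions, reference_date=REFERENCE_DATE):
    """Return {user_id: (length_days, recency_score, monetary_per_frequency)}.

    `transactions` is an iterable of (user_id, transaction_date, amount) tuples;
    this layout and the day-based time unit are hypothetical.
    """
    by_user = defaultdict(list)
    for user_id, tx_date, amount in transactions:
        by_user[user_id].append((tx_date, amount))

    features = {}
    for user_id, txs in by_user.items():
        dates = [d for d, _ in txs]
        first, last = min(dates), max(dates)
        length = (last - first).days                    # Length
        recency_days = (reference_date - last).days     # R
        recency_score = 1.0 / max(recency_days, 1)      # 1/R, floored at one day
        monetary_per_frequency = sum(a for _, a in txs) / len(txs)  # M/F
        features[user_id] = (length, recency_score, monetary_per_frequency)
    return features


transactions = [
    ("u1", date(2024, 1, 1), 100.0),
    ("u1", date(2024, 12, 22), 300.0),
    ("u2", date(2024, 6, 1), 50.0),
]
features = lrfm_features(transactions)
```

Here `u1` has a long relationship (356 days), a recent last purchase (10 days before the reference date, so 1/R = 0.1), and an average spend of 200 per transaction, while `u2` has a single purchase and therefore a Length of 0.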

Importance of LRFM Metrics

  • Target promotional strategies accurately.
  • Identify customers at risk of churn and the most valuable customers.

Modelling

Algorithm Comparison Table

| Algorithm | Robust to Outliers | Handles Large Datasets | Time Complexity | Low-Dimensional Performance |
| --- | --- | --- | --- | --- |
| CLARA | Yes | Yes | Medium | Good |
| K-Means | No | Yes | Low | Excellent |
| DBSCAN | Yes | Moderate | High | Good |
| K-Medoids | Yes | No | High | Good |

Advantages of CLARA

  • Handles large datasets efficiently.
  • Robust to outliers because cluster centers are medoids (actual data points) rather than means.

Disadvantages of CLARA

  • Results can be sensitive to the choice of parameters, such as the number of clusters and the sample size.
  • May not be as accurate on small datasets due to its reliance on sampling.

Algorithm: CLARA

  • A modification of the PAM (Partitioning Around Medoids) algorithm.
  • Uses medoids as cluster centers and sampling methods for efficiency with large datasets.
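
The sampling idea can be illustrated with a small, dependency-free sketch. It is not the implementation used in this project: the inner step is a simple alternating k-medoids update rather than the full PAM swap search, and all function and parameter names here (`clara`, `k_medoids_on_sample`, `n_sampling_iter`, `seed`) are chosen for illustration. The key point is that medoids are fitted on a small random sample but scored against the full dataset, and the best-scoring set is kept.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def k_medoids_on_sample(sample, k, rng, max_iter=100):
    """Alternating k-medoids on a small sample (a simplification of PAM)."""
    medoids = rng.sample(sample, k)
    for _ in range(max_iter):
        # Assignment step: attach each sampled point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in sample:
            idx = min(range(k), key=lambda i: math.dist(p, medoids[i]))
            clusters[idx].append(p)
        # Update step: the new medoid is the member with the lowest
        # total distance to the other members of its cluster.
        new_medoids = []
        for i, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[i])
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(math.dist(c, q) for q in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def clara(points, k, n_sampling, n_sampling_iter=5, seed=42):
    """Draw several samples, fit medoids on each, keep the set that is
    cheapest when evaluated on the FULL dataset."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_sampling_iter):
        sample = rng.sample(points, min(n_sampling, len(points)))
        medoids = k_medoids_on_sample(sample, k, rng)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = [
        min(range(k), key=lambda i: math.dist(p, best_medoids[i])) for p in points
    ]
    return best_medoids, labels, best_cost


# Synthetic two-blob data, purely for demonstration.
data_rng = random.Random(0)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(200)]
medoids, labels, cost = clara(data, k=2, n_sampling=40)
```

Because each fit only touches `n_sampling` points, the expensive pairwise-distance work stays small even when the full dataset is large, which is the efficiency gain over running PAM on everything.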

Default Parameters

  • n_clusters = 8
  • init = 'build'
  • n_sampling = None

Best Parameter Combination

  • n_clusters = 4
  • n_sampling = 300
  • init = k-medoids++
  • random_state = 42
  • Silhouette Score: 0.652
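
For readers unfamiliar with the metric, the silhouette score compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b), and averages the result. The sketch below is a plain-Python version for illustration only, assuming Euclidean distance and at least two clusters. It is not the code that produced the 0.652 figure.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance).

    Requires at least two distinct labels.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is excluded by the divisor).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that cut across the gap
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlap, and negative values suggest points assigned to the wrong cluster.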

[Figure: best_model.png]

Model Evaluation

Cluster Profiles

  1. At Risk (413 customers): Long-lasting, medium spenders, last transaction a long time ago.
  2. Potential Loyalist (452 customers): Long-lasting, small spenders, most recent transactions.
  3. Lost (111 customers): Short-term, very small spenders, last transaction a long time ago.
  4. Loyal (453 customers): Long-lasting, medium spenders, most recent transactions.

Profiling

| Cluster Name | Number of Customers | Description |
| --- | --- | --- |
| At Risk | 413 | Long-lasting, medium spenders, last transaction a long time ago |
| Potential Loyalist | 452 | Long-lasting, small spenders, most recent transactions |
| Lost | 111 | Short-term, very small spenders, last transaction a long time ago |
| Loyal | 453 | Long-lasting, medium spenders, most recent transactions |

  • At Risk: Re-engage with personalized offers.
  • Potential Loyalist: Encourage repeat purchases with targeted campaigns.
  • Lost: Reactivate with exclusive discounts and limited-time offers.
  • Loyal: Reward loyalty with community-building initiatives and early access programs.

Business Recommendations

At Risk

  • Special treatment to make customers feel valued, such as early access to new products or exclusive member offers.

Potential Loyalist

  • Personalized offers, nostalgic campaigns, and customer education.

Lost

  • Exclusive large discounts and time-limited offers to entice customers to return.

Loyal

  • Community-building initiatives, customer appreciation programs, and early access to new collections.

Dashboard

References

Important Note

All datasets and project results are used solely for educational purposes and do not reflect actual values. Please do not use this project as a reference or recommendation.


Thank you for checking out our project! We hope this work contributes to the optimization of marketing strategies through effective customer segmentation and profiling.
