Scalable Multi-Class Classification with LightGBM and CatBoost

Category: Data Science

Client: -

Duration: 2 Months

Project Overview

This project solves a multi-class classification problem on a dataset of 1.2 million samples. The data was preprocessed and balanced, then used to train and evaluate two gradient-boosted models: LightGBM and CatBoost. The tuned models were exported to ONNX and deployed with ONNX Runtime for efficient inference.

Table of Contents

  1. Key Features

  2. Dataset Information

  3. Technologies Used

  4. Pipeline and Workflow

  5. Model Evaluation

  6. ONNX Conversion

  7. Results

  8. License

Key Features

  • Preprocessing of 1.2 million rows of raw data (scaling, cleaning, and feature selection).

  • Addressed class imbalance using Partial SMOTE and class-weight adjustments (see the sketch after this list).

  • Trained LightGBM and CatBoost classifiers with hyperparameter tuning.

  • Exported trained models in ONNX format for fast inference in production.

  • Provided detailed evaluation with metrics such as accuracy, precision, recall, and F1-score.

  • Designed clear visualizations of data distribution, feature importance, and model performance.
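
The class-balancing step combines partial oversampling with class weights. Below is a minimal sketch of the idea; the synthetic data, the 70% target ratio, and the variable names are illustrative assumptions, not the project's actual configuration:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_class_weight

# Stand-in for the real data: 3 imbalanced classes, 15 features.
X, y = make_classification(
    n_samples=10_000, n_features=15, n_informative=8, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=42,
)

# "Partial" SMOTE: raise minority classes only part of the way toward the
# majority count (70% here, an illustrative ratio) instead of fully equalizing.
counts = np.bincount(y)
target = int(0.7 * counts.max())
partial = {c: target for c in range(3) if counts[c] < target}
X_res, y_res = SMOTE(sampling_strategy=partial, random_state=42).fit_resample(X, y)

# Class-weight adjustment covers the imbalance that partial SMOTE leaves in place.
classes = np.unique(y_res)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_res)
class_weights = dict(zip(classes, weights))
```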

Dataset Information

  • Number of Rows: 1,200,000

  • Number of Features: 15 (plus a target class column)

  • Target Classes: 3 distinct categories.

  • Problem: Multi-class classification with imbalanced classes.

Technologies Used

  • Python: Primary programming language.

  • Libraries:

    • Data Processing: pandas, numpy, scikit-learn, imblearn

    • Model Training: LightGBM, CatBoost

    • Deployment: ONNX, onnxmltools, onnxruntime

    • Visualization: Matplotlib, Seaborn

  • Tools:

    • Google Colab (with GPU for faster training)

    • ONNX Runtime for deployment.

Pipeline and Workflow

  1. Data Preprocessing:

    Removed null values, scaled features, and mapped target labels (a consolidated code sketch of these steps follows this list).

  2. Train-Test Split:

    80% of the data for training and 20% for testing.

  3. Class Balancing:

    Applied Partial SMOTE to oversample underrepresented classes.

  4. Feature Engineering:

    Analyzed feature importance using trained models.

  5. Model Training:

    Trained LightGBM and CatBoost with hyperparameter tuning.

  6. Model Evaluation:

    Evaluated models with metrics such as accuracy, precision, recall, and confusion matrices.

  7. ONNX Conversion:

    Converted both models to ONNX format for lightweight, high-speed inference.
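
A consolidated sketch of steps 1, 2, and 5, as referenced above. The file name, label mapping, and hyperparameter values are illustrative placeholders, not the project's exact settings; the partial-SMOTE step from the Key Features sketch would slot in between the split and training:

```python
import lightgbm as lgb
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Preprocessing: drop nulls, map target labels to integers, scale features.
df = pd.read_csv("data.csv").dropna()             # hypothetical file name
y = df["target"].map({"A": 0, "B": 1, "C": 2})    # hypothetical label mapping
X = StandardScaler().fit_transform(df.drop(columns=["target"]))

# 2. 80/20 split, stratified so all three classes appear in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5. Train both boosters; class weights mitigate residual imbalance.
lgbm = lgb.LGBMClassifier(
    objective="multiclass", class_weight="balanced",
    n_estimators=500, learning_rate=0.05, random_state=42,
)
lgbm.fit(X_train, y_train)

cat = CatBoostClassifier(
    loss_function="MultiClass", auto_class_weights="Balanced",
    iterations=500, learning_rate=0.05, random_seed=42, verbose=0,
)
cat.fit(X_train, y_train)
```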

Model Evaluation

LightGBM:

  • Accuracy: ~72.2%

  • Precision (weighted): 74%

  • Recall (weighted): 72%

  • F1-Score (weighted): 71%

CatBoost:

  • Accuracy: ~72.1%

  • Precision (weighted): 73%

  • Recall (weighted): 72%

  • F1-Score (weighted): 71%
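
The weighted metrics above come straight out of scikit-learn's reporting utilities. A minimal sketch, assuming y_test, X_test, and the fitted lgbm model from the pipeline sketch:

```python
from sklearn.metrics import classification_report, confusion_matrix

preds = lgbm.predict(X_test)
print(classification_report(y_test, preds, digits=3))  # per-class and weighted precision/recall/F1
print(confusion_matrix(y_test, preds))                 # rows: true class, columns: predicted class
```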

ONNX Conversion

Both the LightGBM and CatBoost models were successfully converted to ONNX format for deployment:

  • LightGBM ONNX Model: lightgbm_model_onnx.onnx

  • CatBoost ONNX Model: catboost_model_onnx.onnx

ONNX Runtime was used to verify predictions, ensuring consistent performance between the original models and their ONNX equivalents.
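
A sketch of the conversion and verification flow, assuming the fitted lgbm and cat models from the pipeline sketch. LightGBM is converted through onnxmltools, while CatBoost uses its built-in ONNX exporter; the input shape (15 float features) matches the dataset description:

```python
import numpy as np
import onnxruntime as ort
from onnxmltools import convert_lightgbm
from onnxmltools.convert.common.data_types import FloatTensorType
from onnxmltools.utils import save_model

# LightGBM -> ONNX via onnxmltools (batches of 15 float features).
initial_types = [("input", FloatTensorType([None, 15]))]
save_model(convert_lightgbm(lgbm, initial_types=initial_types), "lightgbm_model_onnx.onnx")

# CatBoost ships its own ONNX exporter.
cat.save_model("catboost_model_onnx.onnx", format="onnx")

# Verify: ONNX Runtime labels should match the original model's predictions.
sess = ort.InferenceSession("lightgbm_model_onnx.onnx", providers=["CPUExecutionProvider"])
sample = X_test[:5].astype(np.float32)
onnx_labels = sess.run(None, {sess.get_inputs()[0].name: sample})[0]  # output 0 = labels
assert (onnx_labels.ravel() == lgbm.predict(sample)).all()
```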

Results

  • Successfully built a pipeline for training multi-class classification models.

  • Achieved over 72% accuracy with both the LightGBM and CatBoost models.

  • Demonstrated the use of ONNX for efficient deployment and edge inference.

License

This project is licensed under the MIT License.
