This landing page presents the course project for P4AI-DS (CO3135). The project spans three data modalities (tabular, text, and image) and takes each dataset through Exploratory Data Analysis (EDA) and Machine Learning modeling.
| Name | Student ID |
|---|---|
| Le Nguyen Khang | 2352470 |
| Nguyen Anh Thy | 2353169 |
| Lam My Phuong | 2352950 |
The objective of this project is to build an end-to-end Data Science pipeline across tabular, text, and image modalities. The project is divided into two assignments: Assignment 1, Exploratory Data Analysis (EDA), to understand and visualize the patterns in the data; and Assignment 2, Machine Learning, to train, evaluate, and benchmark predictive models.
For each modality, the EDA phase covers schema inspection, missing-value handling, and visual insight generation. The Machine Learning phase builds on these findings to choose suitable preprocessing techniques, benchmark multiple models, and inspect feature importance, checking whether the features the models rely on agree with the insights drawn from the EDA.
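As a rough illustration of the schema-inspection and missing-value steps, the sketch below reports each column's observed value types and missing rate, then fills one numeric column with its median. It is a minimal standard-library sketch, not the project's actual code: the column names (`job_title`, `salary`, `experience`), the helpers `inspect_schema` and `impute_median`, and the choice of median imputation are all hypothetical.

```python
import statistics
from collections import Counter
from typing import Any


def is_missing(value: Any) -> bool:
    # Treat absent keys (returned as None by dict.get), None and empty strings as missing
    return value is None or value == ""


def inspect_schema(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Report the observed value types and the missing-value rate of every column."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if not is_missing(v)]
        report[col] = {
            "types": dict(Counter(type(v).__name__ for v in present)),
            "missing_rate": (len(values) - len(present)) / len(values),
        }
    return report


def impute_median(rows: list[dict[str, Any]], col: str) -> list[dict[str, Any]]:
    """Return new rows where missing values in `col` are replaced by the column median."""
    present = [row[col] for row in rows if not is_missing(row.get(col))]
    fill = statistics.median(present)
    return [
        {**row, col: fill} if is_missing(row.get(col)) else dict(row)
        for row in rows
    ]


# Hypothetical compensation records; "experience" is absent from the last row
rows = [
    {"job_title": "Data Scientist", "salary": 120000, "experience": 3},
    {"job_title": "ML Engineer", "salary": None, "experience": 5},
    {"job_title": "Data Analyst", "salary": 80000},
]

report = inspect_schema(rows)
print(report["salary"]["types"])  # {'int': 2}
filled = impute_median(rows, "salary")
print(filled[1]["salary"])  # 100000.0
```

Median imputation appears here only because it is simple and less sensitive to skewed values such as salaries; the appropriate strategy for a real column would depend on its distribution and on how much of it is missing.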
Exploratory data analysis of the factors that drive compensation in the tech labor market.
Exploratory data analysis for the News Category dataset, focused on class balance, text lengths, missing values, yearly distribution, and keyword patterns.
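The text-level checks listed above can be sketched in a few lines of standard-library Python. This assumes each article is a dict with `category`, `headline`, `short_description`, and `date` (ISO `YYYY-MM-DD`) fields, which we assume matches the public JSON-lines release; the helper names, the small stopword list, and the sample records are hypothetical.

```python
from collections import Counter
from statistics import mean

# Tiny illustrative stopword list; a real analysis would use a fuller one
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def text_eda(records: list[dict]) -> dict:
    """Summarize class balance, headline lengths, missing descriptions and yearly counts."""
    # Headline length measured in whitespace-separated tokens
    lengths = [len((r.get("headline") or "").split()) for r in records]
    return {
        "class_counts": Counter(r["category"] for r in records),
        "mean_headline_tokens": mean(lengths) if lengths else 0.0,
        "max_headline_tokens": max(lengths, default=0),
        # An empty or whitespace-only description counts as missing
        "missing_descriptions": sum(
            1 for r in records if not (r.get("short_description") or "").strip()
        ),
        # The first four characters of an ISO date are the year
        "per_year": Counter(r["date"][:4] for r in records if r.get("date")),
    }


def top_keywords(records: list[dict], category: str, k: int = 5) -> list[str]:
    """Most frequent non-stopword headline tokens within one category."""
    counts: Counter = Counter()
    for r in records:
        if r["category"] != category:
            continue
        for tok in (r.get("headline") or "").lower().split():
            tok = tok.strip(".,!?:;\"'")
            if tok and tok not in STOPWORDS:
                counts[tok] += 1
    return [word for word, _ in counts.most_common(k)]


records = [
    {"category": "POLITICS", "headline": "Senate passes budget bill",
     "short_description": "The vote was close.", "date": "2018-05-26"},
    {"category": "POLITICS", "headline": "Budget talks stall in Senate",
     "short_description": "", "date": "2017-03-01"},
    {"category": "WELLNESS", "headline": "Ten tips for better sleep",
     "short_description": "Rest matters.", "date": "2018-01-15"},
]

summary = text_eda(records)
print(summary["class_counts"])  # Counter({'POLITICS': 2, 'WELLNESS': 1})
print(top_keywords(records, "POLITICS", k=2))  # ['senate', 'budget']
```

Counting tokens by splitting on whitespace keeps the sketch dependency-free; a tokenizer from an NLP library would handle punctuation and contractions more carefully.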
Exploratory data analysis covering both tabular demographics and image-level analysis: adoption speed distributions, feature correlations, image quality metrics, a breed gallery, and dimensionality reduction.
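Image quality metrics can be defined in several ways; the sketch below assumes one common set: brightness as mean intensity, contrast as the standard deviation of intensities, and sharpness as the variance of a 4-neighbor Laplacian (a widely used blur heuristic). It works on a hypothetical 2D grid of grayscale values in pure Python; the function name and the metric choices are assumptions, and a real pipeline would more likely compute them with an image library on decoded photos.

```python
from statistics import mean, pstdev, pvariance


def quality_metrics(gray: list[list[int]]) -> dict[str, float]:
    """Brightness, contrast and a blur proxy for a grayscale image (at least 3x3)."""
    h, w = len(gray), len(gray[0])
    flat = [p for row in gray for p in row]
    # 4-neighbor Laplacian on interior pixels: large magnitudes mark sharp edges
    laplacian = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return {
        "brightness": mean(flat),           # average intensity
        "contrast": pstdev(flat),           # spread of intensities
        "sharpness": pvariance(laplacian),  # low variance suggests a blurry image
    }


flat_gray = [[100] * 3 for _ in range(3)]
checkerboard = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]

print(quality_metrics(flat_gray))  # uniform image: contrast and sharpness are zero
print(quality_metrics(checkerboard)["sharpness"])  # large: every pixel sits on an edge
```

Laplacian variance is sensitive to image size and noise, so scores are usually compared within one dataset after resizing images to a common resolution rather than interpreted as absolute thresholds.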
The following public datasets were used in this project: