Exploratory Data Analysis Report

Complete analysis covering tabular demographics and image-level insights across 14,993 pet listings and 72,766 images.

EDA Results

Dataset Overview

Tabular Data

Image Data

Sample Rows

Data Types & Missing Values

Adoption Speed Distribution

Distribution of adoption speed categories across all 14,993 listings. Speed 0 = adopted same day, Speed 4 = not adopted after 100+ days.

Proportion of dogs vs cats in the dataset.
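
A minimal sketch of how these proportions could be computed with pandas. It assumes the listings sit in a DataFrame with `AdoptionSpeed` and `Type` columns and that `Type` is coded 1 = dog, 2 = cat; the column names, the coding, and the toy rows are illustrative assumptions, not values from this report.

```python
import pandas as pd


def class_proportions(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of rows in each category of `column`, ordered by category."""
    return df[column].value_counts(normalize=True).sort_index()


# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Type": [1, 1, 2, 2, 1, 2, 1, 2],           # assumed coding: 1 = dog, 2 = cat
    "AdoptionSpeed": [0, 2, 2, 4, 4, 1, 3, 4],  # 0 = same day ... 4 = not adopted
})
speed_share = class_proportions(toy, "AdoptionSpeed")
type_share = class_proportions(toy, "Type")
```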

Key Insights

Correlation Analysis

Correlation heatmap of numeric features. Upper triangle masked. Hover for exact values.

Top Correlated Feature Pairs
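
One way the masked heatmap values and the ranked pairs could be derived, sketched with pandas and NumPy. Pearson correlation is an assumption (the report does not name the coefficient), and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def lower_triangle_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of numeric columns with the upper triangle and diagonal set to NaN."""
    corr = df.select_dtypes("number").corr()
    keep = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    return corr.where(keep)


def top_pairs(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n feature pairs with the largest absolute correlation."""
    corr = df.select_dtypes("number").corr()
    rows, cols = np.tril_indices(len(corr), k=-1)  # each unordered pair once
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    )
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)
```

Ranking by absolute value keeps strong negative correlations in the list instead of pushing them to the bottom.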

Key Insights

Health & Adoption

Proportional distribution of adoption speeds by health status.

Vaccination status vs adoption speed proportional breakdown.

Key Findings

  • Healthy pets dominate the dataset and show a moderately balanced adoption distribution
  • Vaccinated pets tend to have slightly faster adoption compared to unvaccinated ones
  • Very few pets have serious injuries — sample sizes too small for strong conclusions
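
The proportional breakdowns above can be sketched as a row-normalised cross-tabulation. The column names (`Health`, `AdoptionSpeed`), the health coding in the usage comment, and the `min_n` threshold for flagging small groups are assumptions made for illustration.

```python
import pandas as pd


def speed_share_by_group(df: pd.DataFrame, group_col: str,
                         target_col: str = "AdoptionSpeed",
                         min_n: int = 30) -> pd.DataFrame:
    """Share of each adoption speed within each group, plus the group size.

    Each row of the speed columns sums to 1. `small_sample` flags groups with
    fewer than `min_n` listings (an arbitrary threshold chosen for this sketch).
    """
    shares = pd.crosstab(df[group_col], df[target_col], normalize="index")
    n = df[group_col].value_counts().reindex(shares.index)
    return shares.assign(n=n, small_sample=n < min_n)


# Usage (assumed coding: 1 = healthy, 3 = serious injury):
# speed_share_by_group(listings, "Health")
# speed_share_by_group(listings, "Vaccinated")
```

Reporting the group size next to the proportions makes it easier to see when a row, such as seriously injured pets, rests on too few listings to interpret.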

Does Age Play a Role in How Quickly Adoption Occurs?

Younger animals tend to have a better chance of being adopted — both dogs and cats. Dogs are slightly older than cats on average, but the median age for both is around 3 months.

Age (months) by adoption speed — Dogs. Younger dogs are adopted faster.

Age (months) by adoption speed — Cats. Similar trend to dogs.

Key Insights

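  • Weak association between age and adoption speed: younger pets tend to be adopted more quickly. Since lower AdoptionSpeed codes mean faster adoption, this corresponds to a weak positive correlation between age and the code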
  • Outliers exist in all age buckets, but the trend is consistent across both dogs and cats
  • Pets with adoption speed 4 (not adopted) tend to be slightly older on average
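
A hedged sketch of how the age trend could be quantified per animal type. Spearman rank correlation is an assumption (the report does not state which measure was used); it is computed here as the Pearson correlation of ranks. Column names `Age`, `Type`, and `AdoptionSpeed` are assumed.

```python
import pandas as pd


def age_speed_summary(df: pd.DataFrame, by: str = "Type") -> pd.DataFrame:
    """Spearman correlation between Age and AdoptionSpeed, and median age, per group."""
    rows = {}
    for key, grp in df.groupby(by):
        # Spearman's rho equals the Pearson correlation of the (average) ranks.
        rho = grp["Age"].rank().corr(grp["AdoptionSpeed"].rank())
        rows[key] = {"spearman_rho": rho, "median_age": grp["Age"].median()}
    return pd.DataFrame.from_dict(rows, orient="index")
```

Because a higher AdoptionSpeed code means slower adoption, a positive rho here means older pets wait longer.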

Does the Listing Fee Influence Adoption Speed?

The vast majority of pets are listed for free. Pets priced at RM 20 show slightly faster adoption than free listings.

Top fee values vs adoption speed — proportional breakdown. Most pets are free (fee = 0).

Key Insights

  • Free pets (fee = 0) dominate the dataset; their adoption speed distribution reflects the overall average
  • Pets with fee ≈ RM 20 show slightly faster adoption — may reflect a higher-quality or more committed listing
  • Very high fees tend to have higher "no adoption" rates — price may deter potential adopters

Image Size Distribution & Aspect Ratio

Width vs Height scatter colored by animal type (Dog/Cat). Shows the diversity of image dimensions.

File size distribution (KB) by animal type.

Aspect ratio distribution with reference lines for common ratios.

Key Insights

  • Image dimensions: High diversity — images range from small thumbnails to high-resolution photos
  • File sizes: Most images are 20 – 100 KB, manageable for batch processing
  • Aspect ratios: Majority cluster near 4:3 and 3:2 — plan augmentation accordingly
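
To illustrate the aspect-ratio grouping, a small sketch that labels an image with its nearest reference ratio. Only 4:3 and 3:2 are mentioned in the findings; the 1:1 and 16:9 entries and the orientation-agnostic definition (long side over short side) are assumptions.

```python
# Reference ratios expressed as long side / short side.
REFERENCE_RATIOS = {"1:1": 1.0, "4:3": 4 / 3, "3:2": 3 / 2, "16:9": 16 / 9}


def nearest_ratio(width: int, height: int) -> tuple[str, float]:
    """Return the closest reference ratio label and the image's own ratio."""
    long_side, short_side = max(width, height), min(width, height)
    ratio = long_side / short_side
    label = min(REFERENCE_RATIOS, key=lambda name: abs(REFERENCE_RATIOS[name] - ratio))
    return label, ratio
```

For example, a 640 × 480 image and a 480 × 640 image both map to 4:3 under this definition.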

Photo Count Analysis

Distribution of number of photos uploaded per pet listing.

Photo count vs adoption speed — pets with more photos tend to be adopted faster.
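
A minimal sketch of the photo-count comparison, assuming a `PhotoAmt` column holds the number of photos per listing. The column name and the bucket edges are illustrative choices, not the ones used for the chart.

```python
import pandas as pd


def mean_speed_by_photo_bucket(df: pd.DataFrame) -> pd.DataFrame:
    """Mean AdoptionSpeed and listing count per photo-count bucket."""
    bucket = pd.cut(
        df["PhotoAmt"],
        bins=[-1, 0, 2, 5, float("inf")],   # (-1,0], (0,2], (2,5], (5,inf]
        labels=["0", "1-2", "3-5", "6+"],
    )
    return df.groupby(bucket, observed=True)["AdoptionSpeed"].agg(["mean", "size"])
```

A lower mean in the higher buckets would be consistent with the claim that more photos go with faster adoption, although a bucket mean of an ordinal code is only a rough summary.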

Key Insights — Strongest Visual Signal

Color Space Analysis & Quality Metrics

Quality Metrics by Adoption Speed

Composite quality score (normalized average of all 5 metrics) vs adoption speed.

Key Insights — Quality Metrics

  • Image quality alone does not predict adoption speed — photo count remains the dominant factor
  • Among the quality metrics, blurriness shows the clearest effect: blurry images are more often associated with non-adoption (Speed 4)
  • Composite quality score tested as an alternative — distributions overlap substantially across all speeds
  • Quality metrics may still be useful as auxiliary features in a multi-modal model
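
The report names five metrics (brightness, sharpness, contrast, colorfulness, saturation) without giving their formulas. The sketch below uses common definitions as stand-ins: mean and standard deviation of luminance, variance of a 4-neighbour Laplacian, the Hasler and Süsstrunk colourfulness measure, and mean HSV saturation. Min-max scaling for the composite score is likewise an assumption.

```python
import numpy as np


def quality_metrics(rgb: np.ndarray) -> dict:
    """Compute five per-image quality metrics from an H x W x 3 uint8 RGB array."""
    img = rgb.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # luminance approximation
    # Sharpness: variance of the Laplacian; low values suggest a blurry image.
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2]
           + gray[1:-1, 2:] - 4 * gray[1:-1, 1:-1])
    # Saturation: (max - min) / max per pixel, as in HSV.
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    # Colourfulness: Hasler and Suesstrunk opponent-colour statistic.
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
    return {
        "brightness": gray.mean(),
        "contrast": gray.std(),
        "sharpness": lap.var(),
        "colorfulness": colorfulness,
        "saturation": sat.mean(),
    }


def composite_score(metrics: np.ndarray) -> np.ndarray:
    """Min-max scale each metric column to [0, 1], then average across columns."""
    lo, hi = metrics.min(axis=0), metrics.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns contribute 0
    return ((metrics - lo) / span).mean(axis=1)
```

Averaging scaled metrics treats "higher" as "better" for every metric, which is a simplification; very high brightness or contrast is not necessarily a sign of a good photo.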

Dominant Color Analysis

Palettes by Adoption Speed

Palettes: Dog vs Cat

Key Insights

  • Dominant colors across adoption speeds are similar — warm browns, whites, and blacks prevail
  • No strong differentiation in color palettes between fast and slow adoption
  • Dogs and cats have slightly different color profiles due to breed differences
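
The report does not say how dominant colours were extracted. As one simple possibility, the sketch below quantises each RGB channel into a few levels and returns the most frequent colour bins; k-means over pixels is a common alternative. The function name and the `levels` and `top` parameters are hypothetical.

```python
import numpy as np


def dominant_colors(rgb: np.ndarray, levels: int = 4, top: int = 3):
    """Return the `top` most frequent quantised colours and their pixel shares.

    rgb is an H x W x 3 uint8 array; `levels` must divide 256 evenly.
    """
    step = 256 // levels
    q = (rgb.reshape(-1, 3) // step).astype(np.int64)       # per-channel bin index
    codes = q[:, 0] * levels * levels + q[:, 1] * levels + q[:, 2]
    counts = np.bincount(codes, minlength=levels ** 3)
    best = np.argsort(counts)[::-1][:top]                    # most frequent bins first
    bins = np.stack([best // (levels * levels), (best // levels) % levels, best % levels], axis=1)
    centres = bins * step + step // 2                        # bin centre as an RGB value
    return centres, counts[best] / codes.size
```

Palettes per adoption speed could then be built by pooling these bin counts over all images in each speed class.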

Breed Image Similarity & Visual Clusters

Breed mean image features (brightness, sharpness, contrast, colorfulness, saturation) are compared pairwise to see which breeds produce visually similar photos. Agglomerative clustering groups breeds into 5 visual clusters per type, then each cluster's adoption speed profile is shown.
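
A sketch of the similarity and clustering step, assuming one row of mean features per breed. Standardising the features first and using average linkage on cosine distance are assumptions; the report specifies only cosine similarity, agglomerative clustering, and 5 clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cosine_similarity_matrix(a: np.ndarray, b=None) -> np.ndarray:
    """Cosine similarity between rows of `a` and rows of `b` (or `a` with itself)."""
    b = a if b is None else b
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def cluster_breeds(breed_means: np.ndarray, n_clusters: int = 5):
    """Return the breed-by-breed similarity matrix and a cluster label per breed."""
    # z-score each feature so that no single metric's scale dominates the angle.
    z = (breed_means - breed_means.mean(axis=0)) / breed_means.std(axis=0)
    sim = cosine_similarity_matrix(z)
    tree = linkage(z, method="average", metric="cosine")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")  # at most n_clusters groups
    return sim, labels
```

Passing the dog matrix as `a` and the cat matrix as `b` gives a rectangular dogs × cats similarity matrix; for that comparison the two sets would need to be standardised with shared statistics.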

Dogs — Breed Image Similarity

Cosine similarity between breed mean image feature vectors. Brighter = more similar visual style.

Adoption speed proportions for each visual cluster — do certain photo styles predict faster adoption?

Cats — Breed Image Similarity

Cosine similarity between cat breed mean image feature vectors.

Adoption speed proportions per visual cluster for cats.

Breed vs Adoption Speed (Image-Sampled)

Dog breed adoption speed proportions computed from the quality-sampled image set.

Cat breed adoption speed proportions computed from the quality-sampled image set.

Cross-Correlation: Dog Breeds × Cat Breeds

Cosine similarity between each dog breed's mean feature vector and each cat breed's. High values mean their photos share a similar visual style — useful for understanding cross-species visual overlap.

Dog breeds (rows) × Cat breeds (columns). Warmer colours = more similar image quality profiles.

Combined Clustering — Dogs & Cats Together

All dog and cat breeds clustered jointly using agglomerative clustering. Dog breeds are shown in blue, cat breeds in orange. Clusters that mix both types share a common visual style across species.

Adoption speed proportions per combined cluster.

Key Insights

Feature Visualization (PCA & t-SNE)

PCA explained variance — how much information each principal component captures from the 5 quality features.

t-SNE projection of quality features colored by adoption speed. Perplexity = 30, 1000 iterations.

Key Insights

  • PCA shows that the first two components capture approximately 60% of the variance, indicating moderate redundancy among the quality features
  • t-SNE projection does not show clear clusters by adoption speed, confirming image quality alone is insufficient
  • Some local structure is visible, driven by brightness/contrast differences rather than adoption outcomes
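
For reference, the explained-variance figures can be reproduced along these lines. This is a NumPy-only sketch that standardises the features and reads the variance shares from the singular values; whether the original analysis standardised the features is an assumption.

```python
import numpy as np


def explained_variance_ratio(x: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component of x.

    x has one row per image and one column per quality feature.
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0)       # standardise each feature
    singular_values = np.linalg.svd(z, compute_uv=False)
    variances = singular_values ** 2               # proportional to component variance
    return variances / variances.sum()
```

The cumulative sum of the first two entries is the quantity quoted above. A t-SNE projection with perplexity 30 could be obtained separately, for example with scikit-learn's `TSNE(perplexity=30)`.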

Cross-Modality Analysis

Photo count × quality score interaction, with mean adoption speed as color. Lower values (darker) indicate faster adoption.

Quality metrics comparison between dogs and cats.

Combined Findings

  • High photo count combined with moderate quality shows the fastest adoption in the interaction heatmap
  • Dogs and cats have similar quality metric distributions
  • Photo count remains the dominant factor regardless of quality level
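
The interaction heatmap can be sketched as a two-way table of mean adoption speed. The column names `PhotoAmt` and `quality_score`, the photo-count buckets, and the use of quality terciles are hypothetical choices for illustration.

```python
import pandas as pd


def interaction_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean AdoptionSpeed for each photo-count bucket (rows) and quality tercile (columns)."""
    photo = pd.cut(df["PhotoAmt"], bins=[-1, 2, 5, float("inf")],
                   labels=["0-2", "3-5", "6+"])
    quality = pd.qcut(df["quality_score"], q=3, labels=["low", "mid", "high"])
    # Cells with no listings come out as NaN after unstacking.
    return df.groupby([photo, quality], observed=True)["AdoptionSpeed"].mean().unstack()
```

Cells backed by few listings can show extreme means, so cell counts are worth checking before reading too much into a single dark square.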

Dataset Characteristics & Methodology

Dataset Characteristics

  • 14,993 listings with images and tabular features; AdoptionSpeed (0–4) is an ordinal target, treated here as a regression target
  • Class imbalance: Speed 4 is the most common class; a weighted loss or oversampling is likely needed
  • Breed visual clusters confirm images carry breed-discriminative signal
  • Tabular features (breed, age, health, maturity size) correlate with visual appearance — images can be used to predict them
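
For the class-imbalance point, one common weighting scheme is inverse class frequency, sketched below. It is offered as an example of a weighted loss setup, not as the scheme this analysis settled on.

```python
import numpy as np


def balanced_class_weights(labels, n_classes: int = 5) -> np.ndarray:
    """Inverse-frequency weights: n_samples / (n_present_classes * class_count).

    Classes that never occur get weight 0 so they do not cause a division by zero.
    """
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    seen = counts > 0
    weights[seen] = counts.sum() / (seen.sum() * counts[seen])
    return weights
```

Rare classes receive weights above 1 and frequent classes below 1, so each class contributes roughly equally to a weighted loss.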

Image → Features → Adoption Speed

  • Images correlate with tabular features that drive adoption speed — breed, apparent age, health, and size are all visually detectable
  • Using tabular features as intermediate targets gives the image model structured supervision, producing richer embeddings than training on AdoptionSpeed alone
  • Predicted features are then aggregated to regress AdoptionSpeed — a fully image-driven pipeline with no tabular input at inference
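
A rough PyTorch sketch of the multi-task idea, under the assumption that the adoption head reads the shared embedding directly. The class name, head sizes, loss weights, and the tiny stand-in backbone are all hypothetical; the stand-in only keeps the example light and is not the proposed EfficientNet-B3.

```python
import torch
from torch import nn


class MultiTaskPetNet(nn.Module):
    """Shared image embedding with breed, auxiliary-feature, and adoption-speed heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int, n_breeds: int, n_aux: int = 3):
        super().__init__()
        self.backbone = backbone                        # maps images to (batch, feat_dim)
        self.breed_head = nn.Linear(feat_dim, n_breeds)  # breed classification logits
        self.aux_head = nn.Linear(feat_dim, n_aux)       # e.g. Age, MaturitySize, Health
        self.speed_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, images: torch.Tensor) -> dict:
        emb = self.backbone(images)
        return {
            "breed": self.breed_head(emb),
            "aux": self.aux_head(emb),
            "speed": self.speed_head(emb).squeeze(-1),
        }


def multitask_loss(out, breed, aux, speed, w_breed: float = 0.5, w_aux: float = 0.5):
    """MSE on AdoptionSpeed plus weighted auxiliary losses (weights are placeholders)."""
    return (
        nn.functional.mse_loss(out["speed"], speed)
        + w_breed * nn.functional.cross_entropy(out["breed"], breed)
        + w_aux * nn.functional.mse_loss(out["aux"], aux)
    )


def tiny_backbone(feat_dim: int = 32) -> nn.Module:
    """Small stand-in feature extractor used only to exercise the heads."""
    return nn.Sequential(
        nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
```

With torchvision, an `efficientnet_b3` model whose `classifier` is replaced by `nn.Identity()` should provide a 1536-dimensional embedding to pass as `backbone`; that wiring is not exercised here.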

Proposed Image Pipeline

  1. Backbone: EfficientNet-B3 pretrained on ImageNet, fine-tuned on pet images (224 × 224, normalised)
  2. Multi-task heads: breed classification + tabular feature regression (Age, MaturitySize, Health) as auxiliary targets to enrich the embedding
  3. Adoption head: shared embedding → MLP → AdoptionSpeed regression (ordinal cross-entropy or MSE)
  4. Evaluation: quadratic weighted kappa (primary) + per-class F1 to monitor minority-class recall
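
For the primary metric, quadratic weighted kappa compares the observed confusion matrix O with the matrix E expected under chance agreement, using weights w_ij = (i − j)² / (K − 1)²: kappa = 1 − Σ w_ij O_ij / Σ w_ij E_ij. A minimal NumPy sketch, assuming integer labels 0 to 4:

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, n_classes: int = 5) -> float:
    """Quadratic weighted kappa for integer labels in [0, n_classes).

    Undefined (division by zero) when all true and predicted labels fall in one class.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (y_true, y_pred), 1)           # confusion matrix counts
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected counts if true and predicted labels were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Continuous outputs from a regression head would need to be rounded or thresholded to the 0 to 4 classes before scoring.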