Happy Whale and Dolphin Identification - Silver Medal Solution

Competition Overview

This repository contains the Silver Medal solution (top 2%) for the Kaggle competition Happy Whale and Dolphin Identification. The goal was to build models that identify individual whales and dolphins from photographs of distinguishing body parts such as dorsal fins and tail flukes, a task that supports marine conservation and population monitoring.

Technical Approach

1. Image Data Preprocessing - Feature-based Image Cropping

  • Trained a YOLOv5x6 object detection model using open-source annotation data
  • Cropped dorsal fins and body parts from both training and test sets
  • Focused on extracting distinctive morphological features for improved identification accuracy
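The cropping step can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the detector returns boxes as `(x1, y1, x2, y2)` pixel coordinates with a confidence score, and the names `expand_and_clamp`, `best_box`, `pad_ratio`, and `min_conf` are hypothetical.

```python
def expand_and_clamp(box, image_size, pad_ratio=0.1):
    """Grow an (x1, y1, x2, y2) box by pad_ratio per side, clamped to the image.

    pad_ratio is an assumed hyperparameter; a small margin keeps the fin
    outline inside the crop when the detection is slightly tight.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    return (
        max(0, int(x1 - pad_x)),
        max(0, int(y1 - pad_y)),
        min(width, int(x2 + pad_x)),
        min(height, int(y2 + pad_y)),
    )


def best_box(detections, min_conf=0.25):
    """Return the highest-confidence box, or None if nothing passes min_conf.

    detections: list of {"box": (x1, y1, x2, y2), "conf": float} dicts
    (an assumed format, not a specific detector's output schema).
    """
    kept = [d for d in detections if d["conf"] >= min_conf]
    if not kept:
        return None
    return max(kept, key=lambda d: d["conf"])["box"]
```

With Pillow, the resulting tuple can be passed directly to `Image.crop`. Returning `None` for images without a confident detection leaves room for a fallback such as using the full image.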

2. Dorsal Fin Feature Extraction Model

  • Split training data into training and validation subsets
  • Implemented EfficientNet-B7 as the backbone architecture
  • Integrated a DOLG (Deep Orthogonal Fusion of Local and Global Features) layer that fuses features from the last two modules of the backbone
  • Utilized ArcFace loss function to enhance intra-class compactness and inter-class separation
  • Optimized feature representation for fine-grained visual recognition
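To make the ArcFace item concrete, here is a small standard-library sketch of the additive angular margin for a single sample. The `scale` and `margin` values are assumptions, not the settings used in this solution, and practical implementations also handle the case where the angle plus the margin exceeds π, which is omitted here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Cosine logits with an angular margin added to the target class only."""
    e = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(e, l2_normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if idx == target:
            # Penalize the true class: cos(theta + m) < cos(theta) for small theta
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

Because the margin lowers only the target logit, the loss stays high until the embedding sits well inside its class's angular region, which is what drives intra-class compactness.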

3. Pseudo-labeling with Noise-tolerant Data Fusion

  • Extracted embedding features from test set using the trained model
  • Constructed pseudo-labels by selecting high-confidence predictions from validation results
  • Re-trained the backbone model using combined data from Step 2 training subset and pseudo-labeled test data
  • Implemented noise-resistant training strategies to handle label uncertainty
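A minimal sketch of the confidence-based selection and data merge described above. The threshold value, the input format, and the function names are assumptions made for illustration; the source does not state how confidence was computed or which cut-off was used.

```python
def select_pseudo_labels(predictions, threshold=0.8):
    """Keep the top prediction of each test image if it is confident enough.

    predictions: dict mapping image_id -> list of (label, confidence) pairs.
    threshold: assumed cut-off, not a value taken from this solution.
    """
    selected = {}
    for image_id, candidates in predictions.items():
        if not candidates:
            continue
        label, confidence = max(candidates, key=lambda pair: pair[1])
        if confidence >= threshold:
            selected[image_id] = label
    return selected


def merge_training_sets(labeled, pseudo):
    """Combine ground-truth and pseudo-labeled samples, flagging the latter.

    Returns (image_id, label, is_pseudo) tuples so a noise-tolerant training
    scheme can, for example, down-weight the pseudo-labeled samples.
    """
    merged = [(image_id, label, False) for image_id, label in labeled]
    merged += [(image_id, label, True) for image_id, label in sorted(pseudo.items())]
    return merged
```

Keeping the `is_pseudo` flag on every sample is one simple way to let the loss treat uncertain labels differently from verified ones.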

4. Clustering and Ranking

  • Extracted embedding features from both training and test sets using the final trained backbone model
  • Trained K-Nearest Neighbors (KNN) model on training set embeddings
  • Generated distance-based inferences for test set embeddings
  • Applied ranking algorithm to obtain top-5 class predictions as final results
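The retrieval and ranking steps can be illustrated with a brute-force nearest-neighbor search. This is a sketch under assumptions: cosine distance, the `n_neighbors` value, and ranking each class by its closest neighbor are illustrative choices rather than details confirmed by the source.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / ((norm_a * norm_b) or 1.0)


def top_k_ids(query, train_embeddings, train_labels, n_neighbors=50, k=5):
    """Rank individual IDs by the distance of their closest training embedding."""
    neighbors = sorted(
        (cosine_distance(query, embedding), label)
        for embedding, label in zip(train_embeddings, train_labels)
    )[:n_neighbors]

    ranked = []
    for _, label in neighbors:
        # Neighbors are sorted by distance, so the first hit per label is its best
        if label not in ranked:
            ranked.append(label)
    return ranked[:k]
```

For realistic dataset sizes, scikit-learn's `NearestNeighbors` with `metric="cosine"` performs the same neighbor lookup far more efficiently; the per-class ranking logic stays the same.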

Key Technical Innovations

  • Multi-stage Feature Fusion: DOLG integration for orthogonal fusion of local and global features
  • Advanced Metric Learning: ArcFace implementation for robust embedding space optimization
  • Semi-supervised Learning: Effective pseudo-labeling strategy leveraging model confidence
  • Ensemble-free Solution: Single-model approach achieving competitive performance through careful architecture design
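The orthogonal fusion idea behind DOLG can be sketched in a few lines: each local feature has its projection onto the global feature removed, and the remaining orthogonal components are pooled and concatenated with the global feature. This is a simplified illustration with hypothetical function names; it omits the attention and convolution layers of the full module and is not the repository's implementation.

```python
def orthogonal_component(local, global_feat):
    """Remove from `local` its projection onto `global_feat`."""
    global_sq = sum(g * g for g in global_feat)
    if global_sq == 0:
        return list(local)
    coeff = sum(l * g for l, g in zip(local, global_feat)) / global_sq
    return [l - coeff * g for l, g in zip(local, global_feat)]


def orthogonal_fusion(local_map, global_feat):
    """Fuse per-location local features with one global feature vector.

    local_map: list of local feature vectors, one per spatial location.
    Returns the global feature concatenated with the average of the
    orthogonal local components.
    """
    orthogonal = [orthogonal_component(local, global_feat) for local in local_map]
    pooled = [sum(column) / len(orthogonal) for column in zip(*orthogonal)]
    return list(global_feat) + pooled
```

Removing the projection means the local branch contributes only information the global descriptor does not already carry, so the fused vector avoids redundant components.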

Additional Resources

Dataset Cropping References

Research Papers

Implementation Details

The solution demonstrates effective application of computer vision techniques for wildlife conservation, particularly addressing challenges in:

  • Fine-grained visual recognition of marine mammals
  • Handling limited training data per individual
  • Managing class imbalance in ecological datasets
  • Developing robust feature representations for conservation applications

License

This project is provided for research and educational purposes. Commercial use requires explicit permission from the author.


This solution earned a Silver Medal (top 2%) in the Kaggle Happy Whale and Dolphin Identification competition.
