**Author:** Tanuj Ranjith
**Program:** Stanford Pre-Collegiate Summer Institutes — Intro to Machine Learning
You can view the notebook in several ways:

- Directly on GitHub: click on the `.ipynb` file.
- If it doesn't render properly:
  - Download it and open it locally in Jupyter Notebook or VS Code.
  - Or open it in Google Colab: Open in Google Colab
A machine-learning project completed as part of the Stanford Pre-Collegiate Intro to Machine Learning program. The goal was to predict whether a person’s income exceeds $50K/year using the Adult Income dataset from the UCI Machine Learning Repository.
Work performed:
- Data cleaning, label encoding, and exploratory data analysis
- Feature preprocessing for numerical and categorical variables
- Model training and evaluation using the following classifiers (base and hyperparameter-tuned variants where noted):
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Decision Tree (base and tuned)
  - Bagging Classifier (base and tuned)
  - Random Forest (base and tuned)
  - AdaBoost (base and tuned)
  - Gradient Boosting (base and tuned)
  - XGBoost
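The preprocessing and training steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration on synthetic stand-in data, not the notebook's exact configuration — the two numeric and two categorical columns, and the AdaBoost settings, are assumptions chosen for brevity:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Adult dataset (column choice is illustrative).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "hours-per-week": rng.integers(10, 70, n),
    "occupation": rng.choice(["Tech", "Sales", "Clerical"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
y = (df["age"] + df["hours-per-week"] > 90).astype(int)  # toy binary target

numeric = ["age", "hours-per-week"]
categorical = ["occupation", "sex"]

# Scale numeric features, one-hot encode categorical ones, then classify.
pre = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("pre", pre),
    ("clf", AdaBoostClassifier(n_estimators=100, random_state=0)),
])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.25, random_state=0)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

Wrapping the transforms in a `Pipeline` keeps the encoder fitted only on training data, avoiding leakage into the test split.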
- **Source:** UCI Adult Census Dataset
- **Records:** 48,842 | **Features:** 14
- **Target:** `income` → `>50K` or `<=50K`
- **Feature examples:** age, education, occupation, marital-status, race, sex, hours-per-week, capital-gain/loss, native-country
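Loading and cleaning this dataset could look roughly like the sketch below. The UCI file ships without a header row and uses `?` for missing values; the two inline rows here stand in for `adult.data` so the snippet is self-contained:

```python
import io

import pandas as pd

# Column names from the UCI Adult dataset description (14 features + target).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week",
    "native-country", "income",
]

# Two inline rows stand in for adult.data; in practice, read the UCI file.
raw = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)
df = pd.read_csv(raw, names=COLUMNS, skipinitialspace=True, na_values="?")

# Binarize the target: 1 for >50K, 0 for <=50K.
df["income"] = (df["income"] == ">50K").astype(int)
```

`skipinitialspace=True` matters here: the raw file pads values after each comma, and without it `"?"` and the `>50K` labels would not match.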
| Model | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| Tuned AdaBoost | 0.868 | 0.612 | 0.805 | 0.679 |
| Gradient Boosting | 0.865 | 0.589 | 0.761 | 0.667 |
| XGBoost | 0.874 | 0.675 | 0.756 | 0.704 |
| Logistic Regression | 0.831 | 0.454 | 0.715 | 0.549 |
Tuned AdaBoost achieved the highest precision (0.805), while XGBoost led on accuracy, recall, and F1. Top predictors included education, capital gain, and hours-per-week.
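The four metrics in the table come straight from scikit-learn's scoring functions. The toy labels below are purely illustrative (not the project's predictions); they show how each score is computed from true and predicted classes:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground truth and predictions: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

acc = accuracy_score(y_true, y_pred)      # (TP + TN) / total
rec = recall_score(y_true, y_pred)        # TP / (TP + FN)
prec = precision_score(y_true, y_pred)    # TP / (TP + FP)
f1 = f1_score(y_true, y_pred)             # harmonic mean of precision and recall
```

With these counts all four scores come out to 0.75, which makes the trade-off in the table concrete: recall penalizes missed high earners, precision penalizes false positives.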
Developed in Python with `scikit-learn`, `pandas`, `numpy`, `matplotlib`, and `seaborn`.
- Improve recall using deeper ensemble architectures or SMOTE balancing.
- Add model explainability tools like SHAP or LIME.
- Experiment with neural networks for feature abstraction.
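On the SMOTE idea above: SMOTE (from the separate `imbalanced-learn` package) synthesizes new minority-class samples by interpolating between neighbors. The sketch below uses plain random oversampling with scikit-learn instead, which illustrates the same rebalancing effect on a toy imbalanced frame without the extra dependency:

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced frame: 8 low-income rows vs. 2 high-income rows.
df = pd.DataFrame({"x": range(10), "income": [0] * 8 + [1] * 2})

majority = df[df["income"] == 0]
minority = df[df["income"] == 1]

# Upsample the minority class with replacement until the classes match.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
balanced = pd.concat([majority, minority_up])
```

Unlike SMOTE, this duplicates existing rows rather than creating interpolated ones, so it is a baseline to compare against rather than a substitute.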