EDA: Smoking in the UK
A final project for CSCI3022 Intro to Data Science
Project Overview
This project explores smoking habits in the UK through Exploratory Data Analysis (EDA), using Python and common data science libraries.
The dataset, sourced from Kaggle, was analyzed to uncover trends in behavior, with a personal motivation: to better understand smoking patterns as part of my own journey toward living a healthier, smoke-free life.
The dataset was cleaned and explored to extract statistical summaries — including mean, median, and mode — for features such as age and smoking frequency on weekends.
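As a rough illustration of the summaries described above, here is a minimal sketch using Pandas. The column names `age` and `amt_weekends` are assumptions about the Kaggle file, the `summarize` helper is hypothetical, and the toy rows are invented placeholder values, not records or results from the actual dataset.

```python
import pandas as pd


def summarize(df, columns):
    """Return mean, median, and mode for each requested numeric column."""
    rows = {}
    for col in columns:
        series = df[col].dropna()  # ignore missing values, e.g. non-smokers
        modes = series.mode()      # mode() can return several values; keep the first
        rows[col] = {
            "mean": series.mean(),
            "median": series.median(),
            "mode": modes.iloc[0] if not modes.empty else float("nan"),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Invented placeholder rows; column names are assumed, not confirmed.
toy = pd.DataFrame({
    "age": [25, 40, 40, 61],
    "amt_weekends": [10.0, 20.0, None, 20.0],
})

summary = summarize(toy, ["age", "amt_weekends"])
print(summary)
```

With the real data, the same call would take the DataFrame loaded from the Kaggle CSV in place of `toy`.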
Key Features
- Performed data cleaning and preprocessing to prepare the dataset for analysis.
- Conducted both univariate and multivariate analysis to examine the relationships between variables.
- Used Python libraries like NumPy, Pandas, Seaborn, and Matplotlib to create insightful visualizations.
- Derived meaningful statistical insights about smoking behavior using measures of central tendency and dispersion.
- Leveraged visual tools to create scatter plots, box-and-whisker plots, and multivariate graphs that captured nuanced behavioral trends.
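The cleaning and plotting steps listed above might look something like the sketch below. It is not the notebook's actual code: the column names (`gender`, `age`, `smoke`, `amt_weekends`), the `Yes`/`No` encoding of `smoke`, and the helpers `clean` and `plot_overview` are all assumptions made for illustration, and the toy rows are invented placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def clean(df):
    """Drop duplicate rows and rows missing age or smoking status."""
    out = df.drop_duplicates().dropna(subset=["age", "smoke"]).copy()
    # Normalize label spelling, e.g. "yes" and "No " become "Yes" and "No".
    out["smoke"] = out["smoke"].str.strip().str.title()
    return out.reset_index(drop=True)


def plot_overview(df):
    """Box plot of age by smoking status, plus a scatter plot for smokers."""
    fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    sns.boxplot(data=df, x="smoke", y="age", ax=ax_box)
    ax_box.set_title("Age by smoking status")
    smokers = df[df["smoke"] == "Yes"]
    sns.scatterplot(data=smokers, x="age", y="amt_weekends",
                    hue="gender", ax=ax_scatter)
    ax_scatter.set_title("Weekend cigarettes vs. age (smokers)")
    fig.tight_layout()
    return fig


# Invented placeholder rows; column names and encodings are assumed.
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Male"],
    "age": [25, 40, 40, 61, None],
    "smoke": ["yes", "No ", "No ", "Yes", "No"],
    "amt_weekends": [10.0, None, None, 20.0, None],
})

cleaned = clean(toy)
fig = plot_overview(cleaned)
```

Calling `fig.savefig(...)` afterwards writes the figure to disk; in a Jupyter Notebook the figure can simply be displayed inline.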
Technical Stack
- Languages: Python
- Data Analysis: Pandas, NumPy
- Visualization: Seaborn, Matplotlib
- Statistical Techniques: Descriptive statistics, Univariate & Multivariate Analysis
- Tools: Jupyter Notebook, VS Code, Git
Project Demo
Watch a full walkthrough of the project below: