EDA: Smoking in the UK | Alberto Espinosa

Project Overview

This project explores smoking habits in the UK through Exploratory Data Analysis (EDA), using Python and common data science libraries.
The dataset, sourced from Kaggle, was analyzed to uncover trends in smoking behavior. The motivation is personal: to better understand smoking patterns as part of my own journey toward a healthier, smoke-free life.

The dataset was cleaned and explored to extract statistical summaries, including the mean, median, and mode, for features such as age and weekend smoking frequency.
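A minimal sketch of how such summaries might be computed with Pandas. The column names `age` and `amt_weekends`, the helper `central_tendency`, and the five rows below are hypothetical stand-ins, not the Kaggle data or the project's actual code.

```python
import pandas as pd

# Hypothetical toy rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "age": [38, 42, 40, 40, 39],
    "amt_weekends": [10, 0, 20, 10, 15],
})

def central_tendency(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return the mean, median and mode of each listed numeric column."""
    rows = {}
    for col in columns:
        s = frame[col].dropna()  # ignore missing values
        rows[col] = {
            "mean": s.mean(),
            "median": s.median(),
            # Series.mode() returns every most-frequent value, sorted; keep the first.
            "mode": s.mode().iloc[0],
        }
    # One row per column, one column per statistic.
    return pd.DataFrame(rows).T

summary = central_tendency(df, ["age", "amt_weekends"])
print(summary)
```

On a real file the same call would run on a frame loaded with `pd.read_csv(...)`; because `mode()` can return ties, keeping only the first value is a simplification.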

Key Features

Performed data cleaning and preprocessing to prepare the dataset for analysis.
Conducted univariate analysis to examine individual variables and multivariate analysis to examine the relationships between them.
Used NumPy and Pandas for data manipulation, and Seaborn and Matplotlib for visualization.
Summarized smoking behavior with measures of central tendency (mean, median, mode) and dispersion.
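The cleaning, univariate, and multivariate steps listed above could look roughly like the sketch below. It assumes columns named `gender`, `age`, `smoke` (coded "Yes"/"No") and `amt_weekends`, and that a blank weekend amount for a non-smoker means zero cigarettes; the helpers `clean` and `dispersion` and the toy rows are hypothetical, not the project's code.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; column names and the "Yes"/"No" coding of `smoke`
# are assumptions, not taken from the project.
raw = pd.DataFrame({
    "gender": ["Male", "Female ", "Female", "Male", None],
    "age": [38, 42, 40, 40, 39],
    "smoke": ["No", "Yes", "Yes", "No", "Yes"],
    "amt_weekends": [np.nan, 20, 10, np.nan, 15],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the (hypothetical) raw table."""
    out = df.copy()
    out["gender"] = out["gender"].str.strip()  # trim stray whitespace
    out["smoke"] = out["smoke"].eq("Yes")      # "Yes" -> True, anything else -> False
    non_smoker = ~out["smoke"]
    # Assumption: a blank weekend amount for a non-smoker means zero cigarettes.
    out.loc[non_smoker, "amt_weekends"] = out.loc[non_smoker, "amt_weekends"].fillna(0)
    return out.dropna(subset=["gender"])       # drop rows missing a key category

def dispersion(s: pd.Series) -> dict:
    """Sample standard deviation, interquartile range and range of a numeric column."""
    q1, q3 = s.quantile([0.25, 0.75])
    return {"std": s.std(), "iqr": q3 - q1, "range": s.max() - s.min()}

cleaned = clean(raw)
spread = dispersion(cleaned["age"])                     # univariate spread
table = pd.crosstab(cleaned["gender"], cleaned["smoke"])  # gender vs smoking status
print(spread)
print(table)
```

The cross-tabulation is one simple multivariate view; grouped means such as `cleaned.groupby("gender")["amt_weekends"].mean()` are another.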

Created scatter plots, box-and-whisker plots, and multivariate graphs to highlight behavioral trends in the data.
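A sketch of the kinds of plots described above, using Seaborn on top of Matplotlib. The toy rows, the column names `age`, `gender`, `amt_weekdays` and `amt_weekends`, and the helper `plot_overview` are hypothetical; the project's own figures may differ.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.figure import Figure

# Hypothetical toy rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "age": [25, 38, 42, 51, 63, 29],
    "amt_weekends": [10, 0, 20, 15, 5, 12],
    "amt_weekdays": [8, 0, 15, 10, 5, 10],
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
})

def plot_overview(frame: pd.DataFrame) -> Figure:
    """Draw a scatter plot and a box-and-whisker plot side by side."""
    fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    # Scatter: weekday vs weekend consumption, coloured by gender (a multivariate view).
    sns.scatterplot(data=frame, x="amt_weekdays", y="amt_weekends",
                    hue="gender", ax=ax_scatter)
    ax_scatter.set_title("Weekday vs weekend cigarettes")
    # Box-and-whisker: spread of age within each gender group.
    sns.boxplot(data=frame, x="gender", y="age", ax=ax_box)
    ax_box.set_title("Age by gender")
    fig.tight_layout()
    return fig

fig = plot_overview(df)
```

Calling `fig.savefig("overview.png")` would write the figure to disk; passing explicit `ax` objects keeps each Seaborn plot in its own panel.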

Technical Stack

Languages: Python
Data Analysis: Pandas, NumPy
Visualization: Seaborn, Matplotlib
Statistical Techniques: Descriptive statistics, Univariate & Multivariate Analysis
Tools: Jupyter Notebook, VS Code, Git

Project Demo

Watch a full walkthrough of the project below: