Health Insurance Marketplace Analysis
Health Insurance Marketplace Analysis (HIMA) aims to provide patterns and data insights for customers and clients in the healthcare market.
Project Details / Background
This project analyzes health insurance marketplace data from the Centers for Medicare & Medicaid Services (CMS). Enacted in March 2010, the Affordable Care Act was designed to reform the health insurance market and extend its benefits to all Americans. The analysis covers the federal health insurance marketplace data for the United States.
Health Insurance Exchange Public Use Files (PUFs) are datasets released by CMS that give easy access to each plan year’s health insurance information. The data captures the characteristics of individuals and families who have enrolled in health plans through the state and federal health insurance exchanges and provides a comprehensive view of the current health insurance landscape. There are more than 13 million entries in the dataset. Data are available for plan years 2014 to 2023; for this project, I use the federal-level data from 2019 to 2022.
Data cleaning was performed with Python pandas, checking the dataset for NULL and missing values. The data were then stored in AWS S3 cloud storage and connected to EC2 over the Secure Shell (SSH) protocol. Four datasets were imported into an Athena database, where query transformations and aggregations produced the data used for further analysis.
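The missing-value check described above could look roughly like the following. This is a minimal sketch, not the project's actual cleaning code: the column names (`BusinessYear`, `StateCode`, `IndividualRate`), the helper functions, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return the missing-value count and share for every column."""
    missing = df.isna().sum()
    summary = pd.DataFrame(
        {
            "missing_count": missing,
            "missing_share": missing / len(df),
        }
    )
    # Columns with the most missing values come first.
    return summary.sort_values("missing_count", ascending=False)


def drop_incomplete_rows(df: pd.DataFrame, required: list) -> pd.DataFrame:
    """Drop rows that lack a value in any of the required columns."""
    return df.dropna(subset=required).reset_index(drop=True)


# Hypothetical sample rows; the values are made up for illustration only.
sample = pd.DataFrame(
    {
        "BusinessYear": [2019, 2019, 2020, 2020],
        "StateCode": ["WV", "AK", None, "WV"],
        "IndividualRate": [410.0, None, 395.5, 430.0],
    }
)

report = summarize_missing(sample)
cleaned = drop_incomplete_rows(sample, ["StateCode", "IndividualRate"])
print(report)
print(cleaned)
```

Whether to drop or impute incomplete rows depends on the column; dropping is shown here only because it is the simplest option.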
The main figures taken into account are the individual/family rate, copayment amount/percentage, and deductible amount/percentage. After data analysis and visualization, some clear patterns were identified:

- Individual rates for all age groups have increased over the past few years. Rates for children and teenagers grew the most, at a much higher growth rate than other age groups;
- Dental-only plans have a more dichotomous distribution: higher-rated plans tend to increase over time, while lower-rated plans tend to decrease over time;
- West Virginia has the highest average federal health insurance individual rate; Alaska has the highest average federal health insurance family rate;
- Copay after deductible generally requires a much lower copay than copay with deductible. The majority of plans that provide copay without a deductible have a copay amount below 100 dollars.
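A comparison of rate growth across age groups, like the one behind the first finding, can be sketched in pandas as follows. This is only an illustration under stated assumptions: the project ran its aggregations in Athena, and the `AgeGroup`, `BusinessYear`, and `IndividualRate` column names, the `average_rate_growth` helper, and the sample values are hypothetical rather than taken from the dataset.

```python
import pandas as pd


def average_rate_growth(
    df: pd.DataFrame,
    group_col: str = "AgeGroup",
    year_col: str = "BusinessYear",
    rate_col: str = "IndividualRate",
) -> pd.DataFrame:
    """Average rate per group and year, plus growth from first to last year."""
    # One row per group, one column per year, holding the mean rate.
    avg = df.groupby([group_col, year_col])[rate_col].mean().unstack(year_col)
    first_year, last_year = avg.columns.min(), avg.columns.max()
    # Percentage change between the earliest and latest year in the data.
    avg["growth_pct"] = (avg[last_year] / avg[first_year] - 1.0) * 100.0
    return avg


# Made-up rates for two hypothetical age groups, for illustration only.
rates = pd.DataFrame(
    {
        "AgeGroup": ["0-20", "0-20", "21-40", "21-40"],
        "BusinessYear": [2019, 2022, 2019, 2022],
        "IndividualRate": [200.0, 260.0, 400.0, 440.0],
    }
)

growth = average_rate_growth(rates)
print(growth)
```

The same function can rank states instead of age groups by passing a state column as `group_col`.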
Part of the analysis process and findings:
Full code and documentation: here