Health Insurance Marketplace Analysis
Health Insurance Marketplace Analysis (HIMA) aims to surface patterns and data insights
for customers and clients in the healthcare market.
Project Details / Background
This project analyzes health insurance marketplace data from the Centers for Medicare & Medicaid Services (CMS). The Affordable Care Act,
enacted in March 2010, was designed to reform the health insurance market and
extend benefits to all Americans. This project analyzes the data
on federal health insurance plans across the United States.
Health Insurance Exchange Public Use Files (PUFs) are health insurance files
released by the Centers for Medicare & Medicaid Services to give easy access to
each year’s health insurance data. The data capture the
characteristics of individuals and families who have enrolled in health plans through
the state and federal health insurance exchanges and provide a comprehensive view
of the current health insurance landscape. The dataset contains more than 13 million entries.
Data are available for plan years 2014 to 2023. For this project, I use
the federal-level data from 2019 to 2022.
Dataset processing example
Data cleaning was performed with Python pandas, checking the dataset for NULL and missing values.
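The missing-value check can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning script: the sample rows and the column names (BusinessYear, StateCode, Age, IndividualRate) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped loosely like a rate file. The column names and
# values are assumptions for illustration, not taken from the project's data.
rates = pd.DataFrame(
    {
        "BusinessYear": [2019, 2019, 2020, 2021],
        "StateCode": ["AK", "WV", None, "WV"],
        "Age": ["21", "0-14", "30", "64 and over"],
        "IndividualRate": [512.0, np.nan, 430.5, 1250.0],
    }
)

# Count missing values per column (None and NaN both count as missing).
missing_per_column = rates.isna().sum()
print(missing_per_column)

# Drop rows that lack a value in any column the analysis depends on.
required = ["BusinessYear", "StateCode", "IndividualRate"]
clean = rates.dropna(subset=required).reset_index(drop=True)
print(len(clean), "rows kept out of", len(rates))
```

Whether to drop or fill incomplete rows depends on the analysis; dropping is shown here only because it is the simplest option.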
The data were then stored in AWS S3 cloud storage and accessed from an EC2 instance over a Secure Shell (SSH) connection.
Four datasets were imported into an Athena database, where query transformations and aggregations produced
the data used for further analysis.
The main figures taken into account are individual/family rates, copayment amounts/percentages, and deductible amounts/percentages.
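The kind of aggregation query run in the database step might look like the sketch below. To keep the example self-contained, Python's built-in sqlite3 module stands in for Athena, and the table name, column names, and rows are all hypothetical; the real queries and schema may differ.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for Athena (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rate (business_year INTEGER, state_code TEXT, individual_rate REAL)"
)

# Made-up rows for illustration only; these are not values from the dataset.
conn.executemany(
    "INSERT INTO rate VALUES (?, ?, ?)",
    [
        (2019, "AK", 500.0),
        (2019, "AK", 700.0),
        (2019, "WV", 800.0),
        (2020, "AK", 650.0),
        (2020, "WV", 900.0),
        (2020, "WV", 1000.0),
    ],
)

# Average individual rate per year and state.
query = """
    SELECT business_year, state_code, AVG(individual_rate) AS avg_individual_rate
    FROM rate
    GROUP BY business_year, state_code
    ORDER BY business_year, state_code
"""
avg_rates = conn.execute(query).fetchall()
conn.close()

for row in avg_rates:
    print(row)
```

The same GROUP BY pattern would apply to family rates, copayments, or deductibles by swapping the aggregated column.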
After data analysis and visualization, some clear patterns were identified:
- Individual rates for all age groups have increased over the past few years. Rates for children and teenagers grew the most, at a much higher growth rate than for other age groups;
- dental-only plans have a more dichotomous distribution: higher-rated plans tend to increase over time, while lower-rated plans tend to decrease over time;
- West Virginia has the highest average Federal Health Insurance individual rate; Alaska has the highest average Federal Health Insurance family rate;
- copay after deductible generally requires a much lower copay than copay with deductible. Most plans that offer a copay without a deductible have a copay amount below 100 dollars.
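A growth-rate comparison like the one behind the age-group finding above could be computed along these lines. The numbers are invented purely to show the calculation and are not the project's results; the column names are likewise assumptions.

```python
import pandas as pd

# Hypothetical yearly averages per age group (made-up values, not real findings).
avg_by_age = pd.DataFrame(
    {
        "year": [2019, 2022, 2019, 2022],
        "age_group": ["0-14", "0-14", "40", "40"],
        "avg_individual_rate": [200.0, 260.0, 500.0, 550.0],
    }
)

# One row per age group, one column per year.
wide = avg_by_age.pivot(index="age_group", columns="year", values="avg_individual_rate")

# Percentage growth from the first to the last year of the study window.
growth_pct = ((wide[2022] - wide[2019]) / wide[2019] * 100).round(1)
fastest = growth_pct.idxmax()

print(growth_pct)
print("Fastest-growing age group in this toy data:", fastest)
```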
Part of the analysis process and findings:
full code and documentation:
here
Results Showcase
SQL Query Transformation on Athena
Individual Plan Rates Comparison Cross US States
Family Plan Rates from 2019-2022
Individual Rate Plans by Age Groups from 2019-2022