NCAA Basketball March Mania 2023
The goal of this project is to train a model that predicts the probability of each March Madness tournament game result, based on previous game history in the NCAA league.
Project Details / Background
The NCAA Division I men's basketball tournament is branded as NCAA March Madness and commonly called March Madness. It was first held in 1939 and now consists of 68 teams competing in seven rounds of a single-elimination bracket. The NCAA project has been running for 6 years, since 2018. Previous analysts and researchers have contributed many important findings and insights through exploratory data analysis and statistical analysis. Based on those earlier results and statistics, I selected several important features that are correlated with game outcomes.
Related Features
[Figure: Processed dataset sample]
Most of the features require statistical computation using percentage metrics and functions. Combined with objective ratings from the internet, 10 different features were used for model building. Model evaluation was then performed on the processed dataset. XGBoost was found to give the best results when previous years' results were used for K-fold cross-validation. Finally, some manual override predictions were made to polarize the predicted probabilities for a better log-loss score. The model predicted the results for 2023 and achieved a Brier score of 0.18795 (baseline: 0.25). The code can be found on GitHub.
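To make the reported metric concrete, here is a minimal sketch of how a Brier score is computed and why the baseline is 0.25: predicting 0.5 for every game gives a squared error of 0.25 each time. The `polarize` helper is a hypothetical stand-in for the manual overrides. The actual override rule used in the project is not described here, so the thresholds below are assumptions, and the toy data is invented for illustration only.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def polarize(
    probs: Sequence[float],
    high: float = 0.9,   # assumed cutoff, not taken from the project
    low: float = 0.1,    # assumed cutoff, not taken from the project
    margin: float = 0.01,
) -> List[float]:
    """Hypothetical override: push already-confident predictions toward 0 or 1."""
    out = []
    for p in probs:
        if p >= high:
            out.append(1.0 - margin)
        elif p <= low:
            out.append(margin)
        else:
            out.append(p)
    return out


# Toy example (invented numbers): 1 means the first team won.
outcomes = [1, 0, 1, 1, 0]
model_probs = [0.92, 0.15, 0.60, 0.95, 0.05]

baseline = brier_score([0.5] * len(outcomes), outcomes)
model = brier_score(model_probs, outcomes)
overridden = brier_score(polarize(model_probs), outcomes)

print(f"baseline:   {baseline:.5f}")
print(f"model:      {model:.5f}")
print(f"overridden: {overridden:.5f}")
```

In this toy case every overridden prediction happens to be correct, so the score improves. A wrong override is penalized heavily, especially under log-loss, so polarizing only pays off when the overridden games are near certain.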