Machine learning is being applied on almost every sports field. But when it comes to cricket, there is very little work being done. Lets see if we can apply simple machine learning models to predict player scores of Pakistan Team.
Predicting Player Scores using Machine Learning
We have data of over 1000 ODI matches from 2009-2017. The data has been scraped from various websites and is in the form of a database.
Features and Labels:
This is one of the most important parts in machine learning. We will keep our features simple since this is just an experiment to see if machine learning works in cricket or not. Our features are:
- Opponent Team (Each team is assigned an id)
- Ground (Each ground is assigned an id)
- Country in which a match is being played. (Each country is assigned an id)
- Whether the match is day or day and night. (Binary feature)
Our labels are ranges of scores of each player. For now, we have only opted to go with 3 score classes:
- Class 0: Score from 0 to 30
- Class 1: Score from 30 to 80
- Class 2: Score from 80+
After selecting our features and labels, I just wrote simple queries to fetch data from the database and applied a few machine learning models for Pakistan Players. Lets see the results.
The accuracy for results for a few Pakistani Players come out to be. I tried 5 different machine learning models and for each player, chose the one that was giving the best results for that player. The algorithms I used are K nearest neighbors, Logistic Regression, Decision Tree, Random Forests and Naive Bayes. Random Forest usually performs best.
- Sharjeel Khan – 71% accuracy
- Babar Azam – 37% accuracy
- Shoaib Malik – 71% accuracy
- Sarfaraz Ahmed – 86% accuracy
- Umar Akmal – 70% accuracy
- Kamran Akmal – 82% accuracy
- Ahmed Shehzad – 52% accuracy
- Muhammad Hafeez – 46% accuracy
- Younis Khan – 80% accuracy
- Azhar Ali – 54% accuracy
From these accuracies, we can clearly see that our models can find patterns for some players but not all players. One thing to note here is that even 50% accuracy is good in our case since this is a 3 class classification and 50% means more than random guessing which is 33% (100/3)
These are just numbers, you would want to see some actual results as well, right? Lets check what our model gives us.
Original Score Ranges = [0 0 1 0 1 1 0]
Predicted Score Ranges = [1 0 0 0 1 1 0]
We can clearly see that the model is performing well. It told us when will Sharjeel score 30+ twice right and only once wrong.
This is definitely not the case with all players since we have only used a small number of features. But this gives us an idea about how we can use machine learning to predict players. We aim to build a better classifier in the future with more sophisticated features. This was just an experiment that we wanted to share with the community.
If you have any questions, feel free to ask.