2021 Big Data Cup
The data sets available for use have been crafted specifically for the Big Data Cup. They are not in their original format; they are a small portion of Stathletes’ data, translated for public consumption. They focus on two key areas of the game that have grown in prominence over the past year:
Women’s hockey (2018 Women’s Olympic Hockey Tournament sample and NCAA games)
Interest in women’s sports has increased during the pandemic. By providing access to data about the game played at its highest level, we hope to encourage participants to work with this robust data set from international tournaments.
Scouting (40-game sample from the Erie Otters in 2019/2020)
With COVID-related interruptions in many junior and development leagues, many draft-eligible players are sidelined for most of the 2020-2021 season. This allows us to ask questions about what we can look for in players that might deepen our understanding of development.
Thanks to the Erie Otters for allowing us to publish a sample data set from the 2019/2020 season.
Access the data here.
NOTE: As stated in the legal agreement, this data cannot be re-sold and is intended to be used for research purposes only.
2021 Timeline and Key Dates
- January - Data is released: GitHub link
- February - Open office hours + HANIC
- March 5th - Big Data Cup submissions due
- March 26th, 27th - 2021 Ottawa Hockey Analytics Conference; finalists will be invited to present at this event
- March 27th - Winners in each category announced (4 winners in total)
  - Women’s Hockey
  - Scouting Data
We want to foster each participant’s ability to evaluate data from both a process and a technical perspective. To assist with this, we will be scheduling Office Hours for the Big Data Cup (specific times to be announced in February). Subject matter experts in both sport and analysis will be available to answer your questions and provide feedback. Check back for details on ways to sign up.
Interested in participating?
Anyone interested is encouraged to apply, and the data will be provided publicly to advance hockey research.
There will be 2 categories for participants:
- High school & undergraduate - All participants must provide proof of enrollment at a high school or undergraduate level
- Open - This category includes graduate students and anyone interested in hockey research.
Teams can be 1-4 participants.
Finalists will be selected* on or by March 15th, will receive a complimentary ticket to the Conference, and will have the opportunity to present their findings to our panel of NHL executives.
Prizes will be awarded to top qualifiers.
Participation in the Big Data Cup competition is open to any individual regardless of background, experience, previous analysis, or public work.
*Evaluation criteria include, but are not limited to: a demonstrated ability to create actionable insights for a general manager or head coach working in hockey (and not just research); generating creative ideas, which may mean borrowing and applying ideas from other sports, leveraging domain knowledge, and/or filling gaps created by limitations of public data; a performative understanding of how to work with large data sets.
Final submissions will be due MARCH 5, 2021 and should:
- Define the question you asked
- Provide a short summary of your approach
- Give an overview of your findings
- Identify key action points from your analysis
- Data and code may be included as an appendix
Maximum 6 pages, including figures (size limit 10GB on submission).
Submissions can be emailed to: firstname.lastname@example.org with subject line: Big Data Cup 2021.
Please note that email size is limited to 25MB. To send larger submissions (up to 10GB), use Dropbox, Google Drive, or another file-sharing service and include the link in your submission email.