MINOR-2 PROJECT
SYNOPSIS
For
Analyzing Olympic Performance using Azure
Services
Submitted By
Specialization    SAP ID       Name
Big Data (NH)     500093061    Gautam Pande
Big Data (NH)     500097073    Manas Singh
Big Data (NH)     500091355    Saachi Gupta
Department of Informatics
School Of Computer Science
University of Petroleum & Energy Studies
Dehradun- 248007. Uttarakhand
Dr. Surbhi Saraswat        Dr. Shamik Tiwari
Project Guide              Cluster Head
School of Computer Science
University of Petroleum & Energy Studies, Dehradun
Index
S.No. Title
1 Abstract
2 Introduction
3 Literature Review
4 Problem Statement
5 Objective
6 Methodology
7 PERT Chart
8 References
Project Title: Analyzing Olympic Performance
using Azure Services
1.Abstract
The project's goal is to examine historical Olympic data to understand how different nations have
performed over time. The analysis will use Azure services and cover key variables such as medal
tallies, athlete demographics, and trends across Olympic Games. The objective is to provide a
thorough understanding of the Olympics dataset through analytics, visualizations, and possible
machine learning applications.
2.Introduction
This project uses Microsoft Azure's data services to analyze Olympic data. By integrating Azure
Data Factory, Data Lake Gen 2, Synapse Analytics, and Azure Databricks, it aims to offer an
end-to-end solution for deriving meaningful insights from a diverse body of Olympic statistics. The
project concentrates on data orchestration, storage, analytics, and machine learning so that users
can identify trends, patterns, and correlations in the data.
The fundamental idea is to use each Azure service for what it does best: Azure Data Factory for
data pipelines, Data Lake Gen 2 for secure storage, Synapse Analytics for querying, and Azure
Databricks for collaborative machine learning. Together these services form a framework that
streamlines data workflows and supports advanced analytics. The intention is to give sports
analysts, researchers, and enthusiasts a framework that both analyzes Olympic data in detail and
supports well-informed decision-making in international sports.
3. Literature Review
Data-driven decision-making and sports analytics have become essential elements of contemporary
sports management and strategy. The world of sports data analysis has changed due to the
incorporation of cutting-edge technologies, especially cloud-based solutions. The importance of
cloud computing in sports analytics is demonstrated by research by Albert and Ng (2018), who
stress the platform's ability to manage massive datasets effectively and enable real-time analysis.
Microsoft Azure is a well-known platform for managing a wide range of data sources in several
industries, including sports, thanks to its portfolio of services that includes Azure Data Factory, Data
Lake Gen 2, Synapse Analytics, and Azure Databricks (Chen et al., 2019).
Azure Data Factory has been used in the literature to orchestrate data pipelines because of its
capacity to automate, schedule, and manage intricate data workflows (Chaudhary et al., 2020).
Research by Sun et al. (2021) and Sharma and Arora (2017) highlights the role that Azure Data Lake
Gen 2 plays in offering secure and scalable storage for large datasets. Synapse Analytics,
Microsoft's integrated analytics service, has received praise for its data warehousing capabilities
and for offering a strong platform for data exploration and query optimization (Gadepally et al.,
2019). Azure Databricks has been acknowledged as a catalyst for obtaining useful insights from
massive amounts of data because of its collaborative environment for advanced analytics and
machine learning (Zaharia et al., 2016).
Tax and Joustra (2015) analyzed 13 years of Dutch football competition data, comparing a model
based on betting odds alone with a hybrid model incorporating additional match features. They
noted that cross-validation is unsuitable for sports prediction because the data are time-ordered.
Feature selection was informed by a literature review and used techniques such as PCA, Sequential
Forward Selection, ReliefF, and Correlation-Based Feature Subset Selection. Nine classification
algorithms were tested in WEKA, with naive Bayes and ANN achieving the highest accuracy (54.7%)
on the full feature set. FURIA led in the betting-odds-only model (55.3%), slightly surpassing the
full feature set, although the difference was not statistically significant. In the hybrid model,
LogitBoost with ReliefF yielded the highest accuracy (56.1%). The difference between the public
data model and the betting odds model was not statistically significant, which suggests that
betting odds alone are viable predictors of match outcomes.
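The point about time-ordered data can be illustrated with a small example. The Python snippet below is a minimal sketch under stated assumptions, not the procedure used by Tax and Joustra: it splits records chronologically so that every test record is later than every training record, a guarantee that shuffled cross-validation does not give. The field names (`year`, `medals`), the function name, and the sample values are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def chronological_split(
    records: List[Record], cutoff_year: int
) -> Tuple[List[Record], List[Record]]:
    """Split records so that training data strictly precedes test data in time."""
    ordered = sorted(records, key=lambda r: r["year"])
    train = [r for r in ordered if r["year"] < cutoff_year]
    test = [r for r in ordered if r["year"] >= cutoff_year]
    return train, test


# Hypothetical placeholder records, not real Olympic results.
sample = [
    {"year": 2016, "medals": 12},
    {"year": 2008, "medals": 9},
    {"year": 2012, "medals": 10},
    {"year": 2020, "medals": 14},
]

train, test = chronological_split(sample, cutoff_year=2016)
print([r["year"] for r in train], [r["year"] for r in test])
```

A model evaluated this way never sees information from Games that took place after the ones it is asked to predict.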
4.Problem Statement
Comprehensive insights into Olympic data are hampered by the absence of a unified framework built
on Microsoft Azure services. Current studies focus on individual technologies, which leaves a gap
in end-to-end sports analytics solutions. By combining Azure Data Factory, Data Lake Gen 2,
Synapse Analytics, and Azure Databricks, this study seeks to close this gap and support
well-informed decision-making in international sports.
5.Objectives
- Analyze historical Olympic data to identify trends and patterns.
- Investigate factors influencing a country's performance in the Olympics.
- Visualize and present key insights in an interactive and meaningful way.
- Explore the potential for machine learning to predict future Olympic outcomes based on
  historical data.
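The first objective, identifying trends in historical data, can be sketched as a per-country medal tally across Games. The snippet below is an illustrative sketch only: the row layout `(year, country, medal)`, the function name `medal_tally`, and the sample rows are assumptions and do not come from the project dataset. In an athlete-level dataset, a team medal may appear once per athlete and would need deduplication before counting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def medal_tally(rows: Iterable[Tuple[int, str, str]]) -> Dict[str, Counter]:
    """Count medals per country, keyed by Games year."""
    tally: Dict[str, Counter] = defaultdict(Counter)
    for year, country, medal in rows:
        if medal:  # skip rows where no medal was won
            tally[country][year] += 1
    return dict(tally)


# Hypothetical rows (year, country code, medal); illustrative values only.
rows = [
    (2012, "AAA", "Gold"),
    (2012, "AAA", ""),
    (2012, "BBB", "Silver"),
    (2016, "AAA", "Bronze"),
    (2016, "AAA", "Gold"),
]

tally = medal_tally(rows)
for country in sorted(tally):
    print(country, dict(sorted(tally[country].items())))
```

Comparing a country's counts across successive years gives a first view of whether its performance is rising or falling.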
6.Methodology
1. Data Preparation and Storage:
   - Upload the Olympics dataset to an Azure Storage account.
   - Organize the data in Azure Blob Storage or Azure Data Lake Storage Gen 2.
2. Azure Databricks:
   - Create a Databricks workspace for advanced analytics.
   - Explore the dataset in Spark notebooks to gain deeper insights.
3. Azure Synapse Analytics:
   - Create a Synapse workspace and a dedicated SQL pool.
   - Load the dataset into the dedicated SQL pool (formerly SQL Data Warehouse) using Azure Data
     Factory.
4. Azure Machine Learning:
   - Set up an Azure Machine Learning workspace.
   - Investigate the potential for predictive modeling based on historical Olympic data.
5. Power BI (for visualization):
   - Establish a Power BI workspace for creating interactive dashboards.
   - Connect Power BI to Azure Synapse Analytics for real-time data visualization.
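The storage step above implies a folder layout in the data lake. As a minimal sketch, the helper below builds `abfss://` URIs, the addressing scheme used for Data Lake Storage Gen2, for a raw zone and a transformed zone. The account name, container name, folder names, and the helper `adls_path` are all hypothetical and chosen only for illustration.

```python
def adls_path(account: str, container: str, *parts: str) -> str:
    """Build an abfss:// URI for a folder or file in Data Lake Storage Gen2."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    suffix = "/".join(p.strip("/") for p in parts if p)
    return f"{base}/{suffix}" if suffix else base


ACCOUNT = "olympicsstorage"  # hypothetical storage account name
CONTAINER = "olympics"       # hypothetical container name

# Hypothetical zone layout: raw files as uploaded, transformed output from notebooks.
raw_athletes = adls_path(ACCOUNT, CONTAINER, "raw-data", "athletes.csv")
transformed_athletes = adls_path(ACCOUNT, CONTAINER, "transformed-data", "athletes")

print(raw_athletes)
print(transformed_athletes)
```

In a Databricks notebook, a URI of this form would typically be passed to a reader such as `spark.read.csv(raw_athletes, header=True)`, assuming the cluster has already been granted access to the storage account.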
References
A Machine Learning Framework for Sport Result Prediction - Rory P. Bunker, Fadi Thabtah.
Olympic-Data-Analysis - Tanish Khandelwal.
Olympics Data Analyzer with Prediction - Hitanshi Shah, Jay Sheth, Hetvi Savla, Jyoti Bansode,
Bijal Patel, Aruna Yewale.
Data Analysis and Visualization of Olympics using PySpark and Dash-Plotly - Harshal S. Kudale,
Mihir V. Phadnis, Pooja J. Chittar, Kalpesh P. Zarkar, Prof. Balaji K. Bodhke.