MINOR-2 PROJECT
SYNOPSIS
For
Analyzing Olympic Performance using Azure
Services
Submitted By
Specialization    SAP ID       Name
Big Data (NH)     500093061    Gautam Pande
Big Data (NH)     500097073    Manas Singh
Big Data (NH)     500091355    Saachi Gupta
Department of Informatics
School of Computer Science
University of Petroleum & Energy Studies
Dehradun - 248007, Uttarakhand
Dr. Surbhi Saraswat    Dr. Shamik Tiwari
Project Guide          Cluster Head
School of Computer Science
University of Petroleum & Energy Studies, Dehradun
Index
S.No. Title
1 Abstract
2 Introduction
3 Literature Review
4 Problem Statement
5 Objectives
6 Methodology
7 PERT Chart
8 References
Project Title: Analyzing Olympic Performance
using Azure Services
1. Abstract
This project examines historical Olympic data to understand how different nations have performed
over time. The analysis will use Azure services and cover key variables such as medal tallies,
athlete demographics, and trends across Olympic Games. The objective is to build a thorough
understanding of the Olympics dataset through analytics, visualizations, and possible machine
learning applications.
2. Introduction
This project uses Microsoft Azure's data services to analyze Olympic data. By integrating Azure
Data Factory, Data Lake Gen 2, Synapse Analytics, and Azure Databricks, it aims to offer an
end-to-end solution for deriving meaningful insights from a diverse set of Olympic statistics. By
concentrating on data orchestration, storage, analytics, and machine learning, the project aims to
let users identify trends, patterns, and correlations in a large body of Olympic data.
The project's fundamental idea is to use the distinct strengths of each Azure service to build a
pipeline that streamlines data workflows and enables advanced analytics: Azure Data Factory for
pipeline orchestration, Data Lake Gen 2 for secure storage, Synapse Analytics for querying, and
Azure Databricks for collaborative machine learning. The intention is to give sports analysts,
researchers, and enthusiasts a framework that not only analyzes Olympic data in detail but also
supports new ideas and well-informed decision-making in international sports.
3. Literature Review
Data-driven decision-making and sports analytics have become essential elements of contemporary
sports management and strategy. The world of sports data analysis has changed due to the
incorporation of cutting-edge technologies, especially cloud-based solutions. The importance of
cloud computing in sports analytics is demonstrated by research by Albert and Ng (2018), who
stress the platform's ability to manage massive datasets effectively and enable real-time analysis.
Microsoft Azure is a well-known platform for managing a wide range of data sources in several
industries, including sports, thanks to its portfolio of services that includes Azure Data Factory, Data
Lake Gen 2, Synapse Analytics, and Azure Databricks (Chen et al., 2019).
Azure Data Factory has been used in the literature to orchestrate data pipelines because of its
capacity to automate, schedule, and manage intricate data workflows (Chaudhary et al., 2020).
Research by Sun et al. (2021) and Sharma and Arora (2017) highlights the critical role that Azure
Data Lake Gen 2 plays in offering secure and scalable storage for large datasets. Synapse
Analytics, Microsoft's integrated analytics service, has received praise for its data warehousing
capabilities and for offering a strong platform for data exploration and query optimization
(Gadepally et al., 2019). Azure Databricks, in turn, has been acknowledged as a catalyst for
obtaining useful insights from massive amounts of data due to its collaborative environment for
advanced analytics and machine learning (Zaharia et al., 2016).
Tax and Joustra (2015) analyzed 13 years of Dutch football competition data, comparing a model
based on betting odds alone with a hybrid model incorporating additional match features. They noted
that standard cross-validation is unsuitable for sports prediction because the data are
time-ordered. Feature selection was informed by a literature review and employed techniques such as
PCA, Sequential Forward Selection, ReliefF, and Correlation-Based Feature Subset Selection. Nine
classification algorithms were tested in WEKA, with naive Bayes and ANN achieving the highest
accuracy (54.7%) on the full feature set. FURIA performed best in the betting-odds-only model
(55.3%), slightly surpassing the full feature set, though without statistical significance. In the
hybrid model, LogitBoost with ReliefF yielded the highest accuracy (56.1%). The difference between
the public-data model and the betting-odds model was not statistically significant, which suggests
that betting odds are viable predictors of match outcomes.
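The point about time-ordered data can be illustrated with a small sketch. The Python below is not taken from Tax and Joustra; it is one possible way to build chronological ("walk-forward") train/test splits, in which every test block lies strictly after its training block. The function name, the fold parameters, and the 13 placeholder seasons are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def walk_forward_splits(
    records: Sequence[T], n_folds: int, min_train: int
) -> List[Tuple[List[T], List[T]]]:
    """Return (train, test) pairs in which the test block always follows the train block.

    `records` must already be sorted from oldest to newest.
    """
    remaining = len(records) - min_train
    if n_folds < 1 or min_train < 1 or remaining < n_folds:
        raise ValueError("not enough records for the requested folds")
    fold_size = remaining // n_folds
    splits = []
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover records.
        test_end = train_end + fold_size if k < n_folds - 1 else len(records)
        splits.append((list(records[:train_end]), list(records[train_end:test_end])))
    return splits


# Hypothetical example: 13 placeholder seasons, oldest first.
seasons = list(range(2000, 2013))
splits = walk_forward_splits(seasons, n_folds=3, min_train=4)
for train, test in splits:
    print(f"train up to {train[-1]}, test on {test}")
```

Unlike shuffled k-fold cross-validation, no split here lets a model train on seasons that come after the ones it is evaluated on.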
4. Problem Statement
Comprehensive insight into Olympic data is hampered by the absence of a uniform framework that uses
Microsoft Azure services for its analysis. Because current studies focus on individual
technologies, end-to-end sports analytics solutions are lacking. By combining Azure Data Factory,
Data Lake Gen 2, Synapse Analytics, and Azure Databricks, this study seeks to close this gap and
support well-informed decision-making in the international sports industry.
5. Objectives
• Analyze historical Olympic data to identify trends and patterns.
• Investigate factors influencing a country's performance in the Olympics.
• Visualize and present key insights in an interactive and meaningful way.
• Explore the potential for machine learning to predict future Olympic outcomes based on
historical data.
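As a rough illustration of the first objective, the sketch below tallies medals per country and Games and compares two editions. It is written in plain Python so that it runs anywhere; the row layout, the country codes, and the function names are hypothetical and do not come from the project's dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical rows: (year, country code, medal). Placeholder values, not real results.
SAMPLE_ROWS = [
    (2008, "AAA", "Gold"),
    (2008, "AAA", "Silver"),
    (2008, "BBB", "Bronze"),
    (2012, "AAA", "Gold"),
    (2012, "BBB", "Gold"),
    (2012, "BBB", "Silver"),
]


def medal_tally(rows: Iterable[Tuple[int, str, str]]) -> Dict[str, Dict[int, int]]:
    """Count medals per country and year.

    Each row counts as one medal. If a dataset lists one row per athlete,
    team events would need de-duplication before this step.
    """
    tally: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for year, country, _medal in rows:
        tally[country][year] += 1
    return {country: dict(sorted(years.items())) for country, years in tally.items()}


def change_between(tally: Dict[str, Dict[int, int]], country: str, start: int, end: int) -> int:
    """Difference in medal count between two Games; a missing year counts as zero."""
    years = tally.get(country, {})
    return years.get(end, 0) - years.get(start, 0)


tally = medal_tally(SAMPLE_ROWS)
print(tally)
print(change_between(tally, "BBB", 2008, 2012))
```

The same grouping could later be expressed as a Spark or SQL aggregation once the data sits in the Azure services named in the methodology.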
6. Methodology
1. Data Preparation and Storage:
• Upload the Olympics dataset to Azure Storage.
• Organize the data in Azure Blob Storage or Azure Data Lake Storage.
2. Azure Databricks:
• Create a Databricks workspace for advanced analytics.
• Explore the dataset using Spark notebooks for deeper insights.
3. Azure Synapse Analytics:
• Create a Synapse workspace and dedicated SQL pool.
• Load the dataset into the dedicated SQL pool using Azure Data Factory.
4. Azure Machine Learning:
• Set up an Azure Machine Learning workspace.
• Investigate the potential for predictive modeling based on historical Olympic data.
5. Power BI (For Visualization):
• Establish a Power BI workspace for creating interactive dashboards.
• Connect Power BI to Azure Synapse Analytics for real-time data visualization.
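To make the storage and Databricks steps more concrete, the sketch below builds the kind of `abfss://` URI that is used to address files in Azure Data Lake Storage Gen 2. The storage account name, container name, and folder layout are placeholders assumed for illustration; the project does not specify them.

```python
def abfss_uri(account: str, container: str, path: str) -> str:
    """Build an ADLS Gen 2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    clean_path = path.strip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names: replace with the real storage account and container.
ACCOUNT = "examplestorageacct"
CONTAINER = "olympics"

RAW_ATHLETES = abfss_uri(ACCOUNT, CONTAINER, "raw-data/athletes.csv")
TRANSFORMED_ATHLETES = abfss_uri(ACCOUNT, CONTAINER, "transformed-data/athletes")

print(RAW_ATHLETES)
print(TRANSFORMED_ATHLETES)

# Inside a Databricks notebook, where a `spark` session is provided and storage
# access has already been configured, the raw file could then be read with:
#   df = spark.read.csv(RAW_ATHLETES, header=True, inferSchema=True)
```

Keeping raw and transformed data in separate folders is one common layout; it lets a Data Factory pipeline land files in one place while notebooks write cleaned output to another.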
Block Diagram:
7. PERT Chart
8. References
• A machine learning framework for sport result prediction - Rory P. Bunker, Fadi Thabtah.
• Olympic-Data-Analysis - Tanish Khandelwal.
• Olympics Data Analyzer with Prediction - Hitanshi Shah, Jay Sheth, Hetvi Savla, Jyoti
Bansode, Bijal Patel, Aruna Yewale.
• Data Analysis and Visualization of Olympics Using PySpark and Dash-Plotly - Harshal S.
Kudale, Mihir V. Phadnis, Pooja J. Chittar, Kalpesh P. Zarkar, Prof. Balaji K. Bodhke.