Data Science
Week 1. “Data revolution”
Introduction to data science
“The ascendance of data”
“Data, data, everywhere”
“Data deluge”
“Drowning in data”
What does all this mean?!?
Week 1
Table of contents
1. Forms of data
i. Traditional (numerical data, textual data)
ii. Novel
2. Introduction to data science
i. Democratization of data science
ii. What is data science? Data scientist?
3. “Data-driven development”
i. Data types by source and method of generation
ii. “Big data”, “open data”, “metadata” …
iii. Size of data
Data can be numbers
Team | Player | Age | Position | Apps | Mins | Goals | Assists | Yel | Red | SpG | PS% | AerialsWon | MotM | Rating
Manchester City | Kevin De Bruyne | 28 | M(CLR),FW | 15(1) | 1317 | 6 | 9 | 1 | - | 3 | 82.1 | 0.3 | 4 | 7.93
Leicester | James Maddison | 23 | AM(CLR) | 16 | 1415 | 5 | 3 | 1 | - | 2.8 | 83.8 | 0.3 | 4 | 7.73
Leicester | Ricardo Pereira | 26 | D(LR),M(R) | 17 | 1530 | 2 | 1 | 1 | - | 0.5 | 79.9 | 1.5 | 1 | 7.72
Liverpool | Sadio Mané | 27 | AM(CLR),FW | 15(1) | 1329 | 9 | 5 | 1 | - | 2.4 | 79.5 | 1.3 | 5 | 7.61
Leicester | Wilfred Ndidi | 23 | DMC | 16 | 1440 | 2 | - | 2 | - | 0.9 | 84 | 2.6 | 2 | 7.6
Wolverhampton Wanderers | Adama Traoré | 23 | M(R),FW | 14(2) | 1281 | 3 | 3 | - | - | 1 | 72.7 | 1.3 | 3 | 7.54
Leicester | Jamie Vardy | 32 | AM(L),FW | 17 | 1530 | 16 | 3 | 2 | - | 2.7 | 71.5 | 1.4 | 3 | 7.53
Manchester City | Raheem Sterling | 25 | M(CLR),FW | 16 | 1404 | 9 | 1 | 4 | - | 3.1 | 81.8 | 0.7 | - | 7.5
Manchester City | Riyad Mahrez | 28 | AM(CLR) | 7(6) | 680 | 4 | 4 | - | - | 1.8 | 89.7 | 0.2 | 2 | 7.46
Tottenham | Son Heung-Min | 27 | M(CLR),FW | 14(1) | 1238 | 5 | 7 | - | 1 | 2.7 | 85.9 | 0.4 | 3 | 7.46
Liverpool | Mohamed Salah | 27 | AM(CLR),FW | 14 | 1195 | 9 | 4 | - | - | 3.9 | 77.1 | 0.3 | 3 | 7.42
Wolverhampton Wanderers | Raúl Jiménez | 28 | FW | 17 | 1475 | 6 | 5 | 2 | - | 3.2 | 73.6 | 2 | 1 | 7.39
Liverpool | Virgil van Dijk | 28 | D(C) | 17 | 1530 | 3 | - | 1 | - | 0.8 | 87.6 | 5.4 | 3 | 7.39
Manchester City | Rodrigo | 23 | DMC | 13(2) | 1154 | 2 | 1 | 4 | - | 0.5 | 91.9 | 2.3 | 1 | 7.38
Key: Apps = starts (substitute appearances); Mins = minutes played; Yel/Red = yellow/red cards; SpG = shots per game; PS% = pass success rate; AerialsWon = aerial duels won per game; MotM = man-of-the-match awards; "-" = none.
Data can be text: in the same table, the team names, player names, and positions are text data.
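Text fields often need parsing before they can be analyzed. For example, a player description such as "Leicester, 32, AM(L),FW" packs team, age, and positions into one string. A minimal sketch, assuming fields are separated by a comma plus a space while multiple positions are separated by a bare comma; `PlayerInfo` and `parse_player_info` are hypothetical names:

```python
from typing import NamedTuple

class PlayerInfo(NamedTuple):
    team: str
    age: int
    positions: list[str]

def parse_player_info(text: str) -> PlayerInfo:
    """Split 'Team, Age, Pos1,Pos2' into structured fields.

    Assumes ', ' (comma + space) separates the three fields and a bare ','
    separates positions, e.g. 'AM(L),FW'.
    """
    team, age, positions = text.split(", ", 2)
    return PlayerInfo(team=team, age=int(age), positions=positions.split(","))

print(parse_player_info("Leicester, 32, AM(L),FW"))
```

Splitting at most twice (`maxsplit=2`) keeps the positions together even though they also contain commas.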
These two types have been analyzed quantitatively (statistically) for decades.
In the data science era, there are additional types of data we can now use and analyze!
Data can be pictures (photographs, drawings, moving images, etc.)
https://quickdraw.withgoogle.com/#
Data can be pictures
AI-powered app Hananona
Take a picture of a flower and the app tells you the name of the flower.
Data can be pictures
“Eigenfaces”
Nev Acar. Eigenfaces: Recovering Humans from Ghosts. towardsdatascience.com
Data can be sound (music, voice, noise)
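Sound becomes data through sampling: the signal is measured many times per second and each measurement is stored as a number. A minimal sketch using Python's standard `wave` module, assuming a pure 440 Hz tone, CD-style 44,100 samples per second, and 16-bit mono samples (all illustrative choices, not from the slides):

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100   # samples per second (CD quality)
FREQ_HZ = 440.0        # pitch of the tone (the note A4)
DURATION_S = 1.0       # length of the clip in seconds

# Each sample is one 16-bit integer: the height of the sound wave at that instant.
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * FREQ_HZ * t / SAMPLE_RATE))
    for t in range(int(SAMPLE_RATE * DURATION_S))
]

# Store the numbers as a WAV file in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)           # mono
    wav.setsampwidth(2)           # 2 bytes = 16 bits per sample
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

print(len(samples), "numbers;", len(buffer.getvalue()), "bytes as WAV")
```

One second of even this simple tone is already 44,100 numbers, which hints at why audio datasets grow large quickly.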
Data can be large collections of text
https://www.theguardian.com/books/booksblog/2017/dec/13/harry-potter-botnik-jk-rowling
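The linked Guardian article concerns a Harry Potter chapter generated with Botnik's predictive-text tool trained on the books. Predictive text can be built in many ways; below is a minimal sketch of one simple approach, assuming a bigram model (count which word follows which) rather than whatever Botnik actually used, on a made-up toy corpus. `train_bigrams` and `suggest` are hypothetical names:

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict[str, Counter]:
    """For each word, count which words follow it in the text."""
    words = text.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def suggest(model: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    """Most frequent next words after `word`, like a predictive keyboard."""
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(suggest(model, "the"))
```

With a corpus as large as seven novels, the same counting idea yields suggestions that sound like the source text, which is why large text collections are useful data.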
Data can be location
COVID-19 cases in the United States
Source: Johns Hopkins University
Data can be movement of people (origin-destination)
Visualization by ShinagawaJP@Twitter
Data courtesy of Agoop. Tokyo residents’ travel over 4 days in 2019.
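Origin-destination (OD) data is commonly summarized as a matrix that counts trips between each pair of places. A minimal sketch with made-up trips; the place names and trip records are hypothetical and are not the Agoop data:

```python
from collections import Counter

# Hypothetical trip records: (origin, destination).
trips = [
    ("Shinjuku", "Shibuya"),
    ("Shinjuku", "Shibuya"),
    ("Shibuya", "Ikebukuro"),
    ("Ikebukuro", "Shinjuku"),
    ("Shinjuku", "Ikebukuro"),
]

def od_matrix(trips):
    """Count trips for every (origin, destination) pair."""
    return Counter(trips)

counts = od_matrix(trips)
places = sorted({place for trip in trips for place in trip})

# Print as a table: rows are origins, columns are destinations.
print(" " * 10 + "".join(f"{d:>10}" for d in places))
for o in places:
    print(f"{o:<10}" + "".join(f"{counts[(o, d)]:>10}" for d in places))
```

A `Counter` returns 0 for pairs with no trips, so the matrix is filled in without extra checks. Note that the matrix is not symmetric: trips from A to B are counted separately from trips from B to A.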
Data can be location and movement of, say, soccer players on the field
Source: IEEE
Data can be handwriting from the 11th century
Can you read this? This is a very famous piece of writing.
Source: National Institute of Japanese Literature. https://www.nijl.ac.jp/koten/kuzushiji/post-4.html
Kuzushiji data
Efforts to transcribe kuzushiji (cursive classical Japanese script) into digital text using AI
Source: Sankei Shimbun
Introduction to Data Science
Democratization of data science
What is data science? Data scientist?
Cornelissen (2018). The Democratization of Data Science
• New uses for data science are found every day in _________________ sectors.
• Many organizations today relegate all data knowledge to a “handful of people.” How? _______________
• Why is such an approach problematic? ______________
• “Why would non-data scientists need to learn data science?” ____________________
• What three things does the author propose to democratize data science? ____________
Grus (2015)
• “Ascendance of Data”
– What generates data? __________. A piece of data is generated every time ________ occurs.
• What is data science? Data scientist?
“…a data scientist is someone who ____________________.” (p.2)
What is data science? Data scientist? (Grus, Chapter 1)
“…a data scientist is someone who extracts insights from messy data.” (p.2)
What is data science? Robinson and Nolis
“Data science is the practice of using data to try to understand and solve real-world problems.” (p.5)
What is data science? Robinson and Nolis
No single person can do it all
“Data-Driven Development” and a few key concepts
Data-driven ____________ (World Bank)
• Business
• Government
• International development
World Bank. (2018). Data-Driven Development
Chapter 1. Data: The Fuel of the Future
A data typology (p.2-3)
• Big data
• Personal data
• Open data
• Metadata
• Data platforms
A data typology (p.2-)
Big data is characterized by its massive size and complexity
Table 1.2. Big data
Data agent \ Data generation | Intentional | Unintentional
Human | Primary content | Data exhaust
Machine | Secondary content | Internet of Things data
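The 2×2 structure of Table 1.2 can be written as a lookup keyed by who generates the data and whether it is generated on purpose. A minimal sketch; `BIG_DATA_TYPOLOGY` and `classify` are hypothetical names, and the category labels are taken from the table:

```python
# Table 1.2 as a lookup: (data agent, generated intentionally?) -> category.
BIG_DATA_TYPOLOGY = {
    ("human", True): "Primary content",
    ("human", False): "Data exhaust",
    ("machine", True): "Secondary content",
    ("machine", False): "Internet of Things data",
}

def classify(agent: str, intentional: bool) -> str:
    """Return the Table 1.2 category for a data source."""
    return BIG_DATA_TYPOLOGY[(agent.lower(), intentional)]

# A by-product of a person's activity that they did not set out to create:
print(classify("human", False))
```

Unknown agents raise a `KeyError`, which is deliberate: the typology only defines the two agents shown in the table.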
A data typology (p.2-)
Open data are made available by both businesses and governments. Example of private-sector open data used by other businesses?
A data typology (p.2-)
Metadata is “data about data.”
A phone call’s main “data” is the content of the conversation. Its “metadata” includes the date and time of the call, the number called, the duration of the call, etc.
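The phone-call example can be modeled as a record in which one field is the data and the remaining fields are metadata. A minimal sketch; the `PhoneCall` class, the `metadata` helper, and the phone number are hypothetical:

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta

@dataclass
class PhoneCall:
    # The main "data": what was said.
    content: str
    # Metadata: data about the call itself.
    started_at: datetime
    number_called: str
    duration: timedelta

def metadata(call: PhoneCall) -> dict:
    """Return every field except the conversation content."""
    fields = asdict(call)
    fields.pop("content")
    return fields

call = PhoneCall(
    content="Hi, are we still meeting at noon?",
    started_at=datetime(2020, 4, 1, 11, 30),
    number_called="+81-3-0000-0000",  # hypothetical number
    duration=timedelta(minutes=2, seconds=15),
)
print(metadata(call))
```

Even with the content removed, the remaining fields still say who was called, when, and for how long.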
A data typology (p.2-): data platforms
Facebook, for example, is a:
a) Peer-only data platform
b) Intranet data platform
c) Multisided data platform
Uber connects _____ with ______.
Airbnb connects ______ with _____.
Mercari connects ______ with ________.
Facebook connects _______ with _________.
How governments use data (p.5-)
Transformation from e-government (1990s onward; data is just the payload of a transaction) to digital government (2010 onward; data as a strategic asset)
In summary, the “data revolution” means…
• There is a lot more data today (volume)
• There are new kinds of data today (variety)
• Data accumulate rapidly (velocity)
• Human beings produce a lot more data today (both intentionally and unintentionally)
• Data are increasingly generated and used by non-humans (machines)
Additional background info
Size of data
• The smallest unit is a bit (a contraction of “binary digit”)
• Eight (8) bits make up one (1) byte
• One letter of the English alphabet is one (1) byte (in ASCII or UTF-8)
• One Japanese (or Chinese, etc.) character is two (2) or more bytes (e.g., two in Shift_JIS, three in UTF-8 for most characters)
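The number of bytes a character takes depends on the encoding used to store it. A minimal sketch using Python's built-in `str.encode`; `byte_size` is a hypothetical helper, and UTF-8 and Shift_JIS are chosen as two common encodings for Japanese text:

```python
def byte_size(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes needed to store `text` in the given encoding."""
    return len(text.encode(encoding))

for ch in ["A", "z", "あ", "漢"]:
    print(ch, "utf-8:", byte_size(ch), "shift_jis:", byte_size(ch, "shift_jis"))

# 8 bits per byte: a 1-byte letter occupies 8 bits.
print(byte_size("A") * 8, "bits")
```

English letters take one byte in both encodings, while the Japanese characters take two bytes in Shift_JIS and three in UTF-8.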
On Twitter (and other social networking sites), data begets data, which then begets even more data
Illustration using @realDonaldTrump
https://twitter.com/realDonaldTrump
An example of how data can quickly get big and complex
Size of data
It is estimated that Google has 4,000,000,000,000,000,000 bytes (4 × 10^18 bytes, about 4 exabytes) of data
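Large byte counts are easier to read in named units. A minimal sketch of a converter, assuming decimal (SI) units where 1 kB = 1000 bytes; with binary units (1 EiB = 2^60 bytes) the same figure is about 3.47 EiB. `human_readable` is a hypothetical helper name:

```python
def human_readable(n_bytes: float) -> str:
    """Express a byte count in decimal (SI) units: 1 kB = 1000 bytes."""
    units = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]
    i = 0
    # Divide by 1000 until the number is small or we run out of units.
    while n_bytes >= 1000 and i < len(units) - 1:
        n_bytes /= 1000
        i += 1
    return f"{n_bytes:g} {units[i]}"

print(human_readable(4_000_000_000_000_000_000))
```

Each step up the unit ladder is a factor of 1000, so 4,000,000,000,000,000,000 bytes is six steps up from bytes: exabytes.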

More Related Content

Similar to Data science week_1

Data Warehouse
Data WarehouseData Warehouse
Data WarehouseSana Alvi
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsIRJET Journal
 
SWOT of Bigdata Security Using Machine Learning Techniques
SWOT of Bigdata Security Using Machine Learning TechniquesSWOT of Bigdata Security Using Machine Learning Techniques
SWOT of Bigdata Security Using Machine Learning Techniquesijistjournal
 
A Review On Data Mining From Past To The Future
A Review On Data Mining From Past To The FutureA Review On Data Mining From Past To The Future
A Review On Data Mining From Past To The FutureKaela Johnson
 
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...Meisam Hejazi Nia
 
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...Arab Federation for Digital Economy
 
Using Graphs to Enable National-Scale Analytics
Using Graphs to Enable National-Scale AnalyticsUsing Graphs to Enable National-Scale Analytics
Using Graphs to Enable National-Scale AnalyticsNeo4j
 
Building Social Life Networks 130818
Building Social Life Networks 130818Building Social Life Networks 130818
Building Social Life Networks 130818Ramesh Jain
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting FutureIJERA Editor
 
Data Mining @ Information Age
Data Mining @ Information AgeData Mining @ Information Age
Data Mining @ Information AgeIIRindia
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 
PatternLanguageOfData
PatternLanguageOfDataPatternLanguageOfData
PatternLanguageOfDatakimErwin
 
Getting Smart, Not Big, With Data
Getting Smart, Not Big, With DataGetting Smart, Not Big, With Data
Getting Smart, Not Big, With Dataaradovic
 
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...Michael Mortenson
 
Top 10 Read articles in Web & semantic technology
 Top  10 Read articles in Web & semantic technology Top  10 Read articles in Web & semantic technology
Top 10 Read articles in Web & semantic technologydannyijwest
 

Similar to Data science week_1 (20)

Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and Trends
 
SWOT of Bigdata Security Using Machine Learning Techniques
SWOT of Bigdata Security Using Machine Learning TechniquesSWOT of Bigdata Security Using Machine Learning Techniques
SWOT of Bigdata Security Using Machine Learning Techniques
 
A Review On Data Mining From Past To The Future
A Review On Data Mining From Past To The FutureA Review On Data Mining From Past To The Future
A Review On Data Mining From Past To The Future
 
Datamining
DataminingDatamining
Datamining
 
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...
A Decision Support System for Inbound Marketers: An Empirical Use of Latent D...
 
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...
Privacy in the Age of Big Data: Exploring the Role of Modern Identity Managem...
 
Using Graphs to Enable National-Scale Analytics
Using Graphs to Enable National-Scale AnalyticsUsing Graphs to Enable National-Scale Analytics
Using Graphs to Enable National-Scale Analytics
 
Building Social Life Networks 130818
Building Social Life Networks 130818Building Social Life Networks 130818
Building Social Life Networks 130818
 
future2020
future2020future2020
future2020
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting Future
 
Data Mining @ Information Age
Data Mining @ Information AgeData Mining @ Information Age
Data Mining @ Information Age
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
PatternLanguageOfData
PatternLanguageOfDataPatternLanguageOfData
PatternLanguageOfData
 
Getting Smart, Not Big, With Data
Getting Smart, Not Big, With DataGetting Smart, Not Big, With Data
Getting Smart, Not Big, With Data
 
Jobs Complexity
Jobs ComplexityJobs Complexity
Jobs Complexity
 
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
 
Top 10 Read articles in Web & semantic technology
 Top  10 Read articles in Web & semantic technology Top  10 Read articles in Web & semantic technology
Top 10 Read articles in Web & semantic technology
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Chapter1 is344(intro-to-gis)
Chapter1 is344(intro-to-gis)Chapter1 is344(intro-to-gis)
Chapter1 is344(intro-to-gis)
 

More from Keiko Ono

Political science meets computer science
Political science meets computer sciencePolitical science meets computer science
Political science meets computer scienceKeiko Ono
 
人口知能・自然言語処理・社会科学・政治学
人口知能・自然言語処理・社会科学・政治学人口知能・自然言語処理・社会科学・政治学
人口知能・自然言語処理・社会科学・政治学Keiko Ono
 
Data science week_2_visualization
Data science week_2_visualizationData science week_2_visualization
Data science week_2_visualizationKeiko Ono
 
US presidential selection: the Electoral College challenged (again)
US presidential selection: the Electoral College challenged (again)US presidential selection: the Electoral College challenged (again)
US presidential selection: the Electoral College challenged (again)Keiko Ono
 
The latest trends in academic research on GIS
The latest trends in academic research on GISThe latest trends in academic research on GIS
The latest trends in academic research on GISKeiko Ono
 
サーベイに基づく公的データが問題になった2例(2018年)
サーベイに基づく公的データが問題になった2例(2018年)サーベイに基づく公的データが問題になった2例(2018年)
サーベイに基づく公的データが問題になった2例(2018年)Keiko Ono
 
比べてみた関関同立の研究力 Top Kansai private universities' research output
比べてみた関関同立の研究力 Top Kansai private universities' research output 比べてみた関関同立の研究力 Top Kansai private universities' research output
比べてみた関関同立の研究力 Top Kansai private universities' research output Keiko Ono
 
Research analysis waseda_2018
Research analysis waseda_2018Research analysis waseda_2018
Research analysis waseda_2018Keiko Ono
 
Journal analysis SSJJ 2018
Journal analysis SSJJ 2018Journal analysis SSJJ 2018
Journal analysis SSJJ 2018Keiko Ono
 
Journal analysis JJPS_2017
Journal analysis JJPS_2017Journal analysis JJPS_2017
Journal analysis JJPS_2017Keiko Ono
 
How academic research on GitHub has evolved in the last several years
How academic research on GitHub has evolved in the last several yearsHow academic research on GitHub has evolved in the last several years
How academic research on GitHub has evolved in the last several yearsKeiko Ono
 
Multiple regression: creating a dummy variable and an interaction
Multiple regression: creating a dummy variable and an interaction Multiple regression: creating a dummy variable and an interaction
Multiple regression: creating a dummy variable and an interaction Keiko Ono
 
The Russians, the US presidency in peril, and the Mueller investigation
The Russians, the US presidency in peril, and the Mueller investigationThe Russians, the US presidency in peril, and the Mueller investigation
The Russians, the US presidency in peril, and the Mueller investigationKeiko Ono
 
Multiple regression with interaction term 2018
Multiple regression with interaction term 2018Multiple regression with interaction term 2018
Multiple regression with interaction term 2018Keiko Ono
 
Data visualization and its application to politics and elections
Data visualization and its application to politics and electionsData visualization and its application to politics and elections
Data visualization and its application to politics and electionsKeiko Ono
 
Open meteorological data_himeji
Open meteorological data_himejiOpen meteorological data_himeji
Open meteorological data_himejiKeiko Ono
 
Encoding in excel
Encoding in excelEncoding in excel
Encoding in excelKeiko Ono
 
Data tools transform_multiple_dummy
Data tools transform_multiple_dummyData tools transform_multiple_dummy
Data tools transform_multiple_dummyKeiko Ono
 
LITERATURE Dengue fever 2017
LITERATURE Dengue fever 2017LITERATURE Dengue fever 2017
LITERATURE Dengue fever 2017Keiko Ono
 

More from Keiko Ono (20)

Political science meets computer science
Political science meets computer sciencePolitical science meets computer science
Political science meets computer science
 
人口知能・自然言語処理・社会科学・政治学
人口知能・自然言語処理・社会科学・政治学人口知能・自然言語処理・社会科学・政治学
人口知能・自然言語処理・社会科学・政治学
 
Data science week_2_visualization
Data science week_2_visualizationData science week_2_visualization
Data science week_2_visualization
 
US presidential selection: the Electoral College challenged (again)
US presidential selection: the Electoral College challenged (again)US presidential selection: the Electoral College challenged (again)
US presidential selection: the Electoral College challenged (again)
 
The latest trends in academic research on GIS
The latest trends in academic research on GISThe latest trends in academic research on GIS
The latest trends in academic research on GIS
 
サーベイに基づく公的データが問題になった2例(2018年)
サーベイに基づく公的データが問題になった2例(2018年)サーベイに基づく公的データが問題になった2例(2018年)
サーベイに基づく公的データが問題になった2例(2018年)
 
比べてみた関関同立の研究力 Top Kansai private universities' research output
比べてみた関関同立の研究力 Top Kansai private universities' research output 比べてみた関関同立の研究力 Top Kansai private universities' research output
比べてみた関関同立の研究力 Top Kansai private universities' research output
 
Research analysis waseda_2018
Research analysis waseda_2018Research analysis waseda_2018
Research analysis waseda_2018
 
Journal analysis SSJJ 2018
Journal analysis SSJJ 2018Journal analysis SSJJ 2018
Journal analysis SSJJ 2018
 
Journal analysis JJPS_2017
Journal analysis JJPS_2017Journal analysis JJPS_2017
Journal analysis JJPS_2017
 
How academic research on GitHub has evolved in the last several years
How academic research on GitHub has evolved in the last several yearsHow academic research on GitHub has evolved in the last several years
How academic research on GitHub has evolved in the last several years
 
Multiple regression: creating a dummy variable and an interaction
Multiple regression: creating a dummy variable and an interaction Multiple regression: creating a dummy variable and an interaction
Multiple regression: creating a dummy variable and an interaction
 
The Russians, the US presidency in peril, and the Mueller investigation
The Russians, the US presidency in peril, and the Mueller investigationThe Russians, the US presidency in peril, and the Mueller investigation
The Russians, the US presidency in peril, and the Mueller investigation
 
Multiple regression with interaction term 2018
Multiple regression with interaction term 2018Multiple regression with interaction term 2018
Multiple regression with interaction term 2018
 
Data visualization and its application to politics and elections
Data visualization and its application to politics and electionsData visualization and its application to politics and elections
Data visualization and its application to politics and elections
 
Saitama
SaitamaSaitama
Saitama
 
Open meteorological data_himeji
Open meteorological data_himejiOpen meteorological data_himeji
Open meteorological data_himeji
 
Encoding in excel
Encoding in excelEncoding in excel
Encoding in excel
 
Data tools transform_multiple_dummy
Data tools transform_multiple_dummyData tools transform_multiple_dummy
Data tools transform_multiple_dummy
 
LITERATURE Dengue fever 2017
LITERATURE Dengue fever 2017LITERATURE Dengue fever 2017
LITERATURE Dengue fever 2017
 

Recently uploaded

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 

Data science week_1

  • 1. Data Science Week 1. “Data revolution” Introduction to data science 1 “The ascendance of data” “Data, data, everywhere” “Data deluge” “Drowning in data” What does all this mean?!?
  • 2. Week 1 Table of contents 1. Forms of data i. Traditional (numerical data, textual data) ii. Novel 2. Introduction to data science i. Democratization of data science ii. What is data science? Data scientist? 3. “Data driven development” i. Data type by source and method of generation ii. “Big data”, “open data”, “meta data” … iii. Size of data 2
  • 3. Data can be numbers 3 Team Player Apps Mins Goals Assists Yel Red SpG PS % Aerial sWon Mot M Rating Manchester City Kevin De Bruyne Manchester City, 28, M(CLR),FW15(1) 1317 6 9 1 - 3 82.1 0.3 4 7.93 Leicester James Maddison Leicester, 23, AM(CLR)16 1415 5 3 1 - 2.8 83.8 0.3 4 7.73 Leicester Ricardo Pereira Leicester, 26, D(LR),M(R)17 1530 2 1 1 - 0.5 79.9 1.5 1 7.72 Liverpool Sadio Mané Liverpool, 27, AM(CLR),FW15(1) 1329 9 5 1 - 2.4 79.5 1.3 5 7.61 Leicester Wilfred Ndidi Leicester, 23, DMC16 1440 2 - 2 - 0.9 84 2.6 2 7.6 Wolverhampton Wanderers Adama Traoré Wolverhampton Wanderers, 23, M(R),FW14(2) 1281 3 3 - - 1 72.7 1.3 3 7.54 Leicester Jamie Vardy Leicester, 32, AM(L),FW17 1530 16 3 2 - 2.7 71.5 1.4 3 7.53 Manchester City Raheem Sterling Manchester City, 25, M(CLR),FW16 1404 9 1 4 - 3.1 81.8 0.7 - 7.5 Manchester City Riyad Mahrez Manchester City, 28, AM(CLR)7(6) 680 4 4 - - 1.8 89.7 0.2 2 7.46 Tottenham Son Heung-Min Tottenham, 27, M(CLR),FW14(1) 1238 5 7 - 1 2.7 85.9 0.4 3 7.46 Liverpool Mohamed Salah Liverpool, 27, AM(CLR),FW14 1195 9 4 - - 3.9 77.1 0.3 3 7.42 Wolverhampton Wanderers Raúl Jiménez Wolverhampton Wanderers, 28, FW17 1475 6 5 2 - 3.2 73.6 2 1 7.39 Liverpool Virgil van Dijk Liverpool, 28, D(C)17 1530 3 - 1 - 0.8 87.6 5.4 3 7.39 Manchester City Rodrigo Manchester City, 23, DMC13(2) 1154 2 1 4 - 0.5 91.9 2.3 1 7.38
  • 4. Data can be text: the same player table as in slide 3, where the Team and Player columns (including each player's team, age, and position codes such as M(CLR),FW) hold text rather than numbers.
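The difference between numeric and text columns matters as soon as you compute with them. Below is a minimal Python sketch, assuming the table has already been split into cells (a hand-copied subset of its rows and columns); it labels each column as numeric or text and computes goals per 90 minutes, a calculation only the numeric columns allow. Treating a dash as zero is an assumption about what "-" means in this table.

```python
from typing import Dict, List

# Hand-copied subset of the player table (not all columns).
HEADER = ["Team", "Player", "Mins", "Goals", "Assists", "Rating"]
ROWS = [
    ["Manchester City", "Kevin De Bruyne", "1317", "6", "9", "7.93"],
    ["Leicester", "Jamie Vardy", "1530", "16", "3", "7.53"],
    ["Liverpool", "Virgil van Dijk", "1530", "3", "-", "7.39"],
]


def to_number(cell: str) -> float:
    """Convert a cell to a number; '-' is assumed to mean zero."""
    return 0.0 if cell == "-" else float(cell)


def is_numeric(cell: str) -> bool:
    """True if the cell can be read as a number."""
    try:
        to_number(cell)
        return True
    except ValueError:
        return False


def column_kinds(header: List[str], rows: List[List[str]]) -> Dict[str, str]:
    """Label a column 'number' if every cell parses as a number, else 'text'."""
    return {
        name: "number" if all(is_numeric(row[i]) for row in rows) else "text"
        for i, name in enumerate(header)
    }


def goals_per_90(row: List[str]) -> float:
    """Goals per 90 minutes played, computed from the numeric columns."""
    mins = to_number(row[HEADER.index("Mins")])
    goals = to_number(row[HEADER.index("Goals")])
    return goals * 90 / mins


if __name__ == "__main__":
    print(column_kinds(HEADER, ROWS))
    for row in ROWS:
        print(row[1], round(goals_per_90(row), 2))
```

A real analysis would more likely load the full table with a library such as pandas; plain lists keep the idea visible.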
  • 5. These two types have been analyzed quantitatively (statistically) for decades. In the data science era, there are additional types of data we can now use and analyze!
  • 6. Data can be pictures (photographs, drawings, moving images, etc.): https://quickdraw.withgoogle.com/#
  • 7. Data can be pictures: the AI-powered app Hananona. Take a picture of a flower and the app tells you its name.
  • 8. Data can be pictures: “Eigenfaces”. Nev Acar. Eigenfaces: Recovering Humans from Ghosts. Towards Data Science (towardsdatascience.com)
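Eigenfaces are the principal components of a set of face images: each image is flattened into a vector of pixel values, the average face is subtracted, and the directions of greatest variation (eigenvectors of the pixel covariance matrix) are the "eigenfaces". The sketch below is a toy illustration of that idea, not the method of the cited article: it uses hypothetical 4-pixel "images", pure Python, and power iteration to find only the first eigenface. Real work would use thousands of pixels and a linear-algebra library such as NumPy.

```python
import math
from typing import List, Tuple

Image = List[float]  # a flattened grayscale image, one float per pixel


def mean_image(images: List[Image]) -> Image:
    """The 'average face': pixel-by-pixel mean over all images."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]


def covariance(centered: List[Image]) -> List[List[float]]:
    """Sample covariance between every pair of pixels."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(img[i] * img[j] for img in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvector(matrix: List[List[float]], iterations: int = 100) -> List[float]:
    """Power iteration: multiply by the matrix and renormalize, repeatedly."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variation found along the start vector")
        v = [x / norm for x in w]
    return v


def first_eigenface(images: List[Image]) -> Tuple[Image, List[float]]:
    """Return the mean image and the first principal component (eigenface)."""
    mean = mean_image(images)
    centered = [[p - m for p, m in zip(img, mean)] for img in images]
    return mean, top_eigenvector(covariance(centered))


def reconstruct(image: Image, mean: Image, face: List[float]) -> Image:
    """Approximate an image as mean + weight * eigenface."""
    weight = sum((p - m) * f for p, m, f in zip(image, mean, face))
    return [m + weight * f for m, f in zip(mean, face)]


if __name__ == "__main__":
    pattern = [0.5, -0.5, 0.5, -0.5]  # hypothetical "facial feature"
    images = [[1.0 + t * p for p in pattern] for t in (-2, -1, 1, 2)]
    mean, face = first_eigenface(images)
    print("eigenface:", [round(x, 3) for x in face])
    print("reconstructed:", [round(x, 3) for x in reconstruct(images[0], mean, face)])
```

Because these toy images vary along a single pattern, one eigenface reconstructs them exactly; with real faces, each image is approximated by a weighted sum of several eigenfaces.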
  • 9. Data can be sound (music, voice, noise)
  • 10. Data can be sound
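Sound becomes data by sampling: a signal is measured many times per second and each measurement is stored as a number. The Python sketch below is a toy example, not how any particular app records audio. It generates one second of a 440 Hz tone (the note A4) at an assumed 8,000 samples per second and stores each sample as a signed 16-bit integer. CD-quality audio uses the same idea at 44,100 samples per second.

```python
import math
import struct
from typing import List


def sample_tone(frequency_hz: float, duration_s: float,
                sample_rate_hz: int = 8000, amplitude: float = 0.5) -> List[int]:
    """Sample a pure sine tone and quantize each sample to a 16-bit integer."""
    n_samples = int(duration_s * sample_rate_hz)
    peak = 32767  # largest value a signed 16-bit integer can hold
    return [
        round(amplitude * peak * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz))
        for i in range(n_samples)
    ]


def to_pcm16_bytes(samples: List[int]) -> bytes:
    """Pack samples as little-endian signed 16-bit integers (raw PCM)."""
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    tone = sample_tone(440, 1.0)  # one second of the note A4
    print(len(tone), "samples;", len(to_pcm16_bytes(tone)), "bytes")
    print("first samples:", tone[:5])
```

One second of this mono tone is 8,000 numbers, or 16,000 bytes; at CD quality (44,100 samples per second, 2 bytes per sample, two channels) the same second would be 176,400 bytes.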
  • 11. Data can be lots of texts: https://www.theguardian.com/books/booksblog/2017/dec/13/harry-potter-botnik-jk-rowling
  • 12. Data can be location: COVID-19 cases in the United States. Source: Johns Hopkins University
  • 13. Data can be movement of people (origin-destination): Tokyo residents’ travel over 4 days in 2019. Visualization by ShinagawaJP (Twitter); data courtesy of Agoop.
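Origin-destination data is commonly summarized as an OD matrix: one row per origin, one column per destination, and each cell counting trips. The Python sketch below shows the idea under simple assumptions: each trip is already reduced to an (origin, destination) pair of zone names, and the trip records are hypothetical, not the Agoop data behind the visualization.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Trip = Tuple[str, str]  # (origin zone, destination zone)


def od_matrix(trips: Iterable[Trip]) -> Dict[str, Dict[str, int]]:
    """Count trips for every (origin, destination) pair, including zero cells."""
    counts = Counter(trips)
    zones = sorted({zone for pair in counts for zone in pair})
    # Counter returns 0 for pairs that never occurred.
    return {o: {d: counts[(o, d)] for d in zones} for o in zones}


if __name__ == "__main__":
    # Hypothetical trip records, for illustration only.
    trips = [
        ("Shinagawa", "Shibuya"),
        ("Shinagawa", "Shibuya"),
        ("Shibuya", "Shinjuku"),
        ("Shinjuku", "Shinagawa"),
    ]
    matrix = od_matrix(trips)
    zones = list(matrix)
    print("origin \\ destination", zones)
    for origin in zones:
        print(origin, [matrix[origin][dest] for dest in zones])
```

Each row shows where people starting in one zone went; comparing rows and columns reveals flows such as commuting patterns.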
  • 14. Data can be location and movement of, say, soccer players on the field. Source: IEEE
  • 15. Data can be handwriting from the 11th century. Can you read this? This is a very famous piece of writing. Source: National Institute of Japanese Literature, https://www.nijl.ac.jp/koten/kuzushiji/post-4.html
  • 16. Kuzushiji data: efforts to transcribe kuzushiji (cursive classical Japanese script) into digital text using AI. Source: Sankei Shimbun
  • 17. Introduction to data science: democratization of data science; what is data science? Data scientist?
  • 18. Cornelissen (2018). The Democratization of Data Science
      • New uses for data science are found every day in _________________ sectors.
      • Many organizations today relegate all data knowledge to a “handful of people.” How? _______________
      • Why is such an approach problematic? ______________
      • “Why would non-data scientists need to learn data science?” ____________________
      • What three things does the author propose to democratize data science? ____________
  • 19. Grus (2015)
      • “Ascendance of Data”: What generates data? __________. A piece of data is generated every time ________ occurs.
      • What is data science? Data scientist? “…a data scientist is someone who ____________________.” (p. 2)
  • 20. What is data science? Data scientist? (Grus, Chapter 1) “…a data scientist is someone who extracts insights from messy data.” (p. 2)
  • 21. What is data science? (Robinson and Nolis) “Data science is the practice of using data to try to understand and solve real-world problems.” (p. 5)
  • 22. What is data science? (Robinson and Nolis) No single person can do it all.
  • 23. “Data Driven Development” and a few key concepts
  • 24. Data-driven ____________ (World Bank)
      • Business
      • Government
      • International development
  • 25. World Bank. (2018). Data-Driven Development. Chapter 1. Data: The Fuel of the Future. A data typology (pp. 2-3)
      • Big data
      • Personal data
      • Open data
      • Metadata
      • Data platforms
  • 26. A data typology (p. 2 onward): big data is characterized by its massive size and complexity.
      Table 1.2. Big data
      Data agent \ Data generation | Intentional | Unintentional
      Human | Primary content | Data exhaust
      Machine | Secondary content | Internet of Things data
  • 27. A data typology (p. 2 onward): open data are made available by both businesses and governments. What is an example of private-sector open data used by other businesses?
  • 28. A data typology (p. 2 onward): metadata is “data about data”. A phone call’s main “data” is the content of the conversation; its “metadata” includes the date and time of the call, the number called, the duration of the call, etc.
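The phone-call example can be written down as a data structure that keeps content and metadata apart. The Python sketch below uses hypothetical class and field names and placeholder phone numbers; it shows that questions such as "how long did this person talk in total, and to whom?" can be answered from metadata alone, without reading a word of the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class CallMetadata:
    """Data about the call, not the call itself."""
    started_at: datetime
    number_called: str
    duration: timedelta


@dataclass
class PhoneCall:
    content: str            # the main "data": what was actually said
    metadata: CallMetadata  # "data about data"


def total_talk_time(calls: List[PhoneCall]) -> timedelta:
    """Sum call durations using only metadata; content is never read."""
    return sum((c.metadata.duration for c in calls), timedelta())


def numbers_called(calls: List[PhoneCall]) -> List[str]:
    """Distinct numbers called, again from metadata only."""
    return sorted({c.metadata.number_called for c in calls})


if __name__ == "__main__":
    calls = [
        PhoneCall("(conversation text)",
                  CallMetadata(datetime(2020, 1, 6, 9, 0), "555-0100", timedelta(minutes=3))),
        PhoneCall("(conversation text)",
                  CallMetadata(datetime(2020, 1, 6, 18, 30), "555-0199", timedelta(minutes=12))),
    ]
    print(total_talk_time(calls), numbers_called(calls))
```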
  • 29. A data typology (p. 2 onward): data platforms
      Facebook, for example, is a:
      a) peer-only data platform
      b) intranet data platform
      c) multisided data platform
      Uber connects _____ with ______. Airbnb connects ______ with _____. Mercari connects ______ with ________. Facebook connects _______ with _________.
  • 30. How governments use data (p. 5 onward): the transformation from e-government (from the 1990s; data is just the payload of a transaction) to digital government (from 2010; data as a strategic asset)
  • 31. In summary, the “data revolution” means…
      • There is a lot more data today (volume)
      • There are new kinds of data today (variety)
      • Data accumulate rapidly (velocity)
      • Human beings produce a lot more data today (both intentionally and unintentionally)
      • Data are increasingly generated and used by non-humans (machines)
  • 32. Additional background info: size of data
      • The smallest unit: a bit (a contraction of “binary digit”)
      • Eight (8) bits make up one (1) byte
      • One letter of the Latin alphabet is one (1) byte (in ASCII or UTF-8)
      • One Japanese (or Chinese, etc.) character is two (2) or more bytes (e.g. two in Shift_JIS, three in UTF-8)
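These byte counts can be checked directly: encoding text turns characters into bytes, and the count depends on the encoding. In this Python sketch, the choice of encodings (ASCII, UTF-8, and Shift_JIS, a common legacy Japanese encoding) is ours; the slide does not name one.

```python
from typing import Dict, Optional

ENCODINGS = ("ascii", "utf-8", "shift_jis")


def byte_sizes(text: str) -> Dict[str, Optional[int]]:
    """Bytes needed to store `text` in each encoding (None if it cannot be encoded)."""
    sizes: Dict[str, Optional[int]] = {}
    for enc in ENCODINGS:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None
    return sizes


def bits(num_bytes: int) -> int:
    """Eight bits make up one byte."""
    return num_bytes * 8


if __name__ == "__main__":
    for word in ("data", "データ"):
        print(word, byte_sizes(word))
```

For example, the English word "data" takes 4 bytes in all three encodings, while the Japanese word データ takes 6 bytes in Shift_JIS, 9 bytes in UTF-8, and cannot be stored in ASCII at all.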
  • 34. On Twitter (and other social networking sites), data begets data, which then begets even more data. Illustration using @realDonaldTrump: https://twitter.com/realDonaldTrump
  • 35. An example of how data can quickly get big and complex
  • 36. Size of data: it is estimated that Google has 4,000,000,000,000,000,000 bytes of data (4 × 10^18 bytes, or about 4 exabytes).
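Numbers this large are easier to read with unit prefixes. A minimal Python sketch, assuming decimal (SI) prefixes where each step up is a factor of 1,000:

```python
SI_UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count with decimal (SI) prefixes: 1 KB = 1,000 bytes."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1,000 until the value is below 1,000 or we run out of prefixes.
    while value >= 1000 and unit_index < len(SI_UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {SI_UNITS[unit_index]}"


if __name__ == "__main__":
    for size in (8, 1500, 4_000_000_000_000_000_000):
        print(size, "->", human_readable(size))
```

With this function, the estimate above comes out as "4 EB". Operating systems and storage tools sometimes use binary prefixes instead (1 EiB = 2^60 bytes), in which case the same figure is roughly 3.5 EiB.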