1. Is Big Data Good or Evil?
Arthur Charpentier and Ewen Gallic
Eco Club
Institut Franco-Américain
Rennes, January 2017
2. ● Arthur Charpentier (@freakonometrics)
○ Assistant Professor, University of Rennes 1
○ Ph. D. in Applied Mathematics, KU Leuven
○ Fellow of the French Institute of Actuaries
○ MSc in Mathematics Applied to Economics, Paris Dauphine University
○ MSc in Statistics, ENSAE, Paris
● Ewen Gallic (@3wen)
○ Ph. D. Student in Economics, University of Rennes 1
○ MSc in Econometrics, University of Rennes 1
4. Data from our activity
Every day, data are collected from our activities, such as:
● Paying for our shopping at the grocery store, at the pharmacy, …
● Doing a Google search
● Listening to music on Spotify
● Sending an e-mail
● Posting an update on a social network
5. Different kinds of data are collected:
Data Created/Shared by us
● Texts from conversations on our phones or on Facebook, Twitter, …
● Photos posted on Instagram, Facebook, Tumblr, Pinterest, …
● Videos uploaded to Snapchat, YouTube, …
● Data footprint
● …
9. Sensors record our activity:
Data Collected by Sensors
● GPS tracking on our phones
Source: Michael Wallace
Source: Aaron Parecki, via FlowingData
● Activity and sleep trackers
Fitbit watch
● In-car tracking devices
YouDrive
● Climate data from weather stations
● …
Weather Station
13. The 4 Vs of Big Data
Four attributes define Big Data (the four Vs):
● Volume
● Velocity
● Veracity
● Variety
14. The 4 Vs of Big Data
Source: IBM Big Data & Analytics Hub
2.5 × 10^18 bytes = 2.5 billion gigabytes
≈ 357,142,857 three-hour HD movies on Netflix
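As a quick check on the slide's conversions, the sketch below redoes the arithmetic. It assumes decimal (SI) units, i.e. 1 GB = 10^9 bytes, and shows that the movie comparison implies about 7 GB per three-hour HD movie (roughly 2.3 GB per hour). The variable names are illustrative only.

```python
# Figures quoted on the slide (source: IBM Big Data & Analytics Hub)
TOTAL_BYTES = 2.5e18      # 2.5 x 10^18 bytes
MOVIES = 357_142_857      # three-hour HD movies on Netflix

# Assumption: decimal (SI) gigabytes, 1 GB = 10^9 bytes
BYTES_PER_GB = 10**9

gigabytes = TOTAL_BYTES / BYTES_PER_GB   # 2.5 billion GB
gb_per_movie = gigabytes / MOVIES        # implied size of one movie
gb_per_hour = gb_per_movie / 3           # implied HD streaming volume per hour

print(f"{gigabytes:.3g} GB in total")
print(f"{gb_per_movie:.2f} GB per three-hour movie")
print(f"{gb_per_hour:.2f} GB per hour of HD video")
```

Under binary units (1 GiB = 2^30 bytes) the totals would shift by about 7%, which does not change the order of magnitude.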
15. The 4 Vs of Big Data
Source: IBM Big Data & Analytics Hub
16. The 4 Vs of Big Data
Source: IBM Big Data & Analytics Hub
~15% of 2016 US GDP
17. The 4 Vs of Big Data
Source: IBM Big Data & Analytics Hub
● Structured data, e.g.:
○ Time
○ Date
○ Value
○ …
● Unstructured data, e.g.:
○ Video
○ Podcast
○ Social Media Status
○ …
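A minimal illustration of the distinction, with made-up values: structured records share fixed, typed fields (time, date, value) and can be aggregated directly, whereas a social media status is free text that needs some processing before anything can be counted. The naive whitespace tokenization used here is an assumption for brevity, not how real text pipelines work.

```python
from datetime import datetime

# Structured data: every record has the same named, typed fields,
# so it can be filtered or aggregated directly. Values are hypothetical.
readings = [
    {"timestamp": datetime(2017, 1, 10, 8, 0), "value": 3.5},
    {"timestamp": datetime(2017, 1, 10, 9, 0), "value": 4.5},
]
mean_value = sum(r["value"] for r in readings) / len(readings)

# Unstructured data: a status update has no fixed schema.
# Here we naively split on whitespace and keep the hashtags.
status = "Stuck in traffic again... #Rennes #traffic"
words = status.lower().split()
hashtags = [w for w in words if w.startswith("#")]

print(mean_value)   # average of the structured values
print(hashtags)     # items extracted from free text
```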
19. The Angel: Healthcare
● Data-driven analysis to predict a disease’s geographical spread
○ Ebola in 2014
Map of Ebola cases in West Africa from January 2014 to December 2015.
Source: World Health Organization
● Google Flu Trends
○ Aimed to predict flu activity from Google search queries
○ Major failure in 2013: it greatly overestimated flu prevalence
Divergence of Google Flu Trends
Source: Keith Winstein (2013), “How accurate is Google Flu Trends?”
● Creation of Electronic Health Records
○ Better diagnoses
○ Reduced costs
22. The Angel: Insurance
● Telemetry
○ Fairer premiums
○ Better knowledge of driving habits
○ Fewer accidents
Source: TIA Technology
● Detection of fraudulent claims
○ In France, the insurers’ association (FFA) estimates that fraudulent claims amount to 5% of all claims
Fraudulent claims: €2.5 billion in 2015
Source: FFA via L'argus de l'Assurance
25. The Angel: Jobs
● Résumé processing
○ Faster
○ Less discrimination?
○ Helps find suitable candidates for a position
72% of résumés are never seen by human eyes
Source: Cathy O’Neil, “Weapons of Math Destruction” (2016), Crown
26. The Angel: Jobs
● Adjusting schedules
○ To match peak and off-peak hours
27. The Angel: Sports
● Forecasting player performance
○ Nate Silver’s PECOTA
Nate Silver, the developer of PECOTA, Editor-in-chief of FiveThirtyEight
28. The Angel: Sports
● Providing guidance
○ Kirk Goldsberry
Source: BallR by Todd W. Schneider (reproduction of Goldsberry’s chart)
30. The Demon: Healthcare
● A growing number of sensors monitor our “wellness”
○ Intrusive
○ Insurers get involved
● What about the security of standardized medical data?
○ What happens if your future employer gets hold of this kind of data?
Apple Watch
31. The Demon: Insurance
● Moving towards the individual
○ Individual pricing
○ Opaque
○ Blind to inequalities
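To make “individual pricing” concrete, here is a stylized sketch with hypothetical expected annual losses for five policyholders. It compares pooled pricing (everyone pays the average expected loss) with individual pricing (each pays their own expected loss). It is not the authors' model and ignores expense loadings, profit margins, and estimation error.

```python
# Hypothetical expected annual losses (euros), e.g. as estimated from
# telemetry data. Illustrative numbers only.
expected_loss = {"A": 200.0, "B": 300.0, "C": 400.0, "D": 500.0, "E": 1600.0}

# Pooled (mutualized) pricing: everyone pays the average expected loss.
pooled = sum(expected_loss.values()) / len(expected_loss)
pooled_premiums = {name: pooled for name in expected_loss}

# Individual pricing: each policyholder pays their own expected loss.
individual_premiums = dict(expected_loss)

# Both schemes collect the same total; only the split across people differs.
assert sum(pooled_premiums.values()) == sum(individual_premiums.values())

for name in expected_loss:
    change = individual_premiums[name] - pooled_premiums[name]
    print(f"{name}: pooled {pooled:.0f}, "
          f"individual {individual_premiums[name]:.0f}, change {change:+.0f}")
```

Both schemes collect €3,000 in total, but individual pricing lowers the premiums of the four low-risk policyholders while raising E's from €600 to €1,600. If high expected losses track circumstances people cannot choose, this redistribution is precisely what the slide calls being blind to inequalities.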
32. The Demon: Jobs
● Optimized schedules
○ Just-in-time economy applied to human beings
○ Work to live, or live to work?
○ Hits hardest those who desperately need the money
● Résumé processing
○ Asymmetric information
○ No feedback for rejected candidates
● Algorithms designed to fire people
○ Opaque procedures
○ Unfair
33. Recap
● Data are created or shared by us, or even recorded by sensors
● A huge amount is created every day, and comes in different forms (text, video, …)
● Data may help increase welfare, or help us understand patterns around us…
● But they also contribute to increasing inequalities