9/19/2019 Heiko Paulheim 1
Big Data, Smart Algorithms, and Market Power
A Computer Scientist’s Perspective
Heiko Paulheim
Chair for Data Science
University of Mannheim
9/19/2019 Heiko Paulheim 2
Introductory Example: GPS vs. Smart Phones
• Tests show: smart phones do the job better
– with smart phones on the rise, GPS sales decline
[Chart: GPS sales vs. smart phone sales; axis from 0 to 30,000]
Source: Statista (data for Germany; US looks similar)
9/19/2019 Heiko Paulheim 3
Computer Science Interlude: Navigation
• Problem: find the shortest path through a network
• Solution: known since the 1950s
– can be written down in less than 20 lines
[Figure: road network from Start to End; edges labeled with lengths of 1 km, 2 km, and 3 km]
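The slide does not name the algorithm. One well-known shortest-path algorithm from that era is Dijkstra's (published 1959), and the sketch below uses it to show how compact such a solution can be. The road network `roads` and the function name `shortest_path` are hypothetical; the graph is not the one drawn on the slide.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road network (edge lengths in km); not the graph from the slide.
roads = {
    "Start": {"A": 2, "B": 1},
    "A": {"Start": 2, "End": 3},
    "B": {"Start": 1, "C": 1},
    "C": {"B": 1, "End": 2},
    "End": {"A": 3, "C": 2},
}

distance, route = shortest_path(roads, "Start", "End")
print(distance, route)
```

The same function works unchanged if the edge weights are travel times instead of distances; only the input data differs.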
9/19/2019 Heiko Paulheim 4
Computer Science Interlude: Navigation
• Usually, we do not want the shortest way
– but the fastest
• We need to estimate times
[Figure: network from Start to End; edges labeled with estimated travel times between 0:05 and 0:15]
9/19/2019 Heiko Paulheim 5
Estimating Times for Edges
• Static: path length and speed limit
• Dynamic: live car movements
• Google Maps: owned by Google
– So is Android (market share US: 48%, Germany: 73%, China: 79%)
– i.e., about one Android phone in every other car
Source: https://gs.statcounter.com/os-market-share/mobile/
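The two kinds of estimate can be sketched as follows. This is a minimal illustration, not how any particular navigation service computes travel times: the function names, the use of the median, and the `min_samples` fallback threshold are all assumptions, and observed speeds are assumed to be positive.

```python
from statistics import median

def static_minutes(length_km, speed_limit_kmh):
    """Static estimate: time to drive the edge at exactly the speed limit."""
    return length_km / speed_limit_kmh * 60

def dynamic_minutes(length_km, speed_limit_kmh, observed_speeds_kmh, min_samples=3):
    """Dynamic estimate: use the median of recently observed speeds on this edge.

    Falls back to the static estimate when too few vehicles have reported
    (the threshold min_samples is an arbitrary, hypothetical choice).
    """
    if len(observed_speeds_kmh) < min_samples:
        return static_minutes(length_km, speed_limit_kmh)
    return length_km / median(observed_speeds_kmh) * 60

# Hypothetical edge: 2 km long, 60 km/h limit, congested traffic.
print(static_minutes(2, 60))
print(dynamic_minutes(2, 60, [18, 20, 25, 22]))
```

The more phones report speeds on an edge, the more often the dynamic branch can be used, which is where the amount of data matters.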
9/19/2019 Heiko Paulheim 6
Visual Depiction
• One Android phone in every other car
Image: Bing Maps
9/19/2019 Heiko Paulheim 7
Improving Navigation
• Ingredients:
– A simple standard textbook algorithm from the 1950s
– A lot of data
• Better navigation
– Usually: not by smarter algorithms
– But by better (=bigger) data!
[Figure: network from Start to End; edges labeled with travel times between 0:05 and 0:25]
Image: https://neo4j.com/blog/top-13-resources-graph-theory-algorithms/
9/19/2019 Heiko Paulheim 8
A.I. Winters and A Paradigm Shift
• AI has seen a massive uptake since the 2010s
– But using very different paradigms
[Figure: timeline annotated with the 1st AI Winter and the 2nd AI Winter]
Fast & Horvitz (2016): Long-Term Trends in the Public Perception of Artificial Intelligence
9/19/2019 Heiko Paulheim 9
An Example for AI: Go
• 1990s
– Using handcrafted rules
– i.e., smart algorithms
– Often defeated by children
• 2010s
– Using data from millions of games
– i.e., big data
– AlphaGo: beat some of the world’s best players in 2016
9/19/2019 Heiko Paulheim 10
AI in the Big Data Age (1)
• Algorithms are fairly simple and well known
• Data matters
Banko & Brill (2001): Scaling to Very Very Large Corpora for Natural Language Disambiguation
[Figure annotations: smarter algorithm vs. more data]
9/19/2019 Heiko Paulheim 11
AI in the Big Data Age (2)
• Algorithms are fairly simple and well known
• Data matters
Banko & Brill (2001): Scaling to Very Very Large Corpora for Natural Language Disambiguation
→ More data: a trivial baseline beats smart algorithms
9/19/2019 Heiko Paulheim 12
Big Data: Long vs. Wide Data
• Long data = more records of the same kind
– e.g., GPS data from more users
• Wide data = more information about the same records
– e.g., additional information about users
Lehmberg & Hassanzadeh (2018): Ontology Augmentation Through Matching with Web Tables
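The distinction can be illustrated with a few made-up records: making data longer adds rows, making it wider adds columns. All user IDs, field names, and values below are hypothetical.

```python
# Hypothetical records; user IDs and field names are made up for illustration.
gps_traces = [
    {"user": "u1", "avg_speed_kmh": 42},
    {"user": "u2", "avg_speed_kmh": 37},
]

# Longer data: more records of the same kind (same columns, more rows).
more_traces = [{"user": "u3", "avg_speed_kmh": 51}]
long_data = gps_traces + more_traces

# Wider data: more information about the same records (same rows, more columns).
profiles = {"u1": {"home_city": "Mannheim"}, "u2": {"home_city": "Hamburg"}}
wide_data = [{**row, **profiles.get(row["user"], {})} for row in gps_traces]

print(len(long_data), sorted(long_data[0]))
print(len(wide_data), sorted(wide_data[0]))
```

Widening requires a shared key (here `user`) to match records across sources, which is why combining data sets from different services is the critical step.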
9/19/2019 Heiko Paulheim 13
It’s All about Patterns in Data
• Examples
– Traffic movements
– Online user behavior
– Cliques in social networks
– …
• Methods:
– Data Mining
– Machine Learning
– …
→ Intensively researched since the 1980s
Image: https://factordaily.com/balaraman-ravindran-reinforcement-learning/
9/19/2019 Heiko Paulheim 14
Patterns in Long Data
9/19/2019 Heiko Paulheim 15
Patterns in Long Data
9/19/2019 Heiko Paulheim 16
Patterns in Wide Data
9/19/2019 Heiko Paulheim 18
Big Data: Long vs. Wide Data
• Example: YouTube (owned by Google)
– Display videos to the user that are as interesting as possible
• Long data: users’ interaction histories
• Wide data: users’ interaction histories + Google Web searches + visited places + Google Play music preferences + ...
9/19/2019 Heiko Paulheim 19
Big Data: Long vs. Wide Data
• Example: Facebook
– Display as much content of interest as possible
• Long data: user profile and interactions
• Wide data: user profile and interactions + WhatsApp chats
– In Germany, OVG Hamburg prohibits this combination!
Image: https://www.instagram.com/p/Bt3OG4DFOsK/
9/19/2019 Heiko Paulheim 20
Big Data: Long vs. Wide Data
• Example: WeChat
• Started as a chat application
– showing advertisements based on chats
– later added: apps-in-app (shopping, payment, …)
– CS perspective: more an OS than an app
• Long data
– Many people’s chats
• Wide data
– Chats
– Shopping history (also includes: products viewed)
– Payment history
Image: Wikipedia
9/19/2019 Heiko Paulheim 21
Takeaways
• Modern AI Systems
– Rely on massive amounts of data
– Processed with fairly simple algorithms
• Algorithms are often well known
– e.g., textbooks, research papers
– It is hard to own an algorithm
• Data is crucial
– Longer data (e.g., acquiring more customers)
– Wider data (e.g., merging businesses)
– It is easy to own data