SlideShare a Scribd company logo
An Introduction to
BIG DATA
CUSO Seminar on Big Data
Prof. Dr. Philippe Cudré-Mauroux
http://exascale.info
May 22, 2014
Fribourg–Switzerland
1
2
On the Menu Today
• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland
Instant Quizz
• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?
3
Exascale Data Deluge
• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– Ebay
– Yahoo
• Financial services,
retail companies
governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional
information infrastructures
The Web as the Main Driver
5
© Qmee
Big Data Central Theorem
Data+Technology  Actionable Insight  $$
6
Big Data Buzz
7
Between now and 2015, the firm expects big data to
create some 4.4 million IT jobs globally; of those,
1.9 million will be in the U.S. Applying an economic
multiplier to that estimate, Gartner expects each new
big-data-related IT job to create work for three more
people outside the tech industry, for a total of almost
6 million more U.S. jobs.
Growth in the Asia Pacific Big Data
market is expected to accelerate rapidly
in two to three years time, from a mere
US$258.5 million last year to in excess
of $1.76 billion in 2016, with highest
growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-
world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at
Google and now Facebook, are masters at harnessing the data of the Web
— online searches, posts and messages — with Internet advertising. At the
World Economic Forum last month in Davos, Switzerland, Big Data was a
marquee topic. A report by the forum, “Big Data, Big Impact,” declared data
a new class of economic asset, like currency or gold.”
8
9
The 3-Vs of Big Data
• Volume
– Amount of data
• Velocity
– speed of data in and out
• Variety
– range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-
variety information assets that require new forms of processing to
enable enhanced decision making, insight discovery and process
optimization" 10
What can you do with the data
• Reporting
– Post Hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …
11
© Mike Franklin
10 ways big data changes everything
• Some concrete examples
– http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
12
Typical Big Data Success Story
• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [Walmart]
– Content Generation [e.g., NetFlix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]
13
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget,
provide a guarantee that the quality of inference will increase
monotonically as data accrue (without bound)
• In other words:
=> Data should be a resource, not a load
14
© Mike Jordan
Big Data Infrastructures
15
A Concrete Example: Zynga
16
Leading the Pack of Wolves: Hadoop
• Google: Map/Reduce paper published 2004
• Open source variant: Hadoop
• Map-reduce = high-level programming model and
implementation for large-scale parallel data processing
• Right now most overhyped system in CS
17
What about Swiss Big Data?
• Competitive Research Groups
• Swiss Big Data User Group
• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial
companies
– Big Data is not a new technology: it's a fact;
• Deal with it  POCs in most banks, insurance companies, retailers
18
Tasty Bites of Big Data (1)
Thursday afternoon
• 13:30-15:00: Big Data Profiling
Felix Naumann (Hasso Plattner Institute)
• 15:15-16:45: Realtime Analytics
Christoph Koch (EPFL)
• 16:45-17:45: Current Trends and Challenges in Big Data
Benchmarking
Kais Sachs (SAP / Spec)
19
Tasty Bites of Big Data (2)
Friday
• 9:00 - 10:30: Structured Data in Web Search
Alon Halevy (Google)
• 10:45 - 12:15: Human Computation for Big Data
Gianluca Demartini (UNIFR)
• 13:30-15:00: Analysing and Querying Big Scientific Data
Thomas Heinis (EPFL)
• 15:00-16:30: The Evolution of Big Data Frameworks
Carlo Curino (Microsoft Research)
20
Social Event, Friday – Beer
Tasting!
Basse-Ville Fribourg / 15 CHF per Person
Everything You Always Wanted to Know
About Beer. * But Were Afraid to Ask!
18:00 @ Café du Belvédère, Grand-Rue
36
19:00 @ Fri-Mousse, Rue de la
Samaritaine 19
Limited Places, Inscription is mandatory at:
http://xr.si

More Related Content

What's hot

Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?
Anna Kuhn
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
Rupen Momaya
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
Dharmesh Tank
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
Matthew Urdan
 
Big data - What is It?
Big data - What is It?Big data - What is It?
Big data - What is It?
Nicole Aidney
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
Sandip Tipayle Patil
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
Collabor8now Ltd
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
Yaman Hajja, Ph.D.
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
Prashant Kumar Jadia
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
Kenny Huang Ph.D.
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
Shahbaz Anjam
 
Big data
Big dataBig data
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
Mohit Saini
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
Ritika Barethia
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
Richard Vidgen
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges
Experian_US
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
Bernard Marr
 
Big data and its applications
Big data and its applicationsBig data and its applications
Big data and its applications
ali easazadeh
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
Bernard Marr
 
The promise and challenge of Big Data
The promise and challenge of Big DataThe promise and challenge of Big Data
The promise and challenge of Big Data
The Marketing Distillery
 

What's hot (20)

Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 
Big data - What is It?
Big data - What is It?Big data - What is It?
Big data - What is It?
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Big data
Big dataBig data
Big data
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
Big data and its applications
Big data and its applicationsBig data and its applications
Big data and its applications
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
The promise and challenge of Big Data
The promise and challenge of Big DataThe promise and challenge of Big Data
The promise and challenge of Big Data
 

Similar to An Introduction to Big Data

Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
eGov Innovation Center
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
Sanoj Kumar
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
IIIT Allahabad
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
eXascale Infolab
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
Sandip Tipayle Patil
 
Big data
Big dataBig data
Big data
Mahmudul Alam
 
Big data seminor
Big data seminorBig data seminor
Big data seminor
berasrujana
 
Fundamentals of Big Data
Fundamentals of Big DataFundamentals of Big Data
Fundamentals of Big Data
The Wisdom Daily
 
Big data basics
Big data basicsBig data basics
Big data basics
Roel Palmaers
 
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria? Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
INACAP
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptx
PentaTech
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
Vamshikrishna Goud
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
KARTIKEY TRIPATHI
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
Sandip Tipayle Patil
 
Big Data 2.0
Big Data 2.0Big Data 2.0
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
kalai75
 
Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
Vivek Gautam
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Happiest Minds Technologies
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big Data
Sonovate
 

Similar to An Introduction to Big Data (20)

Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data
Big dataBig data
Big data
 
Big data seminor
Big data seminorBig data seminor
Big data seminor
 
Fundamentals of Big Data
Fundamentals of Big DataFundamentals of Big Data
Fundamentals of Big Data
 
Big data basics
Big data basicsBig data basics
Big data basics
 
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria? Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
Sr. Jon Ander, Internet de las Cosas y Big Data: ¿hacia dónde va la Industria?
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptx
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big Data 2.0
Big Data 2.0Big Data 2.0
Big Data 2.0
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big Data
 

More from eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
eXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
eXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
eXascale Infolab
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
eXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
eXascale Infolab
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
eXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
eXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
eXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
eXascale Infolab
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
eXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
eXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
eXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
eXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
eXascale Infolab
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
eXascale Infolab
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
eXascale Infolab
 

More from eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
 

Recently uploaded

the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
huseindihon
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
kittycrispy617
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Samuel Jackson
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
kuldeepsharmaks8120
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
44annissa
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured DataFine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
kevig
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
mohamed Ibrahim
 
Willis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdfWillis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdf
LINAT
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
saadkhan1485265
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
Jongwook Woo
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
birajmohan012
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
Alireza Kamrani
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 

Recently uploaded (20)

the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
VIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 in...
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured DataFine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
 
Willis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdfWillis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdf
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 

An Introduction to Big Data

  • 1. An Introduction to BIG DATA CUSO Seminar on Big Data Prof. Dr. Philippe Cudré-Mauroux http://exascale.info May 22, 2014 Fribourg–Switzerland 1
  • 2. 2 On the Menu Today • Big Data: Context • Big Data: Buzzwords – 3 Vs of Big Data • Big Data Landscape • Hadoop • Big Data in Switzerland
  • 3. Instant Quizz • 3 Vs of Big Data? • CAP? • Hadoop? • Spark? 3
  • 4. Exascale Data Deluge • Science – Biology – Astronomy – Remote Sensing • Web companies – Ebay – Yahoo • Financial services, retail companies governments, etc. © Wired 2009 ➡ New data formats ➡ New machines ➡ Peta & exa-scale datasets ➡ Obsolescence of traditional information infrastructures
  • 5. The Web as the Main Driver 5 © Qmee
  • 6. Big Data Central Theorem Data+Technology  Actionable Insight  $$ 6
  • 7. Big Data Buzz 7 Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs. Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.
  • 8. Big Data as a New Class of Asset • The Age of Big Data (NYTimes Feb. 11, 2012) http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the- world.html “Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.” 8
  • 9. 9
  • 10. The 3-Vs of Big Data • Volume – Amount of data • Velocity – speed of data in and out • Variety – range of data types and sources • [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high- variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" 10
  • 11. What can you do with the data • Reporting – Post Hoc – Real time • Monitoring (fine-grained) • Exploration • Finding Patterns • Root Cause Analysis • Closed-loop Control • Model construction • Prediction • … 11 © Mike Franklin
  • 12. 10 ways big data changes everything • Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/ 1. Can gigabytes predict the next Lady Gaga? 2. How big data can curb the world’s energy consumption 3. Big data is now your company’s virtual assistant 4. The future of Foursquare is data-fueled recommendations 5. How Twitter data-tracked cholera in Haiti 6. Revolutionizing Web publishing with big data 7. Can cell phone data cure society’s ills? 8. How data can help predict and create video hits 9. The new face of data visualization 10. One hospital’s embrace of big data 12
  • 13. Typical Big Data Success Story • Modeling users through Big Data – Online ads sale / placement [e.g., Facebook] – Personalized Coupons [e.g., Target] – Product Placement [Walmart] – Content Generation [e.g., NetFlix] – Personalized learning [e.g., Duolingo] – HR Recruiting [e.g., Gild] 13
  • 14. More Data => Better Answers? • Not that easy… • More Rows: Algorithmic complexity kicks in • More Columns: Exponentially more hypotheses • Another formulation of the problem: – Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound) • In other words: => Data should be a resource, not a load 14 © Mike Jordan
  • 17. Leading the Pack of Wolves: Hadoop • Google: Map/Reduce paper published 2004 • Open source variant: Hadoop • Map-reduce = high-level programming model and implementation for large-scale parallel data processing • Right now most overhyped system in CS 17
  • 18. What about Swiss Big Data? • Competitive Research Groups • Swiss Big Data User Group • Swiss companies playing catch-up – Productized Big Data systems at leading telcos & financial companies – Big Data is not a new technology: it's a fact; • Deal with it  POCs in most banks, insurance companies, retailers 18
  • 19. Tasty Bites of Big Data (1) Thursday afternoon • 13:30-15:00: Big Data Profiling Felix Naumann (Hasso Plattner Institute) • 15:15-16:45: Realtime Analytics Christoph Koch (EPFL) • 16:45-17:45: Current Trends and Challenges in Big Data Benchmarking Kais Sachs (SAP / Spec) 19
  • 20. Tasty Bites of Big Data (2) Friday • 9:00 - 10:30: Structured Data in Web Search Alon Halevy (Google) • 10:45 - 12:15: Human Computation for Big Data Gianluca Demartini (UNIFR) • 13:30-15:00: Analysing and Querying Big Scientific Data Thomas Heinis (EPFL) • 15:00-16:30: The Evolution of Big Data Frameworks Carlo Curino (Microsoft Research) 20
  • 21. Social Event, Friday – Beer Tasting! Basse-Ville Fribourg / 15 CHF per Person Everything You Always Wanted to Know About Beer. * But Were Afraid to Ask! 18:00 @ Café du Belvédère, Grand-Rue 36 19:00 @ Fri-Mousse, Rue de la Samaritaine 19 Limited Places, Inscription is mandatory at: http://xr.si