Internet Infrastructures
for Big Data
Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg
Switzerland

Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)

Internet Infrastructures for Big Data
Talk given at Verisign's Distinguished Speaker Series, 2014
Prof. Philippe Cudre-Mauroux
eXascale Infolab
http://exascale.info/

Published in: Data & Analytics

  1. Internet Infrastructures for Big Data
     Philippe Cudré-Mauroux
     eXascale Infolab, University of Fribourg, Switzerland
     VeriSign EMEA, June 26, 2014
  2. eXascale Infolab
     • New lab @ U. of Fribourg, Switzerland
     • Financed by Swiss Federal State / companies / private foundations
     • Big (non-relational) data management (Volume, Velocity, Variety) (… mostly)
  3. On the Menu Today
     • Big Data!
       – Big Data Buzz
       – 3 Big Data projects w/ XI & Verisign
  4. Exascale Data Deluge
     • Science
       – Biology
       – Astronomy
       – Remote Sensing
     • Web companies
       – Ebay
       – Yahoo
     • Financial services, retail companies, governments, etc.
     ➡ New data formats
     ➡ New machines
     ➡ Peta & exa-scale datasets
     ➡ Obsolescence of traditional information infrastructures
     © Wired 2009
  5. Big Data “Central Theorem”
     Data + Technology ➡ Actionable Insight ➡ $$
     Reporting, Monitoring, Root Cause Analysis, (User) Modeling, Prediction
  6. Big Data Buzz
     Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs.
     Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.
  7. Big Data Everywhere!
     • The Age of Big Data (NYTimes Feb. 11, 2012)
       http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
       “Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”
  8. (no text on this slide)
  9. Big Data Infrastructures
  10. The 3-Vs of Big Data
      • Volume
        – amount of data
      • Velocity
        – speed of data in and out
      • Variety
        – range of data types and sources
      • [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
      Coming up: 3 examples from XI
  11. Volume: Fixing the Hadoop Distributed File System
      • Hadoop (YARN): “cluster Operating System”
      • Often synonymous with Big Data
      • Used everywhere (… even in CH)
  12. HDFS Blocks Placement Strategy
      (Diagram: Rack 1, Rack 2)
      ● 1st replica on local node or random node
      ● 2nd replica on a different node in a different rack
      ● 3rd replica on a different node in same rack as 2nd replica
      ➡ Not hardware-aware
      ➡ Block level rather than file level
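The three placement rules on this slide can be sketched in a few lines of Python. This is only an illustration of the policy as the slide states it, not HDFS's actual implementation: the rack layout, the node names, and the `place_replicas` helper are hypothetical, and the sketch assumes at least two racks with two or more nodes in the remote rack.

```python
import random


def place_replicas(topology, writer=None, rng=random):
    """Pick three DataNodes for one block, following the slide's three rules.

    topology: dict mapping rack name -> list of node names (hypothetical layout).
    writer:   node issuing the write, or None for a client outside the cluster.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(rack_of)

    # 1st replica: the local node if the writer is a DataNode, else a random node.
    first = writer if writer in rack_of else rng.choice(all_nodes)

    # 2nd replica: a node in a different rack than the first replica.
    remote = [n for n in all_nodes if rack_of[n] != rack_of[first]]
    second = rng.choice(remote)

    # 3rd replica: a different node in the same rack as the second replica.
    same_rack = [n for n in topology[rack_of[second]] if n != second]
    third = rng.choice(same_rack)

    return [first, second, third]


# Hypothetical two-rack cluster.
topology = {
    "rack1": ["A", "B", "C"],
    "rack2": ["D", "E", "F"],
}
replicas = place_replicas(topology, writer="A", rng=random.Random(0))
print(replicas)
```

Nothing in these rules looks at disk or CPU generation, and each block is placed independently of the others in its file, which is what the slide's two ➡ remarks point out.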
  13. Solution: Hadaps File Placement
      • Assigns weights to DataNodes
        – I/O-bound jobs finish earlier on new media
        – CPU-bound jobs finish earlier on new CPUs
      • Uses lower utilization servers first
      • Moves more blocks to newer generations
      • Operates on file level
      Up to 300% performance improvement by activating all nodes
      (Diagram: blocks and weights for DataNodes A to F)
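One way to picture weight-driven placement is the sketch below, which spreads the blocks of a single file so that each DataNode's share is roughly proportional to its weight. The weights, the node names, and the greedy rule in `distribute_blocks` are assumptions made for illustration: the slide does not say how Hadaps computes weights or picks targets, and replication is ignored here.

```python
def distribute_blocks(num_blocks, weights):
    """Assign blocks 1..num_blocks of one file to DataNodes by weight.

    weights: dict mapping node name -> positive weight (hypothetical values;
             a higher weight stands for newer, faster hardware).
    Returns (placement, counts): block -> node, and node -> number of blocks.
    """
    counts = {node: 0 for node in weights}
    placement = {}
    for block in range(1, num_blocks + 1):
        # Greedy rule: pick the node whose load-to-weight ratio would be
        # lowest after receiving this block; ties are broken by node name.
        target = min(counts, key=lambda n: ((counts[n] + 1) / weights[n], n))
        counts[target] += 1
        placement[block] = target
    return placement, counts


# Hypothetical cluster: D, E, F are a newer generation than A, B, C.
weights = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
placement, counts = distribute_blocks(9, weights)
print(counts)
```

Because the decision is made per file rather than per block, every node receives some share of the file, with the newer generation holding proportionally more of it.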
  14. Velocity: Real-Time Data Management
      • Smart(er) Cities!
        – Electricity provisioning
        – Water Networks
  15. Example: Scalable Anomaly Detection
      • Detecting leaks / pipe bursts / contamination in real-time for water distribution networks
  16. Data at each Vertex!
      • Spatial + temporal statistical processing (mini-Lisas)
      • Stream processing (Storm) + Array processing (SciDB)
      (Architecture diagram labels: sensors 1053 and 1054; base stations 17, 29, 42; Peer Information Management overlay; Array Data Management System; OLTP / HYRISE / OLAP)
      (Stream Processing Flow diagram labels: Sliding-Window Average; Missing Data?; Data Gap Event; Delta Compression; Fluctuation?; Publish Value Event; Alive Event; Mini-Lisa Computations; Anomaly Detected?; Anomaly Event; Anomaly Detection Alert)
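The sensor-side steps named in the Stream Processing Flow diagram (missing-data check, sliding-window average, delta compression) can be sketched as follows. The order of the steps and the yes/no branches are an interpretation of the diagram labels, and the `SensorStream` class, its window size, and its `delta` threshold are hypothetical. The sketch is plain Python rather than a Storm topology, and it leaves out the Mini-Lisa anomaly step.

```python
from collections import deque


class SensorStream:
    """Per-sensor processing: gap detection, smoothing, delta compression."""

    def __init__(self, window=3, delta=0.5):
        self.window = deque(maxlen=window)  # most recent readings only
        self.delta = delta                  # minimum change worth publishing
        self.last_published = None

    def process(self, reading):
        # Missing data? -> emit a data gap event.
        if reading is None:
            return ("data_gap", None)

        # Sliding-window average over the last `window` readings.
        self.window.append(reading)
        avg = sum(self.window) / len(self.window)

        # Delta compression: publish only when the smoothed value moved by
        # more than `delta` since the last published value.
        if self.last_published is None or abs(avg - self.last_published) > self.delta:
            self.last_published = avg
            return ("publish_value", avg)

        # No significant fluctuation: just signal that the sensor is alive.
        return ("alive", None)


stream = SensorStream(window=3, delta=0.5)
events = [stream.process(r) for r in [10.0, 10.0, None, 10.2, 16.0]]
for event in events:
    print(event)
```

Publishing a value only on a significant fluctuation keeps traffic from each vertex low while the "alive" events still let downstream components tell a quiet sensor from a missing one.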
  17. Results (anomalies detected)
  18. Variety: Sharing Data Locally & Globally
      • 70+% of the world’s population has no or very limited access to the Web [Ahmed Shams 2013]
  19. Our Solution: ERS, the Entity Registry System
      • Three-tier solution to deploy data-powered apps
        – Flexible
          • Seamlessly reconcile entities in local / ad-hoc / global modes
        – Collaborative
          • Transactional consistency, data versioning
        – Scalable
          • Bridges, scale-out servers, tunable consistency
        – Open-source
          • https://github.com/ers-devs
  20. Ongoing Deployments
      • Entity-powered apps for the Sugar Learning Platform
      • Ambient Assisted Living of elderly persons in tropical environments
  21. Special Thanks to…
      • Vincenzo Russo, Benoit Perroud, Matt Thomas, Romain Cholat and the whole Verisign Fribourg office
      • Burt Kaliski and his team
      • Allison Mankin, Scott Hollenbeck, Debra Anderson & the Internet Infrastructures Grant team
      … for their continued support
  22. http://exascale.info
      Big thanks to the whole XI crew!
      Questions?
      VeriSign EMEA, June 26, 2014
