• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
A real time architecture using Hadoop and Storm @ FOSDEM 2013
 

A real time architecture using Hadoop and Storm @ FOSDEM 2013

on

  • 16,736 views

 

Statistics

Views

Total Views
16,736
Views on SlideShare
14,211
Embed Views
2,525

Actions

Likes
46
Downloads
433
Comments
3

30 Embeds 2,525

http://nathan.gs 941
http://dschool.co 408
http://localhost 376
http://www.scoop.it 245
http://datacrunchers.eu 153
https://twitter.com 151
http://www.dschool.co 124
http://www.linkedin.com 61
http://iitkgpsv.org 27
http://josephconventpatna.dschool.co 4
http://pinterest.com 4
http://new.datacrunchers.eu 3
http://www.iitkgpsv.org 3
http://jcc.dschool.co 3
http://kvdanapur.dschool.co 3
http://www.kred.com 3
http://www.twylah.com 2
http://mountcarmelschoolpatna.dschool.co 2
http://pcspatna.dschool.co 1
http://stxaviershighschoolpatna.dschool.co 1
http://sbe-ridgecrestcharterkern.dschool.co 1
https://www.linkedin.com 1
http://stanthonyschool.dschool.co 1
http://bwgsdoranda.dschool.co 1
http://www.surendranathcentenary.dschool.co 1
http://kred.com 1
http://snipick.com 1
http://webcache.googleusercontent.com 1
https://www.google.be 1
http://bwgsnamkum.dschool.co 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

13 of 3 previous next Post a comment

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    A real time architecture using Hadoop and Storm @ FOSDEM 2013 A real time architecture using Hadoop and Storm @ FOSDEM 2013 Presentation Transcript

    • A real-time architecture using Hadoop and Storm.
    • Speakers Nathan Bijnens Geert Van Landeghem @nathan_gs @gvanlandeghem A real-time architecture using Hadoop & Storm. 2
    • Our Vision Volume Big Data test A real-time architecture using Hadoop & Storm. 3
    • Big Data Velocity test A real-time architecture using Hadoop & Storm. 4
    • Our Vision Volume test Variety A real-time architecture using Hadoop & Storm. 5
    • CreditsNathan Marz Engineer at Backtype (now Twitter). Storm Cascalog ElephantDB manning.com/marz A real-time architecture using Hadoop & Storm. 6
    • A Data System A real-time architecture using Hadoop & Storm. 7
    • Data is more than Information Not all information is equal. Some information is derived from other pieces of information. A real-time architecture using Hadoop & Storm. 8
    • Data is more than Information Eventually you will reach the most This is the information you hold true, simple because it exists. A real-time architecture using Hadoop & Storm. 9
    • EventsEverything we do generates events:- Pay with Credit Card- Commit to Git- Click on a webpage- Tweet A real-time architecture using Hadoop & Storm. 10
    • Events - Before Events used to manipulate the master data. A real-time architecture using Hadoop & Storm. 11
    • Events - After Today, events are the master data. A real-time architecture using Hadoop & Storm. 12
    • Data System everything. A real-time architecture using Hadoop & Storm. 13
    • Events Data is Immutable A real-time architecture using Hadoop & Storm. 14
    • Events Data is Time Based A real-time architecture using Hadoop & Storm. 15
    • Capturing change traditionallyPerson Location Person LocationNathan Antwerp Nathan GhentGeert Dendermonde Geert DendermondeJohn Ghent John Ghent A real-time architecture using Hadoop & Storm. 16
    • Capturing changePerson Location Time Person Location TimeNathan Antwerp 2005-01-01 Nathan Antwerp 2005-01-01Geert Dendermonde 2011-10-08 Geert Dendermonde 2011-10-08John Ghent 2010-05-02 John Ghent 2010-05-02 Nathan Ghent 2013-02-03 A real-time architecture using Hadoop & Storm. 17
    • Query The data you query is often transformed, aggregated, ... A real-time architecture using Hadoop & Storm. 18
    • Query Query = function ( data ) A real-time architecture using Hadoop & Storm. 19
    • Number of people living in each city. Person Location Time Location Count Nathan Antwerp 2005-01-01 Ghent 2 Geert Dendermonde 2011-10-08 Dendermonde 1 John Ghent 2010-05-02 Nathan Ghent 2013-02-03 A real-time architecture using Hadoop & Storm. 20
    • Query All Data Query A real-time architecture using Hadoop & Storm. 21
    • Query: Precompute All Data Precomputed View Query A real-time architecture using Hadoop & Storm. 22
    • Layered Architecture Batch Layer Speed Layer Serving Layer A real-time architecture using Hadoop & Storm. 23
    • Layered Architecture Cassandra QueryIncoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 24
    • Batch Layer A real-time architecture using Hadoop & Storm. 25
    • Batch LayerIncoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 26
    • Batch Layer Unrestrained computation. A real-time architecture using Hadoop & Storm. 27
    • Batch Layer Horizontal scalable. A real-time architecture using Hadoop & Storm. 28
    • Batch Layer High Latency. matter. A real-time architecture using Hadoop & Storm. 29
    • Batch Layer Stores master copy of data set... append only. A real-time architecture using Hadoop & Storm. 30
    • Batch Layer A real-time architecture using Hadoop & Storm. 31
    • Batch: View generation View #1 Master Dataset View #2 MapReduce View #3 A real-time architecture using Hadoop & Storm. 32
    • MapReduce 1. Take a large problem and divide it into sub-problems … MAP 2. Perform the same function on all sub-problems … DoWork() DoWork() DoWork() 3. Combine the output from all sub-problems REDUCE … Output A real-time architecture using Hadoop & Storm. 33
    • Batch View Database Read only database. No random writes required. A real-time architecture using Hadoop & Storm. 34
    • Batch View DatabaseElephantDBSplout A real-time architecture using Hadoop & Storm. 35
    • Batch Layer Just a few hours of data. Not yet Data absorbed into Batch Views absorbed. Time Now A real-time architecture using Hadoop & Storm. 36
    • Speed Layer A real-time architecture using Hadoop & Storm. 37
    • Overview CassandraIncoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 38
    • Speed Layer Stream processing. A real-time architecture using Hadoop & Storm. 39
    • Speed Layer Continuous computation. A real-time architecture using Hadoop & Storm. 40
    • Speed Layer Transactional. A real-time architecture using Hadoop & Storm. 41
    • Speed Layer Storing a limited window of data. Compensating for the last few hours of data. A real-time architecture using Hadoop & Storm. 42
    • Speed Layer All the complexity is isolated in the Speed layer auto- corrected. A real-time architecture using Hadoop & Storm. 43
    • CAPYou have a choice between: Availability - Queries are eventual consistent. Consistency - Queries are consistent. A real-time architecture using Hadoop & Storm. 44
    • Eventual accuracySome algorithms are hard to implement in real time. For those cases we could estimate the results. A real-time architecture using Hadoop & Storm. 45
    • Speed Layer Real Time View 1Incoming Data Real Time View 2 A real-time architecture using Hadoop & Storm. 46
    • StormMessage passing.Distributed processing.Horizontally scalable.Incremental algorithms.Fast.Data in motion. A real-time architecture using Hadoop & Storm. 47
    • StormMessage passing.Distributed processing.Horizontally scalable.Incremental algorithms.Fast.Data in motion. A real-time architecture using Hadoop & Storm. 48
    • Storm Nimbus Zookeeper Supervisor Supervisor Supervisor Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Node Worker Node Worker Node A real-time architecture using Hadoop & Storm. 49
    • StormTupleStream A real-time architecture using Hadoop & Storm. 50
    • StormSpoutBolt A real-time architecture using Hadoop & Storm. 51
    • StormGrouping A real-time architecture using Hadoop & Storm. 52
    • Speed Layer ViewsThe views are stored in Read & Write database.- Cassandra- Hbase- MongoDB- MySQL- ElasticSearch-Much more complex than a read only view. A real-time architecture using Hadoop & Storm. 53
    • Serving Layer A real-time architecture using Hadoop & Storm. 54
    • Overview Cassandra QueryIncoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 55
    • Serving Layer This layer queries the Batch & Real Time views and merges it. A real-time architecture using Hadoop & Storm. 56
    • Serving Layer Batch Views Merge Real Time Views A real-time architecture using Hadoop & Storm. 57
    • Overview A real-time architecture using Hadoop & Storm. 58
    • Overview Cassandra QueryIncoming Data Hadoop Elephant DB A real-time architecture using Hadoop & Storm. 59
    • Lambda ArchitectureCan discard any view, batch and real time, and justrecreate everything from the master data.Mistakes are corrected via recomputation.- Write bad data? Remove the data & recompute.- Bug in view generation? Just recompute the view.Data storage is highly optimized. A real-time architecture using Hadoop & Storm. 60
    • Recommendations A real-time architecture using Hadoop & Storm. 61
    • Serialization & Schema Catch errors as quickly as they happen. Validation on write vs on read. A real-time architecture using Hadoop & Storm. 62
    • Serialization & Schema CSV is actually a serialization language that is just poorly defined. A real-time architecture using Hadoop & Storm. 63
    • Serialization & Schema Use a format with a schema.- Thrift- Avro- Protobuffers A real-time architecture using Hadoop & Storm. 64
    • Questions? What are your needs? @nathan_gs & @gvanlandeghem A real-time architecture using Hadoop & Storm. 65
    • DataCrunchersWe enable companies in envisioning, defining andimplementing a data strategy.A one-stop-shop for all your Big Data needs.The first Big Data Consultancy agency in Belgium. A real-time architecture using Hadoop & Storm. 66
    • Jobs We are hiring. jobs@datacrunchers.eu A real-time architecture using Hadoop & Storm. 67