A real-time architecture using Hadoop and Storm @ BigData.be

  • 44 times as much data in the next decade; 15 ZB in 2015. Data silos (ERP, CRM, …). Customers: Trimble (3 TB in their database system), Truvo (changing an index takes 24 hours). Traditional systems cannot handle this volume. How much data do you have? Examples: turn 12 terabytes of tweets created each day into improved product sentiment analysis; convert 350 billion annual meter readings to better predict power consumption.
  • Real time: time-sensitive decision making, such as fraud detection, energy allocation, marketing campaigns, and market transactions. Solution: real-time solutions in combination with batch (Hadoop), plus NoSQL systems.
  • Structured versus unstructured: 80% of data is unstructured. A key drawback of traditional relational database systems is that they are not good at handling variable data. The alternative: a flexible data model. Word documents, email, photos, text, video, … What are your needs regarding variety? The end result: bringing structure into unstructured data. Examples: monitor hundreds of live video feeds from surveillance cameras to target points of interest; exploit the 80% data growth in images, video and documents to improve customer satisfaction.
  • We live in an ever-changing world. Agility is an organization's ability to quickly modify its computing systems in response to changing business requirements: change control (adding or changing features of a system between releases) and customization of specific data services. How agile are you? How to be agile: culture, systems, schema flexibility.
  • It is easier to store all data in a cost-effective way. Compare this to the data warehouse (DWH) world.
  • The number of followers on Twitter = all follow and unfollow events combined.
  • Data = event
  • Only CD, no more CRUD. Information might of course change. Fault tolerance.
  • Allows state regeneration, e.g. what was my bank balance on 1 May 2005?
  • Treating queries as pure functions that take all data as input is the most general formulation. Different functions may look at different portions of the data and aggregate information in different ways.
  • Too slow; might be petabyte scale
  • The batch layer can calculate anything (given enough time).
  • Doesn’t have to be Hadoop. What matters here is a distributed file system combined with a processing framework.
  • Source: PolybasePass2012.pptx
  • In some circumstances.
  • Consistency (all nodes see the same data at the same time); availability (a guarantee that every request receives a response about whether it was successful or failed); partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system). See http://codahale.com/you-cant-sacrifice-partition-tolerance/
  • E.g. unique counts
  • Nimbus: manages the cluster. Worker node: the Supervisor manages workers and restarts them if needed. Worker: a physical JVM process that executes tasks (those are spread evenly across the workers). Task: each runs in its own thread, is the actual bolt or spout, and processes the stream.
  • Tuple: a named list of values, dynamically typed. Stream: a sequence of tuples.
  • Spout: a source of streams, sometimes replayable. Bolt: stream transformations, with at least 1 input stream and 0 to * output streams.
  • http://www.quora.com/Apache-Hadoop/What-is-the-advantage-of-writing-custom-input-format-and-writable-versus-the-TextInputFormat-and-Text-writable/answer/Eric-Sammer?srid=PU&st=ns
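
The notes above describe data as immutable events, the Twitter follower count as the combination of all follows and unfollows, and queries as pure functions over all data that also allow state regeneration. A minimal Python sketch of that idea follows. It is an illustration only, not code from the presentation: the names `FollowEvent`, `followers_of`, `follower_count`, and the sample account `bigdata_be` are hypothetical, and a real batch layer would run such a function over a distributed file system rather than an in-memory tuple.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)  # frozen: events are never updated, only appended
class FollowEvent:
    """Hypothetical event record: one follow or unfollow action."""
    timestamp: datetime
    follower: str
    followee: str
    follows: bool  # True = follow, False = unfollow


def followers_of(events: Iterable[FollowEvent], followee: str,
                 as_of: Optional[datetime] = None) -> FrozenSet[str]:
    """Pure function over all events: replay them in time order.

    Passing `as_of` regenerates the state as it was at that moment.
    """
    current = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.followee != followee:
            continue
        if as_of is not None and event.timestamp > as_of:
            continue
        if event.follows:
            current.add(event.follower)
        else:
            current.discard(event.follower)
    return frozenset(current)


def follower_count(events: Iterable[FollowEvent], followee: str,
                   as_of: Optional[datetime] = None) -> int:
    """Follower count = all follows and unfollows combined."""
    return len(followers_of(events, followee, as_of))


# Assumed sample data, for illustration only.
EVENTS = (
    FollowEvent(datetime(2013, 1, 1), "alice", "bigdata_be", True),
    FollowEvent(datetime(2013, 1, 2), "bob", "bigdata_be", True),
    FollowEvent(datetime(2013, 1, 3), "alice", "bigdata_be", False),
    FollowEvent(datetime(2013, 1, 4), "carol", "bigdata_be", True),
)

if __name__ == "__main__":
    print(follower_count(EVENTS, "bigdata_be"))
    print(follower_count(EVENTS, "bigdata_be", as_of=datetime(2013, 1, 2)))
```

Because the event log is never modified, the same function answers both "how many followers now?" and "how many followers on a given date?". Replaying everything on each query is what the notes call too slow at petabyte scale, which is why the results would be precomputed in batch.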

Presentation Transcript

  • Volume
  • ……DoWork() DoWork() DoWork()…