Your SlideShare is downloading. ×
Twitter Storm: Ereignisverarbeitung in Echtzeit
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Twitter Storm: Ereignisverarbeitung in Echtzeit

2,264
views

Published on

Hadoop bzw. MapReduce eignet sich sehr gut, um grosse Datenmengen effizient verarbeiten zu können. Eine Verarbeitung in Hadoop ist jedoch immer batch-orientiert, d.h. es braucht eine gewisse Zeit, bis …

Hadoop bzw. MapReduce eignet sich sehr gut, um grosse Datenmengen effizient verarbeiten zu können. Eine Verarbeitung in Hadoop ist jedoch immer batch-orientiert, d.h. es braucht eine gewisse Zeit, bis ein Resultat zur Verfügung steht. Für gewisse Anwendungsfälle kann dies ausreichend sein, andere Anwendungsfälle benötigen jedoch Daten in Echtzeit. Für die Lösung solcher Problemstellungen, gibt es seit einigen Jahren sogennante Complex-Event Processing (CEP) Systeme. Diese lassen es zu, direkt auf dem eingehenden Ereignisstrom Abfragen/Berechungen und Verabeitugnen durchzuführen, ohne diese Informationen erst in einer Datenbank abspeichern zu müssen.

Twitter Storm ist ein Open Source Framework für die Verarbeitung von Datenströmem in Echtzeit. Es wird auch als "Hadoop für Echtzeitverarbeitung" bezeichnet, wobei das Programmiermodell sich doch stark von Hadoop unterscheidet. Storm ist mehrheitlich in Clojure geschrieben und unterstützt Java direkt. Die grundlegenden Bausteine, die Spouts und die Bolts können sowohl in Java wie auch in anderen Programmiersprachen implementiert werden.
Diese Session präsentiert, wie man mit Hilfe von Twitter Storm Applikaitonen implementieren kann und zeigt entsprechende Anwendungsfälle, welche sich mit Twitter Storm lösen lassen. Zudem wird diskutiert, wie sich Storm mit Hadoop und NoSQL sinnvoll kombinieren lässt.

Published in: Technology, Education

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,264
On Slideshare
0
From Embeds
0
Number of Embeds
18
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. 2013 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
 WELCOME Twitter Storm: Ereignisverarbeitung in Echtzeit Guido Schmutz
 Java Forum Stuttgart 2013 4.7.2013 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 1
  • 2. 2013 © Trivadis Trivadis ist führend bei der IT-Beratung, der Systemintegration, dem Solution-Engineering und der Erbringung von IT-Services mit Fokussierung auf und Technologien im D-A-CH-Raum. Unsere Leistungen erbringen wir aus den strategischen Geschäftsfeldern: Trivadis Services übernimmt den korrespondierenden Betrieb Ihrer IT Systeme. Unser Unternehmen 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit B E T R I E B 2
  • 3. 2013 © Trivadis Mit über 600 IT- und Fachexperten bei Ihnen vor Ort 3 11 Trivadis Niederlassungen mit
 über 600 Mitarbeitenden 200 Service Level Agreements Mehr als 4'000 Trainingsteilnehmer Forschungs- und Entwicklungs- budget: CHF 5.0 / EUR 4 Mio. Finanziell unabhängig und
 nachhaltig profitabel Erfahrung aus mehr als 1'900 Projekten pro Jahr bei über 800 Kunden Stand 12/2012 Hamburg Düsseldorf Frankfurt Freiburg München Wien Basel ZürichBern Lausanne 3 Stuttgart 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 3
  • 4. 2013 © Trivadis Guido Schmutz •  Working for Trivadis for more than 16 years •  Oracle ACE Director for Fusion Middleware and SOA •  Co-Author of different books •  Consultant, Trainer Software Architect for Java, Oracle, SOA, EDA, BigData und FastData •  Member of Trivadis Architecture Board •  Technology Manager @ Trivadis •  More than 20 years of software development 
 experience •  Contact: guido.schmutz@trivadis.com •  Blog: http://guidoschmutz.wordpress.com •  Twitter: gschmutz 4.7.2013 4 Twitter Storm: Ereignisverarbeitung in Echtzeit
  • 5. 2013 © Trivadis Big Data Definition (Gartner et al) 14.02.2013 Big Data 4 Sales 5 Velocity Tera-, Peta-, Exa-, Zetta-, Yota- bytes and constantly growing “Traditional” computing in RDBMS 
 is not scalable enough. 
 We search for “linear scalability” “Only … structured information 
 is not enough” – “95% of produced data in unstructured” Characteristics of Big Data: Its Volume, Velocity and Variety in combination + Veracity (IBM) - information uncertainty + Time to action ? – Big Data + Event Processing = Fast Data
  • 6. 2013 © Trivadis 14.02.2013 Big Data 4 Sales 6
  • 7. 2013 © Trivadis Velocity 24. April 2013 Big Data und Fast Data 7 §  Velocity requirement examples: §  Recommendation Engine §  Predictive Analytics §  Marketing Campaign Analysis §  Customer Retention and Churn Analysis §  Social Graph Analysis §  Capital Markets Analysis §  Risk Management §  Rogue Trading §  Fraud Detection §  Retail Banking §  Network Monitoring §  Research and Development
  • 8. 2013 © Trivadis Agenda 1.  What is Twitter Storm 2.  Main Concepts 3.  Trident 4.  BigData and FastData combined – Lambda Architecture 5.  Summary 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 8
  • 9. 2013 © Trivadis What is Twitter Storm ? A platform for doing analysis on streams of data as they come in, so you can react to data as it happens. •  A highly distributed real-time computation system •  Provides general primitives to do real-time computation •  To simplify working with queues & workers •  scalable and fault-tolerant •  complementary to Hadoop Originated at Backtype, acquired by Twitter in 2011 Open Sourced late 2011 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 9
  • 10. 2013 © Trivadis Hadoop vs. Storm 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 10 Storm Real-time processing Topologies run forever Stateless nodes Scalable Guarantees no data loss Open Source => Fast, reactive, real time procesing Hadoop Batch processing Jobs runs to completion Stateful nodes Scalable Guarantees no data loss Open Source => big batch processing = !=
  • 11. 2013 © Trivadis Idea: Simplify dealing with queue & workers 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 11 Scaling is painful – queue partitioning & worker deploy Operational overhead – worker failures & queue backups No guarantees on data processing
  • 12. 2013 © Trivadis What does Storm provide? §  At least once message processing §  Fault-tolerant §  Horizontal scalability §  No intermediate queues §  Less operational overhead §  „Just works“ §  Broad set of use cases §  Stream processing §  Continous computation §  Distributed RPC 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 12
  • 13. 2013 © Trivadis Agenda 1.  What is Twitter Storm 2.  Main Concepts 3.  Trident 4.  BigData and FastData combined – Lambda Architecture 5.  Summary 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 13
  • 14. 2013 © Trivadis Main Concepts – Shown based on Twitter example Show the tweets related to Java Forum (use hashtag #JFS2013) and show the count of the hashtags used 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 14 not showing #JFS2013
  • 15. 2013 © Trivadis Main Concepts - Streams Stream •  an unbounded sequence of tuples •  core abstraction in Storm •  Defined with a schema that names the 
 fields in the tuple •  Value must be serializable •  Every stream has an id Topology •  graph where each node is a spout or bolt •  edges indicating which bolt subscribes 
 to which streams 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 15 Source of stream A Source of stream B Subscribes: A Emits: C Subscribes: A Emits: D Subscribes: A & B Emits: - Subscribes: C & D Emits: - T T T T T T T T
  • 16. 2013 © Trivadis Worker •  executes a subset of a topology, may run one or more executors for one or more components Executor •  thread spawned by worker Task •  performs the actual data processing Main Concepts – Worker, Executor, Task 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 16 Worker Process Task Task Task Task
  • 17. 2013 © Trivadis Main Concepts - Spout A spout is the component which acts as the source of streams in a topology Generally spouts read tuples from an external source and emit them into the topology Spouts can emit more than one stream Step 1: Use a Spout to subscribe to the Twitter Filter Streaming API to get all the tweets containing the hashtag #JFS2013 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 17 Twitter Stream Tweets tweet Tweet
 Stream
  • 18. 2013 © Trivadis Main Concepts - Spout 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 18 Twitter Stream Tweets tweet Tweet
 Stream By calling this method, Storm requests that the Spout emits tuples to the output collector Called when a task for this component is initialized Used to declare streams and their schmeas. it is possible to declare serveral streams and specifying the one to use when emiting tuples
  • 19. 2013 © Trivadis Bolt is doing the processing of the message •  filtering, functions, aggregations, joins, talking to databases…. •  Can do simple stream transformations •  Complex transformations often require multiple steps and therefore multiple bolts Bolts can emit one or more streams Step 2: Use a bolt to retrieve the hashtags for each tweet Main Concepts - Bolt 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 19 Twitter Stream Tweets tweet Hashtag
 Splitter Tweet
 Stream hashtag
  • 20. 2013 © Trivadis Main Concepts - Bolt 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 20 Twitter Stream Tweets tweet Hashtag
 Splitter Tweet
 Stream hashtag Execute is called for each tuple arriving on the input stream Declares the output fields for the component and is called when bolt is created
  • 21. 2013 © Trivadis Main Concepts - Bolt Step 3: Use a bolt to count the occurences of the hashtag in Redis NoSQL 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 21 Twitter Stream Tweets tweet Hashtag
 Splitter Tweet
 Stream hashtag Hashtag
 Counter
  • 22. 2013 © Trivadis Main Concepts - Bolt 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 22 Twitter Stream Tweets tweet Hashtag
 Splitter Tweet
 Stream hashtag Hashtag
 Counter Prepare is called when bolt is created Execute is called for each tuple arriving on the input stream
  • 23. 2013 © Trivadis Main Concepts - Topology Step 4: Setup the topology with the necessary groupings (distribution) Each Spout or Bolt are running N instances in parallel 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 23 Twitter Stream Tweets Hashtag
 Splitter Tweet
 Stream Hashtag
 Counter Hashtag
 Splitter Hashtag
 Counter Shuffle Fields Shuffle grouping is random grouping Fields grouping is grouped by value, such that equal value results in equal task All grouping replicates to all tasks Global grouping makes all tuples go to one task None grouping makes bolt run in the same thread as bold/spout it subscribes to Direct grouping producer (task that emits) controls which consumer will receive
  • 24. 2013 © Trivadis Main Concepts - Topology 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 24 Create Stream called „tweet-stream“ Run this bolt by 2 workers Run this spout by 1 worker Subscribe to stream „tweet-stream“ using shuffle grouping Create stream called „hashtag-splitter“ Subscribe to stream „hashtag-splitter“ using fields grouping on field „hashtag“
  • 25. 2013 © Trivadis Sample – How does it work ? 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 25 Twitter Stream Mein Präsentation „Twitter Storm: Ereignisverarbeitung in Echtzeit“ Morgen um 15:35 am Java Forum Stuttgart #JFS2013 #storm #fastdata CU there! Hashtag
 Splitter Tweet
 Stream Hashtag
 Counter Hashtag
 Splitter Hashtag
 Counter
  • 26. 2013 © Trivadis Sample – How does it work ? 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 26 Twitter Stream Mein Präsentation „Twitter Storm: Ereignisverarbeitung in Echtzeit“ Morgen um 15:35 am Java Forum Stuttgart #JFS2013 #storm #fastdata CU there! Hashtag
 Splitter Tweet
 Stream Hashtag
 Counter Hashtag
 Splitter Hashtag
 Counter storm fastdata jsf2013
  • 27. 2013 © Trivadis Sample – How does it work ? 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 27 Twitter Stream Mein Präsentation „Twitter Storm: Ereignisverarbeitung in Echtzeit“ Morgen um 15:35 am Java Forum Stuttgart #JFS2013 #storm #fastdata CU there! Hashtag
 Splitter Tweet
 Stream Hashtag
 Counter Hashtag
 Splitter Hashtag
 Counter storm fastdata jfs2013 INCR jfs2013 INCR storm INCR fastdata storm = 1 fastdata =1 jfs2013= 1
  • 28. 2013 © Trivadis Agenda 1.  What is Twitter Storm 2.  Main Concepts 3.  Trident 4.  BigData and FastData combined – Lambda Architecture 5.  Summary 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 28
  • 29. 2013 © Trivadis What is Trident? •  High-Level abstraction on top of storm •  Makes it easier to build topologies •  Core data model is the stream •  Processed as a series of batches •  Stream is partitioned among nodes in cluster •  5 kinds of operations in Trident •  Operations that apply locally to each partition and cause no network transfer •  Repartitioning operations that don‘t change the contents •  Aggregation operations that do network transfer •  Operations on grouped streams •  Merges and Joins 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 29
  • 30. 2013 © Trivadis What is Trident? Supports •  Joins •  Aggregations •  Grouping •  Functions •  Filters Similar to Hadoop and Pig/Cascading 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 30
  • 31. 2013 © Trivadis Trident Concepts - Topology 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 31 Twitter Stream Tweets tweet Hashtag
 Splitter Tweet
 Stream hashtag Hashtag
 Normalizer Persistent
 Aggregate hashtag groupBylocal Bolt Bolt
  • 32. 2013 © Trivadis Trident Concepts - Function •  takes in a set of input fields and emits zero or more tuples as output •  fields of the output tuple are appended to the original input tuple in the stream •  If a function emits no tuples, the original input tuple is filtered out •  Otherwise the input tuple is duplicated for each output tuple 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 32
  • 33. 2013 © Trivadis Agenda 1.  What is Twitter Storm 2.  Main Concepts 3.  Trident 4.  BigData and FastData combined – Lambda Architecture 5.  Summary 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 33
  • 34. 2013 © Trivadis Lambda Architecture 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 34 Immutable data Batch View Query Data Stream Realtime View Incoming Data Serving Layer Speed Layer Batch Layer A B C D E F G
  • 35. 2013 © Trivadis Lambda Architecture 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 35 Speed Layer Precompute Views query Source: Marz, N. & Warren, J. (2013) Big Data. Manning. Batch Layer Precomputed information All data Incremented information Process stream Incoming Data Batch recompute Realtime increment Serving Layer batch view batch view real time view real time view Merge
  • 36. 2013 © Trivadis Lambda Architecture 24. April 2013 Big Data und Fast Data 36 one possible product/framework mapping Speed Layer Precompute Views query Batch Layer Precomputed information All data Incremented information Process stream Incoming Data Batch recompute Realtime increment Serving Layer batch view batch view real time view real time view Merge
  • 37. 2013 © Trivadis Agenda 1.  What is Twitter Storm 2.  Main Concepts 3.  Trident 4.  BigData and FastData combined – Lambda Architecture 5.  Summary 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 37
  • 38. 2013 © Trivadis Summary Express real-time processing naturally Not a Hadoop/MapReduce competitor Supports other languages Hard to debug Object Serialization Didn‘t cover §  Reliability through guaranteed message processing §  Distributed RPC §  Storm cluster setup and deploy 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 38
  • 39. 2013 © Trivadis Resources Website (http://storm-project.net/) Wiki (https://github.com/nathanmarz/storm/wiki) Doc (https://github.com/nathanmarz/storm/wiki/Documentation) Storm-starter project (https://github.com/nathanmarz/storm-starter) Mailing list (http://groups.google.com/group/storm-user) 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 39
  • 40. 2013 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
 Thank You! Trivadis AG Guido Schmutz
 guido.schmutz@trivadis.com 4.7.2013 Twitter Storm: Ereignisverarbeitung in Echtzeit 40