Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
In this track we will introduce the Storm framework, explain some design concepts and considerations, and show some real-world examples of how to use it to process large amounts of data in real time in a distributed environment. We will describe how this solution can be scaled very easily as more data needs to be processed.
We will explain all you need to know to get started with Storm, along with some tips on how to get your Spouts, Bolts and Topologies up and running in the cloud.
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013 (Sonal Raj)
This talk briefly outlines the Storm framework and the Neo4J graph database, and how to use them together to perform computations on complex graphs in Python using the Petrel and Py2neo packages. This talk was given at PyCon India 2013.
Apache Storm 0.9 basic training - Verisign (Michael Noll)
Apache Storm 0.9 basic training (130 slides) covering:
1. Introducing Storm: history, Storm adoption in the industry, why Storm
2. Storm core concepts: topology, data model, spouts and bolts, groupings, parallelism
3. Operating Storm: architecture, hardware specs, deploying, monitoring
4. Developing Storm apps: Hello World, creating a bolt, creating a topology, running a topology, integrating Storm and Kafka, testing, data serialization in Storm, example apps, performance and scalability tuning
5. Playing with Storm using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/09/15/apache-storm-training-deck-and-tutorial/
Many thanks to the Twitter Engineering team (the creators of Storm) and the Apache Storm open source community!
Talk held at the FrOSCon 2013 on 24.08.2013 in Sankt Augustin, Germany
Agenda:
- Why Twitter Storm?
- What is Twitter Storm?
- What to do with Twitter Storm?
Learning Stream Processing with Apache Storm (Eugene Dvorkin)
Over the last couple of years, Apache Storm has become a de-facto standard for developing real-time analytics and complex event processing applications. Storm lets you tackle real-time data processing challenges the same way Hadoop enables batch processing of Big Data. Storm enables companies to have "Fast Data" alongside "Big Data". Some use cases where Storm can be applied are fraud detection, operational intelligence, machine learning, ETL, analytics, etc.
In this meetup, Eugene Dvorkin, Architect @WebMD and NYC Storm User Group organizer will teach Apache Storm and Stream Processing fundamentals. While this meeting is geared toward new Storm users, experienced users may find something interesting as well.
The following topics will be covered:
• Why use Apache Storm?
• Common use cases
• Storm Architecture - components, concepts, topology
• Building simple Storm topology with Java and Groovy
• Trident and micro-batch processing
• Fault tolerance and guaranteed message delivery
• Running and monitoring Storm in production
• Kafka
• Storm at WebMD
• Resources
Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.
There is also a segue into an example project that uses Storm and Cassandra to implement a scalable reactive web crawler.
http://github.com/tjake/stormscraper
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on Yarn for Yahoo (Although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on Yarn for Yahoo.
These slides are from a brief seminar I gave in a Ph.D. exam, "Perspectives in Parallel Computing" (held by Prof. Marco Danelutto), at the University of Pisa (Italy).
They are a rapid introduction to Apache Storm and how it relates to classical algorithmic skeleton parallel frameworks.
Storm-on-YARN: Convergence of Low-Latency and Big Data (DataWorks Summit)
Hadoop plays a central role for Yahoo! to provide personalized experiences for our users and create value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. To enable this convergence, we have developed Storm-on-YARN, which allows Storm streaming/microbatch applications and Hadoop batch applications to be hosted in a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. In Storm-on-YARN, YARN is used to launch the Storm application master (Nimbus) and enable Nimbus to request resources for Storm workers (Supervisors). The YARN resource manager and the Storm scheduler work together to support multi-tenancy and high availability. HDFS enables Storm to achieve higher availability of Nimbus itself. We are introducing Hadoop-style security into Storm through JAAS authentication (Kerberos and Digest). Storm servers (Nimbus and DRPC) will be configured with authorization plugins for access control and audit. The security context enables Storm applications to access authorized datasets only (including those created by Hadoop applications). Yahoo! is making our contributions to Storm and YARN available as open source. We will work with industry partners to foster the convergence of low-latency processing and big data.
Real-Time Big Data at In-Memory Speed, Using Storm (Nati Shalom)
Storm, a popular framework from Twitter, is used for real-time event processing. The challenge presented is how to manage the state of your real-time data processing at all times. In addition, you need Storm to integrate with your batch processing system (such as Hadoop) in a consistent manner.
This session will demonstrate how to integrate Storm with an in-memory database/grid, and explore various strategies for integrating the data grid with Hadoop and Cassandra, seamlessly. By achieving smooth integration with consistent management, you will be able to easily manage all the tiers of your Big Data stack in a consistent and effective way.
See more at: http://nosql2013.dataversity.net/sessionPop.cfm?confid=74&proposalid=5526
Developing Java Streaming Applications with Apache Storm (Lester Martin)
Apache Storm, http://storm.apache.org, is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. During this presentation, a simple Java-based streaming application will be built from scratch!
Code examples can be found at https://github.com/lestermartin/streaming-exploration.
Code as Data workshop: Using source{d} Engine to extract insights from git re... (source{d})
This workshop will teach you the basic git concepts (such as references, commits, and blobs) and how they can be mapped onto a series of relational tables.
Once we understand the basic concepts we will show how language classification and program parsing are available as SQL custom functions, how to use them correctly, and how to obtain aggregate results with `GROUP BY` and friends. We will discuss Universal Abstract Syntax Trees and how some advanced checks can be done on top of this language-agnostic structure. Running these checks at scale requires some extra knowledge and we'll discuss the challenges and their possible solutions.
To finish, we will also discuss how the information in git repositories encodes a form of social network which can be used to better understand the engineering processes of a given organization.
Matthew Russell's "Unleashing Twitter Data for Fun and Insight" presentation from Strata 2011. See http://strataconf.com/strata2011/public/schedule/detail/17714 for an overview of the talk.
How to access & analyse Twitter big data. Full working example using Storm and RedStorm in Ruby & JRuby. Code on github https://github.com/colinsurprenant/tweitgeist and live demo http://tweitgeist.needium.com/
Slides on the new Janus Lua plugin, as presented at the Real-time Communications devroom at FOSDEM2018. Describes in detail what the challenges were, and how we addressed them, with a few real examples at the end (including a demo written specifically for this session).
An introduction to git, assuming very little. I introduce some core concepts, the commands used to work with them, and briefly touch on Github flow (interpreted in quite a specific way) and recap the commands used for that.
The examples could be used as exercises for a class learning git live, with a bit of fleshing out.
Process the Twitter stream using Storm & Redstorm with Ruby & JRuby. Full working demo, code on github https://github.com/colinsurprenant/tweitgeist and live demo http://tweitgeist.needium.com/
Behind this clickbait title lies a reality for part of our industry.
for/while loops are iterative constructs offering the lowest level of abstraction. Modern languages still provide these constructs today because they remain useful in a few exceptional cases.
Over the last 10 years, new iteration constructs have appeared that offer a higher level of abstraction: better productivity, fewer lines of code, and therefore fewer potential bugs (which we will describe).
We will start from simple code examples and show their equivalents using these new constructs, then look at the advantages (and drawbacks?). The examples will be in JavaScript but are of course applicable to other languages (Java, C#, Python, Ruby, C++, Scala, Go, Rust, ...).
This article is about using Serverless platform OpenWhisk. The example shows how to do auto retweeting in Python to illustrate an application of serverless approach. Originally published in October 2017 edition of Open Source For You magazine - shared under CC BY SA-3.0 License.
10. What is Storm?
● Is a highly distributed realtime computation system.
● Provides general primitives to do realtime computation.
● Can be used with any programming language.
● It's scalable and fault-tolerant.
11. Example - Main Concepts
https://github.com/geisbruch/strata2012
12. Main Concepts - Example
"Count tweets related to strataconf and show hashtag totals."
13. Main Concepts - Spout
● The first thing we need is to read all tweets related to strataconf from Twitter
● A Spout is a special component that is a source of messages to be processed.
[Diagram: the Spout reads tweets and emits them as tuples]
14. Main Concepts - Spout
public class ApiStreamingSpout extends BaseRichSpout {
    SpoutOutputCollector collector;
    TwitterReader reader;

    public void nextTuple() {
        Tweet tweet = reader.getNextTweet();
        if (tweet != null)
            collector.emit(new Values(tweet));
    }

    public void open(Map conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        reader = new TwitterReader(conf.get("track"), conf.get("user"), conf.get("pass"));
        this.collector = collector;
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("tweet"));
    }
}
15. Main Concepts - Bolt
● For every tweet, we find hashtags
● A Bolt is a component that does the processing.
[Diagram: the Spout emits a stream of tweet tuples to the HashtagsSplitter, which emits a stream of hashtags]
16. Main Concepts - Bolt
public class HashtagsSplitter extends BaseBasicBolt {
    public void execute(Tuple input, BasicOutputCollector collector) {
        Tweet tweet = (Tweet)input.getValueByField("tweet");
        for(String hashtag : tweet.getHashTags()){
            collector.emit(new Values(hashtag));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("hashtag"));
    }
}
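The splitter relies on tweet.getHashTags(), whose implementation is not shown in the deck. A minimal sketch of what such a helper might look like, assuming hashtags are plain #word tokens (the HashtagExtractor class and its regex are illustrative, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one way tweet.getHashTags() could be implemented.
public class HashtagExtractor {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Returns the hashtags found in the tweet text, without the '#' prefix.
    public static List<String> extract(String tweetText) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweetText);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extract("This tweet has #hello #world hashtags!!!"));
        // [hello, world]
    }
}
```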
17. Main Concepts - Bolt
public class HashtagsCounterBolt extends BaseBasicBolt {
    private Jedis jedis;

    public void execute(Tuple input, BasicOutputCollector collector) {
        String hashtag = input.getStringByField("hashtag");
        jedis.hincrBy("hashs", hashtag, 1);  // increment this hashtag's counter in Redis
    }

    public void prepare(Map conf, TopologyContext context) {
        jedis = new Jedis("localhost");
    }
}
[Diagram: Spout → tweet stream → HashtagsSplitter → hashtag stream → HashtagsCounterBolt]
18. Main Concepts - Topology
public static StormTopology createTopology() {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("tweets-collector", new ApiStreamingSpout(), 1);
    builder.setBolt("hashtags-splitter", new HashtagsSplitter(), 2)
           .shuffleGrouping("tweets-collector");
    builder.setBolt("hashtags-counter", new HashtagsCounterBolt(), 2)
           .fieldsGrouping("hashtags-splitter", new Fields("hashtag"));
    return builder.createTopology();
}
[Diagram: the Spout feeds the HashtagsSplitter through a shuffle grouping, and the HashtagsSplitter feeds the HashtagsCounterBolt through a fields grouping]
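The fields grouping used for the counter guarantees that tuples with the same value for the grouping field always reach the same task, so per-hashtag counts never split across tasks. A toy sketch of that routing invariant (Storm's real partitioning logic is more involved; FieldsGroupingSketch and taskFor are hypothetical helpers):

```java
// Sketch of how a fields grouping routes tuples: hash the grouping
// field's value onto one of the bolt's tasks. The invariant is that
// equal field values always map to the same task index.
public class FieldsGroupingSketch {
    public static int taskFor(String fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        // The same hashtag is always routed to the same counter task.
        System.out.println(taskFor("hello", 2) == taskFor("hello", 2)); // true
    }
}
```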
19. Example Overview - Read Tweet
[Pipeline: Spout → Splitter → Counter → Redis]
The Spout reads the tweet "This tweet has #hello #world hashtags!!!".
20. Example Overview - Split Tweet
The Splitter splits the tweet into the hashtags #hello and #world.
21. Example Overview - Split Tweet
The Counter issues incr(#hello) and incr(#world); Redis now holds #hello = 1 and #world = 1.
22. Example Overview - Split Tweet
The Spout reads a second tweet, "This tweet has #bye #world hashtags".
23. Example Overview - Split Tweet
The Splitter splits it into the hashtags #bye and #world.
24. Example Overview - Split Tweet
The Counter issues incr(#bye) and incr(#world); Redis now holds #hello = 1, #world = 2, and #bye = 1.
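The walk-through above can be reproduced in plain Java, with a HashMap standing in for Redis (PipelineWalkthrough is an illustrative class, not part of the deck's code):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory walk-through of the example: split each tweet into hashtags,
// then increment one counter per hashtag (the deck uses Redis hincrBy;
// a HashMap stands in for it here).
public class PipelineWalkthrough {
    public static Map<String, Integer> count(String... tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.split("\\s+")) {
                if (word.startsWith("#")) {
                    String tag = word.replaceAll("[^#\\w]", ""); // strip punctuation
                    counts.merge(tag, 1, Integer::sum);          // incr(tag)
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
            "This tweet has #hello #world hashtags!!!",
            "This tweet has #bye #world hashtags");
        System.out.println(counts.get("#world")); // 2
    }
}
```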
25. Main Concepts - In summary
○ Topology
○ Spout
○ Stream
○ Bolt
51. Trident - Query
Stream singleHashtag =
    drpcStream.each(new Fields("args"), new Split(), new Fields("hashtag"));
[Diagram: DRPC Server → DRPC stream → Hashtag Splitter]
52. Trident - Query
Stream hashtagsCountQuery =
    singleHashtag.stateQuery(hashtagCountMemoryState, new Fields("hashtag"),
                             new MapGet(), new Fields("count"));
[Diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query]
53. Trident - Query
Stream drpcCountAggregate =
    hashtagsCountQuery.groupBy(new Fields("hashtag"))
                      .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
[Diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query → Aggregate Count]
54. Trident - Query - Complete Example
topology.newDRPCStream("counts", drpc)
    .each(new Fields("args"), new Split(), new Fields("hashtag"))
    .stateQuery(hashtags, new Fields("hashtag"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .groupBy(new Fields("hashtag"))
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
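What the DRPC stream above computes can be sketched in plain Java: split the request arguments into hashtags, look each one up in the count state, drop misses, and sum. A Map stands in for the Trident MapState, and the per-hashtag groupBy is collapsed into a single total for brevity (DrpcQuerySketch is an illustrative class, not Trident API):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the DRPC query's semantics.
public class DrpcQuerySketch {
    public static long query(String args, Map<String, Long> state) {
        long sum = 0;
        for (String hashtag : args.split(" ")) {      // new Split()
            Long count = state.get(hashtag);          // stateQuery + MapGet
            if (count != null) {                      // FilterNull
                sum += count;                         // Sum aggregate
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        state.put("strata", 7L);
        state.put("storm", 3L);
        System.out.println(query("strata storm missing", state)); // 10
    }
}
```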
55. Trident - Query - Complete Example
[Diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query → Aggregate Count]
56. Trident - Query - Complete Example
The client calls the DRPC server.
57. Trident - Query - Complete Example
The DRPC server holds the request and hands it to the topology for execution.
58. Trident - Query - Complete Example
The request stays held while the topology runs; when execution finishes, the result is returned to the DRPC server.
59. Trident - Query - Complete Example
The DRPC server returns the response to the client.
61. Running Topologies - Local Cluster
Config conf = new Config();
conf.put("some_property", "some_value");
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology());
● The whole topology runs on a single machine
● Behaves much like running in a production cluster
● Very useful for testing and development
● Configurable parallelism
● Same methods as StormSubmitter
62. Running Topologies - Production
package twitter.streaming;

public class Topology {
    public static void main(String[] args) {
        // conf and builder are set up as shown on the previous slides
        StormSubmitter.submitTopology("twitter-hashtag-summarizer",
            conf, builder.createTopology());
    }
}

mvn package
storm jar Strata2012-0.0.1.jar twitter.streaming.Topology \
    "$REDIS_HOST" $REDIS_PORT "strata,strataconf" \
    $TWITTER_USER $TWITTER_PASS
63. Running Topologies - Cluster Layout
● Single master process (Nimbus)
● No SPOF
● Automatic task distribution
● Amount of tasks per process is configurable
● Automatic task re-distribution when a supervisor fails
● Work continues if Nimbus fails
[Diagram: Storm cluster with Nimbus (the master process) and several Supervisors (where the processing is done), plus a Zookeeper cluster (only for coordination and cluster state)]
64. Running Topologies - Cluster Architecture - No SPOF
[Diagram: Nimbus (the master process) coordinating three Supervisors (where the processing is done)]
68. Other features - Non JVM languages
● Use non-JVM languages
● Simple protocol
● Uses stdin & stdout for communication
● Many implementations and abstraction layers done by the community

Example of the JSON handshake Storm sends to a shell component:

{
    "conf": {
        "topology.message.timeout.secs": 3,
        // etc
    },
    "context": {
        "task->component": {
            "1": "example-spout",
            "2": "__acker",
            "3": "example-bolt"
        },
        "taskid": 3
    },
    "pidDir": "..."
}
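The multilang protocol frames every message as a JSON object followed by a line containing just "end". A minimal sketch of that framing (MultilangFraming is an illustrative helper, not Storm API):

```java
// Sketch of Storm's multilang framing: each message written to the
// shell component's stdin (and read back from its stdout) is a JSON
// object terminated by a line containing only "end".
public class MultilangFraming {
    public static String frame(String json) {
        return json + "\nend\n";
    }

    public static void main(String[] args) {
        System.out.print(frame("{\"command\": \"emit\", \"tuple\": [\"hello\"]}"));
    }
}
```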
69. Contribution
● A collection of community developed spouts, bolts, serializers, and other goodies.
● You can find it at: https://github.com/nathanmarz/storm-contrib
● Some of those goodies are:
○ cassandra: Integrates Storm and Cassandra by writing tuples to a Cassandra Column Family.
○ mongodb: Provides a spout to read documents from mongo, and a bolt to write documents to mongo.
○ SQS: Provides a spout to consume messages from Amazon SQS.
○ hbase: Provides a spout and a bolt to read and write from HBase.
○ kafka: Provides a spout to read messages from kafka.
○ rdbms: Provides a bolt to write tuples to a table.
○ redis-pubsub: Provides a spout to read messages from redis pubsub.
○ And more...