This document provides a summary and comparison of key concepts between Apache Storm and Apache Hadoop. Storm allows for real-time processing of data streams through topologies that run continuously, as opposed to batch processing of jobs in Hadoop. Storm provides guarantees of no data loss and scalability similar to Hadoop. The document outlines Storm components, stream processing concepts like groupings, and how to achieve exactly-once processing through transactional spouts and storing metadata with state updates.
A tutorial presentation based on storm.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as a teaching assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
Storm 0.8.2
1. STORM
   COMPARISON – INTRODUCTION – CONCEPTS
   Presentation by Kasper Madsen
   November 2012
   (Slides updated for Storm 0.8.2)
2. HADOOP VS STORM

   Hadoop                         Storm
   ------                         -----
   Batch processing               Real-time processing
   Jobs run to completion         Topologies run forever
   JobTracker is a SPOF*          No single point of failure
   Stateful nodes                 Stateless nodes
   Scalable                       Scalable
   Guarantees no data loss        Guarantees no data loss
   Open source                    Open source

   * Hadoop 0.21 added some checkpointing
   SPOF: Single Point Of Failure
3. COMPONENTS

The Nimbus daemon is comparable to the Hadoop JobTracker; it is the master.
The Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker.
A worker is spawned by a supervisor, one per port defined in the storm.yaml configuration.
An executor is spawned by a worker and runs as a thread.
Tasks are spawned by executors and run within the executor's thread.
ZooKeeper* is a distributed coordination system used to store metadata. The Nimbus and Supervisor daemons are fail-fast and stateless; all state is kept in ZooKeeper. Note that all communication between Nimbus and the Supervisors goes through ZooKeeper. On a cluster with 2k+1 ZooKeeper nodes, the system can recover when at most k nodes fail.

* ZooKeeper is an Apache top-level project
4. EXECUTORS

The executor is a new abstraction. It:
• Disassociates the tasks of a component from the number of threads
• Allows dynamically changing the number of executors without changing the number of tasks
• Makes elasticity much simpler, as the semantics (e.g. of a grouping) stay valid
• Enables elasticity in a multi-core environment
5. STREAMS

A stream is an unbounded sequence of tuples.
A topology is a graph where each node is a spout or a bolt, and the edges indicate which bolts subscribe to which streams.
• A spout is a source of a stream
• A bolt consumes a stream (and possibly emits a new one)
• An edge represents a grouping

(Diagram: two spouts are the sources of streams A and B; one bolt subscribes to A and emits C, another subscribes to A and emits D, a third subscribes to C & D, and a fourth subscribes to A & B.)
6. GROUPINGS

Each spout or bolt runs as X parallel instances (called tasks).
A grouping decides which task in the subscribing bolt a tuple is sent to.
• Shuffle grouping: random distribution
• Fields grouping: grouped by field value, so that equal values go to the same task
• All grouping: replicates to all tasks
• Global grouping: all tuples go to one task
• None grouping: the bolt runs in the same thread as the bolt/spout it subscribes to
• Direct grouping: the producer (the task that emits) decides which consumer task receives the tuple
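The groupings above can be sketched as simple task-selection rules. The following is a minimal, self-contained model, not Storm's actual implementation or API (all class and method names here are illustrative); it only shows the essential property of each grouping, e.g. that a fields grouping deterministically maps equal values to the same task.

```java
// Simplified model of how a grouping picks a consumer task.
// Illustrative only; these are not Storm API classes.
import java.util.Random;

public class GroupingDemo {
    // Fields grouping: equal field values always map to the same task.
    static int fieldsGrouping(Object fieldValue, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hashes.
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // Shuffle grouping: a uniformly random task.
    static int shuffleGrouping(Random rand, int numTasks) {
        return rand.nextInt(numTasks);
    }

    // Global grouping: every tuple goes to a single task.
    static int globalGrouping() {
        return 0;
    }

    public static void main(String[] args) {
        // The same value is always routed to the same task instance.
        System.out.println(fieldsGrouping("dog", 4) == fieldsGrouping("dog", 4));
        System.out.println(globalGrouping());
    }
}
```

The determinism of fieldsGrouping is what makes per-key state (such as a word count held in one bolt task) possible without sharing state between tasks.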
7. EXAMPLE: TestWordSpout → ExclamationBolt → ExclamationBolt

TopologyBuilder builder = new TopologyBuilder();

// Create a stream called "words", running 10 tasks
builder.setSpout("words", new TestWordSpout(), 10);

// Create a stream called "exclaim1", running 3 tasks,
// subscribed to stream "words" using shuffle grouping
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
       .shuffleGrouping("words");

// Create a stream called "exclaim2", running 2 tasks,
// subscribed to stream "exclaim1" using shuffle grouping
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
       .shuffleGrouping("exclaim1");

A bolt can subscribe to an unlimited number of streams by chaining groupings.
The source code for this example is part of the storm-starter project on GitHub.
8. EXAMPLE – 1: TestWordSpout

public void nextTuple() {
    Utils.sleep(100);
    final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
    final Random rand = new Random();
    final String word = words[rand.nextInt(words.length)];
    _collector.emit(new Values(word));
}

TestWordSpout emits a random string from the words array every 100 milliseconds.
9. EXAMPLE – 2: ExclamationBolt

OutputCollector _collector;

// prepare is called when the bolt is created
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    _collector = collector;
}

// execute is called for each tuple
public void execute(Tuple tuple) {
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
    _collector.ack(tuple);
}

// declareOutputFields is called when the bolt is created
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
}

declareOutputFields is used to declare streams and their schemas. It is possible to declare several streams and to specify which stream to use when emitting tuples.
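To see what the topology from slide 7 computes end to end, here is a plain-Java simulation of the tuple flow (no Storm dependencies; only ExclamationBolt's transformation logic is reproduced, and the class name is illustrative):

```java
// Plain-Java simulation of the example topology's data flow:
// TestWordSpout -> exclaim1 -> exclaim2.
// This reproduces only the transformation; it is not Storm code.
public class ExclamationFlowDemo {
    // The core of ExclamationBolt.execute: append "!!!" to the word.
    static String exclaim(String word) {
        return word + "!!!";
    }

    public static void main(String[] args) {
        String word = "nathan";                        // emitted by TestWordSpout
        String afterExclaim1 = exclaim(word);          // "nathan!!!"
        String afterExclaim2 = exclaim(afterExclaim1); // "nathan!!!!!!"
        System.out.println(afterExclaim2);
    }
}
```

Because both bolts are the same ExclamationBolt class, every word emitted by the spout arrives at the end of the topology with six exclamation marks appended.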
10. TRIDENT TOPOLOGY

A Trident topology is a new abstraction built on top of Storm primitives.
• Supports joins, aggregations, groupings, functions, and filters
• Easy to use; read the wiki
• Guarantees exactly-once processing, if used with an (opaque) transactional spout
• Shares some basic ideas with the deprecated transactional topology*
  • Tuples are processed as small batches
  • Each batch gets a transaction id (txid); if a batch is replayed, it gets the same txid
  • State updates are strongly ordered among batches
  • State updates atomically store metadata together with the data
• The transactional topology is superseded by the Trident topology from 0.8.0

* See my first slides on Storm (March 2012) for detailed information: www.slideshare.com/KasperMadsen
11. EXACTLY-ONCE PROCESSING – 1

A transactional spout guarantees that the same data is replayed for every batch.

Guaranteeing exactly-once processing for transactional spouts:
• The txid is stored with the data, so the last txid that updated the data is known
• That information is used to decide what to update in case of a replay

Example
1. Currently processing txid 2, with data ["man", "dog", "dog"]
2. Current state is:
   "man" => [count=3, txid=1]
   "dog" => [count=2, txid=2]
3. The batch with txid 2 fails and gets replayed.
4. The resulting state is:
   "man" => [count=4, txid=2]
   "dog" => [count=2, txid=2]
5. Because the txid is stored with the data, it is known that the count for "dog" must not be increased again.
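The txid trick above can be sketched as a small word-count state. This is a minimal, self-contained model, not Trident's State API (the class and method names are illustrative), and it assumes a batch's updates to a single key are applied atomically:

```java
// Sketch of the transactional-state trick: store the last txid with each
// count, and skip keys whose stored txid shows the batch was already applied.
// Illustrative only; Trident's State API looks different.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransactionalCountState {
    static class Entry {
        long count, txid;
        Entry(long count, long txid) { this.count = count; this.txid = txid; }
    }

    final Map<String, Entry> state = new HashMap<>();

    // Apply a batch of words under the given txid, idempotently per key.
    void applyBatch(long txid, List<String> batch) {
        Map<String, Long> delta = new HashMap<>();
        for (String w : batch) delta.merge(w, 1L, Long::sum);

        for (Map.Entry<String, Long> e : delta.entrySet()) {
            Entry cur = state.get(e.getKey());
            if (cur == null) {
                state.put(e.getKey(), new Entry(e.getValue(), txid));
            } else if (cur.txid != txid) {
                // Not yet updated by this txid: safe to add.
                cur.count += e.getValue();
                cur.txid = txid;
            }
            // cur.txid == txid: this key was already updated by this batch; skip.
        }
    }

    long count(String word) {
        Entry e = state.get(word);
        return e == null ? 0 : e.count;
    }

    public static void main(String[] args) {
        TransactionalCountState s = new TransactionalCountState();
        s.applyBatch(1, Arrays.asList("man", "man", "man"));
        s.applyBatch(2, Arrays.asList("man", "dog", "dog"));
        s.applyBatch(2, Arrays.asList("man", "dog", "dog")); // replay: no double counting
        System.out.println(s.count("man") + " " + s.count("dog"));
    }
}
```

Replaying the txid-2 batch leaves the state unchanged, matching the slide's example: "man" ends at 4 and "dog" at 2, even though the batch ran twice.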
12. EXACTLY-ONCE PROCESSING – 2

An opaque transactional spout is not guaranteed to replay the same data for a failed batch as originally existed in the batch.
• Guarantees every tuple is successfully processed in exactly one batch
• Useful for having exactly-once processing while allowing some inputs to fail

Guaranteeing exactly-once processing for opaque transactional spouts:
• The same trick does not work, as a replayed batch might have changed, meaning some state might now store incorrect data (consider the previous example!)
• The problem is solved by storing more metadata with the data: the previous value

Example (values given as dog, cat pairs):

   Step   Data            Count   prevValue   txid
   1      2 dog, 1 cat    2, 1    0, 0        1, 1
   2      1 dog, 2 cat    3, 1    2, 0        2, 1    <- updates the dog count, then fails
   2.1    2 dog, 2 cat    4, 3    2, 1        2, 2    <- the replayed batch contains new data

Consider the potential problems when the replayed batch contains new data, e.g. if the replayed data for step 2.1 had not contained any cat: the state remains correct, as the previous values are used.
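The opaque-transactional trick can be sketched the same way, now storing (count, prevValue, txid) per key. Again a minimal, self-contained model with illustrative names, not Trident's OpaqueValue/State API:

```java
// Sketch of the opaque-transactional trick: store (count, prevValue, txid).
// When a batch with an already-seen txid is replayed (possibly with different
// data), recompute from prevValue instead of the current count.
// Illustrative only; Trident's State API looks different.
import java.util.HashMap;
import java.util.Map;

public class OpaqueCountState {
    static class Entry {
        long count, prevValue, txid;
        Entry(long c, long p, long t) { count = c; prevValue = p; txid = t; }
    }

    final Map<String, Entry> state = new HashMap<>();

    void update(String key, long delta, long txid) {
        Entry cur = state.get(key);
        if (cur == null) {
            state.put(key, new Entry(delta, 0, txid));
        } else if (cur.txid == txid) {
            // Replay of the same txid: the batch contents may have changed,
            // so restart from the value saved before this txid's first update.
            cur.count = cur.prevValue + delta;
        } else {
            // New txid: remember the pre-update value, then apply.
            cur.prevValue = cur.count;
            cur.count += delta;
            cur.txid = txid;
        }
    }

    long count(String key) {
        Entry e = state.get(key);
        return e == null ? 0 : e.count;
    }

    public static void main(String[] args) {
        OpaqueCountState s = new OpaqueCountState();
        s.update("dog", 2, 1); s.update("cat", 1, 1); // step 1
        s.update("dog", 1, 2);                        // step 2: updates dog, then fails
        s.update("dog", 2, 2); s.update("cat", 2, 2); // step 2.1: replay with new data
        System.out.println(s.count("dog") + " " + s.count("cat"));
    }
}
```

Running the table's steps reproduces its final row: "dog" ends at 4 (2 saved as prevValue, plus 2 from the replayed batch) and "cat" at 3, even though the replayed batch differed from the original.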
13. ELASTICITY

• Rebalancing workers and executors (not tasks)
  • Pause the spouts
  • Wait for the message timeout
  • Set the new assignment
  • All moved tasks will be killed and restarted in their new location
• Swapping (Storm 0.8.2)
  • Submit the new topology as inactive
  • Pause the spouts of the old topology
  • Wait for the message timeout of the old topology
  • Activate the new topology
  • Deactivate the old topology
  • Kill the old topology

What about state on tasks which are killed and restarted? It is up to the user to solve!
14. LEARN MORE
Website (http://storm-project.net/)
Wiki (https://github.com/nathanmarz/storm/wiki)
Storm-starter (https://github.com/nathanmarz/storm-starter)
Mailing list (http://groups.google.com/group/storm-user)
#storm-user room on freenode
UTSL: https://github.com/nathanmarz/storm
More slides: www.slideshare.net/KasperMadsen
Image from: http://www.cupofjoe.tv/2010/11/learn-lesson.html