This presentation examines some of the top stream analytic platforms in the enterprise. The slide deck explores the characteristics of enterprise stream analytic solutions and discusses the capabilities of some of the top stream analytic platforms in the current market.
2. About Us
• Emerging technology firm focused on helping enterprises build breakthrough
software solutions
• Building software solutions powered by disruptive enterprise software trends
-Machine learning and data science
-Cyber-security
-Enterprise IOT
-Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award winning agencies: Inc 500, American Business Awards, International
Business Awards
3. Agenda
• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies
5. Capabilities of Stream Analytic Solutions
• Real time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
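The time window and reference data capabilities above can be sketched in a few lines. This is a minimal illustration in plain Python, not the query model of any specific platform; the `REFERENCE` table, the event layout, and the function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical reference data: maps a device id to its region.
REFERENCE = {"d1": "east", "d2": "west"}

def tumbling_window_avg(events, window_seconds):
    """Average `value` per (window start, region) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per key
    for ts, device_id, value in events:
        window_start = (ts // window_seconds) * window_seconds
        region = REFERENCE.get(device_id, "unknown")  # enrich with reference data
        bucket = sums[(window_start, region)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

events = [(0, "d1", 10.0), (3, "d1", 20.0), (4, "d2", 5.0), (12, "d1", 30.0)]
result = tumbling_window_avg(events, window_seconds=10)
```

The result plays the role of a query output that could itself be published as a new data stream.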
8. Capabilities of Stream Analytic Solutions
On-premise stream analytic platforms
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipeline
• Complex infrastructure
• Scalability
• Maintenance and monitoring
Cloud stream analytic services
• Simple provisioning
• Elastic scalability
• Integrated with PaaS offerings
• Rich monitoring and management experience
• Integration with on-premise systems
• Extensibility
• Lack of customization
11. Apache Storm
• Stream processing framework with
micro-batching capabilities
• Included in most Hadoop distributions
• Main model (spouts and bolts)
-One at a time
-Lower latency
-Operates on tuple streams
• Trident
-Micro-batching
-Higher throughput
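The difference between the one-at-a-time model and micro-batching can be sketched as follows. This is plain Python written only to illustrate the idea; it does not use Storm's or Trident's APIs, and the spout, bolt, and runner names are hypothetical.

```python
def sentence_spout():
    """Spout: emits one tuple at a time from a (hypothetical) source."""
    for sentence in ["the cat", "the dog", "a cat"]:
        yield (sentence,)

def split_bolt(tup):
    """Bolt: turns one sentence tuple into one tuple per word."""
    return [(word,) for word in tup[0].split()]

def run_one_at_a_time(spout):
    """Process each tuple as soon as it arrives (lower latency)."""
    counts = {}
    for tup in spout:
        for (word,) in split_bolt(tup):
            counts[word] = counts.get(word, 0) + 1
    return counts

def run_micro_batched(spout, batch_size):
    """Group tuples into small batches before processing (higher throughput)."""
    counts, batch = {}, []

    def flush():
        for tup in batch:
            for (word,) in split_bolt(tup):
                counts[word] = counts.get(word, 0) + 1
        batch.clear()

    for tup in spout:
        batch.append(tup)
        if len(batch) == batch_size:
            flush()
    flush()  # process the final partial batch
    return counts

per_tuple = run_one_at_a_time(sentence_spout())
batched = run_micro_batched(sentence_spout(), batch_size=2)
```

Both runners produce the same counts; they differ only in when each tuple is processed.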
12. Apache Storm: Benefits vs. Challenges
Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages
Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale
13. Apache Spark
• Micro-batch processing framework
• Elastic scalability models
• Receivers split data into batches
• Spark Streaming processes batches and
produces results
• High throughput – higher latency
• Functional APIs
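The batch-oriented flow above can be approximated with the sketch below. It is plain Python, not the Spark Streaming API; the record layout, the batch interval, and the function names are assumptions for illustration.

```python
from functools import reduce

def split_into_batches(records, batch_interval):
    """Receiver step: group (timestamp, value) records into interval-sized batches."""
    batches = {}
    for ts, value in records:
        batches.setdefault(int(ts // batch_interval), []).append(value)
    return [batches[index] for index in sorted(batches)]

def process_batch(batch):
    """Processing step: a functional map/filter/reduce over one batch."""
    squared = map(lambda v: v * v, batch)
    evens = filter(lambda v: v % 2 == 0, squared)
    return reduce(lambda a, b: a + b, evens, 0)

records = [(0.1, 1), (0.4, 2), (1.2, 3), (1.9, 4), (2.5, 6)]
results = [process_batch(b) for b in split_into_batches(records, batch_interval=1.0)]
```

A result is only available once its batch interval has closed, which is the trade-off behind the "high throughput, higher latency" bullet.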
14. Spark Streaming: Benefits vs. Challenges
Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions
• Time window queries
Challenges
• Complex infrastructure setup
• Integration with line of business systems
15. Apache Samza
• Built to address some of the limitations
of Apache Storm
• Deep integration with Kafka and Yarn
• Simple API comparable to map-reduce
• Leverages Yarn for task distribution,
fault tolerance and scalability
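A per-message processing API with local state might look roughly like the sketch below. It is loosely inspired by the callback style described above and is not Samza's actual Java API; the class name, message layout, and `emit` callback are hypothetical.

```python
class PageViewCounterTask:
    """Hypothetical task: one `process` call per incoming message."""

    def __init__(self):
        self.counts = {}  # local state kept by the task between messages

    def process(self, message, emit):
        user = message["user"]
        self.counts[user] = self.counts.get(user, 0) + 1
        # Send the updated count downstream through the supplied callback.
        emit({"user": user, "views": self.counts[user]})

output = []
task = PageViewCounterTask()
for msg in [{"user": "ann"}, {"user": "bob"}, {"user": "ann"}]:
    task.process(msg, output.append)
```

In a real deployment the framework, not the caller, would distribute tasks and restore their state after a failure; the sketch keeps everything in one process.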
16. Apache Samza: Benefits vs. Challenges
Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure
Challenges
• Small adoption
• Low level API
• Heavy IO operations
17. Apache Flink Streaming
• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream
processing
• True streaming with adjustable latency
and throughput
• Supports different stream sources and
transformations
18. Apache Flink Streaming: Benefits vs. Challenges
Benefits
• Combine batch and stream data processing
• Expressive APIs
• Data flows and transformation
• Extensibility
Challenges
• Small adoption
• Limited state management
• High availability models
19. Akka Streams
• Micro-service, actor oriented model
• Messaging driven
• Isolated failures
• Reactive programming model based on
sources, sinks and flows
• DSL for stream data manipulation
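The source, flow, and sink vocabulary can be illustrated with Python generators. This is a conceptual sketch only: it is not the Akka Streams DSL, it does not model actors, asynchronous boundaries, or backpressure, and all function names are hypothetical.

```python
def source(items):
    """Source: produces elements."""
    yield from items

def map_flow(fn):
    """Flow: transforms each element passing through."""
    def flow(upstream):
        for item in upstream:
            yield fn(item)
    return flow

def filter_flow(predicate):
    """Flow: passes along only the elements that satisfy the predicate."""
    def flow(upstream):
        for item in upstream:
            if predicate(item):
                yield item
    return flow

def run(src, flows, sink):
    """Wire source -> flows -> sink and pull elements through."""
    stream = src
    for flow in flows:
        stream = flow(stream)
    return sink(stream)

total = run(
    source(range(1, 6)),
    [map_flow(lambda x: x * 2), filter_flow(lambda x: x > 4)],
    sum,  # sink: consumes the stream and produces a final value
)
```

Each stage is defined independently and composed afterwards, which is the property the DSL bullet refers to.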
20. Akka Streams: Benefits vs. Challenges
Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread-safety
• Leverage mainstream Java and Scala programming models
Challenges
• Small adoption
• Dependent on Akka’s architecture style
• Support for languages outside the JVM
23. AWS Kinesis
• Native stream data services in AWS
• Combines three products in a single
platform
-Kinesis Streams
-Kinesis Firehose
-Kinesis Analytics
• Kinesis Streams collects data
streams from any application
• Kinesis Firehose provides a model to
load streaming data into AWS
• Kinesis Analytics allows the execution of
SQL queries over data streams
24. AWS Kinesis: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
Challenges
• AWS Kinesis Analytics hasn’t been released
• Interoperability with on-premise data streams
25. Azure Stream Analytics
• Native stream analytic service in the
Azure platform
• Allows the execution of SQL queries over
dynamic streams of data
• Integrates with the other components of
the Cortana Analytics suite
• Leverages Azure Event Hub for high
volume data ingestion
• Very rich monitoring and analytic
capabilities
26. Azure Stream Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model
Challenges
• Interoperability with on-premise data streams
• Extensibility
27. Bluemix Streaming Analytics
• Native stream analytic service in the
IBM Bluemix platform
• Built upon IBM Streams technology
• Allows the execution of SQL queries over
dynamic streams of data
• Supports interactive and programmatic
query models
• Rich analytic and monitoring
capabilities
• Stream visualization graph
28. Bluemix Streaming Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model
Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility
30. Capabilities of Enterprise Stream Analytic Solutions
• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools
32. Other Relevant Technologies in Stream Analytic Solutions
• Enterprise messaging platforms
• Time series databases
• Stream data connectors
33. Enterprise Messaging Platforms
• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging
patterns
• Ordered messaging
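The messaging properties listed above (retained messages, pub-sub fan-out, ordering) can be illustrated with a toy broker. This is an in-memory sketch under simplifying assumptions, not the API of any real messaging platform; the class and method names are hypothetical, and "persistent" here only means messages are retained in a list.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy broker: append-only log per topic, fan-out to all subscribers."""

    def __init__(self):
        self.logs = defaultdict(list)         # retained, ordered log per topic
        self.subscribers = defaultdict(list)  # callbacks per topic

    def subscribe(self, topic, callback, from_start=False):
        if from_start:
            for offset, message in enumerate(self.logs[topic]):
                callback(offset, message)  # replay retained messages in order
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        log = self.logs[topic]
        log.append(message)
        offset = len(log) - 1
        for callback in self.subscribers[topic]:
            callback(offset, message)  # every subscriber sees every message
        return offset

broker = InMemoryBroker()
live, late = [], []
broker.subscribe("orders", lambda o, m: live.append((o, m)))
broker.publish("orders", "created")
broker.publish("orders", "paid")
# A subscriber that joins later can still read the retained messages.
broker.subscribe("orders", lambda o, m: late.append((o, m)), from_start=True)
```

Offsets give each message a stable position in its topic, which is what lets a late subscriber consume messages in the original order.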
34. Time Series Databases
• Store time stamped data
• Time series query functions
• Integrate real time and reference data
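Time-stamped storage and time series query functions can be sketched with a sorted in-memory store. This is an illustration only, not the design of any particular time series database; the class and method names are hypothetical.

```python
import bisect

class TimeSeries:
    """Toy store: points kept sorted by timestamp."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def insert(self, ts, value):
        # Keep both lists sorted by timestamp, even for out-of-order arrivals.
        index = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(index, ts)
        self.values.insert(index, value)

    def query_range(self, start, end):
        """Return (ts, value) pairs with start <= ts < end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample_mean(self, start, end, step):
        """Mean value per `step`-wide bucket; empty buckets are omitted."""
        buckets = {}
        for ts, value in self.query_range(start, end):
            bucket_start = start + ((ts - start) // step) * step
            buckets.setdefault(bucket_start, []).append(value)
        return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

series = TimeSeries()
for ts, value in [(5, 2.0), (1, 1.0), (12, 4.0), (7, 4.0)]:
    series.insert(ts, value)
```

Range scans and downsampling are typical examples of the query functions that a general-purpose store would need custom code to provide.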
35. Stream data connectors
• Develop stream data sources from line
of business systems
• Integrate real time and reference data
from enterprise systems into the stream
data pipeline
• Combine real time data from multiple
line of business systems into single data
streams
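Combining data from several line of business systems into a single stream can be sketched as a timestamp-ordered merge. The example assumes each input is already ordered by timestamp; the system names and event layout are hypothetical.

```python
import heapq

def merge_streams(*streams):
    """Merge time-ordered streams of (timestamp, source, payload) into one ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])

# Hypothetical events from two line of business systems.
crm = [(1, "crm", "lead created"), (6, "crm", "lead updated")]
erp = [(2, "erp", "order placed"), (4, "erp", "order shipped")]
merged = list(merge_streams(crm, erp))
```

`heapq.merge` is lazy, so the same function also works on generators that read from live sources.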
36. Summary
• Stream data processing and analytics is a key element of modern enterprise data pipelines
• Some of the leading on-premise stream analytic stacks include: Apache Storm, Apache Samza,
Spark Streaming, Flink Streaming, Akka Streams…
• Some of the leading cloud stream analytic services include: AWS Kinesis, Azure Stream
Analytics, Bluemix Streaming Analytics…
• You can’t buy everything! Stream analytic solutions require custom implementations
• When building stream analytic solutions, consider complementary technologies such as
enterprise messaging stacks or time series databases