Stream Analytics in the Enterprise
About Us
• Emerging technology firm focused on helping enterprises build breakthrough
software solutions
• Building software solutions powered by disruptive enterprise software trends
-Machine learning and data science
-Cyber-security
-Enterprise IOT
-Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award winning agencies: Inc 500, American Business Awards, International
Business Awards
Agenda
• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies
The elements of enterprise stream analytic
solutions
Capabilities of Stream Analytic Solutions
• Real time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
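To make the time window and reference data capabilities concrete, here is a minimal sketch in plain Python. It is not the API of any platform covered in this deck: the event layout, the `REFERENCE` table and the function name are assumptions chosen for illustration, and the sketch ignores late or out-of-order events.

```python
from collections import defaultdict

# Hypothetical reference data: device id -> site name.
REFERENCE = {"d1": "plant-a", "d2": "plant-b"}


def tumbling_window_avg(events, window_seconds, reference):
    """Average `value` per (window start, site) over tumbling windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, device_id, value in events:
        # Assign the event to the fixed-size window that contains it.
        window_start = (ts // window_seconds) * window_seconds
        # Enrich the event with reference data (a lookup join).
        site = reference.get(device_id, "unknown")
        acc = sums[(window_start, site)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


events = [(0, "d1", 10.0), (3, "d1", 20.0), (4, "d2", 5.0), (11, "d1", 30.0)]
result = tumbling_window_avg(events, window_seconds=10, reference=REFERENCE)
```

The result maps each (window start, site) pair to an average, which is roughly what a SQL-style `GROUP BY` over a tumbling window joined to a reference table would return.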
Stream analytic platforms
Cloud vs. On-premise stream analytic platforms
Capabilities of Stream Analytic Solutions
On-premise stream analytic platforms
Benefits
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipeline
Challenges
• Complex infrastructure
• Scalability
• Maintenance and monitoring
Cloud stream analytic services
Benefits
• Simple provisioning
• Elastic scalability
• Integrated with PaaS offerings
• Rich monitoring and management experience
Challenges
• Integration with on-premise systems
• Extensibility
• Lack of customization
On-premise stream analytic platforms
Leading Platforms
Apache Storm
Apache Spark
Apache Samza
Apache Flink
Akka
Apache Storm
• Stream processing framework with
micro-batching capabilities
• Included in most Hadoop distributions
• Main model (spouts and bolts)
-One at a time
-Lower latency
-Operates on tuple streams
• Trident
-Micro-batching
-Higher throughput
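The difference between the one-at-a-time model and Trident-style micro-batching can be sketched in plain Python. This is an illustration of the two processing styles only, not Storm's API: `sentence_spout`, `split_bolt` and both runner functions are hypothetical names.

```python
from itertools import islice


def sentence_spout():
    """Plays the role of a spout: emits one tuple at a time."""
    for sentence in ["a b", "b c", "c a b"]:
        yield (sentence,)


def split_bolt(tup):
    """Plays the role of a bolt: turns one input tuple into word tuples."""
    (sentence,) = tup
    return [(word,) for word in sentence.split()]


def run_one_at_a_time(spout, bolt):
    """Update the word counts after every single tuple (lower latency)."""
    counts = {}
    for tup in spout:
        for (word,) in bolt(tup):
            counts[word] = counts.get(word, 0) + 1
    return counts


def run_micro_batched(spout, bolt, batch_size):
    """Update the word counts once per batch (fewer state updates)."""
    counts, batches = {}, 0
    it = iter(spout)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        batches += 1
        partial = {}
        for tup in batch:
            for (word,) in bolt(tup):
                partial[word] = partial.get(word, 0) + 1
        for word, n in partial.items():  # one state update per word per batch
            counts[word] = counts.get(word, 0) + n
    return counts, batches


single = run_one_at_a_time(sentence_spout(), split_bolt)
batched, batch_count = run_micro_batched(sentence_spout(), split_bolt, batch_size=2)
```

Both runs produce the same counts. The micro-batched run touches its state less often, which is where the throughput gain comes from, at the cost of waiting for a batch to fill.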
Apache Storm: Benefits vs. Challenges
Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages
Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale
Apache Spark Streaming
• Micro-batch processing framework
• Elastic scalability models
• Receivers split data into batches
• Spark Streaming processes batches and
produces results
• High throughput – higher latency
• Functional APIs
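The batch-splitting step can be illustrated without Spark itself. The sketch below assumes records arrive ordered by timestamp; `discretize` and `process_batch` are hypothetical names, not Spark Streaming functions.

```python
from functools import reduce
from itertools import groupby


def discretize(records, batch_interval):
    """Group (timestamp, value) records, assumed ordered by time, into batches."""
    by_interval = groupby(records, key=lambda r: int(r[0] // batch_interval))
    for batch_index, group in by_interval:
        yield batch_index, [value for _, value in group]


def process_batch(values):
    """Functional-style per-batch job: keep even values, square them, sum."""
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda a, b: a + b, squares, 0)


records = [(0.1, 1), (0.5, 2), (1.2, 4), (1.9, 3), (2.0, 6)]
batch_results = [(i, process_batch(vals)) for i, vals in discretize(records, 1.0)]
```

A result only becomes available once its batch interval has closed, which is the source of the higher latency noted above.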
Spark Streaming: Benefits vs. Challenges
Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions
• Time window queries
Challenges
• Complex infrastructure setup
• Integration with line of business systems
Apache Samza
• Built to address some of the limitations
of Apache Storm
• Deep integration with Kafka and YARN
• Simple API comparable to map-reduce
• Leverages YARN for task distribution,
fault tolerance and scalability
Apache Samza: Benefits vs. Challenges
Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure
Challenges
• Small adoption
• Low level API
• Heavy IO operations
Apache Flink Streaming
• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream
processing
• True streaming with adjustable latency
and throughput
• Supports different stream sources and
transformations
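The "everything is a stream" idea can be shown with a small plain-Python sketch: a batch is treated as a stream that happens to end, so the same operator serves both cases. `running_max` and `sensor` are hypothetical names and do not correspond to Flink APIs.

```python
from itertools import count, islice


def running_max(stream):
    """Emit the maximum seen so far after every element (record at a time)."""
    best = None
    for value in stream:
        if best is None or value > best:
            best = value
        yield best


# Bounded input: a "batch" is just a stream that ends.
batch_result = list(running_max([3, 1, 4, 1, 5]))


def sensor():
    """Hypothetical unbounded source of readings."""
    for i in count():
        yield (i * 7) % 10


# Unbounded input: the same operator, consumed lazily five results at a time.
stream_prefix = list(islice(running_max(sensor()), 5))
```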
Apache Flink Streaming: Benefits vs. Challenges
Benefits
• Combine batch and stream data processing
• Expressive APIs
• Data flows and transformation
• Extensibility
Challenges
• Small adoption
• Limited state management
• High availability models
Akka Streams
• Micro-service, actor oriented model
• Messaging driven
• Isolated failures
• Reactive programming model based on
sources, sinks and flows
• DSL for stream data manipulation
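The source, flow and sink building blocks can be mimicked in a few lines of Python. This is only an analogy for how the pieces compose: Akka Streams is a JVM library, and the classes and helpers below are hypothetical, with no actors, back-pressure or materialization.

```python
class Source:
    """Wraps an iterable and lets flows and a sink be attached to it."""

    def __init__(self, iterable):
        self._iterable = iterable

    def via(self, flow):
        # A flow transforms an upstream iterable into a new iterable.
        return Source(flow(self._iterable))

    def run_with(self, sink):
        # A sink consumes the stream and returns a final value.
        return sink(self._iterable)


def map_flow(fn):
    return lambda upstream: (fn(x) for x in upstream)


def filter_flow(pred):
    return lambda upstream: (x for x in upstream if pred(x))


def fold_sink(zero, fn):
    def sink(upstream):
        acc = zero
        for x in upstream:
            acc = fn(acc, x)
        return acc
    return sink


total = (
    Source(range(1, 6))
    .via(filter_flow(lambda x: x % 2 == 1))   # keep 1, 3, 5
    .via(map_flow(lambda x: x * 10))          # 10, 30, 50
    .run_with(fold_sink(0, lambda acc, x: acc + x))
)
```

Nothing runs until a sink is attached, which mirrors the way a stream graph is described first and executed later.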
Akka Streams: Benefits vs. Challenges
Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread safety
• Leverage mainstream Java and Scala programming models
Challenges
• Small adoption
• Dependent on Akka’s architecture style
• Support for languages outside the JVM
Cloud stream analytic platforms
Leading Platforms
AWS Kinesis Analytics
Azure Stream Analytics
Bluemix Streaming Analytics
AWS Kinesis
• Native stream data services in AWS
• Combines three products in a single
platform
-Kinesis Streams
-Kinesis Firehose
-Kinesis Analytics
• Kinesis Streams collects data streams
from any application
• Kinesis Firehose provides a model to
load streaming data into AWS
• Kinesis Analytics allows the execution of
SQL queries over data streams
AWS Kinesis: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
Challenges
• AWS Kinesis Analytics hasn’t been released
• Interoperability with on-premise data streams
Azure Stream Analytics
• Native stream analytic service in the
Azure platform
• Allows the execution of SQL queries over
dynamic streams of data
• Integrates with the other components of
the Cortana Analytics suite
• Leverages Azure Event Hubs for high
volume data ingestion
• Very rich monitoring and analytic
capabilities
Azure Stream Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model
Challenges
• Interoperability with on-premise data streams
• Extensibility
Bluemix Streaming Analytics
• Native stream analytic service in the
IBM Bluemix platform
• Built upon IBM Streams technology
• Allows the execution of SQL queries over
dynamic streams of data
• Supports interactive and programmatic
query models
• Rich analytic and monitoring
capabilities
• Stream visualization graph
Bluemix Streaming Analytics: Benefits vs. Challenges
Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model
Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility
You can’t buy everything!
Capabilities of Enterprise Stream Analytic Solutions
• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools
Complementary technologies
Other Relevant Technologies in Stream Analytic Solutions
• Enterprise messaging platforms
• Time series databases
• Stream data connectors
Enterprise Messaging Platforms
• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging
patterns
• Ordered messaging
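A minimal in-memory sketch of how persistent, ordered pub-sub messaging fits together: messages are appended to a log and each subscriber tracks its own read offset. `TopicLog` is a hypothetical class for illustration, not the API of any messaging product, and it keeps everything in memory rather than on disk.

```python
class TopicLog:
    """In-memory stand-in for a persistent, ordered pub-sub topic."""

    def __init__(self):
        self._messages = []  # append-only, so publish order is preserved
        self._offsets = {}   # subscriber name -> next offset to read

    def publish(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the stored message

    def subscribe(self, name, from_beginning=True):
        self._offsets[name] = 0 if from_beginning else len(self._messages)

    def poll(self, name, max_messages=10):
        start = self._offsets[name]
        batch = self._messages[start:start + max_messages]
        self._offsets[name] = start + len(batch)
        return batch


topic = TopicLog()
topic.publish("a")
topic.publish("b")
topic.subscribe("analytics")                    # replays retained messages
topic.subscribe("alerts", from_beginning=False)  # only sees new messages
topic.publish("c")

first = topic.poll("analytics", max_messages=2)
second = topic.poll("analytics")
alerts = topic.poll("alerts")
```

Because each subscriber owns its offset, several consumers can read the same ordered stream independently, and a new consumer can start from retained history.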
Time Series Databases
• Store time stamped data
• Time series query functions
• Integrate real time and reference data
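The kind of query function a time series store offers can be sketched with the standard library. `TimeSeries` below is a hypothetical in-memory class, not a real database client; it shows a time range scan and a downsampling query over time-stamped points.

```python
from bisect import bisect_left, insort


class TimeSeries:
    """Keeps (timestamp, value) points sorted by timestamp."""

    def __init__(self):
        self._points = []

    def insert(self, timestamp, value):
        insort(self._points, (timestamp, value))  # tolerates out-of-order inserts

    def range(self, start, end):
        """Points with start <= timestamp < end."""
        lo = bisect_left(self._points, (start,))
        hi = bisect_left(self._points, (end,))
        return self._points[lo:hi]

    def downsample_mean(self, start, end, bucket):
        """Mean value per fixed-size time bucket within [start, end)."""
        buckets = {}
        for ts, value in self.range(start, end):
            key = start + ((ts - start) // bucket) * bucket
            buckets.setdefault(key, []).append(value)
        return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


series = TimeSeries()
for ts, value in [(30, 8.0), (0, 1.0), (10, 2.0), (15, 3.0)]:
    series.insert(ts, value)

window = series.range(10, 30)
means = series.downsample_mean(0, 40, bucket=20)
```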
Stream data connectors
• Develop stream data sources from line
of business systems
• Integrate real time and reference data
from enterprise systems into the stream
data pipeline
• Combine real time data from multiple
line of business systems into single data
streams
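Combining several line of business feeds into a single stream can be sketched as a timestamp-ordered merge. The two generators below are hypothetical sources with made-up events; the sketch assumes each source is already ordered by timestamp.

```python
import heapq


def crm_events():
    """Hypothetical CRM feed: (timestamp, system, event name)."""
    yield (1, "crm", "lead_created")
    yield (4, "crm", "lead_converted")


def erp_events():
    """Hypothetical ERP feed with the same tuple layout."""
    yield (2, "erp", "order_placed")
    yield (3, "erp", "invoice_sent")


def merged_stream(*sources):
    """Lazily merge streams that are each already ordered by timestamp."""
    return heapq.merge(*sources, key=lambda event: event[0])


combined = list(merged_stream(crm_events(), erp_events()))
```

`heapq.merge` pulls from each source lazily, so the same approach works for long-running feeds as long as each one stays in timestamp order.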
Summary
• Stream data processing and analytics is a key element of modern enterprise data pipelines
• Leading on-premise stream analytic stacks include Apache Storm, Apache Samza,
Spark Streaming, Flink Streaming and Akka Streams
• Leading cloud stream analytic services include AWS Kinesis, Azure Stream
Analytics and Bluemix Streaming Analytics
• You can’t buy everything! Stream analytic solutions require custom implementations
• When building stream analytic solutions, consider complementary technologies such as
enterprise messaging stacks or time series databases
Thanks
http://Tellago.com
Info@Tellago.com
