A LITTLE BEE BOOK
“How it Works”
Streaming Analytics
This book belongs to:
Adapted from a variety of sources by Bob Yelland
With thanks to Avi Patwardhan & Kimberly Madia
For more copies of this book, or to read others in the series, visit: littlebeelibrary.com
Sometimes two minutes is too late.
Organisations need to spot risks and opportunities in
high-velocity data – opportunities that often can be
detected and acted on only at a moment’s notice.
For time-sensitive processes such as thwarting
fraud, mitigating security threats or responding to
natural disasters, time is of the essence.
Real-time analytics (or stream computing) enables
continuous processing of data streams and can be
used to maximise the time value of data.
A key difference between stream computing and
online analytical processing (OLAP) is that the
latter requires data to be “at rest” before running
analytics.
Stream computing is a processing paradigm that
brings the analytics to the data, rather than storing
the data first.
The ability to analyse data in real time
shifts the conversation from how to manage big
data to how to make sense of it and act on it
at high velocity.
Analysing “data in motion” enables immediate,
better-informed decision making.
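
The contrast between data "at rest" and data "in motion" can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the function names and the sample readings are assumptions made up for the example. The batch function needs the whole dataset stored first; the streaming function updates its answer as each value arrives and keeps only a count and a total.

```python
from typing import Iterable, Iterator


def batch_mean(stored: list[float]) -> float:
    """At-rest style: every value must be stored before the query runs."""
    return sum(stored) / len(stored)


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """In-motion style: emit an updated mean after each arriving value."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used for illustration only.
readings = [10.0, 20.0, 30.0, 40.0]
incremental = list(running_mean(iter(readings)))
```

Both approaches agree once all the data has been seen, but the streaming version has a usable answer after every reading rather than only at the end.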
Stream computing can deliver a rapid return on
investment.
A healthcare firm realised 95% faster insight into
patient health by accelerating the execution of
complex algorithms. This saves lives by flagging the
risk of serious medical conditions. It also enables
effective targeting of patient care, thereby optimising
healthcare resources.
A utility company saved more than 700,000 gallons of
fuel and lowered costs for consumers by $24 million
by analysing the data from 2.3 million smart meters.
A telecommunications company improved marketing
effectiveness by 70% using behaviour-based
segmentation to create dynamic offers.
There are four broad ways that stream computing is
being used today:
Streaming Extract, Transform and Load (ETL) –
Data is continuously cleaned and aggregated before
being pushed into data stores.
Triggers – Anomalous behaviour is detected in real
time, and further downstream actions are triggered
accordingly.
Data enrichment – Live data is enriched with more
information by joining it with a static dataset, allowing
for a more complete real-time analysis.
Complex sessions and continuous learning –
Events related to a live session (e.g. website activity)
are grouped together and analysed.
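
As a rough illustration of the first three patterns and of session grouping, the sketch below chains them as Python generators, so each event flows through every stage as it arrives. The field names, the lookup table, the alert limit and the 30-second session gap are all assumptions invented for the example; continuous learning is not covered.

```python
from typing import Iterable, Iterator

# Hypothetical static reference data used for enrichment.
CUSTOMER_REGION = {"c1": "north", "c2": "south"}


def clean(events: Iterable[dict]) -> Iterator[dict]:
    """Streaming ETL: drop malformed events and normalise the amount field."""
    for event in events:
        if event.get("customer") and event.get("amount") is not None:
            yield {**event, "amount": float(event["amount"])}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Data enrichment: join each live event with the static lookup table."""
    for event in events:
        region = CUSTOMER_REGION.get(event["customer"], "unknown")
        yield {**event, "region": region}


def trigger(events: Iterable[dict], limit: float, alerts: list) -> Iterator[dict]:
    """Trigger: record an alert when an amount exceeds the limit, then pass the event on."""
    for event in events:
        if event["amount"] > limit:
            alerts.append(event)
        yield event


def sessionise(events: Iterable[dict], gap: int) -> Iterator[list[dict]]:
    """Sessions: group each customer's events until the time gap exceeds `gap` seconds."""
    open_sessions: dict[str, list[dict]] = {}
    for event in events:
        session = open_sessions.get(event["customer"])
        if session and event["ts"] - session[-1]["ts"] > gap:
            yield session  # the previous session for this customer is complete
            session = None
        if session is None:
            session = open_sessions[event["customer"]] = []
        session.append(event)
    yield from open_sessions.values()  # flush sessions still open at end of stream


raw = [
    {"customer": "c1", "amount": "20", "ts": 0},
    {"customer": None, "amount": "5", "ts": 1},     # malformed: dropped by clean()
    {"customer": "c2", "amount": "900", "ts": 2},   # exceeds the limit: alert
    {"customer": "c1", "amount": "35", "ts": 10},
    {"customer": "c1", "amount": "15", "ts": 100},  # gap over 30s: new session
]
alerts: list = []
pipeline = trigger(enrich(clean(raw)), limit=500.0, alerts=alerts)
sessions = list(sessionise(pipeline, gap=30))
```

Because the stages are generators, nothing is stored between them apart from the open sessions; a production platform would add distribution, fault tolerance and time-based session expiry on top of the same idea.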
These uses have given rise to a number of industry
applications. For example:
Telecommunications
–– Call detail processing
–– Customer churn prediction
–– Device geomapping.
Travel and Transportation
–– Intelligent traffic management
–– Automotive telematics.
Energy and Utilities
–– Usage forecasting
–– Equipment monitoring.
Financial Services
–– Fraud detection & prevention
–– Targeted marketing
–– Cybersecurity monitoring.
Event Stream Processing (ESP) and Complex Event
Processing (CEP) are very similar concepts, but there
are some important differences:
Speed through Parallelism
CEP is often centralised. Stream applications
are deployed across many nodes to maximise
parallelism and scalability.
Deeper Analysis
CEP uses a rules engine to evaluate if-then-else style
rules, or an in-memory SQL database to perform
continuous simple queries. ESP provides more
options to analyse data through a comprehensive
programming language (in IBM Streams, the Streams
Processing Language, SPL).
Broader Data Types
CEP engines handle structured data. Streams
have been designed to analyse all manner of data,
including image, video and acoustic data types.
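
The parallelism point can be illustrated with key-based partitioning, a common way to spread a stream across workers. The sketch below runs in a single process and only simulates the routing; the worker count, the event keys and the function names are assumptions for the example, and a real platform would place each queue on a separate node.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, workers: int) -> int:
    """Map a key to a worker index with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % workers


def distribute(events, workers):
    """Route each (key, value) event to the queue of the worker that owns its key."""
    queues = defaultdict(list)
    for key, value in events:
        queues[partition_for(key, workers)].append((key, value))
    return queues


def worker_totals(queue):
    """Per-worker state: a running total per key, built only from that worker's events."""
    totals = defaultdict(int)
    for key, value in queue:
        totals[key] += value
    return dict(totals)


# Hypothetical smart-meter events, used for illustration only.
events = [("meter-a", 3), ("meter-b", 5), ("meter-a", 4), ("meter-c", 1), ("meter-b", 2)]
queues = distribute(events, workers=3)

merged = {}
for queue in queues.values():
    merged.update(worker_totals(queue))
```

Because every event for a given key lands on the same worker, each worker can keep its state locally and no coordination is needed to compute per-key results.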
Most stream computing platforms include two
core components: an application development
environment to build applications that ingest and
process data streams, and a runtime capability
designed to process data streams with low latency at
massive scale seamlessly across infrastructure.
In addition, streaming analytics toolkits improve the
productivity of developers and data scientists in
crafting complex analytics, such as natural language
processing, voice analytics and facial recognition.
There are a number of open source offerings that
can support streaming analytics:
Apache Storm, written largely in Clojure, was created
at BackType and open-sourced by Twitter. It builds on
or is commonly paired with other open source
components, especially ZooKeeper for cluster
coordination, ZeroMQ for messaging (in early releases),
and Kafka for queued messaging.
Apache Spark, written in Scala, is a general
framework for large-scale data processing that
supports several programming languages
and concepts such as MapReduce, in-memory
processing, stream processing, graph processing
and machine learning.
Akka is a toolkit and runtime for building
highly concurrent, distributed, and resilient
message‑driven applications on the Java Virtual
Machine (JVM).
IBM InfoSphere Streams is an open platform that
blends the best elements of shareware, open source
software and open standards with powerful vendor-
developed technology.
IBM Streams is highly efficient, using 14.2 times fewer
hardware resources and delivering 12.3 times more
throughput compared to open source offerings.
It offers a highly scalable event server, integration
capabilities, and other typical features required for
implementing stream processing use cases.
IBM Streams is a leader in the Forrester Wave for
Big Data Analytics Platforms.
The latest IBM Streams update focuses on developer
productivity.
The Integrated Development Environment (IDE) is
based on Eclipse and offers visual development and
configuration.
It speeds up application delivery by allowing
streaming applications to be written in Java.
A developer with no prior Streams knowledge can
create applications in under an hour using Java
APIs for streaming analytics libraries covering natural
language processing, spatial, temporal and acoustic
analysis, image recognition and more.
Time is of the essence…
why not trial IBM Streams today?
© Copyright IBM Corporation 2017. All Rights Reserved.
IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both.
Other product, company or service names may be trademarks or service marks of others.
