© 2015 IBM Corporation
What’s new in Toolkits
IBM Streams 4.1
Ankit Pasricha
Toolkits Team Lead
ankitp@ca.ibm.com
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL
PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE
INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY
OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY,
WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR
OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR
THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT
GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion. Information regarding potential
future products is intended to outline our general product direction and it should not
be relied on in making a purchasing decision. The information mentioned regarding
potential future products is not a commitment, promise, or legal obligation to deliver
any material, code or functionality. Information about potential future products may
not be incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole
discretion.
Agenda
• (New) Spark MLLib Toolkit
• (New) Cybersecurity Toolkit
• (New) Distributed Process Store (DPS) Toolkit
• Messaging Toolkit
• Geospatial Toolkit
• Text Toolkit
• Other updates
Spark MLLib Toolkit
• Combines the power of Spark MLLib with the real-time streaming capabilities of Streams
• Allows scoring of real-time streaming data using Spark models
• GitHub project: http://ibmstreams.github.io/streamsx.sparkMLLib/
• Support for a number of MLLib models
  – Classification
    • Linear SVM
    • Naive Bayes
  – Clustering
    • KMeans
  – Collaborative Filtering
  – Regression
    • Isotonic
    • Linear
    • Logistic
  – Tree
    • Decision Tree
    • Gradient Boosted Trees
    • Random Forest
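To make the idea of scoring streaming data with a model trained elsewhere concrete, here is a minimal sketch in plain Python. It does not use the toolkit or Spark: the weights, intercept, tuple layout, and function names are all hypothetical and stand in for a logistic regression model that would have been trained offline.

```python
import math
from typing import Dict, Iterable, Iterator

# Hypothetical parameters standing in for a model trained offline
# (for example with Spark MLlib). They are made up for illustration.
WEIGHTS = [0.8, -1.2, 0.5]
INTERCEPT = -0.3


def predict_probability(features) -> float:
    """Logistic regression score for one feature vector."""
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(tuples: Iterable[Dict], threshold: float = 0.5) -> Iterator[Dict]:
    """Score tuples one at a time as they arrive, without batching."""
    for t in tuples:
        p = predict_probability(t["features"])
        yield {**t, "probability": p, "label": int(p >= threshold)}


# Usage: two incoming tuples with hypothetical ids and features.
incoming = [
    {"id": "call-1", "features": [1.0, 0.0, 1.0]},
    {"id": "call-2", "features": [0.0, 1.0, 0.0]},
]
scored = list(score_stream(incoming))
```

The model is loaded once and applied per tuple, which is the division of labor the slide describes: training happens in batch, scoring happens in the stream.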
Streams + Spark Demo
[Diagram: Historical city data sets (incidents, calls for service (911, etc.), 311, code violations, permits, buildings) are stored in HDFS and used by Apache Spark MLlib to build a model that answers "Is this call for service a false alarm?". IBM Streams takes real-time calls for service and delivers real-time predictions and relevant context to a real-time dashboard.]
Resources
• Getting Started Guide: https://developer.ibm.com/streamsdev/docs/getting-started-with-the-spark-mllib-toolkit/
• Documentation: http://ibmstreams.github.io/streamsx.sparkMLLib/com.ibm.streamsx.sparkmllib/doc/spldoc/html/
• MLLib Guide: https://spark.apache.org/docs/latest/mllib-guide.html
• Samples: https://github.com/IBMStreams/streamsx.sparkMLLib/tree/master/samples
Cybersecurity Toolkit
• Detects active threats within a network in real time.
• Contains three machine-learning cybersecurity models:
  – DomainProfiling: analyzes DNS response records and reports whether any domains are behaving suspiciously
  – HostProfiling: analyzes DNS response records and reports whether individual hosts are behaving suspiciously
  – PredictiveBlacklisting: analyzes DNS response records and predicts whether a domain should be added to an internal blacklist
Resources
• Introduction to Cybersecurity toolkit: https://developer.ibm.com/streamsdev/docs/detect-active-threats-in-real-time-streams-cybersecurity-toolkit/
• Getting Started Guide: http://ibmstreams.github.io/streamsx.documentation/docs/4.1/cybersecurity/cybersecurity-getting-started/
• Documentation: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.cybersecurity/tk$com.ibm.streams.cybersecurity.html?lang=en
• Starter Apps: https://github.com/IBMStreams/streamsx.cybersecurity.starterApps
Distributed Process Store (DPS) Toolkit
• Allows data to be shared between operators, between Streams applications, and between Streams and other applications.
  – Provides a collection of APIs in Java, C++ and SPL to read from and write to Redis
  – Supports Redis 2.8.x and 3.0
• Java examples: creating a distributed store, acquiring a distributed lock, writing data
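The three operations named above (create a store, acquire a lock, write data) follow a common pattern that can be sketched without the toolkit. The Python below is an in-process stand-in only: it is not the DPS API, it does not talk to Redis, and every class and method name in it is hypothetical.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for a named key-value store (NOT the DPS API)."""

    def __init__(self, name):
        self.name = name
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class StoreRegistry:
    """Hands out named stores and named locks, creating them on first use."""

    def __init__(self):
        self._stores = {}
        self._locks = {}
        self._guard = threading.Lock()

    def create_or_get_store(self, name):
        with self._guard:
            if name not in self._stores:
                self._stores[name] = InMemoryStore(name)
            return self._stores[name]

    def create_or_get_lock(self, name):
        with self._guard:
            if name not in self._locks:
                self._locks[name] = threading.Lock()
            return self._locks[name]


registry = StoreRegistry()


def record_event(key):
    # 1. Create (or look up) the shared store by name.
    store = registry.create_or_get_store("event_counts")
    # 2. Acquire the named lock so the read-modify-write is not interleaved.
    with registry.create_or_get_lock("event_counts_lock"):
        # 3. Write the updated value.
        store.put(key, store.get(key, 0) + 1)


# Usage: eight workers update the same key concurrently.
workers = [threading.Thread(target=record_event, args=("login",)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
total = registry.create_or_get_store("event_counts").get("login")
```

In a real deployment the store and lock would live in Redis so that separate processes and applications see the same data; here threads in one process play that role.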
Resources
• Documentation: https://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streamsx.dps/tk$com.ibm.streamsx.dps.html?lang=en
• Samples: https://github.com/IBMStreams/streamsx.dps/tree/master/com.ibm.streamsx.dps/samples
Messaging Toolkit updates
• Guaranteed processing support
  – KafkaConsumer
    • On checkpoint: saves the offset within the message log
    • On reset: replays messages from the saved offset
  – JMSSource
    • Runs in a transacted session with MQ running in persistent mode
    • On checkpoint: acknowledges read messages so that they are removed from the queue
    • On reset: replays any unacknowledged messages
• Performance improvements in Kafka operators (pre-release)
  – Uses the new KafkaProducer API
  – Developed on GitHub: https://github.com/IBMStreams/streamsx.messaging
• RabbitMQ support (pre-release)
• Kafka 0.9 and Message Hub support on Bluemix
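The checkpoint and reset behavior described for KafkaConsumer can be illustrated with a small simulation. This is a sketch only: the class below is hypothetical, a Python list stands in for the message log, and no Kafka client or Streams API is involved.

```python
class ReplayableConsumer:
    """Toy consumer over an in-memory log (a stand-in, not a Kafka client)."""

    def __init__(self, log):
        self._log = log
        self._position = 0          # offset of the next message to read
        self._saved_offset = 0      # offset recorded at the last checkpoint

    def poll(self):
        """Return the next message, or None when the log is exhausted."""
        if self._position >= len(self._log):
            return None
        message = self._log[self._position]
        self._position += 1
        return message

    def checkpoint(self):
        """Save the current offset within the message log."""
        self._saved_offset = self._position

    def reset(self):
        """Rewind so that messages after the last checkpoint are replayed."""
        self._position = self._saved_offset


# Usage: process two messages, checkpoint, read one more, then fail and reset.
consumer = ReplayableConsumer(["m0", "m1", "m2", "m3"])
before_checkpoint = [consumer.poll(), consumer.poll()]
consumer.checkpoint()
in_flight = consumer.poll()       # read but not yet covered by a checkpoint
consumer.reset()                  # simulated failure
replayed = [consumer.poll(), consumer.poll()]
```

After the reset, the message that was in flight is delivered again, so nothing read since the last checkpoint is lost. Downstream processing may see it twice, which is why replay pairs naturally with checkpointed state.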
Geospatial Toolkit Update
• New PointMapMatcher operator
• We have a map and a set of imprecise points arriving in real time from a GPS or some other source
  – The data may have only a certain inherent precision
  – There may be errors due to signal noise
  – The map itself may be imprecise or incorrect
• We want to clean and smooth this data one point at a time to lock the incoming points to the road network.
• The question being answered: "Where is this entity right now?"
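A much simpler technique than a full map matcher shows what "locking a point to the road network" means: project each noisy point onto every road segment and keep the closest projection. The sketch below assumes planar x/y coordinates and uses hypothetical road names; it ignores the point history and smoothing that a real map matcher would also use, and it does not describe how PointMapMatcher is implemented.

```python
import math


def project_onto_segment(p, a, b):
    """Closest point to p on the segment from a to b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both ends coincide
    # Parameter of the perpendicular foot, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def match_point(p, road_segments):
    """Return (segment_id, snapped_point, distance) for the nearest segment."""
    best = None
    for segment_id, (a, b) in road_segments.items():
        q = project_onto_segment(p, a, b)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if best is None or d < best[2]:
            best = (segment_id, q, d)
    return best


# Usage: two hypothetical roads that cross, and one noisy reading.
roads = {
    "main_st": ((0.0, 0.0), (10.0, 0.0)),
    "oak_ave": ((5.0, -5.0), (5.0, 5.0)),
}
segment_id, snapped, distance = match_point((2.0, 0.4), roads)
```

Here the reading is snapped to main_st, the nearer of the two roads. Nearest-segment snapping breaks down at intersections and on parallel roads, which is one reason practical matchers also take earlier points into account.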
Operator Details
[Diagram: the PointMapMatcher operator, with inputs Entity Locations and Map Geometry Updates, and outputs Matches and Errors.]
Some use cases:
• Routing
• Traffic reports
• Transit scheduling
• Taxi/emergency dispatching
• Streams Dev article: https://developer.ibm.com/streamsdev/docs/realtime-map-matching-in-streams-v4-0-1/
Text Toolkit update
• Added support for AQL extractors generated by the BigInsights (BI) 4.0+ web tooling
• Two-step process
  – Step 1: Create an extractor in the BI web tool
  – Step 2: Load the extractor in the TextExtract operator for execution
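stream<DataToAnalyze, ReferencesFound> TextExtractOutput =
    TextExtract(InputFromSocialMedia)
{
    param
        // Directory that holds the extractor created in step 1
        moduleSearchPath : "etc/extractor" ;
        // Input attribute that carries the text to analyze
        inputDoc : "text" ;
        // AQL view whose results are emitted
        outputViews : "ProductSearch" ;
        outputMode : "multiPort" ;
}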
• For more information: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.text/tk$com.ibm.streams.text.html?lang=en
Other updates
• Bluemix support
  – HDFS Toolkit for Bluemix
  – HBase Toolkit for Bluemix
  – More information: https://developer.ibm.com/streamsdev/docs/integrating-streams-biginsights-hbase-service-bluemix/
• Data Governance support
  – HDFS Toolkit
  – DB Toolkit
  – Inet Toolkit
  – Messaging Toolkit
  – Webcast replay: https://developer.ibm.com/streamsdev/docs/streams-v-4-1-0-developer-conference-replay/
• Support for BigInsights (BI) 4.1
  – HDFS Toolkit
  – HBase Toolkit
Questions?
