InfoSphere Streams comes standard with several real-time analytic toolkits to help provide quicker time to value. These include telecommunications event data, time series, text, messaging, database, geospatial, and more. Many of these toolkits are part of the InfoSphere Streams Open Source Project.
This presentation is an introduction to InfoSphere Streams. First, we position current market challenges in the area of big data. Then we discuss how context-aware stream computing from IBM InfoSphere Streams addresses these challenges. Finally, we present how InfoSphere Streams provides unique value across a range of industries. You can get started now with our InfoSphere Streams Quick Start program and new open source project.
Quick Start: http://www-01.ibm.com/software/data/infosphere/streams/quick-start/
Open Source: https://github.com/IBMStreams
Clients need to move from data management to action based on real-time insight. Speed isn't just about how fast data is produced or changed, but also about the speed at which data must be received, understood, and processed. This presentation will outline how to harness fast-moving data inside and outside of your organization.
Your organization needs to shift from management of data to action. Organizations should:
Select valuable data and insights to be stored for further processing
Process and analyze perishable data to take real-time action
Harness and process streaming data such as video, acoustic, thermal, geospatial or sensor data
Context-aware stream computing is a different paradigm – the left shows the traditional way data is accessed using queries to pull the data from a data storage device such as a data warehouse or database – which is still valid for many requirements.
The new context-aware stream computing paradigm brings data to the query – data is pushed or flows through the analytics.
Common drivers for those new use cases include:
When you need an immediate response/action and persisting and analyzing stored data isn’t fast enough.
When it is too expensive to store the data to be analyzed, e.g. most of it is throw-away and it's more efficient to analyze/filter it as you receive it and store only the filtered results.
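A minimal Python sketch of the "analyze/filter as you receive it" driver, assuming a hypothetical temperature feed and a hypothetical 50.0 threshold. Each reading is inspected as it flows past, and only the readings of interest are kept, so the throw-away bulk is never persisted. It only illustrates the push style with plain generators; it is not the InfoSphere Streams API (Streams applications are typically written in SPL, the Streams Processing Language).

```python
from typing import Iterable, Iterator


def sensor_readings() -> Iterator[dict]:
    """Hypothetical source: yields readings one at a time, as if they were arriving live."""
    for i, value in enumerate([21.5, 21.7, 95.2, 21.6, 102.3, 21.4]):
        yield {"seq": i, "temp_c": value}


def keep_interesting(readings: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Filter in flight: readings at or below the threshold are discarded, never stored."""
    for r in readings:
        if r["temp_c"] > threshold:
            yield r


# Only the filtered results are materialized ("stored").
stored = list(keep_interesting(sensor_readings(), threshold=50.0))
print(stored)
```

Because `keep_interesting` is a generator, it holds one reading at a time rather than the whole feed, which is the point of the push model: the data comes to the analytics instead of the analytics querying stored data.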
As discussed above, InfoSphere Streams is a development platform for a wide range of real-time analytics applications. Even so, most InfoSphere Streams applications follow a common design pattern:
Ingest data from many sources & prepare it for analysis
Transform, filter, correlate, aggregate and enrich the data for analysis
Detect & predict events and patterns in the data
Decide how the results should be handled and act on them
Store any data that is of longer term value
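The five-step pattern above can be sketched as a chain of Python generators. This is a minimal illustration, not Streams code: the `meter_id,kwh` record format, the `REGIONS` lookup table used for enrichment, and the 10 kWh alert limit are all hypothetical.

```python
from typing import Iterable, Iterator, List

# Hypothetical reference data used to enrich each event (meter -> region).
REGIONS = {"m1": "north", "m2": "south"}


def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Ingest and prepare: parse raw 'meter_id,kwh' records, dropping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            yield {"meter": parts[0], "kwh": float(parts[1])}
        except ValueError:
            continue


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Transform/enrich: attach the region for each meter from the reference table."""
    for e in events:
        yield {**e, "region": REGIONS.get(e["meter"], "unknown")}


def detect(events: Iterable[dict], limit_kwh: float) -> Iterator[dict]:
    """Detect: mark events whose usage exceeds the limit."""
    for e in events:
        yield {**e, "alert": e["kwh"] > limit_kwh}


def decide_and_store(events: Iterable[dict], store: List[dict]) -> List[str]:
    """Decide/act on alerts, and store only the events of longer-term value."""
    actions = []
    for e in events:
        if e["alert"]:
            actions.append(f"notify {e['meter']}")
            store.append(e)
    return actions


raw = ["m1,3.2", "m2,12.5", "bad line", "m3,x", "m1,15.0"]
store: List[dict] = []
actions = decide_and_store(detect(transform(ingest(raw)), limit_kwh=10.0), store)
print(actions)
```

Keeping each stage as a separate function mirrors how a Streams application is composed from separate operators, so a stage can be replaced (for example, a rule in `detect` swapped for a scoring model) without touching the others.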
Sometimes when people hear unstructured data, they think emails or social data. However, the universe is much larger. Think more broadly. No other vendor on the market matches this level of support. No other vendor provides the depth of analytical techniques. We will explore these techniques in the upcoming slides on the various toolkits.
Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=GBSE_IM_EZ_IEEN&htmlfid=IMC14829IEEN&attachment=IMC14829IEEN.PDF
Geospatial and spatiotemporal (including the time dimension) computing form a rich domain that requires specific expertise, such as knowledge of cartography (maps and map projections), geospatial geometry (shapes and locations and their representations), set theory (spatiotemporal relationships), interoperability standards and conventions, as well as the specifics of the application (location intelligence and location-based services for businesses, security and surveillance, geographic information systems, traffic patterns and route finding, etc.). This toolkit provides the beginnings of a set of capabilities that lets InfoSphere Streams provide the real-time component for emerging location-based applications.
Text is prolific in the era of big data. One example of text analytics is sentiment analysis based on a Twitter feed (in that case, the documents are tweets). Another example is combing through Facebook comments to understand consumer behavior or to identify suspicious activities. InfoSphere Streams enables organizations to analyze this data and make sense of it in real time.
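To make the sentiment example concrete, here is a minimal lexicon-based scorer applied to each document as it arrives. The word lists and sample tweets are hypothetical, and simple word counting is a stand-in technique chosen for brevity; it is not the method used by the InfoSphere Streams text toolkit, and production systems use much larger lexicons or trained models.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "angry"}


def sentiment(text: str) -> int:
    """Score one document as (#positive words - #negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score: int) -> str:
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love the new plan, great coverage!",
    "Network is slow and my phone is broken",
    "Switching providers tomorrow",
]
for t in tweets:
    print(label(sentiment(t)), "|", t)
```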
Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_YT_YV_USEN&htmlfid=YTC03637USEN&attachment=YTC03637USEN.PDF
Mathematical models are used not only in the natural sciences (such as biology and physics) and engineering disciplines (such as computer science and artificial intelligence), but also in the social sciences (such as economics and sociology). A model may help to explain a system, to study the effects of different components, and to make predictions about behavior.
MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.
Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_SW_SW_GBEN&htmlfid=SWC14107GBEN&attachment=SWC14107GBEN.PDF#loaded
The InfoSphere Streams Data Mining Toolkit analyzes and scores streaming data according to PMML (Predictive Model Markup Language)-standard models. The PMML support and scoring code is ported directly from the IBM InfoSphere Warehouse, ensuring consistency of results.
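To show what "scoring streaming data according to a PMML model" means, here is a minimal sketch that reads a linear `RegressionModel` from PMML and applies it to each incoming record. The PMML document is a hypothetical, trimmed example (a real file also carries `Header`, `DataDictionary` and `MiningSchema` elements), and only `NumericPredictor` terms are handled. This is an illustration of the idea, not the Data Mining Toolkit's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML containing one linear regression model.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="2.0">
      <NumericPredictor name="calls_per_hour" coefficient="0.5"/>
      <NumericPredictor name="dropped_calls" coefficient="3.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text: str):
    """Extract the intercept and (name, coefficient, exponent) terms of the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept", "0"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("p:NumericPredictor", NS)
    ]
    return intercept, terms


def score(model, record: dict) -> float:
    """Linear regression score: intercept + sum(coefficient * value ** exponent)."""
    intercept, terms = model
    return intercept + sum(c * record[name] ** e for name, c, e in terms)


model = load_regression(PMML_DOC)  # parse once, then score every tuple as it arrives
for record in [{"calls_per_hour": 10, "dropped_calls": 2}, {"calls_per_hour": 4, "dropped_calls": 0}]:
    print(score(model, record))
```

Parsing the model once and reusing it for every record is the important design point: per-tuple work is just a few multiplications, which keeps latency low on a high-rate stream.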
The R Project for statistical computing is also supported: http://www.r-project.org/
Video content analysis (VCA) is the capability of automatically analyzing video to detect and determine temporal and spatial events. Many different functionalities can be implemented. Video motion detection is one of the simpler forms, in which motion is detected relative to a fixed background scene. More advanced functionalities include video tracking and facial recognition.
Based on the internal representation that VCA generates in the machine, it is possible to build other functionalities, such as identification, behavior analysis or other forms of situation awareness.
VCA relies on good input video, so it is often combined with video enhancement technologies such as video denoising, image stabilization, unsharp masking and super-resolution.
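The simple motion-detection case described above (change relative to a fixed background) can be sketched as background subtraction on grayscale frames. Frames here are plain nested lists of 0-255 intensities, and the per-pixel threshold (30) and minimum changed fraction (5%) are hypothetical tuning values; real VCA works on decoded video with far more robust background models.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel intensities, 0-255


def motion_mask(background: Frame, frame: Frame, pixel_threshold: int = 30) -> Frame:
    """Return 1 where a pixel differs from the fixed background by more than pixel_threshold."""
    return [
        [1 if abs(p - b) > pixel_threshold else 0 for p, b in zip(row, brow)]
        for row, brow in zip(frame, background)
    ]


def has_motion(background: Frame, frame: Frame,
               pixel_threshold: int = 30, min_fraction: float = 0.05) -> bool:
    """Report motion when at least min_fraction of the pixels changed."""
    mask = motion_mask(background, frame, pixel_threshold)
    changed = sum(map(sum, mask))
    total = sum(len(row) for row in mask)
    return changed / total >= min_fraction


background = [[10] * 4 for _ in range(4)]
noisy = [[15] * 4 for _ in range(4)]  # small sensor noise only
intruder = [row[:] for row in background]
intruder[1][1] = intruder[1][2] = intruder[2][1] = intruder[2][2] = 200  # bright 2x2 object

print(has_motion(background, noisy), has_motion(background, intruder))
```

The per-pixel threshold is what makes input quality matter: noisy or shaky video pushes more pixels over the threshold, which is why VCA is often paired with denoising and stabilization.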
Deploy statistics on streaming data quickly. Statistics is the study of the collection, analysis, interpretation, presentation and organization of data.
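Statistics on a stream must be updated incrementally, because the data is never all in memory at once. One standard technique, shown here as a sketch rather than as the toolkit's implementation, is Welford's online algorithm, which maintains the count, mean and variance in constant memory with one update per arriving value.

```python
import math


class RunningStats:
    """Welford's online algorithm: one-pass mean and variance in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the new mean

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def stdev(self) -> float:
        return math.sqrt(self.variance)


s = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    s.update(value)
print(s.n, s.mean, s.variance)
```

Welford's update is preferred over accumulating a sum and a sum of squares because it avoids the cancellation error that the naive formula suffers when the variance is small relative to the mean.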
Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_EZ_USEN&htmlfid=IMC14821USEN&attachment=IMC14821USEN.PDF
Operations Analysis focuses on analyzing machine data, which can include anything from IT machines to sensors, meters and GPS devices. It’s growing at exponential rates and comes in large volumes and a variety of formats, including in-motion, or streaming data. Leveraging machine data requires complex analysis and correlation across different types of data sets. By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.
Using InfoSphere Streams, organizations can:
Gain real-time visibility into operations, customer experience and behavior
Analyze massive volumes of machine data with sub-second latency to identify events of interest as they occur
Apply predictive models and rules to identify potential anomalies or opportunities
Optimize service levels in real-time by combining operational and enterprise data
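As an illustration of applying a rule to identify anomalies as they occur, here is a minimal sliding-window detector: a value is flagged when it lies more than `k` standard deviations from the mean of the most recent readings. The window size, `k = 3`, the five-reading warm-up and the sample readings are hypothetical choices, and a z-score rule is only one simple stand-in for the predictive models and rules mentioned above.

```python
import statistics
from collections import deque


class WindowAnomalyDetector:
    """Flag values more than k standard deviations from the mean of the last `size` values."""

    def __init__(self, size: int = 20, k: float = 3.0, min_history: int = 5) -> None:
        self.window = deque(maxlen=size)  # oldest values fall out automatically
        self.k = k
        self.min_history = min_history  # do not judge until enough history exists

    def observe(self, x: float) -> bool:
        """Return True if x is anomalous relative to the current window, then add it."""
        is_anomaly = False
        if len(self.window) >= self.min_history:
            mu = statistics.fmean(self.window)
            sigma = statistics.pstdev(self.window)
            if sigma == 0:
                is_anomaly = x != mu  # constant history: any change is unusual
            else:
                is_anomaly = abs(x - mu) > self.k * sigma
        self.window.append(x)
        return is_anomaly


detector = WindowAnomalyDetector(size=20, k=3.0)
readings = [10, 11, 10, 9, 10, 11, 10, 42, 10]
flags = [detector.observe(r) for r in readings]
print([i for i, f in enumerate(flags) if f])
```

Note that the anomalous value stays in the window after it is flagged, which temporarily widens the tolerance; a production design might exclude flagged values from the baseline.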
Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_EZ_USEN&htmlfid=IMC14909USEN&attachment=IMC14909USEN.PDF#loaded
Every day, millions of call center conversations are recorded for quality assurance purposes. Until recently, monitoring calls for agent performance and customer satisfaction was a manual process, and at best only a small fraction of incoming calls could be reviewed. With real-time analytics, you can understand your call center better. Call center managers and marketing executives get accurate snapshots of their call centers over time, which helps them unearth trends and pinpoint anomalies by drilling down to the salient topics spoken during calls.
InfoSphere Streams isn't just for voice data; there are many other sounds in our world. For example, hydrophones capture wildlife sounds to help preserve the environment.
Find the answers to such questions as:
Why did calls peak on a certain calendar day?
How does the time of day affect what people are calling about?
What are the dominant topics people are talking about from week to week?
Are your customers responding to your ad campaigns?
Mobileum Public Case Study - http://www.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_IM_USEN&htmlfid=IMC14915USEN&attachment=IMC14915USEN.PDF
Mobileum sought to help mobile operators increase roaming usage and deliver new revenue-generating solutions. To do so, the company needed the ability to analyze millions of events each minute. InfoSphere Streams gives them the speed and insight needed. Mobile operators are now expected to monetize about USD75 million in new revenue from increased roaming usage, and large operators could potentially gain USD400 million in commission-related revenue annually. Client quote: "We expect that these operators will monetize about USD75 million in new revenue through programs that increase roaming usage, and can potentially generate USD400 million in commission-related revenue annually through targeted travel offers."
Adelos Public Case Study: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_EZ_USEN&htmlfid=IMC14909USEN&attachment=IMC14909USEN.PDF#loaded
Adelos, Inc. needed to help one of its customers, a national laboratory supporting the U.S. Department of Energy, find a faster, more accurate way to detect, classify and track potential threats to its perimeters and border areas. Because the solution would effectively serve as the lab's central nervous system, it would have to meet strict technical requirements, including:
Interoperability, enabling lab personnel to instantly collect and analyze an array of data from video, acoustic and other types of sensors to create a holistic view of a situation
Scalability, to support new requirements as the lab’s fiber-optic arrays, surveillance areas and security perimeters change
Extensibility, serving as a framework to fit into the lab’s existing IT architecture and integrating with signal processors and mobile and mapping applications.
With InfoSphere Streams, Adelos realizes the following benefits:
Captures and analyzes huge volumes of data-in-motion, providing unprecedented insight into security threats
Speeds processing of 275 MB of data from hours to milliseconds, for rapid analysis
Scalable and flexible solution that is capable of accommodating changing requirements
Client quote: "Capturing approximately 27 terabytes of data each day adds up fast and would be challenging and costly to store. IBM InfoSphere Streams offers a key advantage here, enabling data to be captured and analyzed in real time, helping organizations realize huge storage-related savings."
Celcom Public Case Study: http://www.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_IM_EZ_MYEN&htmlfid=IMC14888MYEN&attachment=IMC14888MYEN.PDF
Celcom Public Video: https://www.youtube.com/watch?v=PE5zrwqh_2s
Celcom Axiata needed to dramatically improve its understanding of its customers to help ensure customer retention and increase its market share. The communications service provider (CSP) wanted to use existing customer information and analytics to manage a complete marketing campaign process. The company sought a unified solution that would enable easier execution of multiwave/multitiered campaigns while reducing data silos and improving the operating view of current campaign calendars.
Celcom wanted to target individual customers with the marketing message most appropriate to their usage and customer profile. The goals were to improve campaign uptake rates, build customer satisfaction and loyalty, raise average revenue per user (ARPU) and reduce churn. Using InfoSphere Streams and other big data and analytics offerings, Celcom implemented a targeted marketing campaign management solution that uses near-real-time subscriber data to develop personalized, targeted campaigns. The solution collects and analyzes numerous data points, including a user's interaction history and preferences as well as provider business rules and marketing objectives, to determine the best offers for each customer.
Results include:
Reduces new campaign launch time by more than 80 percent
Improves campaign performance by more than 70 percent, in turn increasing campaign ROI
Improves customer loyalty, increases ARPU and reduces churn through personalized campaigns and messaging
Client quote: "The solution helps the company use its marketing campaign funds wisely and gives it insight into what works and what doesn't; that way Celcom can continually improve its service to its customers."
InfoSphere Streams is in production at many clients in telecommunications, utilities, healthcare, governments around the world and more. Some clients are so successful that they wanted to tell the world. Watch the videos for Sprint, CenterPoint, Brocade, the Swedish Royal Institute of Technology, Astron, the University of Ontario and more. Refer clients to the InfoSphere Streams YouTube channel, http://www.youtube.com/playlist?list=PLCF04A48C22F34B19, for more technical videos and to hear from product experts.
IBM Watson Foundations is a big data and analytics platform that makes sense of all your data with innovative capabilities that provide real-time, real-world insights, so your organization can make better decisions with speed and confidence—and outperform the competition.
Start with data of any variety, volume, velocity
As data—all types in all forms from all sources—flows through your organization, it is ingested by IBM Watson Foundations, IBM's big data and analytics platform.
Next, harness all of your data
Explore your data, at rest or in motion, closer to where it resides for near-real-time analysis and insight.
Data management: Manage your data more simply and cost-effectively from requirements to retirement
Data warehousing: Provide clean, consolidated, consistent, timely and trusted data to your data warehouse
Content management: Capture, share, analyze and govern unstructured data and put content into context
Hadoop systems: Manage all data stored in its native form to speed analysis and insight
Stream computing: Analyze massive volumes of streaming data to gain immediate insight from data in motion
Information integration and governance: Protect, secure and integrate your data through its lifecycle to yield trusted insights
Yielding the desired business outcomes
IBM Watson Foundations delivers the best possible insight to fuel better decisions. See how organizations in all industries are using Watson Foundations to achieve business outcomes.
The goal of InfoSphere Streams Quick Start Edition is to allow clients to experiment with stream computing on their own terms. InfoSphere Streams Quick Start Edition is an alternative to open source: clients can experiment without a capital investment and without spending time researching open source options.
NOTE: The scale-out architecture is available only with the native installation option, not with the VMware image. The VMware image is limited to the single machine on which the VM is running.
IBM context-aware stream computing gives you the ability to analyze massive data volumes quickly, often in real time, and turn data into actionable insight. InfoSphere Streams is an advanced computing platform that can quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. Because it can handle high throughput rates, InfoSphere Streams can analyze millions of events per second, enabling sub-millisecond response times and instant decision-making. Now you can get your hands on this technology with InfoSphere Streams Quick Start Edition, a no-charge, downloadable, non-production version. InfoSphere Streams Quick Start Edition has no data capacity or time limitations, so you can experiment with streaming data and work with different use cases on your own timeframe.
NOTE: InfoSphere Streams Quick Start Edition does not come with a support option. To explore support options, visit the InfoSphere Streams product page. - http://www-03.ibm.com/software/products/us/en/infosphere-streams
A place for developers by developers. It is your direct channel to the InfoSphere Streams development team and a place to discuss, learn and share ideas.
IBM has decided to create an open source project for some InfoSphere Streams components to speed development of applications and harness the energies of the development community. Other developers can now extend the IBM source with new capabilities. In future releases, we expect to incorporate new function from the projects into InfoSphere Streams. We also expect other developers to contribute new InfoSphere Streams native functions, operators and toolkits to the new community to further accelerate adoption.
We believe a mix of open source and closed source is the best way to drive adoption in the marketplace, as shown by the success of open source offerings like Apache Web Server and Eclipse. Having the full support of a vendor like IBM can lower risk, while open source can help meet customer requirements.
There are many resources for additional reading. Explore both business and technical resources. All resources are publicly accessible.