BOB TO DELIVER
Key Concept: So why do we need to pursue this opportunity together? Big data and analytics are two sides of the same coin. You can have all the information in the world, but if you can’t make sense of how to use it…what is the point? And you can have all of the most advanced analytics in the world but if you have no information to fuel it, then what value can it provide…none.
Key Points:
On one side of the coin there is a mass of data - data that is complex, data that is moving at high speed, and often contains lots of unstructured text. It needs to be captured, managed and delivered.
On the flip side there is analytics. The analytics is tasked with finding interesting and relevant trends and patterns in big data to inform decisions, optimize processes and even drive new business models.
And just as a coin can’t have only one side, you can’t have big data without analytics, or indeed analytics without big data – you need both to create the value.
That is why IBM is defining Big Data AND Analytics as the category and as the growth market…it isn’t one or the other, it is and MUST BE both.
And as Ginni Rometty stated in her speech to the Council on Foreign Relations…we have the capability to help clients move from gut-feel to fact-based decisions…and those organizations that fail to make that shift will lose.
We’re all familiar by now with the 4V’s of Big Data.
Volume (data at scale) is about rising volumes of data in all of your systems – which presents a challenge for both scaling those systems and also the integration points among them. And Variety (data in many forms) is about managing many types of data, and understanding and analyzing them in their native form.
I hear too often that big data is exclusively founded on volume and variety and that it equates to Hadoop. While Hadoop is an entry point, it’s not the only one, nor does it provide a comprehensive picture of a customer or of the operations of the business.
The two biggest areas of interest for the Smarter Enterprise are harnessing Velocity and Veracity. Velocity (data in motion) is about ingesting and analyzing data in motion, opening up the opportunity to compete on speed to insight and speed to act.
According to the IBM GTO study in 2012, by 2015, 80% of all available data will be uncertain and rising uncertainty = declining confidence. As complexity of big data rises it becomes harder to establish veracity, which is essential for confident decisions.
Transition: People’s thinking is shifting about their usage of analytics and their data. Yes, people use reporting and analysis as table stakes for their business operations, but with the transformation of the front office, and the emergence of systems of engagement, those old notions are changing.
Big Data & Analytics is proven and the potential value is only limited by imagination, time and money.
The Ottawa Hospital – located in Ottawa, Canada.
Results:
Enable the hospital to capture its current baseline response times to consult requests, and model and implement procedural improvements to reach a goal of 15 minutes or less.
Business Problem:
Hospitals have vast amounts of siloed patient data that cannot easily be used to help hospital decision-makers analyze and improve patient care. This can lead to inconsistent procedures and policies, such as in patient discharge procedures or in ordering Emergency Department (ED) physician consults.
When decision-making is based on experience or anecdotal information, there is no way to analyze and systematically improve problems. Worse, patients might stay in the hospital longer, wait longer in the ED, and their care might be compromised by poor decision-making.
Solution:
Ottawa Hospital is implementing a collaborative Care Process Management system that uses advanced analytics and statistical modeling to help improve its staffing and better coordinate care based on actual patient needs.
Solution improves accessibility to information as well as collaboration across the care team, reducing time now spent in trying to find data or even people within the hospital system.
Scenario modeling and simulation help the hospital select the physicians who can provide the best and fastest response to ED consult requests, matching patient situation and provider skills.
Business Analytics - Cognos BI, SPSS
GBS S&T, Global Services - Application Services® (GBS-AIS)
Lotus Connections®; Lotus Sametime®
WebSphere®: IBM Blueworks Live®; ILOG JRules®
Industry Solutions: Healthcare: Analytics and Reporting; Collaborative Care and Health Information Exchange
FleetRisk Advisors – located in Alpharetta, Georgia.
Results:
FleetRisk Advisors’ services have helped their customers (trucking companies) reduce the incidence of minor accidents by 20 percent and serious accidents by as much as 80 percent.
Customers have also seen an increase in driver retention of roughly 30 percent in an industry in which roughly 100 percent turnover is normal.
Business Problem:
FleetRisk Advisors wanted to turn its ability to capture a lot of data and its expertise into helping trucking companies reduce accidents and retain their valuable drivers.
But to truly succeed, FleetRisk Advisors needed to extract even deeper predictive insights than its manual processes made possible and generate these insights faster so that customers would have the time to take truly preventive action.
Solution:
For each of the company’s customers’ truck drivers, a predictive model translates some 4,500 data elements, from a diverse range of sources, into a quantitative risk rating related to the likelihood of on-the-job accidents, giving operators cues to where they need to intervene to prevent accidents and save lives. A similar score-based approach is used to rate an employee’s risk of defecting to another trucking company.
PureData System for Analytics
SPSS Collaboration and Deployment Services
SPSS Modeler, Modeler Desktop, Modeler Server
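The score-based approach above can be sketched in a few lines. This is an illustrative toy, not FleetRisk Advisors’ model: the feature names, weights and the logistic form are all assumptions standing in for the real 4,500-element models built with SPSS.

```python
import math

# Hypothetical features and made-up weights; the real models use ~4,500 data elements.
WEIGHTS = {"hard_brakes_per_100mi": 0.8, "hours_driven_last_week": 0.05, "prior_incidents": 1.2}
BIAS = -6.0

def accident_risk_score(driver: dict) -> float:
    """Return a 0-100 risk rating from a logistic model over driver features."""
    z = BIAS + sum(WEIGHTS[k] * driver.get(k, 0.0) for k in WEIGHTS)
    return round(100 / (1 + math.exp(-z)), 1)

def needs_intervention(driver: dict, threshold: float = 50.0) -> bool:
    """Cue the fleet operator when a driver's rating crosses a threshold."""
    return accident_risk_score(driver) >= threshold
```

A driver with many hard braking events, long hours and prior incidents scores high and triggers an intervention cue; a low-exposure driver does not.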
Standard Chartered Bank – headquarters in London, UK.
Results:
Saved an estimated USD 20 million and cut process times by more than 80 percent with an ECM solution. As part of Standard Chartered Bank’s eOps program, the solution also helps the communities it serves by providing earnings opportunities for rural people and reducing the bank’s carbon footprint.
Standard Chartered has won technological and social awards for its eOps system in England, Singapore and India.
Business Problem:
Standard Chartered’s brand promise “Here for Good” takes an inclusive view of sustainability, recognizing that the bank thrives when the communities it serves thrive. As a result, the bank conducts business in a way that supports customers and clients while also having a positive impact on the wider society.
Solution:
Standard Chartered standardizes on the IBM ECM solution across approximately 56 countries, helping more than 30,000 staff members save time processing roughly 50 million documents annually.
The eOps program uses the ECM solution to further boost operating efficiency while improving people’s lives by enabling people in the villages and smaller towns the bank serves to do virtual work and get paid for it.
IBM® FileNet® Content Manager, IBM Content Collector for Email, IBM Enterprise Records and ILOG
Aperity – located in Louisville, Kentucky. Aperity provides sales and marketers forecasting, segmentation, and simulation techniques for retail distribution and channel management.
Results:
Aperity leverages PureData™ System for Analytics to provide sales forecasts 18+ months out with 95% accuracy
Delivers highly accurate forecasts for more than 500,000 stores in minutes, facilitating precise inventory management to prevent over- or under-stocking
Business Problem:
The ability to precisely track and forecast marketing and sales is essential to the success of retail and CPG companies, especially the need to keep operating expenses to a minimum.
Without accurate and timely reports, a company may miss distribution opportunities, misread market changes, completely overlook competitive activity, and ultimately lose money.
Solution:
Aperity developed its iSalesBrandManagement tool using IBM PureData for Analytics because of its ability to run complex analyses rapidly and because it is simple to get up and running quickly, without requiring ongoing maintenance or high operating costs.
Fuzzy Logix’s in-database computation engine called DB Lytix runs complex analytics on the IBM Netezza data warehouse appliance
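The in-database idea behind DB Lytix on Netezza can be illustrated simply: push the math to the data so only aggregates travel back to the application. In this sketch SQLite stands in for the appliance, and the table and column names are invented.

```python
import sqlite3

# SQLite as a local stand-in for an in-database analytics engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, units REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 120.0), (1, 130.0), (2, 80.0), (2, 90.0)])

# Mean and variance are computed by the SQL engine; only two numbers per
# store travel back to the application, not the raw rows.
rows = conn.execute("""
    SELECT store_id,
           AVG(units)                                   AS mean_units,
           AVG(units * units) - AVG(units) * AVG(units) AS var_units
    FROM sales
    GROUP BY store_id
    ORDER BY store_id
""").fetchall()
```

The same principle, scaled up, is what lets complex analytics run against billions of rows without moving the data.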
Start Today Co – located in Japan, the company offers fashion-related services around its mainstay, ZOZOTOWN, one of the largest fashion shopping websites in Japan.
Results:
Three to five times higher email open rates. Five to ten times improved conversion rate
Estimated 90 percent reduction in time required to plan and implement new promotional campaigns.
Business Problem:
Start Today Co., Ltd. uses email as its main vehicle for outbound marketing campaigns - primarily broad advertisement campaigns to those who opt to receive email communications.
However, the company recognized great potential for much more targeted, user-driven marketing, but system performance issues and limited staff resources held it back from executing targeted campaigns.
Solution:
Uses predictive analytics to create customer profiles from purchase histories and brand affinities, to target which shops, product brands and items customers are likely to be interested in, and to deliver the right offer in near-real time.
Combines PureData System for Analytics and Unica to automate time-consuming customer segmentation and response process that is required for one-to-one marketing.
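The segmentation-and-offer step might look roughly like this sketch. It is a hypothetical stand-in for the PureData-plus-Unica pipeline: brand affinity is reduced to simple purchase counts, and the campaign names are invented.

```python
from collections import Counter

def brand_affinities(purchases: list) -> list:
    """Rank brands by purchase count, most purchased first."""
    return Counter(purchases).most_common()

def pick_offer(purchases: list, campaigns: dict):
    """Return the campaign for the customer's highest-affinity brand that has one."""
    for brand, _count in brand_affinities(purchases):
        if brand in campaigns:
            return campaigns[brand]
    return None
```

For a customer who bought BrandA twice and BrandB once, `pick_offer` selects the BrandA campaign, which is the one-to-one targeting idea in miniature.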
Scotiabank - located in Halifax, Nova Scotia, Scotiabank is the wholesale banking arm of the Scotiabank Group, serving 21M clients in more than 55 countries.
Results:
Changed 70 percent of counterparty exposure measurements by 20 percent or more with a centrally managed, consolidated, cross-asset view of counterparty credit risk
Business Problem:
The credit crisis in recent years demonstrated the speed at which credit defaults can cascade through the financial sector.
Banks need greater visibility into and control over counterparty credit risk. Scotiabank lacked a consolidated view of these risks, which forced traders into an overly conservative stance and kept the bank from maximizing credit line use. The organization needed to replace guesswork with a more realistic assessment of counterparty credit risk.
Solution:
Scotiabank uses IBM’s Algorithmics® Integrated Market and Credit Risk to keep credit risk contained, calculating risk with unprecedented accuracy by accounting for all key factors, including shifting market conditions, portfolio diversification and collateral.
Vaasan Group – one of the leading bakery operators in Northern Europe (Finland, Norway and Sweden) producing fresh bakery goods, bake-off products and crisp breads for sale in retail chains, restaurants and hotels.
Results:
Saw a rapid 30 percent increase in sales orders in Sweden and achieves an on-time delivery target of 98.5 percent.
Business Problem:
Vaasan was experiencing exponential business growth, but couldn’t accurately forecast fluctuating sales orders across the Nordic region, which meant it couldn’t accurately plan its resources and production schedules.
Out-of-stock situations and excess inventory are both costly, so food manufacturers constantly strive to prepare for fluctuations in order volume.
Solution:
Vaasan can now identify trends in customer demand and generate a rolling sales forecast, helping the company predict its production requirements and prepare for fluctuating customer orders.
For example, when the company won a large-volume account, near-real-time analysis helped enable its sales team to know in minutes whether it was possible to fill the new order on time.
Cognos Business Intelligence; Cognos Controller; Cognos Planning; Cognos 10
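As a rough sketch of the rolling-forecast idea (the real solution is built on Cognos, not this code), a simple moving average over recent weekly orders plus a safety margin illustrates the mechanics:

```python
def rolling_forecast(weekly_orders: list, window: int = 4) -> float:
    """Forecast next week's orders as the mean of the last `window` weeks."""
    recent = weekly_orders[-window:]
    return sum(recent) / len(recent)

def plan_production(weekly_orders: list, safety_margin: float = 0.05) -> float:
    """Add a small safety margin so fluctuating orders don't cause stock-outs."""
    return rolling_forecast(weekly_orders) * (1 + safety_margin)
```

Re-running the forecast as each week's orders arrive is what makes it "rolling"; the window and margin here are illustrative choices.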
Camden Council - Camden London Borough Council, established in 1963, is one of the 32 borough councils in London charged with community governance.
Results:
Achieved savings of up to 30 percent per year for some households in pilot program
Expected reduction of 8,000 tonnes (8,818 tons) of CO2 for 2,500 homes over the lifetime of the project
Business Problem:
In housing developments that use district heating, residents often pay a fixed monthly heating fee. Whether residents turn down the heat when departing from the flat, or they leave the heat on and the windows open, they pay the same fee.
Local governments are seeking a solution to this challenge, as more than USD300 billion is spent globally on district heating for block housing developments as well as on college campuses and in commercial and public buildings.
Solution:
Developed a first-of-its-kind heat metering program that provides a cost-effective approach rewarding residents for energy efficiency.
Camden Council worked with IBM Business Partner Hildebrand Technology, a London-based energy consulting firm, to develop a pilot program deploying individual metering systems in approximately 1,500 properties, allowing Camden Council and the residents to measure usage as if each flat had its own water heater. Residents are now accountable for their energy usage, encouraging them to adopt energy-saving practices.
Meter readings are captured every six seconds and loaded into Informix TimeSeries DataBlade Module using Informix TimeSeries Real-Time Loader software for later analysis.
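A minimal sketch of the kind of roll-up these readings enable, assuming each reading arrives as a (flat, timestamp, kWh) tuple; the field names are illustrative, not the actual Informix TimeSeries schema:

```python
from collections import defaultdict
from datetime import datetime

def hourly_usage(readings: list) -> dict:
    """Sum per-flat kWh readings (one every six seconds) into hourly buckets."""
    buckets = defaultdict(float)
    for flat_id, timestamp, kwh in readings:
        # Bucket key: (flat, 'YYYY-MM-DDTHH')
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%dT%H")
        buckets[(flat_id, hour)] += kwh
    return dict(buckets)
```

In production the time-series database performs this kind of aggregation natively over the six-second stream; the sketch only shows the shape of the computation.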
Transition: So what are the reasons why organizations are compelled to act now?
1) Outperform in your industry – 75% of Leaders cite growth as the key source of value from analytics. Source: IBM IBV Study: Analytics: A blueprint for value, October 2013
Make speed a differentiator – reduce the latency of decisions and business processes, improving every aspect of the customer’s experience and optimizing the company’s infrastructure.
Monetize their data – data has its own value to create new products or services; for example, telcos are monetizing their location data, manufacturers are monetizing their maintenance data, and healthcare organizations are leveraging their treatment data.
Be more right, more often – big data & analytics enables the use of more sources of data, and new forms of analytics that increase the understanding about what you are trying to analyze, and in turn, build deeper confidence to act faster.
2) Guard against risk – 46% of respondents were impacted by a cyber security breach over the past 24 months. Source: IBM Global Study on the Economic Impact of IT Risk, 2013
Guard against poor decision making – this equates to building confidence by ensuring the veracity, i.e., quality, timeliness and consistency of the information you are trying to analyze.
Protect the security and privacy of their data– many organizations are acting now to put in place stronger security and privacy measures and governance policies necessary to protect the organization from internal and external threats.
Get the risk-opportunity equation right –proactively identify and manage your risk exposure from data breaches to compliance with regulations.
3) Change the economics of business and IT – one in five organizations allocates more than 50% of its IT budget to new projects. Source: IBM Global Data Center Study, 2012
Relieve the pressure on IT infrastructure – capitalize on new approaches to IT infrastructure that appropriately leverage optimized systems and cloud for analytical workloads as a way to respond dynamically to demand.
Adopt a new approach to the onslaught of data – analyze data in its native form, in motion or at rest, and store only what you need, disposing of the rest to lower storage costs and risk.
Eliminate hidden costs – unify your data and analytics initiatives to eliminate piecemeal approaches that, in the long run, will cost more than an orchestrated roadmap.
Transition: So what are the three imperatives for success with Big Data & Analytics?
According to a new IBM IBV study, “Analytics: A blueprint for value” (October 2013), there are nine levers that represent the sets of capabilities that most differentiated Leaders from other respondents in the survey.
We’ve synthesized them into three things you must get right:
First – a culture that infuses analytics everywhere. Develop a curiosity-driven and evidence-inspired workforce. And, infuse analytics into everything employees touch – we call this piece “Imagine It.”
Second - Invest in a big data & analytics platform. Build against a master plan: All types of data. All types of analytics. A full range of business outcomes. “Realize It.”
Third – be proactive about privacy, security and governance. Forge forward-thinking approaches to maximize impact while balancing risk – and lastly “Trust It.”
Transition: Let’s start with Imagine It.
Provides big data and analytics capabilities to fuel Watson and our clients’ journey to Cognitive
Enables our clients to gain fresh insights in real-time, and act upon those insights with confidence
Sets the standard in market with the breadth and depth of capabilities required for any data and analytics initiative
Uniquely delivers innovative capabilities such as stream processing, in-memory computing, advanced predictive and exploration capabilities that our clients need – and the security, privacy and governance they, and their clients require.
Packaged so that clients can address their immediate need, build on what they have, and realize value at every step
Helps me discover (fresh insights)
Find patterns that I don’t even know to look for
Freedom to explore and follow my train of thought
Operates in timely fashion (real-time)
Real-time analytics as data flows through an organization
Enterprise-class Hadoop that runs 4x faster
Speed of thought analytics
Establishes trust (act with confidence)
Governance across complete data lifecycle inc. Hadoop
Security and privacy with compliance
Transparency and context to decision-making process
Key Points
BigInsights builds on top of open source Hadoop and augments it with mandatory capabilities needed by enterprises:
Optimizations that automatically tune Hadoop workloads and resources for faster performance
Intuitive spreadsheet-style UI for data scientists to quickly examine, explore and discover data relationships
Development tooling that makes it easier for your technical team to create applications without first needing to go through exhaustive training to become Hadoop experts
Packaging that makes it simpler to install, deploy, and manage
Accelerators with pre-packaged analytics patterns and best practice knowledge to solve generalized and industry big data problems
High-speed integration connectors to access any data type and source as well as share analyzed data with other applications and storage
Security and governance to ensure sensitive data is protected and secure
What makes BigInsights different than other Hadoop distributions?
It boils down to three main things: enterprise performance, enterprise-ready Hadoop features, and integration. Two more differentiators stand out:
Analytics – BigInsights comes with a powerful text analytics engine, as well as a social and machine data analytics accelerator
Usability – with enhancements like Big SQL, which gives you SQL access to all of your data in Hadoop, Hive and HBase, and BigSheets, which lets you visualize your data in a familiar spreadsheet-like interface.
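The point of Big SQL is that plain SQL reaches data stored in Hadoop, Hive and HBase. The sketch below shows the kind of query involved; SQLite is only a local stand-in for a Big SQL connection, and the clickstream table is invented for illustration.

```python
import sqlite3

# In production this query would run through a Big SQL connection over
# Hadoop-resident data; SQLite here just demonstrates that it is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clickstream (user_id TEXT, page TEXT)")
conn.executemany("INSERT INTO clickstream VALUES (?, ?)",
                 [("u1", "home"), ("u1", "cart"), ("u2", "home")])

top_pages = conn.execute("""
    SELECT page, COUNT(*) AS visits
    FROM clickstream
    GROUP BY page
    ORDER BY visits DESC, page
""").fetchall()
```

The benefit for teams is that existing SQL skills and tools carry over without learning MapReduce first.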
Key Points
Integrate v3 – the point is to have one platform to manage all of the data; there’s no point in having separate silos of data, each creating separate silos of insight. From the customer POV (a solution POV), big data has to be bigger than just one technology.
Analyze v3 – very important point – we see big data as a viable place to analyze and store data. New technology is not just a pre-processor to get data into a structured DW for analysis. Significant area of value add by IBM – and the game has changed – unlike DBs/SQL, the market is asking who gets the better answer and therefore sophistication and accuracy of the analytics matters
Visualization – need to bring big data to the users – the spreadsheet metaphor is the key to doing so
Development – need sophisticated development tools for the engines and across them to enable the market to develop analytic applications
Workload optimization – improvements upon open source for efficient processing and storage
Security and Governance – many are rushing into big data like the wild west. But there is sensitive data that needs to be protected, retention policies need to be determined – all of the maturity of governance for the structured world can benefit the big data world
I’ll give you a quick summary of BigInsights now, and then we’ll dive into details in the next several charts. BigInsights is IBM’s strategic platform for managing and analyzing persistent Big Data. As you’ll see, it’s based on open source and IBM technologies. Internally, the BigInsights project is being run like a start-up. By that, I mean that IBM is engaging deeply with a number of early customers to shape the future direction of our product. We’re purposefully keeping our plans flexible to accommodate rapidly changing requirements in this emerging technology area.
Some of the characteristics that distinguish BigInsights include its built-in support for analytics, its integration with other enterprise software, and its production readiness. We’ll talk more about these topics shortly. But before I leave this chart, I want to point out that IBM is uniquely positioned to provide customers with the necessary software, hardware, services, and research advances in the world of Big Data.
The Standard and Enterprise Editions are IBM’s supported production offerings. They contain IBM-unique technologies in addition to open source technologies. For those who want to work with a free, non-production version of BigInsights, we also offer our Quick Start Edition. It’s similar in content to Standard, plus it lets you experiment with Big R and text analytics as well. Details on the different editions are available as supplemental slides in this deck and in the announcement materials.
You saw this slide earlier, the next generation enterprise data warehouse. On this slide, you see the integration points highlighted.
BigInsights integrates with the Big Data Platform components, data integration and application integration
Constant Contact
Constant Contact, Inc., launched in 1998 and headquartered in Waltham, Massachusetts, wrote the book on Engagement Marketing™ – the new marketing success formula that helps small organizations create and grow customer relationships in today’s socially connected world.
More than half a million small businesses, nonprofits and associations worldwide use the company’s online marketing tools to generate new customers, repeat business, and referrals through email marketing, social media marketing, event marketing, local deals, digital storefronts, and online surveys. Only Constant Contact offers the proven combination of affordable tools and free education, including local seminars, personal coaching and award-winning product support.
The company has offices in Waltham, Massachusetts; Loveland, Colorado; Delray Beach, Florida; San Francisco, California; New York City, New York; and London, England.
Constant Contact was looking for help analyzing the 35 billion emails its customers send every year so that it can provide guidance on the best day or time to send email campaigns to have the greatest impact (defined by opens, clicks, etc.). To accomplish this task, the company partnered with IBM and Persistent Systems on a cutting-edge solution.
IBM and Persistent Systems recommended IBM BigInsights to support Constant Contact's analysis of 35 billion emails. IBM BigInsights, IBM PureData for Analytics (powered by Netezza technology) and Cognos are stitched together to create a highly advantageous Big Data solution.
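A toy version of the send-time analysis, assuming each record is a (day sent, was it opened) pair; the record format is invented, and the real analysis runs over 35 billion emails on BigInsights rather than in a single process:

```python
from collections import defaultdict

def best_send_day(sends: list) -> str:
    """Return the weekday with the highest email open rate."""
    sent = defaultdict(int)
    opened = defaultdict(int)
    for day, was_opened in sends:
        sent[day] += 1
        if was_opened:
            opened[day] += 1
    # Pick the day whose open rate (opens / sends) is largest.
    return max(sent, key=lambda d: opened[d] / sent[d])
```

The same aggregation extended to hour-of-day and customer segment is the kind of guidance described above.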
Benefits
By implementing IBM BigInsights, Constant Contact enjoyed dramatic improvements:
- Performance improved by more than 40 times
- Customers’ email campaign performance increased by 15 to 25 percent, driven by big data analytics
- Analysis time was reduced from hours to seconds
The impact BigInsights has had on Constant Contact's business has been so dramatic that the company is now expanding its use of Big Data to analyze the content of emails to help its customers be even more successful.
Solutions Components
Software
IBM InfoSphere BigInsights
IBM PureData for Analytics – powered by Netezza technology
IBM Cognos BI
Case Study PDF: TBD – In development
“Our customers send roughly 35 billion emails every year, and with every email they send, we have more data that we can analyze and feed back to them to help improve their success. Our work analyzing email delivery times has already given our customers a 15-25% lift in their email campaign performance – and that means more customers in their doors and increased revenue.” — Jesse Harriott, Chief Analytics Officer, Constant Contact
Client Name: Teikoku Databank
Case Study Link:
http://www-01.ibm.com/software/success/cssdb.nsf/cs/JHUN-96C65N?OpenDocument&Site=corp&ref=crdb
Pull quote: “With IBM InfoSphere BigInsights it has become possible to process billions of items of textual data in 30 minutes”
— Mr. Satoshi Kitajima, an MBA Statistician in the SPECIA Team of the Business Analytics Division of the Market and Business Intelligence Department, Teikoku Databank
Company background:
Teikoku Databank’s history goes back more than 100 years to the establishment of Teikoku Koshinsha in 1900. Based on the corporate philosophy of “supporting economic activities and contributing to the development of society as a reliable information partner”, they are developing their business in areas such as corporate credit research, credit risk management services, database services, marketing services and e-commerce support services.
Solution components:
Software
• IBM® InfoSphere® BigInsights™
Business challenge:
Teikoku Databank has been providing its customers with reliable corporate information based on their credit research for more than 100 years, and owns a huge amount of corporate data such as corporate credit report files of 1.6 million companies, “COSMOS1” financial statements of 4.4 million terms worth of information gathered from 680,000 companies, a “COSMOS2” corporate profile database of 1.42 million companies, and other corporate data for 4.1 million companies. Recently, however, information published on the Internet has been starting to have a significant effect on company business, so responding to this situation has become an urgent task. To stay competitive, the company wanted to analyze its proprietary information in combination with Big Data gathered from the Internet.
The benefits:
Enables processing billions of items of textual data in 30 minutes, a process that used to take several days
Analyzes 4.75-fold more data for customers and enables faster response to customer requests
Enables increasing the number of enhanced offerings to customers as a key differentiator in the market
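The kind of batch text processing BigInsights parallelizes across a cluster can be sketched as a toy map/reduce over documents. Here both phases run in one process, and the "mention extraction" is a naive keyword match purely for illustration; the company names are invented.

```python
from collections import Counter
from itertools import chain

# Illustrative watch list; the real system analyzes far richer corporate text.
COMPANIES = {"Teikoku", "ExampleCo"}

def map_mentions(document: str) -> list:
    """Map phase: emit one record per company mention in a document."""
    return [w.strip(".,") for w in document.split() if w.strip(".,") in COMPANIES]

def reduce_counts(mapped: list) -> Counter:
    """Reduce phase: sum mention counts across all documents."""
    return Counter(chain.from_iterable(mapped))
```

On a cluster, the map phase runs on each document split in parallel and the reduce phase merges the partial counts, which is how billions of text items can be processed in minutes.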
Vestas Wind Systems offers its wind turbine products as alternative energy solutions in a competitive market that is exploding in terms of demand, and characterized by extremely competitive pricing.
Wind turbines are a multi-million dollar investment with a lifespan of 20 to 30 years. The location chosen to install and operate a turbine can greatly impact the amount of power generated by the unit, as well as how long it is able to remain in operation.
In order to determine the optimal placement for a turbine, a large number of location-dependent factors must be considered including temperature, precipitation, wind velocity, humidity, and atmospheric pressure.
The prior state of the art in location determination took weeks of data analysis, with the ability to leverage only a fraction of the data.
Vestas is working with us to build a Big Data computing system that will start by analyzing about 2.6 PB of data, with the expectation that it will grow to 6 PB over the next few years.
Using more, in fact all, of the available data will improve the effectiveness of the placement process, and they also expect the analysis process to go from weeks to days.
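One hedged sketch of the multi-factor site evaluation: weight each location-dependent factor and rank the candidate sites. The factor names and weights below are invented for illustration, not Vestas’s actual model.

```python
# Illustrative weights: favorable factors positive, unfavorable negative.
FACTOR_WEIGHTS = {"mean_wind_velocity": 1.0, "turbulence": -0.6, "icing_days": -0.3}

def site_score(site: dict) -> float:
    """Weighted sum of weather factors for one candidate site."""
    return sum(w * site.get(f, 0.0) for f, w in FACTOR_WEIGHTS.items())

def best_site(sites: dict) -> str:
    """Return the name of the highest-scoring candidate location."""
    return max(sites, key=lambda name: site_score(sites[name]))
```

The real analysis replaces these hand-picked weights with petabyte-scale weather modeling, but the ranking structure is the same.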
http://en.wikipedia.org/wiki/Vestas
Create weather models for optimal placement and operation
Background:
One of our early customers has gone from being a manufacturer of turbines to being an operator of turbines. This change in business model has wide-reaching implications.
To maximize profit, they must understand what design/technology investments have a compelling ROI and which ones don’t.
They need to understand what makes an ideal location for turbine placement. This is an important decision, since most turbines stay in production for 20 to 25 years and the capital investment in the turbines is quite large.
How to operate the turbine in a way to maximize energy production. Ex: How should the blade be angled based on different weather conditions.
What kind of maintenance model will optimize costs and energy output? For example, in a dry environment you don’t have to perform maintenance on the equipment as frequently. What are the ideal intervals for maintenance based on location? Based on seasonal conditions?
The opportunity
Model weather for a given turbine location to optimize power generation and longevity of turbine.
The data for creating these models exists, but the task of building the models is non-trivial: initial data sets are approaching 3 petabytes, not including the sensor data from the installed units.
Total data volumes will exceed 6 PB quickly, and will include both highly structured and semi-structured information flows
The Solution:
InfoSphere BigInsights is capable of handling this massive volume and variety of data.
Build models to cover both forecasting and optimal, in-the-moment, operation of the power generation units.
The solution is flexible enough that the customers can use it to answer other questions of the data without redesigning the system.
That way, the customer can instrument everything, and use that data when they need it, how they need it.
Ultimately, InfoSphere BigInsights will allow the customers to use their data to make sure they're making the right decisions, and continue to make right decisions as needs evolve.
Client
A large European university
Business need
With 145,000 students, more than 4,500 professors and almost 5,000 people working as administrative and technical staff, this university is one of the largest in Europe. It looks and functions very much like a city, comprising one million cubic meters of building space on 2.5 hectares and containing the amenities and services its 100,000-plus campus members need on a daily basis. Its physical size and population, in addition to its EUR 10 million annual electricity bill, made the university the ideal location for a pilot study involving micro grids and observation of energy consumption patterns.
Phase 1 of the project involved dividing the campus into nine “energy islands,” each of which includes one or more buildings as well as energy-generating machines and loads, such as lighting, refrigeration and data centers. Some islands feature generation power plants, such as a tri-generation site, photovoltaic roofs and district heating systems. Data collection and reporting are already underway. A dashboard is available to allow the energy manager to understand consumption needs, production potentials and the relations between the two. Project activities included:
Monitoring the energy balance for each energy island as well as for the campus as a whole
Analyzing the temporal profiles and identifying the priorities for improving energy efficiency without reducing service levels
Identifying the management rules in place to optimize the university’s use of energy
The university is now launching Phase 2 of its micro grid project with an aim toward better understanding and exploiting field data to make more-informed decisions about energy usage, planning and investments. Project directors aim to answer questions such as:
Can the university campus sustain itself from an energy perspective?
Can it sustain itself at least during daylight hours when photovoltaic roofs significantly contribute to the supply side?
Can the neighboring hospital campus with its additional generation and demand capacity be linked to the university? Would this enable the university to improve its energy balance, or would optimizing balance require accepting contributions from the public electricity network?
By recording local energy consumption patterns in combination with the generation capacities available from renewable, low-carbon sources, the university hopes to raise public awareness about harmful carbon emissions.
Solution implementation
The micro grid project at the university is part of the master plan to transition the school’s city into the world’s first postcarbon biosphere city. Currently in Phase 2, the university is gathering and analyzing data to understand the university’s energy consumption needs and patterns. Throughout the process, IBM has contributed a host of solutions, including:
IBM® Intelligent Operations Center technology
IBM InfoSphere® BigInsights™ Enterprise Edition software
IBM InfoSphere Warehouse Enterprise Edition software
IBM Cognos® 8 Business Intelligence V8.4 software
IBM SPSS® Statistics Professional software
IBM Cognos Business Intelligence Enhanced Consumer software
IBM Cognos Business Intelligence Professional software
IBM Tivoli® Service Request Manager software
IBM Tivoli Monitoring for Energy Management Basic Device software
IBM Tivoli Monitoring for Energy Management software
IBM Tivoli Netcool® Omnibus Base
IBM Tivoli Business Service Manager software
IBM Tivoli Netcool Performance Manager software
IBM ILOG® CPLEX® Optimization Studio Developer Edition software
IBM ILOG CPLEX Optimizer Deployment Edition software
Researchers are using IBM software as the foundation for embedding alarms, rules, algorithms and automatic processes into the grid system as a way to maximize the energy efficiency of the university. The overall application platform, which consists of IBM solutions plus custom code and models developed by the university, provides or will provide the following main capabilities and systems:
Asset management. Supervisory control and data acquisition (SCADA) sensors, actuators and other devices currently capture and record basic data, such as operational status and intervention history. To date, there is no intelligent integration among field devices; however, this is planned later in Phase 2. In the future, the university also plans to deploy IBM Maximo® Asset Management software for its asset management functionalities.
Monitoring system. Devices placed within the nine energy islands will collect and deliver data in real time or near-real time. In the case of exceptional behavior, synchronized alerts will provide additional information to the advanced analytics and business modeling.
Operational rules. The system will make some decisions automatically based on predefined rules. It will also allow researchers to test how different scenarios—such as changing sets or the quantities of field devices or integrating a new cogeneration plant installed at a remote site of the university—might affect energy consumption.
Advanced reporting. Comprehensive reporting of basic data as well as sophisticated analysis will provide students with research information.
Modeling. Information gathered from the system will be used as input to simulate different scenarios concerning the predictive energy behavior of each island. Scenarios will be based on standard consumption patterns and expected environmental parameters, the optimized energy flow among the islands, and the recommended investments or changes in consumption behaviors that must take place to get closer to the optimal self-sustained energy balance.
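The "operational rules" capability above can be illustrated with a minimal sketch. The island names, field names, thresholds and actions below are hypothetical assumptions for illustration, not details of the actual deployment:

```python
# Hypothetical sketch of rule-based automation for the energy islands.
# Field names, thresholds and actions are illustrative assumptions.

def evaluate_rules(reading, rules):
    """Return the actions whose rule condition matches this reading."""
    return [rule["action"] for rule in rules if rule["condition"](reading)]

rules = [
    # Shed non-critical lighting load when an island draws above its cap.
    {"condition": lambda r: r["consumption_kw"] > r["cap_kw"],
     "action": "shed_lighting_load"},
    # Alert when photovoltaic output drops sharply during daylight hours.
    {"condition": lambda r: r["pv_output_kw"] < 5 and 9 <= r["hour"] <= 16,
     "action": "alert_pv_underperformance"},
]

reading = {"island": "island-3", "consumption_kw": 120.0, "cap_kw": 100.0,
           "pv_output_kw": 3.2, "hour": 13}

print(evaluate_rules(reading, rules))
# → ['shed_lighting_load', 'alert_pv_underperformance']
```

Predefined rules like these are what let the system "make some decisions automatically" while researchers test how changed device sets or a new cogeneration plant would affect consumption.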
Benefits
The micro grid project is just getting underway, and researchers are in the process of collecting and analyzing data. As data analyses and modeling results become available, the university will have more insight into how and where energy is consumed and what changes are needed to curb greenhouse gas emissions. Researchers will also have greater visibility into how best to revise, update or replace the heating, refrigeration, lighting and thermal insulation in each of its buildings and how to optimize energy production. Ultimately, the university is confident that the solution will play a key role in reducing consumption levels and lowering costs. In the meantime, the program is gaining international attention and IBM can use the campus as a demonstration center for its Intelligent Operations Center technology as well as for organizing conferences, workshops and events about energy and buildings management.
The solution provides the university’s energy manager, students and researchers with comprehensive and detailed energy patterns for each of the nine energy islands and the campus as a whole. Researchers can conduct what-if analyses for different investment scenarios and ascertain the self-sustainability capabilities of each island. For instance, analyses can be run to understand the financial and energy savings impact and implications of generating additional electrical power through photovoltaics. The university will also be able to determine the actual and potential levels of energy consumed for each island and determine the optimal usage and generation patterns to lower consumption and the consequent network needs.
Instrumented - The grid uses a wide variety of heterogeneous devices to acquire and collect data, all of which can be easily extended to include new and innovative data acquisition instruments as they become of interest.
Interconnected - The solution depends on both wired and wireless networks to connect the nine energy islands, data acquisition devices and the grid platform. In addition, field devices feed data to analytics and modeling software.
Intelligent - Together with smart grid technology, the system employs analytics, rules, algorithms and automatic processes to track and monitor energy production and consumption levels, providing researchers with an unprecedented and comprehensive view of the university’s energy needs and potential energy production capabilities. Data related to voltage, currents, power and temperature is analyzed and tracked, enabling the university to track consumption patterns in each of its islands and buildings and determine where energy is inefficiently used. For instance, by monitoring the consumption levels of appliances such as heating and refrigeration units, researchers can identify the most egregious consumers and make better decisions about which assets to replace or update. The data also contributes to insight about supply and demand, enabling the university to model and optimize the micro grid to meet future energy needs.
Key Points
As you’ve seen, we’ve made a significant investment in building the broadest and most complete big data platform in the industry. If we had to summarize our differentiators into a few categories, this would be it.
InfoSphere BigInsights has the complete set of capabilities to analyze large volumes and variety of data. We’ve taken open source Hadoop and added the following enhancements to make it enterprise-grade:
Performance optimizations for Hadoop workloads, resulting in faster answers
Leading-edge text analytics capabilities that deliver more accurate results than other approaches
Professional-grade developer tooling and administration consoles to develop and manage big data applications and environments using existing skills
Enterprise-class security to protect confidential data and ensure data privacy
Built-in high-speed connectors to connect to new data sources and types as well as your existing enterprise systems
As you can see, InfoSphere BigInsights is the Clear Choice to analyze your large volumes and varieties of data.
Key Points
New paradigm is required to analyze data in motion – some big data problems simply don’t allow you to persist and then analyze data
Can process multiple streams of data at the same time
Modular design that has unlimited scalability – millions of events per day
Designed for variety – to analyze many data types simultaneously
Video, audio, text, social media, devices (smart meters, RFID, instruments) as well as structured data
Can perform complex calculations on the data in real-time
Built-in integration with the other capabilities in the IBM big data platform
Data Warehousing (Netezza, InfoSphere Warehouse, Smart Analytics System)
Hadoop (BigInsights)
Key Points
1) Streams is the right capability when the primary big data challenge is analyzing data that is in motion (Velocity) – either because the business imperative requires a real-time response/action based on analyzing the data, or because the data is very large and you want to more cost-effectively filter and remove data before moving it into your data warehouse or Hadoop system. It can handle continuous or bursty streams of data – millions of events per second with microsecond latency.
2) Streams can process any type of data (Variety)– audio, video, network logs, sensors, social media such as Twitter, in addition to structured data.
3) And, Streams is designed to scale to process any size of data from Terabytes to Zettabytes per day
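The "filter and remove data before moving it into your warehouse or Hadoop system" pattern from point 1 can be sketched in a few lines. The event shape and threshold below are illustrative assumptions, not a real Streams application:

```python
# Illustrative sketch of filtering data in motion: drop uninteresting
# events up front so only a fraction reaches the warehouse or Hadoop
# system. Field names and the threshold are assumptions.

def filter_stream(events, threshold):
    """Yield only events worth persisting downstream."""
    for event in events:
        if event["value"] >= threshold:
            yield event

events = [{"id": i, "value": v} for i, v in enumerate([1, 42, 7, 99, 3])]
kept = list(filter_stream(events, threshold=10))
print([e["id"] for e in kept])
# → [1, 3]  (only the high-value events survive)
```

The point of doing this in motion is economics: storage and load costs are paid only for the events that pass the filter.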
Volume
Variety
Velocity
Agility
Streams is a platform to build many applications for many industries. It can handle huge amounts of data, up to terabytes per second, or Petabytes per day.
It can run a large variety of analytics – from historical analysis like data mining, to predictive analytics, to custom analytics such as image analysis and voice recognition.
Since Streams is all done in memory, it has high velocity – it can respond to events in microseconds, 1/1000 of a millisecond. So, it is orders of magnitude faster than databases, which must first store data on disk drives.
Streams also provides tremendous agility to businesses. With the ability to dynamically add new applications that can tap into existing data streams and applications, businesses can respond more quickly to a changing world.
And the powerful development and debugging tools we provide speed application development.
Key Points
InfoSphere Streams <Focus on bottom portion of the graphic> :
Manages multiple stream inputs
Analyzes and joins streams together for joint analysis
Can join or loop streams – perform multiple analytics on a stream
Output may be visualization or systematic action (a notification)
Other portion of InfoSphere Streams <top half of graphic> is a development environment
Need tools / IDE to develop streaming applications
Automatically optimizes deployment (e.g., co-locating or fusing operators on a single node if they are used together, etc)
The Streams programming model is to define a data flow graph that defines the connections among data sources (inputs), operators, and data sinks (outputs). Operators, and the streams that connect them, are the building blocks of the logical program.
The deployment model is to group, or “fuse”, operators into units called Processing Elements, or PEs. PEs are separate executables that form the building blocks of the distributed application.
A PE contains one or more operators
A program consisting of many operators may be fused into a single PE; at the other extreme, each operator may run in its own PE
Operators fused into a single PE are tightly coupled; data transfer is local and fast
Communication between PEs uses the network protocol and is therefore slower; but these more loosely coupled components can be flexibly distributed over available processing nodes and are more easily reusable
At the physical level, a Streams job (a running application) consists of multiple intercommunicating PEs.
This lets you place resource-hungry analytics on appropriately-sized nodes
Reuse of generic components is a side-benefit, but choosing the optimum level of component granularity and performance is currently more art than science
Streams provides the infrastructure to support the decomposition of applications into scalable chunks (PEs), and the deployment and operation of these PEs across stream-connected processing nodes.
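The fusion idea above can be shown with a plain-Python analogy: operators form a pipeline, and fusing them into one processing element (PE) means tuples flow through them in-process with no network hop. This is a sketch of the concept, not the actual SPL/Streams runtime, and all operator names are invented for illustration:

```python
# Analogy for operator fusion: a fused PE composes several operators
# into one local pipeline. Operators and tuples here are toy examples.

def parse(t):      return {"raw": t, "value": float(t)}
def enrich(d):     return {**d, "squared": d["value"] ** 2}
def threshold(d):  return d if d["squared"] > 10 else None

def fuse(*operators):
    """Fuse operators into one PE: tuples pass through them in-process."""
    def pe(t):
        for op in operators:
            t = op(t)
            if t is None:          # an operator may drop the tuple
                return None
        return t
    return pe

pe = fuse(parse, enrich, threshold)   # one PE containing three operators
results = [r for r in map(pe, ["1.0", "4.0", "2.5"]) if r is not None]
print([r["value"] for r in results])
# → [4.0]
```

Running each operator in its own PE would instead put a (slower) network transfer between every pair, which is the trade-off the deployment model lets you tune.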
Note: If asked, Streams currently only supports nodes with x86 architecture, running a Linux (Red Hat) operating system.
Key Points
Here’s an animation that helps show how InfoSphere Streams works and what you can do with it.
Each of these balls represents an operator. The data passes through each operator where some action is being performed on the data.
You can fuse data from multiple streams, modify it, annotate it, perform an analytics operation on it, or classify it. All of this can happen in less than a millisecond.
The InfoSphere Streams analysis results (events/data) can be directly output and viewed in a monitoring dashboard, stored in a data warehouse or BigInsights, passed to a predictive analytics system or business process management system to trigger a response/action or additional analytics.
Ease of use
Up & Running Faster with First Steps - guides users through post install setup steps and gets from install to running in just a few clicks
Drag & Drop graphical editor - allows users to build applications by dragging & dropping operators while automatically synching graphical and SPL source code views
Improved Visual Application Monitoring – provides an instance graph that displays the application health and metrics and allows users to quickly identify issues
Streams data visualization - allows users to dynamically add new views to running applications with charts provided out of the box
Enterprise Integration with:
Visualization integration
BigInsights integration (Enhanced) – enables users to visualize Streams data in the BigInsights Console
Vivisimo integration (New) – enables users to visualize Streams data in Vivisimo CXO and stream data to a Vivisimo index with a Vivisimo adapter
InfoSphere DataStage integration - allows users to perform deeper analysis on data as part of the info integration flow and get more timely results; a Streams ETL toolkit provides adapters that exchange data between Streams and DataStage
Netezza integration – Netezza Adapters use Netezza Native interfaces for optimized performance and allow separation of data preparation and load for flexibility and performance
Advanced Analytics Toolkits
Geospatial - high performance analysis and processing of geospatial data enables location based services by supporting GeoSpatial data types and functions
Time Series – rich set of functionality that includes generation (synthesizing or extracting), preprocessing (preparation and conditioning), analysis (statistics, correlations, decomposition and transformation), modeling (prediction, regression and tracking)
SPSS – uses IBM SPSS Modeler for developing & building predictive models and deploys models to Streams via the SPSS Scoring Operator; SPSS models are refreshed in Streams without suspending InfoSphere Streams
CEP - Uses patterns to detect composite events in streams of simple events; integration in Streams allows CEP-style processing with high performance and rich analytics
XML Support
New support for XML – allows developers to fuse a broader range of traditional and non-traditional data
Key points: Streams analyzes a variety of data types. Many people think of Analytics as possible only using BI on warehouses, but Streams enables many different kinds of analytics as well.
The blue items are included in the Streams product (Mining in Microseconds, Statistics, Text Analysis, Geospatial, Predictive, and Advanced Mathematics)
The red items have been built for Streams (Acoustic, Image & Video) and have been used in various projects and engagements, but they have not been made a part of the product. Contact development if you think any of them could help you in an opportunity.
More later in this presentation on the toolkits available in the Streams product.
A few notes:
The “simple” in “Simple & Advanced Text” refers to basic functions for regular expression matching (and replacement) in string values; these are built into the Standard Toolkit, meaning they’re basically part of the language. “Advanced” refers to real text analysis using the same System T code used in BigInsights. This is new in Fix Pack 3 (November 2011).
Statistics included with Streams range from simple aggregate functions (average, sum, etc.) to the more advanced metrics included in the Financial Services Toolkit.
For true time series analysis, we have an advanced toolkit in development; this is covered under Advanced Mathematical Models in this slide.
Customer Reference Database Link: http://w3-01.ibm.com/sales/ssi/cgi-bin/ssialias?infotype=CR&subtype=NA&htmlfid=0CRDD-8SFKRD&appname=crmd
Dublin City Council, Ireland and IBM Research implemented an Intelligent Transportation System designed to provide updated speed and traffic flow measurements, travel time estimates and statistical aggregations of current traffic conditions in real time.
Built on IBM’s Big Data Platform using InfoSphere Streams.
Solution provides Dublin City Council’s Roads & Traffic Department real-time visualization and visibility into the arrival times of their 1,000 buses on 150 routes and 5,000 stops daily.
Having this information has enabled the department to optimize bus routes and stop locations.
Benefits: using GPS positions of every bus enables analytics and visualization of:
The location of the buses along their respective routes, identification of buses that do not follow their assigned route, and the average speed of individual vehicles, with speeds aggregated within a given time window on shared sections of route
Estimated arrival times of the buses at their next stops along their routes, and the probability of delays or travel times at stops at different times of the day or days of the week, computed from historical data in real time.
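The arrival-time estimation above can be sketched with a deliberately simple model: distance to the next stop divided by recent average speed. This is an illustrative assumption, not Dublin's actual algorithm, which also draws on historical delay probabilities:

```python
# Hedged sketch of estimating a bus's arrival time at its next stop
# from recent GPS-derived speed samples. The distance/average-speed
# model is a toy stand-in for the real system's analytics.

def estimate_arrival_seconds(distance_to_stop_m, recent_speeds_mps):
    """Estimate seconds to the next stop from recent speed samples."""
    avg_speed = sum(recent_speeds_mps) / len(recent_speeds_mps)
    if avg_speed <= 0:
        return None                      # bus is stopped; no estimate
    return distance_to_stop_m / avg_speed

eta = estimate_arrival_seconds(600.0, [4.0, 5.0, 6.0])  # ~5 m/s average
print(round(eta))
# → 120
```

In a streaming deployment this calculation would run continuously per bus as each new GPS position arrives, which is what makes the real-time dashboard possible.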
Customer Reference Database Link: http://w3-01.ibm.com/sales/ssi/cgi-bin/ssialias?infotype=CR&subtype=NA&htmlfid=0CRDD-8PKQW6&appname=crmd
Client name: University of Ontario Institute of Technology
Subtitle: Leveraging key data to provide proactive patient care
The need: Today, patients are routinely connected to equipment that continuously monitors vital signs such as blood pressure, heart rate and temperature. The equipment issues an alert when any vital sign goes out of the normal range, prompting hospital staff to take action immediately, but many life-threatening conditions do not reach critical level right away. Often, signs that something is wrong begin to appear long before the situation becomes serious, and even a skilled and experienced nurse or physician might not be able to spot and interpret these trends in time to avoid serious complications. One example of such a hard-to-detect problem is nosocomial infection, which is contracted at the hospital and is life threatening to fragile patients such as premature infants. The indication is a pulse that is within acceptable limits, but not varying as it should. So, while the information needed to detect the infection is present, the indication is very subtle; rather than being a single warning sign, it is a trend over time that can be difficult to spot.
The solution/benefit: With a shared interest in providing better patient care, Dr. Carolyn McGregor, Canada Research Chair in Health Informatics at the University of Ontario Institute of Technology (UOIT), and Dr. Andrew James, staff neonatologist at The Hospital for Sick Children (SickKids) in Toronto, partnered to find a way to make better use of the information produced by monitoring devices. Dr. McGregor visited researchers at the IBM T.J. Watson Research Center’s Industry Solutions Lab (ISL), who were extending a new stream-computing platform to support healthcare analytics. A three-way collaboration was established, with each group bringing a unique perspective—the hospital’s focus on patient care, the university’s ideas for using the data stream, and IBM providing the advanced analysis software and information technology expertise needed to turn this vision into reality.
The result was Project Artemis, a highly flexible platform that aims to help physicians make better, faster decisions regarding patient care for a wide range of conditions. The earliest iteration of the project is focused on early detection of nosocomial infection by watching for reduced heart rate variability along with other indications. For safety reasons, in this development phase the information is being collected in parallel with established clinical practice and is not being made available to clinicians. The early indications of its efficacy are very promising. Project Artemis is based on IBM InfoSphere Streams. The IBM DB2 relational database provides the data management required to support future retrospective analysis of the collected data.
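The "in range but not varying as it should" signal described above is essentially a sliding-window variability check. Here is a minimal sketch of the idea; the window size, thresholds and readings are assumptions for illustration, not Project Artemis parameters:

```python
# Illustrative trend detection: flag when heart-rate variability (the
# standard deviation over a sliding window) stays low even though every
# individual reading is within normal limits. Thresholds are assumed.
from statistics import stdev

def low_variability(heart_rates, window=10, min_stdev=2.0):
    """True if any full window of readings varies too little."""
    for i in range(len(heart_rates) - window + 1):
        if stdev(heart_rates[i:i + window]) < min_stdev:
            return True
    return False

normal = [118, 124, 121, 127, 119, 125, 122, 128, 120, 126]
flat   = [122, 123, 122, 122, 123, 122, 123, 122, 122, 123]

print(low_variability(normal), low_variability(flat))
# → False True  (both series stay in range; only one lacks variability)
```

This is exactly the kind of check that per-reading threshold alarms miss, which is why a stream-computing platform that can maintain state over time was the fit here.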
Customer Reference Database Link:
We have been working with an Indian Telco client for some time now to help reduce their billing costs and improve customer satisfaction.
Challenge:
Call Detail Record (CDR) processing within their data warehouse was sub-optimal,
Could not achieve real-time billing, which required handling billions of CDRs per day and de-duplication against 15 days' worth of CDR data
Unable to support future IT and business needs with real-time analytics
Solution:
Single platform for mediation and real time analytics reduces IT complexity
The PMML standard is used to import data mining models from InfoSphere Warehouse. Offloading the CDR processing to InfoSphere Streams resulted in enhanced data warehouse performance and improved TCO
Each incoming CDR is analyzed using these data mining models, allowing immediate detection of events (ex: dropped calls) that might create customer satisfaction issues.
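The per-CDR flow described above, de-duplicate against recent history, then score for events like dropped calls, can be sketched as follows. The field names, the in-memory store and the scoring rule are assumptions for illustration, not the client's actual models:

```python
# Hypothetical sketch of per-CDR processing: de-duplicate each incoming
# call detail record, then flag events (e.g. dropped calls) that might
# create customer satisfaction issues. All fields/rules are assumed.

seen = set()   # stands in for the 15-day de-duplication store

def process_cdr(cdr):
    """Return None for duplicates, else a record annotated with events."""
    key = (cdr["caller"], cdr["start_ts"])
    if key in seen:
        return None
    seen.add(key)
    dropped = cdr["duration_s"] < 5 and cdr["cause"] == "radio_loss"
    return {**cdr, "dropped_call": dropped}

cdrs = [
    {"caller": "A", "start_ts": 100, "duration_s": 3,  "cause": "radio_loss"},
    {"caller": "A", "start_ts": 100, "duration_s": 3,  "cause": "radio_loss"},
    {"caller": "B", "start_ts": 101, "duration_s": 60, "cause": "normal"},
]
results = [r for r in map(process_cdr, cdrs) if r is not None]
print([(r["caller"], r["dropped_call"]) for r in results])
# → [('A', True), ('B', False)]
```

At billions of CDRs per day the real system keys the de-duplication store far more carefully, but the shape of the pipeline is the same.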
Business Benefit:
Data now processed at the speed of Business - from 12 hours to 1 minute
HW Costs reduced to 1/8th
Support for future growth – more data, more analysis – without the need to re-architect
Platform in-place for real-time analytics to drive revenue
The product management experts on this call are going to give you a lot of really good detailed information on the new releases of these products, but before they do that I want to just summarize for you as sales people the bottom line on what this announcement means - in terms of what you have to sell and what kind of deals and opportunities you should pursue.
#1 This release includes three new analytic accelerators. Analytic accelerators are pre-packaged, pre-developed sets of software tools that allow a customer to very quickly deploy one of our big data products to solve a specific business problem. These analytic accelerators are built on top of BigInsights and Streams and are part of the core product - once they GA, they will automatically ship with each order, and they don't cost extra. They are part of the base pricing model.
As a sales person, because you now have these analytic accelerators "in your bag," you can now go out and call on customers and talk to them about big data and how it can solve specific business problems, knowing that the solution you have to offer is more complete than any of our competitors' and will help that customer deploy more quickly and get value more quickly than would be possible building it themselves or using a competitor's products. The three analytic accelerators are......
Key Points:
Without proper development tools/environments, you may have to code many things on your own – in many different tools
It isn’t optimized, requires vastly different skill sets, and is therefore risky and/or time-consuming
Take one example with stream computing – you’ll see all of the various things that would have to be coded separately in order to do stream computing – and the impact to a customer (45% faster delivery)
Key Points
The IBM big data capabilities are all designed to work together and with existing analytics applications such as BI and predictive analytics. Here’s an example scenario:
1) Historic data is stored in the warehouse, where interesting patterns are detected, such as the pattern of credit card transactions that would indicate possible fraud.
2) These models can be defined in tools like IBM SPSS that create the PMML models.
3) The PMML models are then imported into InfoSphere Streams Studio to generate Streams programs that are executed to score the incoming records in real time.
4) Additional data sources such as RFID tags, blogs, or other information might be used to improve the confidence levels of the scoring algorithms.
5) These measures can be sent to Dashboards like Cognos Real Time Monitoring or business process management systems to trigger business processes to take immediate action as required.
6) Streams can also detect model drift, enabling a closed-loop process – as the models drift, new models can be built and incorporated to provide continuous improvement.
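Steps 3 through 6 of the scenario can be sketched end to end: score incoming records with a model trained offline, and watch the live score distribution for drift. The model and the drift measure below are toy stand-ins for the PMML/SPSS components named in the scenario, with all names and weights assumed:

```python
# Sketch of the closed-loop scoring idea: real-time scoring plus a
# simple drift check on the live score distribution. The "model" is a
# toy weighted sum, not an actual PMML import.

def score(txn, model):
    """Toy fraud score: weighted sum of the transaction's features."""
    return sum(model[f] * txn[f] for f in model)

def drifted(scores, baseline_mean, tolerance=0.5):
    """Flag drift when the live mean score wanders from the baseline."""
    live_mean = sum(scores) / len(scores)
    return abs(live_mean - baseline_mean) > tolerance

model = {"amount_z": 0.7, "foreign": 1.5}       # assumed feature weights
txns = [{"amount_z": 0.1, "foreign": 0}, {"amount_z": 2.0, "foreign": 1}]
scores = [score(t, model) for t in txns]
print(drifted(scores, baseline_mean=0.2))
```

When the drift check fires, the closed loop kicks in: the warehouse retrains, a new model is exported, and the streaming side picks it up without stopping, which is the "continuous improvement" in step 6.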
IBM enables you to get started with your education or project in multiple ways:
Accelerated Discovery Lab - collaborative workspace & infrastructure at Almaden Research Facility for organizations that want to use some of the best minds in computer science to gain insights from their information sources.
Analytic Solution Centers - 9 centers around the world to give quick and easy access to a range of advanced analytics solutions, resources and IBM expertise.
IBM Experts - 9000+ consultants and the experience from 30,000+ analytics-driven client engagements across seventeen industries and over 170 countries.
Academic Initiative - IBM has forged 1,000+ academic partnerships with leading universities to prepare students for the expanding scope of careers. Numerous Big Data & Analytics books have been written by IBM thought leaders
Ecosystem - IBM has 2500+ business partners across Big Data & Analytics. They are extending the reach of our platform with unique value-add solutions.
IBM AnalyticsZone - technology downloads, information sharing and other resources to help in the analytics journey. With 40K+ members, it’s the world’s leading social network dedicated to business analytics.
IBM Big Data & Analytics Hub - content from 100+ contributors (analysts, industry luminaries and IBM SMEs.) 35K monthly visitors for live shows, videos, animations, blogs, infographics and more.
IBM BigData University – 120K people registered; it is one of the fastest-growing free education sites on big data.
Transition: Big Data & Analytics is one of the foremost ways an organization can become a Smarter Enterprise.