Success with big data comes down to confidence. Without confidence in the underlying data, decision makers may not trust and act on analytic insight. You need confidence in your data – that it’s correct, trusted, and protected through automated integration, visual context, and agile governance. You need confidence in your ability to accelerate time to value, with fast deployments of big data appliances. Learn how clients have succeeded with big data by building confidence in their data, ability to deploy, and skills. Presenter: David Corrigan, Big Data specialist, IBM. More from the day at http://bit.ly/sb13se
2. Big Data is the next Natural Resource
“Data is the New Oil”
“We have for the first time an economy based on
a key resource [Information] that is not only renewable,
but self-generating. Running out of it is not a problem,
but drowning in it is.”
– John Naisbitt
Harvesting any resource requires Mining, Refining and Delivering
IBM Confidential
3. The Era of Big Data Demands Confidence
• Volume (Data at Scale): Terabytes to petabytes of data
• Variety (Data in Many Forms): Structured, unstructured, text, multimedia
• Velocity (Data in Motion): Analysis of streaming data to enable decisions within fractions of a second
• Veracity (Data Uncertainty): Managing the reliability and predictability of inherently imprecise data types
4. Success With Big Data Comes Down to Confidence
• Confidence in Your Data …Before you act on insight
• Confidence in Accelerating Value …Before you start a big data project
• Confidence in Your Skills …To maximize the value from big data
5. IBM’s Latest Innovations Build Big Data Confidence
InfoSphere Integration & Governance for Big Data (Confidence in your Data)
• Automated integration
• Visual context to understand data
• Agile governance to protect sensitive big data
PureData System for Hadoop (Confidence in Accelerating Value)
• Appliance simplicity for Hadoop systems
• Appliance speed - Get up and running in hours
Big Data Stampede (Confidence in your Skills)
• All the resources needed to get value from big data quickly
• Software, expertise and skills
6. IBM’s Big Data Platform
A Holistic and Integrated Approach to Big Data & Analytics
The Whole is Greater Than the Sum of the Parts
• Broadest set of capabilities across big data and analytics
• Pre-integrated components accelerate time to value
• Pre-built industry and horizontal solutions
• Delivered in multiple forms: software, appliance, and cloud
• Only vendor with embedded integration & governance capabilities
Platform layers, top to bottom:
CONSULTING and IMPLEMENTATION SERVICES
SOLUTIONS: Sales, Marketing, Finance, Risk, IT, Operations, HR; Watson and Industry Solutions
ANALYTICS: Performance Management, Risk Analytics, Decision Management, Content Analytics; Business Intelligence and Predictive Analytics
BIG DATA PLATFORM: Content Management, Hadoop System, Stream Computing, Data Warehouse; Information Integration and Governance
SECURITY, SYSTEMS, STORAGE AND CLOUD
7. Astron uses streaming analytics
to deliver insight from the world’s
largest radio telescope
Need
• The institute needed to develop a resource- and
energy-efficient way for astronomers to analyze an
unprecedented amount of unstructured data from what is
designed to become the largest radio telescope ever built
Benefits
• Accelerates the identification of relevant images and data
by approximately 99%, making the information available to
astronomers in minutes as opposed to several days
• Integrates data from more than 3,000 dishes and antennas
that make up the largest and fastest radio telescope in the
world
8. By 2015, 80% Of All Available Data Will Be Uncertain
[Chart, 2005 to 2015: Global Data Volume in Exabytes (0 to 9000) plotted against Aggregate Uncertainty % (0 to 100), by data source: Sensors; Internet of things; Social media; Video, Audio and Text; VoIP; Enterprise Data. Multiple sources: IDC, Cisco]
Rising Uncertainty = Declining Confidence
• 1 in 3: Make decisions on untrustworthy data
• 1 in 2: Lack the information that they need
• 60%: Have too much data
9. Innovations in Information Integration and Governance
Automated Integration: InfoSphere Data Click
• Self-service access to a growing variety of big data in traditional, NoSQL and Hadoop sources
Visual Context: Information Governance Dashboard
• Immediate, visual context for critical decisions
• Understand big data to leverage it better
Agile Governance: InfoSphere Privacy & Security
• Find and protect sensitive big data
• Single point of security for traditional, NoSQL & big data
Highlights:
• 2 Click Data Integration
• 170x Faster Imports
• 80% Faster Monitoring
10. Clients Maximize Value from Data Confidence
• Automated Integration: 100% Reduced time to deliver reports (Act on Insight)
• Visual Context: 95% Time reduction in information gathering (Improve Decisions)
• Agile Governance: 72% Reduction in Fraudulent Claims (Mitigate Risk)
11. Let’s Simplify Big Data:
Announcing IBM PureData System for Hadoop
Designed to:
• Simplify the building, deploying and management of a Hadoop cluster
• Speed the time-to-value for Hadoop and unstructured data
• Maximize the overall analytic ecosystem
• Provide enterprise security and platform management
Components shown: HDFS, MapReduce, HCatalog, Pig, Hive, Visualization, Development Tools
Deploy 8x Faster¹
¹ Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
12. Benefits of IBM PureData System for Hadoop
Accelerate Big Data Time to Value
• Deploy 8x Faster than custom-built solutions¹
• Built-in Visualization to accelerate insight
• Built-in Analytic Accelerators² unlike big data appliances on the market
Simplify Big Data Adoption & Consumption
• Single System Console for full system administration
• Rapid Maintenance Updates with automation
• No Assembly Required: data load ready in hours
Implement Enterprise-Class Big Data
• Only Integrated Hadoop System with Built-in Archiving Tools²
• Delivered with More Robust Security than open source software
• Architected for High Availability
¹ Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
² Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.
13. PureData for Hadoop Accelerates Big Data Use Cases
[Diagram: PureData System for Hadoop alongside PureData System for Analytics, with three groups of use cases]
• Aggregation of data
• Ad-hoc analysis
• Simple analytics/exploration

• Immediate storage alternative of cold data
• Cost savings for cold data
• Simple analytics/exploration

• Explore new data
• Visualize with easy, spreadsheet-style analysis
• Identify useful information and move to other systems
14. Introducing Big Data Stampede:
Leading the Charge for Big Data Success
• IBM Expertise: Removes the guesswork and delivers savings in time and cost
• Big Data Platform: Provides the use of unmatched capabilities
• Education & Training Services: Ensures client self sufficiency and big data capabilities
[Chart: Business Value against Time to insights, comparing the Standard Roadmap with Stampede. Labels: Research, Use Case Selection, Tutorials/Training, BigInsights Quick Start, Expert/BVA, Product Selection, Skills & Knowledge Transfer, IBM Expertise]
15. A Year of Leadership and Innovation
• DB2 with BLU ACCELERATION: 8-25x Faster Reporting & Analytics
• BIG DATA PLATFORM: BIG SQL, 2-10x Faster Stream Processing
• PUREDATA SYSTEM for HADOOP: 8x Faster Deployment
• 9 New Academic Collaborations; $100,000 in Awards for Big Data Curricula; Preparing for 4.4 Million Big Data Jobs in 2015
• PREDICTIVE ANALYTICS for BIG DATA: Improved Visualization, Automatically Find Relevant Data
• BUSINESS INTELLIGENCE: Animated Charting, Extensible Visualization
• RISK, FINANCE & CLOUD: Disclosure Management, Accelerated Internal Reporting
16. Top 5 Big Data Use Cases
Each Requires Data Confidence for Success
• Big Data Exploration
• Enhanced 360° View of the Customer
• Operations Analysis
• Security/Intelligence Extension
• Data Warehouse Augmentation
17. Global aerospace manufacturer
empowers staff with access to
critical information
Need
• Improve operational efficiencies by providing a unified
search, discovery and navigation capability to provide
fast access to relevant information across the
enterprise
Benefits
• Placed 50 additional aircraft into service worldwide
during the first year without a staffing increase
• Saved USD36 million/year in supporting the 24/7
aircraft-on-ground program
• Provided supply chain visibility to reduce cycle time,
saving millions of dollars on critical parts deliveries
18. Elisa Corporation - Adding millions
of Euros in revenue with improved
information services
Need
• Elisa Corporation sought a deeper understanding of
customer needs as it expanded its offerings.
However, its existing information services platform
could not support the data-intensive analytics
required.
Benefits
• Provides a platform to drive millions of Euros in new
revenue
• Supports 200 to 600 times faster data analysis and
100 times faster load performance
• Delivers direct yearly cost savings of almost
EUR800,000 (USD1 million)
19. IBM Big Data & Analytics Momentum
[Chart: cumulative growth by year, 2010 to 2013. Source: IBM. Note: All numbers used are cumulative. 3/31/2013]
• Info Agenda Engagements: 85; 860; 2,300; 3,810
• Big Data University Enrollments: 10,000; 40,000; 101,000
• Business Partners: 1640; 2215
• Big Data Clients: 1040
• AnalyticsZone.com Members: 40,000
• GBS Information and Analytics Engagements
• 9th Analytics Solution Center Opens in Ohio
• Other values shown: 170; 730; 1100; 1550; 30,000
Editor's Notes
Key Points
Natural resource that can smother you.
Many organizations are being smothered – they think they want big data, but they can’t handle big data.
Catchy Statement
Is any data raw? It needs to be refined.
Key Points
We’re all familiar with the 3 V’s.
Volume is about rising volumes of data in all of your systems – which presents a challenge for both scaling those systems and also the integration points among them.
Variety is about managing many types of data, and understanding and analyzing them in their native form.
Velocity is about ingesting data in real time and in motion.
And veracity deals with the certainty, or truthfulness, of big data. Veracity is a big issue – and one that directly relates to confidence. In fact, as the complexity of big data rises (the first 3 Vs grow), it actually becomes harder to establish veracity.
Key Points
Confidence is directly linked to success with big data.
Without data confidence, companies don’t trust their data. If you doubt the data, you doubt the insight.
Without confidence in their ability to accelerate value, or that they have the skills to execute, companies may not even start a new big data project. And they’d miss an incredible window of opportunity.
We are the only vendor who is pushing confidence as a key aspect of big data success. We think it’s an important issue – one that firms need to deal with at the start of their big data journey.
Catchy Statement
Confidence is necessary for action. In order to act on big data opportunities and act on insight, you need confidence.
Key Points
We have three exciting announcements, all of which help build confidence in big data.
First, new innovations in integration and governance help build data confidence. Automated integration enables organizations to rapidly ingest big data. Visual context helps them understand that data and their confidence level in that data so they can leverage it. And agile governance allows them to apply only the appropriate amount of governance for their use case.
PureData System for Hadoop brings appliance simplicity to the world of Hadoop. You can be up and running with a Hadoop cluster in just hours.
Big Data Stampede helps organizations address their skills gap by providing software, expertise, and training to get value from big data straight away.
Catchy Statement
IBM is the only vendor addressing big data confidence in a holistic way.
Key Points
The value of IBM’s big data platform is its breadth. We have the broadest set of capabilities for big data of any vendor.
The whole becomes greater than the sum of the parts once we start integrating those components. And we’ve done that. Our data warehouses and Hadoop systems are well integrated with our IIG capabilities. Our analytic solutions such as Cognos and SPSS are integrated with the relevant components of the big data platform. Our industry solutions teams built industry-specific and horizontal solutions based upon big data and analytics. Watson is one such example.
And our partners build around this platform of capabilities with highly tailored and specific solutions for their clients.
Key Points
Research shows that data uncertainty is rising along with the volume of data. Why is uncertainty rising?
One reason is that we are tapping into external data more than ever before. When combining external data, sometimes from uncertain sources, the overall level of uncertainty rises.
Another reason is the variety of inputs – there are more sources of data, and more fragmented records that need to be reconciled.
Look at the statistics on the right. 1/3 make decisions on untrustworthy data. That’s from 2012. What is it like today? Or in 2014? 1/2 lack information and want more, yet 60% have too much data. That’s a paradox. We want more, but we can’t handle it.
The answer isn’t making data “smaller”.
It isn’t ignoring new sources of big data and insight.
And it isn’t making the data perfectly certain – that’s a fool’s errand.
It’s about understanding the level of uncertainty, or confidence, and acting despite that uncertainty. It’s about making the data good enough that you’re comfortable to act. That’s the new role for Information Integration and Governance.
Client Stories & Anecdotes
An insurer was gathering data in Hadoop for a telematics use case. They dumped in location data based on a device in your car – which was then used to calculate a potential monthly premium discount based on your actual driving history. But it wasn’t long before the marketing department was asking other questions. What are the household driving patterns? Who was driving the car? How long did they stay at particular locations? The issue of confidence came to the front – and it exposed that they weren’t confident in the data without combining it with other sources (such as master customer data records). Their first step was to better classify and understand that data, using enterprise metadata.
Catchy Statement
More data = more uncertainty – yet everyone wants even more data. How will they cope?
Key Points
Automated integration – this is all about easy access to data no matter where it resides, inside your organization or outside.
We have many exciting announcements that Martin will cover, but one in particular is Data Click.
With so many initiatives dependent on data, simply getting access to the right data is a challenge.
InfoSphere Data Click accelerates a whole host of projects by making it easier to get started, without dealing with long waits for IT resources.
Data Click has been very well received since its introduction last year, and now it is becoming even more helpful by enabling integration of data from more big data sources (JSON, NoSQL, Hadoop, and many others via JDBC).
Visual Context – this is extremely important because how can you have confidence in something you can’t see? That’s the point of the governance dashboard: to provide visual context on governance policies, condensing thousands of data points, policies and KPIs into a simple, easy-to-understand dashboard. And this can be shared with business users, so they can see data confidence whenever they need to.
Agile Governance – the ability to apply governance only when and where it’s needed is crucial. The zone architecture has more places in which data is stored and more ways in which it is integrated. The only way to keep up is to make governance agile and responsive. One example is privacy and security, where we have expanded the number of NoSQL and Hadoop big data systems that we support, in addition to the extensive list of relational systems already supported. Now you can have a centralized system to control big data security, which makes things much simpler.
Catchy Statement
You won’t act on insight unless you have confidence in the data – these new innovations help you understand and improve big data confidence.
Key Points
Scotiabank
Used Change Data Capture (CDC) to re-engineer its data system because of an increase in demand for real-time data from clients. CDC helped to facilitate data synchronization between the bank's various databases, ultimately reducing delivery time.
• $1 million in cost savings
• Sub-second data delivery time
• Needed a real-time approach to cope with growing data
Colt
Colt Technology Service Group wanted to improve their customer service with quicker data insights. InfoSphere DataStage and FastTrack help to consolidate data into an easy-to-find source and provide a common language among employees.
• Using PureData for Analytics to analyze customer data and gain deeper customer insight
• US$1.9 million (£1.2 million) in annual savings; 90 percent reduction in the time to complete ‘wildcard’ searches; more than 95 percent reduction in the time to gather information
“A major benefit of the integrated InfoSphere product range has been how it’s raised the visibility and importance of Business Intelligence governance within the organization,” says Herson. “We now have business and IT managers, strategists, planners, analysts and developers all sharing the same BI vision and common understanding of interface contracts, business terms and data definitions. This has resulted in users demanding their BI requirements are met from data obtained directly from the Enterprise Warehouse and not from localized legacy ‘spreadmarts’, such as Access databases and Excel spreadsheets.”
MoneyGram
MoneyGram International has stopped more than US$37.7 million in fraud through its Global Compliance system based on IBM InfoSphere Identity Insight solutions.
Understanding who its clients are is helping MoneyGram International identify and stop fraudulent, unauthorized money transfers, thereby addressing a common problem for financial institutions worldwide.
Using a powerful, algorithms-based software platform, MoneyGram International can quickly identify questionable patterns, proactively enact processing rules, and quickly become compliant with new regulations, preventing thousands of customers from losing funds to fraud.
Interesting anecdotal story
Ted Bridenstine, systems development manager at MoneyGram, a leading global payment services company, underscores the importance of fraud detection with the story of a 100-year-old grandmother who had contacted MoneyGram after receiving a call that her grandson had been arrested and needed US$2,500 for bail. Behind the scenes, MoneyGram’s fraud detection system flagged the transaction as suspicious. Analysts determined that it was likely part of a telephone scam and a MoneyGram representative contacted the customer to let her know that the wire had been stopped and her money was being refunded. Worried about her grandson’s safety, she threatened to take her business elsewhere if MoneyGram didn’t wire the money. The company representative implored the woman to contact a family member and verify the story.
“She called back three days later in tears to thank the representative personally,” recalls Bridenstine. “She did verify that it was a fraud. She lives on Social Security and could not afford to lose money. The call was emotional and heartwarming.”
A picture speaks a thousand words! Rolling out Hadoop clusters is time consuming and resource intensive. Complicating the challenge is the continued shortage of Hadoop skills available in the market. We find that most organizations are growing those skills organically, which lengthens the time to value from big data.
When we designed PureData for Hadoop we had one thought in mind: simplify the deployment of enterprise-quality Hadoop. In fact, our first beta customer was able to get the system up and running and ready for action in 89 minutes, as opposed to the weeks they spent building their own cluster. The PureData System for Hadoop will help speed time to value on Hadoop projects by 8X; that’s faster value and faster insight.
Based on InfoSphere BigInsights, PureData for Hadoop extends the entry points available for our customers to leverage Hadoop. Whether you are downloading our free version, leveraging our Enterprise software edition or speeding time to value with PureData for Hadoop, IBM has a solution that meets your needs for Hadoop.
So we talked about how much faster you can deploy the PureData System for Hadoop. That is a significant value for your IT organization, but what about the business? PureData for Hadoop also helps to accelerate insight. With an easy-to-use, spreadsheet-like interface built in, you can achieve immediate insight from new or previously untapped data sources.
In addition to the visualization, PureData for Hadoop includes built-in analytic accelerators. While many vendors offer analytic functions, IBM goes one step further to make these even more consumable. We group the functions into frameworks that help accelerate application development for social media analysis, machine data and text data.
To help simplify the solution, the system offers a single console to fully manage hardware and software, making day-to-day administration such as maintenance updates simple. The system is up and running in hours, giving you the shortest path to leveraging Hadoop in your organization.
Lastly, we have included a capability that no other Big Data appliance has today: built-in archival software that allows you to easily offload cold data from your PureData for Analytics data warehouse to PureData for Hadoop. This capability represents a significant cost savings for the ecosystem. Of course the system is delivered with robust security and is integrated with Guardium, best-in-class security for Hadoop.
The landscape of how we architect our analytic ecosystem is changing, fueled by the volume, velocity and variety of data we wish to analyze. As this change occurs, we see Hadoop being leveraged in some key use cases which complement the data warehouse. As a landing zone, Hadoop is an efficient and cost-effective platform for aggregation, pre-processing and cleansing of data. We see many of our customers leveraging Hadoop both in front of the data warehouse as a landing zone and also behind the data warehouse to offload cold data. In fact, research done by EMA in late 2012 showed that 51% of the clients in production with Hadoop were leveraging it for archival. We also see Hadoop leveraged as a pre-processing hub to explore data that previously was unavailable to the enterprise. This capability can help identify new useful information that may be integrated with existing analytics to further enhance insight.
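As a rough illustration of the "offload cold data" use case described above, the sketch below picks warehouse partitions that have not been read for a given period as candidates to move to Hadoop. It is a minimal sketch under stated assumptions, not how PureData's archiving tools work: the `Partition` record, the `select_cold_partitions` helper, the sample data and the 365-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one warehouse table partition."""
    name: str
    last_accessed: date
    size_gb: float


def select_cold_partitions(partitions, today, max_idle_days=365):
    """Return partitions whose last access is older than max_idle_days.

    The threshold is an assumed policy; a real deployment would choose it
    from its own access statistics.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [p for p in partitions if p.last_accessed < cutoff]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    partitions = [
        Partition("sales_2009", date(2011, 3, 14), 420.0),
        Partition("sales_2012", date(2013, 5, 20), 510.0),
        Partition("clicks_2010", date(2012, 1, 2), 1300.0),
    ]
    cold = select_cold_partitions(partitions, today=date(2013, 6, 1))
    for p in cold:
        print(f"offload candidate: {p.name} ({p.size_gb} GB)")
    print(f"total to offload: {sum(p.size_gb for p in cold)} GB")
```

Selecting by last access date keeps the decision simple and auditable; the actual move into Hadoop is left out because it depends on the tooling in use.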
One of the challenges clients face is getting started with big data quickly. Typically getting started involves a lot of research, education, skills training, product comparisons, etc. before ANY value is realized. “Stampede” removes the technical, skills & staffing barriers so clients can get quick value from “big data”. We’ve assembled everything you need in one package, including expert services.
Benefits: Faster launch and time-to-value for big data initiatives
Included:
- All of the resources (people, process, technology) to ensure success with Big Data
- Business Value Assessment to determine high value starting points with Big Data
- Use of robust, scalable Big Data platform for building out big data solutions
- Big Data experts to guide the client through proof of concept of a specific use case (Don’t go it alone): hack-a-thon type sessions, exploratory analytics
- Educational & training resources to ensure client self-sufficiency with big data projects
Key Points
We made significant announcements in April.
BLU Acceleration is a breakthrough technology that enables speed-of-thought analytics – 8-25x faster. Our unique approach to dynamic in-memory computing offers both stability and performance for a new class of applications.
We announced new innovations in our big data platform: easier accessibility with Big SQL and faster performance on stream computing. And we announced our PureData System for Hadoop, which today we are announcing is generally available.
We followed that with announcements in June joining big data and analytics: new predictive analytic capabilities with enhanced visualization designed for big data, and specific analytic applications for risk and finance that take advantage of big data.
Just last month we announced further academic programs – 9 new academic collaborations with universities as we prepare for over 4 million big data jobs. We’ve continued to invest in academia across the board – something we started years ago with bigdatauniversity.com and partnerships with academic institutions.
And I’m pleased to announce that we now have over 500 big data partners. A thriving partner ecosystem is essential to a platform strategy. We want partners to build out and complement our platform. We want partners to build solutions based on our capabilities. We have three of those partners here with us today – ones who have built big data and IIG solutions from our capabilities.
Catchy Statement
We’ve had a busy year but we won’t rest on our laurels – we’re dedicated to every aspect of growing the big data market.
Key Points
Through hundreds of client implementations, briefings and consultations, we’ve determined a common set of big data use cases.
Each of the use cases requires different big data technology.
Each of the use cases requires a different set of governance capabilities and a different level of appropriate governance.
Big Data Exploration – this use case is all about ingesting big data quickly or discovering it in its source systems, determining its relative value, experimenting with big data, and utilizing it. From an IIG perspective, it’s critical that you be able to discover and determine the confidence of the data. That’s not to say it should be improved or governed yet while you’re exploring. It’s focused on understanding your confidence level in the data to determine if you trust the outcomes, or whether the data needs to be improved before it’s analyzed.
Enhanced 360° View – this use case is about truly knowing everything about master entities such as the customer. In order to find big data for the customer, you first need to establish the unique customer record – and that’s where MDM along with data quality and integration play a role.
Security and Intelligence Extension – this use case is about monitoring data – log data, network data – to prevent data loss, threats, and fraud, among other things. IIG helps by providing automatic protection of sensitive data, masking it, and also aiding in the detection of fraudulent individuals and networks.
Operations Analysis – this use case is all about analyzing operational data – from machines and networks – either streaming information or data at rest. It requires high volume data integration to move and integrate data among the zones.
DW Augmentation – this use case is focused on augmenting the DW – sometimes that means archiving data from the DW but still being able to access and analyze it, sometimes it includes complementing the DW with unstructured data and unconventional sources. IIG helps by providing high volume data integration to and from the DW, as well as archiving capabilities to track the lifecycle of data.
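To make the "masking" of sensitive data mentioned under Security and Intelligence Extension more concrete, here is a minimal sketch that masks e-mail addresses and card-like numbers in a log line. It is an illustration only and is not the InfoSphere Privacy & Security API: the `mask_sensitive` helper, both regular expressions and the sample log line are assumptions, and real masking rules would need to be far more thorough.

```python
import re

# Assumed, deliberately simple patterns for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_sensitive(text):
    """Replace e-mail addresses with a placeholder and card-like numbers
    with asterisks, keeping only the last four digits."""

    def mask_card(match):
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]

    text = EMAIL_RE.sub("<email>", text)
    return CARD_RE.sub(mask_card, text)


if __name__ == "__main__":
    # Made-up log line for illustration only.
    line = "user=anna@example.com card=4111 1111 1111 1111 status=ok"
    print(mask_sensitive(line))
```

Keeping the last four digits is a common compromise that lets analysts correlate records without exposing the full value.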
Key Points
We’ve increased our momentum year after year.
You can see it in the growth in clients – with thousands of big data and analytics engagements, we have the breadth of that experience. That experience drives our product roadmaps, our innovation, and our services people who know how to implement big data quickly.
Over 100,000 registrants in Big Data University – that’s an incredible accomplishment. Our goal is to raise the market’s education level on big data and analytics, and this has been a tremendous success, along with our developer days and hackathons.
In addition to the 500+ big data platform partners, we have 2215 partners across big data and analytics. That’s the multiplying effect of the platform – those partners augment our technology with unique solutions that add value to specific markets.
Catchy Statement
Our momentum has been strong – but we think it will get stronger still with today’s announcement. Confidence in big data makes organizations confident that they can start this journey – and they can start it with a partner who will make them successful.