3. WHAT IS BIG DATA?
Big Data: Volume, Velocity, Variety and Complexity
DATASETS WHOSE VOLUME, VELOCITY, VARIETY AND COMPLEXITY ARE BEYOND AN ORGANIZATION’S ABILITY TO CAPTURE, PROCESS, STORE, MANAGE AND ANALYZE.
[Diagram labels: Pattern-based Strategy, Social Computing, Context-Aware Computing; Volume, Velocity, Variety, Complexity]
INFORMATION SOURCES
Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, IT/OT, Documents, Search Engine, Images
*Gartner, Inc., “‘Big Data’ Is Only the Beginning of Extreme Information Management”, Mark A. Beyer, Anne Lapkin, Nicholas Gall, Donald Feinberg, Valentin T. Sribar, Published 7 April 2011
HP Confidential
4. “Big data has the potential to revolutionize
management. Simply put, because of big
data, managers can measure, and hence
know, radically more about their
businesses, and directly translate that
knowledge into improved decision making
and performance.”
Erik Brynjolfsson and Andrew McAfee, Harvard Business Review, September 11, 2012
5. BIG DATA – BY THE NUMBERS
The world’s ‘digital universe’ is in the process of adding 1.8ZB in 2011
with continuing exponential growth – projecting 35ZB in 2020*
70% of the information in the digital universe is generated by
individuals
$4 Trillion spent since 2005 by enterprises to capture, manage, store
and derive value from – information
• In 15 of the US economy’s 17 sectors, companies with more
than 1,000 employees capture, manage and store, on average, over
235TB of data
85% of information is unstructured:
• 97,000 TWEETS every second
• 12 MILLION TEXTS every minute
• 294 BILLION EMAILS every day
*IDC, 2011 Digital Universe Study: Extracting Value from Chaos
Social Media, Video, Audio, Email, Texts, Documents, Images
6. BIG DATA = BIG CHALLENGES
Senior Business and Technology Executives - Survey Data
50% – Do not have an effective information strategy in place
2% – Can deliver the right information, at the right time, to support enterprise outcomes 100% of the time
34% – 50% of information within organizations is unconnected, undiscovered and unused
35% – Organization is not effective at accessing enterprise information as and when needed for legal/compliance and operational needs
7. SUCCESS DEPENDS ON UNDERSTANDING AND
LEVERAGING ALL FORMS OF INFORMATION…
…Unstructured and Structured Information
…Today, organizations leverage just 5% of their information*
*“We estimate that firms effectively utilize less than 5% of available data”. Big Data Will Help Shape Your Market’s Next Big Winners, “The Forrester Blogs for Enterprise Architecture Professionals”, Brian Hopkins, September 30, 2011
8. INFORMATION – CAPTURING ITS VALUE
US HEALTH CARE: Increase industry value per year by $300B
US RETAIL: Increase net margin by 60+%
MANUFACTURING: Decrease dev., assembly costs by 50%
EUROPE PUBLIC SECTOR: Increase industry value per year by €250B
GLOBAL PERSONAL LOCATION DATA: Increase service provider revenue by $100B
FORWARD-THINKING LEADERS HAVE BEGUN TO AGGRESSIVELY BUILD
THEIR ORGANIZATIONS’ BIG DATA CAPABILITIES
*McKinsey, 2011 Big Data: The Next Frontier of Innovation, Competition and Productivity
9. INFORMATION OPTIMIZATION
BRIDGING THE GAP FROM BIG DATA TO BUSINESS INSIGHT
Information Management
Information management has become the overarching term to describe the capturing, governing, managing, storing, leveraging and disposing of all types of information assets within an organization
Business Insight
Decision-makers must be ready to target exactly the right price, product and/or service at each core customer segment. Companies require predictive analytics to drive competitive advantage. Predictive analysis ensures that key decision-makers can instantly act upon these insights across the business
Both Information and Information Technology are vital²
² Source: Gartner – Re-imagining IT: Principles for a proposed new CIO manifesto
10. SOCIAL MEDIA: WHERE’S THE VALUE?
Mining Social Media for Client Insights and Trends
If Facebook were a country, it would be the third-largest in the world
(between India and the U.S.)
[Bar chart, axis 0 to 1400: CHINA, INDIA, FACEBOOK, UNITED STATES, INDONESIA, BRAZIL. Callout: Twitter’s world rank would be No. 7]
• The information in social media includes text (blogs, Facebook posts, Twitter feeds), image (Flickr), video
(YouTube), and audio
• Today most of the activity in analyzing Social Media is in finding interesting connections of people (Social
Network Analysis) and analyzing text from blogs, Facebook and Twitter
• In the near future, organizations will want to expand beyond text to the full spectrum of content: image,
audio and video
Source: Internet World Stats
11. MOVE FROM REACTIVE TO PREDICTIVE – AND
BEYOND
FORESIGHT – What will happen?
INSIGHT – What is happening now?
HINDSIGHT – What happened?
BETTER, SMARTER ENTERPRISE DECISIONS
[Diagram labels: In Flight information, Active information, Inactive information]
12. Until recently, only structured information has been leveraged……
          Structured   Unstructured
Amount    15%          85%
Growth    22%          62%
13. DERIVING VALUE FROM ALL INFORMATION FORMS LEADS TO STRATEGIC DECISION
MAKING
• The ability to identify a customer’s intent to purchase a product from their Twitter stream and instantly generate the most relevant offer to that customer based on similar buying patterns in structured data.
• A bank’s ability to analyze and form a credit risk assessment based upon sentiment in a phone conversation that is statistically correlated with credit risk data.
• Or a retailer comparing sensor data with pictures of store shelves to see in real-time what products are selling best.
[Diagram labels: identify a customer’s intent to purchase; similar buying patterns in structured data; generate the most relevant offer]
• Track and understand customer sentiments on social media sites and forums
• Track service issues, brand perception, customer noise
• Eliminate disparate and siloed information pools with comprehensive real-time risk analysis
• Identify cross-selling opportunities
• Improve operational efficiency by identifying the most profitable markets, customers and products
14. AUTOMATED REAL-TIME ANALYSIS – A GAME CHANGER
Using Context-aware Computing and Pattern-based Strategy
TODAY
LEVERAGING 100% ENTERPRISE RELEVANT INFORMATION – IMAGINE THE OPPORTUNITIES
• High volume products such as social games or apps can optimize their products within minutes with real-time data. Zynga can test the Cityville “Add Coins & Cash” button for thousands of users within minutes.
• Tesco compares sensor data with pictures of store shelves to see in real-time what products are selling best…… Tesco uses information to analyze new business and how to create the most effective promotions for specific customer segments, and to inform decisions on pricing, promotions, and shelf allocation.
• The New York Times homepage feature articles change multiple times a day. How do they decide which story to feature more prominently? Real-time data allows publishers to see what is trending throughout the day so that they can deliver the right content at the right time, and know which articles are spreading virally versus being found.
USE CASE EXAMPLES ONLY
15. UNLOCKING THE VALUE OF INFORMATION
REQUIRES A CULTURAL SHIFT – ENTERPRISE-WIDE
Build an information-driven corporate culture.
Using information effectively to understand and align with your
business needs and the preferences of your customers is the key to
creating a competitive advantage in today’s customer-empowered
environment.
16. THANK YOU!
CHRIS SELLAND
CSELLAND@VERTICA.COM
(617) 386-4523
17. EVALUATE THE HP VERTICA ANALYTIC
PLATFORM
• Enterprise Edition
• Free 30 day evaluation
• Community Edition
• Free Download 1TB, 3 nodes
• www.vertica.com
Editor's Notes
“We’re entering a new world where information is the currency of the global enterprise,”
SLIDE OBJECTIVE – to highlight the presence of Extreme Information and its implications
KEY POINTS
We highlighted earlier the tremendous VOLUME of information being created. But Gartner and others have also shared their findings and observations around the emergence of ‘Extreme Information’, which also involves three other complicating dimensions to information.
Along with Volume, there’s the VELOCITY. Relevant, valuable information for an enterprise is coming at such a torrential pace it’s become a real-time phenomenon requiring real-time capabilities.
Next is VARIETY – which you can see from just the subset of information sources shown here.
Finally there’s the COMPLEXITY of where it lives and moves: from server to cloud to hybrid to device and back and forth.
TRANSITION – Unstructured and semi-structured information is what we call ‘Human Information’.
SLIDE OBJECTIVE – to illustrate the predominance of the new “human” information sources in what’s being created and consumed and needs to be leveraged.
KEY POINTS:
1. According to IDC’s Digital Universe Study released this summer, the world’s “digital universe” is in the process of generating 1.8 Zettabytes of information, with continuing exponential growth projecting to 35 Zettabytes in 2020. The information generated by sensor data is going to escalate this growth! 70% of the information in the digital universe is generated by individuals; enterprises have some liability for 80% of information at some point in its digital life. Since 2005, enterprises have invested $4 trillion on hardware, software, services and staff to create, manage, store – and derive revenue from – information.
2. “Human-friendly” information makes up about 85 percent of all data and includes emails, audio, video, social networking, blogs, call center conversations, machine-generated sensor data, and more. It grows at a breathtaking rate: 62 percent CAGR. This is the future of information computing and it represents a fundamental shift in the way people and businesses interact with information.
Beyond its sheer size, unstructured information is where all the interesting, differentiating, and vital things happen. When processing information looking to uncover a crime, investigators look for incriminating emails. When trying to understand their customer base, marketers look for information on their customers. But, unfortunately, customers don’t send you databases; they tweet or blog. And this is only becoming more complicated with the explosion of social media activity.
TRANSITION – Provide further explanations and context to Human Information
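The two endpoints quoted in this note (1.8 Zettabytes in 2011, a projected 35 Zettabytes in 2020) imply an average yearly growth rate that the note does not spell out. The sketch below works through that arithmetic. It assumes smooth compound growth between the two endpoints, which is a simplification made here for illustration and not a claim from the IDC study; the function names are hypothetical.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value after `years` years of compound growth at `rate` per year."""
    return start * (1 + rate) ** years


# Endpoints quoted in the note, in zettabytes.
ZB_2011 = 1.8
ZB_2020 = 35.0
YEARS = 2020 - 2011

overall_rate = implied_cagr(ZB_2011, ZB_2020, YEARS)
print(f"Implied overall growth: {overall_rate:.1%} per year")

# Sanity check: compounding the implied rate should land back on the 2020 figure.
print(f"Projected 2020 size: {project(ZB_2011, overall_rate, YEARS):.1f} ZB")
```

Under that assumption the implied overall rate comes out at roughly 39 percent a year, which is lower than the 62 percent CAGR the note quotes for human-friendly information on its own.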
SLIDE OBJECTIVE – To understand the current state of Information Optimization, HP commissioned an independent study to focus on information growth and challenges. This study was conducted by Coleman Parkes in October 2011 using 554 telephone interviews among senior business executives and senior technology executives in Enterprise-level companies around the globe to understand their information management challenges, priorities, and perspectives. Some of the key results:
KEY POINTS:
• Almost 50 percent of executives believe that they do not have an effective information strategy in place, that is, a strategy that cuts across organizational silos, technologies and strategic functions.
• Only 2 percent report that they can deliver the right information at the right time to support the right business outcome 100 percent of the time.
• 34% of respondents indicated that more than 50 percent of all the information within their organization remains unconnected, undiscovered and unused.
• 35% of respondents indicated they were not effective at accessing business information as and when needed for legal/compliance and operational needs.
TRANSITION – And the pace and direction of change heading into the next 5 years is taking shape already.
SLIDE OBJECTIVE – to highlight what is new and different, and important, in the world of Information
KEY POINTS
The world is a very different place than it was even five short years ago. It’s getting bigger digitally; but even more notable than the size increase, with the world’s data growing by 50 percent each month, is the shift from primarily ‘structured’ information to ‘unstructured’.
Recently, Forrester noted that they “estimate that firms effectively utilize less than 5% of available data”. (Source: Big Data Will Help Shape Your Market’s Next Big Winners, “The Forrester Blogs for Enterprise Architecture Professionals”, Brian Hopkins, September 30, 2011).
It’s no wonder. Information is no longer just the ‘database rows and columns’ of structured information. It’s the unstructured information in videos, images, audio files and all the social media and social networking channels that are increasingly adopted by business and government enterprises.
TRANSITION – And the pace and direction of change heading into the next 5 years is taking shape already.
SLIDE OBJECTIVE: Examples of monetizing Information for revenue growth and competitive advantage
KEY POINTS: Digital information is now everywhere: in every sector, in every economy, in every organization and user of digital technology. This information can create significant value for the world economy, enhancing the productivity and competitiveness of companies.
SLIDE OBJECTIVE: Analyzing and mining social media to drive business value, revenue and competitive advantage.
KEY POINTS:
Groupon is making the most of its millions of users, analyzing its big data to determine the effectiveness of its advertising expenditures, consumer behavior, and purchaser demographics and location. Groupon analyzes its data to create more personalized offers, and offers targeted to particular consumer interests have generated more purchases per customer. Groupon’s data analysis has led to better marketing insights on customer habits and preferences, which it delivers as a service back to its merchants.
The New York Jets launched their Ultimate Fan game in September 2010, which was the first revenue-generating Facebook app to be backed by a pro sports team. The application lets football fans do online what they would normally do at home and in stadiums: root for their favorite teams and players, predict game scores, and hold a virtual tailgate party with other fans from across the globe. The Jets are able to engage with their fans and make them feel like they are part of the team. They are leveraging social media to capitalize on their fans’ passion for the team and their willingness to share that fervor.
PETCO is yet another example. The company provided a promo code to their customers for $40 in free shipping. The person who shared their code with the most people won a $500 PETCO gift card. About 40 percent of the sales that resulted from this promotional push came from new consumers. The desire to save a few dollars on shipping costs drove loyal PETCO customers to connect with the larger pet owner community and spread the word about the store via social media.
Knowing what happened and why it happened is no longer adequate. Leaders need to know what is happening now, what is likely to happen next, and what actions they should take for optimal results.
SLIDE OBJECTIVE: Today, unstructured data accounts for 85 percent of the information in the digital universe, and it’s growing at a breathtaking rate: 62 percent CAGR.
KEY POINTS: In looking at information’s history and advancements in the last 50 years, the databases of the 1960s were run on computers that were not powerful enough to understand ‘real-world’, rich information as humans could. To handle this, we had to create a new machine-friendly paradigm, which gave rise to a structured data world.
As computer processing power increased, this machine-friendly paradigm proved very useful to business. When information was structured, its location gave it meaning. For instance, if a column stood to represent the ‘number of teddy bears in the warehouse’, when that column went to 0, an amazing thing would happen: the computer would automatically order more. The value here came from the computer’s ability to process the information, not merely to locate or retrieve it.
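The teddy bear example can be made concrete with a few lines of code. This is only an illustrative sketch of the idea that a value’s position in a structured record tells the machine what it means; the record layout, field names and reorder amounts are hypothetical and are not taken from any real inventory system.

```python
from dataclasses import dataclass


@dataclass
class StockRow:
    """One structured row: each field's position and name give the value its meaning."""
    item: str
    quantity: int          # hypothetical 'number in the warehouse' column
    reorder_quantity: int  # hypothetical amount to order when stock runs out


def reorders_needed(rows):
    """Return (item, amount) orders for every row whose quantity column has reached zero."""
    return [(row.item, row.reorder_quantity) for row in rows if row.quantity <= 0]


warehouse = [
    StockRow("teddy bear", 0, 100),
    StockRow("toy train", 12, 50),
]

orders = reorders_needed(warehouse)
for item, amount in orders:
    print(f"Order {amount} x {item}")
```

The rule needs no understanding of what a teddy bear is. It only needs to know which column holds the count, which is why the same approach does not carry over to emails, audio or video.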
SLIDE OBJECTIVE: The technology stacks of yesterday were developed to process structured information and to provide answers to specific well-defined questions. The technology stack for today and the future combines the concepts and context from unstructured information with analysis of structured data.
KEY POINTS: Here are some examples of the possibilities for leveraging both structured and unstructured information:
• The ability to identify a customer’s intent to purchase a product from their Twitter stream and instantly generate the most relevant offer to that customer based on similar buying patterns in structured data.
• A bank’s ability to analyze and form a credit risk assessment based upon sentiment in a phone conversation that is statistically correlated with credit risk data.
• Or a retailer comparing sensor data with pictures of store shelves to see in real-time what products are selling best.
• A next-generation retailer will be able to track the behavior of individual customers from Internet click streams, update their preferences, and model their likely behavior in real time.
SLIDE OBJECTIVE – to highlight that to get the most out of Human and Extreme Information, Automation and Real-Time Analysis are essential capabilities of Context-Aware and Pattern-Based Strategies.
FOR EXAMPLE
1. Zynga, a global gaming company, has over 200 million monthly active users. They interact with their gamers in real time; they know what’s going on and change the game on the fly.
2. Tesco, the global grocery and general merchandise retailer and the 3rd largest retailer in the world by revenues, is able to optimize decisions on everything from pricing to promotions and shelf allocation in real time.
3. The New York Times receives more than 30 million unique visitors per month. Real-time information lets them know what’s trending and why so they can leverage this knowledge immediately.
TRANSITION – Understanding and leveraging both Human (e.g., social, unstructured, multi-media) AND Extreme (huge, rapidly moving, real-time) Information
SLIDE OBJECTIVE: Simply making the right information more easily accessible to relevant stakeholders in a timely manner can create tremendous value.