Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
Big Data & SQL: The On-Ramp to Hadoop 
The Briefing Room
Twitter Tag: #briefr 
The Briefing Room 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com 
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics 
This Month: BIG DATA 
May: DATABASE 
June: ANALYTICS & MACHINE LEARNING 
2014 Editorial Calendar at 
www.insideanalysis.com/webcasts/the-briefing-room
Big Data
Analyst: Robin Bloor 
Robin Bloor is 
Chief Analyst at 
The Bloor Group 
robin.bloor@bloorgroup.com 
@robinbloor
HP Vertica 
• Vertica was founded in 2005 by Michael Stonebraker and Andrew Palmer; it was acquired by HP in 2011
• HP Vertica Analytics Platform is a grid-based, column-oriented database management system
• The latest release, Version 7, offers new platform components that allow for Hadoop exploration and analysis using SQL
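As a rough illustration of what "column-oriented" means, the sketch below pivots a small row-based table into per-column lists and aggregates over only the columns a query needs. The table, the function names, and the values are hypothetical and chosen for illustration; this is not how HP Vertica is implemented, only a minimal picture of the storage idea.

```python
# Hypothetical sample table, stored row by row (one dict per record).
ROWS = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 175},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total(columns, name):
    """Aggregate a single column without touching the other columns."""
    return sum(columns[name])


COLUMNS = to_columns(ROWS)

# An aggregate over "sales" reads one list instead of every full record.
total_sales = total(COLUMNS, "sales")

# A filtered aggregate reads two columns; "id" is never scanned.
east_sales = sum(
    sales
    for region, sales in zip(COLUMNS["region"], COLUMNS["sales"])
    if region == "east"
)
```

The point of the layout is that an analytic query touching two of many columns scans only those two lists, which is the usual argument for column stores in analytics workloads.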
Guests 
Eamon O’Neill, Manager, Product Management, HP Vertica 
Eamon leads the product management efforts for the HP Vertica Analytics 
Platform. He has more than 15 years of high-tech product management 
experience and deep knowledge of mobile applications, software defined 
networking and storage, database marketing, and distributed systems. Eamon 
had a founding role in the creation of the cloud services platform at BladeLogic 
(now BMC Software). In addition to BMC Software, Eamon held product 
management, software engineering, and business consulting roles at Hitachi 
Data Systems, Unica (now IBM), and Cambridge Technology Partners. 
Jeff Healey, Director of Product Marketing, HP Vertica 
Jeff leads the product marketing and customer marketing efforts for the HP 
Vertica Analytics Platform. Jeff has more than 15 years of high-tech marketing 
experience and deep knowledge in messaging, positioning, and content 
development. Jeff previously led product marketing initiatives for Axeda 
Corporation, an M2M platform for sensor data and the Internet of Things. Prior 
to Axeda, Jeff held product marketing, customer marketing, and lead editorial 
roles at The MathWorks, Macromedia (now Adobe), Sybase (now SAP), and The 
Boston Globe.
Big Data & SQL: 
The On-Ramp to 
Hadoop 
With the HP Vertica Analytics Platform 
Jeff Healey, Director of Product Marketing, HP Vertica 
Eamon O’Neill, Director of Product Management, HP Vertica 
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
A New Era of Accelerated Innovation 
Forever changing how consumers and businesses interact, enabling new opportunities 
2013: Every 60 seconds
• 98,000+ tweets
• 695,000 status updates
• 11 million instant messages
• 698,445 Google searches
• 168 million+ emails sent
• 1,820 TB of data created
• 217 new mobile web users
Growing Internet of Things (IoT): Pervasive Connectivity, Explosion of Information, Smart Device Expansion
By 2020
• 30 Billion(1) Devices
• 40 Trillion GB(2) Data
• 10 Million(3) Mobile Apps
• … for 8 Billion(4)
(1) IDC Directions 2013: Why the Datacenter of the Future Will Leverage a Converged Infrastructure, March 2013, Matt Eastwood; (2) & (3) IDC Predictions 2012: Competing for 2020, Document 231720, December 2011, Frank Gens; (4) http://en.wikipedia.org
The Time is Now 
CRM ERP Data Warehouse Web Social Log Files Machine Data Semi-structured 
Data Volumes 
Accuracy and Insight 
Dark Data 
Traditional Big Data 
Enterprise Data 
Unstructured
HP Vertica Analytics Platform 
Vertica Flex Zone Vertica Enterprise 
Column Store Optimizer & Execution Engine 
High Availability & Redundancy 
MPP Shared Nothing 
JSON, CEF, Delimited
HDFS, HCatalog, Flume, Files
Database Designer
Management Console
HP ConvergedSystem 300 for Vertica
API & SDK (supports R, C++, Java)
Time Series Analytics Functions
Distributed R
SQL, ODBC, JDBC
Search Functionality
Geospatial & Sentiment
Key Value API
HP BSM & Security
Community, BI Ecosystem, 3rd party apps, Marketplace
The Richest, Most Open SQL on Hadoop 
Challenge: Extracting data from Hadoop requires complex and 
brittle ETL processes 
Solution: Hadoop Navigation and Analytics 
Benefits: 
• Navigate Hadoop data using its native catalog 
• Quickly and easily load native data types from Hadoop to Vertica 
• Avoid creating and maintaining time-consuming schemas 
• Use the full power of HP Vertica SQL and analytics 
• Choose your own Hadoop distribution 
HP Vertica and MapR Solution 
HP Vertica Analytics Platform on MapR 
Optimized, interactive SQL-on-Hadoop solution for fastest value from big data 
• Complete SQL-on-Hadoop Solution 
• Broader Analytics Capabilities 
• Lower TCO & Manageability 
• Enterprise-Grade Reliability 
Most Complete SQL on Hadoop 
HP Vertica on MapR
• Full interactive ANSI SQL on Hadoop
• More complete SQL maturity
• Clients can leverage existing SQL skills
• Handling complex joins and advanced analytic functions, query optimization, and many concurrent users
• Certified integration with BI/visualization environments
• Dynamic handling of mixed workloads
Limited “Query on Hadoop” Options
• Supports a limited subset of HiveQL(1)
• HiveQL is not as mature as SQL
• Requires new skills
• Immature query optimization for planning efficient joins and for processing
• Onus on customer to integrate with BI/visualization environments
• Lack of workload management for high number of concurrent users
(1) HiveQL is a SQL-like dialect, a subset of ANSI SQL
Accelerating Drug Discovery
Innovative Healthcare Products Company
The problem:
• Analyzing gene variants using SNPs and microarray data
The solution:
• Hadoop to find the variants between a sample sequence and a reference genome
• HP Vertica to determine oncology targets
• Tools: Pipeline Pilot, Spotfire, R
The value:
• Queries went from 5 hours to 5 minutes
• Scale to 100s of TB of data
• More experiments => faster discoveries!
HP Vertica Flex Zone 
Avoid creating and maintaining time-consuming schemas 
Faster SQL querying 
on semi-structured data 
Auto-schematization 
semi-structured data loading 
Flexible parsers 
for JSON and delimited data 
One-step schema 
for blazing-fast performance 
Load, manage, and explore semi-structured data
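The idea of loading semi-structured data without declaring a schema first, then working out the columns afterwards, can be sketched in a few lines. The example below is a generic illustration only: the event records, field names, and helper functions are hypothetical, and it does not use or imitate the HP Vertica Flex Zone API.

```python
import json
from collections import Counter

# Hypothetical JSON-lines input; records do not share an identical set of keys.
RAW_EVENTS = """\
{"user": "ann", "action": "click", "page": "/home"}
{"user": "bob", "action": "search", "query": "hadoop"}
{"user": "ann", "action": "click", "page": "/pricing", "ms": 120}
"""


def load_flexible(text):
    """Parse each JSON line into a dict; no schema is declared up front."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def infer_columns(rows):
    """List every key seen in the data, most frequently used first."""
    counts = Counter(key for row in rows for key in row)
    return [key for key, _ in counts.most_common()]


def select(rows, columns):
    """Project rows onto the requested keys; missing keys become None."""
    return [tuple(row.get(col) for col in columns) for row in rows]


rows = load_flexible(RAW_EVENTS)

# The column list is derived from the data rather than written by hand.
columns = infer_columns(rows)

# Query by key name even though no table definition exists.
clicks = select([r for r in rows if r["action"] == "click"], ["user", "page"])
```

In this sketch the frequently occurring keys returned by `infer_columns` are the natural candidates to promote to real, typed columns once the data has been explored.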
Exploring the Value of Dark Data 
Leading online source for health and medical news and information 
Challenges 
• Takes 900 hours per year to 
ingest semi-structured data 
for analysis 
• As requirements change, 
must again “re-structure” the 
data for exploration 
• Must meet ever-increasing 
requests for analytic insight 
in short timeframes 
HP Vertica Flex Zone Solution 
• Slash development time by 
eliminating schema creation 
• Explore data with existing BI/ 
visualization tools for 
maximum insight 
• Operationalize data in one 
single step for fast analytics 
• Focus team on data analysis 
(not wrestling with data 
formats)
Thank you! 
Jeff Healey 
jeff.a.healey@hp.com 
617-386-4591 
Eamon O’Neill 
eamon@hp.com 
617-386-4604 
Perceptions & Questions 
Analyst: 
Robin Bloor
The Data Reservoir 
Robin Bloor, Ph.D.
Hadoop as the Data Reservoir
Big Data and the Data Reservoir
The Workload Paradigm Shift 
u Previously, we viewed 
database workloads as 
an i/o optimization 
problem 
u With analytics the 
workload is a very 
variable mix of i/o and 
calculation 
u No databases were built 
precisely for this – not 
even Big Data databases
A Process, Not an Activity 
u Data analytics is a multi-disciplinary 
end-to-end 
process 
u Until recently it was a 
walled-garden, but the 
walls were torn down by 
• Data availability 
• Scalable technology 
• Open source tools 
u Hadoop has a role here
The Hadoop Ecosystem 
u Even though it may 
not seem so, Hadoop 
is in its infancy 
u Hadoop’s popularity 
guarantees its future 
u Its future is also 
guaranteed by its 
commercial 
ecosystem
u What do you see as the fundamental division of 
workload between Hadoop, Flex Zone and 
Vertica? 
u Which specific components of the Hadoop 
ecosystem do you recommend using? 
u Do you support JSON? If so, for which contexts 
and in what way?
u Is there any special optimization in Vertica 
between query and analytical workloads. 
u Please describe the discovery and definition of 
metadata from Hadoop, through Flex Zone and 
into Vertica 
u Why do you think Hadoop is important from a 
technical perspective?
Upcoming Topics
This Month: BIG DATA
May: DATABASE
June: ANALYTICS & MACHINE LEARNING
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at www.insideanalysis.com
THANK YOU 
for your 
ATTENTION! 