Big Data Storage and Analytics Q&A
Matthew Aslett, research director
Webinar Logistics
●  Be on the look-out for polling questions
●  You may ask questions at any time during the presentation by using the
Q&A box
●  On-demand viewers: please tweet us questions @cloudianstorage
●  At the end of the presentation please provide feedback and rate us
451 Research is an information
technology research & advisory company
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
12,500+ senior IT professionals in our research community
Over 52 million data points each quarter
4,500+ reports published each year covering 2,000+
innovative technology & service providers
Headquartered in New York City with offices in London,
Boston, San Francisco, and Washington D.C.
451 Research and its sister company Uptime Institute
comprise the two divisions of The 451 Group
Research & Data
Advisory Services
Events
Copyright (C) 2015 451 Research LLC
Our Speakers
Paul Turner leads marketing, product planning and strategy at Cloudian. A storage
industry expert, he joined Cloudian from NetApp where he ran the Product Strategy Office,
guiding their investments into FlashRay, Iongrid and CacheIQ. Paul has more than 23
years of development and management leadership, including 15 years at Oracle.
Matt Aslett, Research Director for the data platforms and analytics research channel, has
overall responsibility for the coverage of operational and analytic databases, data
integration, data quality, and business intelligence. Matt's own primary area of focus is on
relational and non-relational databases - including NoSQL and NewSQL - data warehousing,
data caching, and Hadoop. Matthew is also an expert in open source software and regularly
contributes to 451 Research's open source-related research.
John Kreisa: A veteran of the enterprise marketing industry, John has worked on products
at every level of the IT stack, from the depths of storage through to the insight of business
intelligence and analytics. Currently John leads partner and strategic marketing initiatives at
open source leader Hortonworks, which develops, distributes and supports Apache Hadoop.
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
Big data: cause and effect
CAUSE?
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
Big data: cause and effect
•  Volume
•  Velocity
•  Variety
EFFECT
CAUSE?
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
Big data: cause and effect
•  Volume
•  Velocity
•  Variety
EFFECT EFFECTED CAUSE
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
Big data: cause and effect
•  Volume
•  Velocity
•  Variety
Economics:
•  Commodity hardware
•  Open source software
EFFECT EFFECTED CAUSE
Big data is driven by economics
“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.”
– George Dyson
“Big data: New business insights based on storing, processing and analyzing data that was previously ignored due to the cost and functional limitations of traditional data management technologies.”
– 451 Research
Big data is driven by economics
What happened when the cost of keeping information became less than the cost of throwing it away?
Big data is driven by economics
What happened when the cost of keeping information became less than the cost of throwing it away?
•  The processing and analysis of very large data sets in their entirety
•  Increased adoption of massively parallel processing approaches
•  Storage and analysis of both structured and multi-structured data
•  Integration of external (social) and corporate data for more complete perspective
•  Schema-free and schema-on-read approaches to data storage/analysis
•  Adoption of exploratory analytic approaches to identify new patterns in data
•  Predictive analytics as a fundamental component of BI strategies
•  Machine-learning algorithms automate the reflection of collective intelligence
•  Increased adoption of in-memory databases for rapid data ingestion
•  Real-time analysis of data prior to storage within the data warehouse/Hadoop
•  Interactive, native, SQL-based analysis of data in Hadoop and HBase
•  Large-scale processing of sensor and other machine-generated data/events
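The schema-on-read approach listed above can be illustrated with a minimal Python sketch. Records are kept raw (here as JSON lines) and a schema is applied only when the data is read. The field names (`user`, `temp_c`) and the `read_with_schema` helper are hypothetical and chosen for illustration; they do not come from the presentation.

```python
import json

# Raw events are stored as-is; no schema is enforced at write time.
RAW_EVENTS = [
    '{"user": "a", "temp_c": "21.5", "extra": "ignored"}',
    '{"user": "b"}',
    '{"user": "c", "temp_c": "19"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Fields missing from a record become None; fields that are not
    part of the schema are dropped from the result.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two different views over the same raw data, chosen by the reader.
users_only = list(read_with_schema(RAW_EVENTS, {"user": str}))
readings = list(read_with_schema(RAW_EVENTS, {"user": str, "temp_c": float}))
```

The same stored lines serve both readers; only the schema passed at read time differs, which is the property the bullet describes.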
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
Big data: cause and effect
•  Volume
•  Velocity
•  Variety
Economics:
•  Commodity hardware
•  Open source software
EFFECT EFFECTED CAUSE
IoT
Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Traditional Analytic Systems Under Pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale
Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), with laggards and industry leaders marked. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.
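The two data-volume figures on this slide (2.8 zettabytes in 2012, 40 zettabytes in 2020) imply a yearly growth rate that can be worked out directly. The sketch below assumes smooth, constant compound growth between the two endpoints, which the slide does not state; under that assumption the rate comes to roughly 39% per year.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


# Endpoints taken from the slide; the constant-growth model is an assumption.
START_YEAR, START_ZB = 2012, 2.8
END_YEAR, END_ZB = 2020, 40.0

rate = compound_annual_growth_rate(START_ZB, END_ZB, END_YEAR - START_YEAR)
print(f"Implied growth: {rate:.1%} per year")
```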
Modern Data Architecture Emerges to Unify Analytics & Data Processing
Modern Data Analytics Architecture
•  Enable applications to have access to
all your enterprise data through an
efficient centralized platform
•  Supported with a centralized
approach to analytics, governance,
security and operations
•  Versatile enough to handle any application
and dataset, no matter the size or
type
SOURCES
Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Existing Systems: ERP, CRM, SCM
ANALYTICS
Data Marts, Business Analytics, Visualization & Dashboards
Applications, Business Analytics, Visualization & Dashboards
HDFS (Hadoop Distributed File System)
YARN: Data Operating System
Batch, Interactive, Real-Time, Partner ISV
EDW, MPP
Hadoop Driver: Enabling the Data Lake for Analytics
SCALE
SCOPE
Data Lake Definition
•  Centralized Architecture
Multiple applications on a shared data set
with consistent levels of service
•  Any App, Any Data
Multiple applications accessing all data
affording new insights and opportunities.
•  Unlocks ‘Systems of Insight’
Advanced algorithms and applications
used to derive new value and optimize
existing value.
Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps
Goal:
•  Centralized Architecture
•  Data-driven Business
DATA LAKE
Journey to the Data Lake with Hadoop
Systems of Insight
Your Data at Webscale Economics
HyperStore: Software Defined Storage
REPLICATION (RF=1,2,3,4)
ERASURE CODING (N+1,2,3,4)
COMPRESSION (Zlib, lz4)
Commodity Servers, Scale Out, Durable, Simple to Use
CPU, Disks, Network
Heterogeneous Node: 100TB, 300TB
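The trade-off between the replication and erasure coding options on this slide can be sketched numerically. The example below reads "N+1,2,3,4" as N data fragments plus 1 to 4 parity fragments, which is an interpretation rather than something the slide spells out, and the value of N used here is a hypothetical choice. It shows the general arithmetic of the two techniques, not HyperStore's actual implementation.

```python
def replication_profile(replication_factor):
    """Raw bytes stored per logical byte, and how many copies can be lost."""
    return {
        "overhead": float(replication_factor),
        "tolerated_losses": replication_factor - 1,
    }


def erasure_coding_profile(data_fragments, parity_fragments):
    """Same metrics for a code with N data and k parity fragments."""
    overhead = (data_fragments + parity_fragments) / data_fragments
    return {"overhead": overhead, "tolerated_losses": parity_fragments}


# Hypothetical value: the slide does not say what N is.
ASSUMED_DATA_FRAGMENTS = 4

for rf in (1, 2, 3, 4):
    print(f"RF={rf}:", replication_profile(rf))

for parity in (1, 2, 3, 4):
    label = f"{ASSUMED_DATA_FRAGMENTS}+{parity}"
    print(f"EC {label}:", erasure_coding_profile(ASSUMED_DATA_FRAGMENTS, parity))
```

With these assumptions, RF=3 and a 4+2 code both survive the loss of two copies or fragments, but the erasure-coded layout stores 1.5 raw bytes per logical byte instead of 3.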
  
Smart Data
Consumer Activity
(Events, GPS, WiFi)
Social Media
Device Tracking and Logs
Cloudian HyperStore
INTERNET OF THINGS
BIG DATA
Event processing platform
✓ Analyze more – allows for efficient bulk data analysis in place
✓ Faster time-to-decision
✓ HyperStore scales out with your data – adding nodes for I/O
Analytics
Result of Analysis
Integration of Cloudian and Hortonworks
Interoperability: Cloudian & Hortonworks
YARN: Data Operating System
Batch: MapReduce
Script: Pig
Search: Solr
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Others: In-Memory Analytics, ISV engines
Nodes 1 to N
Linux, Windows, On-Premise, Cloud
HDFS
S3 Native File System (URI scheme: s3n)
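In this diagram HDFS and the S3 Native File System sit side by side beneath YARN, so Hadoop jobs address S3-compatible object storage through URIs with the `s3n` scheme, in the form `s3n://bucket/path`. The small Python sketch below only builds and splits such URIs; the bucket and key names are hypothetical, and it does not contact any storage service.

```python
from urllib.parse import urlsplit


def s3n_uri(bucket, key):
    """Build an s3n URI for an object key in a bucket."""
    return f"s3n://{bucket}/{key.lstrip('/')}"


def split_s3n_uri(uri):
    """Split an s3n URI into (bucket, key); reject other schemes."""
    parts = urlsplit(uri)
    if parts.scheme != "s3n":
        raise ValueError(f"not an s3n URI: {uri}")
    return parts.netloc, parts.path.lstrip("/")


# Hypothetical bucket and key, for illustration only.
uri = s3n_uri("sensor-archive", "logs/2015/part-0000")
bucket, key = split_s3n_uri(uri)
```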
Use Cases
Hadoop for Internet of Things
Clickstream data
Analysis of what people click on: individual web pages and in what order. Clickstream analysis can reveal how users research products and also how they complete their online purchases.
✓ Internet Marketing
✓ Online Commerce
Sentiment data
Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions. Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
✓ Retail
✓ Media & Entertainment
Server log data
Large enterprises build, manage and protect their own proprietary, distributed information networks. Server logs are the computer-generated records that report data on the operations of those networks. When there is a problem, it's one of the first places the IT team looks for a diagnosis.
✓ IT Organizations
✓ Customer Support
Sensor data
From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls. It is net new data that is increasing exponentially in the information age.
✓ Manufacturing
✓ Industrial
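The clickstream use case above (which pages people click on, and in what order) can be sketched in a few lines of Python. The session IDs, page paths and the `page_transitions` helper are hypothetical sample data and names; at the scale discussed in this webinar the same counting would run as a distributed job rather than in a single process.

```python
from collections import Counter

# Hypothetical (session_id, page) click events, already in time order.
CLICKS = [
    ("s1", "/home"), ("s1", "/product"), ("s1", "/checkout"),
    ("s2", "/home"), ("s2", "/product"),
    ("s3", "/home"), ("s3", "/search"), ("s3", "/product"), ("s3", "/checkout"),
]


def page_transitions(clicks):
    """Count page-to-page transitions within each session."""
    last_page = {}           # session_id -> most recent page seen
    transitions = Counter()  # (from_page, to_page) -> count
    for session, page in clicks:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions


transitions = page_transitions(CLICKS)
for (source, target), count in transitions.most_common():
    print(f"{source} -> {target}: {count}")
```

Counting transitions such as `/product` to `/checkout` is one simple way to see how users move from researching a product to completing a purchase.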
Cloudian Smart Support
Thank You!
Matt Aslett
matthew.aslett@451research.com
www.451research.com
@maslett
Paul Turner
pturner@cloudian.com
www.cloudian.com
@CloudianStorage
John Kreisa
john@hortonworks.com
www.hortonworks.com
@Hortonworks

Cloudian 451-hortonworks - webinar
