© 2014 MapR Technologies
Introducing the Total Data Warehouse 
Matthew Aslett 
Research Director, Data Management and Analytics, 451 Research 
© 2014 by The 451 Group. All rights reserved
• Matthew Aslett
  – Research Director, Data Platforms and Analytics
  – matthew.aslett@451research.com
  – www.twitter.com/maslett
• Responsible for data management and analytics research agenda
• Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
• With 451 Research since 2007
Company Overview
• One company with 3 operating divisions
• Syndicated research, advisory, professional services, datacenter certification, and events
• Global focus
• 270+ staff
• 1,500+ client organizations: enterprises, vendors, service providers, and investment firms
• Organic growth and growth through acquisition
Hadoop and the data warehouse
• The rise of Apache Hadoop has been driven largely by demand for more flexible approaches to data management and analytics
  – Overcoming the limitations of traditional analytic databases and their adherence to strictly defined schema
• Hadoop is largely complementary to existing data warehouse deployments
• However, there is clear evidence that at least some workloads are being migrated from existing enterprise data warehouses to Hadoop
  – E.g. Teradata’s CEO noted in October 2013 that, on average, 20% of the total ETL workload on Teradata data warehouses could potentially move to Hadoop (4-8% of the total Teradata data warehouse workload)
• That has driven many people to question the extent to which Hadoop will replace the data warehouse
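The two percentages in the Teradata example together imply a range for how much of the total warehouse workload is ETL. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and the only inputs are the figures quoted above.

```python
def implied_etl_share(movable_share_of_etl, movable_share_of_total):
    """Return the fraction of the total workload that must be ETL.

    If a fraction `movable_share_of_etl` of the ETL workload equals a fraction
    `movable_share_of_total` of the whole workload, then
    etl_share = movable_share_of_total / movable_share_of_etl.
    """
    return movable_share_of_total / movable_share_of_etl


low = implied_etl_share(0.20, 0.04)   # 20% of ETL = 4% of the total workload
high = implied_etl_share(0.20, 0.08)  # 20% of ETL = 8% of the total workload
print(f"ETL is roughly {low:.0%} to {high:.0%} of the total workload")
```

In other words, the quoted figures are consistent with ETL making up about a fifth to two fifths of the work done on those warehouses.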
Hadoop and the data warehouse
Survey question: Describe the relationship between Hadoop and the enterprise data warehouse within your organization
Survey conducted: Sept/Oct 2013. Sample: 98
[Chart: responses by category: Hadoop not yet used; Hadoop for workloads not previously on EDW; Temporarily offloading workloads to Hadoop; Permanently migrating workloads to Hadoop; Hadoop replacing EDW]
• Two-thirds of Hadoop engagement is currently non-threatening or additive to existing data warehouse deployments
Hadoop replacing the data warehouse?
• Frames the question incorrectly
  – based on an assumption that a ‘data warehouse’ is by default based on an analytic relational database
• A data warehouse as an enterprise platform for storing, processing and analyzing data
  – could be based on an analytic database, Hadoop, or a combination of the two
• Hadoop is primarily used to handle unstructured and semi-structured data
  – not a good fit, in terms of economics and data formats, for analytic databases
• The future analytic data-processing landscape will be a hybrid of analytic databases and Hadoop
  – each used where appropriate for the individual analytic use case
Introducing the Total Data Warehouse
• There are various phrases used to describe this hybrid landscape
  – in keeping with our ‘Total Data’ terminology, we call this the Total Data Warehouse
• The primary platforms in a Total Data Warehouse are expected to be analytic databases and Hadoop
• However, we also expect to see the Total Data Warehouse comprise other data storage and processing platforms
  – Exploratory analytics/discovery platforms
  – Search
  – Graph processing
  – Stream processing
  – Machine learning
  – Log processing
  – NoSQL databases
  – NewSQL databases
The Total Data Warehouse
[Diagram. Workload labels: pre-defined reporting; ad hoc analytics; statistical analytics; predictive analytics; machine learning; MapReduce; search-based analytics; graph analytics; stream processing; log processing; operational intelligence. Platform labels: analytic database (structured data); (New)SQL database (structured data applications); NoSQL (multi-structured data applications); exploratory analytics platform (multi-structured data); Hadoop Distributed File System and YARN (multi-structured data)]
Data gravity and the Total Data Warehouse
• ‘Data gravity’ suggests that processing resources will migrate to the platform that stores the most data, or perhaps the most important data
• The balance of power is currently with the analytic database
• However, Hadoop’s flexibility to support data-processing engines beyond MapReduce could tip the balance in its favor in the long term
• Apache YARN enables multiple versions of MapReduce to coexist, and enables HDFS to support data-processing frameworks in addition to MapReduce
  – Native SQL analytics
  – Stream processing
  – Graph processing
  – Bulk synchronous parallel computing
  – Machine learning
• Apache Spark provides an in-memory platform supporting high-performance processing and multiple data processing engines
Example Total Data Warehouses
• Teradata’s Unified Data Architecture and QueryGrid: enables querying of data in Teradata Database, Aster Database and Hortonworks
• Pivotal’s Big Data Suite: HD Hadoop distribution, Greenplum Database, GemFire distributed data grid and HAWQ SQL-on-Hadoop query engine
• Cirro offers a federated approach to performing joins and query processing across multiple sources of data, including relational databases and Hadoop
• Microsoft PolyBase enables SQL Server 2012 PDW analysts to query data in Hadoop using Microsoft’s T-SQL
  – PolyBase is only available as part of the Microsoft Analytics Platform System (APS)
  – APS is an appliance that combines SQL Server 2012 PDW with Microsoft’s HDInsight distribution of Apache Hadoop
  – APS is also the only way that customers can adopt the SQL Server 2012 PDW data warehousing environment
  – For Microsoft at least, Hadoop is an integral part of the next-generation data warehouse
The role of SQL-on-Hadoop
• SQL-on-Hadoop engines clearly have a role to play in enabling the Total Data Warehouse
  – SQL-based querying of data in HDFS
  – Federation of queries across multiple data platforms
• SQL-on-Hadoop initiatives have exploded in recent years as a means of uniting the large army of trained SQL analysts with the flexible data storage and processing capabilities of Hadoop
• But SQL-on-Hadoop engines are not created equal
  – Batch SQL-on-Hadoop
  – Interactive SQL-on-Hadoop
  – SQL-and-Hadoop
  – Operational SQL-on-Hadoop
  – And the various offerings within those categories are differentiated
SQL on/and Hadoop
Approach | Details | Examples
Batch SQL-on-Hadoop | Native SQL-like processing of data in HDFS (via MR/Tez) | Hive on MapReduce
Interactive SQL-on-Hadoop | Specialist SQL-based query engine running on Hadoop | Apache Drill, Cloudera Impala, Hive on Tez, Spark SQL
SQL-and-Hadoop | Federated querying of data in Hadoop and RDBMS | Teradata, Microsoft, Oracle, IBM
Operational SQL-on-Hadoop | Operational database that stores data in HDFS | Splice Machine, Trafodion
SQL on Hadoop examples
Approach | Key features
Hive on Tez | Faster native querying than Hive on MapReduce, HiveQL compatibility, extreme-scale data joins
Apache Drill | ANSI SQL; Hadoop, MongoDB, Cassandra, Riak, etc.; consume JSON data, query hierarchical data
Cloudera Impala | High performance ad hoc processing, HiveQL compatibility, Parquet file format
Spark SQL | In-memory SQL processing, Catalyst query optimizer, replacing Shark (Hive on Spark)
Conclusion
• Hadoop is largely complementary to existing data warehouse deployments
• The future analytic data-processing landscape will be a hybrid of analytic databases and Hadoop
  – we call this the Total Data Warehouse
• ‘Data gravity’ suggests that processing resources will migrate to the platform that stores the most data, or perhaps the most important data
  – The balance of power is currently with the analytic database
  – Hadoop’s flexibility could tip the balance in its favor in the long term
• SQL-on-Hadoop engines clearly have a role to play in enabling the Total Data Warehouse
  – But SQL-on-Hadoop engines are not created equal
Questions? Comments? 
matthew.aslett@451research.com 
@maslett 
Self Service Data Exploration with Apache Drill 
© 2014 MapR Technologies
The MapR Distribution including Apache Hadoop
Big Data: Riding the Wave with Hadoop
The Big Data Platform of Choice
• Exponential Growth
• 500+ Customers
• Premier Investors
• >2x annual bookings
• 90% software licenses
• 80% of accounts expand 3X
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer
The Power of the Open Source Community
[Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform (MapR-FS, MapR-DB)]
Section labels: Execution Engines; Data Governance and Operations; Management
Category labels: Batch; SQL; Streaming; NoSQL & Search; ML, Graph; Provisioning & coordination; Workflow & Data Governance; Data Integration & Access; Security
Projects shown: Spark, Cascading, Pig, MapReduce v1 & v2, YARN, Tez*, Hive, Drill, Shark, Impala, Spark Streaming, Storm*, HBase, Solr, Accumulo*, GraphX, MLLib, Mahout, Juju, Savannah*, Whirr, ZooKeeper, Oozie, Falcon*, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*
* Certification/support planned for 2014
[Chart: Total Data Stored, structured vs. unstructured data, 1980 to 2020]
Unstructured data will account for more than 80% of the data collected by organizations
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
Today’s Data Comes in Different Shapes… 
Social Media 
Messages 
Audio 
Sensors 
Mobile Data 
Email 
Clickstream
Distance to Data
[Diagram labels: Business (analysts, developers) and Data, connected by three paths: “Plumbing” (MapReduce development); Modeling and transformations (Hive and other SQL-on-Hadoop); Data Agility]
Existing approaches require a middleman (IT)
Why Improve Distance to Data?
Improve time to value; reduce the burden on IT
• Enable rapid data exploration and application development
• IT should provide a valuable service without “getting in the way”
• Can’t add DBAs to keep up with the exponential data growth
• Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
• 40+ contributors
• 150+ years of experience building databases and distributed systems
Evolution Towards Self-Service Data Exploration
Approach | Data Modeling and Transformation | Data Visualization
Traditional BI w/ RDBMS | IT-driven | IT-driven
Self-Service BI w/ RDBMS | IT-driven | Self-service
SQL-on-Hadoop | IT-driven | Self-service
Self-Service Data Exploration | Not needed | Self-service
Zero-day analytics
MapR Optimized Data Architecture
[Diagram labels. Sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams; sensors. MapR Distribution for Hadoop: Streaming (Spark Streaming, Storm); Machine Learning; Batch / Search (MR, Spark, Hive, Pig, …); NoSQL ODBMS (HBase, Accumulo, …); MapR Data Platform (MapR-DB, MapR-FS). Also shown: Data Warehouse; Data Movement; Data Access; Data Transformation, Enrichment and Integration. Analytics: search; schema-less data exploration; BI, reporting; ad-hoc integrated analytics. Operational Apps: recommendations; fraud detection; logistics]
(1) Self-Describing Data is Ubiquitous 
Flat files in DFS 
• Complex data (Thrift, Avro, protobuf) 
• Columnar data (Parquet, ORC) 
• Loosely defined (JSON) 
• Traditional files (CSV, TSV) 
Data stored in NoSQL stores 
• Relational-like (rows, columns) 
• Sparse data (NoSQL maps) 
• Embedded blobs (JSON) 
• Document stores (nested objects) 
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["ski", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["sing"],
  "preschool": "CCLC"
}
(2) Drill’s Data Model is Flexible
[Diagram: data formats (HBase, JSON, BSON, CSV, TSV, Parquet, Avro) placed on two flexibility axes: fixed schema to schema-less, and flat to complex]
RDBMS/SQL-on-Hadoop table:
Name | Gender | Age
Michael | M | 6
Jennifer | F | 3
Apache Drill table:
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["ski", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["sing"],
  "preschool": "CCLC"
}
(3) Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (schema on write, schema before read)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (schema on the fly)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
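To make "schema discovered on the fly" concrete, here is a small Python sketch that derives a schema from self-describing JSON records at read time instead of looking it up in a central repository. It is an illustration only and does not reflect how Drill is implemented; the function names and the dotted-path notation are assumptions, and the records are the two from the earlier slide.

```python
import json

# One JSON document per line; the two records do not share all fields.
RECORDS = """
{"name": {"first": "Michael", "last": "Smith"}, "hobbies": ["ski", "soccer"], "district": "Los Altos"}
{"name": {"first": "Jennifer", "last": "Gates"}, "hobbies": ["sing"], "preschool": "CCLC"}
"""


def discover_schema(value, prefix=""):
    """Return {dotted_path: type_name} for one parsed JSON value."""
    if isinstance(value, dict):
        schema = {}
        for key, child in value.items():
            schema.update(discover_schema(child, f"{prefix}{key}."))
        return schema
    # Leaf (scalar or list): record the path without the trailing dot.
    return {prefix.rstrip("."): type(value).__name__}


def merge_schemas(lines):
    """Union the per-record schemas, reading one JSON document per line."""
    merged = {}
    for line in lines:
        if line.strip():
            for path, type_name in discover_schema(json.loads(line)).items():
                merged.setdefault(path, set()).add(type_name)
    return merged


schema = merge_schemas(RECORDS.splitlines())
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

The merged result contains both `district` and `preschool` even though each appears in only one record, which is the situation a fixed, pre-declared schema handles poorly.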
Quick Tour 
Self-Service Data Exploration with Apache Drill 
Zero to Results in 2 Minutes (3 Commands)
# Install
$ tar xzf apache-drill.tar.gz
# Launch shell (embedded mode)
$ apache-drill/bin/sqlline -u jdbc:drill:zk=local
# Query
0: jdbc:drill:zk=local>
SELECT count(*) AS incidents, columns[1] AS category
FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
GROUP BY columns[1]
ORDER BY incidents DESC;
# Results
+------------+-----------------+
| incidents  | category        |
+------------+-----------------+
| 8372       | LARCENY/THEFT   |
| 4247       | OTHER OFFENSES  |
| 3765       | NON-CRIMINAL    |
| 2502       | ASSAULT         |
...
35 rows selected (0.847 seconds)
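For readers who want to see what the query computes without installing anything, the following Python sketch performs the same group-and-count over a header-less CSV, addressing the column by position as `columns[1]` does. The sample rows are made up for illustration and are not the SFPD data; the function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the second column (index 1) matters here.
SAMPLE_CSV = """\
1001,LARCENY/THEFT,Monday
1002,ASSAULT,Monday
1003,LARCENY/THEFT,Tuesday
1004,NON-CRIMINAL,Tuesday
1005,LARCENY/THEFT,Wednesday
"""


def count_by_column(csv_text, index):
    """Count rows per value of the column at `index`, most frequent first."""
    rows = csv.reader(io.StringIO(csv_text))
    counts = Counter(row[index] for row in rows if row)
    return counts.most_common()


for category, incidents in count_by_column(SAMPLE_CSV, 1):
    print(incidents, category)
```

No column names or types are declared anywhere: as in the slide, the file is read as-is and the column is picked by its position.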
Data Source is in the Query
SELECT `timestamp`, message
FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
WHERE errorLevel > 2
A storage engine instance (dfs1)
- DFS
- HBase
- Hive Metastore/HCatalog
A workspace (logs)
- Sub-directory
- Hive database
A table (`AppServerLogs/2014/Jan/p001.parquet`)
- pathnames
- HBase table
- Hive table
Query Directory Trees
-- Query file: How many errors per level in Jan 2014?
SELECT errorLevel, count(*)
FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
GROUP BY errorLevel;
-- Query directory sub-tree: How many errors per level?
SELECT errorLevel, count(*)
FROM dfs.logs.`/AppServerLogs`
GROUP BY errorLevel;
-- Query some partitions: How many errors per level by month from 2012?
SELECT errorLevel, dirs[2], count(*)
FROM dfs.logs.`/AppServerLogs`
WHERE dirs[1] >= 2012
GROUP BY errorLevel, dirs[2];
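The third query filters on directory names rather than on file contents. A rough Python sketch of that idea follows, assuming a year/month directory layout under the root; the file listing, the row values and the function name are all hypothetical, and the sketch indexes directory levels from 0 rather than using the slide's `dirs[...]` numbering.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical files in a year/month layout, mapped to their rows' errorLevel values.
FILES = {
    "/AppServerLogs/2011/Dec/part0001.parquet": [3, 5],
    "/AppServerLogs/2012/Jan/part0001.parquet": [1, 3, 3],
    "/AppServerLogs/2014/Jan/part0001.parquet": [3, 7],
}


def errors_by_level_and_month(files, root, min_year):
    """Count rows per (errorLevel, month), skipping directories before min_year."""
    counts = Counter()
    for path, error_levels in files.items():
        # Directory components below the root, e.g. ("2014", "Jan").
        dirs = PurePosixPath(path).relative_to(root).parts[:-1]
        year, month = int(dirs[0]), dirs[1]
        if year < min_year:
            continue  # the whole file is pruned without looking at its rows
        for level in error_levels:
            counts[(level, month)] += 1
    return counts


result = errors_by_level_and_month(FILES, "/AppServerLogs", 2012)
for (level, month), n in sorted(result.items()):
    print(level, month, n)
```

Like the query on the slide, this groups by month name only, so January 2012 and January 2014 fall into the same group.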
Works with HBase and Embedded Blobs
-- Query an HBase table directly (no schemas)
SELECT cf1.month, cf1.year
FROM hbase.table1;
-- Embedded JSON value inside column profileBlob inside column family cf1 of
-- the HBase table users
SELECT profile.name, count(profile.children)
FROM (
  SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
  FROM hbase.users
);
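The second query treats a byte-string column as JSON at query time. The Python sketch below mimics that step on an in-memory stand-in for the table. The row layout, sample values and function names are hypothetical, and reading `count(profile.children)` as "number of children per profile" is an assumption about the slide's intent.

```python
import json

# Hypothetical rows: row key -> {column family: {qualifier: bytes}}.
USERS = {
    b"u1": {"cf1": {"profileBlob": b'{"name": "Ann", "children": ["Bo", "Cy"]}'}},
    b"u2": {"cf1": {"profileBlob": b'{"name": "Dev", "children": []}'}},
}


def convert_from_json(blob):
    """Decode a UTF-8 JSON byte string into Python objects."""
    return json.loads(blob.decode("utf-8"))


def name_and_child_count(table):
    """Yield (name, number_of_children) for each row's embedded profile."""
    for row in table.values():
        profile = convert_from_json(row["cf1"]["profileBlob"])
        yield profile["name"], len(profile.get("children", []))


print(list(name_and_child_count(USERS)))
```

The store itself only ever sees opaque bytes; the structure is imposed when the value is read, which is why no schema has to be registered beforehand.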
Combine Data Sources on the Fly
-- Join log directory with JSON file (user profiles) to identify the name and email address for
-- anyone associated with an error message.
SELECT DISTINCT users.name, users.emails.work
FROM dfs.logs.`/data/logs` logs,
     dfs.users.`/profiles.json` users
WHERE logs.uid = users.id AND
      logs.errorLevel > 5;
-- Join a Hive table and an HBase table (without Hive metadata) to determine the number of
-- tweets per user
SELECT users.name, count(*) AS tweetCount
FROM hive.social.tweets tweets,
     hbase.users users
WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
GROUP BY tweets.userId, users.name;
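The first query above is an equi-join between two differently shaped sources followed by a filter and a DISTINCT. As a hedged illustration of that logic (not of Drill's execution engine), here is a small hash join in Python; the sample records and the function name are hypothetical.

```python
# Hypothetical records standing in for the log directory and the profiles file.
LOGS = [
    {"uid": 1, "errorLevel": 7},
    {"uid": 2, "errorLevel": 3},
    {"uid": 1, "errorLevel": 9},
    {"uid": 3, "errorLevel": 6},
]
USERS = [
    {"id": 1, "name": "Ann", "emails": {"work": "ann@example.com"}},
    {"id": 2, "name": "Bo", "emails": {"work": "bo@example.com"}},
    {"id": 3, "name": "Cy", "emails": {"work": "cy@example.com"}},
]


def users_with_errors(logs, users, min_level):
    """Hash join logs to users on uid = id, keeping distinct (name, work email)."""
    users_by_id = {user["id"]: user for user in users}  # build side
    matches = set()
    for log in logs:  # probe side
        user = users_by_id.get(log["uid"])
        if user is not None and log["errorLevel"] > min_level:
            matches.add((user["name"], user["emails"]["work"]))
    return sorted(matches)


print(users_with_errors(LOGS, USERS, 5))
```

The nested `emails.work` field is reached with ordinary key access, mirroring the dotted path in the SQL, and the set plays the role of DISTINCT.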
Summary
• Enable rapid data exploration and application development while reducing the burden on IT
• Apache Drill 0.5 available now 
• Get involved 
– Download and play: http://incubator.apache.org/drill/ 
– Ask questions: drill-user@incubator.apache.org 
– Contribute: http://github.com/apache/incubator-drill/ 
– Join the Drill team at MapR 
• Email jacques@mapr.com 
• www.mapr.com/careers

More Related Content

What's hot

Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoopOmar Jaber
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2DataWorks Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionDataWorks Summit/Hadoop Summit
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Data Con LA
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop WorldCloudera, Inc.
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsKognitio
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 

What's hot (20)

Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica Webinar
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2Luo june27 1150am_room230_a_v2
Luo june27 1150am_room230_a_v2
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop World
 
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analyticsWeb Briefing: Unlock the power of Hadoop to enable interactive analytics
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 

Similar to Future of-hadoop-analytics

Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Get started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languagesGet started with hadoop hive hive ql languages
Get started with hadoop hive hive ql languagesJanBask Training
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsMethod360
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersMrigendra Sharma
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 

Similar to Future of-hadoop-analytics (20)

Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by Example
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 

Future of Hadoop Analytics

  • 1. © 2014 MapR Technologies
  • 2. Introducing the Total Data Warehouse
      Matthew Aslett, Research Director, Data Management and Analytics, 451 Research
      © 2014 by The 451 Group. All rights reserved
  • 3. Matthew Aslett
      • Research Director, Data Platforms and Analytics
      • matthew.aslett@451research.com
      • www.twitter.com/maslett
      • Responsible for data management and analytics research agenda
      • Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
      • With 451 Research since 2007
  • 4. Company Overview
      • One company with 3 operating divisions
      • Syndicated research, advisory, professional services, datacenter certification, and events
      • Global focus
      • 270+ staff
      • 1,500+ client organizations: enterprises, vendors, service providers, and investment firms
      • Organic growth and growth through acquisition
  • 5. Hadoop and the data warehouse
      • The rise of Apache Hadoop has been driven largely by demand for more flexible approaches to data management and analytics
          • Overcoming the limitations of traditional analytic databases and their adherence to strictly defined schema.
      • Hadoop is largely complementary to existing data warehouse deployments
      • However, there is clear evidence that at least some workloads are being migrated from existing enterprise data warehouses to Hadoop
          • E.g. Teradata’s CEO noted in October 2013 that, on average, 20% of the total ETL workload on Teradata data warehouses could potentially move to Hadoop (4-8% of the total Teradata data warehouse workload)
      • That has driven many people to question the extent to which Hadoop will replace the data warehouse
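The two Teradata percentages on the slide above are linked by simple arithmetic. The short Python sketch below works out what they imply together; the variable names are illustrative, and the only inputs are the figures quoted on the slide.

```python
# Figures quoted on the slide
etl_share_movable = 0.20        # share of the ETL workload that could move to Hadoop
total_share_low = 0.04          # the same workload as a share of the total (low end)
total_share_high = 0.08         # the same workload as a share of the total (high end)

# If 20% of ETL equals 4-8% of the total workload, ETL itself must be
# total_share / etl_share_movable of the total workload.
etl_low = total_share_low / etl_share_movable
etl_high = total_share_high / etl_share_movable

print(f"ETL is roughly {etl_low:.0%} to {etl_high:.0%} of the total workload")
```

In other words, the quoted range is consistent with ETL making up about a fifth to two fifths of the work done on those warehouses.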
  • 6. Hadoop and the data warehouse
      • Survey question: Describe the relationship between Hadoop and the enterprise data warehouse within your organization
      • Survey conducted: Sept/Oct 2013. Sample: 98
      • [Chart categories: Hadoop not yet used; Hadoop for workloads not previously on EDW; Temporarily offloading workloads to Hadoop; Permanently migrating workloads to Hadoop; Hadoop replacing EDW]
      • Two-thirds of Hadoop engagement is currently non-threatening or additive to existing data warehouse deployments
  • 7. Hadoop replacing the data warehouse?
      • Frames the question incorrectly
          • based on an assumption that a ‘data warehouse’ is by default based on an analytic relational database
      • A data warehouse as an enterprise platform for storing, processing and analyzing data
          • could be based on an analytic database, Hadoop, or a combination of the two
      • Hadoop is primarily used to handle unstructured and semi-structured data
          • not a good fit (in terms of economics and data formats) for analytic databases
      • The future analytic data-processing landscape will be a hybrid of analytic databases and Hadoop
          • each used where appropriate for the individual analytic use case.
  • 8. Introducing the Total Data Warehouse
      • There are various phrases used to describe this hybrid landscape
          • in keeping with our ‘Total Data’ terminology, we call this the Total Data Warehouse
      • The primary platforms in a Total Data Warehouse are expected to be analytic databases and Hadoop
      • However we also expect to see the Total Data Warehouse comprise other data storage and processing platforms
          • Exploratory analytics/discovery platforms
          • Search
          • Graph processing
          • Stream processing
          • Machine learning
          • Log processing
          • NoSQL databases
          • NewSQL databases
  • 9. The Total Data Warehouse
      • There are various phrases used to describe this hybrid landscape
          • in keeping with our ‘Total Data’ terminology, we call this the Total Data Warehouse
      • [Diagram, workload labels: pre-defined reporting, ad hoc analytics, statistical analytics, predictive analytics, machine learning, MapReduce, search-based analytics, graph analytics, stream processing, log processing, operational intelligence, multi-structured data applications, structured data applications]
      • [Diagram, platform labels: analytic database (structured data), (New)SQL database, NoSQL, exploratory analytics platform (multi-structured data), Hadoop Distributed File System (multi-structured data), YARN]
  • 10. Data gravity and the Total Data Warehouse
      • ‘Data gravity’ suggests that processing resources will migrate to the platform that stores the most data, or perhaps the most important data
          • The balance of power is currently with the analytic database
          • However, Hadoop’s flexibility to support data-processing engines beyond MapReduce could tip the balance in its favor in the long term
      • Apache YARN allows multiple versions of MapReduce to run, and allows HDFS to support data-processing frameworks in addition to MapReduce
          • Native SQL analytics
          • Stream processing
          • Graph processing
          • Bulk synchronous parallel computing
          • Machine learning
      • Apache Spark provides an in-memory platform supporting high-performance processing and multiple data processing engines
  • 11. Example Total Data Warehouses
      • Teradata’s Unified Data Architecture and QueryGrid: enables querying of data in Teradata Database, Aster Database and Hortonworks
      • Pivotal’s Big Data Suite: HD Hadoop distribution, Greenplum Database, GemFire distributed data grid and HAWQ SQL-on-Hadoop query engine
      • Cirro offers a federated approach to performing joins and query processing across multiple sources of data including relational database and Hadoop
      • Microsoft PolyBase enables SQL Server 2012 PDW analysts to query data in Hadoop using Microsoft’s T-SQL
          • PolyBase is only available as part of the Microsoft Analytics Platform System (APS)
          • APS is an appliance that combines SQL Server 2012 PDW with Microsoft’s HDInsight distribution of Apache Hadoop
          • APS is also the only way that customers can adopt the SQL Server 2012 PDW data warehousing environment
          • For Microsoft at least, Hadoop is an integral part of the next-generation data warehouse
  • 12. The role of SQL-on-Hadoop
      • SQL-on-Hadoop engines clearly have a role to play in enabling the Total Data Warehouse
          • SQL-based querying of data in HDFS
          • Federation of queries across multiple data platforms
      • SQL-on-Hadoop initiatives exploded in recent years as a means of uniting the large army of trained SQL analysts with the flexible data storage and processing capabilities of Hadoop
      • But SQL-on-Hadoop engines are not created equal
          • Batch SQL-on-Hadoop
          • Interactive SQL-on-Hadoop
          • SQL-and-Hadoop
          • Operational SQL-on-Hadoop
      • And the various offerings within those categories are differentiated
  • 13. SQL on/and Hadoop
      Approach | Details | Examples
      Batch SQL-on-Hadoop | Native SQL-like processing of data in HDFS (via MR/Tez) | Hive on MapReduce
      Interactive SQL-on-Hadoop | Specialist SQL-based query engine running on Hadoop | Apache Drill, Cloudera Impala, Hive on Tez, Spark SQL
      SQL-and-Hadoop | Federated querying of data in Hadoop and RDBMS | Teradata, Microsoft, Oracle, IBM
      Operational SQL-on-Hadoop | Operational database that stores data in HDFS | Splice Machine, Trafodion
  • 14. SQL on Hadoop examples
      Approach | Key features
      Hive on Tez | Faster native querying than Hive on MapReduce, HiveQL compatibility, extreme-scale data joins
      Apache Drill | ANSI SQL; Hadoop, MongoDB, Cassandra, Riak, etc.; consume JSON data, query hierarchical data
      Cloudera Impala | High performance ad hoc processing, HiveQL compatibility, Parquet file format
      Spark SQL | In-memory SQL processing, Catalyst query optimizer, replacing Shark (Hive on Spark)
  • 15. Conclusion
      • Hadoop is largely complementary to existing data warehouse deployments
      • The future analytic data-processing landscape will be a hybrid of analytic databases and Hadoop
          • we call this the Total Data Warehouse
      • ‘Data gravity’ suggests that processing resources will migrate to the platform that stores the most data, or perhaps the most important data
          • The balance of power is currently with the analytic database
          • Hadoop’s flexibility could tip the balance in its favor in the long term
      • SQL-on-Hadoop engines clearly have a role to play in enabling the Total Data Warehouse
          • But SQL-on-Hadoop engines are not created equal
  • 16. Questions? Comments? matthew.aslett@451research.com @maslett
  • 17. Self Service Data Exploration with Apache Drill
  • 18. The MapR Distribution including Apache Hadoop
      • Big Data: Riding the Wave with Hadoop
      • The Big Data Platform of Choice
      • Exponential Growth
          • 500+ Customers
          • Premier Investors
          • >2x annual bookings
          • 90% software licenses
          • 80% of accounts expand 3X
          • < 1% lifetime churn
          • > $1B in incremental revenue generated by 1 customer
  • 19. The Power of the Open Source Community
      • [Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform (MapR-FS, MapR-DB), with Management and Security]
      • Layers shown: execution engines; data governance and operations
      • Category labels: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Workflow & Data Governance; Provisioning & coordination
      • Components shown: Spark, Cascading, Pig, MapReduce v1 & v2, YARN, Tez*, GraphX, MLLib, Mahout, Drill, Shark, Impala, Hive, HBase, Solr, Accumulo*, Spark Streaming, Storm*, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Falcon*, Oozie, Whirr, Savannah*, Juju, ZooKeeper
      • * Certification/support planned for 2014
  • 20. Unstructured data will account for more than 80% of the data collected by organizations
      • [Chart: total data stored, structured data vs. unstructured data, 1980 to 2020]
      • Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
  • 21. Today’s Data Comes in Different Shapes… Social Media Messages Audio Sensors Mobile Data Email Clickstream
  • 22. Distance to Data
      • Existing approaches require a middleman (IT)
      • [Diagram labels: Business (analysts, developers); “Plumbing”; MapReduce development; Modeling and transformations; Hive and other SQL-on-Hadoop; Data]
  • 23. Distance to Data
      • Existing approaches require a middleman (IT)
      • [Diagram labels: Business (analysts, developers); “Plumbing”; MapReduce development; Modeling and transformations; Hive and other SQL-on-Hadoop; Data Agility; Data]
  • 24. Why Improve Distance to Data?
      • Headings: Improve time to value; Reduce the burden on IT
      • Enable rapid data exploration and application development
      • IT should provide a valuable service without “getting in the way”
      • Can’t add DBAs to keep up with the exponential data growth
      • Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users
  • 25. APACHE DRILL
      • Pioneering Data Agility for Hadoop
      • Apache open source project
      • Scale-out execution engine for low-latency queries
      • Unified SQL-based API for analytics & operational applications
      • 40+ contributors
      • 150+ years of experience building databases and distributed systems
  • 26. Evolution Towards Self-Service Data Exploration
      Approach | Data Modeling and Transformation | Data Visualization
      Traditional BI w/ RDBMS | IT-driven | IT-driven
      Self-Service BI w/ RDBMS | IT-driven | Self-service
      SQL-on-Hadoop | IT-driven | Self-service
      Self-Service Data Exploration | Not needed | Self-service
      • Zero-day analytics
  • 27. MapR Optimized Data Architecture
      • Sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams; sensors
      • Processing: Streaming (Spark Streaming, Storm); Batch / Search (MR, Spark, Hive, Pig, …); NoSQL ODBMS (HBase, Accumulo, …); Machine Learning
      • MapR Data Platform (MapR Distribution for Hadoop): MapR-DB, MapR-FS
      • Data warehouse
      • Data Movement; Data Access
      • Analytics: Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics; Data Transformation, Enrichment and Integration
      • Operational Apps: Recommendations; Fraud Detection; Logistics
  • 28. (1) Self-Describing Data is Ubiquitous
      • Flat files in DFS
          • Complex data (Thrift, Avro, protobuf)
          • Columnar data (Parquet, ORC)
          • Loosely defined (JSON)
          • Traditional files (CSV, TSV)
      • Data stored in NoSQL stores
          • Relational-like (rows, columns)
          • Sparse data (NoSQL maps)
          • Embedded blobs (JSON)
          • Document stores (nested objects)
      • Example records:
          { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
          { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
  • 29. (2) Drill’s Data Model is Flexible
    [Diagram: HBase, JSON, BSON, CSV, TSV, Parquet and Avro placed along two flexibility axes, fixed schema to schema-less and flat to complex]
    RDBMS/SQL-on-Hadoop table:
      Name | Gender | Age
      Michael | M | 6
      Jennifer | F | 3
    Apache Drill table:
      { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
      { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
  • 30. (3) Drill Supports Schema Discovery On-The-Fly
    Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
    • Fixed schema
    • Leverage schema in centralized repository (Hive Metastore)
    Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
    • Fixed schema, evolving schema or schema-less
    • Leverage schema in centralized repository or self-describing data
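To make "schema on the fly" concrete, here is a minimal Python sketch that derives a schema from self-describing records as they are read, with nothing declared in advance. It illustrates the idea only and is not how Drill implements it. The function names `field_types` and `discover_schema` are hypothetical, and the two sample records from the earlier slides are rewritten as valid JSON (the quoting is added here).

```python
import json

# The two sample records from the slides, written as valid JSON (quoting added).
RECORDS = [
    '{"name": {"first": "Michael", "last": "Smith"}, '
    '"hobbies": ["ski", "soccer"], "district": "Los Altos"}',
    '{"name": {"first": "Jennifer", "last": "Gates"}, '
    '"hobbies": ["sing"], "preschool": "CCLC"}',
]


def field_types(value, prefix=""):
    """Yield (dotted_path, type_name) pairs for every leaf value in a record."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_types(child, path)
    elif isinstance(value, list):
        # Describe a list by the types of its elements, marked with "[]".
        for child in value:
            yield from field_types(child, prefix + "[]")
    else:
        yield prefix, type(value).__name__


def discover_schema(lines):
    """Merge the fields seen in each record into one evolving schema."""
    schema = {}
    for line in lines:
        for path, type_name in field_types(json.loads(line)):
            schema.setdefault(path, set()).add(type_name)
    return schema


if __name__ == "__main__":
    for path, types in sorted(discover_schema(RECORDS).items()):
        print(path, sorted(types))
```

The second record adds a `preschool` field and omits `district`. The discovered schema simply grows to cover both, which is the "evolving schema or schema-less" case on the slide.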
  • 31. Quick Tour Self-Service Data Exploration with Apache Drill
  • 32. Zero to Results in 2 Minutes (3 Commands)
    Install:
      $ tar xzf apache-drill.tar.gz
    Launch shell (embedded mode):
      $ apache-drill/bin/sqlline -u jdbc:drill:zk=local
    Query:
      0: jdbc:drill:zk=local> SELECT count(*) AS incidents, columns[1] AS category
      FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
      GROUP BY columns[1]
      ORDER BY incidents DESC;
    Results:
      +------------+------------+
      | incidents | category |
      +------------+------------+
      | 8372 | LARCENY/THEFT |
      | 4247 | OTHER OFFENSES |
      | 3765 | NON-CRIMINAL |
      | 2502 | ASSAULT |
      ...
      35 rows selected (0.847 seconds)
  • 33. Data Source is in the Query
    SELECT timestamp, message
    FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
    WHERE errorLevel > 2
    The FROM clause names three things:
    • A storage engine instance (dfs1): DFS, HBase, Hive Metastore/HCatalog
    • A workspace (logs): sub-directory, Hive database
    • A table (the quoted path): pathnames, HBase table, Hive table
  • 34. Query Directory Trees
    -- Query file: How many errors per level in Jan 2014?
    SELECT errorLevel, count(*)
    FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
    GROUP BY errorLevel;
    -- Query directory sub-tree: How many errors per level?
    SELECT errorLevel, count(*)
    FROM dfs.logs.`/AppServerLogs`
    GROUP BY errorLevel;
    -- Query some partitions: How many errors per level by month from 2012?
    SELECT errorLevel, count(*)
    FROM dfs.logs.`/AppServerLogs`
    WHERE dirs[1] >= 2012
    GROUP BY errorLevel, dirs[2];
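The partition query treats directory names as data. A rough Python sketch of that idea follows, using hypothetical file paths and error levels. The indexing is chosen only to match the slide's query (`dirs[1]` is the year, `dirs[2]` is the month); how Drill itself exposes directory levels may differ, so treat the helper `dirs_of` as an assumption rather than Drill's behavior.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical rows: (file the log entry came from, errorLevel of that entry).
ROWS = [
    ("/AppServerLogs/2011/Dec/part0001.parquet", 3),
    ("/AppServerLogs/2012/Jan/part0001.parquet", 3),
    ("/AppServerLogs/2012/Jan/part0002.parquet", 5),
    ("/AppServerLogs/2014/Jan/part0001.parquet", 3),
]


def dirs_of(path):
    """Directory components of a file path, e.g. ['AppServerLogs', '2014', 'Jan']."""
    # parts[0] is the root "/", so it is dropped.
    return list(PurePosixPath(path).parent.parts[1:])


def errors_by_level_and_month(rows, min_year):
    """Mimic: WHERE dirs[1] >= min_year GROUP BY errorLevel, dirs[2]."""
    counts = Counter()
    for path, error_level in rows:
        dirs = dirs_of(path)
        if int(dirs[1]) >= min_year:
            counts[(error_level, dirs[2])] += 1
    return counts


if __name__ == "__main__":
    for (level, month), n in sorted(errors_by_level_and_month(ROWS, 2012).items()):
        print(level, month, n)
```

Because the filter looks only at the path, files under directories that fail the year test never need to be opened, which is what makes querying "some partitions" cheap.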
  • 35. Works with HBase and Embedded Blobs
    -- Query an HBase table directly (no schemas)
    SELECT cf1.month, cf1.year
    FROM hbase.table1;
    -- Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users
    SELECT profile.name, count(profile.children)
    FROM (
      SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
      FROM hbase.users
    )
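The inner query turns an opaque byte value into a structured record that the outer query can address by field name. A small Python analogue is sketched below. The in-memory `USERS` table, its contents and the helper names are hypothetical stand-ins for an HBase table; only the column family `cf1` and the column `profileBlob` come from the slide.

```python
import json

# Hypothetical stand-in for the HBase table "users":
# row key -> {column family -> {column -> raw bytes}}.
USERS = {
    b"u1": {"cf1": {"profileBlob": b'{"name": "Michael", "children": ["Ann", "Bo"]}'}},
    b"u2": {"cf1": {"profileBlob": b'{"name": "Jennifer"}'}},
}


def convert_from_json(raw):
    """Decode a stored byte value as UTF-8 text and parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


def profiles(table):
    """Yield the decoded profile of every row, like the slide's inner SELECT."""
    for row in table.values():
        yield convert_from_json(row["cf1"]["profileBlob"])


if __name__ == "__main__":
    for profile in profiles(USERS):
        # Fields missing from a blob are simply absent; no schema was declared.
        print(profile["name"], len(profile.get("children", [])))
```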
  • 36. Combine Data Sources on the Fly
    -- Join log directory with JSON file (user profiles) to identify the name and email address for anyone associated with an error message.
    SELECT DISTINCT users.name, users.emails.work
    FROM dfs.logs.`/data/logs` logs,
         dfs.users.`/profiles.json` users
    WHERE logs.uid = users.id AND logs.errorLevel > 5;
    -- Join a Hive table and an HBase table (without Hive metadata) to determine the number of tweets per user
    SELECT users.name, count(*) as tweetCount
    FROM hive.social.tweets tweets,
         hbase.users users
    WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
    GROUP BY tweets.userId;
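The first join can be pictured as a lookup from log entries into user profiles. The Python sketch below mirrors its logic on hypothetical in-memory data. The field names `uid`, `id`, `errorLevel`, `name` and `emails.work` follow the slide's query; the sample values and the function name `users_with_errors` are invented for illustration.

```python
import json

# Hypothetical log entries, standing in for the files under /data/logs.
LOGS = [
    {"uid": 1, "errorLevel": 7, "message": "disk full"},
    {"uid": 2, "errorLevel": 3, "message": "slow response"},
    {"uid": 1, "errorLevel": 9, "message": "node down"},
]

# Hypothetical contents of /profiles.json.
PROFILES_JSON = """[
  {"id": 1, "name": "Michael", "emails": {"work": "michael@example.com"}},
  {"id": 2, "name": "Jennifer", "emails": {"work": "jennifer@example.com"}}
]"""


def users_with_errors(logs, users, min_level):
    """Distinct (name, work email) of users with a log entry above min_level."""
    by_id = {user["id"]: user for user in users}  # build side of a hash join
    matches = set()                               # a set gives DISTINCT
    for entry in logs:
        user = by_id.get(entry["uid"])
        if user is not None and entry["errorLevel"] > min_level:
            matches.add((user["name"], user["emails"]["work"]))
    return sorted(matches)


if __name__ == "__main__":
    print(users_with_errors(LOGS, json.loads(PROFILES_JSON), 5))
```

The two inputs have different shapes (flat log entries and nested profile documents), and neither is loaded into a common store first. That is the point of the slide.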
  • 37. Summary
    • Enable rapid data exploration and application development while reducing the burden on IT
    • Apache Drill 0.5 available now
    • Get involved
      – Download and play: http://incubator.apache.org/drill/
      – Ask questions: drill-user@incubator.apache.org
      – Contribute: http://github.com/apache/incubator-drill/
      – Join the Drill team at MapR
        • Email jacques@mapr.com
        • www.mapr.com/careers