SlideShare a Scribd company logo
© 2014 MapR Technologies 1© 2014 MapR Technologies
The Future of Hadoop: Data Agility
© 2014 MapR Technologies 2
Data is doubling in
size every two years
© 2014 MapR Technologies 3
44 ZETTABYTES
4.4 ZETTABYTES
2011 2013
1.8 ZETTABYTES
IDC estimates that in 2020,
there will be 44 zettabytes
of data in the world
2020
Source: IDC Digital Universe
© 2014 MapR Technologies 4
UNSTRUCTURED
DATA
STRUCTURED DATA
1980 2000 20101990 2020
Unstructured data will account
for more than 80% of the data
collected by organizations
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
TotalDataStored
© 2014 MapR Technologies 5
Unstructured Data is Ubiquitous
Social Media
Messages
Audio
Sensors
Mobile Data
Email
Clickstream
© 2014 MapR Technologies 6
Hadoop Adoption is Exploding
JOB TRENDS FROM INDEED.COM
Jan ‘06 Jan ‘12 Jan ‘14Jan ‘07 Jan ‘08 Jan ‘09 Jan ‘10 Jan ‘11 Jan ‘13
© 2014 MapR Technologies 7
The MapR Distribution for Hadoop
Best Product
Exponential
Growth
3X bookings Q1 ‘13 – Q1 ‘14
80% of accounts expand 3X
90% software licenses
<1% lifetime churn
>$1B in incremental revenue
generated by 1 customer
500+
CustomersBig Data
Riding the Wave with
Hadoop
The Big Data
Platform
of Choice
© 2014 MapR Technologies 8
360° Customer View
5PB
CUSTOMER DATA
© 2014 MapR Technologies 9PEOPLE
1.2B
PEOPLE
Largest Biometric Database in the World
© 2014 MapR Technologies 10© 2014 MapR Technologies
The Future of Hadoop: Data Agility
© 2014 MapR Technologies 11
Distance to Data
Business
(analysts, developers)
“Plumbing”
development
MapReduce
Business
(analysts, developers)
Modeling and
transformations
Hive and other
SQL-on-Hadoop
Existing approaches
require a middleman (IT)
Data
Data
© 2014 MapR Technologies 12
Real-World Data Modeling and Transformations
© 2014 MapR Technologies 13
© 2014 MapR Technologies 14
Distance to Data
Business
(analysts, developers)
“Plumbing”
development
MapReduce
Hive and other
SQL-on-Hadoop
Business
(analysts, developers)Data Agility
Existing approaches
require a middleman (IT)
Data
Data
Data
Business
(analysts, developers)
Modeling and
transformations
© 2014 MapR Technologies 15
Why Improve Distance to Data?
• Enable rapid data exploration and
application development
• IT should provide a valuable
service without “getting in the way”
• Can’t add DBAs to keep up with
the exponential data growth
• Minimize “unnecessary work” so IT
can focus on value-added
activities and become a partner to
the business users
2Reduce the burden on ITImprove time to value
© 2014 MapR Technologies 16
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
APACHE DRILL
40+ contributors
150+ years of experience building
databases and distributed systems
© 2014 MapR Technologies 17
Evolution Towards Self-Service Data Exploration
Data Modeling and
Transformation
Data Visualization
IT-driven
IT-driven
IT-driven
Self-service
IT-driven
Self-service
Not needed
Self-service
Traditional BI
w/ RDBMS
Self-Service BI
w/ RDBMS
SQL-on-Hadoop
Self-Service
Data Exploration
Zero-day analytics
© 2014 MapR Technologies 18
(1) Self-Describing Data is Ubiquitous
Flat files in DFS
• Complex data (Thrift, Avro, protobuf)
• Columnar data (Parquet, ORC)
• Loosely defined (JSON)
• Traditional files (CSV, TSV)
Data stored in NoSQL stores
• Relational-like (rows, columns)
• Sparse data (NoSQL maps)
• Embedded blobs (JSON)
• Document stores (nested objects)
{
name: {
first: Michael,
last: Smith
},
hobbies: [ski, soccer],
district: Los Altos
}
{
name: {
first: Jennifer,
last: Gates
},
hobbies: [sing],
preschool: CCLC
}
© 2014 MapR Technologies 19
(2) Drill’s Data Model is Flexible
HBase
JSON
BSON
CSV
TSV
Parquet
Avro
Schema-lessFixed schema
Flat
Complex
Flexibility
Flexibility
Name Gender Age
Michael M 6
Jennifer F 3
{
name: {
first: Michael,
last: Smith
},
hobbies: [ski, soccer],
district: Los Altos
}
{
name: {
first: Jennifer,
last: Gates
},
hobbies: [sing],
preschool: CCLC
}
RDBMS/SQL-on-Hadoop table
Apache Drill table
© 2014 MapR Technologies 20
(3) Drill Supports Schema Discovery On-The-Fly
• Fixed schema
• Leverage schema in centralized
repository (Hive Metastore)
• Fixed schema, evolving schema or
schema-less
• Leverage schema in centralized
repository or self-describing data
2Schema Discovered On-The-FlySchema Declared In Advance
SCHEMA ON
WRITE
SCHEMA
BEFORE READ
SCHEMA ON THE
FLY
© 2014 MapR Technologies 21© 2014 MapR Technologies
Quick Tour
Self-Service Data Exploration with Apache Drill
© 2014 MapR Technologies 22
• d
© 2014 MapR Technologies 23
Zero to Results in 2 Minutes (3 Commands)
$ tar xzf apache-drill.tar.gz
$ apache-drill/bin/sqlline -u jdbc:drill:zk=local
0: jdbc:drill:zk=local>
SELECT count(*) AS incidents, columns[1] AS category
FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv`
GROUP BY columns[1]
ORDER BY incidents DESC;
+------------+------------+
| incidents | category |
+------------+------------+
| 8372 | LARCENY/THEFT |
| 4247 | OTHER OFFENSES |
| 3765 | NON-CRIMINAL |
| 2502 | ASSAULT |
...
35 rows selected (0.847 seconds)
Install
Launch shell
(embedded
mode)
Query
Results
© 2014 MapR Technologies 24
A storage engine instance
- DFS
- HBase
- Hive Metastore/HCatalog
A workspace
- Sub-directory
- Hive database
A table
- pathnames
- HBase table
- Hive table
Data Source is in the Query
SELECT timestamp, message
FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
WHERE errorLevel > 2
© 2014 MapR Technologies 25
Query Directory Trees
# Query file: How many errors per level in Jan 2014?
SELECT errorLevel, count(*)
FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet`
GROUP BY errorLevel;
# Query directory sub-tree: How many errors per level?
SELECT errorLevel, count(*)
FROM dfs.logs.`/AppServerLogs`
GROUP BY errorLevel;
# Query some partitions: How many errors per level by month from 2012?
SELECT errorLevel, count(*)
FROM dfs.logs.`/AppServerLogs`
WHERE dirs[1] >= 2012
GROUP BY errorLevel, dirs[2];
© 2014 MapR Technologies 26
Works with HBase and Embedded Blobs
# Query an HBase table directly (no schemas)
SELECT cf1.month, cf1.year
FROM hbase.table1;
# Embedded JSON value inside column profileBlob inside column family cf1 of
the HBase table users
SELECT profile.name, count(profile.children)
FROM (
SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile
FROM hbase.users
)
© 2014 MapR Technologies 27
Combine Data Sources on the Fly
# Join log directory with JSON file (user profiles) to identify the name and email address for
anyone associated with an error message.
SELECT DISTINCT users.name, users.emails.work
FROM dfs.logs.`/data/logs` logs,
dfs.users.`/profiles.json` users
WHERE logs.uid = users.id AND
logs.errorLevel > 5;
# Join a Hive table and an HBase table (without Hive metadata) to determine the number of
tweets per user
SELECT users.name, count(*) as tweetCount
FROM hive.social.tweets tweets,
hbase.users users
WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8')
GROUP BY tweets.userId;
© 2014 MapR Technologies 28
Summary
• Enable rapid data exploration and application development while
reducing the burden on IT
• Apache Drill beta coming soon
– Email tshiran@mapr.com
• Get involved
– Download and play: http://incubator.apache.org/drill/
– Ask questions: drill-user@incubator.apache.org
– Contribute: http://github.com/apache/incubator-drill/
© 2014 MapR Technologies 29
Thank You
@mapr maprtech
tshiran@mapr.com
Tomer Shiran, VP Product Management
MapRTechnologies
maprtech
mapr-technologies

More Related Content

What's hot

Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
mcsrivas
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
t3rmin4t0r
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
David Kaiser
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
Yahoo Developer Network
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
Lokesh Ramaswamy
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
Yahoo Developer Network
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
Adam Muise
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Stanley Wang
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
rightsize
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
John Sing
 
Dchug m7-30 apr2013
Dchug m7-30 apr2013Dchug m7-30 apr2013
Dchug m7-30 apr2013
jdfiori
 

What's hot (20)

Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
 
Philly DB MapR Overview
Philly DB MapR OverviewPhilly DB MapR Overview
Philly DB MapR Overview
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop
Hadoop Hadoop
Hadoop
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Dchug m7-30 apr2013
Dchug m7-30 apr2013Dchug m7-30 apr2013
Dchug m7-30 apr2013
 

Viewers also liked

Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
DataWorks Summit
 
AWS Startup Challenge Presentation
AWS Startup Challenge PresentationAWS Startup Challenge Presentation
AWS Startup Challenge PresentationRoman Stanek
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Dataconomy Media
 
Transform Unstructured Data Into Relevant Data with IBM StoredIQ
Transform Unstructured Data Into Relevant Data with IBM StoredIQTransform Unstructured Data Into Relevant Data with IBM StoredIQ
Transform Unstructured Data Into Relevant Data with IBM StoredIQ
Perficient, Inc.
 
Lighten Your Data Center TCO With “Helium” Storage Solutions
Lighten Your Data Center TCO With “Helium” Storage SolutionsLighten Your Data Center TCO With “Helium” Storage Solutions
Lighten Your Data Center TCO With “Helium” Storage SolutionsHGST Storage
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias Storage
Fran Navarro
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, Deduplication
ETCenter
 
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
Codemotion
 
Armado y-reparacion-de-pc
Armado y-reparacion-de-pcArmado y-reparacion-de-pc
Armado y-reparacion-de-pcJose Vidal
 
Jose f diaz sca casa corazon
Jose f diaz sca casa corazonJose f diaz sca casa corazon
Dateien per Drag & Drop in APEX Applikationen ablegen.
Dateien per Drag & Drop in APEX Applikationen ablegen.Dateien per Drag & Drop in APEX Applikationen ablegen.
Dateien per Drag & Drop in APEX Applikationen ablegen.
MT AG
 
Interim Management at Damen Shipyards Galati
Interim Management at Damen Shipyards GalatiInterim Management at Damen Shipyards Galati
Interim Management at Damen Shipyards GalatiDeRuiter
 
Plan de Marketing de FM Group
Plan de Marketing de FM GroupPlan de Marketing de FM Group
Plan de Marketing de FM Group
FMgroup Bcn
 
Sub1 G1 Gestor
Sub1 G1 GestorSub1 G1 Gestor
Sub1 G1 Gestor
cpedocentic
 
Segurinfo 2010 Neosecure
Segurinfo 2010   NeosecureSegurinfo 2010   Neosecure
Segurinfo 2010 Neosecure
Magmaeventos
 
Internet + web social
Internet + web socialInternet + web social
Internet + web social
cubitos98
 
International Investment projects_Establishing a Turkish Restaurant to Korea
International Investment projects_Establishing a Turkish Restaurant to KoreaInternational Investment projects_Establishing a Turkish Restaurant to Korea
International Investment projects_Establishing a Turkish Restaurant to Korea
Haeyoung Jang
 
Fase de planificación Virtual Group E-learning
Fase de planificación Virtual Group E-learningFase de planificación Virtual Group E-learning
Fase de planificación Virtual Group E-learning
Cristian Basurto
 
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)Stephanie WOESSNER
 

Viewers also liked (20)

Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
AWS Startup Challenge Presentation
AWS Startup Challenge PresentationAWS Startup Challenge Presentation
AWS Startup Challenge Presentation
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Transform Unstructured Data Into Relevant Data with IBM StoredIQ
Transform Unstructured Data Into Relevant Data with IBM StoredIQTransform Unstructured Data Into Relevant Data with IBM StoredIQ
Transform Unstructured Data Into Relevant Data with IBM StoredIQ
 
Lighten Your Data Center TCO With “Helium” Storage Solutions
Lighten Your Data Center TCO With “Helium” Storage SolutionsLighten Your Data Center TCO With “Helium” Storage Solutions
Lighten Your Data Center TCO With “Helium” Storage Solutions
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias Storage
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, Deduplication
 
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
Hands On with the Unity 5 Game Engine! - Andy Touch - Codemotion Roma 2015
 
Armado y-reparacion-de-pc
Armado y-reparacion-de-pcArmado y-reparacion-de-pc
Armado y-reparacion-de-pc
 
Jose f diaz sca casa corazon
Jose f diaz sca casa corazonJose f diaz sca casa corazon
Jose f diaz sca casa corazon
 
Dateien per Drag & Drop in APEX Applikationen ablegen.
Dateien per Drag & Drop in APEX Applikationen ablegen.Dateien per Drag & Drop in APEX Applikationen ablegen.
Dateien per Drag & Drop in APEX Applikationen ablegen.
 
Interim Management at Damen Shipyards Galati
Interim Management at Damen Shipyards GalatiInterim Management at Damen Shipyards Galati
Interim Management at Damen Shipyards Galati
 
Plan de Marketing de FM Group
Plan de Marketing de FM GroupPlan de Marketing de FM Group
Plan de Marketing de FM Group
 
Sub1 G1 Gestor
Sub1 G1 GestorSub1 G1 Gestor
Sub1 G1 Gestor
 
Acerca del-amor
Acerca del-amorAcerca del-amor
Acerca del-amor
 
Segurinfo 2010 Neosecure
Segurinfo 2010   NeosecureSegurinfo 2010   Neosecure
Segurinfo 2010 Neosecure
 
Internet + web social
Internet + web socialInternet + web social
Internet + web social
 
International Investment projects_Establishing a Turkish Restaurant to Korea
International Investment projects_Establishing a Turkish Restaurant to KoreaInternational Investment projects_Establishing a Turkish Restaurant to Korea
International Investment projects_Establishing a Turkish Restaurant to Korea
 
Fase de planificación Virtual Group E-learning
Fase de planificación Virtual Group E-learningFase de planificación Virtual Group E-learning
Fase de planificación Virtual Group E-learning
 
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)
Un autre monde - Eine andere Welt (Vortrag Blended Learning Tele-Tandem M1 2014)
 

Similar to The Future of Hadoop: MapR VP of Product Management, Tomer Shiran

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
BigDataEverywhere
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
MapR Technologies
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
Senturus
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
tshiran
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
MapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
MapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
DataWorks Summit
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
IRJET Journal
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
Julien Le Dem
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality Challenge
Inside Analysis
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
Data Con LA
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDataWorks Summit
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
AshishRathore72
 
2014 08-20-pit-hug
2014 08-20-pit-hug2014 08-20-pit-hug
2014 08-20-pit-hug
Andy Pernsteiner
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
MapR Technologies
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
Impetus Technologies
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
Asis Mohanty
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Mats Uddenfeldt
 

Similar to The Future of Hadoop: MapR VP of Product Management, Tomer Shiran (20)

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Self-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache DrillSelf-Service Data Exploration with Apache Drill
Self-Service Data Exploration with Apache Drill
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality Challenge
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated Architecture
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
2014 08-20-pit-hug
2014 08-20-pit-hug2014 08-20-pit-hug
2014 08-20-pit-hug
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
MapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
MapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
MapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
MapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
MapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
MapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
MapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
MapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
MapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
MapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
MapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
MapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
MapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
MapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
MapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
MapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

The Future of Hadoop: MapR VP of Product Management, Tomer Shiran

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies The Future of Hadoop: Data Agility
  • 2. © 2014 MapR Technologies 2 Data is doubling in size every two years
  • 3. © 2014 MapR Technologies 3 44 ZETTABYTES 4.4 ZETTABYTES 2011 2013 1.8 ZETTABYTES IDC estimates that in 2020, there will be 44 zettabytes of data in the world 2020 Source: IDC Digital Universe
  • 4. © 2014 MapR Technologies 4 UNSTRUCTURED DATA STRUCTURED DATA 1980 2000 20101990 2020 Unstructured data will account for more than 80% of the data collected by organizations Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data TotalDataStored
  • 5. © 2014 MapR Technologies 5 Unstructured Data is Ubiquitous Social Media Messages Audio Sensors Mobile Data Email Clickstream
  • 6. © 2014 MapR Technologies 6 Hadoop Adoption is Exploding JOB TRENDS FROM INDEED.COM Jan ‘06 Jan ‘12 Jan ‘14Jan ‘07 Jan ‘08 Jan ‘09 Jan ‘10 Jan ‘11 Jan ‘13
  • 7. © 2014 MapR Technologies 7 The MapR Distribution for Hadoop Best Product Exponential Growth 3X bookings Q1 ‘13 – Q1 ‘14 80% of accounts expand 3X 90% software licenses <1% lifetime churn >$1B in incremental revenue generated by 1 customer 500+ CustomersBig Data Riding the Wave with Hadoop The Big Data Platform of Choice
  • 8. © 2014 MapR Technologies 8 360° Customer View 5PB CUSTOMER DATA
  • 9. © 2014 MapR Technologies 9PEOPLE 1.2B PEOPLE Largest Biometric Database in the World
  • 10. © 2014 MapR Technologies 10© 2014 MapR Technologies The Future of Hadoop: Data Agility
  • 11. © 2014 MapR Technologies 11 Distance to Data Business (analysts, developers) “Plumbing” development MapReduce Business (analysts, developers) Modeling and transformations Hive and other SQL-on-Hadoop Existing approaches require a middleman (IT) Data Data
  • 12. © 2014 MapR Technologies 12 Real-World Data Modeling and Transformations
  • 13. © 2014 MapR Technologies 13
  • 14. © 2014 MapR Technologies 14 Distance to Data Business (analysts, developers) “Plumbing” development MapReduce Hive and other SQL-on-Hadoop Business (analysts, developers)Data Agility Existing approaches require a middleman (IT) Data Data Data Business (analysts, developers) Modeling and transformations
  • 15. © 2014 MapR Technologies 15 Why Improve Distance to Data? • Enable rapid data exploration and application development • IT should provide a valuable service without “getting in the way” • Can’t add DBAs to keep up with the exponential data growth • Minimize “unnecessary work” so IT can focus on value-added activities and become a partner to the business users 2Reduce the burden on ITImprove time to value
  • 16. © 2014 MapR Technologies 16 • Pioneering Data Agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics & operational applications APACHE DRILL 40+ contributors 150+ years of experience building databases and distributed systems
  • 17. © 2014 MapR Technologies 17 Evolution Towards Self-Service Data Exploration Data Modeling and Transformation Data Visualization IT-driven IT-driven IT-driven Self-service IT-driven Self-service Not needed Self-service Traditional BI w/ RDBMS Self-Service BI w/ RDBMS SQL-on-Hadoop Self-Service Data Exploration Zero-day analytics
  • 18. © 2014 MapR Technologies 18 (1) Self-Describing Data is Ubiquitous Flat files in DFS • Complex data (Thrift, Avro, protobuf) • Columnar data (Parquet, ORC) • Loosely defined (JSON) • Traditional files (CSV, TSV) Data stored in NoSQL stores • Relational-like (rows, columns) • Sparse data (NoSQL maps) • Embedded blobs (JSON) • Document stores (nested objects) { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos } { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
  • 19. © 2014 MapR Technologies 19 (2) Drill’s Data Model is Flexible HBase JSON BSON CSV TSV Parquet Avro Schema-lessFixed schema Flat Complex Flexibility Flexibility Name Gender Age Michael M 6 Jennifer F 3 { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos } { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC } RDBMS/SQL-on-Hadoop table Apache Drill table
  • 20. © 2014 MapR Technologies 20 (3) Drill Supports Schema Discovery On-The-Fly • Fixed schema • Leverage schema in centralized repository (Hive Metastore) • Fixed schema, evolving schema or schema-less • Leverage schema in centralized repository or self-describing data 2Schema Discovered On-The-FlySchema Declared In Advance SCHEMA ON WRITE SCHEMA BEFORE READ SCHEMA ON THE FLY
  • 21. © 2014 MapR Technologies 21© 2014 MapR Technologies Quick Tour Self-Service Data Exploration with Apache Drill
  • 22. © 2014 MapR Technologies 22 • d
  • 23. © 2014 MapR Technologies 23 Zero to Results in 2 Minutes (3 Commands) $ tar xzf apache-drill.tar.gz $ apache-drill/bin/sqlline -u jdbc:drill:zk=local 0: jdbc:drill:zk=local> SELECT count(*) AS incidents, columns[1] AS category FROM dfs.`/tmp/SFPD_Incidents_-_Previous_Three_Months.csv` GROUP BY columns[1] ORDER BY incidents DESC; +------------+------------+ | incidents | category | +------------+------------+ | 8372 | LARCENY/THEFT | | 4247 | OTHER OFFENSES | | 3765 | NON-CRIMINAL | | 2502 | ASSAULT | ... 35 rows selected (0.847 seconds) Install Launch shell (embedded mode) Query Results
  • 24. © 2014 MapR Technologies 24 A storage engine instance - DFS - HBase - Hive Metastore/HCatalog A workspace - Sub-directory - Hive database A table - pathnames - HBase table - Hive table Data Source is in the Query SELECT timestamp, message FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet` WHERE errorLevel > 2
  • 25. © 2014 MapR Technologies 25 Query Directory Trees # Query file: How many errors per level in Jan 2014? SELECT errorLevel, count(*) FROM dfs.logs.`/AppServerLogs/2014/Jan/part0001.parquet` GROUP BY errorLevel; # Query directory sub-tree: How many errors per level? SELECT errorLevel, count(*) FROM dfs.logs.`/AppServerLogs` GROUP BY errorLevel; # Query some partitions: How many errors per level by month from 2012? SELECT errorLevel, count(*) FROM dfs.logs.`/AppServerLogs` WHERE dirs[1] >= 2012 GROUP BY errorLevel, dirs[2];
  • 26. © 2014 MapR Technologies 26 Works with HBase and Embedded Blobs # Query an HBase table directly (no schemas) SELECT cf1.month, cf1.year FROM hbase.table1; # Embedded JSON value inside column profileBlob inside column family cf1 of the HBase table users SELECT profile.name, count(profile.children) FROM ( SELECT CONVERT_FROM(cf1.profileBlob, 'json') AS profile FROM hbase.users )
  • 27. © 2014 MapR Technologies 27 Combine Data Sources on the Fly # Join log directory with JSON file (user profiles) to identify the name and email address for anyone associated with an error message. SELECT DISTINCT users.name, users.emails.work FROM dfs.logs.`/data/logs` logs, dfs.users.`/profiles.json` users WHERE logs.uid = users.id AND logs.errorLevel > 5; # Join a Hive table and an HBase table (without Hive metadata) to determine the number of tweets per user SELECT users.name, count(*) as tweetCount FROM hive.social.tweets tweets, hbase.users users WHERE tweets.userId = convert_from(users.rowkey, 'UTF-8') GROUP BY tweets.userId;
  • 28. © 2014 MapR Technologies 28 Summary • Enable rapid data exploration and application development while reducing the burden on IT • Apache Drill beta coming soon – Email tshiran@mapr.com • Get involved – Download and play: http://incubator.apache.org/drill/ – Ask questions: drill-user@incubator.apache.org – Contribute: http://github.com/apache/incubator-drill/
  • 29. © 2014 MapR Technologies 29 Thank You @mapr maprtech tshiran@mapr.com Tomer Shiran, VP Product Management MapRTechnologies maprtech mapr-technologies

Editor's Notes

  1. Have someone introduce me. Thank audience (tie to morning activities), sponsors, HP, etc. We’re here because this is the biggest thing that has happened to Hadoop…
  2. Here at the conference we’re talking about data science. But before we can appreciate the changes happening in data science, we must first talk about Data. Data is doubling every two years. The fast growing volume, variety and velocity of data is overwhelming traditional systems and approaches. A revolutionary approach is required to leverage this data. And with this new technology, Data science as we know, is undergoing tremendous change.
  3. To give you a sense of the data volumes that we’re talking about, I’ve included this chart that shows why a revolutionary approach is needed. You can see the amount of data growth moving from 1.8 Zettabytes to 44 Zettabytes in just over 5 years. To put this into perspective a large datawarehouse contains terabytes of data. A zettabye is 1 billion terabytes. Numbers in chart are from two IDC reports (sponsored by emc). http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
  4. What is the source of this data growth? While structured data growth has been relatively modest, the growth in unstructured data has been exponential. Source of statistic: http://link.springer.com/chapter/10.1007/978-3-642-39146-0_2
  5. sensor data, social media, clickstream, genomic data, location information, video files, etc.
  6. The system that is enabling this growth in data capture is Hadoop.
  7. We are proud/fortunate that Forrester has named MapR as the best Hadoop distribution in the market.
  8. 8
  9. 9
  10. Many organizations now want to unlock the data in Hadoop and make it accessible to a broader audience within their organizations. That’s easier said than done. While we’ve largely solved the infrastructure scalability challenge, the massive volume, variety and velocity of this data introduces serious challenges on the human side, such as how to prepare all that data and make it available to users, how to make operational data available in real-time for analytics, etc. We need better technology to empower users to take advantage of these massive volumes of data. Past: Enable organizations to capture the data. Future: Enable organizations to more easily extract value from all this captured data. What does the future of Hadoop look like? The problem I’m sure many of you have experienced this (just like the quotes) Why we want to solve it Here’s what we’re doing about it
  11. One of the challenges with Hadoop as well as traditional data management tools is the business user’s “distance from the data”. The dependency on IT (or additional development) increases time to value and reduces agility. It also creates a burden on IT at a time when IT is already overworked. The red arrows in this illustration can represent significant backlogs and delays (often many months). Many of you are likely having to spend a lot of time on plumbing development and data preparation. How many have had to do this? (show hand)
  12. “Data modeling and transformations” may seem easy, but when you look at a real-world environment, you could have thousands of data sets.
  13. Opportunity
  14. This is the opportunity. The audience should feel like this is their chance to become heroes by bringing this to their companies. They have to feel (be emotional) about the problem at this point.
  15. IT-driven = months of delay, unnecessary work (data is no longer relevant, etc.) The so-what needs to be conveyed. Why does it matter that it’s not needed. 6 months -> 3 months -> 3 months -> day zero So imagine now what you can get… Data Agility is needed for Business Agility >>> Stand still during slide, move in at the punchline (why does this matter to YOU)
  16. Need an example or analogy to explain self-describing data.
  17. All SQL engines (traditional or SQL-on-Hadoop) view tables as spreadsheet-like data structures with rows and columns. All records have the same structure, and there is no support for nested data or repeating fields. Drill views tables conceptually as collections of JSON (with additional types) documents. Each record can have a different structure (hence, schema-less). This is revolutionary and has never been done before. If you consider the four data models shown in the 2x2, all models can be represented by the complex, no schema model (JSON) because it is the most flexible. However, no other data model can be represented by the flat, fixed schema model. Therefore, when using any SQL engine except Drill, the data has to be transformed before it can be available to queries.
  18. TODO: Add Impala and Splunk logos
  19. What I want you to see now is how easy is it to ….
  20. Is there something from Israel?
  21. With other technologies you have to do this, then this, then this, …
  22. Key takeaways Core message – We are revolutionizing Hadoop Call to action – get involved, and enjoy the conference as we have great speakers If doing Q&A, set boundaries (time - how much time we have, topic – what questions can I answer about this revolution), back pocket question (someone asked me this morning) -