Self-Service BI for big data applications using Apache Drill (Big Data Amsterdam v2.0, 2015-05-13)
© 2015 MapR Technologies
About me:
• Systems Engineer at MapR Technologies in the Nordic countries
• 20 years of experience:
– IBM (Sweden)
• MDM, Data Governance, Information Server, DataStage
– Teradata (Sweden)
• Developing warehouses
– Informix / IBM (US)
• Advisory Software Developer in the IBM Informix DataBlade group
• Visionary Ambassador
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
• 40+ contributors
• 150+ years of experience building databases and distributed systems
The Power of the Open Source Community
APACHE HADOOP AND OSS ECOSYSTEM
[Figure: the Apache Hadoop and open source ecosystem on top of the MapR data platform]
• Execution engines: Batch; Streaming; NoSQL & Search; SQL; ML, Graph
• Data governance and operations: Security; Provisioning & Coordination; Workflow & Data Governance; Data Integration & Access
• Components shown: YARN, Spark, Spark Streaming, Storm, Juju, Sahara, Mahout, MLLib, GraphX, Pig, Cascading, MapReduce v1 & v2, Tez, HBase, Solr, Hive, Impala, Spark SQL, Drill, Sentry, Oozie, ZooKeeper, Sqoop, Flume, HttpFS, Hue
• Data platform: MapR-FS, MapR-DB
• Management
Day-zero analytics & rapid application development
Philosophy question: What is high performance?
• Example 1:
– 2 months of preparation, 2 seconds of execution
• Example 2:
– 10 minutes of preparation, 5 hours of execution
• Example 3:
– 2 days of preparation, 20 days of execution
• Apache Drill is about getting to the data quicker, and with less effort.
• Will it always be quicker?
– It depends on how you compare.
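One way to read the three examples is to add preparation and execution into a single time-to-result figure. The sketch below does only that arithmetic; it assumes "2 months" means roughly 60 days, and the function and variable names are made up for illustration.

```python
from datetime import timedelta

# (preparation, execution) for the three examples on the slide.
# Assumption: "2 months" is approximated as 60 days.
examples = {
    "Example 1": (timedelta(days=60), timedelta(seconds=2)),
    "Example 2": (timedelta(minutes=10), timedelta(hours=5)),
    "Example 3": (timedelta(days=2), timedelta(days=20)),
}


def time_to_result(preparation, execution):
    """Total elapsed time from first touching the data to having an answer."""
    return preparation + execution


totals = {name: time_to_result(p, e) for name, (p, e) in examples.items()}

# Rank by total time rather than by execution time alone.
ranking = sorted(totals, key=totals.get)

for name in ranking:
    print(name, totals[name])
```

Ranked this way, the example with the slowest query among the first two still delivers its first answer soonest, which is the sense in which "it depends on how you compare".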
Performance is about data management
• How quickly can you incorporate new data
• What types of prep or ETL you need to do
• What you can do without having to read data & calculate statistics
– Remember, many users never read all their data
• Time to first result
• Time to last result
• Managing Scarce Resources
• Concurrency
“Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-pretty-much-anything, immediately, and without formality.”
- Andrew Brust, GigaOM Research, Dec 2014
Flexibility
Your tool should be flexible… so you don’t have to be
Apache Drill: Self-Service SQL for Big Data
AGILITY: instant insights to big data
• Direct queries on self-describing data
• No schemas or ETL required
FLEXIBILITY: one interface for Hadoop & NoSQL
• Query HBase and other NoSQL stores
• Use SQL to natively operate on complex data types (such as JSON)
FAMILIARITY: existing skills & technologies
• Leverage ANSI SQL skills and BI tools
• Plug-n-play with Hive schema, file formats, UDFs
One SQL Interface for All Data Formats
ANSI SQL queries on structured and semi-structured data
[Figure: total data stored, 1990 to 2020, split into structured and unstructured data]
Unstructured data will account for more than 80% of the data collected by organizations
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
[Figure: SQL options for analytics. Labels: Existing SQL Engines, Apache Drill; IT-Driven BI, Self-Service BI, Self-Service Data Exploration]
Agility by Reducing Distance to Data
Short analytic life cycles with no upfront schema creation and management
• Governed approach: Hadoop data -> data preparation (schema design, transformation, optional data movement) -> users
– New business questions restart the cycle
– Total time to value: weeks to months
• Exploratory approach: Hadoop data -> users
– New business questions restart the cycle
– Total time to value: minutes
Drill enables the “As-It-Happens” business with instant SQL analytics on complex data
Apache Drill - Delivering New Capabilities for BI
[Chart axes: agility & business value; technology enablement for BI]
• 1980s-1990s: IT-driven BI (IT-created reports, spreadsheets)
• 2000s: IT-driven BI plus self-service BI (analyst-driven with IT support for ETL)
• Now: IT-driven BI and self-service BI plus self-service data exploration (analyst-driven with no IT dependency)
Self-Service Data Exploration
Direct access to Hadoop data from familiar BI / analytics tools (ANSI SQL compatible)
• Ad-hoc and reporting queries
• Raw data exploration
• Day-zero queries
• …
Drill Supports Schema Discovery On-The-Fly
Schema declared in advance (schema on write, schema before read):
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema discovered on-the-fly (schema on the fly):
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
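To make "schema discovered on the fly" concrete, here is a minimal sketch of the idea in Python. It is not how Drill is implemented; it only shows a reader learning field names and types from self-describing JSON records as they arrive, with no schema declared beforehand. The sample records and the `discover_schema` helper are invented for illustration.

```python
import json


def discover_schema(lines):
    """Build a field -> set-of-type-names mapping while reading JSON records."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            # A field may appear late, or change type between records.
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical input: the schema evolves from record to record.
records = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b", "tags": ["x", "y"]}',
    '{"id": "3", "geo": {"lat": 59.3, "lon": 18.1}}',
]

schema = discover_schema(records)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

The third record changes the type of `id` and adds a nested field; a schema-on-write system would need a migration for that, while a discovery-based reader simply records what it sees.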
Apache Drill: Built for semi-structured data
Flexibility in how you describe your data
• Drill doesn’t require a schema; it detects file types based on
– extensions
– magic bytes (e.g. PAR1)
– system settings
• A query can be planned on any file, anywhere
• Data types are determined as data arrives
• Some formats have a known schema
– If they don’t, you can expose them as such through views
– Views are simply JSON files that define the view SQL
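The three detection signals listed above can be sketched as a small lookup. This is an illustration of the idea, not Drill's code: the lookup tables, the order in which the signals are tried, and the `default` argument (standing in for a system setting) are all assumptions. Only the Parquet magic bytes `PAR1` come from the slide.

```python
import os
import tempfile

# Hypothetical lookup tables; only the "PAR1" magic bytes come from the slide.
MAGIC_BYTES = {b"PAR1": "parquet"}
EXTENSIONS = {".json": "json", ".csv": "csv", ".parquet": "parquet"}


def detect_format(path, default=None):
    """Guess a file format: magic bytes first, then extension, then a default."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head in MAGIC_BYTES:
        return MAGIC_BYTES[head]
    ext = os.path.splitext(path)[1].lower()
    return EXTENSIONS.get(ext, default)


# Usage: three throwaway files, each resolved by a different signal.
with tempfile.TemporaryDirectory() as tmp:
    parquet_like = os.path.join(tmp, "data.bin")
    with open(parquet_like, "wb") as f:
        f.write(b"PAR1" + b"\x00" * 8)  # only the leading magic bytes matter here

    csv_file = os.path.join(tmp, "people.csv")
    with open(csv_file, "w") as f:
        f.write("1,anna\n")

    unknown = os.path.join(tmp, "notes.xyz")
    with open(unknown, "w") as f:
        f.write("hello")

    results = [
        detect_format(parquet_like),
        detect_format(csv_file),
        detect_format(unknown, default="text"),
    ]

print(results)
```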
Familiarity
• Integrates with standard BI tools through ANSI SQL and APIs to provide access to:
– Files
– HBase
– Hive
– <Whatever>
• Hive UDFs can be used directly in Drill
Real-life examples (terminal recordings)
• A simple use case exploring more complex JSON (Twitter data collected using Flume):
– http://showterm.io/4a770a2f02f956a4b7e42
– http://showterm.io/0f203718698ee3f12e532
• A simple use case of accessing a CSV file using Drill:
– http://showterm.io/2a807546708217772a115
• Easy extraction of data from HBase using Drill:
– http://showterm.io/0666429adb0d673f676bd
• A simple example of using home-made integrations (JDBC):
– http://showterm.io/f1c8a632bc8dd78e33a7d
Real-life examples (terminal recordings), continued
• How to access views, and show the explain plan of the view:
– http://showterm.io/b521276d7d829f70e09db
• A simple offloading case where data is taken from MySQL and brought into Parquet:
– http://showterm.io/c8046f815c82868a7a64f
• A simple join across CSV, Parquet, or MySQL sources:
– http://showterm.io/07b83c10a4696849e04f3
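For readers who cannot play the recordings, the sketch below shows roughly what a cross-format join could look like when submitted through Drill's REST interface. It is not taken from the recordings. The host and port, the file paths, the column names, and the assumption that `dfs` is a configured file-system storage plugin are all hypothetical. The code only builds the HTTP request and does not send it.

```python
import json
import urllib.request

# Assumption: a Drillbit exposing its REST interface on localhost:8047.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical paths and columns. A headerless CSV file is read through
# the `columns` array, so its fields are addressed by position.
JOIN_SQL = """
SELECT o.order_id, c.columns[1] AS customer_name
FROM dfs.`/data/orders.parquet` o
JOIN dfs.`/data/customers.csv` c
  ON o.customer_id = CAST(c.columns[0] AS INT)
"""


def build_query_request(sql, url=DRILL_URL):
    """Prepare (but do not send) an HTTP POST carrying one SQL statement."""
    body = json.dumps({"queryType": "SQL", "query": sql.strip()}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(JOIN_SQL)
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(payload["query"])

# Against a live cluster the request would be sent with:
#   urllib.request.urlopen(request)
```

The point of the example is that both files are referenced directly by path in the FROM clause, with no table definitions or load step beforehand.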
More information and resources
• Apache Foundation project:
– http://drill.apache.org
• MapR Drill sandbox downloads and more:
– http://www.mapr.com/products/drill
• Demo of online retailer use case with Apache Drill:
– https://www.youtube.com/watch?v=FkcegazNuio
Q&A
@mapr maprtech
mpierre@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 

Self-Service BI for big data applications using Apache Drill (Big Data Amsterdam v2.0, 2015-05-13)

  • 1. © 2015 MapR Technologies
  • 2. About me: • Systems Engineer at MapR Technologies in the Nordic countries • 20 years of experience: – IBM (Sweden): MDM, Data Governance, Information Server, DataStage – Teradata (Sweden): developing warehouses – Informix / IBM (US): Advisory Software Developer in the IBM Informix DataBlades group • Visionary Ambassador
  • 3. Apache Drill • Pioneering data agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics & operational applications • 40+ contributors • 150+ years of experience building databases and distributed systems
  • 4. The Power of the Open Source Community • Diagram: Apache Hadoop and OSS ecosystem, grouped into execution engines, data governance and operations, and the data platform • Group labels: Batch, Streaming, NoSQL & Search, ML/Graph, SQL, Security, Workflow & Data Governance, Provisioning & Coordination, Data Integration & Access, Management • Components shown: YARN, Spark, Spark Streaming, Storm, Juju, Sahara, Mahout, MLLib, GraphX, Pig, Cascading, MapReduce v1 & v2, Tez, HBase, Solr, Hive, Impala, Spark SQL, Drill, Sentry, Oozie, ZooKeeper, Sqoop, Flume, HttpFS, Hue • Data platform: MapR-FS, MapR-DB
  • 5. Day-zero analytics & rapid application development
  • 6. Philosophy question: What is high performance? • Example 1: – 2 months of preparation, 2 seconds of execution • Example 2: – 10 minutes of preparation, 5 hours of execution • Example 3: – 2 days of preparation, 20 days of execution • Apache Drill is about getting to the data quicker and with less effort. • Will it always be quicker? – It depends on how you compare.
  • 7. Performance is about data management • How quickly you can incorporate new data • What types of prep or ETL you need to do • What you can do without having to read data & calculate statistics – Remember, many users never read all their data • Time to first result • Time to last result • Managing scarce resources • Concurrency
  • 8. “Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-pretty-much-anything, immediately, and without formality.” - Andrew Brust, GigaOM Research, Dec 2014
  • 9. Flexibility: your tool should be flexible… so you don’t have to be
  • 10. Apache Drill: Self-Service SQL for Big Data • Agility (instant insights into big data): – Direct queries on self-describing data – No schemas or ETL required • Flexibility (one interface for Hadoop & NoSQL): – Query HBase and other NoSQL stores – Use SQL to natively operate on complex data types (such as JSON) • Familiarity (existing skills & technologies): – Leverage ANSI SQL skills and BI tools – Plug-and-play with Hive schemas, file formats, and UDFs
  • 11. One SQL Interface for All Data Formats: ANSI SQL queries on structured and semi-structured data • Chart: total data stored, structured vs. unstructured data, 1990 to 2020 • Unstructured data will account for more than 80% of the data collected by organizations (source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data) • SQL options for analytics: IT-driven BI, self-service BI, self-service data exploration • Engines shown: existing SQL engines, Apache Drill
  • 12. Agility by Reducing Distance to Data: short analytic life cycles with no upfront schema creation and management • Governed approach: Hadoop data, schema design, transformation, data movement (optional), users; total time to value: weeks to months • Exploratory approach: Hadoop data, users; total time to value: minutes • Other diagram labels: data preparation, new business questions • Drill enables the “As-It-Happens” business with instant SQL analytics on complex data
  • 13. Apache Drill - Delivering New Capabilities for BI • Chart: agility & business value vs. technology enablement for BI • 1980s-1990s: IT-driven BI (IT-created reports, spreadsheets) • 2000s: IT-driven BI plus self-service BI (analyst-driven with IT support for ETL) • Now: IT-driven BI, self-service BI, plus self-service data exploration (analyst-driven with no IT dependency)
  • 14. Self-Service Data Exploration: direct access to Hadoop data from familiar BI / analytics tools, ANSI SQL compatible • Ad-hoc reporting queries • Raw data exploration • Day-zero queries • …
  • 15. Drill Supports Schema Discovery On-The-Fly • Schema declared in advance (schema on write, schema before read): – Fixed schema – Leverage schema in centralized repository (Hive Metastore) • Schema discovered on-the-fly (schema on the fly): – Fixed schema, evolving schema or schema-less – Leverage schema in centralized repository or self-describing data
  • 16. © 2015 MapR Technologies 16
  • 17. Apache Drill: Built for semi-structured data
  • 18. Flexibility in how you describe your data • Drill doesn’t require a schema; it detects file types based on – extensions – magic bytes (e.g. PAR1) – system settings • A query can be planned on any file, anywhere • Data types are determined as data arrives • Some formats have a known schema – If they don’t, you can expose them as such through views – Views are simply JSON files that define view SQL
  • 19. Familiarity • Integrates with standard BI tools through ANSI SQL and APIs to provide access to: – Files – HBase – Hive – <Whatever> • Hive UDFs can be used directly in Drill
  • 20. Real life examples (terminal recordings) • A simple use case exploring more complex JSON (Twitter data collected using Flume): – http://showterm.io/4a770a2f02f956a4b7e42 – http://showterm.io/0f203718698ee3f12e532 • A simple use case of accessing a CSV file using Drill: – http://showterm.io/2a807546708217772a115 • Easy extraction of data from HBase using Drill: – http://showterm.io/0666429adb0d673f676bd • A simple example of using home-made integrations (JDBC): – http://showterm.io/f1c8a632bc8dd78e33a7d
  • 21. Real life examples (terminal recordings), continued • How to access views and show the explain plan of a view: – http://showterm.io/b521276d7d829f70e09db • A simple offloading case where data is taken from MySQL and brought into Parquet: – http://showterm.io/c8046f815c82868a7a64f • A simple join between CSV, Parquet, and MySQL: – http://showterm.io/07b83c10a4696849e04f3
  • 22. Apache Drill • Pioneering data agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics & operational applications • 40+ contributors • 150+ years of experience building databases and distributed systems
  • 23. More information and resources • Apache Foundation project: – http://drill.apache.org • MapR Drill sandbox downloads and more: – http://www.mapr.com/products/drill • Demo of online retailer use case with Apache Drill: – https://www.youtube.com/watch?v=FkcegazNuio
  • 24. Q&A • Engage with us! @mapr, maprtech, MapR, mapr-technologies • mpierre@mapr.com
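
Slide 18 says Drill detects file types from extensions, magic bytes (e.g. PAR1), and system settings. The idea can be illustrated with a minimal Python sketch. This is not Drill's implementation: the function name `detect_format`, the extension table, and the `default_format` fallback (standing in for "system settings") are all assumptions made for illustration. The only external fact relied on is that Parquet files begin with the four bytes `PAR1`.

```python
import os

# Parquet files start (and end) with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"

# Hypothetical extension table, for illustration only.
EXTENSION_FORMATS = {
    ".json": "json",
    ".csv": "csv",
    ".parquet": "parquet",
}


def detect_format(path, default_format=None):
    """Guess a file's format: extension first, then magic bytes, then a default."""
    # 1. Extension: cheapest check, needs no I/O.
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[ext]

    # 2. Magic bytes: read only the first few bytes of the file.
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
    if head == PARQUET_MAGIC:
        return "parquet"

    # 3. Fallback: a configured default (assumed stand-in for system settings).
    return default_format
```

With this ordering, a file named `events.csv` is classified by name alone, while an extensionless file is opened only long enough to read four bytes, so nothing has to be declared before the first query.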

Editor's Notes

  1. The power of MapR begins with the power of open source innovation and community participation. In some cases MapR leads the community, in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop). In other areas, MapR contributes to and integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management. MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually, as new projects become production-ready and based on customer demand.
  2. What is the source of this data growth? While structured data growth has been relatively modest, the growth in unstructured data has been exponential. Source of statistic: http://link.springer.com/chapter/10.1007/978-3-642-39146-0_2
  3. Needs standard data architecture shapes/icons
  4. TODO: Add Impala and Splunk logos