© 2016 MapR Technologies
MapR 5.2 Product Update
MapR Product Mgmt & Product Marketing
Aug 17, 2016
Today’s Presenters
Sameer Nori
Sr. Product Marketing Manager
Prashant Rathi
Sr. Product Manager
Ian Downard
Technical Marketing Engineer
Balaji Mohanam
Product Manager
Today’s Agenda
• Recent Product Announcements
• The Spyglass Initiative & Demo
• MapR Ecosystem Pack (MEP)
• Spark and Drill updates
The MapR Converged Data Platform
Recent Product Announcements
• Quick Start Solution focused on Risk Management for Financial Services – July 2016
• Enterprise-Grade Spark Distribution – June 2016
• Quick Start Migration Service – May 2016
• Stream Processing On-Demand Training (ODT) – April 2016
• Apache Drill 1.6 – March 2016
Four Big Themes in the 5.2 Release
Major new features
• MapR-DB
– JSON table replication
– Elasticsearch v2.x support for binary tables
– Drill improvements for MapR-DB JSON tables
• Streams
– Performant Spark Streaming
– Stream admin APIs
– Easier management
• Spyglass: deep visibility across cluster operations
– Deep visibility: search across metrics and logs
– Full control: customizable, shareable dashboards
– Extensible
• Various graphical installer improvements
Community Innovation
• MapR Ecosystem Pack (MEP) 1.0
– Supportability and stability
– Currency and commitment to SLA
– Easy deployment and upgrade
Customer-requested features
• POSIX: hard link and statfs support
• Fast failover for clients
• FUSE client performance
• Rack reliability enhancement for data placement
• File client impersonation enhancements
5.2 Ecosystem Support
These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of these versions were already available for 5.1. The two middle columns show the ecosystem on 5.1 today.
Component   Released with 5.1   Subsequently released for 5.1   MEP 1.0 on 5.2
Drill       1.4                 1.6                             1.6
Spark       1.5.2               1.6.1                           1.6.1
Impala      2.2.0               2.5                             2.5
Storm       0.10.0              0.10.1                          0.10.1
Mahout      0.11.2              0.12.2                          0.12.2
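The table can be read as a per-component version diff. The sketch below compares each component's version at the 5.1 release with its MEP 1.0 version. It is a minimal illustration that uses only the version numbers listed above; the dictionary and function names are hypothetical.

```python
# Versions copied from the table above; the variable names are illustrative only.
RELEASED_WITH_5_1 = {"Drill": "1.4", "Spark": "1.5.2", "Impala": "2.2.0",
                     "Storm": "0.10.0", "Mahout": "0.11.2"}
MEP_1_0 = {"Drill": "1.6", "Spark": "1.6.1", "Impala": "2.5",
           "Storm": "0.10.1", "Mahout": "0.12.2"}


def parse_version(v: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def upgraded_components(before: dict, after: dict) -> dict:
    """Return {component: (old, new)} for every component whose version increased."""
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if parse_version(after[name]) > parse_version(before[name])
    }


for name, (old, new) in upgraded_components(RELEASED_WITH_5_1, MEP_1_0).items():
    print(f"{name}: {old} -> {new}")
```

Comparing parsed tuples instead of raw strings matters here. As strings, "0.10.1" sorts before "0.9", which would hide an upgrade.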
4 Reasons to Step Up to MapR 5.2
1. New features in the MapR Converged Data Platform
2. Ecosystem updates
3. Continuing quality improvements
4. End-of-maintenance for prior releases
Project Spyglass
MapR Vision: Maximizing User/Operator Productivity
Deep Visibility
Easy Management
Full Control
The MapR Spyglass Initiative
• New approach for increasing user and administrator productivity
– Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with the upcoming release
– Phase 1: MapR Monitoring
– Initial focus on operational visibility
• Helps community innovate faster
– Extensive use of open source visualization and dashboarding tools
Spyglass Initiative Phase 1 - MapR Monitoring
Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards that display the information in a useful way.
Converged
Customizable
Extensible
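MapR Monitoring Architecture
Figure: monitoring pipeline with Collection, Aggregation & Storage, and Visualization stages. Elements shown: data sources (node environmentals such as CPU, memory and I/O; service daemons such as YARN, Drill, Hive, etc.), log shippers, metrics collectors, the MapR Control System, and Alerting; part of the diagram is labeled Future.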
Project Spyglass – Monitoring All You Care About
Node/Infrastructure Monitoring
• Global aggregate charts (average, min, max), e.g. CPU and disk utilization
• Per-node charts (e.g. I/O throughput by disk)
• MFS reads/writes and throughput
• DB puts, gets, scans and cache metrics
Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage utilization trend
• Utilization per volume and per accountable entity (data, volume, snapshot and total size)
YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers: pending, active
• vCores & RAM: allocated & used
• Per-queue charts: containers, vCores, RAM
Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services (including YARN, Drill and Spark)
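The "global aggregate" charts summarize one metric across all nodes at each point in time. The sketch below shows that roll-up. It assumes the per-node samples have already been collected as (timestamp, node, value) tuples; the sample data and function name are hypothetical and are not Spyglass code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CPU-utilization samples (percent): (timestamp, node, value).
samples = [
    (0, "node1", 40.0), (0, "node2", 60.0), (0, "node3", 20.0),
    (60, "node1", 55.0), (60, "node2", 65.0), (60, "node3", 30.0),
]


def global_aggregates(rows):
    """Group samples by timestamp and compute cluster-wide avg/min/max per interval."""
    by_time = defaultdict(list)
    for ts, _node, value in rows:
        by_time[ts].append(value)
    return {
        ts: {"avg": mean(vals), "min": min(vals), "max": max(vals)}
        for ts, vals in sorted(by_time.items())
    }


for ts, agg in global_aggregates(samples).items():
    print(ts, agg)
```

A per-node chart skips the grouping step and plots each node's series directly. The global view trades that detail for one line per statistic.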
Customizable Dashboards for Visualizing Metrics
Log Analytics
Destination to Learn and Collaborate
Blog about topics and ideas
Share code snippets and dashboards
View demos, tutorials, and videos
Engage in use case discussion/development
Dashboards are defined in JSON and are easy to export and import in Grafana and Kibana
Extend/integrate using the REST API
The Exchange
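Because dashboards are plain JSON, sharing one amounts to serializing it and posting it to the target Grafana instance. The sketch below assumes a Grafana server whose HTTP API accepts `POST /api/dashboards/db` with a `{"dashboard": ..., "overwrite": ...}` body. It only builds the request and does not send it. The dashboard contents, server URL and API key are hypothetical, and Kibana's own import format is not covered.

```python
import json
import urllib.request

# Hypothetical minimal dashboard; real exports carry many more fields, and the
# panel schema differs between Grafana versions.
dashboard = {
    "id": None,  # None (JSON null) asks Grafana to create a new dashboard
    "title": "Cluster CPU (example)",
    "panels": [{"type": "graph", "title": "CPU utilization"}],
}


def export_dashboard(d: dict) -> str:
    """Serialize a dashboard definition so it can be shared, e.g. as a file."""
    return json.dumps(d, indent=2, sort_keys=True)


def build_import_request(base_url: str, dashboard_json: str, api_key: str) -> urllib.request.Request:
    """Wrap a dashboard in the body expected by POST /api/dashboards/db (not sent here)."""
    body = {"dashboard": json.loads(dashboard_json), "overwrite": False}
    return urllib.request.Request(
        url=f"{base_url}/api/dashboards/db",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_import_request("http://grafana.example:3000", export_dashboard(dashboard), "API_KEY")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the import against a reachable server. It is kept separate here so the export/import round trip can be inspected offline.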
Dashboards can be viewed on mobile devices.
Summary
● Data collection and storage infrastructure (packaged and supported)
○ Collection/storage of metrics & logs across nodes, storage and services
● Visualization dashboards (community-driven)
○ Sample dashboards for Grafana & Kibana
5.2 - Spyglass 1.0 GA
CUSTOMIZABLE, shareable and mobile-ready dashboards
CONVERGED monitoring with deep search
EXTENSIBLE and easy to integrate with REST API
MapR Ecosystem Pack (MEP)
What is the MapR Ecosystem Pack (MEP)?
• What is the “MapR Ecosystem”?
– A selected set of stable and popular components from the Hadoop ecosystem that we fully support on the MapR platform.
• What is the “Pack”?
– A single repository of selected versions of these components, fully tested to be interoperable.
– Available via the installer or as packages.
– Delivered on a predictable cadence.
Where Does MEP Fit In?
• Extended Ecosystem: community supported.
• MapR Ecosystem: fully supported, updates tied to MapR core.
• MEP: fully supported, updates follow the MEP process.
An Example: Drill in MEP Releases
An example of how this would look for Drill, on a timeline running from August to January (MapR 5.2 first, then MapR 6.0):
• MEP 1.0: Drill 1.6
• MEP 1.1: Drill 1.8
• MEP 2.0: Drill 1.9
• MEP 3.0: Drill 2.X
On our current release plan, MapR 5.2 will receive three different versions of Drill before updates cease.
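The release plan can be expressed as a small mapping from each MEP release to the Drill version it carries and the MapR core release it targets. The sketch below encodes that mapping. It assumes, as the timeline suggests, that MEP 1.0 through 2.0 ship for MapR 5.2 and MEP 3.0 targets MapR 6.0. The data structure and function name are illustrative, not a MapR artifact.

```python
# Illustrative encoding of the example timeline; the core-release sets are assumptions.
MEP_RELEASES = [
    {"mep": "1.0", "drill": "1.6", "core": {"5.2"}},
    {"mep": "1.1", "drill": "1.8", "core": {"5.2"}},
    {"mep": "2.0", "drill": "1.9", "core": {"5.2"}},
    {"mep": "3.0", "drill": "2.X", "core": {"6.0"}},  # "2.X" kept as written on the slide
]


def drill_versions_for(core_release: str) -> list:
    """List, in release order, the distinct Drill versions delivered to one MapR core release."""
    seen = []
    for rel in MEP_RELEASES:
        if core_release in rel["core"] and rel["drill"] not in seen:
            seen.append(rel["drill"])
    return seen


print(drill_versions_for("5.2"))  # ['1.6', '1.8', '1.9']
```

Counting the result for "5.2" reproduces the slide's statement that MapR 5.2 receives three Drill versions before updates cease.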
MEP Can Be Installed Using the 5.2 Installer
You can select the MapR and MEP versions, and you can manually select components.
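The installer choice described here can be modeled as a manifest lookup plus a validation step. The user picks a MapR version, an MEP version and, optionally, a hand-picked subset of components. The sketch below assumes a hypothetical manifest in which MEP 1.0 is paired with MapR 5.2 and contains the component versions from the 5.2 ecosystem table. It is not the installer's actual logic.

```python
# Hypothetical manifest: MEP 1.0 component versions come from the 5.2 ecosystem table.
MEP_MANIFESTS = {
    "1.0": {
        "supported_core": {"5.2"},
        "components": {"drill": "1.6", "spark": "1.6.1", "impala": "2.5",
                       "storm": "0.10.1", "mahout": "0.12.2"},
    },
}


def plan_install(core: str, mep: str, wanted=None) -> dict:
    """Return {component: version} for an install, validating the MapR/MEP pairing.

    With wanted=None every component in the MEP is selected; otherwise only the
    requested subset, all of which must come from that MEP.
    """
    manifest = MEP_MANIFESTS.get(mep)
    if manifest is None:
        raise ValueError(f"unknown MEP {mep}")
    if core not in manifest["supported_core"]:
        raise ValueError(f"MEP {mep} is not paired with MapR {core}")
    available = manifest["components"]
    names = sorted(available) if wanted is None else sorted(wanted)
    missing = [n for n in names if n not in available]
    if missing:
        raise ValueError(f"not in MEP {mep}: {missing}")
    return {n: available[n] for n in names}


print(plan_install("5.2", "1.0", {"drill", "spark"}))
```

Resolving every component from a single MEP manifest is what keeps a manual selection within the set of versions that were tested together.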
Competitor Process Comparison
How our new process stacks up against the competition (MapR MEP process vs. Cloudera vs. Hortonworks):
• Predictable cadence
• Required component upgrades
• Updates independent of core release
• Developer previews
• Support for multiple versions
• Packaged updates
Drill and Spark Updates
Drill Product Evolution
Drill 1.0 GA
• Drill GA
Drill 1.1
• Automatic partitioning for Parquet files
• Window functions support
– Aggregate functions: AVG, COUNT, MAX, MIN, SUM
– Ranking functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK and ROW_NUMBER
• Hive impersonation
• SQL UNION support
• Complex data enhancements
• … and more
Drill 1.2
• Native Parquet reader for Hive tables
• Hive partition pruning
• Support for multiple Hive versions
• Hive 1.2.1 support
• New analytic functions (LEAD, LAG, NTILE, etc.)
• Support for multiple window PARTITION BY clauses
• DROP TABLE syntax
• Metadata caching
• Security support for the Web UI
Drill 1.3/1.4
• Improved Tableau experience with faster LIMIT 0 queries
• Faster metadata (INFORMATION_SCHEMA) queries on Hive schemas/tables
• Robust partition pruning (more data types, large numbers of partitions)
• Optimized metadata cache
• Improved resource usage and performance for window functions
Drill 1.5/1.6
• Enhanced stability & scale
– New memory allocator
– Improved uniform query load distribution via connection pooling
• Enhanced query performance
– Early application of partition pruning in query planning
– Query planning improvements for Hive tables
– Row-count-based pruning for LIMIT N queries
• JDK 1.8 support
Drill 1.7
• Enhanced MAXDIR/MINDIR functions
• Access to Drill logs in the Web UI
• JDBC/ODBC client IP added to Drill audit logs
• Monitoring via JMX
• Hive CHAR data type support
• Partition pruning enhancements
• Ability to return file names as part of queries
Release themes: ANSI SQL window functions · Enhanced Hive compatibility · Query performance & scale · Drill on MapR-DB JSON tables · Easy monitoring of deployments
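The ranking window functions listed for Drill 1.1 differ mainly in how they treat ties. The sketch below reproduces their standard SQL semantics in plain Python for a single partition ordered ascending, equivalent to `... OVER (ORDER BY value)` with no PARTITION BY, to make those differences concrete. It illustrates the definitions only and is not Drill's implementation.

```python
def ranking_functions(values):
    """Compute ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK and CUME_DIST for
    one partition ordered ascending by value (standard SQL definitions)."""
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    rank = dense_rank = 0
    prev = object()  # sentinel that never equals a real value
    for i, v in enumerate(ordered, start=1):
        if v != prev:
            rank = i          # RANK jumps to the row position, leaving gaps after ties
            dense_rank += 1   # DENSE_RANK advances by one, leaving no gaps
            prev = v
        # CUME_DIST counts rows whose value is <= v, so tied peers are included.
        at_or_before = sum(1 for x in ordered if x <= v)
        rows.append({
            "value": v,
            "row_number": i,
            "rank": rank,
            "dense_rank": dense_rank,
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            "cume_dist": at_or_before / n,
        })
    return rows


for r in ranking_functions([10, 20, 20, 30]):
    print(r)
```

With the tie at 20, ROW_NUMBER still assigns 2 and 3, RANK gives both rows 2 and skips to 4, and DENSE_RANK gives both rows 2 and continues with 3.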
Converging SQL and JSON with Apache Drill 1.6
• Flexible and operational analytics on NoSQL
– The MapR-DB plugin lets analysts run SQL queries directly on JSON data in MapR-DB tables
– Pushdown capabilities provide an optimal interactive experience
• Enhanced query performance
– Better query performance through partition pruning, metadata caching and other optimizations
– Up to 10-60x performance gains in query planning compared with previous Drill releases
• Better memory management
– Greater stability and scale, enabling customers to run both larger and more SQL workloads on a MapR cluster
• Improved integration with visualization tools such as Tableau
– Client impersonation for end-to-end security, from the visualization tool to the data in Hadoop
– Enhanced SQL window functions
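Partition pruning means the planner discards partitions whose key values cannot satisfy the query's filter, before any data is read. The sketch below assumes a hypothetical table laid out in year/month directories, which Drill exposes as the implicit columns `dir0` and `dir1`, and a simple predicate. The paths and helper are illustrative, not Drill's planner.

```python
# Hypothetical directory layout: /logs/<year>/<month>/...
partitions = [
    "/logs/2015/11", "/logs/2015/12", "/logs/2016/01", "/logs/2016/02", "/logs/2016/03",
]


def prune(paths, predicate):
    """Keep only partitions whose directory values pass the predicate.

    Each path's components below the table root are exposed as dir0, dir1, ...
    (mirroring Drill's implicit directory columns), so the filter is evaluated
    on the path alone, without opening any files.
    """
    kept = []
    for path in paths:
        parts = path.strip("/").split("/")[1:]  # drop the table root ("logs")
        dirs = {f"dir{i}": p for i, p in enumerate(parts)}
        if predicate(dirs):
            kept.append(path)
    return kept


# Equivalent in spirit to: WHERE dir0 = '2016' AND dir1 < '03'
scan = prune(partitions, lambda d: d["dir0"] == "2016" and d["dir1"] < "03")
print(scan)  # ['/logs/2016/01', '/logs/2016/02']
```

Only two of the five partitions survive, so the scan reads 40% of the directories. The saving grows with the number of partitions, which is why robust pruning over large partition counts appears in the Drill release notes.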
Drill ANSI SQL Capabilities Directly on JSON
0: jdbc:drill:drillbit=10.10.103.32> SELECT * FROM mfs.yelp_maprdb.business LIMIT 1;
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":true,"Parking":{"garage":false,"lot":true,"street":false,"valet":false,"validated":false},"Price Range":1.0,"Ambience":{},"Good For":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur Blvd
Westside
Las Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Westside"] | true | 4.0 | 4.0 | NV | business |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
0: jdbc:drill:drillbit=10.10.103.32> SELECT count(*) FROM mfs.yelp_maprdb.business;
+---------+
| EXPR$0 |
+---------+
| 42153 |
+---------+
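Because Drill queries the JSON documents directly, nested fields such as `attributes.Parking.lot` in the row above can be addressed by path. In Drill SQL this goes through a table alias, for example `b.attributes.Parking.lot`. The sketch below mimics that path lookup in Python on a trimmed copy of the record shown. The helper name is hypothetical. Missing paths return None, roughly as Drill returns NULL for fields a document lacks.

```python
# Trimmed copy of the record returned by the first query above.
business = {
    "_id": "--1emggGHgoG6ipd_RMb-g",
    "name": "Sinclair",
    "city": "Las Vegas",
    "stars": 4.0,
    "categories": ["Food", "Convenience Stores"],
    "attributes": {
        "Accepts Credit Cards": True,
        "Parking": {"garage": False, "lot": True, "street": False},
    },
}


def get_path(doc, dotted):
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_path(business, "attributes.Parking.lot"))  # True
print(get_path(business, "attributes.WiFi"))         # None
```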
Simplified Deployment with YARN (Drill 1.8)
● Drill as a long-running application in YARN
● Key features
○ Client tool to launch Drill as a YARN application
○ New Drill application master (AM)
○ CPU & memory controls
○ Add/remove nodes to/from the cluster
○ Multiple Drill clusters
Drill Configuration w/YARN
Spark 2.0
What’s in Spark 2.0?
• Structured Streaming with Spark SQL
– The ability to run interactive queries against live streaming data.
– Output can now be aggregated in a stream for continuous applications.
– Analytics can be pre-computed continuously as the data is generated.
• Whole-stage code generation
– Provided by the second-generation Tungsten engine.
– Avoids many virtual function calls between operators by collapsing each query stage into a single generated function, compiled to JVM bytecode at runtime.
• Dataset API
– Runs on the same engine as Spark SQL.
– Allows access to data from a variety of data sources.
– Supports database-like operations as well as custom code.
Spark 2.0: Structured Streaming with Spark SQL (Alpha)
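import org.apache.spark.sql.streaming.ProcessingTime
import org.apache.spark.sql.types.{StringType, StructType}
val schema = new StructType().add("user", StringType)  // streaming file sources need an explicit schema
val records = spark.readStream.schema(schema).json("hdfs://input")  // spark: the SparkSession
val counts = records.groupBy("user").count()
counts.writeStream.trigger(ProcessingTime("5 seconds"))
  .outputMode("complete")     // emit the full, updated count table on every trigger
  .format("console").start()  // 2.0 has no built-in JDBC sink; writing to mysql://... needs a ForeachWriter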
Repeated queries update a results table in the DB:
User      Count
User 1    10
User 2    23
User 3    16
……..      ……..
Store only the processed output instead of every single record.
● The query is executed repeatedly as new data arrives.
● Consumers read the result from persistent storage instead of processing the entire data set, resulting in faster access.
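In the repeated-query model, each trigger processes only the newly arrived records and folds them into a running result, rather than rescanning everything. The plain-Python sketch below simulates that incremental model. It assumes records arrive as micro-batches of user names; it mimics the semantics only and does not use Spark.

```python
from collections import Counter

# Hypothetical micro-batches, one per trigger interval.
batches = [
    ["user1", "user2", "user1"],
    ["user3", "user2"],
    ["user1"],
]


def run_incremental(micro_batches):
    """Fold each micro-batch into running per-user counts.

    Only the new batch is scanned on each trigger; the running state plays the
    role of the result table that a streaming query keeps up to date.
    """
    state = Counter()
    snapshots = []
    for batch in micro_batches:
        state.update(batch)             # work proportional to the batch, not the history
        snapshots.append(dict(state))   # what a "complete" output would emit
    return snapshots


for i, snap in enumerate(run_incremental(batches), start=1):
    print(f"trigger {i}: {snap}")
```

The final snapshot equals a batch count over all records. That equivalence is the property incremental execution must preserve while avoiding a full rescan on every trigger.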
Spark 2.0 Whole Stage Code-gen: Planner
Figure: the same physical plan drawn twice (operators ParquetRelation, Filter, Project, Broadcast Hash Join, TungstenAggregate and Exchange), the second time with the operators grouped into Whole Stage Codegen stages.
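Whole-stage code generation replaces a chain of operators that pass rows to one another with a single generated loop. The Python sketch below contrasts the two shapes for a Filter → Project → Aggregate pipeline: per-operator iterators in the classic Volcano style, and one fused loop. Python does not generate bytecode the way Tungsten does, so this illustrates only the structural difference, not the speedup. The row data and class names are hypothetical.

```python
# Operator-at-a-time ("Volcano") style: each operator pulls rows from its child.
class Scan:
    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def __iter__(self):
        return (r for r in self.child if self.pred(r))


class Project:
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def __iter__(self):
        return (self.fn(r) for r in self.child)


def volcano_sum(rows):
    plan = Project(Filter(Scan(rows), lambda r: r["qty"] > 1),
                   lambda r: r["qty"] * r["price"])
    return sum(plan)  # the aggregate pulls every value through the operator chain


# "Whole stage" style: the same logic fused into one loop with no per-operator calls.
def fused_sum(rows):
    total = 0
    for r in rows:
        if r["qty"] > 1:                    # Filter
            total += r["qty"] * r["price"]  # Project + Aggregate
    return total


rows = [{"qty": 1, "price": 5}, {"qty": 2, "price": 3}, {"qty": 4, "price": 2}]
print(volcano_sum(rows), fused_sum(rows))  # 14 14
```

Both versions compute the same result. The fused loop removes the per-row hand-offs between operators, which is the overhead whole-stage codegen targets.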
Q&A: Engage with us!
1. Spyglass Initiative
https://www.mapr.com/products/spyglass-initiative
https://community.mapr.com/docs/DOC-1088
2. Ask Questions:
– Ask Us Anything about Spyglass in the MapR Community from Monday (Aug 29) to Friday (Sep 2)
– https://community.mapr.com/
