MapR 5.2 Product Update

© 2016 MapR Technologies
5.2 Product Update
MapR Product Mgmt & Product Marketing
Aug 17, 2016

Today’s Presenters
• Sameer Nori, Sr. Product Marketing Manager
• Prashant Rathi, Sr. Product Manager
• Ian Downard, Technical Marketing Engineer
• Balaji Mohanam, Product Manager

Today’s Agenda
• Recent Product Announcements
• The Spyglass Initiative & Demo
• MapR Ecosystem Pack (MEP)
• Spark and Drill updates

The MapR Converged Data Platform

Recent Product Announcements
• Quick Start Solution focused on Risk Management for Financial Services – July 2016
• Enterprise-Grade Spark Distribution – June 2016
• Quick Start Migration Service – May 2016
• Stream Processing On-Demand Training (ODT) – Apr 2016
• Apache Drill 1.6 – Mar 2016

Four Big Themes in the 5.2 Release
Major new features
• MapR-DB
– JSON Table replication
– Binary Elastic Search v2.x support
– Drill DB JSON improvements
• Streams
– Performant Spark Streaming
– Stream Admin APIs
Easier Management
• Spyglass: deep visibility across cluster ops
– Deep visibility: search across metrics and logs
– Full control: customizable, shareable dashboards
– Extensible
• Various Graphical Installer improvements
Community Innovation
• MapR Eco Pack 1.0
– Supportability and Stability
– Currency and Commitment to SLA
– Easy deployment and upgrade
Customer requested features
• POSIX: HardLink and StatFS feature
• Fast Failover for client
• Fuse Client performance
• Rack Reliability for data placement enhancement
• File Client Impersonation enhancements

5.2 Ecosystem Support
These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of them have already been released for 5.1.

            Eco on 5.1 today                                     MEP 1.0 on 5.2
Component   Released with 5.1   Subsequently released for 5.1
Drill       1.4                 1.6                              1.6
Spark       1.5.2               1.6.1                            1.6.1
Impala      2.2.0               2.5                              2.5
Storm       0.10.0              0.10.1                           0.10.1
Mahout      0.11.2              0.12.2                           0.12.2

4 Reasons to Step Up to MapR 5.2
1. New features in the MapR Converged Data Platform
2. Ecosystem updates
3. Continuing quality improvements
4. End-of-maintenance for prior releases

Project Spyglass

MapR Vision: Maximizing User/Operator Productivity
• Deep Visibility
• Easy Management
• Full Control

The MapR Spyglass Initiative
• New approach for increasing user and administrator productivity
– Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with upcoming release
– Phase 1 – MapR Monitoring
– Initial focus on operational visibility
• Helps community innovate faster
– Extensive use of open source visualization and dashboarding tools

Spyglass Initiative Phase 1 - MapR Monitoring
Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.
• Converged
• Customizable
• Extensible

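MapR Monitoring Architecture
[Figure: a three-stage pipeline of Collection, Aggregation & Storage, and Visualization. Data sources are node environmentals (CPU, Mem, I/O) and service daemons (YARN, Drill, Hive, etc.), gathered by log shippers and metrics collectors. The diagram also shows Alerting and the MapR Control System, and marks some elements as "Future".]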

Project Spyglass – Monitoring All You Care About
Node/Infrastructure Monitoring
• Global aggregate (average, min, max) charts (e.g. CPU, disk utilization)
• Per-node charts (e.g. I/O throughput by disk)
• MFS reads/writes and throughput
• DB puts, gets, scans and cache metrics
Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage utilization trend
• Utilization per volume and per accountable entity (data, volume, snapshot and total size)
YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers - Pending, Active
• vCores & RAM - Allocated & Used
• Per-queue charts - containers, vCores, RAM
Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services (includes YARN, Drill and Spark)
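
The "global aggregates" in the node monitoring list above can be pictured as a reduction over per-node samples. The sketch below is only an illustration of that idea, not MapR Monitoring code: the function name, the node names, and the sample values are all hypothetical.

```python
from statistics import mean

def global_aggregates(per_node):
    """Reduce one metric sampled on every node to cluster-wide aggregates."""
    values = list(per_node.values())
    if not values:
        raise ValueError("no node samples to aggregate")
    return {"average": mean(values), "min": min(values), "max": max(values)}

# Hypothetical CPU utilization samples (percent), one per node.
cpu_by_node = {"node1": 40.0, "node2": 60.0, "node3": 80.0}
summary = global_aggregates(cpu_by_node)
print(summary)
```

A per-node chart would plot the raw values in `cpu_by_node`, while a global chart would plot the three numbers in `summary` over time.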

Customizable Dashboards for Visualizing Metrics
Log Analytics

Destination to Learn and Collaborate
Blog about topics and ideas
Share code snippets and dashboards
View demos, tutorials, and videos
Engage in use case discussion/development

The Exchange
• Dashboards are defined with JSON and are easy to export and import in Grafana and Kibana
• Extend/integrate using the REST API
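
Because dashboards are plain JSON, sharing one amounts to serializing a document and loading it elsewhere. The following is a minimal sketch of that round trip using only the Python standard library. The dashboard fields shown (`title`, `panels`, `metric`) are a simplified, hypothetical layout chosen for illustration, not the exact Grafana or Kibana schema.

```python
import json

def export_dashboard(dashboard):
    """Serialize a dashboard definition to a JSON string that can be shared."""
    return json.dumps(dashboard, indent=2, sort_keys=True)

def import_dashboard(text):
    """Parse a shared dashboard definition and do a basic sanity check."""
    dashboard = json.loads(text)
    if "title" not in dashboard:
        raise ValueError("dashboard definition has no title")
    return dashboard

# Hypothetical, simplified dashboard definition.
cpu_dashboard = {
    "title": "Node CPU",
    "panels": [
        {"title": "CPU utilization", "type": "graph", "metric": "cpu.percent"},
    ],
}

exported = export_dashboard(cpu_dashboard)
restored = import_dashboard(exported)
print(restored["title"])
```

The exported string is what would be posted to a community site or loaded through a dashboard tool's import feature.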

Dashboards can be viewed on mobile devices.

Summary
● Data collection and storage infrastructure (packaged and supported)
○ Collection/storage of metrics & logs across node, storage, services
● Visualization dashboard (driven via community)
○ Sample dashboards for Grafana & Kibana
5.2 - Spyglass 1.0 GA
CUSTOMIZABLE, shareable and mobile-ready dashboards
CONVERGED monitoring with deep search
EXTENSIBLE and easy to integrate with REST API

MapR Ecosystem Pack (MEP)

What is the MapR Ecosystem Pack (MEP)?
• What is the “MapR Ecosystem”?
– A selected set of stable and popular components from the Hadoop Ecosystem that we fully support on the MapR platform.
• What is the “Pack”?
– A single repository of selected versions of these components fully tested to be interoperable.
– Available via installer or package.
– Delivered with a predictable cadence.

Where Does MEP Fit In?
[Diagram with three labeled tiers: Extended Ecosystem, MapR Ecosystem, MEP]
Support levels shown alongside the tiers:
• Community supported.
• Fully supported, updates tied to MapR core.
• Fully supported, updates follow MEP process.

An Example: Drill in MEP releases
An example of how this would look for Drill
[Timeline: August through January, showing the MapR 5.2 and MapR 6.0 core releases alongside the MEP releases below]
• MEP 1.0: Drill 1.6
• MEP 1.1: Drill 1.8
• MEP 2.0: Drill 1.9
• MEP 3.0: Drill 2.X
On our current release plan, MapR 5.2 will receive 3 different versions of Drill before updates cease.

MEP Can Be Installed Using the 5.2 Installer
Can select MapR and MEP version. Can manually select components.

Competitor Process Comparison
How our new process stacks up against the competition:
[Table comparing MapR MEP Process, Cloudera, and Hortonworks; the per-cell marks did not survive extraction. Criteria compared:]
• Predictable Cadence
• Required Component Upgrades
• Updates independent of core release
• Developer Previews
• Support For Multiple Versions
• Packaged updates

Drill and Spark Updates

Drill Product Evolution
Drill 1.0 GA
• Drill GA
Drill 1.1
• Automatic partitioning for Parquet files
• Window functions support
– Aggregate functions: AVG, COUNT, MAX, MIN, SUM
– Ranking functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK and ROW_NUMBER
• Hive impersonation
• SQL UNION support
• Complex data enhancements
• And more
Drill 1.2
• Native Parquet reader for Hive tables
• Hive partition pruning
• Multiple Hive versions support
• Hive 1.2.1 version support
• New analytical functions (LEAD, LAG, NTILE, etc.)
• Multiple window PARTITION BY clauses support
• DROP TABLE syntax
• Metadata caching
• Security support for web UI
Drill 1.3/1.4
• Improved Tableau experience with faster LIMIT 0 queries
• Metadata (INFORMATION_SCHEMA) query speedups on Hive schemas/tables
• Robust partition pruning (more data types, large number of partitions)
• Optimized metadata cache
• Improved window functions resource usage and performance
Drill 1.5/1.6
• Enhanced stability & scale
• New memory allocator
• Improved uniform query load distribution via connection pooling
• Enhanced query performance
• Early application of partition pruning in query planning
• Hive tables query planning improvements
• Row count based pruning for LIMIT N queries
• JDK 1.8
Drill 1.7
• Enhanced MaxDir/MinDir functions
• Access to Drill logs in the Web UI
• Addition of JDBC/ODBC client IP in Drill audit logs
• Monitoring via JMX
• Hive CHAR data type support
• Partition pruning enhancements
• Ability to return file names as part of queries
Themes highlighted across these releases:
• ANSI SQL window functions
• Enhanced Hive compatibility
• Query performance & scale
• Drill on MapR-DB JSON tables
• Easy monitoring of deployments

Converging SQL and JSON with Apache Drill 1.6
• Flexible and operational analytics on NoSQL
– MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
– Pushdown capabilities provide optimal interactive experience
• Enhanced query performance
– Provides better query performance via partition pruning, metadata caching and other optimizations
– Delivers up to 10-60X performance gains in query planning compared to the previous releases of Drill
• Better memory management
– Delivers greater stability and scale which enables customers to run not only larger but also more SQL workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
– Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop.
– Enhanced SQL Window functions

Drill ANSI SQL Capabilities Directly on JSON
0: jdbc:drill:drillbit=10.10.103.32> SELECT * FROM mfs.yelp_maprdb.business LIMIT 1;
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":true,"Parking":{"garage":false,"lot":true,"street":false,"valet":false,"validated":false},"Price Range":1.0,"Ambience":{},"Good For":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur Blvd
Westside
Las Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Westside"] | true | 4.0 | 4.0 | NV | business |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
0: jdbc:drill:drillbit=10.10.103.32> SELECT count(*) FROM mfs.yelp_maprdb.business;
+---------+
| EXPR$0 |
+---------+
| 42153 |
+---------+

Simplified Deployment with YARN (Drill 1.8)
● Drill as a long-running application in YARN
● Key features
○ Client tool to launch Drill as YARN application
○ New Drill application master (AM)
○ CPU & memory controls
○ Add/remove nodes to cluster
○ Multiple Drill clusters
[Figure: Drill Configuration w/YARN]

Spark 2.0

What’s in Spark 2.0?
• Structured Streaming with Spark SQL
– The ability to perform interactive queries against live streaming data.
– Output can now be aggregated in a stream for continuous applications.
– Pre-computation of analytics in a continuous fashion can occur as the data is generated
• Whole Stage Code-gen
– Provided by the second-generation Tungsten engine.
– Eliminates the need for multiple JVM calls by flattening SQL queries into one single function evaluated as bytecode at runtime.
• Dataset APIs
– Runs on the same engine as SparkSQL.
– Allows access to data from a variety of different data sources.
– Can run database-like operations or allow for passing in custom code.

Spark 2.0: Structured Streaming with Spark SQL (Alpha)
// Alpha-stage API as shown on the slide; the released Spark 2.0 API uses readStream / writeStream instead.
val records = sqlContext.read.format("json").stream("hdfs://input")
val counts = records.groupBy("user").count()
counts.write
  .trigger(ProcessingTime("5sec"))
  .outputMode(UpdateInPlace("user"))
  .format("jdbc")
  .startStream("mysql://...")
Repeated Queries
DB
User     Count
User 1   10
User 2   23
User 3   16
...      ...
Store only the processed output instead of every single record.
● Query executed repeatedly as and when the data arrives.
● Read the result from persistent storage, instead of processing the entire data set, resulting in faster access.
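
The pattern described above, keeping only the running result and updating it as data arrives, can be sketched without Spark. This is a plain-Python illustration of incremental aggregation under simplifying assumptions: a `Counter` stands in for the persistent store, and the batches and user names are hypothetical.

```python
from collections import Counter

def apply_batch(store, batch):
    """Fold one newly arrived batch into the stored per-user counts."""
    for record in batch:
        store[record["user"]] += 1
    return store

# The store holds only the aggregated output, never the raw records.
store = Counter()

# Hypothetical micro-batches arriving over time.
batches = [
    [{"user": "user1"}, {"user": "user2"}],
    [{"user": "user1"}, {"user": "user3"}],
]
for batch in batches:
    apply_batch(store, batch)

# A reader looks up the current answer without reprocessing past records.
print(store["user1"])
```

Each batch is discarded once it has been folded in, so a lookup costs the same no matter how much input has been processed so far.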

Spark 2.0 Whole Stage Code-gen: Planner
[Figure: the same physical plan drawn twice. Each copy contains two scan branches (ParquetRelation, Filter, Project), a Broadcast Hash join, a Project, a TungstenAggregate, and an Exchange. Boxes labeled "Whole Stage Codegen" mark the groups of operators that are fused together.]

Q&A
Engage with us!
1. Spyglass Initiative
https://www.mapr.com/products/spyglass-initiative
https://community.mapr.com/docs/DOC-1088
2. Ask Questions:
– Ask Us Anything about Spyglass in the MapR Community from Mon (Aug 29th) to Fri (Sep 2nd)
– https://community.mapr.com/
