ACHIEVING OPERATIONAL
EXCELLENCE WITH
HIVE AND MAPREDUCE
Confidential
CHALLENGES
Heterogeneous Application Environments
Production Hadoop Environments Contain a Variety of Application Technologies
Cluster Performance Monitoring
Cluster Monitoring Products Do Not Provide Application Insight
Application Performance Monitoring
Existing Tools Offer Limited Value for Monitoring Application Performance
This leaves us blind to the business context, priority, ownership and performance of our data applications
PERFORMANCE MONITORING & VISIBILITY
Enterprise-Scale Monitoring and Management for Big Data Apps
Connecting Business and Data
Business & Operational Context
Data & Technology
5 BEST PRACTICES TO ACHIEVE OPERATIONAL EXCELLENCE
1. Visibility
Performance monitoring and visibility into all of your big data applications
• Increase the quality and efficiency of your deployments with a single integrated view of your
data applications and real-time performance metrics across all environments.
Segmenting users, applications and environments
• Quickly understand what is happening, where and by whom in ways that are meaningful and
aligned to how your business operates.
Identify performance issues, bottlenecks and noncompliant applications and queries
• Spend less time wading through Hadoop logs, the ResourceManager and source code to find
issues with your data pipelines. Instead, use that time to optimize your environment.
Add business context to better monitor your applications
• Immediately understand the business impact of an issue, including the downstream
implications, so you can rapidly take the right corrective action.
Collaborate across teams to resolve issues faster
• When all roles that interact with an application (data scientists, developers and operations)
collaborate, the quality and efficiency of your application increases.
PERFORMANCE MONITORING & VISIBILITY
Pinpoint bottlenecks and identify causes
Monitor current executions and performance
Comprehensive view of all your data processing executions
Fully visualize your entire data pipeline
Immediately understand the status of all your data applications
See all successful, failed, pending processes…
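The status view summarizes what YARN already reports about each application. As a rough sketch only, assuming access to the standard ResourceManager REST endpoint `/ws/v1/cluster/apps` (the host name below is a placeholder, and this is not how Driven collects its telemetry), counting applications by state and final status might look like this. In YARN, `ACCEPTED` roughly corresponds to "pending".

```python
import json
from collections import Counter
from urllib.request import urlopen

# Placeholder address; 8088 is the usual ResourceManager web port.
RM_APPS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/apps"


def summarize_states(payload: dict) -> Counter:
    """Count YARN applications by (state, finalStatus).

    The ResourceManager returns {"apps": {"app": [...]}}, and "apps" is
    null when there are no applications, so guard against None.
    """
    apps = (payload.get("apps") or {}).get("app", [])
    return Counter((app["state"], app["finalStatus"]) for app in apps)


def fetch_apps(url: str = RM_APPS_URL) -> dict:
    """Fetch the raw application list from the ResourceManager."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline sample shaped like a ResourceManager response.
    sample = {"apps": {"app": [
        {"id": "application_1_0001", "state": "FINISHED", "finalStatus": "SUCCEEDED"},
        {"id": "application_1_0002", "state": "FINISHED", "finalStatus": "FAILED"},
        {"id": "application_1_0003", "state": "ACCEPTED", "finalStatus": "UNDEFINED"},
    ]}}
    for (state, final), count in sorted(summarize_states(sample).items()):
        print(f"{state:<10} {final:<10} {count}")
```

The same endpoint also accepts a `states` query parameter (for example `?states=FAILED,KILLED`) if you prefer to filter on the server side.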
PERFORMANCE MONITORING & VISIBILITY
Fully visualize your queries and data pipelines
Comprehensive view of all your data processing executions
Screenshot callouts: Source, Join Operations, Sink, Results, Surface HQL
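A pipeline view like this treats a query as a directed acyclic graph of sources, operations and sinks. The minimal sketch below uses hypothetical stage names loosely modeled on a HiveQL join and orders them with Python's standard `graphlib` module (Python 3.9+); it illustrates the idea and is not Driven's internal model.

```python
from graphlib import TopologicalSorter

# Hypothetical stages for a query such as:
#   INSERT OVERWRITE TABLE results
#   SELECT ... FROM orders o JOIN customers c ON o.cust_id = c.id;
# Each key maps a stage to the set of stages it reads from.
PIPELINE = {
    "source:orders": set(),
    "source:customers": set(),
    "join:orders_customers": {"source:orders", "source:customers"},
    "sink:results": {"join:orders_customers"},
}


def execution_order(dag):
    """Return stages so that every input appears before its consumer.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


def sources_and_sinks(dag):
    """Sources read from nothing; sinks are not read by any other stage."""
    sources = {stage for stage, inputs in dag.items() if not inputs}
    consumed = set().union(*dag.values())
    sinks = set(dag) - consumed
    return sources, sinks


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(sources_and_sinks(PIPELINE))
```

Keeping the graph explicit makes it straightforward to draw the stages left to right or to locate the join that feeds a slow sink.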
5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS
2. Segmentation
SEGMENTATION
Signal Over Noise
Quickly find and filter what you are
looking for and save as a custom view
Views can be private, shared with a team,
or made public
Quickly view application data by
cluster, owner, technology, etc.
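Saved views combine a filter over application metadata with a sharing rule. The sketch below is a hypothetical model (the class, field and enum names are illustrative assumptions, not Driven's API) showing how a view could match applications by cluster, owner or technology and decide who may see it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"  # only the view's owner
    TEAM = "team"        # the owner plus members of one team
    PUBLIC = "public"    # every user


@dataclass
class SavedView:
    """A hypothetical saved filter over application metadata."""
    name: str
    owner: str
    visibility: Visibility
    team: Optional[str] = None
    filters: dict = field(default_factory=dict)  # e.g. {"cluster": "A"}

    def matches(self, app: dict) -> bool:
        """True when every filter key equals the app's value for that key."""
        return all(app.get(key) == value for key, value in self.filters.items())

    def visible_to(self, user: str, user_teams: set) -> bool:
        """Apply the private / team / public sharing rules."""
        if user == self.owner or self.visibility is Visibility.PUBLIC:
            return True
        if self.visibility is Visibility.TEAM:
            return self.team in user_teams
        return False


if __name__ == "__main__":
    apps = [
        {"name": "daily_sales", "cluster": "A", "owner": "ana", "technology": "Hive"},
        {"name": "score_users", "cluster": "B", "owner": "raj", "technology": "MapReduce"},
    ]
    view = SavedView("Cluster A Hive jobs", owner="ana", visibility=Visibility.TEAM,
                     team="data-eng", filters={"cluster": "A", "technology": "Hive"})
    print([a["name"] for a in apps if view.matches(a)])  # ['daily_sales']
    print(view.visible_to("raj", {"data-eng"}))          # True
    print(view.visible_to("li", {"marketing"}))          # False
```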
5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS
3. Identify Problems
QUICKLY DRILL DOWN TO EXPOSE ROOT CAUSE
Create Jira issues that include views and data so teams can collaborate quickly to resolve performance problems
With one click, create a Jira
issue with a link to this view
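The one-click integration is built into the product. For teams that want to script the same hand-off themselves, a minimal sketch against Jira's REST API v2 endpoint `POST /rest/api/2/issue` might look like this. The base URL, project key `OPS`, issue type `Bug` and view URL are placeholder assumptions, and authentication (for example an API token header) is deliberately omitted.

```python
import json
from urllib.request import Request


def build_jira_issue_request(base_url, project_key, summary, view_url, details=""):
    """Build (but do not send) a request that creates a Jira issue
    linking back to a monitoring view."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            # Put the link to the view in the description so responders can open it.
            "description": f"{details}\n\nMonitoring view: {view_url}".strip(),
            "issuetype": {"name": "Bug"},  # assumes this issue type exists in the project
        }
    }
    return Request(
        url=base_url.rstrip("/") + "/rest/api/2/issue",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_jira_issue_request(
        "https://jira.example.com/", "OPS", "Nightly ETL slowdown",
        "https://driven.example.com/views/123",
    )
    print(req.get_method(), req.full_url)
```

Sending it is then a matter of adding credentials and passing the request to `urllib.request.urlopen`.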
IDENTIFY BOTTLENECKS AND SLOWDOWNS
Pinpoint bottlenecks and identify causes
CHOOSE METRICS
UNDERSTAND BEHAVIORS
VISUALIZE SLOWDOWNS
DRILL DOWN TO QUERY PERFORMANCE VIEW
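Spotting a slowdown usually means comparing a run against that application's own history rather than a global threshold. The sketch below takes one simple approach, flagging a run that exceeds a multiple of the median of earlier runs; the 1.5x factor, the three-run minimum history and the application names are illustrative assumptions, not values from the product.

```python
from statistics import median


def flag_slowdowns(durations_by_app, factor=1.5, min_history=3):
    """Flag apps whose latest run is more than `factor` x their median history.

    durations_by_app maps an application name to run durations in seconds,
    oldest first. Apps with fewer than `min_history` earlier runs are skipped
    because their baseline would not be meaningful.
    Returns {app: (latest_duration, baseline_median)}.
    """
    flagged = {}
    for app, runs in durations_by_app.items():
        if len(runs) <= min_history:
            continue
        *history, latest = runs
        baseline = median(history)
        if latest > factor * baseline:
            flagged[app] = (latest, baseline)
    return flagged


if __name__ == "__main__":
    runs = {
        "daily_sales_hql": [600, 620, 580, 1500],  # latest run is 2.5x slower
        "hourly_rollup_mr": [60, 62, 58, 65],      # within normal variation
        "new_job": [100, 400],                     # not enough history yet
    }
    print(flag_slowdowns(runs))  # {'daily_sales_hql': (1500, 600)}
```

The median is used instead of the mean so that a single earlier outlier does not inflate the baseline.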
5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS
4. Add Context
BUSINESS CONTEXT
Leverage metadata to align applications with their business context
View and sort by application metadata
Visualize executions and resource contention
Understand concurrency
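Resource contention often shows up as too many executions running at the same time. Assuming you have a start and end time for each execution (for instance the `startedTime` and `finishedTime` fields YARN reports for each application), a small sweep-line sketch can compute peak concurrency.

```python
def max_concurrency(intervals):
    """Peak number of executions running at the same time.

    intervals: iterable of (start, end) timestamps with start <= end.
    An execution that ends at time t is counted as finished before one
    that starts at t, so back-to-back runs do not overlap.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # execution begins
        events.append((end, -1))    # execution ends
    # Tuple sorting puts -1 (end) before +1 (start) at equal timestamps.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    runs = [(0, 10), (5, 15), (10, 20), (12, 13)]
    print(max_concurrency(runs))  # 3
```

Sorting the start and end events once makes this O(n log n), which scales to long execution histories.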
SURFACE ALL FAILURES
Quickly identify all failing applications
App Name
Owner
Organization
Cluster A or B
Privacy Level
Production or Dev
Custom Tags
More …
Not all problems are
created equal
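Because not all problems are created equal, the tags listed above can drive triage order. The sketch below scores failures from two of those tags (environment and privacy level); the weights and tag values are hypothetical and would need to match your own tagging scheme.

```python
# Hypothetical weights; tune them to your own tagging scheme.
ENVIRONMENT_WEIGHT = {"production": 100, "dev": 10}
PRIVACY_WEIGHT = {"restricted": 20, "internal": 5, "public": 0}


def priority(failure):
    """Score a failed application from its metadata tags (higher = more urgent)."""
    return (ENVIRONMENT_WEIGHT.get(failure.get("environment"), 0)
            + PRIVACY_WEIGHT.get(failure.get("privacy"), 0))


def triage(failures):
    """Most urgent failures first; ties keep their original order."""
    return sorted(failures, key=priority, reverse=True)


if __name__ == "__main__":
    failures = [
        {"app": "adhoc_report", "owner": "li", "environment": "dev", "privacy": "internal"},
        {"app": "billing_rollup", "owner": "ops", "environment": "production", "privacy": "restricted"},
        {"app": "clickstream_etl", "owner": "ana", "environment": "production", "privacy": "public"},
    ]
    for f in triage(failures):
        print(priority(f), f["app"], f["owner"])
```

Keeping the owner on each record means the sorted list doubles as a routing list for who should act first.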
5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS
5. Collaborate
NURTURE A CULTURE OF OPERATIONAL EXCELLENCE
Ensure that business, development and IT operations teams can collaborate seamlessly when it matters
LET’S TAKE A TOUR
For a walk-through of all the features of Driven, go to our Showcase interactive demo:
http://showcase.driven.io
THANK YOU
APPENDIX
End-to-end operational telemetry metadata for big data applications
Accessible via Web browser, command-line interface (CLI), or simple search queries
Easy integrations through JMX and the upcoming Driven SDK
…THROUGH A SCALABLE, SEARCHABLE METADATA STORE
Architecture diagram: a plugin in applications running on Hadoop clusters (YARN) sends telemetry metadata over SSL to scale-out web app servers (deployed as WAR files), which are accessed through the Web UI, CLI and JMX.
SUPPORT OPTIONS
Supporting our Customers & Community
Community
• Free community support through our mailing list and online forums:
www.cascading.org/support | forums.cascading.io
Consulting
• Short-term consulting engagements designed to help customers
with mentoring and best practices
Training
• Developer training for Cascading and Scalding; private training is also available:
www.cascading.io/services/training/
Commercial
• Varying levels of technical support are available for your production deployments:
www.cascading.io/services/support
