Strata + Hadoop World 2012: Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing

Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager.

  • Monsanto is a St. Louis-based agricultural company with one goal in mind: produce more food, fiber and fuel using fewer inputs like water and land, while improving the lives of the people around the world who benefit from our technology. Monsanto uses a systems approach to improving upon today's agricultural offerings: breeding, biotechnology and advanced agronomic practices. These three facets of our approach help farmers improve productivity, reduce the costs of farming, and grow better foods for consumers and better feed for animals. We're proud to have customers of all kinds, from large-acre, technology-driven row-crop farmers in Central Illinois all the way to farmers in Africa with very small landholdings who are just beginning to realize the benefits of modern agriculture.
  • Sustainably increasing yield, while using inputs and resources more efficiently, requires every tool at farmers' disposal. At Monsanto, we're focused on three pillars for driving yield: breeding, biotechnology and improved agronomic practices. All three are required to meet our goals.

    Basics of Breeding: Breeding, a technique farmers have practiced for thousands of years, involves bringing together two parent plants to produce offspring that contain a mixture of the parents' characteristics. Monsanto has assembled a pool of elite seed genetics (germplasm) from around the world, and we use cutting-edge technology to find desired traits for breeding more quickly, efficiently and accurately. Our primary method is genetic analysis (mapping the DNA of plants) to identify seeds with the traits we want, such as improved yield, disease resistance, suitability for a particular climate and, in the case of vegetables, better taste and nutrition.

    Basics of Biotechnology: Biotechnology is the process of inserting a gene from one species, like a plant or a bacterium, into another species. We use biotechnology to give plants desirable characteristics (or traits) that often cannot be developed through breeding. The traits we develop help farmers produce more of their crop, reduce costs and conserve resources. Examples include herbicide tolerance, insect resistance and drought tolerance. We are also working to develop traits that will benefit consumers, such as soybeans that produce healthier oils.

    Basics of Agronomics: Agronomic practices are steps farmers incorporate into their farm management systems to improve soil quality, use water more efficiently, manage crop residue and improve the environment through better fertilizer management. These steps not only improve a farmer's bottom line by decreasing input costs, but also benefit the environment by reducing water use and over-fertilization. Improved agronomics cover a broad range of practices, suitable for any type of farm. For example, a high-tech, high-productivity grower may use GPS and computer systems to automate planting for optimal row spacing and to vary inputs acre by acre, producing more and conserving more. A subsistence farmer can see significant benefits by learning about input management and optimal plant spacing to reduce costs and improve yield. Conservation tillage is a broadly applicable technique that preserves topsoil and locks in moisture.

    1. Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing • Erich Hochmuth, Bala Venkatrao, Mark Seidenstricker, Aparna Ramani • Hadoop World 2012, New York, October 25th, 2012
    2. Agenda • Introductions • Monsanto Hadoop Use Case (Operational Challenges; How Monsanto leverages Cloudera Manager & Product Demo; Key benefits of using Cloudera Manager) • Cloudera Manager (Overview; Key Features; Roadmap) • Q&A
    3. Introductions • Monsanto: Erich Hochmuth (R&D IT Data & Analytics Lead); Mark Seidenstricker (Infrastructure R&D Architect) • Cloudera: Bala Venkatrao (Director, Products); Aparna Ramani (Director, Engineering)
    4. Monsanto Serves Farmers Around the World: Working With Growers Large and Small, Row Crops and Vegetables
    5. Monsanto's Approach to Driving Yield: A System of Agriculture Working Together to Boost Productivity • Breeding: the art and science of combining genetic material to produce a new seed • Biotechnology: the science of improving plants by inserting genes into their DNA • Agronomics: the farm management practices involved in growing plants
    6. Increasing Yield through Big Data: At the Cornerstone of Yield Increases is Information & Analytics • Goal: Increased Yield • Variety: raw sequence data; unstructured sensor data; poly-structured genomic data; spatial data • Volume: PBs of NGS data; 10's of TBs of genomic data; TBs of yield data; billions of genotyping data points • Velocity: 10's of millions of yield data points/day; 100's of millions of genotyping data points/day; TBs of NGS data/week
    7. What are the Challenges of Managing a Hadoop Cluster? • Software Provisioning & Configuration Management: automated & simplified installation/patch management; streamlined cluster configuration • Enterprise-ready Tools: enterprise-grade monitoring & management capabilities; integration with the existing enterprise IT stack • Reporting & Monitoring: proactive monitoring & alerting; capacity planning • Support: Midwest location; lack of Hadoop expertise
    8. What are the Solutions? With Cloudera Manager, you get… • Intuitive Management Console: mission-control-style dashboard for the entire cluster; centralized management of the entire Hadoop ecosystem; treat the cluster as an appliance; configuration change audit & validation • Integration with Enterprise IT Management Tools: connect to corporate LDAP; the Cloudera Manager API integrates with the existing BMC platform • Comprehensive Monitoring & Alerting: proactive service-level alerts; summarized cluster-level graphs & charts; real-time series charts (MapReduce & HBase) • Historical Cluster Metrics/Reports: capacity planning (disk usage / slot capacity)
    9. What are the Benefits of Cloudera Manager? • Lowers the barrier for Hadoop administration: no need to rely solely on experts; reduces the number of administrators needed • Provides a "one-stop" holistic view: easy to understand how the overall cluster is performing • Includes pre-tuned configuration with best practices: get straight to solving the business problem • Integrates with Cloudera support: leverage the real experts, not just for bugs
    10. Cloudera Enterprise: The Platform for Big Data
    11. Why Do You Need Cloudera Manager? • Complexity: Hadoop is more than a dozen services running across many machines; hundreds of hardware components; thousands of settings; limitless permutations • Context: Hadoop is a system, not just a collection of parts; everything is interrelated; raw data about individual pieces is not enough; you must extract what's important • Efficiency: managing Hadoop with multiple tools & manual processes takes longer; complicated, error-prone workflows; longer issue resolution; lack of consistent & repeatable processes
    12. Cloudera Manager: End-to-End Administration for CDH • 1. Deploy: install, configure & start your cluster in 3 simple steps • 2. Configure & Optimize: ensure optimal settings for all hosts & services • 3. Monitor, Diagnose & Report: find & fix problems quickly; view current & historical activity & resource usage
    13. Managing Complexity: One Tool For Everything • Capabilities: deployment & configuration; monitoring; activity monitoring; workflows; events & alerts; log search; diagnostics; reporting • Do-it-yourself vs. Cloudera Enterprise • "In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations"
    14. Raw Data vs. Hadoop Intelligence: Providing Context • 1. Smart Configuration: auto-sets configurations & guards against user error • 2. Workflows: ensures that multi-step tasks are accomplished completely & in the correct sequence • 3. Dependencies: aware of how a particular action affects the rest of the cluster & manages the impact • 4. Events & Alerts: makes you aware of what's important at a Hadoop system level • 5. History: compares current & past activities for context
    15. Cloudera Manager Key Features • Installs the complete Hadoop stack in minutes via a wizard-based interface • Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface • Allows you to manage multiple clusters from a single instance of Cloudera Manager • Integrates with Active Directory • Establishes the time context globally for almost all views • Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis • Sets server roles, configures services and manages security across the cluster • Gracefully starts, stops and restarts services as needed • Supports Administrator and Read-Only users • Maintains a complete record of configuration changes with the ability to roll back to previous states • Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
    16. Cloudera Manager Key Features (Cont'd) • Gathers Hadoop logs from across the cluster and lets you view and search them • Scans Hadoop logs for irregularities and warns you before they impact the cluster • Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching • Generates email alerts when certain events occur • Consolidates all cluster activity into a single, real-time view • Shows information about the hosts in your cluster, including status, resident memory, virtual memory and roles • Visualizes health status and metrics across the cluster so you can quickly identify problem nodes and take action • Visualizes current and historical disk usage by user, group and directory • Tracks MapReduce activity on the cluster by job or user • Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution • Integrates easily with your existing enterprise-wide management and monitoring tools
    17. Cloudera Manager Roadmap • Cloudera Manager 4.1 (released 10/24/2012): platform support for CDH4.1; Cloudera Impala management & monitoring; new monitoring for ZooKeeper and Flume NG; Maintenance Mode; host decommissioning; several usability enhancements • Cloudera Manager 4.5 (early 2013): rolling upgrades/restarts; enhanced monitoring, cluster heatmaps, etc.; role groups configuration; cloud support; others: SNMP support, error handling, ISV integration, etc.
    18. Why Cloudera Manager? • End-to-End: simple administration in a single tool • Intelligent: manages Hadoop at a system level; Cloudera's experience realized in software • Efficient: simplifies complex workflows & makes administrators more productive • Best-in-Class: the only enterprise-grade Hadoop management application available
    19. Next Steps • Try out the FREE edition of Cloudera Manager • Download from: http://www.cloudera.com/products-services/tools/ • Support available via scm-users@cloudera.org • For Cloudera Enterprise subscriptions, please contact: sales@cloudera.com
    20. Q&A
    21. Key Features: Cloudera Manager
    22. Install A Cluster In 3 Simple Steps (Cloudera Manager Key Features) • 1. Find Nodes: enter the names of the hosts which will be included in the Hadoop cluster, then click Continue. • 2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified. • 3. Assign Roles: verify the roles of the nodes within your cluster and make changes as necessary.
    23. View Service Health & Performance (Cloudera Manager Key Features)
    24. Get Host-Level Snapshots (Cloudera Manager Key Features)
    25. Monitor & Diagnose Cluster Workloads (Cloudera Manager Key Features)
    26. Gather, View & Search Hadoop Logs (Cloudera Manager Key Features)
    27. Track Events From Across The Cluster (Cloudera Manager Key Features)
    28. Report On System Performance & Usage (Cloudera Manager Key Features)
    29. Visualize Health Status With Heatmaps (Cloudera Manager Key Features)
    30. Manage Multiple CDH Clusters (Cloudera Manager Key Features)
    31. Easily Configure High Availability (Cloudera Manager Key Features)
    32. Set The Time Context Globally (Cloudera Manager Key Features)
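Slide 6 gives data velocity only as orders of magnitude per day. A rough worked example converts a daily count into an average per-second rate, which shows the sustained ingest the cluster must absorb. The figures below (10 million and 100 million per day) are assumed round stand-ins for "10's of millions" and "100's of millions". They are not Monsanto's actual numbers, and real traffic will peak above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(points_per_day: float) -> float:
    """Convert a daily data-point count into an average per-second rate."""
    return points_per_day / SECONDS_PER_DAY


# Illustrative round numbers only (assumed, not Monsanto's figures):
# the low end of "10's of millions" and "100's of millions" per day.
yield_rate = per_second(10_000_000)
genotyping_rate = per_second(100_000_000)

print(f"yield: {yield_rate:.1f} points/s")            # yield: 115.7 points/s
print(f"genotyping: {genotyping_rate:.1f} points/s")  # genotyping: 1157.4 points/s
```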
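Slide 8 mentions that the Cloudera Manager API integrates with Monsanto's existing BMC platform. The deck does not show that integration. This sketch illustrates only the general idea of turning a service-health listing into alert records for an external tool.

It assumes a JSON body shaped like Cloudera Manager's REST service list: an `items` array whose entries carry `name` and `healthSummary`. Check those field names and values against the API documentation for your Cloudera Manager version. The severity mapping and the `health_alerts` helper are hypothetical. Fetching the JSON and forwarding alerts to BMC are left out.

```python
import json

# Hypothetical mapping from an assumed health value to an alert severity;
# None means "healthy, no alert".
SEVERITY = {"GOOD": None, "CONCERNING": "WARNING", "BAD": "CRITICAL"}


def health_alerts(payload: str) -> list[tuple[str, str]]:
    """Return (service name, severity) pairs for services that are not healthy.

    `payload` is assumed to look like:
    {"items": [{"name": "hdfs1", "healthSummary": "GOOD"}, ...]}
    """
    alerts = []
    for service in json.loads(payload).get("items", []):
        health = service.get("healthSummary", "UNKNOWN")
        # Any value not in the mapping (including a missing field) is
        # reported as a warning rather than silently ignored.
        severity = SEVERITY.get(health, "WARNING")
        if severity is not None:
            alerts.append((service["name"], severity))
    return alerts


sample = json.dumps({"items": [
    {"name": "hdfs1", "healthSummary": "GOOD"},
    {"name": "mapreduce1", "healthSummary": "CONCERNING"},
    {"name": "hbase1", "healthSummary": "BAD"},
]})
print(health_alerts(sample))  # [('mapreduce1', 'WARNING'), ('hbase1', 'CRITICAL')]
```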
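Slide 14 credits Cloudera Manager with two abilities: running multi-step workflows in the correct sequence, and knowing how an action on one service affects the rest of the cluster. A standard way to derive a safe start order from such dependencies is a topological sort. This sketch uses Python's `graphlib` (3.9+). The dependency map is a simplified, hypothetical example, not Cloudera Manager's internal model.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified map: each service -> services that must be up first.
DEPENDS_ON = {
    "zookeeper": set(),
    "hdfs": set(),
    "mapreduce": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Order services so each one starts after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: dict[str, set[str]]) -> list[str]:
    """Stop in reverse start order, so dependents go down before their dependencies."""
    return list(reversed(start_order(deps)))


print("start:", start_order(DEPENDS_ON))
print("stop: ", stop_order(DEPENDS_ON))
```

Independent services (here `zookeeper` and `hdfs`) may appear in either order. The sort guarantees only that every service comes after its dependencies.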
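Slide 15 says Cloudera Manager alerts you as service metrics approach critical thresholds. One common way to express this is a two-level warning/critical threshold per metric. This sketch assumes that scheme for a "higher is worse" metric. The `Threshold` class, the HDFS-capacity metric and the 80%/90% values are illustrative assumptions, not Cloudera Manager defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical two-level threshold for a metric where higher is worse."""
    warning: float
    critical: float

    def __post_init__(self) -> None:
        if self.warning > self.critical:
            raise ValueError("warning level must not exceed critical level")


def classify(value: float, t: Threshold) -> str:
    """Map a metric reading to OK, WARNING or CRITICAL."""
    if value >= t.critical:
        return "CRITICAL"
    if value >= t.warning:
        return "WARNING"
    return "OK"


# Assumed example: percentage of HDFS capacity used.
hdfs_used = Threshold(warning=80.0, critical=90.0)
for reading in (72.5, 85.0, 93.1):
    print(f"{reading}% -> {classify(reading, hdfs_used)}")
# 72.5% -> OK
# 85.0% -> WARNING
# 93.1% -> CRITICAL
```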
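Slide 16 lists disk-usage reporting by user, group and directory. Underneath, this is a group-by-and-sum over file sizes. This minimal sketch assumes a hypothetical list of file records; in practice the sizes would come from scanning HDFS. The record fields and the `usage_by` helper are made up for illustration.

```python
from collections import defaultdict


def usage_by(records: list[dict], key: str) -> list[tuple[str, int]]:
    """Total the bytes per value of `key` ("user", "group" or "dir"), largest first."""
    totals: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical file listing (sizes in bytes).
files = [
    {"user": "alice", "group": "genomics", "dir": "/data/ngs", "bytes": 700},
    {"user": "bob", "group": "genomics", "dir": "/data/ngs", "bytes": 200},
    {"user": "bob", "group": "yield", "dir": "/data/yield", "bytes": 400},
]
print(usage_by(files, "user"))  # [('alice', 700), ('bob', 600)]
print(usage_by(files, "dir"))   # [('/data/ngs', 900), ('/data/yield', 400)]
```

Keeping one such snapshot per day would give the "historical" view the slide mentions.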
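Slide 17 lists rolling upgrades/restarts as a roadmap item for Cloudera Manager 4.5, without describing how they would work. This sketch shows the general idea under stated assumptions: hosts restart in small batches, and the process halts if a batch does not come back healthy. The batch size, the `restart`/`healthy` callbacks and the halt-on-failure policy are all hypothetical, not Cloudera Manager's implementation.

```python
from typing import Callable


def rolling_batches(hosts: list[str], batch_size: int) -> list[list[str]]:
    """Split hosts into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def rolling_restart(
    hosts: list[str],
    batch_size: int,
    restart: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Restart hosts batch by batch, raising at the first unhealthy batch.

    Returns the restarted hosts, in order, if every batch comes back healthy.
    """
    done: list[str] = []
    for batch in rolling_batches(hosts, batch_size):
        for host in batch:
            restart(host)
        done.extend(batch)
        if not all(healthy(host) for host in batch):
            raise RuntimeError(f"halting rolling restart: unhealthy batch {batch}")
    return done


workers = [f"worker{n:02d}" for n in range(1, 6)]
print(rolling_batches(workers, 2))
# [['worker01', 'worker02'], ['worker03', 'worker04'], ['worker05']]
print(rolling_restart(workers, 2, restart=lambda h: None, healthy=lambda h: True))
```

Small batches keep most of the cluster serving traffic during the restart. Checking health between batches stops a bad change from spreading to every node.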
