Harnessing the Power of Apache Hadoop Series

How to Manage Your Apache Hadoop Lifecycle.
So you’ve got Apache Hadoop in development. Now what? In this webinar, Cloudera’s VP of Products Charles Zedlewski will explain how to plan for and manage the Apache Hadoop lifecycle inside a Cloudera deployment.

Transcript of "Harnessing the Power of Apache Hadoop Series"

  1. August 4, 2011
     Managing the Apache Hadoop lifecycle
     Charles Zedlewski, Vice President, Product
  2. The good and bad news – Hadoop means business
     ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
     Use cases by industry:
     Industry         Data processing              Advanced analytics
     Web              Clickstream Sessionization   Social Network Analysis
     Media            Clickstream Sessionization   Content Optimization
     Telco            Mediation                    Network Analytics
     Retail           Data Factory                 Loyalty & Promotions Analysis
     Financial        Trade Reconciliation         Fraud Analysis
     Federal          SIGINT                       Entity Analysis
     Bioinformatics   Genome Mapping               Sequencing Analysis
  3. You have reasonable asks for Hadoop
     [Diagram: Activity Monitor, Patch / Hot Fix, Restore / Recover, Hadoop Operations]
     - Discover issues before they start to impact your business & operational goals
     - Quickly diagnose the root cause of issues so you know what to improve
     - Quickly take action and solve issues at their root cause
     - Continuously optimize policies to improve system availability and QOS in the long term
  4. But Hadoop is special…
     Good:
     - Fault tolerant
     - Scalable
     - Widespread
     Bad:
     - Distributed
     - Verbose
     - Multi-layered
     - Hot market for skills
  5. Hadoop in POC – the hunter-gatherers
     - Everyone does the same job
     - Still amazed by this fire thing
     - Life is nasty, brutish and short
     - Little distinction between job code, Hadoop code & configuration
  6. First principles – set business goals
     What are the business outcomes Hadoop is supposed to deliver?
     - New insights
     - Lower business costs
     - Lower IT costs
     - More data under management
     - More revenue through better targeting and conversion
     - Other?
  7. First principles – set operations goals
     - Performance
     - Utilization
     - Cost of operations
     - Availability
     - Quality of service
     - Flexibility / elasticity
     - Security
     - Transparency
     - Other?
  8. System design – stick to the basics
     - Hadoop needs to know where its hard drives are
       - Running on a virtualized layer is a bad idea
       - RAID is a bad idea
       - Running on remote storage is the worst idea
     - Servers – prioritize flexibility over bells and whistles
       - How easily will you be able to expand your cluster?
       - How easily can you evolve your core / spindle ratio?
       - How many companies support that exotic chip, card, drive, power supply, etc.?
     - Network – prioritize quality over bells & whistles
       - 10G on the backplane is usually unnecessary
       - Plan how to adapt your topology as your cluster grows
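Planning topology matters because Hadoop learns which rack each node sits in by running an administrator-supplied topology script (set with `topology.script.file.name` in Hadoop 0.20-era releases such as CDH3, and `net.topology.script.file.name` in later versions). The script receives one or more IPs or hostnames as arguments and prints one rack path per argument. The following is a minimal Python sketch of such a script; the host-to-rack table, the addresses, and the rack names are hypothetical, and in practice the table would be regenerated from your inventory as the cluster grows. Unknown hosts fall back to `/default-rack`, which is Hadoop's own default rack name.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-topology script (host table is hypothetical)."""
import sys
from typing import Dict, List

# Hypothetical host-to-rack table; regenerate it from your inventory
# whenever nodes or racks are added.
RACKS: Dict[str, str] = {
    "10.1.1.11": "/dc1/rack1",
    "10.1.1.12": "/dc1/rack1",
    "10.1.2.11": "/dc1/rack2",
    "worker-21.example.com": "/dc1/rack2",
}

# Same name Hadoop uses when it has no topology information for a host.
DEFAULT_RACK = "/default-rack"


def resolve(hosts: List[str], table: Dict[str, str] = RACKS) -> List[str]:
    """Return one rack path per host, in the same order as the input."""
    return [table.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts as arguments and reads whitespace-separated
    # rack paths from stdout, one per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

For example, running the script (saved under a hypothetical name such as `topology.py`) with the arguments `10.1.1.11 10.9.9.9` would print `/dc1/rack1 /default-rack`. Keeping the mapping in a plain table makes it easy to extend as racks are added, which is the point of planning the topology up front.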
  9. Hadoop in production – the tribe
     - We have a chief that looks out for the tribe
     - Make sure there’s enough fire for everyone
     - Survival of the tribe is still the main concern
     - Job code distinct from the rest of Hadoop
  10. Train your chief!
      - Unix & DBA backgrounds are both valid starting points
  11. Then empower your chief!
      Managing Hadoop requires:
      - Sensible selection of hardware
      - Visibility into users, jobs, activities, hardware, operating system, services, logs and more
      - Ability to make changes to configurations, services, patch levels and more
      In many organizations the chief is precluded from some of these decisions / actions by preexisting policy.
      Take an “appliance mentality” to Hadoop decision making.
  12. Discovery – monitoring & alerting
      You want to anticipate & alert on:
      - Health checks & status of key nodes (NameNode, Master, etc.)
      - Completion & performance of jobs & pipelines (for SLA measurement)
      - System performance & availability
      - Log events (only specific ones)
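The SLA item can be made concrete with a small check over job or pipeline run records. The sketch below assumes a hypothetical `PipelineRun` record (name, start time, end time, allowed duration) that your monitoring tool would populate. It is not any particular product's API. A run is flagged if it finished late or if it is still running past its allowed duration, so the alert fires before the pipeline eventually completes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PipelineRun:
    """Hypothetical record of one job or pipeline run."""
    name: str
    started: datetime
    finished: Optional[datetime]  # None while the run is still in progress
    sla: timedelta                # maximum allowed wall-clock duration


def sla_alerts(runs: List[PipelineRun], now: datetime) -> List[str]:
    """Return one alert message per run that missed, or is already past, its SLA."""
    alerts = []
    for run in runs:
        # Unfinished runs are measured up to "now" so they can alert early.
        end = run.finished if run.finished is not None else now
        elapsed = end - run.started
        if elapsed > run.sla:
            state = "finished late" if run.finished is not None else "still running past SLA"
            alerts.append(f"{run.name}: {state} ({elapsed} > {run.sla})")
    return alerts


if __name__ == "__main__":
    now = datetime(2011, 8, 4, 6, 0)
    runs = [
        PipelineRun("nightly-sessionize", datetime(2011, 8, 4, 1, 0),
                    datetime(2011, 8, 4, 3, 30), timedelta(hours=2)),
        PipelineRun("hourly-ingest", datetime(2011, 8, 4, 5, 0),
                    datetime(2011, 8, 4, 5, 20), timedelta(minutes=30)),
        PipelineRun("daily-report", datetime(2011, 8, 4, 2, 0),
                    None, timedelta(hours=3)),
    ]
    for alert in sla_alerts(runs, now):
        print(alert)
    # nightly-sessionize: finished late (2:30:00 > 2:00:00)
    # daily-report: still running past SLA (4:00:00 > 3:00:00)
```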
  13. Diagnosis
      7 lenses into Hadoop, used in combination:
      - Service metrics
      - System metrics
      - Configurations
      - Change history
      - Log history
      - Activities, jobs & tasks
      - Stack trace / profiling
      One lens rarely tells the whole story.
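To illustrate using two lenses in combination, the sketch below lines up a metric anomaly (the service metrics lens) with the change history lens. Given the time an anomaly was first seen, it lists recent changes to the affected service, newest first. The `ChangeEvent` record, the service names, the change descriptions and the six-hour lookback are all assumptions for illustration; a real investigation would also pull in configurations, logs and task-level data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ChangeEvent:
    """Hypothetical change-history entry (config edit, restart, patch, ...)."""
    when: datetime
    service: str
    description: str


def suspect_changes(changes: List[ChangeEvent], anomaly_at: datetime,
                    service: str, lookback: timedelta) -> List[ChangeEvent]:
    """Changes to `service` within `lookback` before the anomaly, newest first."""
    window_start = anomaly_at - lookback
    hits = [c for c in changes
            if c.service == service and window_start <= c.when <= anomaly_at]
    return sorted(hits, key=lambda c: c.when, reverse=True)


if __name__ == "__main__":
    history = [
        ChangeEvent(datetime(2011, 8, 3, 10, 0), "mapreduce", "resized task slots"),
        ChangeEvent(datetime(2011, 8, 3, 20, 0), "mapreduce", "restarted TaskTrackers"),
        ChangeEvent(datetime(2011, 8, 3, 23, 0), "hdfs", "edited DataNode config"),
        ChangeEvent(datetime(2011, 8, 4, 0, 30), "mapreduce", "changed scheduler pools"),
    ]
    # Suppose job runtimes (service metrics lens) spiked at 01:00 on August 4.
    for c in suspect_changes(history, datetime(2011, 8, 4, 1, 0),
                             "mapreduce", timedelta(hours=6)):
        print(c.when, c.description)
    # 2011-08-04 00:30:00 changed scheduler pools
    # 2011-08-03 20:00:00 restarted TaskTrackers
```

The lookback window is a judgment call. Too short and the real cause is missed, too long and every change becomes a suspect, which is one reason a single lens rarely settles the question on its own.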
  14. Avoid the scripts
      - Script to run a check
      - Script to import a file
      - Script to preempt a job
      - Script to instrument a daemon
      - Script to…
  15. The web of scripts – where it ends
      - Nothing ever changes or improves
      - Garish, jerry-rigged
      - Time goes into maintaining scripts, not achieving the objectives
      - One and only one person loves it
  16. Hadoop as a standard platform
      - Fire is not a big deal any more
      - Pollution, congestion, etc. are a concern
      - More specialized roles
      - Patching, updating, upgrading, configuring and tuning are all distinct
  17. Optimize – plan for multi-tenancy
      Definition: the ability of disparate groups, users, data and workloads to operate concurrently on one logical Hadoop system
      Multi-tenancy helps you get more of what you really want:
      - Better performance
      - Better cost of operations
      - New insights
      - Greater availability
      Multi-tenancy has some additional considerations.
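One way to reason about several tenants sharing one cluster is max-min fairness. Tenants that ask for less than an equal share get everything they ask for, and the remaining capacity is split evenly among the rest. The sketch below computes such an allocation over task slots. It is a generic textbook technique, not the Hadoop Fair Scheduler's implementation (which adds minimum shares, weights and preemption), and the tenant names and numbers are hypothetical.

```python
from typing import Dict


def max_min_fair_share(capacity: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Split `capacity` task slots among tenants using max-min fairness.

    Tenants asking for less than an equal share get all they ask for; the
    leftover is divided evenly among tenants that still want more. When
    fewer slots remain than hungry tenants, one slot each goes out in
    name order so the result is deterministic.
    """
    alloc = {tenant: 0 for tenant in demands}
    remaining = capacity
    unmet = {t for t, d in demands.items() if d > 0}
    while remaining > 0 and unmet:
        share = remaining // len(unmet)
        if share == 0:
            for tenant in sorted(unmet)[:remaining]:
                alloc[tenant] += 1
            break
        for tenant in sorted(unmet):
            give = min(share, demands[tenant] - alloc[tenant])
            alloc[tenant] += give
            remaining -= give
        # Tenants whose demand is now satisfied drop out of the next round.
        unmet = {t for t in unmet if alloc[t] < demands[t]}
    return alloc


if __name__ == "__main__":
    print(max_min_fair_share(100, {"etl": 20, "adhoc": 50, "reports": 70}))
    # {'etl': 20, 'adhoc': 40, 'reports': 40}
```

With 100 slots and demands of 20, 50 and 70, `etl` receives its full 20 and the other two tenants split the remaining 80 evenly. No tenant can gain without taking slots from a tenant that already has fewer, which is what makes the split "fair" in the max-min sense.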
  18. Optimize – policies for permissions
      Authentication
      - Don’t talk to strangers
      - Should integrate with existing IT infrastructure
      - Authentication (Kerberos) patches are now part of CDH3
      Authorization
      - Not everyone can access everything
      - Example: production data sets are read-only to quants / analysts, who have home or group directories for derived data sets
      - Mostly enforced via HDFS permissions, so directory structure and organization are critical
      - Not as fine-grained as column-level access in an EDW or RDBMS (but this is coming)
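HDFS permissions follow a POSIX-like model. Each file or directory has an owner, a group and mode bits, and a request is checked against exactly one class: owner if the user owns the path, otherwise group if the user belongs to the path's group, otherwise other. The superuser bypasses the check. The sketch below applies that rule to the example layout on the slide. The user and group names, the modes, and the assumption that the superuser is named `hdfs` (in HDFS it is whichever user runs the NameNode) are hypothetical. The sketch also ignores ancestor-directory traversal checks, supergroup membership and the sticky bit.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Permission bits within one owner/group/other class, as in POSIX.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class PathInfo:
    """Owner, group and mode of one HDFS path (values below are hypothetical)."""
    owner: str
    group: str
    mode: int  # octal, e.g. 0o755


def allowed(path: PathInfo, user: str, groups: FrozenSet[str], want: int,
            superuser: str = "hdfs") -> bool:
    """Return True if `user` may perform `want` (a mix of READ/WRITE/EXECUTE)."""
    if user == superuser:               # assumed superuser name; bypasses checks
        return True
    if user == path.owner:
        bits = (path.mode >> 6) & 0o7   # owner class
    elif path.group in groups:
        bits = (path.mode >> 3) & 0o7   # group class
    else:
        bits = path.mode & 0o7          # other class
    return (bits & want) == want


if __name__ == "__main__":
    # Production data set: writable by the ETL owner, read-only for everyone else.
    prod = PathInfo(owner="etl", group="prod", mode=0o755)
    # Analyst home directory for derived data sets, shared read-only with the group.
    home = PathInfo(owner="alice", group="analysts", mode=0o750)
    analysts = frozenset({"analysts"})
    print(allowed(prod, "alice", analysts, READ))    # True
    print(allowed(prod, "alice", analysts, WRITE))   # False
    print(allowed(home, "bob", analysts, READ))      # True
    print(allowed(home, "bob", analysts, WRITE))     # False
```

Because only one class is consulted, the directory layout (who owns what, and which group each tree belongs to) does most of the policy work. This is why the slide calls directory structure critical.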
  19. Optimize – plan for resources
      Tracking & establishing policies for usage of cluster resources:
      - Files, bytes and quotas thereof
      - Tasks, memory, IO, CPU, network and scheduling thereof
      By now you’ve almost certainly graduated to a sophisticated scheduler:
      - Policies to prevent bad behavior (e.g. auto-kill)
      - Monitor and track resource utilization across all groups
      - Periodically review queue / pool decisions to improve QOS
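HDFS supports two quotas per directory tree. A name quota limits the number of files and directories in the tree (the directory counts against its own quota), and a space quota limits bytes, with every replica counting. The sketch below checks whether one more file would fit under both. The `DirUsage` record and the numbers are hypothetical, and a replication factor of 3 is HDFS's default. Real HDFS enforces the space quota as each block is allocated, so a write may be refused somewhat earlier than this whole-file arithmetic suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirUsage:
    """Hypothetical snapshot of one directory tree's usage and quotas."""
    names_used: int             # files + directories, including the directory itself
    bytes_used: int             # raw bytes, already multiplied by replication
    name_quota: Optional[int]   # None means no name quota is set
    space_quota: Optional[int]  # None means no space quota is set


def fits(usage: DirUsage, new_file_bytes: int, replication: int = 3) -> bool:
    """Would one new file of `new_file_bytes` stay within both quotas?"""
    if usage.name_quota is not None and usage.names_used + 1 > usage.name_quota:
        return False
    raw = new_file_bytes * replication  # every replica counts against the space quota
    if usage.space_quota is not None and usage.bytes_used + raw > usage.space_quota:
        return False
    return True


if __name__ == "__main__":
    GB = 1024 ** 3
    analysts = DirUsage(names_used=10, bytes_used=900 * GB,
                        name_quota=1000, space_quota=1024 * GB)
    print(fits(analysts, 40 * GB))  # True: 120 GB raw brings usage to 1020 GB
    print(fits(analysts, 50 * GB))  # False: 150 GB raw would exceed 1024 GB
```

The replication multiplier is the detail that most often surprises users: a 50 GB file needs 150 GB of space quota at the default replication.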
  20. Wrapping it up
      - The operational lifecycle for Hadoop is similar to that of other systems, but Hadoop itself is not
      - The basics are not a good place to get creative
      - Think command center, not man cave
      - Multi-tenancy is an attractive opportunity with some additional operational burdens
      - There’s a lot more work to do
  21. We appreciate your time and interest in Cloudera
      For additional information:
      www.cloudera.com
      twitter.com/cloudera
      +1 (888) 789-1488
      sales@cloudera.com
      facebook.com/cloudera
