Hadoop summit-ams-2014-04-03


Criteo slides from the Hadoop Summit in Amsterdam

  • http://www.shutterstock.com/pic.mhtml?id=95662684 Who are we? Serving the right ad. (Slide was imposed by Marketing.) You will probably encounter the cloud versus in-house dilemma. The key factor is elasticity; we use our cluster 100% of the time, we already have data centers, and in-house was less expensive.
  • This is the story of a growing and successful startup using Hadoop. Growing means increased volume. Successful means buckloads of cash to grow the infrastructure. Startup means very small teams to manage the whole thing. A PoC is easy; when you gain traction, everything goes fast. We went from 12 nodes to 150 two years ago, to 600 today, and will be above 1,000 by the end of the year. Why is it growing that fast? A virtuous circle: various teams are gathering skills; BI analysts want more the more they get; Hadoop shows mutualization benefits, as a platform to consolidate ad-hoc data processing tools. Your business will boom thanks to Hadoop adoption.
  • Because you need to scale infrastructure: automate operations (prod vs. devops); tune the Hadoop system (hardware, Linux, Hadoop itself), and specifically the network. This is about scaling the infrastructure. With hundreds of clients using Hadoop as a service, you also need to scale infrastructure usage, for instance multi-tenancy: managing resource contention (MapReduce, storage), maintaining security (user sandboxing through authorization and authentication), and allowing Hadoop to be used as a service.
  • Don't do anything by hand; you will hurt yourself managing thousands of servers. Build things once, run forever. The choice of freedom: don't be bound to a specific vendor; e.g. we use CDH 4.5.0 right now, but could, and probably will, switch to HDP. Full-stack automation: from bare metal to live service.
  • Our clusters are turnkey-deployed once hosting and network have finished their work. We assign every node a role, and the hosts boot and set themselves up accordingly.
  • Why a diskless system? You want maximum storage density, therefore you want to fill those 14 slots per server with 3TB drives, and therefore you will break a nice symmetry if you run the OS from disk. Not theoretical: hands-on experience on a 150-node cluster operated for 1.5 years. A hard constraint, but very worthwhile: remote logging becomes compulsory; nothing is hidden from the automation system; it costs 2GB of RAM per node (2% of RAM).
  • How do we achieve this? Minimize the size of the diskless image; boot Chef as soon as you can, and let it flow from there. The initial Chef role is an inventory role. Chef is used for management of updates, the OS, and service deployment.
  • Maintenance: * Evolutive: upgrading your distribution regularly (you don't want to lag behind). * Corrective: Hadoop works better when you just don't touch it. * How: everything is tested on a PREPROD environment; progressive deployment (rolling out node by node) may be disruptive for long-running jobs.
  • http://blog.cloudera.com/blog/2014/03/a-guide-to-checkpointing-in-hadoop/ Monitor user-facing interfaces: users frequently equate the cluster's condition with the JT's or NN's GUI. Monitor your JobTracker (MRv1 will eventually get stuck). MOST IMPORTANT OF ALL: CHECKPOINTING. Monitor the checkpoints of your fsimage, or you will end up with a NameNode in really bad shape. At one point we had nearly 6 months of edits ;) 12 hours to start a NN; hence the urban legend of the NN being unsafe to restart. Monitor HDFS disk usage and local disk usage.
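Checkpoint age can be watched over the NameNode's JMX endpoint. A minimal sketch; the hostname, port, bean, and metric names are assumptions taken from Hadoop 2.x-era metrics and should be verified against your version:

```python
import json
import time
import urllib.request

def checkpoint_age_seconds(namenode_url="http://namenode:50070"):
    """Query the NameNode JMX servlet and return seconds since the last
    fsimage checkpoint (bean/metric names assumed from Hadoop 2.x)."""
    url = namenode_url + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
    with urllib.request.urlopen(url) as resp:
        beans = json.load(resp)["beans"]
    last_ms = beans[0]["LastCheckpointTime"]  # epoch milliseconds
    return time.time() - last_ms / 1000.0

def alert_if_stale(age_seconds, max_age_seconds=2 * 3600):
    """True when the last checkpoint is older than the threshold
    (two hours here, purely an illustrative default)."""
    return age_seconds > max_age_seconds
```

Wire `alert_if_stale(checkpoint_age_seconds())` into whatever monitoring system pages you; the point is that checkpoint age is alertable, not eyeballed.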
  • 10:00 http://www.flickr.com/photos/76588645
  • In a real-world system most of your tasks will be IO-bound. Readahead! Very important. When you hit a performance bottleneck, the first thing to watch for is *outside* Hadoop, because Hadoop is a DoS to your whole infrastructure. Use infrastructure-local caches as much as you can.
  • Default parameters are usable for small clusters / small nodes. These are examples; we had to tune a significant part of them. Detailed list of significant ones + explanations.
  • Default parameters are usable for small clusters / small nodes. These are examples; we had to tune a significant part of them. Log settings: they will kill your JT / NN. Handler counts: separate the thread pool for internal / external clients, which is also easier for firewalling. HA has some downsides (checkpointing).
  • One of the first things that you will get when you move your Hadoop cluster past a single rack is your network engineers yelling at you. Plan your network topology ahead!
  • A fat tree is suited to North/South traffic. Hadoop uses the network as a bus: East-West. Layer 2: FabricPath/TRILL. Layer 3: BGP.
  • Sounds obvious, but implementing a correct definition of the rack topology with regard to the network is very important. LLDP information was flapping depending on which interface we asked (4-interface bonding).
  • 20:00 Hadoop is a shared resource. When your usage grows you will face resource starvation and contention. This leads to two problems: 1) Accountability: you will need to report resource accountability numbers to plan for growth and to optimize. 2) Maintaining a good user experience. HDFS quotas help, but have bugs in fsimage checkpointing. Scheduling is a user-facing problem; it requires education to understand the time/space folding and to achieve well-designed jobs (mappers of ~3 to 15 minutes). YARN solves the map/reduce ratio (+20% computing power).
  • Once you realize your company's most critical data has landed on Hadoop, and your only security model is obscurity, you will want to switch to a built-in and robust security model.
  • Very good documentation from Cloudera and Hortonworks. Not sufficient, though: ironing out the problems (SPNEGO) needs close integration with IT.
  • Hadoop limitation: POSIX-level access to HDFS. HTTPFS works around the absence of a scalable alternative (and works with Kerberos, too!). In our systems, HDFS completely replaced Isilon. We stream a sustained 200-400MB/s of logs into the cluster. Don't create bottlenecks; address connectivity with a many-to-many pattern.
  • JSON + GZIP is good enough for most uses.
  • http://www.flickr.com/photos/jarbo/9379813470

    1. HADOOP, FROM LAB TO 24/7 PRODUCTION http://criteolabs.com/jobs
    2. criteolabs.com/jobs Jean-Baptiste NOTE jb.note@criteo.com Ana DIN a.din@criteo.com From the Criteo HPC Team (+ Loïc / Serge / Maxime / Samuel / Yann / Stuart) ABOUT US
    4. criteolabs.com/jobs « Anything that can go wrong - will go wrong » -- Murphy's Law TALES OF A TECHNOLOGY ADOPTION
    5. criteolabs.com/jobs Usage of Hadoop is growing exponentially • Learning curve is real • Analysts discover interesting things with raw data – Which causes them to ask more questions • Increased insight leads to a better product – Which leads to more data • Data gains in value and more is kept (and studied!) • YOU (the admin) are the bottleneck! USAGE GROWTH
    6. criteolabs.com/jobs • Administration automation • Hadoop configuration tuning • Network • Multitenancy TOPICS
    7. criteolabs.com/jobs ADMINISTRATION AUTOMATION
    8. criteolabs.com/jobs Rack and load! • Machine is racked, cabled and provisioned for a role • Chef is our one-stop shop for automation • Diskless system install AUTOMATING DEPLOYMENTS INSTA-CLUSTER!
    9. criteolabs.com/jobs • Learn from the past • Previous cluster: 1.5 years of operation • 78% failure rate on /dev/sda at restart • Disk usage symmetry • Guaranteed statelessness OS DISKLESS: WHY
    10. criteolabs.com/jobs • PXE boot on custom CentOS image • Automated Chef bootstrap • Everything done by Chef – Inventory – Firmware updates – OS / Service deployment OS DISKLESS: HOW
    11. criteolabs.com/jobs • Evolutive maintenance (version bump) • Not much to do on normal ops • Most freq. issue is a flapping / slow-performing host • Use Preprod / Prod for infra changes • Progressive VS black-out MAINTENANCE
    12. criteolabs.com/jobs • User-facing interfaces • Jobtracker • Fsimage checkpointing • HDFS usage and local disk usage MONITORING
    13. criteolabs.com/jobs HADOOP CONFIG TUNING
    14. criteolabs.com/jobs • Hadoop is a DDoS to your infrastructure – Increase ARP retention (L2-specific) – Use NSCD • Increase readahead • Disable THP compaction • MTU jumbo frames SYSTEM CONFIGS
    15. criteolabs.com/jobs CLUSTER CONFIGS
    16. criteolabs.com/jobs CLUSTER CONFIGS • Adjust log settings (default is INFO,console) • Increase handler counts (JT, NN, DN) • Use namenode.service.handler.count • Watch out for checkpointing loops
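The handler-count tuning above can be sketched as an hdfs-site.xml fragment; the values are illustrative, not our production numbers:

```xml
<!-- hdfs-site.xml fragment; values are illustrative only -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value> <!-- RPC threads serving client traffic -->
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>40</value> <!-- separate pool for DataNode / internal RPC -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
```

Note that the separate service handler pool only takes effect when a dedicated service RPC address (dfs.namenode.servicerpc-address) is also configured; that split is what makes firewalling internal vs. external clients easier.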
    17. criteolabs.com/jobs NETWORK
    18. criteolabs.com/jobs • One datacenter topology will not fit all • Web traffic VS Hadoop traffic • Historical fat-tree hierarchy with layer 2 routing • Switched to meshed design (soon layer 3) NETWORK TOPOLOGY
    19. criteolabs.com/jobs • Rack awareness (of course!) – Performance – Reliability – Maintenance (eg. relocation) HADOOP TOPOLOGY
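Rack awareness is wired in through a topology script (net.topology.script.file.name): Hadoop invokes the script with DataNode addresses as arguments and reads one rack path per address from stdout. A minimal sketch, where the IP-to-rack scheme (third octet encodes the rack) is a purely hypothetical convention:

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for net.topology.script.file.name.
Hadoop calls it with DataNode IPs/hostnames as arguments and expects
one rack path per argument on stdout."""
import sys

DEFAULT_RACK = "/default-rack"

def rack_of(host):
    """Map an address to a rack path; here the third octet of an IPv4
    address is (by assumption) the rack number."""
    parts = host.split(".")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        return "/dc1/rack-%s" % parts[2]
    return DEFAULT_RACK

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(rack_of(host))
```

In practice you would back this with your inventory system rather than an addressing convention, precisely because LLDP and cabling data can flap.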
    20. criteolabs.com/jobs • HDFS quotas • Scheduling (user-facing) • Map / Reduce ratio • Use YARN! MULTITENANCY
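The per-directory quota numbers used for accountability reporting come out of `hdfs dfs -count -q`. A minimal sketch of parsing its output; the column layout is assumed from Hadoop 2.x and the sample line is fabricated for illustration:

```python
def parse_count_q(line):
    """Parse one line of `hdfs dfs -count -q` output (assumed columns:
    name quota, remaining name quota, space quota, remaining space
    quota, dir count, file count, content size, path)."""
    f = line.split()
    return {
        "name_quota": f[0], "name_remaining": f[1],
        "space_quota": f[2], "space_remaining": f[3],
        "dirs": int(f[4]), "files": int(f[5]),
        "bytes": int(f[6]), "path": f[7],
    }

def space_used_fraction(space_quota, space_remaining):
    """Fraction of the space quota consumed; 'none' means no quota set."""
    if space_quota == "none":
        return None
    quota, remaining = int(space_quota), int(space_remaining)
    return (quota - remaining) / quota

# Fabricated sample: 1 TiB space quota, half consumed.
sample = "10000 9216 1099511627776 549755813888 12 772 549755813888 /user/analytics"
row = parse_count_q(sample)
```

Feeding these rows into a per-team report is what turns "the cluster is full" arguments into a growth-planning conversation.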
    21. criteolabs.com/jobs SECURITY
    22. criteolabs.com/jobs • Dedicated KDC / realm • Dedicated service principals • Cross-realm trusts • Delegate user management to your IT KERBEROS SETUP
    23. criteolabs.com/jobs • Use multiple proxies • Easy way to interconnect to the outside world • Data injection / read with a simple curl • High-bandwidth transfers HTTPFS PROXIES
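The "simple curl" pattern works because HttpFS exposes the WebHDFS REST API. A minimal read sketch in Python; the proxy hostname, port, and user name are hypothetical, and simple `user.name` auth is assumed rather than the SPNEGO negotiation a Kerberized proxy would require:

```python
import urllib.parse
import urllib.request

HTTPFS = "http://httpfs-proxy:14000"   # hypothetical proxy address

def webhdfs_url(path, op, user, **params):
    """Build a WebHDFS/HttpFS REST URL; the /webhdfs/v1 prefix and the
    op / user.name query parameters come from the WebHDFS REST API."""
    query = {"op": op, "user.name": user, **params}
    return "%s/webhdfs/v1%s?%s" % (HTTPFS, path, urllib.parse.urlencode(query))

def read_file(path, user="analytics"):
    """Stream a file out of HDFS through the HttpFS proxy."""
    with urllib.request.urlopen(webhdfs_url(path, "OPEN", user)) as resp:
        return resp.read()
```

Spreading clients across several such proxies (DNS round-robin or a load balancer) is what keeps the many-to-many pattern from collapsing into a single-proxy bottleneck.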
    24. criteolabs.com/jobs • Multiple use cases (ML, BI analytics) • Baseline JSON (+gzip) is OK • Don't optimize too early • We still use it(*) at peta scale (*) some teams also use Parquet and contributed to Hive integration FILE FORMATS
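The baseline format is just one JSON object per line inside a gzip stream; a minimal sketch of both directions:

```python
import gzip
import json

def write_events(path, events):
    """Write events as gzip-compressed JSON lines: one JSON object per
    line, the baseline format named on the slide."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")

def read_events(path):
    """Read the events back, one parsed object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Every tool in the stack can read it, which is the real argument against optimizing the format too early; columnar formats like Parquet only pay off once the query patterns are known.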
    25. criteolabs.com/jobs QUESTIONS?
    26. criteolabs.com/jobs Did I say we're hiring? We're hiring lots of engineers in 2014. Come join us! http://criteolabs.com/jobs MY FELLOW CRITEOS WOULD KILL ME…