
How to Implement Hadoop Successfully



Published in: Technology, Business

  1. 1. How To Implement Hadoop Successfully! Based on: Avinash Kaushik. By Adir Sharabi
  2. 2. Only 24% of Hadoop projects are actually in production. (By Rainstor)
  3. 3. About Conduit • Over 250 million active end users • More than 260,000 publishers • Over 3 billion monthly user interactions • Deployed in 120 countries • Founded in 2005 • Acquired Wibiya in 2011
  4. 4. Product Offering: B2B, B2C
  5. 5. Conduit’s Data Platform (architecture diagram). Labels shown: Usage Files, Agg. Files, Usage Records, Streaming, Kafka, WEPs, Hadoop, HDFS, HBase, Hive, Hue, Oozie, Mahout, R, DWH, MySQL, Optimization Engine, Integration Services, Reporting Services, Business Objects, Real Time Monitoring, Product, Business, Insights
  6. 6. Tip #1: Don’t buy the hype of ‘big data’ and throw millions of dollars away, but don’t stand still.
  7. 7. Tip #1 • Select one well-defined use case • Small, super-smart team • Experiment on the cloud • Quantify the effort and value for your organization • ‘Fail faster while failing forward’
  8. 8. Conduit’s initial use case (diagram). Labels shown: Users Pings, Merge, Users Table, Extract, Daily Installations. Before (50M, 600M): Merge 7 hours, Extract 1 hour; 8-10 hours in total. Today (120M, 2.2B): 30 minutes!
  9. 9. Conduit’s Big Data Growth (5 TB to 500 TB). Chart of data size (TB) and # of nodes over time: Jan 2009, DWH launched; Mar 2010, Hadoop launched on cloud (8 nodes); Feb 2011, Hadoop deployed in Conduit’s data center (72 nodes); Jan & Oct 2012, procurement (105/120 nodes); Sep 2013, procurement (DR)
  10. 10. Conduit’s Data Platform in Numbers • Hardware: 125 nodes (+70 after DR) on 6 racks; TB used / 1.2 PB total • Daily processed data: 50,000 files; 500,000,000 records; 700 GB • Daily jobs submitted: over 5,000 • Data freshness: 60 minutes
  11. 11. Tip #2: Data is turning challenges into business opportunities.
  12. 12. Use Cases (bar chart, axis 0% to 20%). Values in chart order: 8%, 8%, 9%, 9%, 10%, 11%, 13%, 15%, 19%. Categories in chart order: analyze complete rather than partial data sets; other; customer intelligence for more targeted marketing; include more semi-structured/unstructured info into decision making; improve scientific research; ETL; log analysis; reduce cost of data analysis; mine data for business intelligence
  13. 13. Business Model Maturity Index • Business Monitoring: monitoring business performance to flag areas of interest • Business Insights: integrate insights & recommendations into existing business processes • Business Optimization: embed analytics to optimize business processes • Data Monetization: leverage insights to identify new revenue opportunities • Business Metamorphosis: transform customer and product insights to move into new markets. © Copyright 2013 EMC Corporation. All rights reserved
  14. 14. But… • Hadoop in the enterprise ecosystem: a lot of the features enterprises need or want take a back seat • Hadoop is NOT cheap (hardware & operations cost): make sure the company’s decision makers are on board • Hadoop is still rough around the edges: tooling may not be as mature as enterprises are used to • Data access is batch oriented
  15. 15. Tip #3: The 10/90 rule for magnificent data success.
  16. 16. Tip #3 • Nurture your ‘big brains’ • Hadoop is cutting-edge technology: investment in related skills and training is crucial • Good data scientists are “unicorns” • Embrace the open source culture; it will pay off • The BI team is essential for connecting the dots
  17. 17. Data Roles @ Conduit (org chart). Teams: Data Infra Team, Data BI Team, Data Science Team. Labels shown: Product; Mobile, Wibiya, Quick Launch, Toolbar, Other; BI and Scientist roles repeated across them
  18. 18. Tip #4: Shoot for right time data, not real time data.
  19. 19. Tip #4 • Complex decision making is time consuming, so it cannot react in real time • Real time is expensive! • Tailor the right solution to accommodate the required data freshness • Focus on big things!
  20. 20. Data Maturity vs. Freshness @ Conduit (chart). Axes: Freshness (ticks at 10 and 60) and Data Maturity (structured, cleansed & completed; Low, Medium, High). Platforms: Kafka, Hadoop, DWH. Items plotted: Real Time Monitoring, Hue/Hive, Reporting Service, Advanced Analytics Models, Business Objective
  21. 21. Tip #5: Data quality sucks, just get over it!
  22. 22. Tip #5 • Data will be dirty and schema-less, with no foreign keys • And yet, we are standing on a mountain of gold! • Do your best, and know when to shift to data analysis • Tune your algorithms to tolerate data deficiencies, then hunt for insights • Big data is not a data warehouse
  23. 23. Tip #6: Democratize the data.
  24. 24. Tip #6
  25. 25. Tip #6
  26. 26. Tip #6
  27. 27. Tip #6
  28. 28. Tip #6 • Break down the barriers that prevent our users and applications from using their valuable data more effectively to glean meaningful insights • Provide your users with advanced self-service tools to access the data • The Hadoop ecosystem is evolving as we speak • Your performance is measured by the tools’ effectiveness and ease of use
  29. 29. To Summarize… • Start small • Identify the opportunities • Invest in people & related skills • Adjust processes to the organization’s needs • Know your data limits • Self-service tools are extremely important
  30. 30. Q&A