How to Implement Hadoop Successfully

Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Hadoop in the enterprise ecosystem
    Hadoop was designed to solve the Big Data problems encountered by Web and social companies. As a result, many of the features enterprises need or want took a back seat. For example, HDFS does not offer native support for security and authentication.
  • Hadoop is NOT cheap
    Hardware cost: say a Hadoop node is $5,000. A 100-node cluster would then be $500,000 for hardware.
    IT and operations costs: teams such as network admins, IT, security admins and system admins. One also needs to account for operational costs such as data center expenses (cooling, electricity, etc.).
  • Hadoop is still rough around the edges
    The development and admin tools for Hadoop are still pretty new. Companies like Cloudera, Hortonworks, MapR and Karmasphere have been working on this issue. However, the tooling may not be as mature as enterprises are used to (say, Oracle admin tools).
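
The hardware arithmetic in the note above can be sketched in a few lines of Python. This is only an illustration: the function name `cluster_hardware_cost` is hypothetical, the $5,000 per node and 100 nodes are the example figures from the note, and operations costs are left out because the note gives no numbers for them.

```python
def cluster_hardware_cost(nodes: int, cost_per_node_usd: int) -> int:
    """Return the up-front hardware cost of a cluster in US dollars.

    Covers hardware only; IT, operations and data center costs
    (cooling, electricity) are not included.
    """
    if nodes < 0 or cost_per_node_usd < 0:
        raise ValueError("nodes and cost_per_node_usd must be non-negative")
    return nodes * cost_per_node_usd


# Example figures from the note: 100 nodes at $5,000 each.
hardware_usd = cluster_hardware_cost(nodes=100, cost_per_node_usd=5000)
print(f"Hardware: ${hardware_usd:,}")  # Hardware: $500,000
```

Plugging in your own node count and price gives a quick lower bound on the budget to put in front of decision makers before the recurring operations costs are added.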

Presentation Transcript

  • How To Implement Hadoop Successfully! Based on: Avinash Kaushik. By Adir Sharabi
  • Only 24% of Hadoop projects are actually in production. (By Rainstor)
  • About Conduit
    • Over 250 million active end users
    • More than 260,000 publishers
    • Over 3 billion monthly user interactions
    • Deployed in 120 countries
    • Founded in 2005
    • Acquired Wibiya in 2011
  • Product Offering: B2B, B2C
  • Conduit’s Data Platform (architecture diagram). Labels shown: Usage Files, Agg. Files, Usage Records, Streaming, Kafka, WEPs, Hadoop, HBase, HDFS, Hive, Hue, Oozie, Mahout, R, DWH, MySQL, Integration Services, Reporting Services, Business Objects, Optimization Engine, Insights, Real Time Monitoring, Product, Business
  • Tip #1: Don’t buy the hype of ‘big data’ and throw millions of dollars away, but don’t stand still.
  • Tip #1
    • Select one well-defined use case
    • Small, super-smart team
    • Experiment on the cloud
    • Quantify the effort and value for your organization
    • ‘Fail faster while failing forward’
  • Conduit’s initial use case
    • Flow: Users + Pings → Merge → Users Table → Extract → Daily Installations
    • Before: 50M users, 600M pings; Merge 7 hours, Extract 1 hour; 8-10 hours in total
    • Today: 120M users, 2.2B pings; 30 minutes!
  • Conduit’s Big Data Growth (5 TB to 500 TB). Chart of data size (TB) and # of nodes over time:
    • Jan 2009: DWH launched
    • Mar 2010: Hadoop launched on cloud (8 nodes)
    • Feb 2011: Hadoop deployed in Conduit’s data center (72 nodes)
    • Jan & Oct 2012: Procurement (105/120 nodes)
    • Sep 2013: Procurement, DR
  • Conduit’s Data Platform in Numbers
    • Hardware: 125 nodes (+70 after DR) on 6 racks; TB Used / 1.2 PB Total
    • Daily processed data: 50,000 files; 500,000,000 records; 700 GB
    • Daily jobs submitted: over 5,000
    • Data freshness: 60 minutes
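
To put the daily figures on this slide in perspective, the short Python sketch below derives a few averages from them. The function name `daily_averages` is hypothetical, 700 GB is assumed to mean decimal gigabytes, and "over 5,000" jobs is treated as a lower bound, so the results are rough orders of magnitude and not measured values.

```python
DAILY_FILES = 50_000
DAILY_RECORDS = 500_000_000
DAILY_BYTES = 700 * 10**9   # 700 GB, assuming decimal gigabytes
DAILY_JOBS = 5_000          # slide says "over 5,000", so a lower bound


def daily_averages(files: int, records: int, total_bytes: int, jobs: int) -> dict:
    """Derive simple per-file, per-record and per-minute averages."""
    return {
        "records_per_file": records / files,
        "bytes_per_record": total_bytes / records,
        "megabytes_per_file": total_bytes / files / 10**6,
        "jobs_per_minute": jobs / (24 * 60),
    }


averages = daily_averages(DAILY_FILES, DAILY_RECORDS, DAILY_BYTES, DAILY_JOBS)
for name, value in averages.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the averages come out to about 10,000 records and 14 MB per file, 1.4 KB per record, and at least 3 jobs submitted per minute.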
  • Tip #2: Data is turning challenges into business opportunities.
  • Use Cases (bar chart, axis 0% to 20%)
    • Values, in the order extracted: 8%, 8%, 9%, 9%, 10%, 11%, 13%, 15%, 19%
    • Categories, in the order extracted: analyze complete rather than partial data sets; other; customer intelligence for more targeted marketing; include more semi-structured/unstructured info in decision making; improve scientific research; ETL; log analysis; reduce cost of data analysis; mine data for business intelligence
  • Business Model Maturity Index
    • Business Monitoring: monitoring business performance to flag areas of interest
    • Business Insights: integrate insights & recommendations into existing business processes
    • Business Optimization: embed analytics to optimize business processes
    • Data Monetization: leverage insights to identify new revenue opportunities
    • Business Metamorphosis: transform customer and product insights to move into new markets
    © Copyright 2013 EMC Corporation. All rights reserved
  • But… Hadoop in the Enterprise Eco System – lot of the featuresEnterprises need or want are put on the back seat Hadoop is NOT cheap (H/W & operations cost) – Makesure company‟s decision makers are on board Hadoop is still rough on the edges – tooling may not beas mature as Enterprises are used to Data access is batch oriented
  • Tip #3: The 10/90 rule for magnificent data success.
  • Tip #3
    • Nurture your ‘big brains’
    • Hadoop is cutting-edge technology: investment in related skills and training is crucial
    • Good data scientists are “unicorns”
    • Embrace the open source culture; it will pay off
    • The BI team is essential for connecting the dots
  • Data Roles @ Conduit (diagram): Data Infra Team, Data BI Team and Data Science Team; BI and Scientist roles are shown against the product lines Mobile, Wibiya, Quick Launch, Toolbar and Other
  • Tip #4: Shoot for right-time data, not real-time data.
  • Tip #4
    • Complex decision making is time consuming, so it cannot react in real time
    • Real time is expensive!
    • Tailor the right solution to accommodate the required data freshness
    • Focus on big things!
  • Data Maturity vs. Freshness @ Conduit (chart)
    • Axes: Freshness (marks at 10 and 60); Data Maturity (structured, cleansed & completed): Low, Medium, High
    • Platforms: Kafka, Hadoop, DWH
    • Items plotted: Real Time Monitoring, Hue/Hive, Reporting Service, Advanced Analytics Models, Business Objective
  • Tip #5: Data quality sucks, just get over it!
  • Tip #5
    • Data will be dirty and schema-less, with no foreign keys
    • And yet, we are standing on a mountain of gold!
    • Do your best, and know when to shift to data analysis
    • Tune your algorithms to tolerate data deficiencies, then hunt for insights
    • Big data is not a data warehouse
  • Tip #6: Democratize the data.
  • Tip #6 Break down barriers preventing our users/applications fromusing their valuable data in more effective ways to gleanmeaningful insights Provide your users advanced self service tools to access thedata Hadoop ecosystem evolving as we speak Your performance is measured by the tools effectivenessand ease of use
  • To Summarize…
    • Start small
    • Identify the opportunities
    • Invest in people & related skills
    • Adjust processes to the organization’s needs
    • Know your data limits
    • Self-service tools are extremely important
  • Q&A: il.linkedin.com/pub/adir-sharabi/3b/6ab/510/