Big Data Integration Webinar: Getting Started With Hadoop Big Data

Published in: Technology
  • Two major
  • Take-aways: Pentaho provides complete, integrated DI+BI for every leading big data platform.
  • The company decided to invest in Hadoop and ingest the raw CDR data into it along with other data. This frees data warehouse (DW) capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And because the data is rescued from tape, it becomes available for reporting and analysis.
  • Delivered as a hardware, software, and services Reference Architecture (RA) which can scale from 6 nodes up to 720 nodes. Currently utilizes PowerEdge C2100, C6100, C6105, R720, and R720XD servers and PowerConnect 6248 or Force10 switches.
      • Dell Crowbar: automated solution deployment and configuration (bare metal, OS, solution stack, and monitoring)
      • CDH3 Enterprise: Cloudera Hadoop Distribution, Cloudera Management Tools, Cloudera Support
      • Partner Ecosystem: software and services capabilities to address broader customer needs around Hadoop; enabling non-technical business users to leverage Hadoop; simplify getting data into Hadoop; intuitive analytics reporting and dashboards
      • Solution provided via: Reference Architecture, Deployment Guide, Dell Digital Locker, Dell Deployment Services
  • First OpenStack cloud solution provider in market.
      • Pioneer OpenStack partner (only Day 1 hardware provider)
      • Most history with the OpenStack technology = expertise + RAs that have been tested longer and more fully than those of newcomers
      • Dell offers a deep partnership ecosystem
      • Single point of support and purchase to reduce the problem of dealing with multiple vendors
      • ONLY company providing automated software to do multi-node OpenStack provisioning: Crowbar
      • Dell developed software that we open-sourced in the community
      • OpenStack expertise
  • Transcript of "Big Data Integration Webinar: Getting Started With Hadoop Big Data"

    1. Getting Started & Successful with Big Data
        • © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
        • @Pentaho #BigDataWebSeries
    2. Your Hosts Today
        • Paul Brook, Cloud EMEA Program Manager, Dell
        • Davy Nys, VP EMEA & APAC, Pentaho
        • Chuck Yarbrough, Technical Solutions Marketing, Pentaho
    3. Pentaho Webinar Series
        • Sign-up at: pentaho.com
    4. Goals for Today
        To understand:
        • How to get a Hadoop cluster up and running
        • Where Hadoop and other pieces fit into the architecture
        • How you can easily get data in & out of Hadoop
        • How to leverage Hadoop with Pentaho
        • Initial best practices
    5. Complete Analytics and Visual Data Management
        • Hadoop
        • NoSQL Databases
        • Analytic Databases
        • Data Ingestion, Manipulation & Integration
        • Data Discovery & Visualization
        • Enterprise & Ad Hoc Reporting
        • Predictive Analytics & Machine Learning
        • © 2012, Pentaho. All Rights Reserved.
    6. Data Warehouse Optimization
        • Diagram columns: Data Sources; Big Data Architecture; Data Warehouse (Master & Transactional Data)
        • Diagram labels: ERP, CRM, CDR, Analytic Data Mart(s) (×3), Logs (×2), Other Data, Raw Data, Parsed Data, Analytic Datasets, Master Data, Tape Archive
    7. Steps To Start with Hadoop
        • Step 1, Hadoop Installation: install locally in pseudo-distributed mode; leverage tools like Dell Crowbar; cloud sandbox
        • Step 2, Pentaho Installation: easy download & installation; start on desktop
        • Step 3, Start Loading: extract or access data from source systems; load it (in its raw form) into Hadoop; tokenize & parse as required; transform & enrich; load into destination
    8. Data Architecture and Integration Challenges
        • Diagram columns: Data Sources; Big Data Architecture; Data Warehouse (Master & Transactional Data)
        • Diagram labels: ERP, CRM, CDR, Analytic Data Mart(s) (×3), Logs (×2), Other Data, Raw Data, Parsed Data, Analytic Datasets, Master Data, NoSQL
    9. Data Architecture and Integration Challenges
        • Diagram columns: Data Sources; Big Data Architecture; Data Warehouse (Master & Transactional Data)
        • Diagram labels: ERP, CRM, CDR, Analytic Data Mart(s) (×3), Logs (×2), Other Data, Raw Data, Parsed Data, Analytic Datasets, Master Data, NoSQL
        • Added labels: Extract, Transform, Load; Orchestration & Integration; MR
    10. Data Architecture and Integration Challenges
        • Diagram columns: Data Sources; Big Data Architecture; Data Warehouse (Master & Transactional Data)
        • Diagram labels: ERP, CRM, CDR, Analytic Data Mart(s) (×3), Logs (×2), Other Data, Raw Data, Parsed Data, Analytic Datasets, Master Data, NoSQL
        • Added labels: Extract, Transform, Load; Orchestration & Integration; MR
    11. Solution Architecture & Demo
        • Step 1: Fast and easy way to deploy Hadoop clusters with Dell
    12. Global Marketing: Fast and easy way to deploy Hadoop clusters with Dell
    13. Global Marketing
        • "Well, we are ready, but how will the Hardware Team know how to size and design the Hadoop cluster...?"
        • "I don’t know... and it may take a long time to build the Hadoop cluster."
        • "Time is a critical factor; we need to get this project moving."
    14. Global Marketing: Reduce risk & increase flexibility with Dell
        • Reduce time to cluster sizing, design & deployment
        • Faster time to productive operations
        • Optimize and adapt for your needs
        • Deliver the best return on investment
        • Dell.com/Crowbar
    15. Global Marketing: Dell | Hadoop Solution
        • “Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.” Maxwell Cooter, Cloud Pro
        • Excels at supporting complex big data analyses across large collections of structured and unstructured data
        • Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis
        • Work on the most modern scale-out architectures using a clean-sheet design data framework
        • Without vendor lock-in
        • Components: Apache Hadoop software; Crowbar software framework with a Hadoop barclamp; PowerEdge C8000 Series, C6220, R720, R720XD; Force10 or PowerConnect switches; Reference Architecture; Deployment Guide; Joint Service and Support
        • Proven solutions; proven components; Partner Ecosystem
    16. Global Marketing: Crowbar Software Framework
        • A modular, open source framework
        • Accelerates multi-node deployments; simplifies maintenance; streamlines ongoing updates
        • Built with DevOps: provides an operational model for managing big data clusters and cloud
        • Field-proven technologies: build on locally deployed Chef Server; raw servers to full cluster in <2 hours; hardened with more than a year of deployments
        • Apache 2 open source: multi-apps (Hadoop & OpenStack); multi-OS (Ubuntu, RHEL, CentOS, SUSE); NOT limited to Dell hardware
    17. Global Marketing: Crowbar software framework
        • Deploy a Hadoop cluster in ~2 hours
        • Reduce software licensing fees: 100%
        • Reduce development time: 4-6 mo.
        • Use Crowbar to:
            • Automate the deployment and configuration of a Hadoop cluster
            • Quickly provision bare-metal servers from box to cluster with minimal intervention
            • Maintain, upgrade and evolve your Hadoop cluster over time
            • Leverage an open source framework backed by a growing global developer ecosystem
        • Evolve to meet your needs over time with built-in DevOps
    18. Global Marketing: Crowbar dashboard provides visibility
    19. Global Marketing: Leverage developer expertise worldwide
        • Download the open source software: https://github.com/dellcloudedge/crowbar
        • Participate in an active community: http://lists.us.dell.com/mailman/listinfo/crowbar
        • Get resources on the Wiki: https://github.com/dellcloudedge/crowbar/wiki
        • Visit Dell.com/Crowbar, Crowbar@Dell.com
    20. Global Marketing: Dell.com/Crowbar
    21. Pentaho Installation (step 2)
        • pentaho.com/download
        • Install on a local desktop: no need for a cluster
        • “Managed Code”: no additional installations
        • Pentaho will write to the Hadoop Distributed Cache for execution
        • Scheduling, Integration, Manipulation, Orchestration
    22. Start Loading (step 3)
        • Loading into HDFS & Hive: Hadoop Copy Files; specify source files / destination
        • Loading into HBase: ZooKeeper host & port; specify HBase mapping
    23. Solution Architecture & Demo
        • Demo
    24. Maximize Performance
        • As much as 15x faster than hand-written code.
        • Parallel execution as MapReduce in the Hadoop cluster.
    25. Additional Best Practices
        • Leverage Hadoop
            • Don’t do database lookups inside a Mapper/Reducer; bring the data set into HDFS
            • Don’t transfer data between two clustering technologies (network overload)
        • Don’t Boil the Ocean
            • Start with a small data set and validate logic & performance outside the cluster
            • Gradually increase volumes and fine-tune the application, cluster, data stores & network
        • It’s AND…AND
            • Leverage the various technologies available
            • A combination of easy-to-use tools, powerful scripting and custom coding provides the best mix
    26. Solution Architecture & Demo
        • Q & A
    27. Contact Us or Sign-up at: pentaho.com
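
The loading steps on slide 7 (extract, land the raw data, tokenize and parse, transform and enrich, load into a destination) can be sketched locally in plain Python. This is only a minimal sketch under stated assumptions, not Pentaho's or Hadoop's API: the pipe-delimited CDR layout, the sample records, and every function name below are hypothetical, and in-memory strings stand in for both HDFS and the destination store.

```python
import csv
import io

# Hypothetical raw CDR lines (assumed layout: caller|callee|start|duration_seconds).
RAW_CDR = """\
4415550001|4415550002|2013-03-01T09:00:00|120
4415550003|4415550001|2013-03-01T09:05:00|45
malformed line without enough fields
4415550001|4415550003|2013-03-01T10:00:00|300
"""

FIELDS = ["caller", "callee", "start", "duration_s", "duration_min", "date"]


def extract(raw_text):
    """Steps 1-2: read the source and pass lines on in raw form (stand-in for landing in HDFS)."""
    for line in io.StringIO(raw_text):
        line = line.rstrip("\n")
        if line:
            yield line


def parse(lines):
    """Step 3: tokenize each raw line; skip lines that do not match the assumed layout."""
    for line in lines:
        fields = line.split("|")
        if len(fields) != 4:
            continue
        caller, callee, start, duration = fields
        try:
            duration_s = int(duration)
        except ValueError:
            continue
        yield {"caller": caller, "callee": callee, "start": start, "duration_s": duration_s}


def enrich(records):
    """Step 4: add derived fields (call length in minutes, calendar date)."""
    for record in records:
        enriched = dict(record)
        enriched["duration_min"] = round(record["duration_s"] / 60, 2)
        enriched["date"] = record["start"][:10]
        yield enriched


def load(records):
    """Step 5: write CSV to an in-memory buffer (stand-in for the destination)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    rows = list(records)
    writer.writerows(rows)
    return buffer.getvalue(), len(rows)


csv_text, row_count = load(enrich(parse(extract(RAW_CDR))))
print(csv_text)
```

Because each stage is a generator, records stream through one at a time, which keeps the sketch usable on a small sample file before any cluster is involved.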

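
The slide 25 advice to avoid database lookups inside a Mapper/Reducer can be illustrated with a local, pure-Python imitation of a map and reduce phase. The sketch assumes the reference data has been exported to a small text file that travels with the job, so each mapper call does an in-memory dictionary lookup instead of a database round trip. The prefix-to-region table, the sample records, and the function names are all hypothetical, and nothing here uses the actual Hadoop or Pentaho APIs.

```python
from collections import defaultdict

# Hypothetical reference data, assumed to be exported once and stored next to the job input.
PREFIX_TO_REGION_TEXT = "4415,region-A\n4416,region-B\n"

# Hypothetical small sample of (caller, duration_seconds) records for local validation.
SAMPLE_RECORDS = [
    ("4415550001", 120),
    ("4416550003", 45),
    ("4415550001", 300),
    ("4499550009", 10),
]


def load_lookup(text):
    """Build the lookup table once, before any record is mapped."""
    lookup = {}
    for line in text.splitlines():
        prefix, region = line.split(",")
        lookup[prefix] = region
    return lookup


def mapper(record, lookup):
    """Emit (region, duration) using the in-memory table; no database call per record."""
    caller, duration = record
    region = lookup.get(caller[:4], "unknown")
    yield region, duration


def reducer(key, values):
    """Sum the call durations for one region."""
    return key, sum(values)


def run_local(records, lookup):
    """Imitate map, shuffle, and reduce in a single process."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record, lookup):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


totals = run_local(SAMPLE_RECORDS, load_lookup(PREFIX_TO_REGION_TEXT))
print(totals)
```

Checking the mapper and reducer on a handful of records in this way also matches the same slide's suggestion to validate logic outside the cluster before increasing volumes.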