Big Data Integration Webinar: Getting Started With Hadoop Big Data
 

  • TAKE-AWAYS: Pentaho provides complete, integrated DI+BI for every leading big data platform.
  • The company decided to invest in Hadoop, ingesting the raw CDR data into Hadoop along with other data. This frees data warehouse capacity for high-value transactional data while also lowering cost and meeting compliance requirements. And since the data is rescued from tape, it becomes available for reporting and analysis.
  • Delivered as a hardware, software, and services Reference Architecture (RA) that can scale from 6 nodes up to 720 nodes. Currently utilizes PowerEdge C2100/C6100/C6105, R720 and R720XD servers with PowerConnect 6248 or Force10 switches. Dell Crowbar provides automated solution deployment and configuration (bare metal, OS, solution stack and monitoring). CDH3 Enterprise supplies the Cloudera Hadoop distribution, Cloudera management tools and Cloudera support. A partner ecosystem adds software and services capabilities to address broader customer needs around Hadoop: enabling non-technical business users to leverage Hadoop, simplifying getting data into Hadoop, and intuitive analytics reporting and dashboards. The solution is provided via the Reference Architecture, Deployment Guide, Dell Digital Locker and Dell Deployment Services.
  • First OpenStack cloud solution provider in the market and a pioneer OpenStack partner (the only Day 1 hardware provider). The most history with the OpenStack technology means deep expertise plus RAs that have been tested longer and more thoroughly than newcomers'. Dell offers a deep partnership ecosystem: a single point of support and purchase that reduces the problem of dealing with multiple vendors. Dell is the only company providing automated software for multi-node OpenStack provisioning: Crowbar, Dell-developed software that was open-sourced in the community.

Presentation Transcript

  • Getting Started & Successful with Big Data. © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555. @Pentaho #BigDataWebSeries
  • Your Hosts Today: Paul Brook, Cloud EMEA Program Manager, Dell; Davy Nys, VP EMEA & APAC, Pentaho; Chuck Yarbrough, Technical Solutions Marketing, Pentaho.
  • Pentaho Webinar Series. Sign up at: pentaho.com
  • Goals for Today. To understand: how to get a Hadoop cluster up and running; where Hadoop and other pieces fit into the architecture; how you can easily get data in and out of Hadoop; how to leverage Hadoop with Pentaho; and initial best practices.
  • Complete Analytics and Visual Data Management: Hadoop, NoSQL databases and analytic databases, with data discovery & visualization, enterprise & ad hoc reporting, predictive analytics & machine learning, and data ingestion, manipulation & integration.
  • Data Warehouse Optimization. Data sources (ERP, CRM, CDR, logs, other data) feed the big data architecture, where raw data, parsed data, analytic datasets and master data are staged; from there, data flows to the data warehouse (master & transactional data), the analytic data mart(s) and the tape archive.
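The "raw data to parsed data" step in the architecture above can be sketched for CDRs. The pipe-delimited field layout below is hypothetical (real CDR formats vary by switch vendor); it only illustrates the tokenize-and-parse stage:

```python
# Sketch: parse a raw pipe-delimited CDR line into named, typed fields.
# The layout (caller|callee|start|duration) is a made-up example; real
# CDR formats vary by switch vendor.
from datetime import datetime

CDR_FIELDS = ("caller", "callee", "start", "duration_s")

def parse_cdr(line: str) -> dict:
    parts = line.strip().split("|")
    record = dict(zip(CDR_FIELDS, parts))
    # Promote the raw strings to proper types for downstream analytics.
    record["start"] = datetime.strptime(record["start"], "%Y-%m-%d %H:%M:%S")
    record["duration_s"] = int(record["duration_s"])
    return record

rec = parse_cdr("4415550100|4415550111|2013-02-01 09:30:00|125")
```

In the webinar's flow, logic like this would run as a transformation over the raw files already landed in Hadoop, producing the "parsed data" layer.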
  • Steps To Start with Hadoop. 1. Hadoop installation: install locally in pseudo-distributed mode, leverage tools like Dell Crowbar, or use a cloud sandbox. 2. Pentaho installation: easy download & installation; start on the desktop. 3. Start loading: extract or access data from source systems, load it (in its raw form) into Hadoop, tokenize & parse as required, transform & enrich, and load into the destination.
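The local pseudo-distributed install in step 1 typically needs only a couple of configuration entries so that all daemons run as separate JVMs on one machine. A minimal sketch, assuming a Hadoop 1.x/CDH3-era layout (property names changed in later Hadoop releases, e.g. `fs.default.name` became `fs.defaultFS`):

```xml
<!-- conf/core-site.xml: point the filesystem at a local HDFS namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: a single node can only hold one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After formatting the namenode and starting the daemons, the single machine behaves like a tiny cluster, which is enough to validate jobs before moving to real hardware.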
  • Data Architecture and Integration Challenges. The same architecture as before: data sources (ERP, CRM, CDR, logs, other data), a big data store holding raw data, parsed data, analytic datasets and master data, plus NoSQL, feeding the data warehouse (master & transactional data) and analytic data mart(s).
  • Data Architecture and Integration Challenges. The same diagram with the integration layer shown: Extract, Transform, Load and MapReduce jobs run under an orchestration & integration layer connecting the data sources, the NoSQL and Hadoop stores, and the data warehouse and marts.
  • Solution Architecture & Demo. Part 1: a fast and easy way to deploy Hadoop clusters with Dell.
  • Global Marketing. Fast and easy way to deploy Hadoop clusters with Dell.
  • "Well, we are ready, but how will the hardware team know how to size and design the Hadoop cluster?" "I don't know, and it may take a long time to build the Hadoop cluster." "Time is a critical factor; we need to get this project moving."
  • Reduce risk & increase flexibility with Dell: reduce time to cluster sizing, design & deployment; faster time to productive operations; optimize and adapt for your needs; deliver the best return on investment. Dell.com/Crowbar
  • Dell | Hadoop Solution. "Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware." (Maxwell Cooter, Cloud Pro.) It excels at supporting complex big data analyses across large collections of structured and unstructured data: Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis; it works on the most modern scale-out architectures using a clean-sheet-design data framework; and it avoids vendor lock-in. The solution comprises Apache Hadoop software; the Crowbar software framework with a Hadoop barclamp; PowerEdge C8000 Series, C6220, R720 and R720XD servers; Force10 or PowerConnect switches; a Reference Architecture and Deployment Guide; joint service and support; proven solutions and components; and a partner ecosystem.
  • Crowbar Software Framework: a modular, open source framework that accelerates multi-node deployments, simplifies maintenance and streamlines ongoing updates. Built with DevOps, it provides an operational model for managing big data clusters and cloud. Field-proven technologies: built on a locally deployed Chef server, it takes raw servers to a full cluster in under 2 hours and has been hardened with more than a year of deployments. Apache 2 open source: multi-app (Hadoop & OpenStack), multi-OS (Ubuntu, RHEL, CentOS, SUSE), and NOT limited to Dell hardware.
  • Deploy a Hadoop cluster in ~2 hours. Use Crowbar to: automate the deployment and configuration of a Hadoop cluster; quickly provision bare-metal servers from box to cluster with minimal intervention; maintain, upgrade and evolve your Hadoop cluster over time; and leverage an open source framework backed by a growing global developer ecosystem. Reduce software licensing fees 100% and cut 4-6 months of development time. Evolve to meet your needs over time with built-in DevOps.
  • The Crowbar dashboard provides visibility.
  • Leverage developer expertise worldwide. Download the open source software: https://github.com/dellcloudedge/crowbar. Participate in an active community: http://lists.us.dell.com/mailman/listinfo/crowbar. Get resources on the wiki: https://github.com/dellcloudedge/crowbar/wiki. Visit Dell.com/Crowbar or email Crowbar@Dell.com.
  • Dell.com/Crowbar
  • Pentaho Installation (step 2). pentaho.com/download. Install on a local desktop; no need for a cluster. "Managed code" means no additional installations: Pentaho will write to the Hadoop distributed cache for execution. Covers scheduling, integration, manipulation and orchestration.
  • Start Loading (step 3). Loading into HDFS & Hive: use Hadoop Copy Files and specify the source files and destination. Loading into HBase: specify the ZooKeeper host & port and the HBase mapping.
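An HBase mapping ultimately comes down to choosing a row key and column layout for the incoming records. A minimal sketch, with a hypothetical CDR schema; the composite key of caller ID plus reversed timestamp is one common pattern for keeping each caller's most recent calls first under HBase's lexicographic row ordering:

```python
# Sketch: compose HBase row keys and cells for CDR records.
# The schema (caller, callee, ts, duration_s) is a made-up example.

MAX_TS = 10**13  # upper bound for millisecond timestamps in this sketch

def row_key(caller_id: str, ts_millis: int) -> str:
    # Reversed timestamp: newer calls get smaller values, so they sort first.
    reversed_ts = MAX_TS - ts_millis
    return f"{caller_id}|{reversed_ts:013d}"

def to_cells(record: dict) -> tuple[str, dict]:
    """Map one parsed CDR record to (row key, column-family:qualifier cells)."""
    key = row_key(record["caller"], record["ts"])
    cells = {
        "cdr:callee": record["callee"],
        "cdr:duration_s": str(record["duration_s"]),
    }
    return key, cells

key, cells = to_cells({"caller": "4415550100", "ts": 1357000000000,
                       "callee": "4415550111", "duration_s": 42})
```

Fixed-width zero-padding of the numeric part matters: without it, string ordering would not match numeric ordering and range scans over a caller's history would come back shuffled.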
  • Solution Architecture & Demo. Demo.
  • Maximize Performance. As much as 15x faster than hand-written code, with parallel execution as MapReduce in the Hadoop cluster.
  • Additional Best Practices. Leverage Hadoop: don't do database lookups inside a mapper/reducer (bring the data set into HDFS), and don't transfer data between two clustering technologies (network overload). Don't boil the ocean: start with a small data set and validate logic & performance outside the cluster, then gradually increase volumes and fine-tune the application, cluster, data stores & network. It's AND… AND: leverage the various technologies available; a combination of easy-to-use tools, powerful scripting and custom coding provides the best mix.
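The "validate logic outside the cluster" advice can be followed by running the same map and reduce functions over a small in-memory sample first. A minimal sketch in plain Python (no Hadoop required), using a word count as the stand-in job:

```python
# Sketch: validate MapReduce logic locally on a small sample before
# running it at scale on the cluster (here: a word count).
from itertools import groupby
from operator import itemgetter

def mapper(line: str):
    for word in line.lower().split():
        yield word, 1

def reducer(word: str, counts):
    return word, sum(counts)

def run_local(lines):
    # Map phase over the sample, then sort to simulate the shuffle.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    # Reduce phase: one reducer call per distinct key, as on the cluster.
    return dict(reducer(k, (c for _, c in grp))
                for k, grp in groupby(pairs, key=itemgetter(0)))

sample = ["big data big results", "data pipelines"]
counts = run_local(sample)  # {'big': 2, 'data': 2, 'pipelines': 1, 'results': 1}
```

Once the mapper and reducer behave correctly on the sample, the same logic can be promoted to the cluster and volumes increased gradually, as the slide recommends.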
  • Solution Architecture & Demo. Q & A.
  • Contact us or sign up at: pentaho.com