Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13 from the Inevitable Cloud Community
 

For more information:

https://www.facebook.com/TheInevitableCloud
Linkedin: The Inevitable Cloud Community

    Presentation Transcript

    • Intro to Big Data and Apache Hadoop. Dr. Amr Awadallah, CTO/Founder, @awadallah, aaa@cloudera.com
    • Who is Cloudera?
      • What the Enterprise Requires: the market-leading Hadoop-based platform with batch and real-time processing frameworks; a comprehensive suite of system and data management software; training and certification programs; comprehensive support and consulting services.
      • Extensive Partner Ecosystem: over 400 partners across hardware, software and services.
      • The Leader in Big Data Management: deliver a revolutionary data management platform based on Apache Hadoop; enable organizations to improve operational efficiency and Ask Bigger Questions of all their data.
      • Customers & Users Across Industries: more production deployments than all other vendors combined.
      ©2013 Cloudera, Inc. All Rights Reserved.
    • Data Has Changed in the Last 30 Years. Data growth, 1980 to 2012: end-user applications, the Internet, mobile devices, sophisticated machines. Structured data: 10%. Unstructured data: 90%.
    • What if you wanted to… (figure labels: Data, Question, Speed, Usage, Type/Form)
    • So what is Apache [Hadoop]?
      • Storage: Self-Healing, High-Bandwidth Clustered Storage; Byte Streams.
      • Compute: Fault-Tolerant Distributed Processing; Schema-on-Read.
      • Figure, HDFS storage distribution: an input file split into blocks 1 to 5, with each block stored on three of the five nodes (Node A to Node E).
      • Figure, MapReduce compute distribution: the same five nodes process their local blocks and produce the output file.
    • Next-Gen Data Management
    • The Key Benefit: Agility/Flexibility
      • Schema-on-Write (RDBMS):
        • Prescriptive Data Modeling:
        • Create static DB schema
        • Transform data into RDBMS
        • Query data in RDBMS format
        • New columns must be added explicitly before new data can propagate into the system.
        • Good for Known Unknowns (Repetition)
      • Schema-on-Read (Hadoop):
        • Descriptive Data Modeling:
        • Copy data in its native format
        • Create schema + parser
        • Query data in its native format (does ETL on the fly)
        • New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
        • Good for Unknown Unknowns (Exploration)
    • Scalable Technology + Scalable Development. Auto scale: grows without requiring developers to re-architect their algorithms/application.
    • Economics: Return on Byte (ROB). High ROB versus Low ROB (but still a ton of aggregate value).
    • CDH: Cloudera Distribution incl. Apache Hadoop
      • Hadoop Core Kernel: MapReduce, HDFS
      • Coordination: Apache ZooKeeper
      • Data Integration: Apache Flume, Apache Sqoop
      • Fast Read/Write Access: Apache HBase
      • Batch Processing Languages: Apache Pig, Apache Hive
      • Interactive SQL: Impala
      • Data Mining Lib: Apache Mahout
      • Data Processing Lib: DataFu for Pig
      • Metadata: Apache Hive MetaStore
      • Job Workflow: Apache Oozie
      • Web Console: Hue
      • Connectivity: ODBC/JDBC/FUSE/HTTPS
      • Cloud Deployment: Apache Whirr
      • Build/Test: Apache Bigtop
      • Cloudera Manager Free Edition (Installation Wizard)
    • Cloudera Enterprise
    • The Cloudera Solution Stack
      • Cloudera University: Developer Training, Administrator Training, Data Science Training, Certification Programs
      • Professional Services: Use Case Discovery, New Hadoop Deployment, Proof-of-Concept, Deployment Certification, Process & Team Development, Production Pilots
      • Management Software & Technical Support (Subscription):
        • CDH: Ingest, Store, Explore, Process, Analyze, Serve
        • CM: Cloudera Manager
        • CS: Cloudera Support
      • OSS: Apache Hadoop & Open Source Software
    • Powered by Cloudera Impala
      • With Impala: interactive ANSI-92 SQL queries; native distributed query engine; optimized for low latency
      • Provides: answers as fast as you can ask; everyone can ask questions of all data; big data storage and analytics together
      • Unified storage: supports HDFS and HBase; flexible file formats and schemas
      • Unified Metastore
      • Unified Security
      • Unified Client Interfaces: ODBC/JDBC; SQL syntax; Hue Beeswax Web UI
      • Figure labels: Before Impala, With Impala; Batch Processing, User Interface, Real-Time Access
    • Cloudera in the Enterprise Stack
    • Use Case: A Major Financial Institution
      • The Challenge:
        • Current EDW at capacity; cannot support growing data depth and width
        • Performance issues in business critical apps; little room for innovation.
      • The Solution:
        • Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW.
        • EDW resources can now be focused on repeatable operational analytics.
        • Month data scan in 4 secs vs. 4 hours
      • Result: new solution saves tens of millions by optimizing existing EDW for analytics & reducing data storage costs by 99%
      • Figure, before: data warehouse load is Operational (44%), ELT Processing (42%), Analytics (11%).
      • Figure, after: Cloudera takes on Analytics, Processing and Storage; data warehouse load is Operational (50%), Analytics (50%).
    • Beyond Data Warehousing
      • Communications: location-based advertising
      • Health Care: patient sensors, monitoring, EHRs; quality of care
      • Law Enforcement & Defense: threat analysis, social media monitoring, photo analysis
      • Education & Research: experiment sensor analysis
      • Financial Services: risk & portfolio analysis; new products
      • On-line Services / Social Media: people & career matching; website optimization
      • Utilities: smart meter analysis for network capacity
      • Consumer Packaged Goods: sentiment analysis of what’s hot, customer service
      • Media / Entertainment: viewers / advertising effectiveness
      • Travel & Transportation: sensor analysis for optimal traffic flows; customer sentiment
      • Life Sciences: clinical trials; genomics
      • Retail: consumer sentiment; optimized marketing
      • Automotive: auto sensors reporting location, problems
      • High Technology / Industrial Mfg.: mfg quality; warranty analysis
      • Oil & Gas: drilling exploration sensor analysis
    • The Road Ahead. 2006-2012: Bringing Compute to Data. 2013-???: Bringing Applications to Data.
    • The Cloudera Platform for Big Data
      • Flexibility
        • Store any data
        • Run any analysis
        • Keeps pace with the rate of change of incoming data
      • Scalability
        • Proven growth to PBs/1,000s of nodes
        • No need to rewrite queries, automatically scales
        • Keeps pace with the rate of growth of incoming data
      • Economics
        • Cost per TB at a fraction of other options
        • Keep all of your data alive in an active archive
        • Powering the "data beats algorithm" movement
    • Dr. Amr Awadallah, CTO/Founder, @awadallah, aaa@cloudera.com
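
The schema-on-read idea from the "Key Benefit: Agility/Flexibility" slide can be illustrated with a small sketch. This is not Hadoop code and not taken from the talk: it is plain Python, and the sample records, the field names, and the `read_with_schema` helper are all hypothetical. It only shows the principle that data is stored in its native form and a schema is applied when the data is read, so a field described later appears retroactively.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "ann", "action": "view", "ms": 95, "device": "mobile"}',
]


def read_with_schema(lines, schema):
    """Parse each raw line at read time and project it onto the schema.

    `schema` maps a field name to the default used when a record lacks it.
    The stored lines are never modified.
    """
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field, default) for field, default in schema.items()}


# First schema, written before anyone knew about the "device" field.
schema_v1 = {"user": None, "action": None}

# Later schema that also describes "device"; no stored data is rewritten.
schema_v2 = {"user": None, "action": None, "device": "unknown"}

rows_v1 = list(read_with_schema(RAW_LINES, schema_v1))
rows_v2 = list(read_with_schema(RAW_LINES, schema_v2))

for row in rows_v2:
    print(row)
```

Under schema-on-write, by contrast, the third record's `device` value would have been dropped or rejected at load time until the table was altered; here it was stored all along and becomes visible as soon as the second schema describes it.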