Evolving Hadoop for the Data Society
Why does the world need an Intel Distribution for Apache Hadoop and what's it got to do with OpenStack?

Published in: Technology

  1. INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY
     Evolving Hadoop for the Data Society
     Open Platform for Next-Gen Analytics
     vin.sharma, strategy & marketing, open source x open data
  2. Hope trumps hype
  3. Virtuous cycle of data-driven innovation
     • Cycle elements: CLOUD, CLIENTS, INTELLIGENT SYSTEMS
     • Cycle labels: Richer data to analyze; Richer user experiences; Richer data from devices
     • 2.8 zettabytes of data generated WW in 2012 (1)
     • 40 zettabytes of data will be generated WW in 2020 (1)
     Sources: (1) IDC Digital Universe 2020, (2) IDC
  4. Democratize data analysis
     • Enhance scientific understanding, drive innovation, and accelerate medical cures
     • Create new data-driven business models, reduce resource waste, improve organizational processes
     • Increase public safety with smart traffic and improve energy efficiency with smart grids
  5. Models and Cases
  6. Data-Intensive Discovery
     [Diagram]
     • Layers: Data Value, Data Analysis, Data Management
     • Domains: Life Sciences, Physical Sciences, Social Sciences
     • Other labels: Drug Discovery, Genome Data, EMR, Clinical Trials, Sensor Data, Images, Sim Data, Census Data, Text, A/V, Surveys, Treatment Optimization, Hypothesis Formation, Modeling & Prediction, Astronomy, Particle Physics, Public Policy, Trend Analysis
  7. Data-Intensive Discovery: Genomics (Intel Distribution)
     Value
     • Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
     • 90% gain in throughput; 6X data compression
     Analytics
     • Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
     • Provide APIs for applications to combine and analyze public and private data sets
     Data Management
     • Use Hive and Hadoop for query and search
     • Dynamically partition and scale HBase
     • 10-node cluster / Intel Xeon E5 processors
     • 10GbE network
  8. Data-Driven Business
     [Diagram]
     • Layers: Data Value, Data Analysis, Data Management
     • Domains: Telco, Retail, FSI
     • Other labels: Customer Service, Content, CDR, IP Traffic, Shop, Product, Customer Behavior, Transactions, Network Optimization, Product Innovation, Market Insight, Business Efficiency, Behavior Modeling, Fraud Analytics, Client Engagement
  9. Data-Driven Business: Customer Service (CDR, Subscriber Self Service)
     Value
     • 300 million wireless subscribers
     • Enable subscriber access to billing data
     • 30X gain in performance; lower TCO
     Analytics
     • Provides real-time retrieval of 6 months of data
     • Supports new BI with 15 types of queries
     • Enables targeted ad serving and promotions
     Data Management
     • Use Hadoop/HBase for search and analysis
     • 30 TB/month of billing data
     • 300K reads/second; 800K inserts/second
     • 133-node cluster / Intel Xeon E5 processors
  10. Data-Rich Communities
     [Diagram]
     • Layers: Data Value, Data Analysis, Data Management
     • Domains: Utilities, Police & Security, Government Services
     • Other labels: Customer Service, Meter Data, Infrastructure Data, Monitor Data, Behavior, ID, Demographics, Network Optimization, Smart Grids, Safe Streets, Crime Detection, Crime Prevention, Service Agility, Waste & Fraud Analysis, ID Programs
  11. Data-Rich Communities: Smart City
     Value
     • Enforce traffic laws and detect license fraud
     • Monitor and predict traffic patterns
     • In a city of 31 million people
     Analytics
     • Detect traffic law violations automatically
     • Detect driver license fraud by data mining
     • Forecast traffic with predictive analytics
     Data Management
     • 30,000 cameras
     • 6 Mb/s stream rate per camera
     • 15 PB of images in active use
     • 2 billion records in HBase
     Diagram labels: Detection, Prevention, Regional, Local
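The data-management figures on the smart-city slide (30,000 cameras, 6 Mb/s per camera, 15 PB of images in active use) imply an aggregate ingest rate that can be estimated in a few lines. The sketch below is a back-of-the-envelope calculation, not a figure from the deck. It assumes that every camera streams continuously at the full rate, that "Mb" means megabits, and that units are decimal; the function names are illustrative.

```python
# Figures taken from the slide.
CAMERAS = 30_000
MBIT_PER_S_PER_CAMERA = 6
ACTIVE_PB = 15

SECONDS_PER_DAY = 86_400


def aggregate_gbit_per_s(cameras=CAMERAS, mbit_per_s=MBIT_PER_S_PER_CAMERA):
    """Combined stream rate of all cameras in gigabits per second."""
    return cameras * mbit_per_s / 1_000


def petabytes_per_day(cameras=CAMERAS, mbit_per_s=MBIT_PER_S_PER_CAMERA):
    """Raw video volume per day in decimal petabytes (assumes 24h streaming)."""
    bits_per_s = cameras * mbit_per_s * 1_000_000
    bytes_per_day = bits_per_s / 8 * SECONDS_PER_DAY
    return bytes_per_day / 1e15


def days_covered_by_active_set(active_pb=ACTIVE_PB):
    """How many days of full-rate video would fit in the active image set."""
    return active_pb / petabytes_per_day()


if __name__ == "__main__":
    print(f"aggregate rate: {aggregate_gbit_per_s():.0f} Gb/s")
    print(f"daily volume:   {petabytes_per_day():.3f} PB/day")
    print(f"active set:     {days_covered_by_active_set():.1f} days of video")
```

Under these assumptions the cameras would produce about 180 Gb/s, or roughly 1.9 PB per day, so 15 PB corresponds to a little under eight days of full-rate video. A real deployment would likely store less than this because of motion-triggered capture or compression, neither of which the slide describes.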
  12. Platform
  13. [Image: periodic-table tile for silicon: 14, Si, 28.085]
  14. At the intersection of transformative forces
     • HPC: Enabling exascale (10^18) computing on massive data sets
     • Cloud: Helping enterprises build open interoperable clouds
     • Open Source: Contributing code and fostering ecosystem
  15. Intel® Distribution for Apache Hadoop* software
     • Hardware-enhanced performance & security
     • Enables partner innovation in analytics
     • Strengthens Apache Hadoop* ecosystem
     * Other names and brands may be claimed as the property of others.
  16. Intel® Distribution for Apache Hadoop* software, version 3.x
     • Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security
     • HDFS 2.0.3: Hadoop Distributed File System
     • YARN (MRv2): Distributed Processing Framework
     • HBase 0.96.1: Columnar Store
     • ZooKeeper 3.4.5: Coordination
     • Flume 1.3.0: Log Collector
     • Sqoop 1.4.1: Data Exchange
     • Pig 0.9.2: Scripting
     • Hive 0.10.0: SQL Query
     • Oozie 3.3.0: Workflow
     • Mahout 0.7: Machine Learning
     • HCatalog: Metadata
     • Connectors: Ingest, Analysis, Visual
     Legend: Intel proprietary; Intel enhancements contributed to open source; Open source components included without change
     All external names and brands are claimed as the property of others.
  17. Intel® Distribution for Apache Hadoop* software, version 2.3
     Feature groups: Hardware-enhanced Security, Optimized Performance, Simplified Management
     • File-based encryption in HDFS
     • Up to 20x faster decryption with AES-NI*
     • Role-based access control for Hadoop services
     • Up to 8.5X faster Hive queries using HBase co-processor
     • Adaptive data replication in HDFS and HBase
     • Optimized for SSD with Cache Acceleration Software
     • Integrated text search with Lucene
     • Simplified deployment & comprehensive monitoring
     • Automated configuration with Intel® Active Tuner
     • Deployment of HBase across multiple datacenters
     • Detailed profiling of Hadoop jobs
     • Simplified design of HBase schemas (+ in 2.4)
     • REST APIs for deployment and management (+ in 2.4)
     * Based on internal testing
  18. Intel® Distribution for Apache Hadoop* software, version 3.0
     • Cell-level ACLs in HBase
     • Encryption support in Hive and Pig
     • Secure inter-node communication with SSL
     • Compression and CRC with SSE 4.2
     • Up to 8.5X faster Hive queries using HBase co-processor
     • Adaptive replication in HDFS and HBase
     • Snapshot support in Hadoop
     • SNMP support for monitoring
     • Hadoop 2.0.3 and YARN support
     • Lustre support
     • GlusterFS support
     • HCatalog support
     * Based on internal testing
  19. Security & Performance
  20. Enterprise data requires defense in depth
     Layers: Firewall, Gateway, Authn, AuthZ, Encryption, Audit & Alerts, Containment
  21. Intel Expressway protects Hadoop APIs
     • Enforces consistent security policies across all Hadoop services
     • Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
     • Complies with Common Criteria EAL4+, HSM, FIPS 140-2 certifications
     • Deploys as software, virtual appliance, or hardware appliance
     Diagram labels: Authn, RBAC, Encryption, Containment, Firewall, REST APIs, HCatalog, Stargate, WebHDFS
  22. Kerberos authenticates Hadoop services
     • Wizard enables setup of secure cluster with encrypted key exchange
     • Manager generates principal and keytab for Hadoop services
     • Manager enables batch upload of keytab files
     Diagram labels: Authentication, KDC, Intel Manager, Encryption, Containment, Firewall, APIs
     Diagram flow (numbered 1 to 5 on the slide): request ticket, send service ticket, request service, validate ticket, send response
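The five flow labels on the Kerberos slide (request ticket, send service ticket, request service, validate ticket, send response) can be walked through with a toy model. The sketch below is only an illustration of that message order and is not Kerberos itself: real Kerberos encrypts tickets, uses a ticket-granting ticket and session keys, and is normally accessed through a GSSAPI library. Here a ticket is simply a JSON payload signed with an HMAC under the service's secret, which stands in for the keytab. `ToyKDC` and `ToyService` are hypothetical names.

```python
import hashlib
import hmac
import json
import time


class ToyKDC:
    """Stand-in for a key distribution center (illustration only)."""

    def __init__(self):
        self._service_keys = {}

    def register_service(self, name, key):
        # In a real cluster the shared secret lives in the service's keytab.
        self._service_keys[name] = key

    def issue_ticket(self, client, service, lifetime_s=300):
        # Steps 1-2: the client requests a ticket, the KDC sends one back.
        body = json.dumps(
            {"client": client, "service": service,
             "expires": time.time() + lifetime_s},
            sort_keys=True,
        ).encode()
        mac = hmac.new(self._service_keys[service], body, hashlib.sha256)
        return {"body": body, "mac": mac.hexdigest()}


class ToyService:
    """Stand-in for a Hadoop service that accepts signed tickets."""

    def __init__(self, name, key):
        self.name = name
        self._key = key

    def handle(self, ticket, request):
        # Step 4: validate the ticket with the service's own key.
        expected = hmac.new(self._key, ticket["body"], hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, ticket["mac"]):
            raise PermissionError("ticket signature mismatch")
        claims = json.loads(ticket["body"])
        if claims["service"] != self.name or claims["expires"] < time.time():
            raise PermissionError("ticket not valid for this service")
        # Step 5: send the response.
        return f"{request}: ok for {claims['client']}"


if __name__ == "__main__":
    key = b"demo-secret"
    kdc = ToyKDC()
    kdc.register_service("hdfs", key)
    service = ToyService("hdfs", key)
    ticket = kdc.issue_ticket("alice", "hdfs")   # steps 1-2
    print(service.handle(ticket, "list /"))      # steps 3-5
```

The point of the model is that the service never contacts the KDC at request time: it can validate the ticket locally because it shares a secret with the KDC, which is why generating and distributing keytabs (the task the slide assigns to Intel Manager) matters.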
  23. Manager simplifies role-based access control
     • File, table, and service-level controls
     • Intel Manager pushes ACLs to each node
     Diagram labels: Firewall, AuthZ
  24. Intel Distribution provides HDFS encryption
     • Extends compression codec into crypto codec
     • Provides an abstract API for general use
     Diagram labels: Firewall, RBAC
     MapReduce pipeline shown: HDFS, RecordReader, Map, Combiner, Partitioner, Local Merge & Sort, Reduce, RecordWriter, HDFS
     Crypto steps shown on the pipeline: Decrypt, Encrypt, Derivative Encrypt, Derivative Decrypt
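The idea of "extending the compression codec into a crypto codec" can be illustrated with a small sketch: if compression and encryption share one encode/decode interface, a job can decrypt on read and encrypt on write without knowing which kind of codec it holds. The code below is an assumption-laden illustration in Python, not the Intel Distribution's Java API. `Codec`, `ZlibCodec`, `ToyCryptoCodec`, and `run_stage` are hypothetical names, and the toy cipher (XOR with a SHA-256-derived keystream) is NOT secure; it merely stands in for AES so that the example runs with the standard library alone.

```python
import hashlib
import zlib
from abc import ABC, abstractmethod


class Codec(ABC):
    """Hypothetical codec interface: bytes in, bytes out, both directions."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec(Codec):
    """A compression codec behind the shared interface."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class ToyCryptoCodec(Codec):
    """A crypto codec behind the same interface. Toy cipher, NOT secure."""

    def __init__(self, key: bytes):
        self._key = key

    def _keystream(self, n: int) -> bytes:
        # Derive n pseudo-random bytes from the key and a block counter.
        out = bytearray()
        counter = 0
        while len(out) < n:
            block = hashlib.sha256(self._key + counter.to_bytes(8, "big"))
            out += block.digest()
            counter += 1
        return bytes(out[:n])

    def encode(self, data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, self._keystream(len(data))))

    def decode(self, data: bytes) -> bytes:
        # XOR with the same keystream undoes the encoding.
        return self.encode(data)


def run_stage(stored_records, codec: Codec, transform):
    """Decode each record on read, transform it, re-encode it on write."""
    return [codec.encode(transform(codec.decode(r))) for r in stored_records]


if __name__ == "__main__":
    codec = ToyCryptoCodec(b"demo-key")
    stored = [codec.encode(b"alpha"), codec.encode(b"beta")]
    result = run_stage(stored, codec, bytes.upper)
    print([codec.decode(r) for r in result])
```

Because `run_stage` depends only on the `Codec` interface, swapping `ToyCryptoCodec` for `ZlibCodec` requires no change to the job logic, which is the design benefit the slide's "abstract API for general use" bullet points at.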
  25. Intel AES-NI accelerates decryption 20x
     [Two bar charts, speed in MB/s for 64k, 4k, and 1k sizes; callouts on the slide: 20X and 6X]
     • AES Encryption (64k / 4k / 1k): AES-NI 460 / 457 / 454; No AES-NI 87 / 87 / 86
     • AES Decryption (64k / 4k / 1k): AES-NI 1266 / 1259 / 1253; No AES-NI 64 / 63 / 63
     • OpenSSL 1.0.1c optimized to use Intel AES-NI (7 math functions in processor accelerate AES)
     • Intel Distribution crypto framework uses OpenSSL 1.0.1c
     • Patch and design document released to open source (JIRA HADOOP-9331)
     Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
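The speedup callouts on the AES-NI slide can be checked against the chart values it reports. The short sketch below simply divides the AES-NI throughput by the non-AES-NI throughput for each size; the numbers are copied from the slide, while the dictionary layout and function name are illustrative.

```python
# Throughput in MB/s from the slide: size -> (with AES-NI, without AES-NI).
ENCRYPT = {"64k": (460, 87), "4k": (457, 87), "1k": (454, 86)}
DECRYPT = {"64k": (1266, 64), "4k": (1259, 63), "1k": (1253, 63)}


def speedups(table):
    """Return the AES-NI speedup factor for each size in the table."""
    return {size: with_ni / without for size, (with_ni, without) in table.items()}


if __name__ == "__main__":
    for name, table in (("encryption", ENCRYPT), ("decryption", DECRYPT)):
        for size, factor in speedups(table).items():
            print(f"{name:10s} {size:>3s}: {factor:.1f}x")
```

The decryption ratios come out at roughly 19.8x to 20.0x, which matches the 20X callout. The encryption ratios are about 5.3x, so the 6X callout appears to be rounded up.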
  26. Learn more about Intel and Hadoop
     Intel Training and Certification; Case Studies and Resources
     • Unique insights that help you tune, secure, and manage your deployment in addition to essential understanding of Apache Hadoop
     • Distilled from years of Intel experience in deploying and optimizing Apache Hadoop and HBase for enterprises
     • Based on Intel expertise in optimizing the full Hadoop stack, from Hive on Hadoop through Java to Linux on x86 hardware
     • http://hadoop.intel.com
     • http://www.intel.com/bigdata
  27. Agility
  28. Savanna: Hadoop on OpenStack
     Ilya Elterman, Senior Director Cloud Services
  29. Hadoop on OpenStack Use Cases
     • Dev and QA teams: fast cluster provisioning
     • Data Scientists/Analysts: API to run analytic jobs, with infrastructure provisioning happening under the hood
     • Administrators: centralized cluster management and monitoring
  30. Savanna Key Principles
     The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics:
     • Open source
     • Native for OpenStack
     • Support for different Hadoop distributions
     • Makes resources dedicated to IaaS cloud available for Hadoop workloads
  31. Savanna Architecture Overview
     [Diagram]
     • Savanna components: Savanna Python Client, REST API, Auth, Cluster Configuration Manager, DAL, Provisioning Plugin, VM Manager, Image Registry, Savanna Pages
     • OpenStack components: Horizon, Keystone, Nova, Glance, Swift
     • Provisioned resources: four Hadoop VMs
  32. Savanna Roadmap
     • Phase 1: Completed, April 13th. Basic cluster provisioning with "pre-built" images
     • Phase 2: In Progress, July 15th. Pluggable mechanism of integration with vendor tooling and cluster operations support
     • Phase 3: Scoping, 2-3 months. "Analytics as a service": job execution framework, support for different scripting languages
  33. Learn more about Savanna
     • All code and documentation open source
     • Latest version 0.1.2 from 05/13
     • Launchpad home page
       o https://launchpad.net/savanna
     • Code on stackforge
       o Integrated with OpenStack CI/CD
       o https://github.com/stackforge/savanna
     • Active community
       o https://lists.launchpad.net/savanna-all/
  34. Live Demo: Savanna with Intel Distribution at Intel Booth
