Evolution of Big Data at Intel - Crawl, Walk and Run Approach

Hadoop Summit 2015

Published in: Technology

  1. Evolution of Big Data at Intel - Crawl, Walk and Run Approach
     Gomathy Bala | Director
     Chandhu Yalla | Manager & Architect
     Key Contributors: Sonja Sandeen, Seshu Edala, Nghia Ngo and Darin Watson
     IT BI Big Data Team
  2. Legal Notices
     This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The content in this presentation is being shared under NDA.
     Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
     * Other names and brands may be claimed as the property of others.
     Copyright © 2014, Intel Corporation. All rights reserved.
  3. Agenda
     • Intel IT Big Data Journey
     • Enterprise DW Architecture
     • BI Big Data 3-Year Roadmap
     • Big Data Ecosystem Architecture
     • Platform Strategies & BKMs
     • Summary
  4. Intel IT Big Data Journey
     Timeline: 2011, 2012, 2013, 2014, 2015
     • Big Data & Analytics Strategy
     • Production Online
     • Telmap: 1st Use Case
     • Preproduction Online
     • Hadoop Evaluation
     • IDH to CDH
     • Hadoop 2.0
     • $176M BV
     • Production: Security BI, Attribute Reduction System, ATM Ellipses Engine, IAH-Retail Analytics
     • 6 Environments
     • CDH 5.3
     • 4 Use Cases in Preproduction
     • 12 POC Use Cases
     • 6 Use Cases in Production
     • $290K investment
     • $948/TB
     • 3 Use Cases in Production: Smart-What, Marketing-IAH, Incident Predictability
     • $6M BV
     • CDH 5.1
     • IAH – Cloud CRM In Production
     • Enterprise Standards, Guidance, Processes for Platform & Capabilities
     15 Active Use Cases | $290K + 10.5 HC Investment | Delivered $182M BV
  5. Big Data & Analytics Really Delivers!
     From 2014 – 2015 Intel IT Business Review – Annual Edition
     Kim's Video
  6. 2014-2017 Vision: Real-Time Enterprise
     Diagram labels: Any Data Source; ERP; In Memory Real-Time Data Platform; CRM; SCM; SRM; ECC; BW; ECCW; Real-Time & Self Service Analytics Platform; MDG; NW; Teradata; Cloudera Hadoop Data Lake; Reporting Tools; Data Tiering; Hot-Cold data; Enterprise Data Warehouse; Other Apps; Custom; Intel …; NRT; Predictive Analytics; BPC; BCS; Cloud BI; SaaS; New Apps.; Downstream Applications
  7. Enterprise Data Architecture with Hadoop and Other MPP DWH: Current & Future Strategy
     • Machine Learning
     • Log Processing
     • Unstructured data Use Cases
     • High volume counter Analytics
     • Text Parsing/Mining
     • Strategic/Operational reporting
     • Interactive Reporting Use Cases
     • High Concurrent user analytics – Supply/Order
     • Mission critical analytics – Finance/HR
     Diagram labels: FE Tools; CLS/Proxy; High speed data loader; BigData; SQL on Hadoop; Future; Present; EDW; Mfg Data; A %ge of Traditional BI use cases; IMT
  8. BI Big Data | 3-Year Roadmap
     2015: Big Data + AA
     2016: Big Data + SSAA + Traditional BI
     2017: Big Data + SSAA + Traditional BI
     Scalable and well designed Hadoop Platform
     • Evolve IMT + Hadoop
     • Data Lineage & Data Catalog
     • Streaming Capabilities
     • Advanced SQL on Hadoop
     • ACID semantics
     • Evolve Big Data + SSAA per ecosystem roadmaps
     • BC/DR
     • End to end enterprise features
     • Enterprise ready: OLAP and Traditional DW
     • Security (RBAC, ITS/IRS)
     • Data Governance
     • Data Discovery
     • Self Service AA Framework
     • IMT + Hadoop
     • AVP + Hadoop
     • In-memory + Near real time capabilities
     • SQL on Hadoop
     Hadoop is an open source framework designed for big data analytics. Hadoop is evolving rapidly, but it will still take a couple of years for it to mature and support "traditional BI" use cases.
     Legend: Orange Text: Traditional BI Capabilities; Green Text: Big Data/AA Capabilities
  9. Big Data Platform – Ecosystem Architecture & Maturity
     Diagram labels: Data Integration; NRT/Stream Processing; In-Memory Processing; Processing Layer; Batch Processing; Data Virtualization; Data Discovery; Adv. Analytics; Adv. Visualization; Data Management; Presentation Layer; End User; Data Steward; Business Analyst; Data Scientist; Developer; User layer; Auditor; Machine Learning; Analytical layer; Statistical; Numerical; Time series; Textual/Log; Spatial; Graph; Textual/Log DB; Hierarchy DB; Relational DB; Graph DB; Storage Model; Platform Virtualization; Infrastructure; Platform Management; Network Management; Systems Management; Data Ingestion; Continuous Integration; Dev Framework; Security; Source/Target APIs; 3rd Party Drivers; Ent. Scheduler Srvs; Metadata Mgmt; Workload Mgmt; Middleware; Columnar DB; Data Egression; Data Consumption; Prescriptive Guidance; Change Release; Governance; Engagement; Service Management; Training; Support; Processes
     Legend: Other Vendors offered capabilities; Majority CDH offered capabilities
     *Other names and brands may be claimed as the property of others.
  10. BI Big Data Platform
      Hadoop Project Sandbox – CDH 5.3
      • Multiple instances deployed on Intel Cloud & MyCloud environments
      • TTM to business: 2-3 days
      Hadoop Pre-Production – CDH 5.3
      • 10 data nodes | 399TB | 320 vcores
      • Use cases in Dev/POC: 14
      Hadoop Production – CDH 5.3
      • 22 data nodes | 658TB | 704 vcores
      • Use cases live in prod: 7
      • Hadoop 2.0 architecture provides reliability, scalability & performance
      • High availability and scalability design
      • Well positioned to meet 2015 business use case requirements
      • Repeatable architecture for faster builds
      • Capacity additions: add a data node (white boxes, waterfall equipment or HP servers)
      • TTM: varies depending on HW (3 wks-2 months)
      Gateway nodes: login (ssh) with AD authentication & authorization, access cluster, run HDFS commands, submit jobs, etc.
      Diagram labels: Job/Workflow Management; Data Node (x5); Name Node; Resource Mgr; Name Node; Resource Mgr; heartbeat, balancing, replication; YARN; Scale to meet business needs; Gateway Nodes (NN hi-av); Management Node; Source Data; DB; Data Visualization Tools; Data Movement/ETL; EDW or Datamart; DB data; Unstructured; Semi-structured
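The cluster figures on slide 10 can be turned into per-node averages with a little arithmetic. The sketch below is only an illustration in Python: the node counts, storage and vcore totals come from the slide, while the `Cluster` class, the `nodes_needed` helper and the idea that a new data node matches the current average capacity are assumptions introduced here, not something the deck states. The slide also does not say whether the TB figures are raw or usable capacity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical container for the sizing figures quoted on the slide."""
    name: str
    data_nodes: int
    storage_tb: float
    vcores: int

    def tb_per_node(self) -> float:
        # Average storage per data node.
        return self.storage_tb / self.data_nodes

    def vcores_per_node(self) -> float:
        # Average YARN vcores per data node.
        return self.vcores / self.data_nodes


# Figures as stated on slide 10.
PREPROD = Cluster("pre-production", data_nodes=10, storage_tb=399, vcores=320)
PROD = Cluster("production", data_nodes=22, storage_tb=658, vcores=704)


def nodes_needed(cluster: Cluster, target_tb: float) -> int:
    """Estimate extra data nodes needed to reach target_tb.

    Assumption (not from the slides): each added node contributes the
    cluster's current average storage per node.
    """
    extra_tb = max(0.0, target_tb - cluster.storage_tb)
    return math.ceil(extra_tb / cluster.tb_per_node())


if __name__ == "__main__":
    for c in (PREPROD, PROD):
        print(f"{c.name}: {c.tb_per_node():.1f} TB/node, "
              f"{c.vcores_per_node():.0f} vcores/node")
    print("extra nodes for 1000 TB in production:", nodes_needed(PROD, 1000))
```

Both environments work out to 32 vcores per data node, which is consistent with the slide's "repeatable architecture" point; the storage per node differs between the two.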
  11. BKMs | Summary
      • Skills and resources, with time to ramp up
      • Starting small is OK. Focus on design and scalability for the platform.
      • Technical product evaluation
        • Stick with a distribution built on the core open source Hadoop stack rather than proprietary software
      • Security is a big deal to Intel; implementing Big Data security capabilities is a key focus
      • To understand the data, use an iterative discovery method with technical, business and modeling teams
      • Intel IT's Big Data journey benefited heavily from the Cloudera partnership
      • Open source will play a big role in advancing Big Data capabilities and analytics
  12. BI Big Data IT@Intel Resource Info
      BI Big Data IT@Intel Resource Links:
      1. Hadoop Migration Success Story: How Intel IT Moved to Cloudera
      2. Mining Big Data in the Enterprise for Better Business Intelligence
      3. Enabling Big Data Platforms and Solutions with Centralized Data Management
      4. Integrating Apache Hadoop* into Intel’s Big Data Environment
      5. Using a Multiple Data Warehouse Strategy to Improve BI Analytics
      To learn more: www.intel.com/bigdata
  13. Q & A
  14. Intel Confidential - Do Not Forward
  15. Backup
  16. Big Data Capability Catalog
      Cloudera* Distribution of Hadoop (CDH)
      Diagram labels: Hive; HDFS; MapReduce; Zookeeper; Pig; Mahout; Network; Servers; Storage; Security; OS; Hi-Av; EAM / AD Integration; HDFS Compress; WHIRR; Hbase; Governance; Change Release; Engagement; Service mgmt.; Prescriptive Guidance; Training; SQOOP; JDBC; Other DW; Infrastructure; Process; Storm; Hcatalog; ACCUMULO; YARN; SPARK; Autosys; SecureGIT; Impala JDBC; Hive ODBC; 3rd Party SW/Connectors; Integration; HUE; SOLR; IMPALA; PARQUET; DataFu; Impala ODBC; TDCH; Oozie; Kafka; Sqoop; DI Gateway; Flume; SFTP; SMBClient; Data Integration; Camel; Google Analytics; SFDC; Sentry
      Cloudera Manager* (System Management): Deployment; Monitoring; Reporting; Diagnostics; Alerting; Service Management; Rolling Upgrades; Config Rollbacks
      Cloudera Navigator* (Data Management): Audit; Access Control; Discovery; Explore; Lineage; Lifecycle
      Legend: Enabled (Avail. Now); WIP (1-3 Months); Planned (3-6+ Months)
      List includes only the capabilities planned for next 6 months.
      *Other names and brands may be claimed as the property of others.
  17. Migration to Cloudera – 6 BKMs
      i. Find Differences with a Comparative Evaluation in a Sandbox Environment
      ii. Define Your Strategy for the Cloudera Implementation
      iii. Split the Hardware Environment
      iv. Upgrade the Hadoop Version
      v. Create a Preproduction-to-Production Pipeline
      vi. Rebalance the Data
  18. Building Block Strategy to Enterprise Security of Hadoop
      • Q1’15: Perimeter access with LDAP + finer grain controls with Sentry. The second building block towards enterprise grade security design.
      • Q2’15: Add Kerberos to enable more Hadoop components and further secure the platform
      • 2H’15: Exploration starting, awaiting product and target to adopt in 2H’15 in Production.
      Timeline: Now, Q2’15, 2H’15
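The "finer grain controls with Sentry" building block on slide 18 rests on role-based access control: users belong to groups (here coming from LDAP/AD), groups are granted roles, and roles carry privileges on databases and tables. The toy Python model below only illustrates that chain of lookups. It is not Sentry's API, and every user, group, role and table name in it is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    database: str
    table: str   # "*" stands for every table in the database
    action: str  # "SELECT", "INSERT" or "ALL"


@dataclass
class Policy:
    """Toy user -> group -> role -> privilege mapping (illustrative only)."""
    user_groups: dict[str, set[str]] = field(default_factory=dict)
    group_roles: dict[str, set[str]] = field(default_factory=dict)
    role_privileges: dict[str, set[Privilege]] = field(default_factory=dict)

    def is_allowed(self, user: str, database: str, table: str, action: str) -> bool:
        # Walk user -> groups -> roles -> privileges and stop at the first match.
        for group in self.user_groups.get(user, set()):
            for role in self.group_roles.get(group, set()):
                for priv in self.role_privileges.get(role, set()):
                    if (priv.database == database
                            and priv.table in ("*", table)
                            and priv.action in ("ALL", action)):
                        return True
        return False  # deny by default


# Hypothetical policy: analysts may read one table, admins may do anything.
policy = Policy(
    user_groups={"alice": {"bi_analysts"}, "bob": {"platform_admins"}},
    group_roles={"bi_analysts": {"sales_reader"}, "platform_admins": {"sales_admin"}},
    role_privileges={
        "sales_reader": {Privilege("sales", "orders", "SELECT")},
        "sales_admin": {Privilege("sales", "*", "ALL")},
    },
)

if __name__ == "__main__":
    print(policy.is_allowed("alice", "sales", "orders", "SELECT"))  # analyst read
    print(policy.is_allowed("alice", "sales", "orders", "INSERT"))  # analyst write
```

Perimeter authentication (LDAP now, Kerberos as the next block) answers who the user is; a policy like this answers what that user may touch once inside.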
  19. Hadoop Maturity & Evolution
      Timeline: 2005-2012, 2013-2014, 2015-2017
      Hadoop 1.0 stack
      • MapReduce (batch data processing, cluster resource management)
      • HDFS 1.0 (redundant, reliable data storage)
      + Scalable data storage and processing platform
      + Positioned for batch processing workloads for Map and Reduce only
      + Apache Hive offers SQL-like query language
      - Lacks reliability and stability
      - No support for low latency queries
      Hadoop 2.0 stack
      • HDFS (redundant, reliable data storage)
      • YARN (cluster resource management)
      • Batch (MapReduce)
      • Others (data processing)
      Applications Run Natively In Hadoop
      • YARN (cluster resource management)
      • HDFS 2.0 (redundant, reliable data storage)
      • Interactive (Impala); In-Memory (Spark); Batch (MapReduce); Online (HBase); Others (Search, Storm etc.); Graph
      Capabilities
      • Apache YARN allows you to run multiple applications in Hadoop and provides reliability, scalability and performance
      • Advanced Resource Management
      • Apache Hive offers a 50x improvement in performance for queries
      • Cloudera Impala to support low latency query requirements with SQL-92 and SQL-2000 support
      • Data at Rest Encryption and Row Level/Cell Level Security planned
      • Data Streaming and Search Capability
      • GraphDB
      • Expanded Data Governance
      • IMT + Hadoop Integration
      • Improved Front End tool integration/support
      • Deeper Diagnostics for multiple components
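Slide 19 describes Hadoop 1.0 as positioned for "Map and Reduce only" batch workloads. As a reminder of what that programming model looks like, here is a single-process Python sketch of the three steps (map, shuffle, reduce) applied to a word count. It runs in memory on one machine and uses no Hadoop API; the function names are chosen here for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce functions run on many nodes and the shuffle moves data across the network. Under YARN, this batch model becomes one of several application types sharing the same cluster, alongside the interactive, in-memory and online engines the slide lists.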
  20. 2014 Intel IT Vital Statistics
      • >6,300 IT employees
      • 59 global IT sites
      • >98,000 Intel employees (1)
      • 168 Intel sites in 65 countries
      • 64 Data Centers (91 Data Centers in 2010)
      • 80% of servers virtualized (42% virtualized in 2010, goal of 75%)
      • >147,000+ devices
      • 100% of laptops encrypted
      • 100% of laptops with SSDs
      • >43,200 handheld devices
      • 57 mobile applications developed
      Source: Information provided by Intel IT as of Jan 2014
      (1) Total employee count does not include wholly owned subsidiaries that Intel IT does not directly support
  21. Big Data in the Industry
      • Recommendation Engine
      • Fraud Detection
      • Sentiment Analytics
      • Behavioral Targeting
      • Customer Experience Analytics
      • Marketing Campaign Analytics
  22. Sharing Intel IT Best Practices With the World
      Learn more about Intel IT’s Initiatives at www.intel.com/IT
