Lessons Learned from Building an Enterprise Big Data Platform from the Ground up in Months

  • Introduction to CPE – Quick recap of the CPE charter/mission
  • Why Hadoop? Make sure you tie in the lessons.
  • - Give a sense of the diversity of workloads and the scope of the platform. This is not just a single application, or OpenStack configured and optimized for a single app.
    - Our focus here is not just addressing scale-related issues in various OpenStack environments, but having the right framework to identify and quickly remediate those issues. E.g. SDN testing – rapid creation of control-plane elements (VMs, networks, routers, tenants, etc.) allowed us to identify scale issues in Neutron and fix some of them (see the sketch below).
    - Architecture goals
    Secure, scalable and reliable OpenStack-based cloud platform
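
    A minimal sketch of that kind of control-plane churn, assuming the modern openstacksdk client and a clouds.yaml entry named "cpe-test" (both illustrative – the deck does not say which tooling the team actually used for the Neutron scale tests):

      # Illustrative only: create many tenant network topologies in a loop to
      # stress the Neutron control plane, the pattern used to surface scale issues.
      import openstack

      conn = openstack.connect(cloud="cpe-test")   # assumed clouds.yaml entry

      def create_tenant_topology(i):
          net = conn.network.create_network(name=f"scale-net-{i}")
          subnet = conn.network.create_subnet(
              network_id=net.id,
              name=f"scale-subnet-{i}",
              ip_version=4,
              cidr=f"10.{i // 256}.{i % 256}.0/24",
          )
          router = conn.network.create_router(name=f"scale-router-{i}")
          conn.network.add_interface_to_router(router, subnet_id=subnet.id)
          return net, subnet, router

      # Create a few hundred topologies and watch API latency and agent behaviour.
      resources = [create_tenant_topology(i) for i in range(500)]
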
  • We are following a RESTful SOA model
    - Closely resembles the OpenStack design philosophy
    - A collection of loosely coupled, independent services, each exposing a REST/JSON endpoint. This enables two things: (1) a federated development model that allows services to be independently designed, developed, tested and deployed to production; (2) applications don't view the platform as a rigid framework but as a menu of services that can be used to "compose" an application (not an "all or nothing" proposition)
    - Keystone as the core Identity & Access service; this means all other CPE services will integrate with Keystone for AuthN and AuthZ (see the sketch below)
    - All CPE services are exposed through a REST/JSON API
    - Secure multi-tenancy is a core objective for all CPE services (you might want to touch on the multiple levels of tenancy here – domains and projects)
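
    As a sketch of how an application consumes this model: obtain a Keystone token with keystoneauth1 and call a CPE-style REST/JSON service with it. The service URL, credentials and endpoint path below are placeholders, not real CPE names:

      # Illustrative only: authenticate against Keystone v3 and call a
      # hypothetical CPE service with the issued token.
      import requests
      from keystoneauth1.identity import v3
      from keystoneauth1 import session

      auth = v3.Password(
          auth_url="https://keystone.example.com:5000/v3",   # placeholder
          username="app-user",
          password="secret",
          project_name="analytics",
          user_domain_name="Default",
          project_domain_name="Default",
      )
      sess = session.Session(auth=auth)
      token = sess.get_token()   # scoped token used for AuthN/AuthZ by every CPE service

      resp = requests.get(
          "https://bigdata.example.com/v1/jobs",             # hypothetical endpoint
          headers={"X-Auth-Token": token, "Accept": "application/json"},
      )
      print(resp.json())
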
  • Introduction to CPE – Quick recap of the CPE charter/mission

  • ----- Meeting Notes (6/2/14 23:05) -----
    Central platform for:
    - Audit purposes
    - No need to re-learn every time a new service is deployed
    - Seamless integration between the services

    ----- Meeting Notes (6/3/14 00:24) -----
    Scalability
    Legacy technologies had limitations when it came to scale-out
    Reliability
    Any part of the infrastructure could fail
    The platform has to be highly available, as we have external customers who depend on this critical infrastructure
    Self healing
    The platform should be able to seamlessly migrate jobs & workloads to other DCs
  • Introduction to CPE – Quick recap of the CPE charter/mission
  • Authentication (integration with LDAP or AD for authentication; other modes of authentication, such as PAM, also supported)
    ----- Meeting Notes (6/3/14 00:24) -----
    We wanted to share the core PoCs and test plans we executed as we evaluated the different distros
  • Authentication (integration with LDAP or AD for authentication; other modes of authentication, such as PAM, also supported – see the sketch below)
    Operational Management – API support (User Management, Job Management, Data Ingestion)
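
    As a sketch of the LDAP/AD integration criterion, the snippet below does a simple bind check with the ldap3 library; the server address and DN layout are placeholders:

      # Illustrative only: verify a user's credentials by binding to LDAP/AD.
      from ldap3 import ALL, Connection, Server

      def ldap_authenticate(username: str, password: str) -> bool:
          server = Server("ldaps://ldap.example.com", get_info=ALL)   # placeholder host
          user_dn = f"uid={username},ou=people,dc=example,dc=com"     # assumed DIT layout
          conn = Connection(server, user=user_dn, password=password)
          ok = conn.bind()   # True only if the bind (and thus the credentials) succeeded
          conn.unbind()
          return ok

      print(ldap_authenticate("hadoop-user", "secret"))
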

  • ----- Meeting Notes (6/3/14 00:24) -----
    Apples-to-apples comparison (see the sketch below)
    Management testing, scalability & reliability testing
    Used real-life workloads that we migrated from our existing platforms
    North-south and east-west traffic
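
    A minimal sketch of keeping the comparison apples-to-apples: run the same job with identical arguments on each candidate cluster's gateway and record wall-clock time. Hosts and the job command are placeholders:

      # Illustrative only: run one migrated workload on each distro's gateway node
      # with identical arguments and compare elapsed times.
      import subprocess
      import time

      CANDIDATES = {                       # placeholder gateway hosts, one per distro
          "vendor1": "gw1.example.com",
          "vendor2": "gw2.example.com",
          "vendor3": "gw3.example.com",
      }
      JOB = "hadoop jar my-workload.jar com.example.MigratedWorkload /input /output"

      results = {}
      for name, host in CANDIDATES.items():
          start = time.time()
          subprocess.run(["ssh", host, JOB], check=True)   # fails fast on job errors
          results[name] = time.time() - start

      for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
          print(f"{name}: {seconds:.0f}s")
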
  • Weighted matrix (see the sketch below)
    Customer workloads were taken into consideration; use them to define your matrix and ratings
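
    The mechanics of that weighted matrix, with made-up weights and ratings just to show the calculation (the real weights were derived from customer workloads):

      # Illustrative only: weighted decision matrix; weights and ratings are invented.
      WEIGHTS = {"management": 0.25, "workloads": 0.20, "security": 0.25,
                 "scalability": 0.15, "reliability": 0.15}

      RATINGS = {   # 1 (poor) .. 5 (excellent), per vendor and criterion
          "vendor1": {"management": 4, "workloads": 3, "security": 5, "scalability": 4, "reliability": 4},
          "vendor2": {"management": 5, "workloads": 4, "security": 3, "scalability": 4, "reliability": 3},
          "vendor3": {"management": 3, "workloads": 5, "security": 4, "scalability": 3, "reliability": 4},
      }

      for vendor, scores in RATINGS.items():
          total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
          print(f"{vendor}: {total:.2f} / 5.00")
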
  • Introduction to CPE – Quick recap of the CPE charter/mission
  • Network-to-spindle ratio – 20:12 (125 MB/s ≈ 1 Gbps); the worked numbers are sketched below
    CPU – 100%, Memory – 44%, Disk – 71.5%
    Network – East-west: 23 GBps, North-south: 27 GBps
    More memory to make sure we provision for the future

    NameNode – 100 million files (128 GB); 1 million files ≈ 1 GB

    ----- Meeting Notes (6/3/14 00:24) -----
    Personal lesson – agonizing over SATA, NL-SAS, SAS, etc.
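
    The back-of-the-envelope arithmetic behind those numbers, written out (the 1 GB per 1 million files figure is the commonly cited NameNode heap rule of thumb):

      # Worked sizing numbers from the notes above.

      # Network-to-spindle ratio: 2x10Gb LACP per server vs. 12 data spindles.
      network_gbps = 2 * 10
      spindles = 12
      spindle_mb_per_s = 125                                  # ~125 MB/s sequential, i.e. ~1 Gbps
      disk_gbps = spindles * spindle_mb_per_s * 8 / 1000
      print(f"network:disk throughput ~ {network_gbps}:{disk_gbps:.0f} Gbps")   # 20:12

      # NameNode heap rule of thumb: ~1 GB of heap per 1 million files,
      # so 100 million files calls for a NameNode in the 128 GB RAM class.
      files = 100_000_000
      heap_gb = files / 1_000_000 * 1
      print(f"NameNode heap ~ {heap_gb:.0f} GB")
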
  • For Symantec, security was not an afterthought.
    ----- Meeting Notes (6/3/14 00:24) -----
    If you are planning to use open-source tools, make sure you understand the investment in development effort needed in the short and long term.

  • ----- Meeting Notes (6/3/14 00:24) -----
    crossing the chasm - early adopters, early majority, late majority
  • Lessons Learned from Building an Enterprise Big Data Platform from the Ground up in Months

    1. Building an enterprise big data platform from the ground up (Lessons Learned) – Roopesh Varier, Director, CPE; Srinivas Nimmagadda, Technical Director, CPE. Hadoop Summit 2014
    2. Agenda: 1. Introduction, 2. Platform Objectives, 3. Decision Criteria & PoC, 4. Key Lessons, 5. Q & A
    3. (Title/divider slide – no additional text)
    4. CPE Overview
       • CPE Charter – Build a consolidated cloud infrastructure that offers platform services to host Symantec cloud applications
       • Symantec big data platforms already hosting diverse (security, data management) workloads
         – Analytics – Reputation-based security, Managed Security Services, Fraud Detection
         – Financial – Consumer and Enterprise transactions
         – Dial Home – Appliance logs
       • We are building a central big data platform that provides batch and real-time services
         – Security, Scalability & Multi-tenancy are core objectives for all services
    5. CPE Platform Architecture (diagram)
       • Consumers – Cloud Applications, CLIs, Scripts and a Web Portal, all through a REST/JSON API
       • Core Services (over Compute, Networking, Storage, Big Data, Messaging) – Compute (Nova), Image (Glance), SDN (Neutron), Load Balancing, DNS, SQL, Batch Analytics, Stream Processing, Msg Queue, Mem Cache, Email Relay, SSL, K/V Store, Object Store
       • Identity & Access (Keystone) – AuthN, Roles, User Mgmt, Tenancy, Quotas
       • Supporting Services – Logging, Metering, Monitoring, Deployment
    6. Agenda (recap): 1. Introduction, 2. Platform Objectives, 3. Decision Criteria & PoC, 4. Key Lessons, 5. Q & A
    7. Platform Objectives
       • Shared central platform – Generic architecture with central deployment, monitoring and logging framework
       • Multiple workloads – Batch Processing, Interactive Queries
       • Multi-tenancy – Security, Resource Tenancy
       • Scalability
       • Reliability – Active-Active, Data Ingestion (Teeing)
       • Self healing
       • Effective Resource Utilization
       • Performance
    8. Agenda (recap): 1. Introduction, 2. Platform Objectives, 3. Decision Criteria & PoC, 4. Key Lessons, 5. Q & A
    9. Decision Criteria
       • Management – Deployment, Operational Ease, Add/Remove Nodes, Add/Remove Packages, Monitoring, Software Upgrade
       • Multiple workloads – Open-source tools (Presto, Shark, etc.), SQL query support
       • Performance
    10. Decision Criteria (contd)
       • Security – Authentication, Audit of access control, Kerberos integration
       • Scalability – Support for different file formats
       • Reliability & Self Healing – SPoF, DR Support, Failure alerts, Self healing, Operational Management
    11. Proof of Concept (cluster diagram) – Data nodes from Vendor 1, Vendor 2 and Vendor 3, plus Name Nodes & Client Nodes; per node: 128 GB RAM, 48 TB NL-SAS (7.2K); 4 racks, 40Gb to spine, 2x10Gb LACP per server
    12. (No slide text captured)
    13. Agenda (recap): 1. Introduction, 2. Platform Objectives, 3. Decision Criteria & PoC, 4. Key Lessons, 5. Q & A
    14. Key Lessons
       • Platform objectives should be defined up front – Security; Scale (workload, long-term capacity needs); Fault Tolerance/Availability (DR vs. Active-Active)
       • Understand your use cases well
       • Hardware selection – Don't agonize over it; go mainstream (SATA good enough – 48 TB/node, 12 spindles)
       • Data node sizing – Cost vs. Capacity (sweet spot); CPU-to-disk ratio (load based) – Core/TB ratio 2:3 (32 cores with HT); RAM – 128 GB; Uniform SKU
    15. Key Lessons
       • Distro selection (technology drivers)
         – Define the principles – Open source vs. proprietary, maturity vs. cutting edge
         – Features to support use cases – Batch processing vs. in-memory processing, POSIX compliance
         – Management needs (build vs. buy) – Operational ease, logging/monitoring
         – Security
         – Workflow needs – Open-source tools, customizations/development effort required
    16. Key Lessons
       • Distro selection (business drivers)
         – Strategic partnership – OEM deals
         – Long-term viability
         – Cost – 2-year TCO, 5-year TCO
         – Template-driven selection
       • Driving adoption – Early sandboxes, documentation, reference application, keep resources aside for bootstrapping
    17. Key Lessons
       • Most valuable insight – Big data comes with big problems and big ROI
       • Challenges – Development work, Dev/Ops mindset, cultural change, evangelization
       • Opportunities – Customer insights, new revenue streams, start experimentation
       • Paralysis by analysis (avoid it)
       • Have fun!!!
    18. "If everything seems under control, you're not going fast enough" – Mario Andretti, Formula One World Champion
    19. Thank you! – Roopesh Varier, Director, CPE; Srinivas Nimmagadda, Technical Director, CPE. Copyright © 2012 Symantec Corporation. All rights reserved.
