Delivering Apache Hadoop for the Modern Data Architecture

1,170
views

Published on

Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in …

Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around Cisco-based big data architectures and Hortonworks Data Platform to get you started on building your modern data architecture.

Transcript

  • 1. Page 1 © Hortonworks Inc. 2014 Delivering Apache Hadoop for the Modern Data Architecture Cisco & Hortonworks. We do Hadoop Together
  • 2. Page 2 © Hortonworks Inc. 2014 Our speakers: Ajay Singh, Director Technical Channels, Hortonworks; Sean McKeown, Solutions Architect, Data Center, Cisco.
  • 3. Page 3 © Hortonworks Inc. 2014 Why Hadoop: Traditional Data Architecture Pressured. 2.8 ZB of data in 2012, 85% of it from new data types; 15x machine data by 2020; 40 ZB by 2020 (data source: IDC). SOURCES: OLTP, ERP, CRM; documents, emails; web logs, click streams; social networks; machine generated; sensor data; geolocation data.
  • 4. Page 4 © Hortonworks Inc. 2014 What: Business Applications of Hadoop. [Matrix mapping data types — sensor, server logs, text, social, geographic, machine, clickstream, structured, unstructured — to industry use cases.] Financial Services: new account risk screens, trading risk, insurance underwriting. Telecom: call detail records (CDR), infrastructure investment, real-time bandwidth allocation. Retail: 360° view of the customer; localized, personalized promotions; website optimization.
  • 5. Page 5 © Hortonworks Inc. 2014 What: Business Applications of Hadoop (continued). Manufacturing: supply chain and logistics, preventive maintenance, crowd-sourced quality assurance. Healthcare: use genomic data in medical trials, monitor patient vitals in real time. Pharmaceuticals: recruit and retain patients for drug trials, improve prescription adherence. Oil & Gas: unify exploration and production data, monitor rig safety in real time. Government: ETL offload in response to budgetary pressures, sentiment analysis for government programs.
  • 6. Page 6 © Hortonworks Inc. 2014 How: Modern Data Architecture with Hadoop. APPLICATIONS: statistical analysis; BI/reporting, ad hoc analysis; interactive web and mobile apps; enterprise applications. DATA SYSTEMS: ENTERPRISE HADOOP (Governance & Integration, Security, Operations, Data Access, Data Management) alongside existing repositories (RDBMS, EDW, MPP). SOURCES: OLTP, ERP, CRM; documents, emails; web logs, click streams; social networks; machine generated; sensor data; geolocation data. Surrounding the stack: OPERATIONS TOOLS (provision, manage & monitor) and DEV & DATA TOOLS (build & test).
  • 7. Page 7 © Hortonworks Inc. 2014 YARN Transforms Hadoop's Architecture. YARN: Data Operating System. A multi-use data platform: store all data in one place, process in many ways — batch, interactive, iterative, streaming. Enables deep insight across a large, broad, diverse set of data at efficient scale. Store any/all raw data sources and processed data over extended periods of time. [Diagram: a cluster of nodes 1 through n shared by all processing engines.]
  • 8. Page 8 © Hortonworks Inc. 2014 Designing a Hadoop Cluster. Four dimensions: cluster storage capacity, server specification, cluster size, and factoring performance. Key considerations: any piece of hardware can and will fail; more nodes mean less impact per failure; resiliency and fault tolerance improve with scale, so build resiliency through scale; still use modern hardware; let software handle hardware failures.
  • 9. Page 9 © Hortonworks Inc. 2014 Storage Capacity. Key inputs: initial data size, 3-year YOY growth, compression ratio, intermediate and materialized views, and replication factor. Notes: it is hard to accurately predict the size of intermediate and materialized views at the start of a project; be conservative with the compression ratio, since mileage varies by data type; Hadoop needs temp space to store intermediate files. The cluster holds raw data, work-in-process data, master data, and materialized views.
  • 10. Page 10 © Hortonworks Inc. 2014 Storage Capacity. Total Storage Required = (Initial Size + YOY Growth + Intermediate Data Size) × Replication Count × 1.2 ÷ Compression Ratio. Good rule of thumb: Replication Count = 3; Compression Ratio = 4-5; Intermediate Data Size = 50%-100% of raw data size. Note: the 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop. (A worked example follows below.)
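To make the rule of thumb concrete, here is a minimal sizing sketch in Python. The function name and the sample inputs (100 TB of initial data, 50% YOY growth) are illustrative assumptions, not figures from the deck.

```python
# Minimal sizing sketch based on the slide's rule-of-thumb formula.
# Names and sample inputs are illustrative, not from any vendor tool.

def required_storage_tb(initial_tb, yoy_growth=0.0, years=3,
                        intermediate_ratio=0.75,   # 50%-100% of raw data size
                        replication=3, compression_ratio=4.5,
                        temp_factor=1.2):          # Hadoop temp-space headroom
    """Raw disk needed: (data + intermediate) x replication x 1.2 / compression."""
    raw = initial_tb * (1 + yoy_growth) ** years   # size after YOY growth
    intermediate = raw * intermediate_ratio
    return (raw + intermediate) * replication * temp_factor / compression_ratio

print(f"{required_storage_tb(100, yoy_growth=0.50):.1f} TB")  # -> 472.5 TB
```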
  • 11. Page 11 © Hortonworks Inc. 2014 Server Specification. Master nodes (NameNode, ResourceManager, HBase Master): dual Intel Xeon E5-26xx series processors; 128 GB or 256 GB RAM per chassis; 4+ 1 TB NL-SAS/SATA drives in RAID 10, plus spares. Worker nodes (DataNode, NodeManager, RegionServer): dual Intel Xeon E5-26xx series processors; 128 GB or 256 GB RAM; 12 1-4 TB NL-SAS/SATA drives. Gateway/edge nodes: mirror of the master node configuration.
  • 12. Page 12 © Hortonworks Inc. 2014 Cluster Size. Number of data nodes = total storage required ÷ storage per server (see the sketch below). Master nodes: NameNode plus ZooKeeper; ResourceManager plus ZooKeeper; failover NameNode, HBase Master, Hive Server, and ZooKeeper (in a half-rack cluster, combined with the ResourceManager); a management node for Ambari, Ganglia, and Nagios (in a half-rack cluster, combined with the NameNode). Note: large clusters may need more than 4 master nodes; start at 2-4 and grow based on usage.
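Dividing that total by per-server capacity gives the data node count. A continuation of the sketch above, assuming (hypothetically) workers with 12 × 4 TB drives:

```python
import math

# Data node count from total storage; inputs continue the sketch above.
total_storage_tb = 472.5        # required_storage_tb(100, yoy_growth=0.50)
storage_per_node_tb = 12 * 4    # 12 x 4 TB drives per worker node
print(math.ceil(total_storage_tb / storage_per_node_tb))  # -> 10 data nodes
```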
  • 13. Page 13 © Hortonworks Inc. 2014 Factoring Performance. Data nodes: 1 TB drives for performance clusters, 4 TB drives for archive clusters. Meeting SLA requirements: Hadoop workloads are varied, so it is difficult to size a cluster against SLAs without actual testing. The good news: Hadoop performs linearly with scale, which lets you design experiments using a fraction of the data. Best-practice guidance: create a test configuration with a rack of servers; load a slice of data; run tests with real-life queries to measure performance and fine-tune the system; then scale the cluster based on observed performance (see the sketch below).
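Because performance scales roughly linearly, the test-rack results can be extrapolated with simple arithmetic. A back-of-envelope sketch with entirely hypothetical measurements:

```python
import math

# Extrapolate cluster size from a one-rack test, assuming linear scaling.
test_nodes = 16         # one rack of test servers
test_runtime_h = 6.0    # measured runtime of real-life queries on the slice
sla_runtime_h = 2.0     # target runtime from the SLA
print(math.ceil(test_nodes * test_runtime_h / sla_runtime_h))  # -> 48 nodes
```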
  • 14. Page 14 © Hortonworks Inc. 2014 HDP and Cisco are deeply integrated in the data center. HDP 2.1 (YARN with Governance & Integration, Security, Operations, Data Access, and Data Management) connects SOURCES (existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) with DATA SYSTEMS (RDBMS, EDW, MPP, HANA) and APPLICATIONS such as BusinessObjects BI, supported by operational tools and dev & data tools. Deep partnerships: Hortonworks and Cisco engage in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, and SAP. Broad partnerships: over 600 partners work with Hortonworks to certify their applications with Hadoop, extending big data to their users.
  • 15. Page 15 © Hortonworks Inc. 2014 Cisco + Hortonworks Validated Design Sean McKeown Solutions Architect, Data Center, Cisco
  • 16. Page 16 © Hortonworks Inc. 2014 Cisco + Hortonworks Validated Design
  • 17. Page 17 © Hortonworks Inc. 2014 Cisco UCS Common Platform Architecture (CPA): Building Blocks for Big Data. UCS 6200 Series Fabric Interconnects; Nexus 2232 Fabric Extenders; UCS Manager; UCS C240 M3 servers; LAN, SAN, and management connectivity.
  • 18. Page 18 © Hortonworks Inc. 2014 UCS + Hortonworks Reference Configurations. Up to 768 TB of unformatted storage per rack, for a total of 7.68 PB (plus 31.25 TB of flash memory) per domain, without the complexity entailed in designing and building your own custom solution. Table 1, Cisco CPA v2 for Big Data, includes four optimized configurations:
    - Performance Optimized (UCS-SL-CPA2-P). Connectivity: 2 Cisco UCS 6248UP 48-Port Fabric Interconnects, 2 Cisco Nexus 2232PP 10GE Fabric Extenders. Management: Cisco UCS Manager. Servers: 8 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2680 v2 processors, 256 GB of memory, an LSI MegaRAID 9271CV-8i card, and 24 900-GB 10K SFF SAS drives (168 TB total).
    - Performance and Capacity Balanced (UCS-SL-CPA2-PC). Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects, 2 Cisco Nexus 2232PP 10GE Fabric Extenders. Management: Cisco UCS Manager. Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2660 v2 processors, 256 GB of memory, an LSI MegaRAID 9271CV-8i card, and 24 1-TB 7.2K SFF SAS drives (384 TB total).
    - Capacity Optimized (UCS-SL-CPA2-C). Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects, 2 Cisco Nexus 2232PP 10GE Fabric Extenders. Management: Cisco UCS Manager. Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2640 v2 processors, 128 GB of memory, an LSI MegaRAID 9271CV-8i card, and 12 4-TB 7.2K LFF SAS drives (768 TB total).
    - Capacity Optimized with Flash Memory (UCS-SL-CPA2-CF). Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects, 2 Cisco Nexus 2232PP 10GE Fabric Extenders. Management: Cisco UCS Manager. Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2660 v2 processors, 128 GB of memory, a Cisco UCS Nytro MegaRAID 200-GB controller, and 12 4-TB 7.2K LFF SAS drives (768 TB total).
  • 19. Page 19 © Hortonworks Inc. 2014 Installing Servers Today. Settings to configure across LAN, SAN, and the server itself: RAID settings; disk scrub actions; number of vHBAs; HBA WWN assignments; FC boot parameters; HBA firmware; FC fabric assignments for HBAs; QoS settings; border port assignment per vNIC; NIC transmit/receive rate limiting; VLAN assignments for NICs; VLAN tagging config for NICs; number of vNICs; PXE settings; NIC firmware; advanced feature settings; remote KVM IP settings; Call Home behavior; remote KVM firmware; server UUID; serial-over-LAN settings; boot order; IPMI settings; BIOS scrub actions; BIOS firmware; BIOS settings.
  • 20. Page 20 © Hortonworks Inc. 2014 UCS Service Profiles. [Diagram: a single service profile abstracting the LAN and SAN configuration listed on the previous slide.]
  • 21. Page 21 © Hortonworks Inc. 2014 Abstracting the Logical Architecture. [Diagram: what you see vs. what you get — physical cables run from the server adapter through FEX A to the 6200-A fabric interconnect; virtual cables (VN-Tag) map vNIC 1 to vEth 1 and vHBA 1 to vFC 1, as defined in the service profile.] Benefits: dynamic, rapid provisioning; state abstraction; location independence; blade or rack.
  • 22. Page 22 © Hortonworks Inc. 2014 Cisco UCS: Physical Architecture. [Diagram: redundant 6200 fabric interconnects (Fabric A and B) with uplink ports to ETH 1/2, SAN A/B, and out-of-band management; server ports down to fabric extenders (FEX A/B); half- and full-width B200 compute blades with virtualized adapters (VIC) in the chassis; C240 rack-mount servers with VICs attached via FEX A/B; the whole cluster managed as one.]
  • 23. Page 23 © Hortonworks Inc. 2014 Simple Scalability. Single rack: 16 servers. Single domain: up to 10 racks, 160 servers, 7 PB. Multiple domains: joined through L2/L3 switching. (The arithmetic is sketched below.)
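The per-domain figure is consistent with the capacity-optimized nodes in Table 1 (12 × 4 TB drives, i.e. 48 TB raw per server); a quick sanity check:

```python
# Quick check of the per-domain raw capacity claimed on the slide.
racks, servers_per_rack = 10, 16
tb_per_server = 12 * 4          # capacity-optimized node: 12 x 4 TB drives
print(racks * servers_per_rack * tb_per_server / 1000)  # -> 7.68 PB
```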
  • 24. Page 24 © Hortonworks Inc. 2014 Proven performance and linear scalability
  • 25. Page 25 © Hortonworks Inc. 2014 Simplified Management Throughout the Cluster Lifecycle: provisioning, monitoring, maintenance, growth. UCSM provides speed, ease of experimentation, consistency, simplicity, and visibility.
  • 26. Page 26 © Hortonworks Inc. 2014 Complete Network Flexibility. Configure as many vNICs and VLANs as you need with the click of a mouse. Example: vNIC0 for management, vNIC1 for internal traffic, vNIC2 for external traffic (data ingress/egress); no OS bonding is needed with Fabric Failover. [Diagram: two data nodes, each with three vNICs, attached to redundant 6200 A/B fabrics behind L2/L3 switching.]
  • 27. Page 27 © Hortonworks Inc. 2014 Creating QoS Policies and Enabling Jumbo Frames. Best Effort policy for the management VLAN; Platinum policy for the cluster VLAN.
  • 28. Page 28 © Hortonworks Inc. 2014 Switch Buffer Usage with a Network QoS Policy to Prioritize HBase Read Operations. [Charts: switch buffer usage over time for concurrent Hadoop TeraSort and HBase workloads, and HBase average read latency (µs) with and without the QoS policy.] Read latency comparison of non-QoS vs. QoS policy: roughly 60% read improvement for HBase running alongside Hadoop MapReduce (TeraSort).
  • 29. Page 29 © Hortonworks Inc. 2014 UCS Common Platform Architecture with Hortonworks: a single platform for traditional and big data applications, spanning UCS rack-mount servers, UCS blade servers, SAN/NAS arrays, and enterprise applications.
  • 30. Page 30 © Hortonworks Inc. 2014 THANK YOU ajaysingh@hortonworks.com semckeow@cisco.com