Optimising Service Deployment and Infrastructure Resource Configuration

This is a presentation delivered by Alec Leckey (Intel) at the 2nd Data Centre Symposium held in conjunction with the National Conference on Cloud Computing and Commerce (http://2018.nc4.ie/) on April 10, 2018 in Dublin, Ireland.

Learn more about the RECAP project: https://recap-project.eu/
Install the Intel Landscaper: https://github.com/IntelLabsEurope/landscaper

Optimising Service Deployment and Infrastructure Resource Configuration

  1. Alec Leckey, Intel Labs
  2. Data Centre challenges
     • Decrease operational costs
     • Maintain consistent performance
     • Increase scale
     • Innovate and deliver more value
     [Chart: TCO and performance pressures – over-provisioning, utilization, energy, allocation, management, availability; application growth every 2 years, data volume every 18 months, operational costs every 8 years, reduction in compute costs every 2 years, 2x increase in management & administration¹, demand for more capacity, more complexity and 8x more resources]
     ¹ IDC Directions '14 (2014). Source: Worldwide and Regional Public IT Cloud Services 2013–2017 Forecast, IDC (August 2013), http://www.idc.com/getdoc.jsp?containerId=242464
  3. Why we need to understand infrastructure
     • The T-Nova* project demonstrates a 10x performance improvement when a Network Traffic Analyzer is landed onto a machine that is SR-IOV enabled.
     • However, it is not feasible to manually place workloads at scale.
     • How can we automatically match workloads to suitable infrastructure?
     [Chart: VNFC performance in bytes per second – total traffic for standard vs. enhanced deployment]
     Matching workload types to hardware features can improve performance.
     * http://www.t-nova.eu/
  4. Infrastructure Landscape
     Goal: support setup and run-time orchestration for optimised service delivery by defining and maintaining a layered landscape:
     • Physical
     • Virtual
     • Service
     Nodes in each layer are enriched by telemetry.
  5. Landscaper Overview
     Graph representation of the physical, virtual and service layers of the infrastructure landscape.
     • Landscape nodes have a category: compute, storage, network
     • Landscape history: edges have a 'from time' and a 'to time'
     • Landscape state: landscape nodes have state nodes
     • Data gathered by collectors
     • Data export via a RESTful API (JSON string, consumable with networkx); see the sketch below
     [Diagram: example landscape spanning Xeon E5, Xeon Phi, Atom, AES-NI, SSD, NVM and 10Gb hardware, virtual machines, virtual networks and virtual storage, and services such as object store, video transcode, WordPress and ERP]
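     A minimal sketch of consuming that export, assuming the Landscaper serves the graph over HTTP in networkx node-link JSON form; the endpoint URL, port and attribute keys below are illustrative assumptions, not the documented API.

```python
# Minimal sketch: fetch the landscape graph over the RESTful API and load it
# into networkx for querying. The URL, port and attribute keys are assumptions.
import requests
from networkx.readwrite import json_graph

LANDSCAPER_URL = "http://localhost:9001/graph"  # hypothetical endpoint

resp = requests.get(LANDSCAPER_URL)
resp.raise_for_status()

# Assumes the exported JSON string is in networkx node-link format.
graph = json_graph.node_link_graph(resp.json())

# Example query: list all physical-layer compute nodes.
compute_nodes = [
    node for node, attrs in graph.nodes(data=True)
    if attrs.get("layer") == "physical" and attrs.get("category") == "compute"
]
print(compute_nodes)
```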
  6. Different Landscape Views
  7. Landscaper Collectors
     • Plugin architecture (a hypothetical collector is sketched below)
     • Can detect and update based on events
     • Current collectors:
       • HwLoc (internal components) and CPUinfo (enriches the core/pu attributes)
       • OpenStack Heat
       • OpenStack Nova
       • OpenStack Neutron
       • OpenStack Cinder
       • Docker Swarm
       • OpenDaylight
       • Importer (.csv)
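     To illustrate the plugin shape, here is a hypothetical collector with a full-scan hook and an event-driven update hook. The class and method names are assumptions for illustration, not the Landscaper's actual collector interface.

```python
# Hypothetical collector sketch: a plugin typically provides a full scan and
# an incremental, event-driven update. Names and signatures are assumptions,
# not the Landscaper's real collector API.
class ExampleInventoryCollector:
    def init_graph_db(self, graph):
        """Full scan: add every known node to the landscape graph."""
        for host in self._list_hosts():
            graph.add_node(host["id"], category="compute", layer="physical")

    def update_graph_db(self, graph, event):
        """Incremental update: react to a single change event."""
        if event.get("type") == "host.added":
            graph.add_node(event["id"], category="compute", layer="physical")
        elif event.get("type") == "host.removed":
            graph.remove_node(event["id"])

    def _list_hosts(self):
        # Placeholder for a real inventory query (e.g. Nova or Docker Swarm).
        return [{"id": "machine-a"}, {"id": "machine-b"}]
```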
  8. Service Stack (1x view)
  9. Service Stack (10x view)
     [Diagram: landscape graph spanning the service layer (stack), the virtual layer (vm, vnic, virtual network, volume) and the physical layer (machine, numanode, socket/package, core, pu, L1 instruction/L1 data/L2/L3 caches, pcidev, bridge, switch, subnet, osdev_storage, osdev_network), with nodes grouped into compute, network and storage categories; data collected from Heat, Nova, Cinder, Neutron, OpenDaylight and hwloc + cpuinfo]
  10. Compute Meta-Data: Physical Layer
      • machine: ID, NAME, CATEGORY, LAYER, ARCHITECTURE, OS_NAME, OS_VERSION, OS_RELEASE, OS_INDEX, ALLOCATION, PROCESS_NAME, HW_LOC_VERSION, DMI_BOARD_VENDOR, DMI_BOARD_NAME, DMI_BOARD_SERIAL, DMI_BOARD_VERSION, DMIN_BIOS_DATE, DMI_BIOS_VENDOR, DMI_BIOS_VERSION, DMI_SYS_VENDOR, DMI_CHASSIS_VENDOR, DMI_CHASSIS_TYPE, DMI_CHASSIS_ASSET_TAG, DMI_CHASSIS_SERIAL, DMI_PRODUCT_NAME, DMI_PRODUCT_UUID, DMI_PRODUCT_VERSION, LINUX_GROUP, BACKEND, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET, COSTS
      • numanode: ID, NAME, CATEGORY, LAYER, OS_INDEX, ALLOCATION, LOCAL_MEMORY, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET
      • bridge: ID, NAME, CATEGORY, LAYER, OS_INDEX, ALLOCATION, BRIDGE_PCI, BRIDGE_TYPE, PCI_LINK_SPEED, PCI_BUS_ID, PCI_TYPE, DEPTH
      • pcidev: ID, NAME, CATEGORY, LAYER, OS_INDEX, ALLOCATION, PCI_SLOT, PCI_LINK_SPEED, PCI_BUS_ID, PCI_TYPE
  11. Compute Meta-Data: Physical Layer (continued)
      • package: ID, NAME, CATEGORY, LAYER, OS_INDEX, ALLOCATION, CPU_FAMILY_NUMBER, CPU_VENDOR, CPU_MODEL_NUMBER, CPU_MODEL, CPU_STEPPING, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET
      • cache: ID, NAME, CATEGORY, LAYER, ALLOCATION, CACHE_SIZE, CACHE_LINESIZE, CACHE_ASSOCIATIVITY, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET
      • core: ID, NAME, CATEGORY, LAYER, OS_INDEX, ALLOCATION, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET
      • pu: ID, NAME, CATEGORY, LAYER, OS_INDEX, WP, ALLOCATION, CPUID_LEVEL, CPU_CORES, CORE_ID, CPU MHZ, MICROCODE, VENDOR_ID, CPU_FAMILY, APICID, INTIAL_APICID, SIBLINGS, ADDRESS_SIZES, MODEL, MODEL_NAME, STEPPING, CACHE_SIZE, CACHE_ALLIGNMENT, NODESET, COMPLETE_NODESET, ALLOWED_NODESET, CPUSET, COMPLET_CPUSET, ALLOWED_CPUSET, ONLINE_CPUSET, PHYSICAL_ID, FPU, FLAGS, BOGOMIPS, CLF_FLUSHSIZE
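      As a small illustration of how this metadata can be consumed, the sketch below walks the graph loaded in the earlier sketch and prints a few machine attributes; the "type" key and the assumption that these attribute names appear directly on each node are illustrative only.

```python
# Illustrative sketch: inspect physical-layer metadata on machine nodes in
# the landscape graph loaded earlier. The "type" key and the presence of
# these attribute names on each node are assumptions.
for node, attrs in graph.nodes(data=True):
    if attrs.get("type") != "machine":
        continue
    print(node)
    print("  board:", attrs.get("DMI_BOARD_VENDOR"), attrs.get("DMI_BOARD_NAME"))
    print("  os:   ", attrs.get("OS_NAME"), attrs.get("OS_RELEASE"))
    print("  cpus: ", attrs.get("CPUSET"))
```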
  12. Enrichment Through Telemetry
      Snap: a lightweight, modular, programmable telemetry system.
      • Unified namespace, configurable at run time, dynamically derived metrics
      • Integration of diverse data for analysis
      • Calculation of generic node metrics across the stack, e.g. utilization and saturation (sketched below)
      [Diagram: telemetry pipeline – instrumentation and logs → capture → store → transform & prepare → access]
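      A minimal sketch of what such derived node metrics could look like; the specific formulas and input metrics are illustrative assumptions, not the project's definitions.

```python
# Illustrative sketch of generic per-node metrics derived from raw telemetry.
# The formulas and inputs are assumptions chosen to illustrate the idea of
# utilization ("how busy") and saturation ("how much work is queued").
def node_utilization(cpu_busy_pct, mem_used_pct):
    """Utilization in [0, 1]: how busy the node's dominant resource is."""
    return max(cpu_busy_pct, mem_used_pct) / 100.0

def node_saturation(run_queue_len, num_cpus, swap_in_rate):
    """Saturation >= 0: work queued beyond what the node can service."""
    cpu_sat = max(0.0, (run_queue_len - num_cpus) / num_cpus)
    mem_sat = 1.0 if swap_in_rate > 0 else 0.0
    return max(cpu_sat, mem_sat)

# Example: 85% CPU busy, 60% memory used, run queue of 12 on 8 CPUs, no swap.
print(node_utilization(85, 60))   # 0.85
print(node_saturation(12, 8, 0))  # 0.5
```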
  13. Snap – architecture
      • Full stack: motherboards, CPUs, memory, disks, operating systems, hypervisor, guest operating systems, hosted applications
      • Performant, scalable, dynamically reconfigurable, secure, extensible, manageable
  14. Snap – telemetry
      Pipeline: collect → process → publish
      Plugin catalogue on GitHub.
      $ go get github.com/intelsdi-x/snap
      http://snap-telemetry.io/
  15. Adaptive telemetry – anomaly detection approach
      Challenge: sending all data all the time overflows the system with "redundant" information.
      Goal: reduce data transfer while preserving the essence of the signal.
      Approach & findings:
      • Pluggable anomaly detection algorithm
      • Transmission rate increased around outliers only (see the sketch below)
      • Transmissions typically reduced by >10x
      [Chart: CPU utilization (%) over elapsed time (seconds) for Machines 1–3]
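      A minimal sketch of the idea, assuming a simple z-score detector and fixed base/burst rates; none of these choices are the project's pluggable algorithm, they only illustrate "send sparsely, burst around outliers".

```python
# Illustrative sketch of adaptive telemetry: transmit samples at a low base
# rate, but switch to a high rate around detected outliers. The z-score
# detector and the rates are assumptions, not the project's algorithm.
from collections import deque
from statistics import mean, stdev

class AdaptiveTransmitter:
    def __init__(self, window=60, base_every=10, burst_len=5, z_thresh=3.0):
        self.history = deque(maxlen=window)  # recent samples for the detector
        self.base_every = base_every         # send 1 in N samples normally
        self.burst_len = burst_len           # samples to send after an outlier
        self.z_thresh = z_thresh
        self._count = 0
        self._burst = 0

    def offer(self, value):
        """Return True if this sample should be transmitted."""
        is_outlier = False
        if len(self.history) >= 10:
            mu, sigma = mean(self.history), stdev(self.history)
            is_outlier = sigma > 0 and abs(value - mu) / sigma > self.z_thresh
        self.history.append(value)
        self._count += 1
        if is_outlier:
            self._burst = self.burst_len  # raise the rate around the outlier
        if self._burst > 0:
            self._burst -= 1
            return True
        return self._count % self.base_every == 0

# Example: a noisy but steady stream with one spike around sample 30.
tx = AdaptiveTransmitter()
stream = [50.0 + (i % 5) for i in range(30)] + [95.0] + [50.0 + (i % 5) for i in range(30)]
sent = sum(tx.offer(v) for v in stream)
print(f"transmitted {sent} of {len(stream)} samples")
```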
  16. Contextual Information
      • Automatic application of the USE methodology
      • Ranking & cost functions (a sketch follows)
      • Supports comparison of service configurations and generation of deployment templates for specific workloads
      [Diagram: representation of an SDI sub-graph including performance]
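      A minimal sketch of what a ranking/cost function over landscape nodes could look like, assuming USE-style metrics (utilization, saturation, errors) and a feature list have been attached to node attributes; the weights, keys and feature check are illustrative assumptions.

```python
# Illustrative sketch of a cost function used to rank candidate machines for
# a workload, combining USE-style metrics from the landscape graph. Weights,
# attribute keys and the feature check are assumptions for illustration.
def placement_cost(attrs, w_util=1.0, w_sat=2.0, w_err=5.0):
    """Lower is better: prefer under-utilized, unsaturated, error-free nodes."""
    return (w_util * attrs.get("utilization", 0.0)
            + w_sat * attrs.get("saturation", 0.0)
            + w_err * attrs.get("errors", 0.0))

def rank_candidates(graph, required_features=()):
    """Rank physical compute nodes exposing all required hardware features."""
    candidates = []
    for node, attrs in graph.nodes(data=True):
        if attrs.get("category") != "compute" or attrs.get("layer") != "physical":
            continue
        if not all(f in attrs.get("features", ()) for f in required_features):
            continue  # e.g. require "SR-IOV" for a network traffic analyzer VNF
        candidates.append((placement_cost(attrs), node))
    return [node for _, node in sorted(candidates)]
```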
  17. Application to large-scale systems
      Using the landscape data it is possible to develop models for:
      • Optimization of initial placement
      • Re-balancing actuations
      • Troubleshooting
      • Accounting
      • Security
      • Capacity planning
  18. Network Model for vCDN deployments
      Technical challenges:
      • Performance of virtualisation technologies, especially virtualised storage
      • Orchestration of a multi-tenant vCDN service and infrastructure
      • Optimisation of placement and scaling of the vCDN system
      • Monitoring and repair of the vCDN system
      • Detection and mitigation of the impact of "noisy neighbours"
  19. Optimization approaches
      1. Load to capacity requirement mapping
      2. Load to telemetry mapping
      3. Infrastructure configuration optimization
      [Diagram: for each approach, a BT workload running on infrastructure resources A and B, with telemetry for each resource, is mapped to KPI 1, KPI 2 and cost]
      A sketch of approach 3 follows.
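      A minimal sketch of infrastructure configuration optimization: score candidate configurations against predicted KPIs and cost, keeping the cheapest configuration that meets the KPI targets. The configuration space, KPI model and cost model here are illustrative assumptions, not the project's models.

```python
# Illustrative sketch of approach 3: exhaustive search over a small
# configuration space, keeping the cheapest configuration that meets the
# KPI targets. The KPI and cost models are placeholder assumptions.
from itertools import product

def predict_kpis(config, workload):
    """Placeholder model mapping (configuration, workload) to KPIs."""
    capacity = config["vcpus"] * config["replicas"]
    return {"throughput": min(workload["demand"], capacity * 100),
            "latency_ms": 20 if capacity >= workload["demand"] / 100 else 80}

def cost(config):
    """Placeholder cost model: price per replica by vCPUs and memory."""
    return config["replicas"] * (config["vcpus"] * 2.0 + config["memory_gb"] * 0.5)

def optimize(workload, kpi_targets):
    space = {"vcpus": [2, 4, 8], "memory_gb": [4, 8, 16], "replicas": [1, 2, 4]}
    best = None
    for values in product(*space.values()):
        config = dict(zip(space.keys(), values))
        kpis = predict_kpis(config, workload)
        if (kpis["throughput"] >= kpi_targets["throughput"]
                and kpis["latency_ms"] <= kpi_targets["latency_ms"]):
            if best is None or cost(config) < cost(best):
                best = config
    return best

print(optimize({"demand": 600}, {"throughput": 600, "latency_ms": 30}))
```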
  20. Utility Theory approach
  21. BT as Infrastructure Provider
      [Diagram: an end user requesting content from a content operator over BT infrastructure; Part A and Part B each mark a provider vs. customer relationship between the roles]
  22. Landscape Model
      [Diagram: UK network hierarchy from multi-service access points through metro sites and core sites to the UK exchange point; legend shows network switches, physical servers, virtual machines and service stacks]
  23. [Diagram: content delivery path from the content provider through the exchange, core and metro sites to the MSAN and consumers, with costs and numbered options 1–3]
  24. Success Criteria
      Create a system to:
      • model the performance of VNFs prior to deployment
      • learn the configuration of existing networks and predict the impact of topology, application and infrastructure changes
      • improve the placement decisions of orchestration systems to improve infrastructure utilization whilst guaranteeing performance and availability SLAs
      • put remediation rules in place a priori, before failures happen, ensuring rapid protection using the minimum of additional resources
      • automate the remediation of unexpected/unpredicted failures in a timely fashion (several minutes)
