Approximate Analysis via In-transit and Edge Resources

With the increasing availability of "edge" resources, understanding how computation should be split across edge devices and data centre-based systems remains an important challenge. Latency-sensitive applications, e.g. in stream processing and video analytics, are often constrained by the network connectivity between the edge resource (involved in data collection) and the analytics platform (often reachable only over a multi-hop network). We describe how edge resources can be used more effectively in combination with data centre-based resources to support such applications. This work builds on a recently produced survey paper covering various applications (generally in scientific computing) that could benefit from more effective edge resource/data centre integration. The survey paper is available at: https://arxiv.org/abs/1609.03647

Published in: Technology

  1. 1. Omer F. Rana, School of Computer Science & Informatics, Cardiff University, UK. ranaof@cardiff.ac.uk, Twitter: @omerfrana. In collaboration with: Rajiv Ranjan (Newcastle Univ., UK); Massimo Villari (U. Messina, Italy); Manish Parashar, Javier Diaz-Montes, Ali Zamani, Mengsong Zhou (Rutgers University, USA); Ioan Petri, Tom Beach, Yacine Rezgui (Cardiff University); Rafael Tolosana-Calasanz (Univ. of Zaragoza, Spain); Luiz Bittencourt (UNICAMP, Brazil). Approximate Analysis via In-transit and Edge Resources. Systems Research Challenges in the Internet of Things Workshop, 16th – 17th January 2017
  2. 2. https://goo.gl/PME0zu
  3. 3. The Rise of “Edge Computing” • Edge devices can have varying capability, data rates and programmability • Capability to also undertake some processing on these devices – Increasing availability of programming support – “software defined environments” • As data volume and velocity increase, we need to rethink our Cloud/Data Centre architecture • Move away from centralised solutions to more Peer-2-Peer Edge Clouds (Distributed Clouds) • How do we support service provisioning and orchestration at the Network Edge? From: Manish Parashar (Rutgers Univ.) Summary
  4. 4. CONTEXT Petri, I. et al. 2016. Coordinating data analysis and management in multi-layered clouds. EAI International Conference on Cloud, Networking for IoT systems, Rome, Italy, 26-27 October 2015. Proceedings of EAI International Conference on Cloud, Networking for IoT systems, Springer. Available at: http://orca.cf.ac.uk/78658/1/cn4iot15.pdf
  5. 5. Defining “Edge Services” … • “Cloud of Things” (CoT) & “Fog Computing” (Cloudlets) – Extending computing to the edges of the network – Overcoming latency constraints • Real world/pervasive systems benefiting from Cloud infrastructure (keep this much more open, not Telco-centric) – Mobile & task off-loading (balancing energy usage with computation capability) – Interest in the reverse of Mobile Offloading (cf. Cloud offloading) • Edge clouds via: – “Cloudlets” (Fog Computing) + Mobile offloading (device clone in Cloud/annotated program call tree (local/Cloud) – e.g. MAUI, CloneCloud, ThinkAir, Moitree, EMCO, etc.) – Limitations of the “last mile” network (common in Content Distribution Networks) – Akamai (web traffic @ 25 Terabits/s, 2 trillion daily internet interactions) – “edge server” ensemble (175k servers, 85% one net hop)
  6. 6. [Figure: sensors s1–s15, clusters c1–c3, an aggregator, communication channels and eUtilities eU1–eU3 over time; inputs x and y feed g(x,y) into a decision trigger: “If g(x,y) > 100 then Buy Z shares of Stock S”.] Where should this be hosted? From Jeff Voas, Bob Marcus (NIST) – Terminology for IoT/Clouds
  7. 7. Amazon Lambda – 100 millisecond billing • User creates a “lambda” function (e.g. Python code) that is triggered by events; the custom code and its dependencies are uploaded to AWS and exposed as a “handler” method – Event source (e.g. AWS S3, a DynamoDB stream associated with a table, HTTP via Amazon API Gateway, Amazon Cognito Auth./Mobile Services, etc.) publishes events such as object-created events – Amazon Kinesis stream events (poll stream, generate events when a new record is detected) – User-generated events • Resource allocation – Identify memory requirements => CPU requirements – Identify execution timeout (to prevent indefinite execution) – Can be sync./async. – Monitored metrics: invocation rate, errors, duration (latency), throttles (via CloudWatch) • Environment – Container-based enactment/execution (AWS Lambda does this automatically) – Container maintained (“frozen”) after the lambda function finishes (no init. needed) + /tmp space kept (transient cache for multiple executions) – Background processes/callbacks maintained when container resumes • AWS Lambda: compute charged per 100 ms, not per hour; first 1M node.js executions/month free; a monitoring challenge (http://aws.amazon.com/lambda/)
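The handler model on this slide can be sketched in Python as follows. This is a minimal illustration, assuming an S3 object-created event: the function name `handler`, the cache file name and the sample event are hypothetical, and only the `(event, context)` signature and the `Records[...]["s3"]` event layout follow AWS conventions. The temp-directory file stands in for the /tmp space that may persist while a container stays “frozen”.

```python
import json
import os
import tempfile

# Hypothetical cache file; on Lambda the temp directory is /tmp, which may
# survive between invocations that reuse the same container.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "warm_cache.json")


def handler(event, context):
    """List the S3 objects named in the event and count warm invocations."""
    # Load state left behind by an earlier invocation, if any.
    state = {"invocations": 0}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as fh:
            state = json.load(fh)
    state["invocations"] += 1
    with open(CACHE_PATH, "w") as fh:
        json.dump(state, fh)

    # S3 object-created events carry one entry per object under "Records".
    objects = [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]
    return {"objects": objects, "invocations": state["invocations"]}


if __name__ == "__main__":
    sample = {"Records": [{"s3": {"bucket": {"name": "sensor-data"},
                                  "object": {"key": "readings/001.csv"}}}]}
    print(handler(sample, None))
```

Calling the handler twice in the same process shows the invocation counter increasing, which mimics a reused container; a fresh container would start again from an empty cache.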
  8. 8. ETSI – Mobile Edge Computing (MEC) & NFV http://www.etsi.org/technologies-clusters/technologies/mobile-edge-computing VM-based (QoS/QoE rules – resource need, latency, etc) Discover, advertise Services – DNS proxy Traffic rules & state
  9. 9. ETSI – Mobile Edge Computing (MEC) & NFV http://www.etsi.org/technologies-clusters/technologies/mobile-edge-computing Edge orchestrator (available resources, hosts, topology, etc) Application management Operations Support System (OSS) EPC/EPCaaS E-UTRAN access network: LTE, LTE-A + legacy Cloud-RAN
  10. 10. Modularization to build a flexible network architecture Enabling Concept: Architecture Modularization Kashif Mehmood, Telenor Research, Norway (Dec 2016)
  11. 11. Modularization natively supports Network slicing IoT Network slice with no mobility and relaxed security requirements 5G control Flow Management Access Function Security and AAA management Mobility Management Connectivity Management Context Aware Engine Kashif Mehmood, Telenor Research, Norway (Dec 2016)
  12. 12. Edge Boundary Network processor Ad hoc/mesh network What’s on the “Edge”?
  13. 13. Edge Boundary Network processor Ad hoc/mesh network What’s on the “Edge”?
  14. 14. Osmotic Computing M. Villari, M. Fazio, S. Dustdar, O. Rana & R. Ranjan, “Osmotic Computing: A New Paradigm for Edge/Cloud Integration”, IEEE Cloud Computing Magazine, December 2016 https://www.computer.org/csdl/mags/cd/2016/06/mcd2016060076-abs.html • Migration of microservices from Edge to Data Centers • Services hosted on lightweight containers (e.g. Docker) • Migration triggered by monitored events (e.g. latency)
  15. 15. Osmotic Computing M. Villari, M. Fazio, S. Dustdar, O. Rana & R. Ranjan, “Osmotic Computing: A New Paradigm for Edge/ Cloud Integration”, IEEE Cloud Computing Magazine, December 2016 • Microservices can be both: management services and user owned and managed services • Aligns with work on EPCaaS/5G + user-owned network management
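A latency-triggered migration decision of the kind mentioned on the Osmotic Computing slides might look like the sketch below. The `Microservice` type, the `place` function and both thresholds are assumptions made for illustration; the policy (move to the edge when latency is high, back to the data centre when it is low, otherwise stay put) is not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Microservice:
    name: str
    location: str  # "edge" or "datacentre"


def place(service, latency_ms, high_ms=200.0, low_ms=50.0):
    """Return the location a service should run at, given observed latency.

    Assumed policy: above high_ms move to the edge, below low_ms move to
    the data centre, and in between keep the current location so that the
    service does not bounce back and forth.
    """
    if latency_ms > high_ms:
        return "edge"
    if latency_ms < low_ms:
        return "datacentre"
    return service.location


if __name__ == "__main__":
    svc = Microservice("video-filter", "datacentre")
    svc.location = place(svc, latency_ms=320.0)
    print(svc)
```

The gap between the two thresholds acts as hysteresis, which is one simple way to avoid repeated migrations when latency hovers around a single cut-off.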
  16. 16. SCENARIO & IMPLEMENTATION
  17. 17. • Real-time optimisation of building energy use – Sensors provide readings at intervals of 15-30 minutes – Optimisation is run over this interval • The efficiency of the optimisation process depends on the capacity of the computing infrastructure – deploying multiple EnergyPlus simulations • Closed-loop optimisation – Set control set points – Monitor/acquire sensor data + perform analysis with EnergyPlus – Update HVAC and actuators in physical infrastructure • EnergyPlus is a whole building energy simulation program that engineers, architects, and researchers use to model energy and water use in buildings. Modelling the performance of a building with EnergyPlus enables building professionals to optimize building design to reduce energy usage – http://apps1.eere.energy.gov/buildings/energyplus/
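The closed loop on this slide (set points, sensor data, analysis, actuator update) can be sketched as below. The function `simulate_energy` is a toy stand-in and does not call EnergyPlus; the comfort band, the candidate set points and all function names are assumptions for illustration only.

```python
def simulate_energy(setpoint_c, outdoor_c):
    """Toy stand-in for a simulation run: energy use grows with the gap
    between the set point and the outdoor temperature."""
    return abs(setpoint_c - outdoor_c) * 1.5


def optimise_setpoint(outdoor_c, candidates, comfort=(19.0, 24.0)):
    """Pick the candidate inside the comfort band with the lowest simulated energy."""
    feasible = [c for c in candidates if comfort[0] <= c <= comfort[1]]
    return min(feasible, key=lambda c: simulate_energy(c, outdoor_c))


def control_loop(sensor_readings, candidates):
    """Run one optimisation per sensor interval and return the applied set points."""
    applied = []
    for outdoor_c in sensor_readings:                    # monitor/acquire sensor data
        best = optimise_setpoint(outdoor_c, candidates)  # analyse candidate set points
        applied.append(best)                             # update the HVAC set point
    return applied


if __name__ == "__main__":
    print(control_loop([10.0, 30.0], [18.0, 20.0, 22.0, 24.0, 26.0]))
```

In the real setting each call to the simulator takes minutes, which is why the number of candidate set points that can be evaluated within a 15-30 minute interval depends on the available computing capacity.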
  18. 18. Instrumented Facility: CENTRO SPORTIVO FIDIA ROMA (http://www.asfidia.it/) • Pool (indoor) – size: 25 m x 16 m, depth: 1.60 m to 2.10 m, capacity: 760 m³ • Learning pool (indoor) – size: 16 m x 4 m, depth: 1 m, capacity: 64 m³ • 1 gym (indoor) with electric equipment (electric bicycles, etc.) • 1 fitness room (indoor) – size: 18 m x 9 m x 3 m, volume: 486 m³ • 1 volleyball court (indoor) – size: 40 m x 28 m x 8 m, volume: 8960 m³ • 2 tennis/five-a-side courts (outdoor, with changing rooms) – size: 30 m x 20 m
  19. 19. Federated Clouds in Building Optimisation I. Petri, O. Rana, J. Diaz-Montes, M. Zou, M. Parashar, T. Beach, Y. Rezgui, and H. Li, "In-transit Data Analysis and Distribution in a Multi-Cloud Environment using CometCloud," The International Workshop on Energy Management for Sustainable Internet-of-Things and Cloud Computing. Co-located with International Conference on Future Internet of Things and Cloud (FiCloud 2014), Barcelona, Spain, August 2014.
  20. 20. [Figure: INPUT and OUTPUT]
  21. 21. Ioan Petri, Omer Rana, Yacine Rezgui, Haijiang Li, Tom Beach, Mengsong Zou, Javier Diaz Montes, Manish Parashar: “Cloud Supported Building Data Analytics”. DPMSS workshop alongside CCGRID 2014: pp. 641-650, Chicago, USA. IEEE Computer Society Press.
  22. 22. • In a single-cloud federation (3 workers), only 37 out of 72 tasks are completed within the deadline of 1 hour. Extend deadline to 1 h 30 min • Exchanging 15 tuples between the two federation sites, with increased cost for execution and storage.
  23. 23. M. Zou, A. Zamani, J. Diaz-Montes, I. Petri, O. Rana, M. Parashar, “Leveraging in-transit computational capabilities in federated ecosystems”. IEEE Symposium on Service-Oriented System Engineering (SOSE), Oxford, UK, March 29 - April 2, 2016. [Figure labels: in-transit nodes, edge devices]
  24. 24. Characterising “In-Transit” Nodes • Can we characterise the behaviour of in-transit nodes? – Network Data Centers vs. Edge Data Centers – Goes beyond the use of simple programmable network characterisation • Consider a job (J) consisting of (k) tasks – Deadline(J); Budget(J); CRatio(J) – with k’ <= k • Consider that there is some waiting time W(J) before a job J can be executed at a resource provider – Job is idle (queued) and is using storage space at the destination resource • Identify & configure a data path that leverages in-transit computation to take advantage of W(J) for a job
  25. 25. An Optimisation Problem • Characterising the problem: to leverage in-transit computation and minimize the amount of time a job is idle at the destination, the objective of our problem becomes maximizing the number of tasks completed in-transit
  26. 26. An Optimisation Problem … constraints • To leverage in-transit computation and minimize the amount of time a job is idle at the destination, the objective of our problem becomes maximizing the number of tasks completed in-transit, subject to: being ready to compute at destination resource d at the scheduled time (2), performing computation within the given deadline (3), keeping costs within the given budget (4), and making sure that the completion ratio is satisfied (5) • The completion ratio is the ratio between completed tasks and the total number of tasks composing job J.
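The slide equations did not survive in this transcript. One possible way to write the objective and constraints (2) to (5) described above is given below; it is a sketch and not the authors' formulation. The binary variables $x_i$ (task $i$ completed in-transit) and the timing symbols $T_{\mathrm{arrive}}$, $T_{\mathrm{start}}$ and $T_{\mathrm{finish}}$ are assumed notation, while $k$, $k'$, $\mathrm{Deadline}(J)$, $\mathrm{Budget}(J)$, $\mathrm{CRatio}(J)$, $\mathrm{Cost}(J)$ and $d$ come from the slides, with $k'$ read as the number of completed tasks.

```latex
\begin{align}
\max_{x}\quad & \sum_{i=1}^{k} x_i \tag{1}\\
\text{s.t.}\quad & T_{\mathrm{arrive}}(J, d) \le T_{\mathrm{start}}(J, d) \tag{2}\\
& T_{\mathrm{finish}}(J) \le \mathrm{Deadline}(J) \tag{3}\\
& \mathrm{Cost}(J) \le \mathrm{Budget}(J) \tag{4}\\
& \frac{k'}{k} \ge \mathrm{CRatio}(J) \tag{5}\\
& x_i \in \{0, 1\}, \quad i = 1, \dots, k \notag
\end{align}
```

Under this reading, constraint (2) says the in-transit work must not delay the job past its scheduled start at destination $d$, so only the waiting time W(J) is used for in-transit computation.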
  27. 27. Cost Analysis • Cost(J) is the overall cost of computing job J. M. Zhou, A. Zamani, J. Diaz-Montes, I. Petri, O. Rana, M. Parashar & A. Anjum, “Deadline Constrained Video Analysis via In-Transit Computational Environments”, IEEE Transactions on Services Computing, 2017 (to appear)
  28. 28. Leveraging In-Transit Computational Capabilities in Federated Ecosystems. IEEE SOSE 2016 • Sites implemented as VMs on Amazon • SDN capability emulated via Mininet; each VM had one Mininet host and one Mininet switch • Routing tables managed via POX SDN controller • Switches were connected to each other using Generic Routing Encapsulation (GRE) tunnelling; B/W allocation via a token bucket filter • (i) Base: in-transit resources and sites have the same computational power; (ii) Higher: in-transit resources are less powerful than those at the resource providers’ sites; and (iii) Highest: in-transit resources are much less powerful than site resources.
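The token bucket mechanism used for bandwidth allocation can be illustrated with a few lines of Python. This is a generic sketch of the algorithm, not the configuration used in the experiments; the class name, parameters and units (bytes and seconds) are assumptions.

```python
class TokenBucket:
    """Token bucket: tokens accrue at `rate` per second, up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens (bytes) added per second
        self.capacity = capacity  # maximum burst size in tokens
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous call, in seconds

    def allow(self, size, now):
        """Consume `size` tokens and return True if they are available at `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=100.0, capacity=200.0)
    for t, size in [(0.0, 150), (0.0, 100), (0.5, 100)]:
        print(t, size, bucket.allow(size, t))
```

The capacity bounds how large a burst can be, while the rate bounds the long-run average throughput of the link.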
  29. 29. Job Properties & Resource Types • SLA: at least 60% of the tasks within a job must be completed before the deadline
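The SLA stated on this slide reduces to a simple completion-ratio check, sketched below. The function name and the convention that `None` marks an unfinished task are assumptions for illustration.

```python
def sla_met(finish_times, deadline, min_ratio=0.6):
    """Return True if at least `min_ratio` of the tasks finished by `deadline`.

    `finish_times` holds one finish time per task; None marks a task that
    never completed.
    """
    if not finish_times:
        return False
    done = sum(1 for t in finish_times if t is not None and t <= deadline)
    return done / len(finish_times) >= min_ratio


if __name__ == "__main__":
    # Five tasks, three finished before the 60-minute deadline.
    print(sla_met([10, 20, 30, 70, None], deadline=60))
```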
  30. 30. Considered Scenarios • Traditional – Request resources from a cloud provider – no awareness of in-transit resources – Traditionally, all computation at the data center • Traditional + In-transit: – An intermediate controller allocates resources within the network (without client involvement or awareness) • In-transit aware (In-transit 2) – Client aware of available in-transit nodes – Willing to accept a wider range of offers from Cloud providers – Requests controller to perform in-transit optimisation
  31. 31. Job Completion & Overheads
  32. 32. Costs & Revenue
  33. 33. Edge-based Approximation … 1 • “Performing exact computation or operating at peak-level service demand require a high amount of resources, allowing selective approximation or occasional violation of the specification can provide disproportionate gains in efficiency.” • Techniques used: precision scaling; loop perforation; load value approximation; task dropping/skipping; memory access skipping; data sampling; program versions of different accuracy; using inexact hardware (SRAM, eDRAM, DRAM, GPU, etc.); voltage scaling; refresh rate reduction; inexact reads/writes; lossy compression; neural networks; compiler-based strategies. S. Mittal, “A Survey of Techniques for Approximate Computing”, ACM Computing Surveys, Vol. 48, Issue 4, May 2016
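As a small illustration of one technique from the list above, loop perforation skips a fraction of loop iterations and accepts a less accurate result. The sketch below applies it to computing a mean; the function names and the stride of 4 are illustrative choices, not taken from the survey.

```python
def mean_exact(values):
    """Visit every element."""
    return sum(values) / len(values)


def mean_perforated(values, stride=4):
    """Loop perforation: visit only every `stride`-th element."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


if __name__ == "__main__":
    data = list(range(1000))
    exact = mean_exact(data)
    approx = mean_perforated(data, stride=4)
    print(exact, approx, abs(exact - approx))
```

With a stride of 4 the loop does roughly a quarter of the work, and the error depends on how regular the data is, which is the trade-off the quoted survey describes.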
  34. 34. Edge-based Approximation … 2 • Combine capability in Data Centre with “approximate” algorithms in transit or at the edge • EnergyPlus (as at present) + a trained neural network (as a function approximator for EnergyPlus behaviour) • But why? – EnergyPlus ~ execution time (minutes) – Neural network training ~ execution time (minutes) – Trained (FF) neural network ~ execution time (seconds) • Combine more accurate model execution with approximate model via a learned neural network • Trigger re-training when input parameters change significantly – Each EnergyPlus execution provides potential training data for the neural network
  35. 35. Three-phased execution • Phase 1: EnergyPlus simulations – 30 simulations to acquire initial data • Phase 2: Co-schedule EnergyPlus with ANN training – wait for ANN training threshold to be reached • Phase 3: Deploy trained ANN on edge and in-transit nodes – Change in input data parameter range would trigger Phase 1 again • Can in-transit and edge resources be used more effectively? – Increase job acceptance and completion – Increase potential revenue earned by edge and in-transit resources
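The three phases above can be sketched as a small state machine. To keep the example self-contained, a least-squares line stands in for the ANN and `simulate` is a toy function standing in for an EnergyPlus run; both substitutions, along with every name below, are assumptions. Phase 2 is simplified to "train once 30 samples exist" rather than co-scheduling training with further simulations.

```python
def simulate(x):
    """Toy stand-in for a slow, accurate simulation run."""
    return 3.0 * x + 2.0


def fit_linear(xs, ys):
    """Least-squares line; stands in for ANN training in this sketch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # all inputs identical: flat model
    return slope, my - slope * mx


class Surrogate:
    def __init__(self, n_initial=30):
        self.n_initial = n_initial
        self.xs, self.ys = [], []
        self.model = None

    def evaluate(self, x):
        """Return (value, source); source is 'simulation' or 'surrogate'."""
        if self.model is not None and min(self.xs) <= x <= max(self.xs):
            slope, intercept = self.model      # Phase 3: fast approximate answer
            return slope * x + intercept, "surrogate"
        if self.model is not None:
            # Input left the trained range: discard the model, back to Phase 1.
            self.xs, self.ys, self.model = [], [], None
        y = simulate(x)                        # Phase 1: accurate but slow
        self.xs.append(x)
        self.ys.append(y)
        if len(self.xs) >= self.n_initial:     # Phase 2: train on collected data
            self.model = fit_linear(self.xs, self.ys)
        return y, "simulation"


if __name__ == "__main__":
    s = Surrogate()
    for i in range(30):
        s.evaluate(float(i))
    print(s.evaluate(10.5))   # answered by the surrogate
    print(s.evaluate(100.0))  # out of range, falls back to simulation
```

Every simulation result is kept as training data, which mirrors the point on the previous slide that each EnergyPlus execution provides potential training data for the approximate model.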
  36. 36. Edge-based Approximation
  37. 37. Edge-based Approximation … overhead
  38. 38. Conclusion … • Emergence of data-driven + data intensive applications • Use of Cloud/data centres and edge nodes collectively • Pipeline-based enactment a common theme – Various characteristics – buffer management and data coordination – Model development that can be integrated into a workflow environment • Automating application adaptation – … as infrastructure changes – … as application characteristics change
