
Software-Defined Systems for Network-Aware Service Composition and Workflow Placement

The presentation slides of my Ph.D. thesis proposal (known as "CAT" at my university). I received a score of 18/20.

Supervisors:
Prof. Luís Veiga (IST, ULisboa)
Prof. Peter Van Roy (UCLouvain)

Jury:
Prof. Javid Taheri (Karlstad University)
Prof. Fernando Mira da Silva (IST, ULisboa)


  1. 1. Software-Defined Systems for Network-Aware Service Composition and Workflow Placement Pradeeban Kathiravelu Supervisors: Prof. Luís Veiga Prof. Peter Van Roy Lisboa, Portugal. 18/06/2018.
  2. 2. 2/33 Introduction ● Network Softwarization: Making the networks “programmable”. – Software-Defined Networking (SDN) ● Unifying the control plane away from network data plane devices. ● Global view and control of the data center network via a single controller. – Network-Functions Virtualization (NFV) ● Virtualizing network middleboxes into network functions. ● Firewall, intrusion detection, Network Address Translation (NAT), .. ● Software-Defined Systems (SDS). – Frameworks extending, or inspired by, SDN. – Storage, Security, Data center, .. ● Improved configurability: Separation of mechanisms from policies.
  3. 3. 3/33 Motivation ● Software-Defined Systems to compose and place service workflows beyond data center-scale. – Bring the control back to the service user.
  4. 4. 4/33 Research Questions ● Can we uniformly separate the infrastructure from the network, at various stages of development, from data centers to the cloud? ● Can such network softwarization offer economic and performance benefits to the end users? ● Can we orchestrate the services for workflow compositions efficiently, by extending SDN to the cloud and edge environments? ● Can we improve the performance of big data applications by scaling the execution environment in a network-aware manner?
  5. 5. 5/33 Current Contributions ● Network softwarization as an encompassing approach, from design to cloud deployments (CoopIS’16, SDS’15, and IC2E’16). ● Differentiating network flows based on user policies with SDN and middleboxes (EI2N’16 and IM’17). ● Network-aware big data executions (SDS’18 and CoopIS’15). ● Extending network softwarization to wide area for service composition and workflow placement. – Cloud-Assisted Networks as an alternative connectivity provider (Networking’18). – Composing service chains at the edge (ETT’18, ICWS’16, and SDS’16).
  6. 6. 6/33 I) Cloud-Assisted Networks as an Alternative Connectivity Provider Kathiravelu, P., Chiesa, M., Marcos, P., Canini, M., Veiga, L. Moving Bits with a Fleet of Shared Virtual Routers. In IFIP Networking 2018. (CORE Rank A). May 2018. pp. 370 – 378.
  7. 7. 7/33 Introduction ● Increasing demand for bandwidth. ● Decreasing bandwidth prices. ● Pricing Disparity. E.g. IP Transit Price, 2014 (per Mbps) – USA: 0.94 $ – Kazakhstan: 15 $ – Uzbekistan: 347 $ ● What about latency? – Online gaming. – High-frequency trading. – Remote surgery.
  8. 8. 8/33 Motivation ● Cloud providers often have a dedicated connectivity* . – Increasing number of regions and points of presence. – Well provisioned and maintained network. ● Can a network overlay over cloud instances be used as an alternative connectivity provider? – High-performance. – Cost-effectiveness. – Optional network services. * James Hamilton, VP, AWS (AWS re:invent 2016).
  9. 9. 9/33 Cloud-Assisted Networks ● Virtual/overlay networks over cloud environments
  10. 10. 10/33 Our Proposal: NetUber ● A third-party virtual connectivity provider with no fixed infrastructure. – A cloud-assisted overlay network, leveraging multi-cloud infrastructures. ● Better control over the path, compared to the Internet paths.
  11. 11. 11/33 Better Alternative to SaaS Replication ● Deploy Software-as-a-Service (SaaS) applications in just one or a few regions. – Use NetUber to access them from other regions. ● Access to more regions via multiple cloud providers. – Ohio (AWS, but not GCP); London (both AWS and GCP); Belgium (GCP, but not AWS).
  12. 12. 12/33 Monetary Costs to Operate NetUber A. Cost of Cloud Instances. – Charged per second. – Very high. [Spot instances: volatile, but up to 90% savings] B. Cost of Bandwidth. – Charged per data transferred. – Also very high. [No cheaper alternative.] C. Cost to connect to the cloud provider. – Often managed by the cloud provider. E.g.: AWS Direct Connect. – Typically, the end user pays directly to the cloud provider.
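The cost components above can be combined into a simple break-even model. A minimal sketch, with purely illustrative prices (the spot, egress, and transit figures below are assumptions, not measured values from the paper):

```python
# Hypothetical monthly cost model for a NetUber-style overlay.
# All prices are illustrative; real spot and egress prices vary by
# region and over time.

HOURS_PER_MONTH = 730

def overlay_monthly_cost(gb_transferred,
                         spot_price_per_hour=0.50,   # assumed spot price
                         num_instances=2,            # one per endpoint region
                         egress_price_per_gb=0.09):  # assumed egress price
    """A: instance cost + B: bandwidth cost (C, the last-mile connection
    such as Direct Connect, is paid separately by the end user)."""
    instance_cost = spot_price_per_hour * num_instances * HOURS_PER_MONTH
    bandwidth_cost = egress_price_per_gb * gb_transferred
    return instance_cost + bandwidth_cost

def transit_monthly_cost(flat_fee=8000.0):
    """Assumed flat monthly fee for a 10 Gbps transit port."""
    return flat_fee

# The overlay wins for small transfer volumes and loses past a
# break-even point, since transit is flat-rate but egress is per-GB.
break_even_gb = (transit_monthly_cost()
                 - 0.50 * 2 * HOURS_PER_MONTH) / 0.09
```

With these assumed prices the overlay is cheaper below the break-even volume, mirroring the deck's "< 50 TB/month" finding; the exact break-even point depends on the real prices measured in the evaluation.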
  13. 13. 13/33 Evaluation ● Cheaper point-to-point connectivity. – AWS as the overlay cloud provider. – Compared against a transit provider and another connectivity provider with a large global backbone network. ● Better throughput or Reduced Latency. – Compared to ISPs. – Traffic sent from: RIPE Atlas Probes and distributed servers. – Destination: AWS distributed servers from the AWS regions. – ISPs vs. ISP to the nearest AWS region and then NetUber overlay. ● Network Services: Compression, Encryption, ..
  14. 14. 14/33 1) Cheaper Point-to-Point Connectivity ● Expense for 10 Gbps flat connectivity – Measured for transfers from EU and USA. – Cheaper for data transfers <50 TB/month.
  15. 15. 15/33 2) Improve Latency with Cloud Routes ● Instead of sending traffic A → Z, can we send A → B → Z? ○ A is closer to B. B and Z are servers in cloud regions. ○ B and Z are connected by NetUber overlay.
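The relay selection above (A → B → Z instead of A → Z) can be sketched as a minimization over candidate cloud regions. The latency figures below are made up for illustration:

```python
# Sketch: pick the cloud relay B that minimizes A -> B -> Z latency,
# falling back to the direct Internet path when no relay helps.
# All latencies (in ms) are illustrative, not measured values.

def best_path(direct_ms, relay_ms):
    """relay_ms maps relay name -> (A->B latency, B->Z overlay latency)."""
    best_relay, best_total = None, direct_ms
    for relay, (to_relay, overlay) in relay_ms.items():
        total = to_relay + overlay
        if total < best_total:
            best_relay, best_total = relay, total
    return best_relay, best_total  # (None, direct_ms) if direct is best

relay, total = best_path(
    direct_ms=180.0,
    relay_ms={"frankfurt": (20.0, 95.0), "london": (35.0, 120.0)},
)
```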
  16. 16. 16/33 Ping times: ISP vs. NetUber (via region, % Improvement) ● NetUber cuts Internet latencies by up to 30%. ● The use of AWS Direct Connect would make this even faster.
  17. 17. 17/33 Key Findings • Previous research focuses on the technical side. – Not on economic aspects – more expensive. ● Industrial efforts on leveraging cloud or data center infrastructure to offer connectivity. – Teridion – Internet fast lanes for SaaS providers. – Voxility – an alternative to transit providers. ● NetUber – a cheaper alternative (< 50 TB/month). – A connectivity provider that does not own the infrastructure. – "Internet Fast-routes" through cloud-assisted networks. – Better than ISPs (< 100 Mbps, often with a cap) for end users.
  18. 18. 18/33 II) Composing Network Service Chains at the Edge: A Resilient and Adaptive Software-Defined Approach Kathiravelu, P., Van Roy, P., & Veiga, L. In Transactions on Emerging Telecommunications Technologies (ETT). (JCR IF: 1.535, Q2). 2018. Wiley. Accepted for publication.
  19. 19. 19/33 Motivation ● Increasingly, network services placed at the edge. – Limitations in hosting all the network services on-premise. – Closer to the users than centralized clouds. ● Network Service Chaining (NSC) – Finding the optimal service chain for a user request. – Service Level Objectives of the service chain users.
  20. 20. 20/33 Our Proposal: Évora ● A graph-based algorithm to incrementally construct and deploy service chains at the edge. ● An Orchestrator in the user device, to place and migrate service chains, adhering to the user policies. ● An architecture extending SDN to wide area to efficiently support the service chains at the edge.
  21. 21. 21/33 Évora Approach ● Initialize once per user device: – Step 1) Construct a service graph. ● Initialize once per user service chain: – Step 2) Find matching subgraphs for the user's service chain as partial, potential chains. – Step 3) Complete matches → potential chains. ● Initialize once per <nsc, policy> pair: – Step 4) Service chain placement at the best fit among the possible chains, based on a user-defined policy. ● Execute the service chain.
  22. 22. 22/33 1) Initialize the orchestrator (Once per device) ● Construct a service graph in the user device. ― As a snapshot of the available service instances at the edge.
  23. 23. 23/33 2) Initialize Service Chain (Once per each chain) ● Construct matching subgraphs as potential chains. – while noting the individual service properties ● Incrementally calculate a “penalty value” for each potential chain that is being constructed. – with user-given weight to the properties. ● monthly cost (C), throughput (T), end-to-end latency (L), ..
  24. 24. 24/33 3) Complete matches → Potential Service Chain Placements ● Ability to place the entire service chain in the matching subgraph. – Complete matching subgraph, i.e. a potential service chain placement is found. ● Record. ● Stop procedure once all the nodes are traversed. ● Subsequent NSC executions require no initialization.
  25. 25. 25/33 4) The Service Chain Placement ● Penalty function, with normalized values of C, L, and T. – α,β,γ ← Non-negative integers specified by user. ● Solve this as a Mixed Integer Linear Problem. ● The penalty function can be extended with powers. ● Place the current NSC (<nsc,policy> pair) on the service composition with minimal penalty value. ● Possible updates and migrations. – Future service unavailability → choose the next.
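The placement above can be sketched as a greedy enumeration over candidate chains (rather than the MILP formulation); the candidate chains, their normalized properties, and the weights below are illustrative:

```python
# Sketch of Évora-style penalty-based placement. Candidate chains carry
# normalized [0, 1] properties: monthly cost C, latency L, throughput T
# (higher T is better, so it enters the penalty inverted).

def penalty(chain, alpha, beta, gamma):
    """Weighted penalty over C, L, and inverted T; lower is better.
    alpha, beta, gamma are the non-negative user-given weights."""
    return (alpha * chain["C"]
            + beta * chain["L"]
            + gamma * (1.0 - chain["T"]))

def place(chains, alpha=1, beta=1, gamma=1):
    """Place the NSC on the candidate chain with minimal penalty."""
    return min(chains, key=lambda ch: penalty(ch, alpha, beta, gamma))

# Three hypothetical candidate chains (illustrative values).
candidates = [
    {"name": "chain-a", "C": 0.2, "L": 0.9, "T": 0.5},
    {"name": "chain-b", "C": 0.6, "L": 0.3, "T": 0.8},
    {"name": "chain-c", "C": 0.9, "L": 0.1, "T": 0.9},
]

cheap = place(candidates, alpha=10, beta=3, gamma=3)   # cost-sensitive user
fast  = place(candidates, alpha=3,  beta=10, gamma=3)  # latency-sensitive user
```

The weight split (10 vs. 3) mimics the "one attribute given more prominence" policies used in the evaluation slides.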
  26. 26. 26/33 Solution Architecture ● Extending SDN to a multi-domain edge environment. – With Message-Oriented Middleware (MOM).
  27. 27. 27/33 Evaluation ● Microbenchmark how well user policies are satisfied with Évora service chains, among various alternatives. – Algorithm effectiveness in satisfying user policies. – Efficacy: closeness to optimal results ● minimizing the penalty function results in improved quality of experience
  28. 28. 28/33 User policies with two attributes ● Location of the circles → Properties (C, L, and T). ● Darker circles – chains with minimal penalty, the ones that we prefer (circled). T ↑ and C ↓ T ↑ and L ↓ C ↓ and L ↓ ● Results show user policies supported fairly well.
  29. 29. 29/33 ● Policies with three attributes: One given more prominence (weight = 10), than the other two (weight = 3). ● Results show efficient support for multiple attributes with different weights. Radius of the circles – Monthly Cost
  30. 30. 30/33 Key Findings ● More and more services hosted at the edge. ● NSCs have more constraints than stand-alone VNFs. ● Évora supports efficient chaining of network services. – Leveraging a software-defined approach for services ● Extending SDN with MOM.
  31. 31. 31/33 III) Ongoing Work 1) Software-Defined Cyber-Physical Systems (CPS) workflows at the edge ● Can we tackle some design, operational, and scalability challenges of CPS? – By representing them as software-defined service compositions at the edge? SDS’17, M4IoT’15, and CLUSTER (Invited from SDS’17. Under review).
  32. 32. 32/33 2) A Service-Oriented Workflow for Big Data Research at the Edge ● Analyse decentralized big data (TB-scale) with a service based data access and virtual integration approach. – Addressing data related optimizations as service chains. ● Data cleaning, incremental data integration, and data analysis. CoopIS’15, SDS’18, and DAPD (Distributed and Parallel Databases. Invited from DMAH’17. Under Review).
  33. 33. 33/33 Thank you! pradeeban.kathiravelu@tecnico.ulisboa.pt Acknowledgements: Prof. Marco Canini (KAUST) Prof. Ashish Sharma (Emory) Prof. Helena Galhardas (IST) Prof. Tihana Galinac Grbac (URijeka) Prof. Marco Chiesa (KTH) Ed Warnicke (Cisco)
  34. 34. 34/33 ~ Thanks ~
  35. 35. 35/33 Additional Slides
  36. 36. 36/33 Publications Overview
  37. 37. 37/33 Thesis Overview
  38. 38. 38/33 (1.1) *SDNSim* ● CoopIS’16 and SDS’15
  39. 39. 39/33 Introduction ● Network architectures and algorithms simulated or emulated at early stages of development. ● SDN is expanding in its scope. – Programmable networks → continuous development. – Native integration of network emulators into SDN controllers.
  40. 40. 40/33 How well do SDN simulators fare? ● Network simulators supporting SDN and emulation capabilities. – NS-3. ● Cloud simulators extended for cloud networks with SDN. – CloudSim → CloudSimSDN. However.. ● Lack of “SDN-Native” network simulators. – Simulators not following the Software-Defined Systems paradigm. – Policy/algorithmic code locked into simulator-imperative code. ● Need for easy migration and programmability.
  41. 41. 41/33 Goals ● A simulator for SDN Systems. ● Extend and leverage the SDN controllers in cloud network simulations. – Bring the benefits of SDN to its own simulations! ● Reusability, Scalability, Easy migration, . . . – Run the control plane code in the actual controller (portability). – Simulate the data plane (scalability, resource efficiency). ● by programmatically invoking the southbound of SDN controller.
  42. 42. 42/33 Our Proposal: Software-Defined Simulations ● Separation of control plane and (simulated) data plane. ● Integration with SDN controllers.
  43. 43. 43/33 SDNSim: A Framework for Software-Defined Simulations. ● Network system to be simulated. – Expressed in “descriptors”. ● XML-based description language. – Parsed and executed in SDNSim simulation sandbox. ● A Java middleware. ● Simulated application logic. – Deployed into controller.
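A minimal sketch of descriptor parsing as described above; the element and attribute names below are hypothetical, not the actual SDNSim description language (the prototype is a Java middleware, Python is used here only for brevity):

```python
# Sketch: parse a hypothetical XML-based network descriptor into the
# configuration a simulation sandbox would consume. The schema
# (element/attribute names) is an assumption for illustration.
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<simulation name="fat-tree-demo">
  <topology type="fat-tree" k="4"/>
  <hosts count="16"/>
  <controller endpoint="http://localhost:8181"/>
</simulation>
"""

def parse_descriptor(xml_text):
    """Return a dict of simulation settings from the descriptor."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "topology": root.find("topology").get("type"),
        "hosts": int(root.find("hosts").get("count")),
        "controller": root.find("controller").get("endpoint"),
    }

config = parse_descriptor(DESCRIPTOR)
```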
  44. 44. 44/33 Contributions and SDNSim Approach 1. Reusable simulation building blocks. ● Simulating complex and large-scale SDN systems. – Network Service Chaining (NSC).
  45. 45. 45/33 1. Reusable simulation building blocks. ● Simulating complex and large-scale SDN systems. – Network Service Chaining (NSC). – As a case of Network Function Virtualization (NFV).
  46. 46. 46/33 2. Support for continuous development and iterative deployment. ● Checkpointing and versioning of simulated application logic. – Incremental updates: changesets as OSGi bundles in the control plane.
  47. 47. 47/33 3. State-aware simulations. ● Adaptive scaling through shared state. – Horizontal scalability through In-Memory Data Grids. – State of the simulations for scaling decisions. ● Pause-and-resume simulations. – Multi-tenanted parallel executions.
  48. 48. 48/33 4. Expressiveness. ● Data plane: XML-based network representation. ● Control plane: Java API.
  49. 49. 49/33 Prototype Implementation ● Oracle Java 1.8.0 - Development language. ● Apache Maven 3.1.1 - Build the bundles and execute the scripts. ● Infinispan 7.2.0.Final - Distributed cluster. ● Apache Karaf 3.0.3 - OSGi run time. ● OpenDaylight Beryllium - Default controller. ● Multiple deployment options: – As a stand-alone simulator. – Distributed execution with an SDN controller. – As a bundle in an OSGi-based SDN controller.
  50. 50. 50/33 Evaluation Deployment Configurations ● Intel Core™ i7-4700MQ – CPU @ 2.40 GHz, 8 processors. – 8 GB memory. – Ubuntu 14.04 LTS 64-bit operating system. ● A cluster of up to 5 identical computers.
  51. 51. 51/33 Evaluation Strategy ● Benchmark against CloudSimSDN. – Cloud2Sim for distributed execution. ● Simulating routing algorithms in fat-tree topology. ● Experiments repeated 6 times. ● Data center simulations of up to 100,000 nodes.
  52. 52. 52/33 Performance and Problem Size ● SDNSim yields higher performance for larger simulations.
  53. 53. 53/33 Horizontal Scalability ● Smart scale-out. ● Higher horizontal scalability.
  54. 54. 54/33 Performance with Incremental Updates ● Smaller simulations: up to 1000 nodes. ● SDNSim: controller and middleware execution completion time.
  55. 55. 55/33 Performance with Incremental Updates ● Initial execution takes longer - Initializations.
  56. 56. 56/33 Performance with Incremental Updates ● Faster executions once the system is initialized.
  57. 57. 57/33 Incremental Updates: Test-driven development ● Faster executions once the system is initialized.
  58. 58. 58/33 Incremental Updates: Test-driven development ● Even faster executions for subsequent simulations.
  59. 59. 59/33 Incremental Updates: Test-driven development ● No change in simulated environment – Deploy changesets to controller.
  60. 60. 60/33 Incremental Updates: Test-driven development ● No change in simulated environment - Revert changeset.
  61. 61. 61/33 Performance with Incremental Scaling ● No change in controller - scale the simulated environment.
  62. 62. 62/33 Network Construction with Mininet and SDNSim ● Adaptive Emulation and Simulation. – Simulate when resources are scarce for emulation.
  63. 63. 63/33 Automated Code Migration: Simulation → Emulation ● Time taken to programmatically convert an SDNSim simulation script into a Mininet script.
  64. 64. 64/33 Conclusions ● SDNSim is an SDN-aware network simulator – Built following the SDN paradigm ● Separation of the data layer from the control layer and application logic. – Enabling an incremental modelling of cloud networks. ● Performance and scalability. – Complex network system simulations. – Reuse the same controller code algorithm developers created to simulate much larger-scale deployments. – Adaptive parallel and distributed simulations. Future Work ● Extension points for easy migrations. – More emulator and controller integrations.
  65. 65. 65/33 (1.2) *SENDIM* ● Simulation, Emulation, aNd Deployment Integration Middleware ● IC2E’16
  66. 66. 66/33 SENDIM Integration
  70. 70. 70/33 (2) *NetUber*
  71. 71. 71/33 NetUber Application Scenarios ● Cheaper transfers between two endpoints. ● Higher throughput or reduced latency. ● Better alternative to SaaS replication. ● Network services (compression, encryption, ..).
  72. 72. 72/33 Scenario (1 of 4): Cheaper Transfers A) Cost of Cloud Instances: Observations ● 10 Gbps R4 instance (r4.8xlarge) pairs offered only a maximum of 1.2 Gbps of inter-region data transfer. – 10 Gbps only inside a placement group. ● We need more pairs of instances!
  73. 73. 73/33 Spot Instances ● Cheaper (up to 90% savings), but volatile, instances. ● Price Fluctuations - Future price unpredictable (for EC2). ● Differing prices among availability zones of a region. – Buy from the cheapest availability zones at the moment. – Maintain instances in the cheap availability zones.
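The "buy from the cheapest availability zones" policy above can be sketched as a simple selection over current spot prices; the zone names and prices below are made up for illustration:

```python
# Sketch: keep NetUber instances in the cheapest availability zones of
# a region as spot prices fluctuate. Prices are illustrative, not real
# EC2 spot quotes.

def cheapest_zones(spot_prices, needed=2):
    """Return the `needed` cheapest availability zones by current price."""
    return sorted(spot_prices, key=spot_prices.get)[:needed]

prices = {"eu-west-1a": 0.31, "eu-west-1b": 0.12, "eu-west-1c": 0.18}
zones = cheapest_zones(prices, needed=2)
```

In practice this selection would be re-run periodically, migrating instances when the price ordering of the zones changes.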
  74. 74. 74/33 B) Cost of Bandwidth: Price disparity is real! Scenario (1 of 4): Cheaper Transfers Regions 1 - 9 (US, Canada, and EU) remain much cheaper than the others.
  75. 75. 75/33 C) Cost to Connect to the Cloud Provider Scenario (1 of 4): Cheaper Transfers ● Connect the end-user to the cloud servers. ● Often provided by the cloud provider. ● Example: Amazon Direct Connect. ● Charged per port-hour (e.g. how many hours a 10 GbE port is used).
  76. 76. 76/33 Scenario (2 of 4): Higher throughput or reduced latency ● Cloud-Assisted Point-to-Point Connectivity – Better control over the path, compared to the Internet paths. – Also cheaper than MPLS networks or transit providers. ● Thanks to spot instances.
  77. 77. 77/33 Scenario (3 of 4): Better Alternative to SaaS Replication ● See slide 8
  78. 78. 78/33 Scenario (4 of 4): Network Services ● NetUber uses memory-optimized R4 spot instances. – Each instance with 244 GB memory, 32 vCPU, and 10 GbE interface. ● Possibility to deploy network services at the instances. ● Network services. – Value-added services for the customer. ● Encryption, WAN-Optimizer, load balancer, .. – Services for cost-efficiency. ● Compression.
  79. 79. 79/33 Conclusion ● A connectivity provider that does not own the infrastructure. ● “Internet Fast-routes” through cloud-assisted networks. – Better than ISPs (~50 – 75 Mbps, often with a cap) for end users. ● Cheaper point-to-point connectivity. – Cheaper than transit providers and similar offerings (for < 50 TB/month). ● Future work: – Evaluate NetUber for more parameters (loss rate, jitter, ..) – Evaluate the cost with more cloud providers and pairs of regions.
  80. 80. 80/33 (3) *SMART* ● EI2N’16 and IM’17
  81. 81. 81/33 Introduction ● Cloud data centers consist of various tenants with multiple roles. ● Differentiated Quality of Service (QoS) in multi-tenant clouds. – Service Level Agreements (SLA). – Different priorities among tenant processes. ● Network is shared among the tenants. – End-to-end delivery guarantee despite congestion for critical flows.
  82. 82. 82/33 SDN for Clouds ● Cross-layer optimization of clouds with SDN. – Centralized control plane of the network-as-a-service.
  83. 83. 83/33 Motivation ● How to offer differentiated QoS and SLA in multi-tenant networks? – Application-level user preferences and system policies. – Performance guarantees at the network-level. – More potential in having them both! – SDN, Middleboxes, . . .
  84. 84. 84/33 Goals ● How to offer differentiated QoS and SLA in multi- tenant networks? – Leverage SDN to offer a selective partial redundancy in network flows. – FlowTags - Software middlebox to tag the flows with contextual information. ● Application-level preferences to the network control plane as tags. ● Dynamic flow routing modifications based on the tags.
  86. 86. 86/33 Our Proposal: SMART ● An SDN Middlebox Architecture for Reliable Transfers. ● An architectural enhancement for network flows allocation, routing, and control. ● Timely delivery of priority flows by dynamically diverting them to a less congested path. ● Cloning subflows of higher priority flows. ● An adaptive approach in cloning and diverting of the flows.
  87. 87. 87/33 Contributions ● A cross-layer architecture ensuring differentiated QoS. ● A context-aware approach to load balancing the network. – Servers supporting multihoming, connected topologies, . . .
  88. 88. 88/33 SMART Approach ● Divert and clone subflows by setting breakpoints in the flows in their route, to avert congestion. – Trade-off of minimal redundancy to ensure the SLA of priority flows. – Adaptive execution with contextual information on the network. ● Leverage FlowTags middlebox – to pass application-level system and user preferences to the network.
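The clone/divert trade-off above can be sketched as a decision over a tagged flow; the tag fields, thresholds, and action names below are assumptions for illustration, not the actual SMART implementation:

```python
# Sketch of a SMART-style clone/divert decision for a tagged flow.
# Tag schema, threshold, and action names are hypothetical.

def decide(flow_tag, path_load, sla_deadline_ms, estimated_completion_ms,
           load_threshold=0.8):
    """Return an action for a flow, given its tag and path congestion."""
    if flow_tag.get("priority") != "high":
        return "route-normally"        # only priority flows get redundancy
    if (estimated_completion_ms <= sla_deadline_ms
            and path_load < load_threshold):
        return "route-normally"        # SLA safe on the current path
    if path_load >= load_threshold:
        # Congested path: clone a subflow onto a less congested path,
        # trading minimal redundancy for the SLA of the priority flow.
        return "clone-subflow"
    return "divert"                    # SLA at risk but path uncongested

action = decide({"priority": "high"}, path_load=0.9,
                sla_deadline_ms=1000, estimated_completion_ms=1200)
```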
  89. 89. 89/33 SMART Enhancements ● When to break and when to merge? – Clone destination.
  90. 90. 90/33 SMART Deployment
  91. 91. 91/33 SMART Workflow
  92. 92. 92/33 I: Tag Generation for Priority Flows ● Tag generation query and response. – between the hosts and the FlowTags controller. ● A centralized controller for FlowTags. ● Tag the flows at the origin. ● FlowTagger software middlebox. – A generator of the tags. – Invoked by the host application layer. – Similar to the FlowTags-capable middleboxes for NATs
  93. 93. 93/33 II: Regular routing until the policies (from the tags) are violated
  94. 94. 94/33 III: When a threshold is met ● Controller is triggered through OpenFlow API. ● A series of control flows inside the control plane. ● Modify flow entries in the relevant switches.
  95. 95. 95/33 SMART Control Flows: Rules Manager ● A software middlebox in the control plane. ● Consumes the tags from the packet. – Similar to FlowTags-capable firewalls.
  96. 96. 96/33 Rules Manager Tags Consumption ● Interprets the tags – as input to the SMART Enhancer
  97. 97. 97/33 SMART Enhancer ● Core of the SMART architecture. ● Gets the input to the enhancement algorithms. ● Decides the flow modifications. – Breakpoint node and packet. – Clone/divert decisions.
  98. 98. 98/33 Prototype Implementation ● Developed in Oracle Java 1.8.0. ● OpenDaylight Beryllium as the core SDN controller. ● Enhancer and the Rules Manager middlebox as controller extensions. – Developed as OSGi bundles. – Deployed into Apache Karaf runtime of OpenDaylight. ● FlowTags middlebox controller deployed along the SDN controller. – FlowTags, originally a POX extension. ● Network nodes and flows emulated with Mininet. – Larger scale cloud deployments simulated.
  99. 99. 99/33 Evaluation Strategy ● Data center network with 1024 nodes and a leaf-spine topology. – Path lengths of more than two hops. – Up to 100,000 short flows. ● Flow completion time < 1 s. ● A few non-priority elephant flows. – SLA → maximum permitted flow completion time for priority flows. – Uniformly randomized congestion. ● hitting a few uplinks of nodes concurrently. ● an overwhelming number of flows through the same nodes and links. ● Benchmark: SMART enhancements over base routing algorithms. – Performance (SLA awareness), redundancy, and overhead.
  100. 100. 100/33 SMART Adaptive Clone/Replicate with Shortest-Path ● Replicate the subsequent flows once a previous flow was cloned.
  101. 101. 101/33 SMART Adaptive Clone/Replicate with Equal-Cost Multi-Path (ECMP) ● Repeat the experiment with ECMP routing.
  102. 102. 102/33 Related Work ● Multipath TCP (MPTCP) uses the available multiple paths between the nodes concurrently to route flows across the nodes. – Performance, bandwidth utilization, and congestion control – through distributed load balancing. ● ProgNET leverages WS-Agreement and SDN for SLA-aware clouds. ● pFabric for deadline-constrained data flows with minimal completion time. ● QJump Linux traffic-control module for latency-sensitive applications.
  103. 103. 103/33 Conclusions ● SMART leverages redundancy in the flows as a means to improve the SLA of priority flows. ● Opens an interesting research question leveraging SDN, middleboxes, and redundancy. – Cross-layer optimizations through tagging the flows. – For differentiated QoS. Future Work ● Implementation of SMART on a real data center network. ● Evaluate quantitatively against the identified related work.
  104. 104. 104/33 (4) *Mayan* ● ICWS’16 and SDS’16
  105. 105. 105/33 Introduction ● eScience workflows – Computation-intensive. – Execute on highly distributed networks. ● Complex service compositions aggregating web services – To automate scientific and enterprise business processes.
  106. 106. 106/33 Motivation ● Scalable Distributed Executions in wide area networks. – Better orchestration of service compositions. ● Multi-Tenant Environments. – Isolation Guarantees. – Differentiated Quality of Service (QoS). ● Increasing demand for geo-distribution (workflows and service compositions).
  107. 107. 107/33 Contributions ● Support for, – Adaptive execution of scientific workflows. – Flexible service composition. – Reliable large-scale service composition. – Efficient selection of service instances.
  108. 108. 108/33 Our Proposal: Mayan ● Extensible SDN approach for cloud-scale service composition. ● An approach driven by, – Loose coupling of service definitions and implementations. – Message-oriented Middleware (MOM). – Availability of a logically centralized control plane. ● Leveraging OpenDaylight SDN controller as the core. – Modular, as OSGi bundles. – Additional advanced features. ● State of executions and transactions stored in the controller distributed data tree. ● Clustered and federated deployments.
  109. 109. 109/33 Software-Defined Service Composition: Services as the building blocks of Mayan
  110. 110. 110/33 Multiple Implementations and Deployments of a Service
  112. 112. 112/33 Mayan Services Registry: Modelling Language
  113. 113. 113/33 Service Composition Representation ● <Service3,(<Service1, Input1>, <Service2, Input2>)>
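The nested representation above can be evaluated recursively. A minimal sketch with toy function stubs standing in for real services (the stubs and inputs are illustrative, not Mayan's actual service implementations):

```python
# Sketch: evaluate a nested composition of the form
# <Service3, (<Service1, Input1>, <Service2, Input2>)>
# modelled as a (service, args) tuple tree.

def service1(x): return x + 1          # toy stand-ins for real services
def service2(x): return x * 2
def service3(a, b): return a + b

def run(composition):
    """A composition is (service, args); each arg may itself be a
    nested (service, args) composition, evaluated first."""
    service, args = composition
    resolved = [run(a) if isinstance(a, tuple) else a for a in args]
    return service(*resolved)

# <Service3, (<Service1, 10>, <Service2, 10>)>
result = run((service3, ((service1, (10,)), (service2, (10,)))))
```

Because inner compositions are resolved before the enclosing service runs, independent branches (Service1 and Service2 here) could also be dispatched in parallel to distributed service instances.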
  114. 114. 114/33 Alternative Implementations and Deployments
  115. 115. 115/33 Multi-Domain Workflows
  116. 116. 116/33 Connecting Services View with the Network View
  117. 117. 117/33 Connecting Services View with the Network View
  119. 119. 119/33 Evaluation System Configurations ● Evaluation Approach: – Smaller physical deployments in a cluster. – Larger deployments as simulations and emulations (Mininet). ● Evaluated Deployment: – Service Composition Implementations. ● Web services frameworks. ● Apache Hadoop MapReduce. ● Hazelcast In-Memory Data Grid. – OpenDaylight SDN Controller.
  120. 120. 120/33 Preliminary Assessments ● A workflow performing distributed data cleaning and consolidation. – A distributed web service composition. vs. – Mayan approach with the extended SDN architecture.
  121. 121. 121/33 Speedup and Horizontal Scalability ● No negative scalability in larger distributions. ● 100% more positive scalability for larger deployments.
  122. 122. 122/33 Throughput of the controller ● Measured as the number of messages entirely processed by the controller, arriving from the publishers to be forwarded towards a relevant receiver. ● 5000 messages/s at a concurrency of 10 million messages.
  123. 123. 123/33 Processing Time ● Total time taken to process the complete set of messages at a Mayan controller, against a varying number of messages. ● The controller’s processing time scaled linearly with the number of parallel messages. ● It processes 10 million messages in 40 minutes.
  124. 124. 124/33 Scalability of the Mayan Controller ● The results presented are for a single stand-alone deployment of the controller. ● Mayan is designed as a federated deployment. – Scales horizontally to ● manage a wider area with a more substantial number of service nodes and improved latency. ● handle more concurrent messages in each controller domain.
  125. 125. 125/33 Related Work ● MapReduce for efficient service compositions [SD 2014]. ● Palantir: SDN for MapReduce performance with the network proximity data [ZY 2014]. [SD 2014] Deng, Shuiguang, et al. "Top-Automatic Service Composition: A Parallel Method for Large-Scale Service Sets." Automation Science and Engineering, IEEE Transactions on 11.3 (2014): 891-905. [ZY 2014] Yu, Ze, et al. "Palantir: Reseizing network proximity in large-scale distributed computing frameworks using sdn." 2014 IEEE 7th International Conference on Cloud Computing (CLOUD). IEEE, 2014.
  126. 126. 126/33 Conclusion ● SDN-based approach that enables large scale flexibility with performance – Components in eScience workflows as building blocks of a distributed platform. – Service composition with web services and distributed execution frameworks. – Multi-tenant and multi-domain executions.
  127. 127. 127/33 (5) *Évora*
  128. 128. 128/33 Services ● A core element of the Internet ecosystem. ● Various types of Services – Web services and microservices ● key in modern cloud applications. – Network services / Virtual Network Functions ● firewall, load balancer, proxy, .. – Data services ● data cleaning, data integration, .. ● Interesting common research challenges: – Service placement. – Service instance selection. – Service composition or “service chaining”.
  129. 129. 129/33 Why Service-Oriented Architectures for our systems? ● Beyond data center scale. – Thanks to the fact that services are standardized. ● SOA and RESTful reference architectures. – Multiple implementation approaches such as Message- Oriented Middleware. ● Service endpoints to handover messages internally to the broker. ● Publish/subscribe to a message broker over the Internet. ● Flexibility, modularity, loose-coupling, and adaptability.
  130. 130. 130/33 Challenges in achieving Service Chaining at the Edge ● Dependencies among the network services. – Need to be accessible from each other. ● Service Level Objectives of the service chain users. – Latency, throughput, monthly cost, .. ● Finding the optimal service chain for a user request. – In general, an NP-hard problem.
  131. 131. 131/33 Service Chain: s1 → s2 → s3 → s4 ● Goals – Services close to the user. – Services close to the following services in the chain. – Satisfying user Service Level Objectives!
  132. 132. 132/33 Alternative Representations
  133. 133. 133/33 Problem Scale: Representation of the service graph from the data center graph ● The number of links in this service graph grows – linearly with the number of edges or links between the edge nodes. – exponentially with the average number of services per edge node.
  134. 134/33 What has Message-Oriented Middleware got to do with the controller? ● Expose the internals from the controller (e.g. OpenDaylight) – Through a message-based northbound API ● e.g. AMQP (Advanced Message Queuing Protocol). – Publish/Subscribe with a broker (e.g. ActiveMQ). ● What can be exposed – Data tree (internal data structures of the controller) – Remote procedure calls (RPCs) – Notifications. ● Thanks to the Model-Driven Service Abstraction Layer (MD-SAL) of OpenDaylight. – Compatible internal representation of the data plane. – Messaging4Transport Project.
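The pattern can be illustrated with a tiny in-memory stand-in for the broker. The topic name and message shape here are invented; a real deployment would publish MD-SAL data-tree changes, RPC results, and notifications to a broker such as ActiveMQ over AMQP, as in the Messaging4Transport project:

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory publish/subscribe broker (stand-in for ActiveMQ)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for cb in self.subscribers[topic]:
            cb(message)

broker = Broker()
received = []

# A northbound client subscribes to controller notifications.
broker.subscribe("controller/notifications", received.append)

# The controller publishes a data-tree change as a message.
broker.publish("controller/notifications",
               {"path": "/network-topology", "op": "updated"})
```

The broker decouples northbound clients from the controller: clients only need the topic and the message schema, not controller-specific APIs.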
  135. 135/33 MILP and graph matching can be computation-intensive ● But initialization happens once per user service chain with a given policy. – This procedure does not repeat once initialized, – unless updates are received from the edge network. ● A new data center with the service offering joins at the edge. ● An existing data center or a service offering fails to respond. ● The number of services in each NSC (network service chain) is typically 5 – 10. – The Évora algorithm follows a greedy approach, rather than a typical graph matching.
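The greedy idea can be sketched as follows. This is an illustration, not the actual Évora implementation, and the topology and candidate sets are invented: place each service in the chain on the candidate node nearest to the previous hop, visiting each service once instead of matching the whole graph.

```python
# Toy latencies (ms) between the user and two edge nodes.
LAT = {("user", "n1"): 5, ("user", "n2"): 20, ("n1", "n2"): 10}

def lat(a, b):
    """Symmetric lookup; zero latency within the same node."""
    return 0 if a == b else LAT.get((a, b), LAT.get((b, a)))

# Candidate nodes offering each service in the chain.
CANDIDATES = {"s1": ["n1", "n2"], "s2": ["n1", "n2"],
              "s3": ["n2"], "s4": ["n1", "n2"]}

def greedy_chain(chain):
    """Greedily place each service on the node closest to the previous
    hop: linear in the chain length, rather than exhaustive matching."""
    placement, prev = [], "user"
    for svc in chain:
        node = min(CANDIDATES[svc], key=lambda n: lat(prev, n))
        placement.append(node)
        prev = node
    return placement
```

With only 5 – 10 services per NSC, the greedy pass is cheap, and it only reruns when the edge network reports changes.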
  136. 136/33 ● Two attributes are given more prominence (weight = 10) than the third (weight = 3). ● Results show efficient support for multiple attributes with different weights. (Figure: the radius of the circles represents the monthly cost.)
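The weighted multi-attribute selection can be sketched as a weighted sum of normalized utilities. The 10/10/3 weights mirror the experiment above; the candidate offers and their utility values are made up for illustration:

```python
# Latency and throughput weighted 10, monthly cost weighted 3.
# Each attribute is assumed normalized to a utility in [0, 1],
# where higher is better (so a low monthly cost maps to a high utility).
WEIGHTS = {"latency": 10, "throughput": 10, "cost": 3}

def score(candidate):
    """Weighted sum of the candidate's attribute utilities."""
    return sum(WEIGHTS[attr] * utility for attr, utility in candidate.items())

offers = {
    "cheap": {"latency": 0.4, "throughput": 0.5, "cost": 1.0},
    "fast":  {"latency": 0.9, "throughput": 0.9, "cost": 0.2},
}
best = max(offers, key=lambda name: score(offers[name]))
```

Because latency and throughput dominate the weights, the "fast" offer wins even though it is far more expensive; shifting weight onto cost would flip the choice.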
  137. 137/33 Performance and Scalability of the Évora Orchestrator Algorithms
  138. 138/33 Algorithm
  139. 139/33 Algorithm
  140. 140/33 (6) *SD-CPS* ● Work-in-Progress ● SDS’17, M4IoT’15, and CLUSTER (Under Review)
  141. 141/33 (7) *Obidos* ● Work-in-Progress ● CoopIS’15, DMAH’17, and DAPD (Under Review)
  142. 142/33 (8) *SDDS* ● SDS’18 (Best Paper Award)
  143. Introduction ● Big data with increasing volume and variety. – Volume requires scalability. – Variety requires interoperability. ● Data Services – Services that access and process big data. – A unified web service interface to data → interoperability! ● Chaining of data services. – Composing chains of numerous data services. – Data access → data cleaning → data integration.
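The chaining of data services can be sketched as function composition, where each stage consumes the previous stage's output. The record schema, cleaning rules, and stage names below are invented for illustration:

```python
def data_access(source):
    """Data access: fetch raw records (simulated here)."""
    return [{"name": " Alice ", "age": "34"}, {"name": "Bob", "age": "n/a"}]

def data_cleaning(records):
    """Data cleaning: drop invalid records, normalize the fields."""
    cleaned = []
    for r in records:
        if r["age"].isdigit():
            cleaned.append({"name": r["name"].strip(), "age": int(r["age"])})
    return cleaned

def data_integration(records):
    """Data integration: merge into a unified view keyed by name."""
    return {r["name"]: r["age"] for r in records}

def compose(*services):
    """Chain data services: the output of each stage feeds the next."""
    def chain(x):
        for service in services:
            x = service(x)
        return x
    return chain

pipeline = compose(data_access, data_cleaning, data_integration)
result = pipeline("customers.csv")
```

Behind each stage's uniform interface, the implementation can run on a different node or engine, which is exactly what makes the composition interoperable but also communication-sensitive at scale.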
  144. Problem Statement ● Data services offer interoperability. ● But when related data and services are distributed far from each other → bad performance at scale. – How to scale out efficiently? ● How to minimize communication overheads?
  145. 145/33 Motivation ● Software-Defined Networking (SDN). – A unified controller for the data plane devices. – Brings network awareness to the applications. ● To make big data executions – Interoperable. – Network-aware.
  146. 146/33 Our Proposal: SDDS ● Can we bring SDN to the data services? ● Software-Defined Data Services (SDDS).
  147. 147/33 Contributions ● SDDS as a generic approach for data services. – Extending and leveraging SDN in the data centers. ● A software-defined framework for data services. – Efficient performance and management of data services. – Interoperability and scalability.
  148. 148/33 Solution Architecture ● A bottom-up approach, extending SDN. – Data Plane (SDN OpenFlow switches) – Storage Plane (SQL and NoSQL data stores) – Control Plane (SDN controller, In-Memory Data Grids (IMDGs), ..) – Execution Plane (orchestrator and web service engines)
  149. 149/33 Network-Aware Service Executions with SDN
  150. 150/33 SDDS Planes and Layered Architecture
  151. 151/33 SDDS Approach ● Define all the data operations as interoperable services. ● SDN for distributing data and service executions – Inside a data center (e.g. Software-Defined Data Centers). – Beyond data centers (extend SDN with Message-Oriented Middleware). ● Optimal placement of data and service execution. – Minimize communication overhead and data movements. ● Keep the related data and executions closer. ● Send the execution to data, rather than data to execution. – Execute the data service on the best-fit server, until interrupted.
  152. 152/33 Efficient Data and Execution Placement
  153. 153/33 Efficient Data and Execution Placement ● {i, j} – related data objects ● D – datasets of interest ● n – execution node ● Σ – spread of the related data objects
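Under the notation on this slide, a minimal sketch (the object placement and hop distances are invented): Σ sums the pairwise distances between related objects {i, j}, and the execution node n is chosen to minimize the total distance to the objects of interest, i.e. the execution is sent to the data:

```python
from itertools import combinations

# Hypothetical placement of related data objects on nodes, and
# pairwise network distances (hops) between the nodes.
LOCATION = {"i1": "n1", "i2": "n1", "i3": "n2", "j1": "n3"}
DIST = {("n1", "n2"): 1, ("n1", "n3"): 2, ("n2", "n3"): 1}

def dist(a, b):
    """Symmetric hop-distance lookup; zero within the same node."""
    return 0 if a == b else DIST.get((a, b), DIST.get((b, a)))

def spread(objects):
    """Sigma: sum of pairwise distances between the related objects."""
    return sum(dist(LOCATION[i], LOCATION[j])
               for i, j in combinations(objects, 2))

def best_node(objects, nodes=("n1", "n2", "n3")):
    """Send the execution to the data: pick the node with the smallest
    total distance to the objects of interest."""
    return min(nodes, key=lambda n: sum(dist(n, LOCATION[o]) for o in objects))
```

A low spread means the related objects are already co-located, so the chosen node serves most accesses locally and the data movement is minimized.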
  154. 154/33 Prototype Implementation ● Data services implemented with web service engines. – Apache Axis2 1.7.0 and Apache CXF 3.2.1. ● IMDG clusters – Hazelcast 3.9.2 and Infinispan 9.1.5. ● Persistent storage – MySQL Server and MongoDB. ● Core SDN Controller – OpenDaylight Beryllium.
  155. 155/33 Evaluation Environment ● A cluster of 6 servers. – AMD A10-8700P Radeon R6, 10 Compute Cores 4C+6G × 4. – 8 GB of memory. – Ubuntu 16.04 LTS 64-bit operating system. – 1 TB disk space.
  156. 156/33 Evaluation ● How does SDDS perform as a network-aware big data execution compared to a network-agnostic execution? – SDDS vs data services on top of the Infinispan IMDG. – A data storage and update service with an increasing volume of persistent data across the cluster, up to a total of 6 TB of data. ● Measured the throughput from the service plane – by the total amount of data processed through the data services per unit time.
  157. 157/33 Evaluation ● SDDS outperforms the baseline. – Better data locality ● by distributing data adhering to the network topology. – Better resource efficiency ● by avoiding scaling out prematurely. – Better throughput with minimal distribution when there is no need to utilize all the 6 servers.
  158. 158/33 Related Work ● Software-Defined Systems. – Software-Defined Service Composition. – Software-Defined Cyber-Physical Systems and SDIoT. ● Industrial SDDS offerings. – Many of them storage-focused. ● PureStorage, PrimaryIO, HPE, RedHat, .. – Many focus on specific data services. ● Containers and DevOps – Atlantix and Portworx. ● Data copying and sharing – IBM Spectrum Copy Data Management and Catalogic ECX. ● We are the first to propose a generic SDDS framework.
  159. 159/33 Conclusion ● Summary – Software-Defined Data Services (SDDS) offer both interoperability and scalability to big data executions. – SDDS leverages SDN in building a software-defined framework for network-aware executions. – SDDS caters to data services and compositions of data services for an efficient execution. ● Future Work – Extend SDDS to edge and IoT/CPS environments.
