Services that access or process a large volume of data are known as data services. Big data frameworks consist of diverse storage media and heterogeneous data formats. Through their service-based approach, data services offer a standardized execution model to big data frameworks. Software-Defined Networking (SDN) increases the programmability of the network, by unifying the control plane centrally, away from the distributed data plane devices. In this paper, we present Software-Defined Data Services (SDDS), extending the data services with the SDN paradigm. SDDS consists of two aspects. First, it models the big data executions as data services or big services composed of several data services. Then, it orchestrates the services centrally in an interoperable manner, by logically separating the executions from the storage. We present the design of an SDDS orchestration framework for network-aware big data executions in data centers. We then evaluate the performance of SDDS through microbenchmarks on a prototype implementation. By extending SDN beyond data centers, we can deploy SDDS in broader execution environments.
https://kkpradeeban.blogspot.com/2018/04/software-defined-data-services.html
Software-Defined Data Services: Interoperable and Network-Aware Big Data Executions (Best Paper Award: SDS-2018)
1. Software-Defined Data Services:
Interoperable and Network-Aware Big Data Executions
Pradeeban Kathiravelu, Peter Van Roy, Luís Veiga
5th IEEE International Conference on Software Defined Systems (SDS 2018).
Barcelona, Spain. 24/04/2018.
2. Introduction
● Big data with increasing volume and variety.
– Volume requires scalability.
– Variety requires interoperability.
● Data Services
– Services that access and process big data.
– Unified web service interface to data → Interoperability!
● Chaining of data services.
– Composing chains of numerous data services.
– Data Access → Data cleaning → Data Integration.
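The chaining idea can be illustrated with a minimal sketch, assuming each data service is a plain function from a list of records to a list of records. The service names (`access`, `clean`, `integrate`), the record fields, and the `chain` helper are hypothetical and only stand in for real web service invocations.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
DataService = Callable[[List[Record]], List[Record]]


def access(_: List[Record]) -> List[Record]:
    # Hypothetical data access service: returns records from two stores.
    return [
        {"id": 1, "name": " Alice ", "source": "sql"},
        {"id": 1, "name": "alice", "source": "nosql"},
        {"id": 2, "name": None, "source": "sql"},
    ]


def clean(records: List[Record]) -> List[Record]:
    # Hypothetical cleaning service: drop records without a name,
    # then trim whitespace and lowercase the remaining names.
    return [
        {**r, "name": str(r["name"]).strip().lower()}
        for r in records
        if r["name"]
    ]


def integrate(records: List[Record]) -> List[Record]:
    # Hypothetical integration service: merge records that share an id
    # and remember which sources contributed to each merged record.
    merged: Dict[object, Record] = {}
    for r in records:
        entry = merged.setdefault(
            r["id"], {"id": r["id"], "name": r["name"], "sources": []}
        )
        entry["sources"].append(r["source"])
    return list(merged.values())


def chain(*services: DataService) -> DataService:
    # Compose services left to right: the output of one feeds the next.
    def composed(records: List[Record]) -> List[Record]:
        return reduce(lambda data, service: service(data), services, records)
    return composed


# Data Access -> Data cleaning -> Data Integration
pipeline = chain(access, clean, integrate)
result = pipeline([])
```

Because every stage shares the same interface, stages can be reordered or replaced without changing the others, which is the interoperability argument made above.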
3. Problem Statement
● Data services offer interoperability.
● But when related data and services are distributed far from each other → Poor performance at scale.
– How to scale out efficiently?
● How to minimize communication overheads?
4. Motivation
● Software-Defined Networking (SDN).
– A unified controller to the data plane devices.
– Brings network awareness to the applications.
● To make big data executions
– Interoperable.
– Network-aware.
5. Our Proposal
● Can we bring SDN to the data services?
● Software-Defined Data Services (SDDS).
6. Contributions
● SDDS as a generic approach for data services.
– Extending and leveraging SDN in the data centers.
● A software-defined framework for data services.
– Efficient performance and management of data services.
– Interoperability and scalability.
7. Solution Architecture
● A bottom-up approach, extending SDN.
– Data Plane (SDN OpenFlow Switches)
– Storage Plane (SQL and NoSQL data stores)
– Control Plane (SDN Controller, In-Memory Data Grids (IMDGs), etc.)
– Execution Plane (Orchestrator and Web Service Engines)
8. Network-Aware Service Executions with SDN
10. SDDS Approach
● Define all the data operations as interoperable services.
● SDN for distributing data and service executions
– Inside a data center (e.g. Software-Defined Data Centers).
– Beyond data centers (extend SDN with Message-Oriented Middleware).
● Optimal placement of data and service execution.
– Minimize communication overhead and data movements.
● Keep the related data and executions closer.
● Send the execution to data, rather than data to execution.
– Execute data service on the best-fit server, until interrupted.
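The "send the execution to data" principle can be sketched as follows. This is an illustrative assumption, not the SDDS orchestrator: it treats the best-fit server as the one that already stores most of the input, and the names `best_fit_node`, `bytes_to_move`, and the `server-N` labels are hypothetical. The "until interrupted" behaviour is not modelled here.

```python
from typing import Dict


def best_fit_node(bytes_per_node: Dict[str, int]) -> str:
    # Choose the node that already holds the most input bytes.
    # Sorting first makes ties resolve to the alphabetically first name.
    return max(sorted(bytes_per_node), key=lambda node: bytes_per_node[node])


def bytes_to_move(bytes_per_node: Dict[str, int], execution_node: str) -> int:
    # Data that must cross the network if the service runs on execution_node:
    # everything stored on any other node.
    return sum(
        size for node, size in bytes_per_node.items() if node != execution_node
    )


# Hypothetical placement of one service's input data (sizes in MB).
placement = {"server-1": 40, "server-2": 900, "server-3": 60}
target = best_fit_node(placement)
```

With this placement, running the service on `server-2` moves 100 MB, whereas running it on `server-1` would move 960 MB.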
12. Efficient Data and Execution Placement
{i, j} – related data objects
D – datasets of interest
n – execution node
Σ – spread of the related data objects
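The formula from this slide is not preserved in the text, so the following is only a sketch under stated assumptions: the spread Σ is taken to be the sum of hop distances over all pairs {i, j} of related data objects in D, and the execution node n is taken to be the candidate with the smallest total hop distance to those objects. The topology, host names, and function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List


def hop_distances(topology: Dict[str, List[str]], source: str) -> Dict[str, int]:
    # Breadth-first search: hop count from source to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def spread(topology: Dict[str, List[str]], object_hosts: Dict[str, str]) -> int:
    # Assumed definition of the spread: sum of hop distances between the
    # hosts of every pair {i, j} of related data objects.
    total = 0
    for i, j in combinations(sorted(object_hosts), 2):
        total += hop_distances(topology, object_hosts[i])[object_hosts[j]]
    return total


def execution_node(
    topology: Dict[str, List[str]],
    object_hosts: Dict[str, str],
    candidates: Iterable[str],
) -> str:
    # Assumed selection rule: pick the candidate n with the smallest total
    # hop distance to all related data objects (ties: first name in order).
    def cost(n: str) -> int:
        dist = hop_distances(topology, n)
        return sum(dist[host] for host in object_hosts.values())
    return min(sorted(candidates), key=cost)


# Hypothetical topology: hosts h1, h2 on switch s1; host h3 on switch s2.
topology = {
    "s1": ["s2", "h1", "h2"],
    "s2": ["s1", "h3"],
    "h1": ["s1"],
    "h2": ["s1"],
    "h3": ["s2"],
}
scattered = {"a": "h1", "b": "h2", "c": "h3"}
colocated = {"a": "h1", "b": "h2", "c": "h2"}
```

In this example the scattered placement has a larger spread than the co-located one, and for the co-located placement the selected execution node is `h2`, where two of the three objects already reside.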
13. Prototype Implementation
● Data services implemented with web service engines.
– Apache Axis2 1.7.0 and Apache CXF 3.2.1.
● IMDG clusters – Hazelcast 3.9.2 and Infinispan 9.1.5.
● Persistent storage – MySQL Server and MongoDB.
● Core SDN Controller – OpenDaylight Beryllium.
14. Evaluation Environment
● A cluster of 6 servers.
– AMD A10-8700P Radeon R6, 10 Compute Cores 4C+6G × 4.
– 8 GB of memory.
– Ubuntu 16.04 LTS 64 bit operating system.
– 1 TB disk space.
15. Evaluation
● How does SDDS perform as a network-aware big data execution, compared to a network-agnostic execution?
– SDDS vs data services on top of Infinispan IMDG.
– A data storage and update service, with an increasing volume of persistent data across the cluster, up to a total of 6 TB of data.
● Measured the throughput from the service plane as the total amount of data processed through the data services per unit time.
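The throughput metric described above (data processed through the data services per unit time) can be sketched as below. This is not the benchmark harness used in the evaluation; `measure_throughput`, the `service` callable, and the injectable `clock` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable


def measure_throughput(
    service: Callable[[bytes], object],
    payloads: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    # Throughput = total bytes pushed through the service / elapsed seconds.
    start = clock()
    processed = 0
    for payload in payloads:
        service(payload)           # invoke the (hypothetical) data service
        processed += len(payload)  # count the bytes it handled
    elapsed = clock() - start
    return processed / elapsed if elapsed > 0 else float("inf")


# Deterministic usage example with a fake clock that reports 2 s elapsed.
ticks = iter([0.0, 2.0])
stored = []
throughput = measure_throughput(
    stored.append, [b"x" * 100, b"y" * 300], clock=lambda: next(ticks)
)
```

Passing the clock as a parameter keeps the example reproducible; a real measurement would use the default `time.perf_counter`.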
16. Evaluation
● SDDS outperforms the baseline.
– Better data locality, by distributing data in adherence to the network topology.
– Better resource efficiency, by avoiding scaling out prematurely.
– Better throughput with minimal distribution when there is no need to utilize all 6 servers.
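One way to read "avoiding scaling out prematurely" is to use only as many servers as the data volume requires. The sketch below is an assumed policy, not the one implemented in SDDS: it takes the 1 TB of disk per server and the 6-server cluster from the evaluation environment as defaults, and the function name `servers_to_use` is hypothetical.

```python
import math


def servers_to_use(
    volume_tb: float, capacity_tb: float = 1.0, cluster_size: int = 6
) -> int:
    # Assumed policy: occupy the fewest servers whose combined capacity
    # can hold the data, and never fewer than one server.
    if volume_tb < 0 or capacity_tb <= 0 or cluster_size < 1:
        raise ValueError("invalid volume, capacity, or cluster size")
    needed = max(1, math.ceil(volume_tb / capacity_tb))
    if needed > cluster_size:
        raise ValueError("data volume exceeds the capacity of the cluster")
    return needed


# Small volumes stay on few servers; only 6 TB uses the whole cluster.
plan = {volume: servers_to_use(volume) for volume in (0.2, 2.5, 6.0)}
```

A network-agnostic data grid would typically spread even the 0.2 TB case across all members, which is the behaviour the comparison above argues against.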
17. Related Work
● Software-Defined Systems.
– Software-Defined Service Composition.
– Software-Defined Cyber-Physical Systems and SDIoT.
● Industrial SDDS offerings.
– Many of them storage focused.
● PureStorage, PrimaryIO, HPE, RedHat, etc.
– Many focus on specific data services.
● Containers and devops – Atlantix and Portworx.
● Data copying and sharing – IBM Spectrum Copy Data Management and Catalogic ECX.
● We are the first to propose a generic SDDS framework.
18. Conclusion
Summary
● Software-Defined Data Services (SDDS) offer both
interoperability and scalability to big data executions.
● SDDS leverages SDN in building a software-defined
framework for network-aware executions.
● SDDS caters to data services and compositions of
data services for an efficient execution.
Future Work
● Extend SDDS for edge and IoT/CPS environments.
Thank you! Questions?