On Developing and Operating of Data Elasticity Management Process

The Data-as-a-Service (DaaS) model enables data analytics providers to provision and offer data assets to their consumers. To achieve quality of results for the data assets, we need to enable DaaS elasticity by trading off quality against the cost of resource usage. However, most current work on DaaS focuses on infrastructure elasticity, such as scaling data nodes and virtual machines in/out based on performance and usage, without considering the data assets' quality of results. In this talk, we introduce an elastic data asset model for provisioning data enriched with quality of results. Based on this model, we present techniques to generate and operate a data elasticity management process that is used to monitor, evaluate, and enforce the expected quality of results. We develop a runtime system to guarantee the quality of the resulting data assets provisioned on demand. We present several experiments to demonstrate the usefulness of our proposed techniques.

  1. On Developing and Operating of Data Elasticity Management Process
     Tien-Dung Nguyen, Hong-Linh Truong, Georgiana Copil, Duc-Hung Le, Daniel Moldovan, and Schahram Dustdar
     Distributed Systems Group, TU Wien
     truong@dsg.tuwien.ac.at, http://dsg.tuwien.ac.at/staff/truong
     ICSOC 2015, 17 Nov, Goa, India
  2. Outline
     • Motivation
     • Approach
     • Elasticity Model for Data Asset
     • Generating and Operating Data Elasticity Management Process
     • Prototype, Evaluation and Lessons Learned
     • Conclusions and Future Work
  3. Motivation
     • Data-as-a-Service (DaaS) and Analytics-as-a-Service (AaaS) providers
       – data analytic workflows producing data assets for different consumers
     • Data assets can be “sold” based on
       – quality of data (e.g., accuracy)
       – performance of the execution of the data analytic workflow
       – cost of computation and data resources
     • DaaS/AaaS consumers typically want
       – data assets that meet their expectations of quality of data, performance, and cost
  4. Example: Ensuring Quality of GPS Data Asset
     • GPS data of motorbikes in Ho Chi Minh City
     • GPS Data-as-a-Service (DaaS) provider and several DaaS consumers (e.g., a taxi company)
     Expectation:
     • Quality of data: vehicle location accuracy >= 81%, vehicle speed accuracy >= 81%
     • Performance: deliveryTime <= 55s
     • Cost <= €0.09
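The expectation on this slide can be read as a simple predicate over measured values. The following is a minimal sketch of that reading, not the authors' implementation; the class name `Expectation`, its field names, and the function `meets` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Consumer expectation for a data asset (field names are hypothetical)."""
    min_location_accuracy: float  # percent
    min_speed_accuracy: float     # percent
    max_delivery_time_s: float    # seconds
    max_cost_eur: float           # euros


def meets(exp: Expectation, location_acc: float, speed_acc: float,
          delivery_time_s: float, cost_eur: float) -> bool:
    """Return True only if every measured value satisfies the expectation."""
    return (location_acc >= exp.min_location_accuracy
            and speed_acc >= exp.min_speed_accuracy
            and delivery_time_s <= exp.max_delivery_time_s
            and cost_eur <= exp.max_cost_eur)


# Thresholds taken from the slide: accuracy >= 81%, deliveryTime <= 55s, cost <= 0.09 EUR.
TAXI_COMPANY = Expectation(81.0, 81.0, 55.0, 0.09)
```

A delivery with 85% location accuracy, 90% speed accuracy, 40s delivery time, and a cost of €0.05 would satisfy `TAXI_COMPANY`; lowering the location accuracy to 80% would not.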
  5. GPS Data Asset Example
     Timestamp                     DeviceID  Longitude  Latitude    Speed  Local Area  Estimated Speed in local area
     Wed Sep 10 07:45:00 ICT 2014  51B00552  10.660332  106.779396  0      CMT8-VTS    10.25
     Wed Sep 10 07:45:23 ICT 2014  51C29797  10.749635  106.67208   24     CMT8-DBP    30.5
     Wed Sep 10 07:46:24 ICT 2014  51B01907  10.877548  106.64205   0      CMT8-AC     21.1
     Image source: http://vnexpress.net/tin-tuc/thoi-su/hang-nghin-xe-ket-cung-ngay-dau-thu-phi-cau-binh-trieu-1-2858465.html
  6. Motivation – Research Questions
     • How to support a Data Elasticity Management Process for different “business models”?
       – Improve quality of data: monitor/adjust quality of data
       – Guarantee performance: scale in/out the services that monitor/adjust quality of data, to adapt to changes in the number of consumers
       – Cost: minimize computation cost and get some benefits
     • They are not in the current focus
       – quality of services and cost service composition
       – refining/replacing/extending analytic tasks in the data analytics workflow
  7. Approach
     Generating and executing a Data Elasticity Management Process from information in the data analytics workflow and the expected quality of results of the data asset.
  8. Approach
     • Algorithm to generate a data elasticity management process
       – Inputs: Data Analytics Workflows, Quality of Results, Primitive Actions
       – Output: Data Elasticity Management Process
     • Runtime Environment for the Data Elasticity Management Process
  9. Quality of Results
     • Specifies the expectation of quality of data, performance, and cost of a data asset
     • Quality of Results (QoR) (analytics quality)
     Hong Linh Truong, Schahram Dustdar: Principles of Software-Defined Elastic Systems for Big Data Analytics. IC2E 2014: 562-567
  10. Example: Quality of Results
      !at.ac.tuwien.dsg.depic.common.entity.qor.QoRModel
      dataAssetForm: XML
      listOfMetrics:
      - !at.ac.tuwien.dsg.depic.common.entity.qor.QoRMetric
      - !at.ac.tuwien.dsg.depic.common.entity.qor.QoRMetric
        name: vehicleArc
        listOfRanges:
        - !at.ac.tuwien.dsg.depic.common.entity.qor.Range
          rangeID: vehicleArc_co1
          toValue: 80.0
        - !at.ac.tuwien.dsg.depic.common.entity.qor.Range
          fromValue: 81.0
          rangeID: vehicleArc_co2
          toValue: 100.0
        unit: '%'
      listOfQElements:
      - !at.ac.tuwien.dsg.depic.common.entity.qor.QElement
        listOfRanges:
        - speedArc_co1
        - vehicleArc_co2
        - deliveryTime_co2
        price: 0.01
        qElementID: qElement4
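To illustrate how the ranges in this QoR model could be used, here is a minimal sketch that maps a measured metric value to a range ID. It is not taken from the DEPIC prototype: the `Range` class and `find_range` function are hypothetical, and the lower bound of `vehicleArc_co1` (absent on the slide) is assumed to be 0.0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Range:
    """One value range of a QoR metric (mirrors rangeID/fromValue/toValue)."""
    range_id: str
    from_value: float
    to_value: float


# Ranges of the vehicleArc metric as listed on the slide.
VEHICLE_ARC_RANGES = [
    Range("vehicleArc_co1", 0.0, 80.0),    # fromValue is absent on the slide; 0.0 is an assumption
    Range("vehicleArc_co2", 81.0, 100.0),
]


def find_range(ranges: List[Range], value: float) -> Optional[str]:
    """Return the ID of the first range containing value, or None if no range matches."""
    for r in ranges:
        if r.from_value <= value <= r.to_value:
            return r.range_id
    return None
```

With these bounds a measured accuracy of 92.5 falls into `vehicleArc_co2`, while a value such as 80.5 matches no range, because the slide's ranges leave a gap between 80.0 and 81.0.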
  11. Primitive Action for Data Elasticity Management
      • Primitive actions for monitoring and adjustment
        – capture information about actions that adjust quality of data and performance
        – filled in by data experts, profiling tools, benchmarking, etc.
  12. Example: Primitive Action
      - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentAction
        actionID: VAA
        actionName: VAA
        artifact:
          name: VAA
          description: adjust vehicle Accuracy
          location: …./salsa/upload/files/jun/artifact_sh/VAA.sh
          restfulAPI: /VAA/rest/control
          type: sh
        associatedQoRMetric: vehicleArc
        listOfAdjustmentCases:
        - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.AdjustmentCase
          estimatedResult:
            conditionID: vehicleArc_c1
            lowerBound: 81.0
            metricName: vehicleArc
            upperBound: 100.0
          listOfParameters:
          - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
            parameterName: longtitudeIndex
            type: int
            value: 0
          - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
            parameterName: latitudeIndex
            type: int
            value: 1
          - !at.ac.tuwien.dsg.depic.common.entity.primitiveaction.Parameter
            parameterName: speedIndex
            type: int
            value: 2
        listOfPrerequisiteActionIDs: []
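One way to read the `estimatedResult` bounds of an adjustment case is as a promise about the metric value after the action runs. The sketch below picks an action whose estimated result lies inside a target range. It is an illustrative assumption, not the selection logic of the prototype; the classes and `select_action` are hypothetical simplifications of the YAML entities.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdjustmentCase:
    """Estimated outcome of running an adjustment action (simplified)."""
    metric_name: str
    lower_bound: float
    upper_bound: float


@dataclass
class AdjustmentAction:
    """Primitive adjustment action with its estimated results (simplified)."""
    action_id: str
    associated_metric: str
    cases: List[AdjustmentCase]


def select_action(actions: List[AdjustmentAction], metric: str,
                  target_low: float, target_high: float) -> Optional[AdjustmentAction]:
    """Return the first action for `metric` whose estimated result fits the target range."""
    for action in actions:
        if action.associated_metric != metric:
            continue
        for case in action.cases:
            if target_low <= case.lower_bound and case.upper_bound <= target_high:
                return action
    return None


# The VAA action from the slide: estimated vehicleArc result in [81.0, 100.0].
VAA = AdjustmentAction("VAA", "vehicleArc",
                       [AdjustmentCase("vehicleArc", 81.0, 100.0)])
```

For a target of 81 to 100 on `vehicleArc`, `select_action([VAA], "vehicleArc", 81.0, 100.0)` returns the `VAA` action; for a metric with no registered action it returns `None`.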
  13. Generating Data Elasticity Management Process
      • Data Asset: associated with an expected state and a current state
      • Adjustment Process: a set of primitive actions that change the state of a data asset
      • Resource Control Plan: computational resources for executing adjustment processes
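The slides do not spell out the generation algorithm. As one small piece of what such an algorithm needs, the sketch below orders primitive actions so that each runs after the actions named in its `listOfPrerequisiteActionIDs`. This uses a plain topological sort from the Python standard library and is an assumption about how prerequisites could be honored; the action ID `SAA` and the function name are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List


def order_adjustment_process(prerequisites: Dict[str, List[str]]) -> List[str]:
    """Order action IDs so that every action follows its prerequisite actions.

    `prerequisites` maps an action ID to the IDs that must run before it.
    Raises graphlib.CycleError if the prerequisites are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# VAA appears on the slides with no prerequisites; SAA is a hypothetical
# second action that is assumed to require VAA first.
EXAMPLE_PREREQUISITES = {"VAA": [], "SAA": ["VAA"]}
EXAMPLE_ORDER = order_adjustment_process(EXAMPLE_PREREQUISITES)
```

In this example `VAA` is placed before `SAA`. A circular prerequisite chain is reported as an error rather than silently producing an incomplete process.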
  14. Using Data Elasticity Management Process to ensure QoR
  15. Using Data Elasticity Management Process
      • Our current focus:
        – Enrichment of data assets at the end of the data analytics process
      • A general principle:
        – store data in a data buffer
        – perform actions on the data in the buffer before delivering the data to the customer
      • Data buffers can have different plugins interfacing with different types of databases
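The general principle above (store, act on the buffered data, then deliver) can be sketched with an in-memory buffer. This is an illustration only: the `DataBuffer` class, its method names, and the sample action are hypothetical, and the prototype's buffers are described as using database plugins rather than a Python list.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, float]
Action = Callable[[List[Record]], List[Record]]


class DataBuffer:
    """In-memory stand-in for a data buffer that holds data before delivery."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def store(self, records: Iterable[Record]) -> None:
        """Store incoming data in the buffer."""
        self._records.extend(records)

    def apply(self, action: Action) -> None:
        """Run a monitoring/adjustment action on the buffered data."""
        self._records = action(self._records)

    def deliver(self) -> List[Record]:
        """Hand the buffered data to the consumer and empty the buffer."""
        delivered, self._records = self._records, []
        return delivered


def drop_negative_speed(records: List[Record]) -> List[Record]:
    """Hypothetical adjustment action: discard records with an impossible speed."""
    return [r for r in records if r["speed"] >= 0]
```

A typical sequence would be `buffer.store(...)`, one or more `buffer.apply(...)` calls, and finally `buffer.deliver()`, so the consumer only ever sees data that has passed through the adjustment actions.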
  16. Prototype Implementation
      Prototype source: https://github.com/tuwiendsg/EPICS/tree/master/depic
  17. Experiments: Setup
      • Scenario
        – Near-real-time GPS data of vehicles in Ho Chi Minh City; data size 1.17 GB
        – Data source emulated by sending historical GPS data to scalable message-oriented middleware (MOM, Apache ActiveMQ)
        – 5 concurrent DaaS consumers
      • Infrastructure
        – 1 VM (7 GB RAM, 4 vCPUs, 40 GB disk) for Tooling, Orchestrator, Data Asset Loader, and Data Analytics Workflow Management
        – 4 VMs (1 GB RAM, 1 vCPU, 40 GB disk) for monitoring/adjustment services at the beginning
      • Unit costs
        – speedAcr and vehicleAcr: EUR 0.0002
        – Machines in the data analytics phase: EUR 0.104
        – Machines in the enrichment phase: EUR 0.026
  18. Elasticity Process Management Generation
  19. Elasticity Process Management Generation
      • We rely on domain experts to provide knowledge about primitive actions for data assets (and services for monitoring and enrichment)
        – Missing information leads to incomplete generation
        – IT experts may modify the process before its deployment
      • Current generation algorithms do not deal with complex dependencies among QoR and other metadata about data analytics functions
        – New classes of algorithms are needed
  20. Data quality, processing cost, and data asset costs
      [Figure 6: Data Analytics Workflow for provisioning GPS data]
      Provider:
      • Vehicle location accuracy (%): [0 20], [21 40], [41 60], [61 80], [81 100]
      • Vehicle speed accuracy (%): [0 20], [21 40], [41 60], [61 80], [81 100]
      • Delivery time (s): [0 54], [55 120]
      • Assumed cost function:
      Consumers’ expectations of the data asset:
      • Case 1:
        – Vehicle location accuracy > 81%
        – Vehicle speed accuracy > 81%
        – Delivery time < 55s
      • Case 2:
        – Vehicle location accuracy > 61%
        – Vehicle speed accuracy > 61%
        – Delivery time < 55s
  21. Data quality, processing cost, and data asset costs
      • f5 (w_processingTime = 0.5, w_vehicleAcr = 0.25, w_speedAcr = 0.25)
      • Consumers shared some common cost (the data analytics part)
      • We cannot determine the best cost model, but we report and act according to generated/pre-defined plans
      • The runtime of data analytics is crucial:
        – huge problems in integration and performance/cost consequences
      • Resource planning for data quality enrichment is a very hard problem
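The exact form of the cost function f5 is not preserved in this transcript; only its weights are. Purely as an illustration, the sketch below assumes a weighted sum over per-component costs. The weighted-sum form, the function name, and the choice of inputs are all assumptions; the input values simply reuse the unit costs from the experiment setup.

```python
from typing import Dict

# Weights of f5 as given on the slide.
WEIGHTS_F5 = {"processingTime": 0.5, "vehicleAcr": 0.25, "speedAcr": 0.25}


def weighted_cost(component_costs: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of component costs (assumed form, not the authors' f5)."""
    missing = set(weights) - set(component_costs)
    if missing:
        raise KeyError(f"missing component costs: {sorted(missing)}")
    return sum(weights[name] * component_costs[name] for name in weights)


# Illustrative inputs only: unit costs from the experiment setup, in EUR.
EXAMPLE_COST = weighted_cost(
    {"processingTime": 0.104, "vehicleAcr": 0.0002, "speedAcr": 0.0002},
    WEIGHTS_F5,
)
```

Under this assumed form the processing-time component dominates the result, which is consistent with the slide's remark that the runtime of data analytics is crucial.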
  22. Conclusions and Future Work
      • Conclusions
        – Supporting DaaS/AaaS to develop and enforce appropriate cost elasticity models for data assets
        – Elasticity of quality, cost, and resource usage, focusing on data aspects
        – But current techniques are still at an early stage
      • Future work
        – Domain knowledge integration
        – Tool for the development and integration of primitive actions
        – Realistic data asset cost models in DaaS
        – Data asset cost studies
        – System performance and data asset quality enrichment/evaluation
        – Different performance and cost models and dependencies
        – Algorithms: new classes of management processes
      Thanks! Questions?