TUW-ASE-Summer 2014: Evaluating and Utilizing Data Concerns for DaaS
1. Evaluating and Utilizing Data Concerns for
DaaS
Hong-Linh Truong
Distributed Systems Group,
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong
ASE Summer 2014
Advanced Services Engineering,
Summer 2014, Lecture 5
2. Outline
Data concern-aware DaaS service engineering
Data concern evaluation
Data concern publishing
A Proof-of-concept: QoD Framework
Issues in utilizing data concerns
3. Recall -- DaaS Concerns
[Figure: data and data assets are exposed through a DaaS (APIs, querying, data management, etc.). Data concerns shown: quality of data, ownership, price, license, ...]
DaaS concerns include QoS, quality of data (QoD),
service licensing, data licensing, data governance, etc.
4. Recall -- DaaS design &
implementation
[Figure: DaaS design and implementation. Elements shown: data items, data resources, data assets, DaaS, consumers.]
5. HOW TO EVALUATE DATA
CONCERNS FOR DATA
ASSETS IN DAAS?
6. Patterns for „turning data to DaaS“
[Figure: patterns for turning data into DaaS. Panels shown:
data, Storage/Database-as-a-Service, DaaS;
data, Storage/Database/Middleware, DaaS;
Things, data, Storage/Database/Middleware, DaaS;
People, data, Storage/Database/Middleware, DaaS;
data, Build Data Service APIs, Deploy Data Service, DaaS.]
7. Data-related activities
Typical activities for data wrapping and publishing
Wrapping data
Publishing DaaS interface
Typical activities for data updating & retrieval
Updating data
Selecting data
Provisioning data
8. Wrapping data
(Relational) database
(Storage of) files
Streams of events (including attached
information)
Service interfaces are different
Update mechanisms are different
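The point that service interfaces and update mechanisms differ per source can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the class names `DataResource`, `TableResource`, and `EventStreamResource`, the `growth` table, and the sample values are all hypothetical.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataResource(ABC):
    """Hypothetical uniform view over differently wrapped data sources."""

    @abstractmethod
    def select(self, key):
        """Return the items matching key as a list of dicts."""


class TableResource(DataResource):
    """Wraps a relational table; selection is pushed down as SQL."""

    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def select(self, key):
        # The table name comes from trusted configuration; the key is bound.
        cur = self.conn.execute(
            f"SELECT country, value FROM {self.table} WHERE country = ?", (key,)
        )
        return [{"country": c, "value": v} for c, v in cur.fetchall()]


class EventStreamResource(DataResource):
    """Wraps a stream of events; only the most recent events are kept."""

    def __init__(self, window=100):
        self.window, self.events = window, []

    def push(self, event):
        # Update mechanism differs from a table: append, then trim the window.
        self.events.append(event)
        self.events = self.events[-self.window:]

    def select(self, key):
        return [e for e in self.events if e["country"] == key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE growth (country TEXT, value REAL)")
conn.execute("INSERT INTO growth VALUES ('Austria', 0.4)")
table = TableResource(conn, "growth")

stream = EventStreamResource(window=2)
stream.push({"country": "Austria", "value": 0.5})

for resource in (table, stream):
    print(type(resource).__name__, resource.select("Austria"))
```

Both wrappers answer the same `select` call, while updates go through SQL in one case and `push` in the other.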
9. Typical data concern evaluation
[Figure: activities shown: describing data concerns, evaluating data concerns, populating data concerns, publishing services. Supporting elements shown: Data Concerns Representation Models, Data Concerns Evaluation Tools.]
What do we need in order to perform these activities?
10. Data concern-aware DaaS
engineering process
[Figure: engineering process covering typical activities for data wrapping and publishing, and typical activities for data updating & retrieval.]
Hong Linh Truong, Schahram Dustdar: On Evaluating and Publishing
Data Concerns for Data as a Service. APSCC 2010: 363-370
11. Wrapping, selecting, and updating
data in DaaS (1)
[Figure: DaaS service operation and data consumer. Steps shown: processing parameters; mapping parameters to data queries; querying content of data resources; mapping parameters to metadata queries; querying metadata of data resources; mapping and returning results.]
Different strategies for structured data and unstructured data
12. Wrapping, selecting, and updating
data in DaaS (2)
Different techniques exist for wrapping,
selecting, updating and retrieving data
How can generic data concern evaluation and
publishing techniques be integrated with
these techniques?
13. WHICH TYPES OF DATA ARE NEEDED FOR
EVALUATING DATA CONCERNS?
WHAT IS THE IMPACT OF DATA
PROVISIONING MODELS (OFFLINE
VERSUS NEAR-REALTIME) ON CONCERN
EVALUATION/PUBLISHING?
Discussion
14. Evaluating data concerns – the
three important points
• Evaluation scope: at which level is the evaluation performed?
• Evaluation mode: when is the evaluation done?
• Integration model: how is the evaluation tool invoked?
Hong Linh Truong, Schahram Dustdar: On Evaluating and Publishing Data Concerns for Data as a Service. APSCC
2010: 363-370
15. Evaluating data concerns –
evaluation scopes
Three scopes
data resource
DaaS operations
DaaS as a whole
Why do multiple evaluation scopes make sense?
They enable fine-grained evaluation
16. Evaluating data concerns –
evaluation modes
Off-line
before the access to data
On-the-fly
when the data is requested
Why do multiple evaluation modes make sense?
They are suitable for different types of data
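The two evaluation modes can be contrasted in a short sketch. It assumes a simple completeness metric (share of records with a value); the `Resource` class and the sample records are hypothetical and only illustrate when the evaluation runs.

```python
def completeness(records):
    """Share of records whose 'value' field is present (a sample QoD metric)."""
    if not records:
        return 0.0
    return sum(r.get("value") is not None for r in records) / len(records)


class Resource:
    def __init__(self, records):
        self.records = records
        self.offline_qod = None  # filled by evaluate_offline()

    def evaluate_offline(self):
        # Off-line mode: evaluate once, before any consumer accesses the data.
        self.offline_qod = {"completeness": completeness(self.records)}

    def query(self, predicate):
        # On-the-fly mode: evaluate only the records selected by this request.
        selected = [r for r in self.records if predicate(r)]
        return selected, {"completeness": completeness(selected)}


records = [
    {"year": 1990, "value": 1.2},
    {"year": 1991, "value": None},
    {"year": 2000, "value": 0.9},
    {"year": 2001, "value": 1.0},
]
resource = Resource(records)
resource.evaluate_offline()
selected, on_the_fly_qod = resource.query(lambda r: r["year"] >= 2000)
print(resource.offline_qod, on_the_fly_qod)
```

The off-line figure describes the whole resource, while the on-the-fly figure describes exactly what the consumer received, so the two can differ for the same resource.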
17. Evaluating data concerns –
integration modes
Push and pull data concerns
Pass-by-value versus pass-by-reference to data
concerns evaluation tools
Why do multiple integration modes make sense?
They are suitable for different tool integration strategies
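A rough sketch of the integration modes follows. It assumes that "pull" means the DaaS invokes the evaluation tool and "push" means the tool delivers results to the DaaS; the slides do not spell this out. `STORE`, `EvaluationTool`, `DaaS`, and the URI are hypothetical names.

```python
URI = "urn:example:growth"  # hypothetical reference to a data resource
STORE = {URI: [{"value": 1.2}, {"value": None}]}  # stands in for the data store


def completeness(records):
    return sum(r["value"] is not None for r in records) / len(records)


class EvaluationTool:
    def evaluate_values(self, records):
        # Pass-by-value: the caller ships the data itself to the tool.
        return {"completeness": completeness(records)}

    def evaluate_reference(self, uri):
        # Pass-by-reference: the caller ships a URI; the tool fetches the data.
        return {"completeness": completeness(STORE[uri])}

    def push_to(self, daas, uri, records):
        # Push, pass-by-value: the tool evaluates and delivers the result.
        daas.concerns[uri] = self.evaluate_values(records)


class DaaS:
    def __init__(self, tool):
        self.tool, self.concerns = tool, {}

    def pull_by_value(self, uri):
        # Pull: the DaaS invokes the tool when it needs the concerns.
        return self.tool.evaluate_values(STORE[uri])

    def pull_by_reference(self, uri):
        return self.tool.evaluate_reference(uri)


tool = EvaluationTool()
daas = DaaS(tool)
print(daas.pull_by_value(URI), daas.pull_by_reference(URI))
tool.push_to(daas, URI, STORE[URI])
print(daas.concerns)
```

Pass-by-reference keeps large data out of the call but requires the tool to have access to the store; pass-by-value has the opposite trade-off.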
18. Evaluating data concerns – some
patterns (1)
[Figure: pull, pass-by-reference]
19. Evaluating data concerns – some
patterns (2)
[Figure: pull, pass-by-value]
20. Evaluating data concerns – some
patterns (3)
[Figure: push, pass-by-value (1)]
21. Evaluating data concerns – some
patterns (4)
[Figure: push, pass-by-value (2)]
22. Evaluation Tool – Internal Software
components
Self-developed or third-party software
components for the evaluation tool
Advantages
Tightly coupled integration: performance, security,
data compliance
Customization
Disadvantages
Usually cannot be integrated with other features
(e.g., data enrichment)
Costly (e.g., what if we do not need them?)
23. Evaluation tool – using cloud
services
Evaluation features are provided by cloud
services
Several implementations
e.g., Informatica Cloud Data Quality Web Services, StrikeIron
Advantages
Pay-per-use, combined features
Disadvantages
Features are limited (with certain types of data)
Performance issues with large-scale data
Data compliance and security assurance
24. Evaluation Tool -- using human
computation capabilities
Professionals and Crowds can act as data
concerns evaluators
For complex quality assessment that cannot be done by
software
Issues
Subjective evaluation
Performance
Limited types of data (e.g., images, documents, etc.)
Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, Hong Linh Truong: A Novel
Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. eScience 2011: 105-112
Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer, Jens Lehmann: Crowdsourcing Linked
Data Quality Assessment. International Semantic Web Conference (2) 2013: 260-276
Óscar Figuerola Salas, Velibor Adzic, Akash Shah, and Hari Kalva. 2013. Assessing internet video quality using
crowdsourcing. In Proceedings of the 2nd ACM international workshop on Crowdsourcing for multimedia (CrowdMM '13).
ACM, New York, NY, USA, 23-28. DOI=10.1145/2506364.2506366 http://doi.acm.org/10.1145/2506364.2506366
25. BASED ON WHICH CRITERIA IS AN EVALUATION
SCOPE, EVALUATION MODE, OR INTEGRATION
MODE SELECTED?
WHICH OTHER COMPONENTS INTERACT
WITH EVALUATION TOOLS?
WHY DO WE NOT REALLY DISCUSS THE
IMPLEMENTATION OF EVALUATION TOOLS?
Discussion time
26. Publishing data concern
information (1)
Off-line publishing of data concerns
suitable for static data concerns
the publishing of data concerns of a data
resource is separated from the service
operation which provides the access to the
data resource
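The separation described above can be sketched as two independent operations: one that returns data and one that returns a previously published concern document. This is only an illustration; `PUBLISHED` stands in for an external registry, and the concern names and values are made up.

```python
import json

DATA = {"growth": [{"country": "Austria", "value": 0.4}]}
PUBLISHED = {}  # stands in for an external registry or information system


def publish_offline(resource_id, concerns):
    # Serialize once; the document lives apart from the data-access operation.
    PUBLISHED[resource_id] = json.dumps(
        {"resource": resource_id, "concerns": concerns}, sort_keys=True
    )


def get_data(resource_id):
    # The access operation returns data only and performs no evaluation.
    return DATA[resource_id]


def get_concerns(resource_id):
    return json.loads(PUBLISHED[resource_id])


# Hypothetical static concerns, evaluated before any consumer request.
publish_offline("growth", {"license": "example-license", "datasetcompleteness": 0.74})
print(get_concerns("growth"))
print(get_data("growth"))
```

Because nothing is evaluated at request time, this only stays accurate while the concerns are static, which matches the slide's restriction.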
27. Publishing data concern
information (2)
On-the-fly publishing of data concerns
associating concerns with retrieved data
resources
the resulting data resources (e.g., via queries)
are annotated with data concerns evaluated
by data concerns evaluation tools.
suitable for providing dynamic data concerns
28. Publishing data concern
information (3)
On-the-fly publishing of data concerns through
queries
the use of different service operation
parameters to query data concerns of data
resources
suitable for validating data concerns before
accessing data resources
29. WHAT ARE THE RELATIONSHIPS BETWEEN
CONCERN EVALUATION AND PUBLISHING
WHEN DATA IS DYNAMICALLY UPDATED?
Discussion time
30. How do we utilize the data concern-
aware service engineering process?
Using this model we can determine and publish
several concerns
Our proof of concept
A framework for evaluating and publishing QoD of
DaaS
A proof-of-concept implementation of data concern-
aware service engineering process
Another example: model and publish privacy
concerns for DaaS [ECOWS 2010]
Michael Mrissa, Salah-Eddine Tbahriti, Hong-Linh Truong, "Privacy model and annotation for DaaS", The 8th European
Conference on Web Services (ECOWS 2010), (c)IEEE Computer Society, 1-3 December, 2010, Ayia Napa, Cyprus
31. QoD framework (1)
Pull QoD Evaluation Models for DaaS
Pass-by-references and pass-by-value
References of data resources: URI
Values: any object
Third-party data evaluation tools
32. QoD framework (2)
http://www.infosys.tuwien.ac.at/prototype/SOD1/dataconcerns/
33. QoD framework: publishing
concerns (1)
Off-line data concern
publishing
a common data concern
publication specification
a tool for providing data concerns
according to the specification
supported by external service
information systems
34. QoD framework: publishing
concerns (2)
On-the-fly querying of data concerns associated with data
resources
Using REST parameter convention
Based on metric names in the data concern
specification
Hong Linh Truong, Schahram Dustdar, Andrea Maurino, Marco Comerio: Context, Quality and Relevance:
Dependencies and Impacts on RESTful Web Services Design. ICWE Workshops 2010: 347-359
35. QoD framework: publishing
concerns (3)
Specifying requests by using query parameters in
the form of metricName=value
GET /resource?crq.accuracy="0.5"&crq.location="Europe"
Obtaining context and quality by using context and quality
parameters without specifying value conditions
curl "http://localhost:8080/UNDataService/data/query/Population%20annual%20growth%20rate%20(percent)?crq.qod"
{"crq.qod" : {
"crq.dataelementcompleteness": 0.8654708520179372,
"crq.datasetcompleteness": 0.7356502242152466,
...
}}
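On the service side, the two request styles above can be told apart by whether a `crq.` parameter carries a value. The sketch below uses only the Python standard library; the function name `parse_crq` is hypothetical, and how a condition such as `accuracy="0.5"` is compared against a metric is left open because the slides do not define it.

```python
from urllib.parse import parse_qsl, urlsplit

PREFIX = "crq."


def parse_crq(url):
    """Split crq.* parameters into value conditions and bare metric requests."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    conditions, requested = {}, []
    for name, value in pairs:
        if not name.startswith(PREFIX):
            continue  # ordinary query parameter, not a context/quality one
        metric = name[len(PREFIX):]
        if value == "":
            requested.append(metric)  # e.g. ?crq.qod
        else:
            conditions[metric] = value.strip('"')  # e.g. ?crq.accuracy="0.5"
    return conditions, requested


print(parse_crq('http://localhost:8080/resource?crq.accuracy="0.5"&crq.location="Europe"'))
print(parse_crq("http://localhost:8080/resource?crq.qod"))
```

Condition values are kept as strings here; a real service would need per-metric typing and comparison rules.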
36. QoD framework: QoD monitoring
and composition
QoD concerns monitoring and composition are
useful for the evaluation of aggregated data
resources
Our approach
Utilizing monitoring rules
QoD metrics of data resources are passed to a rule
engine
Rules are user-defined for monitoring and composing
QoD metrics
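The rule-based approach can be illustrated without a rule engine. The sketch below uses plain Python functions as a stand-in for user-defined rules; it is not Drools syntax, and the composition rule (take the minimum across resources) and the threshold are assumptions chosen for the example.

```python
def compose_min(metric):
    """Composition rule: an aggregate is only as good as its worst part."""
    return lambda resources: min(r[metric] for r in resources)


def monitor_threshold(metric, minimum):
    """Monitoring rule: report resources whose metric falls below minimum."""
    def rule(resources):
        return [r["id"] for r in resources if r[metric] < minimum]
    return rule


def run_rules(resources, composition_rules, monitoring_rules):
    # Every rule receives the QoD metrics of all data resources in the aggregate.
    composed = {name: rule(resources) for name, rule in composition_rules.items()}
    alerts = {name: rule(resources) for name, rule in monitoring_rules.items()}
    return composed, alerts


resources = [
    {"id": "a", "datasetcompleteness": 0.9},
    {"id": "b", "datasetcompleteness": 0.7},
]
composed, alerts = run_rules(
    resources,
    {"datasetcompleteness": compose_min("datasetcompleteness")},
    {"low_completeness": monitor_threshold("datasetcompleteness", 0.8)},
)
print(composed, alerts)
```

Keeping rules as data (here, dictionaries of callables) is what lets users swap in a different composition, such as an average, without touching the service.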
37. QoD framework experiments
Implementation
Java, JAX-RS/Jersey, Drools
Utilizing UNDataAPI - www.undata-api.org
XML data sets without QoD
Illustrating examples: check data from 1990-2009
datasetcompleteness: the completeness of the list of
countries
dataelementcompleteness: the completeness of data
elements in the list metrics
RESTful services wrapping UNDataAPI
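The two metrics named above could be computed along the following lines. The exact formulas used in the experiments are not given on the slide, so this is an assumed reading: datasetcompleteness as the share of expected countries present, and dataelementcompleteness as the share of (country, year) cells that carry a value. The sample data is invented.

```python
def dataset_completeness(data, expected_countries):
    """Share of expected countries that appear in the data set at all."""
    if not expected_countries:
        return 0.0
    present = set(data) & set(expected_countries)
    return len(present) / len(expected_countries)


def data_element_completeness(data, years):
    """Share of (country, year) cells that carry a value."""
    cells = [data[country].get(year) for country in data for year in years]
    if not cells:
        return 0.0
    return sum(v is not None for v in cells) / len(cells)


years = range(1990, 1994)
expected = ["Austria", "Vietnam", "Kenya", "Peru"]
data = {
    "Austria": {1990: 0.8, 1991: 0.9, 1992: 1.0, 1993: 0.7},
    "Vietnam": {1990: 2.1, 1992: 1.9},
}
print(dataset_completeness(data, expected))
print(data_element_completeness(data, years))
```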
43. Elasticity
If data is not fit for a purpose because its data
concerns do not meet the consumer's requirements:
The DaaS may enrich the data
The consumer may switch to another DaaS
The consumer may combine data from different
DaaSs
The consumer may combine data from a DaaS with
its own data
Elasticity of data and data concerns
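One of the options above, combining DaaS data with the consumer's own data, can be sketched to show how the combination changes a concern value. The series, the years, and the 0.7 requirement are invented for illustration.

```python
def completeness(series, years):
    """Share of the requested years for which the series has a value."""
    return sum(series.get(y) is not None for y in years) / len(years)


def combine(primary, secondary):
    """Fill gaps in primary with values from secondary; primary wins on conflict."""
    merged = dict(secondary)
    merged.update({y: v for y, v in primary.items() if v is not None})
    return merged


years = [1990, 1991, 1992, 1993]
from_daas = {1990: 2.1, 1992: 1.9}
own_data = {1991: 2.0, 1992: 1.8}
required = 0.7  # hypothetical consumer requirement

combined = combine(from_daas, own_data)
print(completeness(from_daas, years) >= required)
print(completeness(combined, years) >= required)
```

The DaaS data alone misses the requirement, while the combined series meets it, which is the sense in which data concerns are elastic.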
44. Data fit for your purpose
Data concern measurement
Concerns are determined from the data
Whether they fit your application depends on the
application context
Data concern interpretation
Context-specific interpretation
The same type of data with the same set of concern
measurements might not fit the same application at
different times/contexts
Application-specific treatment!
Strongly related to data elasticity
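The difference between measurement and interpretation can be made concrete: one fixed set of measurements is checked against the requirements of two contexts. The metric `age_days`, the context names, and all bounds are hypothetical.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}


def fits(measured, requirements):
    """Interpret one set of measurements against one context's requirements."""
    return all(
        OPS[op](measured[metric], bound)
        for metric, (op, bound) in requirements.items()
    )


# Measured once from the data; independent of any application.
MEASURED = {"dataelementcompleteness": 0.87, "age_days": 400}

# Each application context interprets the same measurements differently.
CONTEXTS = {
    "trend_report": {"dataelementcompleteness": (">=", 0.8), "age_days": ("<=", 730)},
    "live_dashboard": {"dataelementcompleteness": (">=", 0.8), "age_days": ("<=", 30)},
}

for name, requirements in CONTEXTS.items():
    print(name, fits(MEASURED, requirements))
```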
45. Exercises
Read the mentioned papers
Identify and analyze the relationships between
data concerns evaluation tools and types of data
Analyze trade-offs between on-line and off-line
evaluation and when we can combine them
Analyze how to utilize evaluated data concerns
for optimizing data compositions
Analyze situations when software cannot be
used to evaluate data concerns
46. Thanks for
your attention
Hong-Linh Truong
Distributed Systems Group
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong