This is a lecture from the advanced service engineering course from the Vienna University of Technology. See http://dsg.tuwien.ac.at/teaching/courses/ase
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements and it becomes difficult to comprehend the data, unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
Many powerful Machine Learning algorithms are based on graphs, e.g., PageRank (as implemented in Pregel), recommendation engines (collaborative filtering), text summarization, and other NLP tasks. The recent developments in Graph Neural Networks connect the worlds of graphs and Machine Learning even further.
Data pre-processing and feature engineering, both vital tasks in Machine Learning pipelines, extend this relationship across the entire ecosystem. In this session, we will investigate the full range of graphs and Machine Learning with many practical exercises.
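As a minimal illustration of one such graph-based algorithm, here is a sketch of PageRank via power iteration over a plain adjacency list. This is a generic toy example: the function name and the small graph are ours, and it is not tied to Pregel or any framework mentioned above.

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank over an adjacency list {node: [out-neighbors]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # every node receives the teleportation share (1 - d) / n
        new = {u: (1 - d) / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = rank[u] / len(outs)
                for v in outs:
                    new[v] += d * share
            else:
                # dangling node: spread its rank evenly over all nodes
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pr = pagerank(g)
```

Because the teleportation and dangling-node terms redistribute exactly what the damping factor removes, the ranks stay a probability distribution at every iteration.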
Entity matching and entity resolution are becoming increasingly important disciplines in data management, driven by the growing number of data sources that must be addressed in an economy undergoing digital transformation, growing data volumes, and increasing data privacy requirements. The data matching process is also called record linkage, entity matching, or entity resolution in published works. For a long time, research on the process focused on matching entities from the same dataset (i.e., deduplication) or from two datasets. Different algorithms for matching different types of attributes have been described in the literature, developed, and implemented in data matching and data cleansing platforms. Entity resolution is one element of a larger entity integration process that includes data acquisition, data profiling, data cleansing, schema alignment, data matching, and data merge (fusion).
Consider the motivating example of a global pharmaceutical company with offices in more than 60 countries worldwide that migrated customer data from various legacy systems in different countries to a new common CRM system in the cloud. The migration was phased by regions and countries, with new sources and data incrementally added and merged with data already migrated in previous phases. Entity integration in such a case requires deep understanding of the data architectures, the data content, and each step of the process. Even with such understanding, designing and implementing the solution requires many development iterations that consume human, time, and financial resources. Reducing the number of iterations by automating and optimizing steps in the process can save a vast amount of resources. There is ample literature addressing individual steps of the process and proposing options for improving results or optimizing processing, but the process as a whole still requires substantial human work, subject-matter knowledge, and many iterations to produce results with a high F-measure (both high precision and high recall). Most of the algorithms used in the various steps are human-in-the-loop (HITL) algorithms that require human interaction; a human is always part of the process and consequently influences the outcome.
This paper is part of work in progress aimed at defining a conceptual framework that automates and optimizes some steps of the entity integration process and reduces the need for human involvement. The focus here is on the conceptual process definition, a recommended data architecture, and the use of existing open-source solutions for automating and optimizing the entity integration process.
Data imputation is used to estimate missing data values, as missing data negatively affect the validity of computed models. This study develops a genetic algorithm (GA) to optimize the imputation of missing cost data for fans used in road tunnels operated by the Swedish Transport Administration (Trafikverket). The GA imputes the missing cost data using an optimized valid data period. The results show highly correlated data (R-squared 0.99) after imputation. The GA thus provides a wide search space to optimize imputation and produce complete data, which can be used for forecasting and life cycle cost analysis.
Ritesh Kumar Pandey | Dr Asha Ambhaikar, "Data Imputation by Soft Computing", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 2, Issue 4, June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
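The idea of letting a GA search for a "valid data period" and imputing from it might be sketched roughly as follows. This is a hypothetical simplification for illustration, not the study's actual method: every function and parameter name is ours, and the fitness function (how well the window's mean predicts the observed values) is an assumption.

```python
import random

def ga_impute(series, generations=30, pop_size=20, seed=1):
    """Toy GA: evolve a (start, length) window of the series whose mean
    best matches the observed values, then fill the missing entries with
    that window's mean. A simplified illustration only."""
    random.seed(seed)
    obs = [(i, v) for i, v in enumerate(series) if v is not None]
    n = len(series)

    def fitness(start, length):
        window = [v for i, v in obs if start <= i < start + length]
        if not window:
            return float("inf")  # window holds no observed values
        m = sum(window) / len(window)
        return sum((v - m) ** 2 for _, v in obs)

    # population of (start, length) chromosomes
    pop = [(random.randrange(n), random.randrange(1, n + 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(*c))
        survivors = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            child = (a[0], b[1])               # crossover: mix parents
            if random.random() < 0.3:          # mutation: new start point
                child = (random.randrange(n), child[1])
            children.append(child)
        pop = survivors + children

    start, length = min(pop, key=lambda c: fitness(*c))
    window = [v for i, v in obs if start <= i < start + length]
    if not window:
        window = [v for _, v in obs]  # fallback: all observed values
    fill = sum(window) / len(window)
    return [v if v is not None else fill for v in series]
```

For example, `ga_impute([2.0, None, 2.0, 2.0])` fills the gap with the window mean of the observed values.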
From Artwork to Cyber Attacks: Lessons Learned in Building Knowledge Graphs u... – Craig Knoblock
Over the last few years we have been building domain-specific knowledge graphs for a variety of real-world problems, including creating virtual museums, combating human trafficking, identifying illegal arms sales, and predicting cyber attacks. We have developed a variety of techniques to construct such knowledge graphs, including techniques for extracting data from online sources, aligning the data to a domain ontology, and linking the data across sources. In this talk I will present these techniques and describe our experience in applying Semantic Web technologies to build knowledge graphs for real-world problems.
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud – Databricks
Efficient recommender systems are critical for the success of many industries, such as job recommendation, news recommendation, e-commerce, etc. This talk will illustrate how to build an efficient document recommender system by leveraging Natural Language Processing (NLP) and Deep Neural Networks (DNNs). The end-to-end flow of the document recommender system is built on AWS at scale, using Analytics Zoo for Spark and BigDL. The system first turns text-rich documents into embeddings by incorporating Global Vectors (GloVe), then trains a K-means model using native Spark APIs to cluster users into several groups. The system further trains a recommender model for each group and gives an ensemble prediction for each test record. By adopting the end-to-end Analytics Zoo pipeline, we saw about a 10% improvement in mean reciprocal rank and 6% in precision, respectively, compared to the search recommendations in a job recommendation study.
Speaker: Guoqiong Song
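The pipeline described above (document embeddings, user clustering, then per-group models) can be sketched at toy scale as follows. This is a hedged illustration only: the tiny vectors stand in for pretrained GloVe embeddings, the naive k-means below replaces the Spark/Analytics Zoo components, and all names are ours.

```python
import numpy as np

# toy "GloVe" lookup: word -> vector (in the real system these come
# from pretrained Global Vectors)
glove = {"cloud": np.array([1.0, 0.0]), "spark": np.array([0.9, 0.1]),
         "music": np.array([0.0, 1.0]), "guitar": np.array([0.1, 0.9])}

def embed(doc):
    """Average word vectors to get a document embedding."""
    vecs = [glove[w] for w in doc.split() if w in glove]
    return np.mean(vecs, axis=0)

def kmeans(X, k, iters=10, seed=0):
    """Minimal k-means: assign to nearest centroid, recompute means."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                c[j] = X[labels == j].mean(0)
    return labels

docs = ["cloud spark", "spark cloud cloud", "music guitar", "guitar music music"]
X = np.stack([embed(d) for d in docs])
groups = kmeans(X, 2)  # a recommender model would then be trained per group
```

In the described system this grouping step happens at scale with Spark's native K-means; the per-group recommender models and the ensemble prediction are then layered on top.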
Context-aware Programming for Hybrid and Diversity-aware Collective Adaptive ... – Hong-Linh Truong
Collective adaptive systems (CASs) have been researched intensively for many years. However, recent developments and advanced models in service-oriented computing, cloud computing and human computation have fostered several new forms of CASs. Among them, Hybrid and Diversity-aware CASs (HDA-CASs) characterize new types of CASs in which a collective is composed of hybrid machines and humans that collaborate with different complementary roles. This emerging HDA-CAS poses several research challenges in terms of programming, management and provisioning. In this paper, we investigate the main issues in programming HDA-CASs. First, we analyze the context characterizing HDA-CASs. Second, we propose to use the concept of hybrid compute units to implement HDA-CASs that can be elastic. We call this type of HDA-CAS h2CAS (Hybrid Compute Unit-based HDA-CAS). We then discuss a meta-view of h2CAS that describes an h2CAS program. We analyze and present program features for h2CAS in four main contexts.
Enabling and controlling elasticity of cloud computing applications is a challenging issue. Elasticity programming directives have been introduced to delegate elasticity control to infrastructures and to separate elasticity control from application logic. Since coordination models provide a general approach to managing interaction, and elasticity control entails interactions among cloud infrastructure components, we present a coordination-based approach to elasticity control, supporting delegation and separation of concerns at design time and run time, paving the way towards coordination-aware elasticity.
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for... – Hong-Linh Truong
This is a lecture from the advanced service engineering course from the Vienna University of Technology. See http://dsg.tuwien.ac.at/teaching/courses/ase/
Interactions spanning multiple organizations have become an important aspect of today's collaboration landscape. Organizations create alliances to fulfill strategic objectives. The dynamic nature of collaborations increasingly demands automated techniques and algorithms to support the creation of such alliances. Our approach is based on recommending potential alliances by discovering currently relevant competence sources and supporting semi-automatic formation. The environment is service-oriented, comprising humans and software services with distinct capabilities. To mediate between previously separated groups and organizations, we introduce the broker concept, which bridges disconnected networks. Here we present a dynamic broker discovery approach based on interaction mining techniques and trust metrics.
SmartSociety – A Platform for Collaborative People-Machine Computation – Hong-Linh Truong
We present the SmartSociety Platform for Collaborative People-Machine computation carried out in the FET SmartSociety project: http://www.smart-society-project.eu/
On Developing and Operating of Data Elasticity Management Process – Hong-Linh Truong
The Data-as-a-Service (DaaS) model enables data analytics providers to provision and offer data assets to their consumers. To achieve quality of results for the data assets, we need to enable DaaS elasticity by trading off quality against the cost of resource usage. However, most current work on DaaS focuses on infrastructure elasticity, such as scaling data nodes and virtual machines in and out based on performance and usage, without considering the data assets' quality of results. In this talk, we introduce an elastic data asset model for provisioning data enriched with quality of results. Based on this model, we present techniques to generate and operate a data elasticity management process that is used to monitor, evaluate and enforce the expected quality of results. We develop a runtime system to guarantee the quality of the resulting data assets provisioned on demand. We present several experiments to demonstrate the usefulness of our proposed techniques.
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit... – Hong-Linh Truong
This is a lecture from the advanced service engineering course from the Vienna University of Technology. See http://dsg.tuwien.ac.at/teaching/courses/ase
Governing Elastic IoT Cloud Systems under Uncertainties – Hong-Linh Truong
We introduce U-GovOps – a novel framework for dynamic, on-demand governance of elastic IoT cloud systems under uncertainty. We introduce a declarative policy language to simplify the development of uncertainty- and elasticity-aware governance strategies. Based on these, we develop runtime mechanisms that enable mitigating uncertainties by monitoring and governing IoT cloud systems through the specified strategies.
TUW-ASE Summer 2015: Data marketplaces: core models and concepts – Hong-Linh Truong
This is a lecture from the advanced service engineering course from the Vienna University of Technology. See http://dsg.tuwien.ac.at/teaching/courses/ase/
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions... – Hong-Linh Truong
Effective resource management in IoT systems must represent IoT resources, edge-to-cloud network capabilities, and cloud resources at a high level, while being able to link to diverse low-level types of IoT devices, network functions, and cloud computing infrastructures. Hence, resource management in such a context demands a highly distributed and extensible approach, which allows us to integrate and provision IoT, network functions, and cloud resources from various providers. In this paper, we address this crucial research issue. We first present a high-level information model for virtualized IoT, network functions and cloud resource modeling, which also incorporates software-defined gateways, network slicing and data centers. This model is used to glue various low-level resource models from different types of infrastructures in a distributed manner, capturing sets of resources spanning different sub-networks. We then develop a set of utilities and a middleware to support the integration of information about distributed resources from various sources. We present a proof-of-concept prototype with various experiments to illustrate how various tasks in IoT cloud systems can be simplified as well as to evaluate the performance of our framework.
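To make the idea of a high-level model that glues diverse low-level resources concrete, here is a hypothetical sketch; the types, fields and values are entirely ours, not the paper's actual information model.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A high-level resource entry that links to a provider-specific,
    low-level description via an opaque reference."""
    uid: str
    kind: str                # e.g. "iot", "network-function", or "cloud"
    provider: str
    low_level_ref: str       # pointer into the provider's own model
    capabilities: dict = field(default_factory=dict)

@dataclass
class ResourceSlice:
    """A set of resources spanning IoT, network, and cloud sub-networks."""
    name: str
    resources: list = field(default_factory=list)

    def add(self, r: Resource):
        self.resources.append(r)

# a slice mixing resources from two hypothetical providers
slice_ = ResourceSlice("monitoring-slice")
slice_.add(Resource("gw-1", "iot", "site-a", "mqtt://site-a/gw-1",
                    {"protocol": "MQTT"}))
slice_.add(Resource("vnf-fw", "network-function", "telco-x", "nfv:fw-22"))
```

The point of the `low_level_ref` field in this sketch is the gluing role the abstract describes: the high-level entry stays uniform while each provider keeps its own low-level model.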
SINC – An Information-Centric Approach for End-to-End IoT Cloud Resource Prov... – Hong-Linh Truong
We present SINC – Slicing IoT, Network Functions, and Clouds – which enables designers to dynamically create/update end-to-end slices of the overall IoT network in order to simultaneously meet multiple user needs.
Data Science as a Service: Intersection of Cloud Computing and Data Science – Pouria Amirian
Dr. Pouria Amirian explains data science and the steps in a data science workflow, and shows some experiments in AzureML. He also discusses big data issues in a data science project and solutions to them.
Data Science as a Service: Intersection of Cloud Computing and Data Science – Pouria Amirian
Dr. Pouria Amirian from the University of Oxford explains Data Science and its relationship with Big Data and Cloud Computing. He then illustrates using AzureML to perform a simple data science analysis.
A modified k-means algorithm for big data clustering – SK Ahammad Fahad
The amount of data is growing every moment, and this data comes from everywhere: social media, sensors, search engines, GPS signals, transaction records, satellites, financial markets, e-commerce sites, etc. This large volume of data may be semi-structured, unstructured or even structured, so it is important to derive meaningful information from this huge data set. Clustering is the process of categorizing data such that items are grouped in the same cluster when they are similar according to specific metrics. In this paper, we work on the k-means clustering technique to cluster big data. Several methods have been proposed for improving the performance of the k-means clustering algorithm. We propose a method for making the algorithm less time-consuming and more effective and efficient, producing better clustering with reduced complexity. According to our observation, the quality of the resulting clusters heavily depends on the selection of the initial centroids and on the movement of data points between clusters in subsequent iterations. As we know, after a certain number of iterations, only a small part of the data points change their clusters. Therefore, our proposed method first finds the initial centroids and then separates those data elements that will not change their cluster from those that may change their cluster in subsequent iterations, which reduces the workload significantly for very large data sets. We evaluate our method with different sets of data and compare it with other methods.
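The core idea, freezing points whose cluster membership appears stable so that later iterations skip them, could look roughly like the following sketch. This is our reconstruction, not the authors' code: the freezing criterion (runner-up centroid much farther away than the nearest one) and all names are assumptions.

```python
import numpy as np

def kmeans_with_freezing(X, k, iters=20, margin=2.0, seed=0):
    """Toy k-means that stops re-evaluating points whose assignment looks
    stable: once a point is much closer to its own centroid than to the
    runner-up (by factor `margin`), it is frozen and skipped in later
    iterations, reducing work on very large data sets."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    frozen = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        active = ~frozen
        if active.any():
            # distances of still-active points to every centroid
            d = np.linalg.norm(X[active, None, :] - centroids[None, :, :], axis=2)
            order = np.argsort(d, axis=1)
            labels[active] = order[:, 0]
            nearest = d[np.arange(d.shape[0]), order[:, 0]]
            second = d[np.arange(d.shape[0]), order[:, 1]]
            # freeze points whose runner-up centroid is far away
            frozen[np.flatnonzero(active)] = second > margin * nearest
        # centroids are updated from all points; frozen ones keep their label
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

On well-separated data most points freeze after a couple of iterations, so the per-iteration distance computation shrinks, which is the workload reduction the abstract describes.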
presented at WORKS 2021
https://works-workshop.org/
16th Workshop on Workflows in Support of Large-Scale Science
November 15, 2021
Held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis
Integrated Analytics for IIoT Predictive Maintenance using IoT Big Data Cloud... – Hong-Linh Truong
For predictive maintenance of equipment with Industrial Internet of Things (IIoT) technologies, existing IoT cloud systems provide strong monitoring and data analysis capabilities for detecting and predicting the status of equipment. However, we need to support complex interactions among different software components and human activities to provide integrated analytics, as software algorithms alone cannot deal with the complexity and scale of data collection and analysis and the diversity of equipment, due to the difficulties of capturing and modeling uncertainties and domain knowledge in predictive maintenance. In this paper, we describe how we design and augment complex IoT big data cloud systems for integrated analytics of IIoT predictive maintenance. Our approach is to identify various complex interactions for solving system incidents together with relevant critical analytics results about equipment. We incorporate humans into various parts of complex IoT cloud systems to enable situational data collection, services management, and data analytics. We leverage serverless functions, cloud services, and domain knowledge to support dynamic interactions between humans and software for maintaining equipment. We use a real-world maintenance scenario of Base Transceiver Stations to illustrate our engineering approach, which we have prototyped with state-of-the-art cloud and IoT technologies, such as Apache NiFi, Hadoop, Spark and Google Cloud Functions.
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products for users to discover insights using FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analyses and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for diseases, and foster collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), as well as integration with cloud services.
Modeling and Provisioning IoT Cloud Systems for Testing Uncertainties – Hong-Linh Truong
Modern Cyber-Physical Systems (CPS) and Internet of Things (IoT) systems consist of both loose and tight interactions among various resources in IoT networks, edge servers and cloud data centers. These elements are being built atop virtualization layers and deployed in both edge and cloud infrastructures. They also deal with a lot of data through the interconnection of different types of networks and services. Therefore, several new types of uncertainties are emerging, such as data, actuation, and elasticity uncertainties. This triggers several challenges for testing uncertainty in such systems. However, there is a lack of novel ways to model and prepare the right infrastructural elements covering requirements for testing emerging uncertainties. In this paper, we first present techniques for modeling CPS/IoT systems and their uncertainties to be tested. Second, we introduce techniques for determining and generating deployment configurations for testing in different IoT and cloud infrastructures. We illustrate our work with a real-world use case for monitoring and analysis of Base Transceiver Stations.
Testing Uncertainty of Cyber-Physical Systems in IoT Cloud Infrastructures: C... – Hong-Linh Truong
Today’s cyber-physical systems (CPS) span IoT and cloud-based
datacenter infrastructures, which are highly heterogeneous with
various types of uncertainty. Thus, testing uncertainties in these
CPS is a challenging and multidisciplinary activity. We need several
tools for modeling, deployment, control, and analytics to test and
evaluate uncertainties for different configurations of the same CPS.
In this paper, we explain why using state-of-the art model-driven
engineering (MDE) and model-based testing (MBT) tools is not
adequate for testing uncertainties of CPS in IoT Cloud infrastruc-
tures. We discus how to combine them with techniques for elastic
execution to dynamically provision both CPS under test and testing
utilities to perform tests in various IoT Cloud infrastructures.
Towards a Resource Slice Interoperability Hub for IoTHong-Linh Truong
Interoperability for IoT is a challenging problem
because it requires us to tackle (i) cross-system interoperability
issues at the IoT platform sides as well as relevant network
functions and clouds in the edge systems and data centers
and (ii) cross-layer interoperability, e.g., w.r.t. data formats,
communication protocols, data delivery mechanisms, and perfor-
mance. However, existing solutions are quite static w.r.t software
deployment and provisioning for interoperability. Many middle-
ware, services and platforms have been built and deployed as
interoperability bridges but they are not dynamically provisioned
and reconfigured for interoperability at runtime. Furthermore,
they are often not considered together with other services as a
whole in application-specific contexts. In this paper, we focus
on dynamic aspects by introducing the concept of Resource
Slice Interoperability Hub (rsiHub). Our approach leverages
existing software artifacts and services for interoperability to
create and provision dynamic resource slices, including IoT,
network functions and clouds, for addressing application-specific
interoperability requirements. We will present our key concepts,
architectures and examples toward the realization of rsiHub.
On Supporting Contract-aware IoT Dataspace ServicesHong-Linh Truong
Advances in the Internet of Things (IoT) enable a
huge number of connected devices that produce large amounts
of data. Such data is increasingly shared among various
stakeholders to support advanced (predictive) analytics and
precision decision making in different application domains like
smart cities and industrial internet. Currently there are several
platforms that facilitate sharing, buying and selling IoT data.
However, these platforms do not support the establishment and
monitoring of usage contracts for IoT data. In this paper we
address this research issue by introducing a new extensible
platform for enabling contract-aware IoT dataspace services,
which supports data contract specification and IoT data flow
monitoring based on established data contracts. We present
a general architecture of contract monitoring services for
IoT dataspaces and evaluate our platform through illustrative
examples with real-world datasets and through performance
analysis.
Towards the Realization of Multi-dimensional Elasticity for Distributed Cloud...Hong-Linh Truong
As multiple types of distributed, heterogeneous cloud computing environments have proliferated, cloud software can leverage
diverse types of infrastructural, platform and data resources with di
erent cost and quality models. This introduces a multi-
dimensional elasticity perspective for cloud software that would greatly meet changing demands from the user. However, we argue
that current techniques are not enough for dealing with multi-dimensional elasticity in distributed cloud environments. We present
our approach to the realization of multi-dimensional elasticity by introducing novel concepts and a roadmap to achieve them.
On Engineering Analytics of Elastic IoT Cloud SystemsHong-Linh Truong
Developing IoT cloud platforms is very challenging, as IoT
cloud platforms consist of a mix of cloud services and IoT elements, e.g.,
for sensor management, near-realtime events handling, and data analyt-
ics. Developers need several tools for deployment, control, governance
and analytics actions to test and evaluate designs of software compo-
nents and optimize the operation of di erent design con gurations. In
this paper, we describe requirements and our techniques on support-
ing the development and testing of IoT cloud platforms. We present our
choices of tools and engineering actions that help the developer to design,
test and evaluate IoT cloud platforms in multi-cloud environments.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Digital Tools and AI for Teaching Learning and Research
TUW-ASE Summer 2015 - Quality of Result-aware data analytics
1. Quality of Result-aware data analytics
Hong-Linh Truong
Distributed Systems Group,
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong
Advanced Services Engineering (ASE), Summer 2015
2. Outline
Data analytics: workflow structures and systems
Enable quality of results for data analytics
Quality of data in data analytics workflows
3. Data Analytics
Conceptual View
[Figure: conceptual view. Data processing frameworks (streaming/online, batch, and hybrid data processing) consume static and (near-)realtime data; analytics, tools, processes and models support data analysis and decision making.]
4. Data analytics workflows
[Figure: data analytics workflows connect things, people, DaaS (Data-as-a-Service) and computation services.]
We use the term "workflow" in a generic sense.
5. Different views of (data analytics) workflow systems
Domain view: business workflow, scientific/e-science workflow
Data/computation view: data-intensive workflow, computation-intensive workflow, human-intensive workflow
System view: Grid workflow, enterprise workflow, cloud-based workflow
Execution model view: service-based workflow, batch job workflow, interactive workflow
6. Pros and cons of (data analytics)
workflow systems
Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. 2006. Workflows for e-Science: Scientific Workflows for Grids. Springer-Verlag, Secaucus, NJ, USA.
Bertram Ludäscher, Mathias Weske, Timothy M. McPhillips, Shawn Bowers: Scientific Workflows: Business as Usual? BPM 2009: 31-47
Mirko Sonntag, Dimka Karastoyanova, Frank Leymann: The Missing Features of Workflow Systems for Scientific Computations. Software Engineering (Workshops) 2010: 209-216
Lavanya Ramakrishnan and Beth Plale. 2010. A multi-dimensional classification model for scientific workflow characteristics. In Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science (Wands '10). ACM, New York, NY, USA, Article 4, 12 pages. DOI=10.1145/1833398.1833402
Jia Yu and Rajkumar Buyya. 2005. A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34, 3 (September 2005), 44-49. DOI=10.1145/1084805.1084814
7. Hierarchical view of workflows (1)
[Figure: a workflow decomposes hierarchically. A workflow contains workflow regions (e.g., <parallel>...</parallel> blocks); a region contains activities (e.g., <activity name="mProject1"><executable name="mProject1"/></activity>); an activity invokes an application (e.g., mProject1Service.java); and the invoked application contains code regions (e.g., while loops).]
Hong Linh Truong, Schahram Dustdar, Thomas Fahringer: Performance metrics and ontologies for Grid workflows. Future Generation Comp. Syst. 23(6): 760-772 (2007)
8. Representing and programming data analytics workflows
Programming languages
General- and specific-purpose programming languages, such as Java, Python, Swift
Programming models, such as MapReduce, Hadoop, Complex Event Processing, Spark
Descriptive languages
BPEL and several languages designed for specific workflow engines
They can also be combined
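As a sketch of the two styles above (toy activities, hypothetical names, not from the lecture), the same workflow can be written directly in a programming language or as a descriptive specification interpreted by an engine:

```python
def clean(records):
    # activity 1: drop records with missing values
    return [r for r in records if all(v is not None for v in r.values())]

def average_temp(records):
    # activity 2: aggregate a field
    temps = [r["temp"] for r in records]
    return sum(temps) / len(temps)

# (a) programming-language style: activities are plain function calls
def workflow_programmatic(records):
    return average_temp(clean(records))

# (b) descriptive style: the workflow is data; a tiny engine interprets it
SPEC = ["clean", "average_temp"]
ACTIVITIES = {"clean": clean, "average_temp": average_temp}

def workflow_descriptive(records):
    result = records
    for name in SPEC:
        result = ACTIVITIES[name](result)
    return result

data = [{"temp": 20.0}, {"temp": None}, {"temp": 30.0}]
print(workflow_programmatic(data))  # 25.0
print(workflow_descriptive(data))   # 25.0
```

The descriptive form can be changed without recompiling, which is what dedicated workflow engines exploit; the programmatic form gives full language power. The two styles can also be combined, as noted above.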
9. Data analytics workflow execution models
[Figure: data analytics workflows are passed to an execution engine, which dispatches jobs to a local scheduler, invokes Web services, and involves people.]
10. Data analytics workflow execution models
[Figure: the execution engine invokes service units running on servers/clouds/clusters; a service unit (a Web service, MapReduce/Hadoop job, sub-workflow, MPI program, or other solution) consumes local input data and produces analytics results.]
11. Examples of systems and frameworks for data analytics workflows
ASKALON, KEPLER, TAVERNA, TRIDENT, Apache ODE + WS-BPEL, Pegasus, JOpera, ADEPT, MapReduce/Hadoop, Swift, R
12. Some examples (1)
Source: Gideon Juve, Ewa Deelman, G. Bruce Berriman, Benjamin P. Berman, Philip Maechling: An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2. J. Grid Comput. 10(1): 5-21 (2012)
13. Some examples (2)
Source: http://www.dps.uibk.ac.at/projects/brokerage/
14. Some examples (3)
Source: Cesare Pautasso, Thomas Heinis, Gustavo Alonso: JOpera: Autonomic Service Orchestration. IEEE Data Eng. Bull. 29(3): 32-39 (2006)
15. Some examples (4)
Source: Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rainer Gemulla, Peter J. Haas, and John McPherson. 2010. Ricardo: integrating R and Hadoop. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10). ACM, New York, NY, USA, 987-998. DOI=10.1145/1807167.1807275
17. Recall - QoR
Characterize the results of analytics processes
Different elements of QoR
Performance
Data quality
Cost
Form/data format of output results
Etc.
Customer: expects QoR
Provider: offers QoR and enforces QoR
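The QoR elements above can be sketched as a simple record (hypothetical field names, not from the lecture), so that a customer's expected QoR can be checked against what a provider offers:

```python
from dataclasses import dataclass

@dataclass
class QoR:
    max_time_s: float      # performance
    min_accuracy: float    # data quality
    max_cost: float        # cost
    output_format: str     # form/data format of the result

def satisfies(offered: QoR, expected: QoR) -> bool:
    # an offer meets the expectation if it is at least as fast,
    # as accurate, as cheap, and delivers the right output format
    return (offered.max_time_s <= expected.max_time_s
            and offered.min_accuracy >= expected.min_accuracy
            and offered.max_cost <= expected.max_cost
            and offered.output_format == expected.output_format)

expected = QoR(max_time_s=60, min_accuracy=0.9, max_cost=5.0, output_format="csv")
offered = QoR(max_time_s=30, min_accuracy=0.95, max_cost=4.0, output_format="csv")
print(satisfies(offered, expected))  # True
```

The provider side would additionally enforce such a record at runtime, e.g., by monitoring the measured QoR against the offered one.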
18. Performance and Data Quality Aspects
[Figure: analytics processes use data analytics, which take data in, produce data out, and are executed on infrastructure. Questions raised: Execution time? Performance overhead? Memory consumption? Is the data good enough? How does bad data impact performance? Is the data good enough to be stored and shared? Which processes should be used?]
Data quality metrics and models are strongly domain-specific
19. Model QoR and analytics processes
Hong Linh Truong, Schahram Dustdar: Principles of Software-Defined Elastic Systems for Big Data Analytics. IC2E 2014: 562-567
20. SO HOW DO WE ENABLE QOR-AWARE ANALYTICS?
21. Solutions
Computational resources provisioning?
Replication of analytics?
Performance and cost measurement and optimization?
Improve the quality of input data?
Improve the quality of output data?
22. Well-addressed concerns – performance
Hong Linh Truong, Peter Brunner, Vlad Nae, Thomas Fahringer: DIPAS: A distributed performance analysis service for grid service-based workflows. Future Generation Comp. Syst. 25(4): 385-398 (2009)
23. Well-addressed concerns – performance/cost
Source: David Chiu, Sagar Deshpande, Gagan Agrawal, Rongxing Li: Cost and accuracy sensitive dynamic workflow composition over grid environments. GRID 2008: 9-16
24. QUALITY OF DATA IN DATA ANALYTICS WORKFLOWS
25. Very little support
Qurator workbench
"Personal quality models" can be expressed and embedded into query processors or workflows.
Assumes that quality evidence is provided
Kepler
A data quality monitor allows users to specify quality thresholds.
Expects that rules can be used to control the execution based on quality.
P Missier, S M Embury, M Greenwood, A D Preece, & B Jin, Managing Information Quality in e-Science: the Qurator Workbench, Proc ACM International Conference on Management of Data (SIGMOD 2007), ACM Press, pages 1150-1152, 2007.
Aisa Na'im, Daniel Crawl, Maria Indrawan, Ilkay Altintas, and Shulei Sun. Monitoring data quality in Kepler. In Salim Hariri and Kate Keahey, editors, HPDC, pages 560-564. ACM, 2010.
26. Research questions
What are the main QoD metrics, what are the relationships between QoD metrics and other service level objectives, and what are their roles and possible trade-offs?
How to support different domain-specific QoD models and link them to workflow structures?
How to model, evaluate and estimate QoD associated with data movement into, within, and out of workflows? When and where can software or scientists perform automatic or manual QoD measurement and analysis?
How to optimize workflow composition and execution based on QoD specifications?
How does QoD impact the provisioning of data services, computational services and supporting services?
27. Approach
Core models, techniques and algorithms for modeling and evaluating QoD metrics
QoD-aware composition and execution
QoD-aware service provisioning and infrastructure optimization
30. HOW TO INTEGRATE QOD EVALUATORS? AND WHICH CONCERNS NEED TO BE CONSIDERED?
31. QoD metrics evaluation
Domain-specific metrics
Need specific tools and expertise for determining metrics
Evaluation
Cannot be done by software alone: humans are required
Complex integration model
Where to put QoD evaluators, and why?
How do evaluators obtain the data to be evaluated?
Impact of QoD evaluation on the performance of data analytics workflows
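The integration concerns above can be sketched as follows (hypothetical names; a toy completeness metric stands in for a domain-specific one): a QoD evaluator is placed directly on an activity's output, and the time spent evaluating is measured, since the evaluation itself adds overhead to the workflow:

```python
import time

def completeness(records):
    # a simple, domain-neutral QoD metric: fraction of non-missing field values
    total = sum(len(r) for r in records)
    present = sum(1 for r in records for v in r.values() if v is not None)
    return present / total if total else 1.0

def with_qod_evaluation(activity, metric, threshold):
    # wrap an activity so its output is evaluated before flowing onward
    def wrapped(records):
        out = activity(records)
        start = time.perf_counter()
        score = metric(out)
        overhead_s = time.perf_counter() - start  # cost of the QoD check itself
        if score < threshold:
            raise ValueError(f"{metric.__name__}={score:.2f} below {threshold}")
        return out, score, overhead_s
    return wrapped

# usage: wrap a (here trivial) activity with a completeness check
checked = with_qod_evaluation(lambda rs: rs, completeness, threshold=0.5)
out, score, overhead_s = checked([{"a": 1, "b": None}, {"a": 2, "b": 3}])
print(score)  # 0.75
```

A human-based evaluator would not fit this synchronous wrapper; it needs an asynchronous human-task interaction, which is one reason the integration model is complex.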
32. WHAT KIND OF OPTIMIZATION CAN BE DONE?
33. QoD-aware optimization for data analytics workflows
Improving quality of results
Reducing analytics costs and time
Enabling early failure detection
Enabling elasticity of services provisioning
Enabling elastic data analytics support
Etc.
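One of the optimizations above, early failure detection, can be sketched with a toy workflow (hypothetical names): a cheap QoD check after the first activity aborts the workflow before the expensive step runs, saving analytics time and cost:

```python
def ingest():
    # cheap first activity; in this toy run it yields mostly missing values
    return [None, None, 1, None]

def expensive_model(data):
    # stands in for a long-running, costly analytics step
    raise RuntimeError("should not run on bad data")

def run_workflow(min_completeness=0.5):
    data = ingest()
    completeness = sum(v is not None for v in data) / len(data)
    if completeness < min_completeness:
        # fail early: skip the expensive step and report why
        return ("aborted", completeness)
    return ("done", expensive_model(data))

print(run_workflow())  # ('aborted', 0.25)
```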
35. QoD-aware simulation workflows
Michael Reiter, Hong Linh Truong, Schahram Dustdar, Dimka Karastoyanova, Robert Krause, Frank Leymann, Dieter Pahr: On Analyzing Quality of Data Influences on Performance of Finite Elements Driven Computational Simulations. Euro-Par 2012: 793-804
Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, Hong Linh Truong: A Novel Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. eScience 2011: 105-112
36. Hybrid resources needed for quality evaluation
Challenges:
Subjective and objective evaluation
Long running processes
Our approach:
Different QoD measurements
Human and software tasks
37. Evaluating quality of data in workflows
Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, Hong Linh Truong: A Novel Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. eScience 2011: 105-112
38. QoD Evaluator
Software-based QoD evaluators
Can be provided as libraries integrated into invoked applications
Web services-based evaluators
Human-based QoD evaluators
Built on the concept of human-based services
Can be interfaced via Human-Tasks
Simple mapping at the moment
Human resources from clouds/crowds
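The evaluator types above can be sketched behind one common interface (hypothetical classes, not from the lecture; the human task is simulated by a callback standing in for a human-task or crowd service client):

```python
class SoftwareEvaluator:
    def evaluate(self, data):
        # software-based: a metric computed directly by code,
        # e.g., completeness of a sample
        return sum(1 for v in data if v is not None) / len(data)

class HumanEvaluator:
    def __init__(self, submit_task):
        # submit_task is a stand-in for a human-task / crowd service client
        self.submit_task = submit_task

    def evaluate(self, data):
        # human-based: delegate the judgement and wait for the score
        return self.submit_task("rate the quality of this sample", data)

def evaluate_all(evaluators, data):
    # run every evaluator, software or human, through the same interface
    return {type(e).__name__: e.evaluate(data) for e in evaluators}

# usage: the "human" here is simulated by a function returning a fixed score
evs = [SoftwareEvaluator(), HumanEvaluator(lambda prompt, d: 0.8)]
print(evaluate_all(evs, [1, None, 3, 4]))
```

In practice the human path would be asynchronous and far slower, which is why the mapping to Human-Tasks is described above as simple at the moment.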
39. Open issues: quality-of-result (QoR) driven workflows
Tradeoffs are challenging
How to support QoR driven analytics?
Some basic steps:
Conceptualize expected QoR
Associate the expected QoR with workflow activities
Use the expected QoR to match/select underlying services (e.g., data sources, cloud IaaS, etc.)
Utilize the expected QoR and the measured QoR, and apply elasticity principles, to
Refine the workflow structure
Provision computation, network and data
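The matching step above can be sketched as follows (hypothetical candidate services and fields): filter candidates by the expected QoR for an activity and pick the cheapest feasible one:

```python
def select_service(candidates, max_time_s, max_cost):
    # keep candidates whose offered time and cost meet the expected QoR,
    # then prefer the cheapest of them
    feasible = [c for c in candidates
                if c["time_s"] <= max_time_s and c["cost"] <= max_cost]
    return min(feasible, key=lambda c: c["cost"]) if feasible else None

candidates = [
    {"name": "small-vm", "time_s": 120, "cost": 1.0},
    {"name": "large-vm", "time_s": 40, "cost": 4.0},
    {"name": "cluster", "time_s": 10, "cost": 9.0},
]
print(select_service(candidates, max_time_s=60, max_cost=5.0))
# {'name': 'large-vm', 'time_s': 40, 'cost': 4.0}
```

When no candidate is feasible, the elasticity principles above apply: either the expected QoR is relaxed or the workflow structure is refined.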
40. Exercises
Read the mentioned papers
Discuss pros and cons of descriptive-language-based and programming-language-based data analytics workflows
Examine how QoD evaluators can be integrated into different programming models for QoR-aware data analytics workflows
Implement some QoD evaluators
Develop techniques for determining places where QoD evaluation can be performed
41. Thanks for your attention
Hong-Linh Truong
Distributed Systems Group
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong