The document discusses quality of data-aware data analytics workflows. It begins by outlining the topics to be covered: data analytics workflow structures and systems, issues in quality of data-aware workflows, and quality of data-aware simulation workflows. It then gives examples of workflow systems and frameworks for data analytics. Key points include the need to understand hierarchical workflow structures, addressing data and service concerns, the importance of quality of data for data analytics workflows, and approaches to modeling quality of data metrics and optimizing workflows based on quality of data.
TUW - Quality of data-aware data analytics workflows
1. Quality of data-aware data analytics workflows
Hong-Linh Truong
Distributed Systems Group,
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong
ASE Summer 2014
Advanced Services Engineering,
Summer 2014
2. Outline
Data analytics workflows: structures and systems
Issues in quality of data-aware data analytics workflows
Quality of data-aware simulation workflows
3. Data analytics workflows
[Figure: a data analytics workflow involving things, people, DaaS, and computation services]
We use the term "workflow" in a generic sense!
4. Different views of (data analytics) workflow systems
Domain view: business workflow; scientific/e-science workflow
Data/computation view: data-intensive workflow; computation-intensive workflow; human-intensive workflow
System view: grid workflow; enterprise workflow; cloud-based workflow
Execution model view: service-based workflow; batch job workflow; interactive workflow
5. Pros and cons of (data analytics) workflow systems
Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, Matthew Shields: Workflows for E-Science: Scientific Workflows for Grids. Springer-Verlag New York, Secaucus, NJ, USA (2006)
Bertram Ludäscher, Mathias Weske, Timothy M. McPhillips, Shawn Bowers: Scientific Workflows: Business as Usual? BPM 2009: 31-47
Mirko Sonntag, Dimka Karastoyanova, Frank Leymann: The Missing Features of Workflow Systems for Scientific Computations. Software Engineering (Workshops) 2010: 209-216
Lavanya Ramakrishnan, Beth Plale: A multi-dimensional classification model for scientific workflow characteristics. Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science (Wands '10), ACM, Article 4, 12 pages (2010). http://doi.acm.org/10.1145/1833398.1833402
Jia Yu, Rajkumar Buyya: A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34(3): 44-49 (2005). http://doi.acm.org/10.1145/1084805.1084814
6. Hierarchical view of workflows (1)
[Figure: hierarchical levels of a workflow: Workflow, Workflow Region n, Activity m, Invoked Application m, and Code Region 1 ... Code Region q]
Workflow level: A(); <parallel> ... </parallel>
Activity level: <activity name="mProject1"><executable name="mProject1"/></activity> and <activity name="mProject2"><executable name="mProject2"/></activity>
Invoked application level: mProject1Service.java with public void mProject1() { ... }
Code region level: while () { ... }
Hong Linh Truong, Schahram Dustdar, Thomas Fahringer: Performance metrics and ontologies for Grid workflows. Future Generation Comp. Syst. 23(6): 760-772 (2007)
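The hierarchy matters because metrics measured at the lowest level have to be rolled up to the levels above. Here is a minimal sketch, assuming a simple tree in which only code regions carry measured elapsed times: sequential children add up, while the children of a parallel region contribute only their longest branch. The class names, `mProject2Service`, the region names, and the timings are hypothetical and are not taken from the cited paper.

```java
import java.util.ArrayList;
import java.util.List;

class HierarchyDemo {
    // Code regions are the leaves of the hierarchy; they carry the measured times.
    static WorkflowElement codeRegion(String name, long millis) {
        return new WorkflowElement("CodeRegion", name, false, millis);
    }

    // Inner nodes (workflow, region, activity, invoked application) have no own measurement.
    static WorkflowElement node(String level, String name, boolean parallel) {
        return new WorkflowElement(level, name, parallel, 0);
    }

    // Hypothetical example: two activities inside one parallel workflow region.
    static WorkflowElement build() {
        WorkflowElement app1 = node("InvokedApplication", "mProject1Service", false)
                .add(codeRegion("region-1", 100))
                .add(codeRegion("region-2", 300));
        WorkflowElement app2 = node("InvokedApplication", "mProject2Service", false)
                .add(codeRegion("region-1", 250));
        WorkflowElement region = node("WorkflowRegion", "parallel-region", true)
                .add(node("Activity", "mProject1", false).add(app1))
                .add(node("Activity", "mProject2", false).add(app2));
        return node("Workflow", "example-workflow", false).add(region);
    }

    public static void main(String[] args) {
        // max(100 + 300, 250) = 400
        System.out.println(build().elapsedMillis());
    }
}

class WorkflowElement {
    final String level;          // Workflow, WorkflowRegion, Activity, InvokedApplication or CodeRegion
    final String name;
    final boolean parallel;      // true if the children run concurrently
    final long measuredMillis;   // only used for leaves (code regions)
    final List<WorkflowElement> children = new ArrayList<>();

    WorkflowElement(String level, String name, boolean parallel, long measuredMillis) {
        this.level = level;
        this.name = name;
        this.parallel = parallel;
        this.measuredMillis = measuredMillis;
    }

    WorkflowElement add(WorkflowElement child) {
        children.add(child);
        return this;
    }

    // Roll the metric up the tree: sequential children are summed,
    // parallel children contribute only their longest branch.
    long elapsedMillis() {
        if (children.isEmpty()) {
            return measuredMillis;
        }
        long total = 0;
        for (WorkflowElement child : children) {
            long t = child.elapsedMillis();
            total = parallel ? Math.max(total, t) : total + t;
        }
        return total;
    }
}
```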
7. Representing and programming data analytics workflows
Programming languages
General- and specific-purpose programming languages, such as Java, Python, and Swift
Programming models, such as MapReduce (e.g., Hadoop) and complex event processing
Descriptive languages
BPEL and several languages designed for specific workflow engines
Both kinds can also be combined
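To illustrate the MapReduce programming model named above, here is a word count that runs the map, shuffle, and reduce phases inside one JVM with plain Java collections. It is a teaching sketch under that assumption: it does not use the Hadoop API, and a real Hadoop job would distribute each phase across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // Map phase: turn one input record (a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group intermediate values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: combine all values of one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> input) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : input) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(intermediate).forEach((word, ones) -> counts.put(word, reduce(word, ones)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("data analytics workflows", "data workflows")));
        // prints {analytics=1, data=2, workflows=2}
    }
}
```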
8. Data analytics workflow execution models
[Figure: data analytics workflows are run by an execution engine that dispatches jobs through a local scheduler and interacts with Web services and people]
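The execution model in the figure can be sketched as an engine that routes each activity to a different kind of executor. In this sketch, assuming independent activities with no data dependencies, a fixed thread pool stands in for the local scheduler that runs jobs, while `invokeService` and `assignToPerson` are hypothetical placeholders for calling a Web service and handing a task to a person; the activity names `annotate` and `review` are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutionEngineSketch {
    enum Kind { JOB, WEB_SERVICE, HUMAN }

    static final class Activity {
        final String name;
        final Kind kind;

        Activity(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Hypothetical stand-ins: a real engine would run a batch job, call a
    // Web service endpoint, or put a task into a person's worklist.
    static String runJob(Activity a) { return "job " + a.name + " done"; }
    static String invokeService(Activity a) { return "service " + a.name + " invoked"; }
    static String assignToPerson(Activity a) { return "task " + a.name + " assigned"; }

    static List<String> execute(List<Activity> activities) {
        // A fixed thread pool plays the role of the local scheduler for jobs.
        ExecutorService localScheduler = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (Activity a : activities) {
                switch (a.kind) {
                    case JOB:
                        pending.add(localScheduler.submit(() -> runJob(a)));
                        break;
                    case WEB_SERVICE:
                        pending.add(CompletableFuture.completedFuture(invokeService(a)));
                        break;
                    default:
                        pending.add(CompletableFuture.completedFuture(assignToPerson(a)));
                }
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // collect results in workflow order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("engine interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("activity failed", e.getCause());
        } finally {
            localScheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Activity> workflow = List.of(
                new Activity("mProject1", Kind.JOB),
                new Activity("mProject2", Kind.JOB),
                new Activity("annotate", Kind.WEB_SERVICE),
                new Activity("review", Kind.HUMAN));
        execute(workflow).forEach(System.out::println);
    }
}
```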
9. Data analytics workflow execution models
[Figure: an execution engine runs data analytics workflows on servers, clouds, or clusters as a composition of service units (Web service, MapReduce/Hadoop, sub-workflow, MPI, other solutions) that turn local input data into analytics results]
How is data transferred among service units?
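One possible answer to the question above is to let the engine itself pass the data: each service unit declares which upstream units it reads from, and the engine runs the units in topological order (Kahn's algorithm), copying every output into the inputs of its consumers. This is a minimal in-memory sketch with hypothetical unit names (`load`, `sum`, `count`, `mean`); in practice data is often exchanged through shared storage or transferred directly between service units, which this sketch does not model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DataflowSketch {
    // A service unit reads the outputs of the units it depends on (by name)
    // and produces a single output value.
    static final class ServiceUnit {
        final String name;
        final List<String> inputs;
        final Function<Map<String, Object>, Object> logic;

        ServiceUnit(String name, List<String> inputs, Function<Map<String, Object>, Object> logic) {
            this.name = name;
            this.inputs = inputs;
            this.logic = logic;
        }
    }

    // Runs units in dependency order (Kahn's topological sort); the engine
    // moves each output to the units that consume it.
    static Map<String, Object> run(List<ServiceUnit> units) {
        Map<String, ServiceUnit> byName = new HashMap<>();
        Map<String, Integer> unmet = new HashMap<>();
        Map<String, List<String>> consumers = new HashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (ServiceUnit u : units) {
            byName.put(u.name, u);
            unmet.put(u.name, u.inputs.size());
            for (String in : u.inputs) {
                consumers.computeIfAbsent(in, k -> new ArrayList<>()).add(u.name);
            }
            if (u.inputs.isEmpty()) {
                ready.add(u.name);
            }
        }
        Map<String, Object> outputs = new LinkedHashMap<>();
        while (!ready.isEmpty()) {
            ServiceUnit u = byName.get(ready.poll());
            Map<String, Object> args = new HashMap<>();
            for (String in : u.inputs) {
                args.put(in, outputs.get(in)); // data transfer: producer output -> consumer input
            }
            outputs.put(u.name, u.logic.apply(args));
            for (String next : consumers.getOrDefault(u.name, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (outputs.size() != units.size()) {
            throw new IllegalArgumentException("cycle or missing producer in workflow");
        }
        return outputs;
    }

    static Map<String, Object> demo() {
        return run(List.of(
                new ServiceUnit("load", List.of(), in -> new int[] {3, 1, 4, 1, 5}),
                new ServiceUnit("sum", List.of("load"), in -> Arrays.stream((int[]) in.get("load")).sum()),
                new ServiceUnit("count", List.of("load"), in -> ((int[]) in.get("load")).length),
                new ServiceUnit("mean", List.of("sum", "count"),
                        in -> ((Integer) in.get("sum")).doubleValue() / (Integer) in.get("count"))));
    }

    public static void main(String[] args) {
        System.out.println(demo().get("mean")); // 14 / 5 = 2.8
    }
}
```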
10. Examples of systems and frameworks for data analytics workflows
ASKALON, KEPLER, TAVERNA, TRIDENT, Apache ODE + WS-BPEL, Pegasus, JOpera, ADEPT, MapReduce/Hadoop, Swift, R
11. Some examples (1)
Source: Gideon Juve, Ewa Deelman, G. Bruce Berriman, Benjamin P. Berman, Philip Maechling: An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2. J. Grid Comput. 10(1): 5-21 (2012)
12. Some examples (2)
Source: http://www.dps.uibk.ac.at/projects/brokerage/
13. Some examples (3)
ASE Summer 2014 13
Source: Cesare Pautasso, Thomas Heinis, Gustavo Alonso: JOpera: Autonomic Service
Orchestration. IEEE Data Eng. Bull. 29(3): 32-39 (2006)
14. Some examples (4)
Source: Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rainer Gemulla, Peter J. Haas, and John McPherson. 2010.
Ricardo: integrating R and Hadoop. In Proceedings of the 2010 ACM SIGMOD International Conference on Management
of data (SIGMOD '10). ACM, New York, NY, USA, 987-998. DOI=10.1145/1807167.1807275
http://doi.acm.org/10.1145/1807167.1807275
15. Elastic provisioning for workflows
With cloud computing we can
Provision a computing system
E.g., a virtual cloud-based cluster
Provision a workflow execution platform
E.g., batch job workflow engine, Hadoop runtime
Deploy and execute workflows
elastic!
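Elastic provisioning is often driven by a scaling rule that recomputes how many workers a (virtual) cluster should have. The sketch below uses one simple, assumed rule: provision enough workers to drain the pending jobs before a deadline, clamped to a minimum and maximum pool size. The function name and parameters are hypothetical and do not correspond to any particular cloud provider's API.

```python
import math

def workers_needed(pending_jobs: int, avg_job_seconds: float,
                   deadline_seconds: float,
                   min_workers: int = 1, max_workers: int = 20) -> int:
    """Hypothetical scaling rule: enough workers to finish the queue by the deadline.

    Assumes independent jobs of similar length and one job per worker at a time.
    """
    if avg_job_seconds <= 0 or deadline_seconds <= 0:
        raise ValueError("durations must be positive")
    # How many whole jobs one worker can finish before the deadline.
    jobs_per_worker = math.floor(deadline_seconds / avg_job_seconds)
    if jobs_per_worker == 0:
        # A single job already exceeds the deadline: best effort is one worker per job.
        wanted = pending_jobs
    else:
        wanted = math.ceil(pending_jobs / jobs_per_worker)
    # Clamp to the pool size the (virtual) cluster is allowed to have.
    return max(min_workers, min(max_workers, wanted))

print(workers_needed(100, 60, 600))  # 10
```

Counting whole jobs per worker, rather than dividing total work by the deadline, avoids under-provisioning when jobs cannot be split across workers.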
16. WHY DO WE NEED TO KNOW THE HIERARCHICAL STRUCTURES WELL?
WHICH ASPECTS ARE WELL ADDRESSED W.R.T. "DATA/SERVICE CONCERNS"?
17. Well-addressed concerns – performance
Hong Linh Truong, Peter Brunner, Vlad Nae, Thomas Fahringer: DIPAS: A distributed performance analysis service for
grid service-based workflows. Future Generation Comp. Syst. 25(4): 385-398 (2009)
18. Well-addressed concerns – performance/cost
Source: David Chiu, Sagar Deshpande, Gagan Agrawal, Rongxing Li: Cost and accuracy sensitive dynamic workflow
composition over grid environments. GRID 2008: 9-16
19. QUALITY OF DATA IN DATA ANALYTICS WORKFLOWS
20. Performance and Data Quality Aspects
[Figure: data in, data analytics (executed on an underlying platform, uses analytics models), data out]
Execution time?
Performance overhead?
Memory consumption?
Is the data good enough?
How does bad data impact performance?
Is the data good enough to be stored and shared?
Which models should be used?
Data quality metrics and models are strongly domain-specific
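Because data quality metrics are strongly domain-specific, the following is only a generic sketch. It computes two commonly used metrics over a list of records: completeness (share of non-missing values) and rule-based validity (share of present values that satisfy a domain rule). The record layout and the plausible temperature range used as the validity rule are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]

def completeness(records: List[Record], field: str) -> float:
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def validity(records: List[Record], field: str,
             rule: Callable[[float], bool]) -> float:
    """Share of the present values of `field` that satisfy a domain rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)

def plausible_temperature(t: float) -> bool:
    # Assumed domain rule for illustration: air temperature in degrees Celsius.
    return -40.0 <= t <= 60.0

readings: List[Record] = [{"temp": 21.5}, {"temp": None}, {"temp": 19.0}, {"temp": 250.0}]
print(completeness(readings, "temp"))                     # 0.75
print(validity(readings, "temp", plausible_temperature))  # 2 of 3 present values pass
```

In a real workflow the rule, and often the metric itself, would come from domain experts rather than from generic code.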
21. WHY IS QOD IMPORTANT FOR DATA ANALYTICS WORKFLOWS?
22. Very little support
Qurator workbench
“Personal quality models” can be expressed and
embedded into query processors or workflows.
Assumes that quality evidence is already available
Kepler
A data quality monitor allows users to specify quality
thresholds.
Expects that rules can be used to control the execution
based on quality.
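The idea of user-specified quality thresholds combined with rules that steer execution can be sketched as follows. This is not the Kepler or Qurator API; the `QualityRule` type, the metric names, and the three actions (continue, warn, stop) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

class Action(Enum):
    CONTINUE = "continue"
    WARN = "warn"
    STOP = "stop"

@dataclass
class QualityRule:
    metric: str           # name of a QoD metric, e.g. "completeness"
    threshold: float      # minimum acceptable value
    on_violation: Action  # what the workflow should do if the threshold is missed

def decide(measured: Dict[str, float],
           rules: List[QualityRule]) -> Tuple[Action, List[str]]:
    """Return the most severe triggered action and the list of violated metrics."""
    severity = {Action.CONTINUE: 0, Action.WARN: 1, Action.STOP: 2}
    action = Action.CONTINUE
    violated: List[str] = []
    for rule in rules:
        value = measured.get(rule.metric)
        # A missing measurement is treated as a violation (conservative assumption).
        if value is None or value < rule.threshold:
            violated.append(rule.metric)
            if severity[rule.on_violation] > severity[action]:
                action = rule.on_violation
    return action, violated

rules = [QualityRule("completeness", 0.9, Action.STOP),
         QualityRule("validity", 0.95, Action.WARN)]
print(decide({"completeness": 0.97, "validity": 0.90}, rules))
```

Returning the most severe action lets one stop rule override any number of warnings, which is one possible policy among several.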
P Missier, S M Embury, M Greenwood, A D Preece, & B Jin, Managing Information Quality in e-Science: the Qurator
Workbench, Proc ACM International Conference on Management of Data (SIGMOD 2007), ACM Press, pages 1150-
1152, 2007.
Aisa Na’im, Daniel Crawl, Maria Indrawan, Ilkay Altintas, and Shulei Sun. Monitoring data quality in kepler. In Salim Hariri
and Kate Keahey, editors, HPDC, pages 560–564. ACM, 2010.
23. Research questions
What are the main QoD metrics, what are the relationships between QoD
metrics and other service level objectives, and what are their roles
and possible trade-offs?
How to support different domain-specific QoD models and link them
to workflow structures?
How to model, evaluate and estimate QoD associated with data
movement into, within, and out of workflows? When and where can
software or scientists perform automatic or manual QoD
measurement and analysis?
How to optimize the workflow composition and execution based on
QoD specifications?
How does QoD impact the provisioning of data services,
computational services and supporting services?
24. Approach
Core models, techniques and algorithms for modeling and evaluating QoD metrics
QoD-aware composition and execution
QoD-aware service provisioning and infrastructure optimization
27. HOW TO INTEGRATE QOD EVALUATORS? AND WHICH CONCERNS NEED TO BE CONSIDERED?
28. QoD metrics evaluation
Domain-specific metrics
Need specific tools and expertise for determining
metrics
Evaluation
Cannot be done by software alone: humans are required
Complex integration model
Where to put QoD evaluators and why?
How do evaluators obtain the data to be evaluated?
Impact of QoD evaluation on the performance of
data analytics workflows
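One way to study where QoD evaluators go and what they cost is to wrap a workflow activity so that an evaluator inspects its output, with the evaluation time recorded separately from the activity's own execution time. The decorator `with_qod_evaluator`, the metric `non_null_ratio`, and the `parse` activity below are hypothetical. The sketch covers only the in-process, software-based case, not remote or human evaluators.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List

QoDLog = List[Dict[str, Any]]

def with_qod_evaluator(evaluator: Callable[[Any], float], log: QoDLog):
    """Wrap an activity so its output is evaluated and both timings are logged."""
    def decorate(activity):
        @wraps(activity)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output = activity(*args, **kwargs)
            after_activity = time.perf_counter()
            score = evaluator(output)
            after_eval = time.perf_counter()
            log.append({
                "activity": activity.__name__,
                "qod": score,
                "activity_s": after_activity - start,
                "evaluation_s": after_eval - after_activity,  # QoD overhead
            })
            return output
        return wrapper
    return decorate

log: QoDLog = []

def non_null_ratio(values: List[Any]) -> float:
    return sum(v is not None for v in values) / len(values) if values else 0.0

@with_qod_evaluator(non_null_ratio, log)
def parse(lines: List[str]) -> List[Any]:
    # Hypothetical activity: parse numeric strings, mapping bad input to None.
    out: List[Any] = []
    for line in lines:
        try:
            out.append(float(line))
        except ValueError:
            out.append(None)
    return out

parse(["1.5", "x", "2"])
print(log[-1]["qod"], log[-1]["evaluation_s"])
```

Comparing `evaluation_s` with `activity_s` across runs gives a rough measure of how much an evaluator placed at this point slows the workflow.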
29. WHAT KIND OF OPTIMIZATION CAN BE DONE?
30. QoD-aware optimization for data analytics workflows
Improving quality of results
Reducing analytics costs and time
Enabling early failure detection
Enabling elasticity of service provisioning
Enabling elastic data analytics support
Etc.
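As a small illustration of trading quality against cost, the sketch below picks, among candidate data sources, the cheapest one whose expected QoD meets a required minimum. If none qualifies, it falls back to the highest-quality source. The candidate list, prices, and QoD scores are made-up values, and this greedy rule is an assumption, not an optimization method from the cited work.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class DataSource:
    name: str
    cost: float  # e.g. price per query, in arbitrary units
    qod: float   # expected quality score in [0, 1]

def select_source(candidates: List[DataSource], min_qod: float) -> Optional[DataSource]:
    """Cheapest source meeting min_qod; otherwise the best available quality."""
    if not candidates:
        return None
    eligible = [s for s in candidates if s.qod >= min_qod]
    if eligible:
        return min(eligible, key=lambda s: s.cost)
    # No source meets the requirement: fall back to the best available quality.
    return max(candidates, key=lambda s: s.qod)

sources = [DataSource("archive", 1.0, 0.70),
           DataSource("curated", 4.0, 0.95),
           DataSource("premium", 9.0, 0.99)]
print(select_source(sources, 0.9).name)  # curated
```

A caller that must not silently accept lower quality would check the returned source's `qod` against `min_qod` and treat a shortfall as an early failure.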
32. QoD-aware simulation workflows
Michael Reiter, Hong Linh Truong, Schahram Dustdar, Dimka Karastoyanova, Robert Krause, Frank Leymann, Dieter
Pahr: On Analyzing Quality of Data Influences on Performance of Finite Elements Driven Computational Simulations.
Euro-Par 2012: 793-804
Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, Hong Linh Truong: A
Novel Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. eScience 2011: 105-112
33. Hybrid resources needed for quality evaluation
Challenges:
Subjective and objective evaluation
Long-running processes
Our approach
Different QoD measurements
Human and software tasks
34. Evaluating quality of data in workflows
Michael Reiter, Uwe Breitenbücher, Schahram Dustdar, Dimka Karastoyanova, Frank Leymann, Hong Linh Truong: A
Novel Framework for Monitoring and Analyzing Quality of Data in Simulation Workflows. eScience 2011: 105-112
35. QoD Evaluator
Software-based QoD evaluators
Can be provided as libraries integrated into
invoked applications
Web service-based evaluators
Human-based QoD evaluators
Built on the concept of human-based services
Can be interfaced via Human-Task
Simple mapping at the moment
Human resources from clouds/crowds
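The distinction between software-based and human-based evaluators suggests a common interface that the workflow engine can call without knowing who performs the evaluation. In this hypothetical sketch, the human-based evaluator only places a review task in a queue and returns no score until a person answers; it stands in for a Human-Task interface or a crowd platform. None of the class or method names come from an existing framework.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

class QoDEvaluator(ABC):
    """Common interface the workflow engine calls, whoever does the evaluation."""

    @abstractmethod
    def evaluate(self, task_id: str, data: List[float]) -> Optional[float]:
        """Return a QoD score in [0, 1], or None if it is not available yet."""

class RangeEvaluator(QoDEvaluator):
    """Software-based: share of values inside an accepted range."""

    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high

    def evaluate(self, task_id: str, data: List[float]) -> Optional[float]:
        if not data:
            return 0.0
        return sum(self.low <= x <= self.high for x in data) / len(data)

class HumanEvaluator(QoDEvaluator):
    """Human-based: queues a review task; a person's score arrives later."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, List[float]]] = deque()
        self.queued: Set[str] = set()
        self.answers: Dict[str, float] = {}

    def evaluate(self, task_id: str, data: List[float]) -> Optional[float]:
        if task_id in self.answers:
            return self.answers[task_id]
        if task_id not in self.queued:  # avoid asking reviewers twice
            self.queued.add(task_id)
            self.pending.append((task_id, data))
        return None

    def submit(self, task_id: str, score: float) -> None:
        """Called when a (hypothetical) reviewer returns a score."""
        self.answers[task_id] = score
```

The `None` return makes the asynchrony explicit: for long-running or human evaluations, the engine has to poll or be notified rather than block.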
36. Open issues: quality-of-result (QoR)-driven workflows
How to support QoR-driven analytics?
Some basic steps
Conceptualize the expected QoR
Associate the expected QoR with workflow activities
Use the expected QoR to match/select underlying services (e.g., data sources, cloud IaaS, etc.)
Utilize the expected QoR and the measured QoR and apply elasticity principles to
Refine the workflow structure
Provision computation, network, and data resources
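The last step, comparing expected with measured QoR and reacting elastically, can be sketched as a per-activity decision. The tolerance band and the action names below are assumptions chosen for illustration. A real QoR-driven system would map them onto concrete workflow refinements and provisioning operations.

```python
from typing import Dict

def qor_actions(expected: Dict[str, float], measured: Dict[str, float],
                tolerance: float = 0.05) -> Dict[str, str]:
    """Hypothetical elasticity decision per workflow activity.

    - "measure": no QoR measurement yet
    - "provision_more": measured QoR below the expected QoR
    - "release": measured QoR well above expectation (possible over-provisioning)
    - "keep": within [expected, expected + tolerance]
    """
    actions: Dict[str, str] = {}
    for activity, target in expected.items():
        value = measured.get(activity)
        if value is None:
            actions[activity] = "measure"
        elif value < target:
            actions[activity] = "provision_more"  # or refine the workflow structure
        elif value > target + tolerance:
            actions[activity] = "release"         # free resources to save cost
        else:
            actions[activity] = "keep"
    return actions

expected = {"clean": 0.9, "model": 0.8, "report": 0.7, "export": 0.5}
measured = {"clean": 0.85, "model": 0.95, "report": 0.72}
print(qor_actions(expected, measured))
```

The tolerance band keeps the controller from releasing resources as soon as QoR slightly exceeds the target, which would otherwise cause oscillation.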
37. Exercises
Read the mentioned papers
Discuss the pros and cons of descriptive-language-based and programming-language-based data
analytics workflows
Examine how QoD evaluators can be integrated
into different programming models for QoD-aware data analytics workflows
Implement some QoD evaluators for Hadoop
Develop techniques for determining where
QoD evaluators should be inserted
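As a possible starting point for the Hadoop exercise, the sketch below writes a completeness evaluator in the mapper/reducer style used by Hadoop Streaming, where both steps read and write tab-separated text lines. The CSV schema in `FIELDS` and the local `run_locally` driver, which imitates the shuffle phase by sorting, are assumptions. On a cluster, the two functions would instead be connected to stdin/stdout and launched through Hadoop Streaming.

```python
from itertools import groupby
from typing import Iterable, Iterator, List

FIELDS = ["station", "time", "temp"]  # hypothetical CSV schema

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit '<field>\\t<1 if present else 0>' for every field of every record."""
    for line in lines:
        values = line.rstrip("\n").split(",")
        for i, field in enumerate(FIELDS):
            present = i < len(values) and values[i].strip() != ""
            yield f"{field}\t{int(present)}"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Input must be sorted by key (Hadoop does this); emit completeness per field."""
    pairs = (line.split("\t") for line in sorted_lines)
    for field, group in groupby(pairs, key=lambda kv: kv[0]):
        flags = [int(flag) for _, flag in group]
        yield f"{field}\t{sum(flags) / len(flags):.3f}"

def run_locally(lines: List[str]) -> List[str]:
    # Sorting stands in for Hadoop's shuffle-and-sort phase.
    return list(reducer(sorted(mapper(lines))))

print(run_locally(["s1,08:00,21.5", "s2,,19.0", "s3,09:00,"]))
```

Because the per-field counts are simple sums, the same evaluator scales to large inputs; richer QoD metrics may need a combiner or a second pass.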
38. Thanks for your attention
Hong-Linh Truong
Distributed Systems Group
Vienna University of Technology
truong@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/staff/truong