Session 37 - Intro to Workflows, APIs and Semantics
1. Introduction to Workflows, APIs and Semantics
Session 37. July 13th, 2009
Oscar Corcho (Universidad Politécnica de Madrid)
Based on slides from all the presenters in the following two days
Work distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license
2. Themes of the Second Week
Date | Theme | Technology
Mon 13 July | How to solve my problem? | -
Tue 14 July | Higher level APIs: OGSA-DAI, SAGA and metadata management | SAGA, OGSA-DAI, GridSAM
Wed 15 July | Workflows, Semantic Metadata | P-GRADE
Thu 16 July | Integrating Practical | All
Fri 17 July | Cloud Computing (lecture) | -
3. [Overview diagram of the week's modules:]
Principles of job submission and execution
Principles of high-throughput computing
Principles of service-oriented architecture
Principles of distributed data management
Principles of using distributed and high performance systems
Higher level APIs: OGSA-DAI, SAGA and metadata management
Workflows
4. Motivation
• Grids are:
– Dynamic:
• Versions, updates, new resources...
– Heterogeneous:
• Operating systems, libraries, software stack
• Middleware service versions and semantics
• Administrative policies: access, usage, upgrade
– Complex:
• A production-level service with high QoS is non-trivial
• Derived from the above, as well as inherent
• As described by Steven this morning, operating Grids is still an effort-consuming task, and it is still somewhat difficult to develop, program & deploy Grid applications using the existing Grid middleware
• But as you have also seen during the last week (and in Morris’ presentation today), there are many commonalities among heterogeneous middleware
5. In general…
• As described by Steven this morning, operating Grids is still an effort-consuming task, and it is still somewhat difficult to develop, program & deploy Grid applications using the existing Grid middleware
• But as you have also seen during the last week (and in Morris’ presentation today), there are many commonalities among heterogeneous middleware
• There is a need for:
– Programmatic approaches that provide common grid functionality at the correct level of abstraction for applications
– The ability to hide the underlying complexity of the infrastructure, varying semantics, heterogeneity and changes from the application developer
– Improved data access and integration mechanisms
– Traceable, repeatable analyses of e-Science experiments
– Graphical modelling languages to ease Grid application development
6. e-Science Approaches and Interoperability
• Increasing complexity of e-Science applications that embrace multiple physical models (i.e. multi-physics) & larger scale
– Creating a steadily growing demand for compute power
– Demand for a ‘United Federation of world-wide Grids’
[Diagram: approaches layered on top of Grid middleware: I. Simple Scripts & Control, II. Scientific Application Plug-ins, III. Complex Workflows, IV. Interactive Access, V. Interoperability with other Grid types]
[2] Morris Riedel et al., ‘Classification of Different Approaches for e-Science Applications in Next Generation Infrastructures’, Int. Conference on e-Science 2008, Indianapolis, Indiana
7. SAGA one-slide summary
• Simple API for Grid Applications (SAGA)
– Provides a simple and usable programmatic interface that can be widely adopted for the development of applications for the grid
– Simplicity (80:20 restricted scope)
• easy to use, install, administer and maintain
– Uniformity
• provides support for different application programming languages as well as consistent semantics and style for different Grid functionality
– Scalability
• contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources
– Genericity
• adds support for different grid middleware
– Modularity
• provides a framework that is easily extendable
• SAGA is not…
– Middleware
– A service management interface
– It does not hide the resources (remote files, jobs), only the details
10. Example: Copy a File (SAGA)
#include <iostream>   // for std::cerr
#include <string>
#include <saga/saga.hpp>

void copy_file(std::string source_url, std::string target_url)
{
    try {
        // Open the source file through SAGA and copy it to the target URL;
        // the same calls work regardless of the underlying middleware.
        saga::file f(source_url);
        f.copy(target_url);
    }
    catch (saga::exception const &e) {
        std::cerr << e.what() << std::endl;
    }
}
The interface is simple and the actual function calls remain the same
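For comparison, here is a hedged sketch of job submission through the same API, following the SAGA job package as specified in GFD.90. The backend URL "fork://localhost", the executable "/bin/date" and the exact attribute constant are illustrative assumptions and may differ slightly between SAGA implementations.

#include <iostream>
#include <saga/saga.hpp>

int main()
{
    try {
        // Describe the job; "Executable" is a spec-defined job description attribute.
        saga::job::description jd;
        jd.set_attribute(saga::job::attributes::description_executable, "/bin/date");

        // The job service URL selects the backend (local fork, Globus GRAM, ...);
        // "fork://localhost" is only an illustrative choice.
        saga::job::service js(saga::url("fork://localhost"));

        saga::job::job j = js.create_job(jd);
        j.run();    // submit the job
        j.wait();   // block until it finishes

        std::cout << "job finished in state " << j.get_state() << std::endl;
    }
    catch (saga::exception const &e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

Swapping the service URL for a different middleware backend would leave the remaining calls unchanged, which is the uniformity the slide describes.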
11. Workflow one-slide summary
• Build distributed applications through orchestration of multiple services
– Allows larger applications to be composed from individual application components
– The components can be independent or connected by control-flow / data-flow dependencies
– Scaled-up execution over several computational resources
• Integration of the multiple teams involved (collaborative work)
• Unit of reuse: e-Science requires traceable, repeatable analyses
– Provide automation: reproducibility of scientific analyses and processes is at the core of the scientific method
– Support easy analysis modifications
– Sharing workflows is an essential element of education and of accelerating knowledge dissemination
– Allows capture and generation of provenance information
• Ease the use of grids: graphical representation
– Capture individual data transformation and analysis steps
NSF Workshop on the Challenges of Scientific Workflows, 2006, www.isi.edu/nsf-workflows06
Y. Gil, E. Deelman et al., Examining the Challenges of Scientific Workflows. IEEE Computer, 12/2007
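To make the orchestration idea concrete, the following self-contained sketch (plain C++, not tied to any particular workflow system; all task names and commands are invented) represents a workflow as tasks with data-flow dependencies and runs them in dependency order.

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

// A workflow task: a command to run and the tasks it depends on.
struct Task {
    std::string command;
    std::vector<std::string> depends_on;
};

int main()
{
    // Illustrative four-step analysis: two independent extractions feeding a merge and a plot.
    std::map<std::string, Task> wf = {
        {"extractA", {"extract --input a.dat --output a.tmp", {}}},
        {"extractB", {"extract --input b.dat --output b.tmp", {}}},
        {"merge",    {"merge a.tmp b.tmp --output merged.dat", {"extractA", "extractB"}}},
        {"plot",     {"plot merged.dat --output result.png",   {"merge"}}},
    };

    // Kahn's algorithm: run every task whose dependencies have all completed.
    std::map<std::string, int> pending;                          // unmet dependencies per task
    std::map<std::string, std::vector<std::string>> children;
    for (const auto& [name, task] : wf) {
        pending[name] = static_cast<int>(task.depends_on.size());
        for (const auto& dep : task.depends_on)
            children[dep].push_back(name);
    }

    std::queue<std::string> ready;
    for (const auto& [name, count] : pending)
        if (count == 0) ready.push(name);

    std::size_t executed = 0;
    while (!ready.empty()) {
        std::string name = ready.front();
        ready.pop();
        // A real WFMS would submit this to a grid resource; here we just trace it.
        std::cout << "running " << name << ": " << wf[name].command << "\n";
        ++executed;
        for (const auto& child : children[name])
            if (--pending[child] == 0) ready.push(child);
    }

    if (executed != wf.size())
        std::cerr << "cycle detected: workflow is not a DAG\n";
    return 0;
}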
12. Workflow
• The automation of a business process, in whole or part, during
which documents, information or tasks are passed from one
participant to another for action, according to a set of procedural
rules to achieve, or contribute to, an overall business goal.
Workflow Reference Model, 19/11/1998
www.wfmc.org
• A workflow management system (WFMS) is the software that does this
13. What does a typical Grid WFMS provide?
• A level of abstraction above grid processes
– gridftp, lcg-cr, lfc-mkdir, ...
– condor_submit, globus-job-run, glite-wms-job-submit, ...
– lcg-infosites, ...
• A level of abstraction above “legacy processes”
– SQL read/write
– HTTP file transfer, …
• Mapping and execution of tasks on grid resources
– Submission of jobs
– Invocation of (Web) services
– Manage data
– Catalog intermediate and final data products
• Improve successful application execution
• Improve application performance
• Provide provenance tracking capabilities
http://www.gridworkflow.org/
14. What does a typical Grid WFMS provide?
Abstract Workflow vs. Executable Workflow:
• Abstract workflow: describes your workflow at a logical level; site-independent; captures just the computation that the user wants to do
• Executable workflow: describes your workflow in terms of physical files and paths; site-specific; has additional jobs for data movement etc.
Source: Jia Yu and Rajkumar Buyya, A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, Volume 3, Numbers 3-4, September 2005
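A small illustrative sketch of what that mapping step adds, in plain C++ with a made-up replica catalog, paths and tool names (no specific planner is implied): a site-independent task over logical file names becomes a site-specific job plus extra data-movement jobs.

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    // Abstract (site-independent) task: computation over logical file names only.
    std::string abstract_task = "align";
    std::vector<std::string> logical_inputs = {"reads.fastq", "reference.fa"};

    // Replica catalog (illustrative): logical name -> physical URL at some site.
    std::map<std::string, std::string> replica_catalog = {
        {"reads.fastq",  "gsiftp://storage.siteA.example/data/reads.fastq"},
        {"reference.fa", "gsiftp://storage.siteA.example/data/reference.fa"},
    };

    // The executable workflow gains stage-in jobs plus the site-specific compute job.
    std::vector<std::string> executable_jobs;
    for (const auto& lfn : logical_inputs)
        executable_jobs.push_back("transfer " + replica_catalog[lfn] +
                                  " -> /scratch/job42/" + lfn);
    executable_jobs.push_back("/opt/tools/align /scratch/job42/reads.fastq "
                              "/scratch/job42/reference.fa > /scratch/job42/out.sam");

    std::cout << "abstract task: " << abstract_task << "\n";
    for (const auto& job : executable_jobs)
        std::cout << "  executable job: " << job << "\n";
    return 0;
}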
15. What does a typical workflow consist of?
• Dataflow graph
• Activities
– Definition of Jobs
– Specification of services
• Data channels
– Data transfer
– Coordination
• Acyclic (DAG) / cyclic
• Conditional statements
16. Workflow Lifecycle
[Diagram: the workflow lifecycle. Workflow creation draws on reuse and on workflow and component libraries to produce a Workflow Template, which can be adapted and modified; the template is populated with data (using data products and data/metadata catalogs) to give a Workflow Instance carrying data, metadata and provenance information; the instance is mapped to available resources (using resource and application component descriptions) to give an Executable Workflow; scheduling and execution then run it on distributed compute, storage and network resources.]
17. Data lifecycle in workflows
[Diagram: the data lifecycle in a workflow environment. Data discovery uses metadata catalogs; workflow creation and reuse draw on component libraries and workflow template libraries; workflow mapping and execution use data movement services, data replica catalogs and software catalogs; data processing produces derived data whose provenance is archived in provenance catalogs (derived data and provenance archival); the results feed data analysis setup.]
18. P-GRADE one-slide summary
• P-GRADE portal desiderata
– Hide the complexity of the underlying grid middlewares
– Provide a high-level graphical user interface that is easy to use for e-scientists
– Support many different grid programming approaches:
• Simple Scripts & Control (sequential and MPI job execution)
• Scientific Application Plug-ins (based on GEMLCA)
• Complex Workflows
• Parameter sweep applications: both on job and workflow level
• Interoperability: transparent access to grids based on different middleware technology
– Support three levels of parallelism
• History
– Started in the Hungarian SuperComputing Grid project in 2003
– http://portal.p-grade.hu/
– https://sourceforge.net/projects/pgportal/
20. Data access and integration
A researcher wants to obtain specified data from multiple distributed data sources, supply the result to a process and then view its output.
1 Researcher formulates query
2 Researcher submits query
3 Query system transforms and distributes query
4 Data services send back local results
5 Query system combines these to form requested data
6 Query system sends data to process
7 Process system sends derived data to researcher
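Steps 3 to 5 in miniature: a self-contained C++ sketch with two invented in-memory "data services"; a real query system would ship the transformed query to remote services rather than calling local functions.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// One row of an illustrative observation table held by a local data service.
struct Row {
    std::string instrument;
    int         orbit;
};

// A "data service" evaluates the (already transformed) local query and returns local results.
std::vector<Row> local_query(const std::vector<Row>& table, const std::string& instrument)
{
    std::vector<Row> out;
    for (const auto& r : table)
        if (r.instrument == instrument) out.push_back(r);
    return out;
}

int main()
{
    // Two distributed data sources (contents made up for the example).
    std::vector<Row> site_a = {{"RA2", 20498}, {"MERIS", 20499}, {"RA2", 20521}};
    std::vector<Row> site_b = {{"RA2", 20600}, {"ASAR", 20601}};

    // 3. The query system distributes the query; 4. each data service returns local results.
    auto part_a = local_query(site_a, "RA2");
    auto part_b = local_query(site_b, "RA2");

    // 5. Combine the partial results into the requested data set (here: concatenate and sort).
    std::vector<Row> combined;
    combined.insert(combined.end(), part_a.begin(), part_a.end());
    combined.insert(combined.end(), part_b.begin(), part_b.end());
    std::sort(combined.begin(), combined.end(),
              [](const Row& x, const Row& y) { return x.orbit < y.orbit; });

    // 6./7. The combined data would now be handed on to the downstream process.
    for (const auto& r : combined)
        std::cout << r.instrument << " orbit " << r.orbit << "\n";
    return 0;
}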
21. OGSA-DAI one-slide summary
• Enable the sharing of data resources to support:
– Data access - access to structured data in distributed heterogeneous
data resources.
– Data transformation e.g. expose data in schema X to users as data in
schema Y.
– Data integration e.g. expose multiple databases to users as a single
virtual database
– Data delivery - delivering data to where it's needed by the most
appropriate means e.g. web service, e-mail, HTTP, FTP, GridFTP
• History
– Started in February 2002 as part of the UK e-Science Grid Core
Program
– Part of OMII-UK, a partnership between:
• OMII, The University of Southampton
• myGrid, The University of Manchester
• OGSA-DAI, The University of Edinburgh
22. Motivation for Streaming
• Data movement is expensive
– Bandwidth on and off chip may be scarcest resource
• Streaming can avoid data movement
– Eliminating transfers to and from temporary stores
– Pushing selectivity and derivation towards data sources
– Earlier computation termination decisions
• Streaming can reduce elapsed time
– Pipelines of transformations overlap computation time
– When co-located can pass on data via caches
• Streaming is scalable
– Avoids locally assembling complete data sets
– Sometimes this cannot be avoided
• Some data sources and consumers inherently streamed
• Permits light-weight composition and requires optimisation
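A minimal sketch of the pipelining idea in plain C++ (the record type and stages are invented): records flow through the stages one at a time, selectivity is applied as early as possible, and the pipeline can terminate before the source is exhausted, so no complete intermediate data set is ever assembled.

#include <iostream>
#include <optional>
#include <vector>

int main()
{
    // A "source" that produces one record per call, or nothing when exhausted.
    std::vector<int> raw = {3, 17, 4, 42, 8, 23};
    std::size_t next = 0;
    auto source = [&]() -> std::optional<int> {
        if (next < raw.size()) return raw[next++];
        return std::nullopt;
    };

    // Two pipelined stages: a selection pushed towards the source and a derivation.
    auto select = [](int v) { return v >= 10; };    // keep only large readings
    auto derive = [](int v) { return v * 2; };      // compute a derived value

    // Stream records through the pipeline one at a time.
    int produced = 0;
    while (auto record = source()) {
        if (!select(*record)) continue;             // early selectivity
        std::cout << "derived value: " << derive(*record) << "\n";
        if (++produced == 2) break;                 // early termination decision
    }
    return 0;
}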
23. OGSA-DAI Generic web services
• Manipulate data using OGSA-DAI’s generic web services
• Clients see the data in its ‘raw’ format, e.g.
– Tables, columns, rows for relational data
– Collections, elements etc. for XML data
• Clients can obtain the schema of the data
• Clients send queries in the appropriate query language, e.g. SQL, XPath
[Diagram: clients send requests to OGSA-DAI, which returns data drawn from relational databases, XML databases and indexed files]
24. OGSA-DAI Workflows
• Pipeline, Sequence, Parallel workflows
• Composed of activities
• Reduces data transfers and web service calls
25. Metadata Management: A Satellite Scenario
[Diagram: the space segment (the satellite) and the ground segment exchange satellite files: DMOP files and product files]
27. Metadata can be present in file names…
Namefile (Product):
RA2_MW__1PNPDK20060201_120535_000000062044_00424_20518_0349.N1
Corresponds to: [table in the figure decoding the fields of the file name]
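A short sketch of extracting such metadata from the product name in plain C++; the field interpretation (a 10-character product identifier prefix followed by a yyyymmdd_hhmmss sensing date and time) is an assumption made for illustration rather than a statement of the full ESA naming convention.

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // The product file name from the slide: metadata is encoded directly in the name.
    std::string name =
        "RA2_MW__1PNPDK20060201_120535_000000062044_00424_20518_0349.N1";

    // Assumed layout for illustration: 10-character product identifier, then (after a
    // few processing/centre characters) an 8-digit date, "_", and a 6-digit time.
    std::regex pattern(R"(^([A-Z0-9_]{10}).*?(\d{8})_(\d{6}))");
    std::smatch m;
    if (std::regex_search(name, m, pattern)) {
        std::cout << "product id prefix: " << m[1].str() << "\n";
        std::cout << "sensing date:      " << m[2].str() << "\n";   // 20060201
        std::cout << "sensing time:      " << m[3].str() << "\n";   // 120535
    } else {
        std::cout << "file name does not match the expected pattern\n";
    }
    return 0;
}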
28. …and in file headers
[On the slide, callouts label the fhr record as the RECORD ID, the indented fields as RECORD parameters, and point out parameters corresponding to another RECORD structure.]
FILE ; DMOP (generated by FOS Mission Planning System)
RECORD fhr
FILENAME="DMOP_SOF__VFOS20060124_103709_00000000_00001215_20060131_014048_20060202_035846.N1"
DESTINATION="PDCC"
PHASE_START=2
CYCLE_START=44
REL_START_ORBIT=404
ABS_START_ORBIT=20498
ENDRECORD fhr
................................
RECORD dmop_er
RECORD dmop_er_gen_part
RECORD gen_event_params
EVENT_TYPE=RA2_MEA
EVENT_ID="RA2_MEA_00000000002063"
NB_EVENT_PR1=1
NB_EVENT_PR3=0
ORBIT_NUMBER=20521
ELAPSED_TIME=623635
DURATION=41627862
ENDRECORD gen_event_params
ENDRECORD dmop_er
ENDLIST all_dmop_er
ENDFILE
29. Metadata can be exposed
• Metadata deserves a better treatment
– In most cases it appears together with files or other resources
– It is difficult to deal with
– What about trying to query about all the files that deal with instrument X and where the information was taken from time T1 to T2?
Our goal:
Let’s make metadata a FIRST-CLASS CITIZEN in our systems
And let’s make it FLEXIBLE to changes
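The query from this slide, sketched over an invented in-memory metadata catalogue in plain C++: because the metadata is held as explicit, queryable records, the files themselves never have to be opened to answer "all files for instrument X taken between T1 and T2". All entries and property names are made up for illustration.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Metadata kept as explicit key/value records, separate from (and pointing at) the files.
struct MetadataRecord {
    std::string file;
    std::map<std::string, std::string> properties;   // e.g. instrument, start_time
};

int main()
{
    // Illustrative catalogue entries; times use a sortable yyyymmddhhmmss encoding.
    std::vector<MetadataRecord> catalogue = {
        {"orbit20498.N1", {{"instrument", "RA2"},   {"start_time", "20060124103709"}}},
        {"orbit20510.N1", {{"instrument", "MERIS"}, {"start_time", "20060128090000"}}},
        {"orbit20521.N1", {{"instrument", "RA2"},   {"start_time", "20060201120535"}}},
    };

    // "All the files that deal with instrument X and were taken from time T1 to T2."
    std::string instrument = "RA2";
    std::string t1 = "20060101000000", t2 = "20060131235959";

    for (const auto& rec : catalogue) {
        const auto& inst = rec.properties.at("instrument");
        const auto& time = rec.properties.at("start_time");
        if (inst == instrument && t1 <= time && time <= t2)
            std::cout << rec.file << " matches the query\n";
    }
    return 0;
}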
30. Metadata in Workflows
ID MURA_BACSU STANDARD; PRT; 429 AA.
DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE ENOLPYRUVYL TRANSFERASE) (EPT).
GN MURA OR MURZ.
OS BACILLUS SUBTILIS.
OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC BACILLUS.
KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT CONFLICT 374 374 S -> A (IN REF. 3).
SQ SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
31. Workflow Lifecycle
[Diagram: the workflow lifecycle from slide 16 is shown again, highlighting where data, metadata and provenance enter the cycle: workflow and component libraries feed workflow creation and reuse; data products and data/metadata catalogs populate the workflow template with data; the workflow instance carries data, metadata and provenance information; resource and application component descriptions drive mapping to available resources; the executable workflow is executed on distributed compute, storage and network resources.]
32. Metadata and workflows
• Metadata for describing workflow entities
– What is the value added of a given workflow?
– What is the task a given service performs?
– What are the services that can be associated with a
processor?
• Metadata for describing workflow provenance
– How did the execution of a given workflow go?
– What is the semantics of a data product?
– How many invocations of a given service failed?
33. Some metadata about a workflow
[Diagram: a scientific workflow annotated with metadata content: RDF annotations that reference Ontology1 and Ontology2, social tag annotations that reference a controlled vocabulary, and free-text annotations]
35. Metadata is everywhere
• We can attach metadata almost to anything
– Events, notifications, logs
– Services and resources
– Schemas and catalogue entries
– People, meetings, discussions, conference talks
– Scientific publications, recommendations, quality comments
– Models, codes, builds, workflows,
– Data files and data streams
– Sensors and sensor data
• But... what do we mean by metadata?
36. What is the metadata of this HTML fragment?
Based on Dublin Core
The contributor and creator is the flight booking service “www.flightbookings.com”.
The date would be January 1st, 2003, assuming that the HTML page was generated on that specific date.
The description would be something like “flight details for a travel between Madrid and Seattle via Chicago on February 8th, 2004”.
The document format is “HTML”.
The document language is “en”, which stands for English.
Based on thesauri
Madrid is a reference to the term with ID 7010413 in the
thesaurus, which refers to the city of Madrid in Spain.
Spain is a reference to the term with ID 1000095, which refers to
the kingdom of Spain in Europe.
Chicago is a reference to the term with ID 7013596, which refers
to the city of Chicago in Illinois, US.
United States of America is a reference to the term “United
States” with ID 7012149, which refers to the US nation.
Seattle is a reference to the term with ID 7014494, which refers
to the city of Seattle in Washington, US.
Based on ontologies
Concept instances relate a part of the document to one or several concepts in an ontology. For example, “Flight details” may
represent an instance of the concept Flight, and can be named as AA7615_Feb08_2003, although concept instances do not
necessarily have a name.
Attribute values relate a concept instance with part of the document, which is the value of one of its attributes. For example,
“American Airlines” can be the value of the attribute companyName.
Relation instances that relate two concept instances by some domain-specific relation. For example, the flight
AA7615_Feb08_2003 and the location Madrid can be connected by the relation departurePlace
37. Need to Add “Semantics”
• External agreement on meaning of annotations
– E.g., Dublin Core for annotation of library/bibliographic information
• Use Ontologies to specify meaning of annotations
– Ontologies provide a vocabulary of terms, plus
– a set of explicit assumptions regarding the intended meaning of the
vocabulary.
• Almost always including concepts and their classification
• Almost always including properties between concepts
• Similar to an object oriented model
– Meaning (semantics) of terms is formally specified
– Can also specify relationships between terms in multiple ontologies
• Thus, an ontology describes a formal specification of a certain
domain:
– Shared understanding of a domain of interest
– Formal and machine manipulable model of a domain of interest
39. Summary
• From the lower level of abstraction…
– Difficulties in developing, programming & deploying Grid applications using the existing Grid middleware
• To a higher level of abstraction:
– High-level APIs and metadata management
• Programmatic approaches that provide common grid functionality at the correct level of abstraction for applications
• Ability to hide underlying complexity of the infrastructure, varying semantics, heterogeneity and changes from the application developer
– Improved data access and integration mechanisms
– Workflow management
• Traceable, repeatable analyses of e-Science experiments
• Graphical modelling languages to ease Grid application development
40. Introduction to Workflows, APIs and Semantics
Session 37. July 13th, 2009
Oscar Corcho (Universidad Politécnica de Madrid)
Based on slides from all the presenters in the following two days
Work distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license