A Service-Oriented Architecture 
for Collaborative Workflow 
Development and 
Experimentation 
eHumanities Seminar 2012 
University of Leipzig 
10-10-2012 
Clemens Neudecker, KB @cneudecker 
Zeki Mustafa Dogan, SUB-DL 
Sven Schlarb, ÖNB @SvenSchlarb 
Juan Garcés, GCDH @juan_garces
Idea 
• Provide web-based versions of tools 
(web services) 
• Package web services, data and 
documentation into ready-to-run 
“components” (encapsulation) 
• Chain the components into workflows via drag and drop
• Share and use workflows to re-run 
experiments and to demonstrate results
Background 
• High diversity not only in research topics but also in the tools and frameworks used
• Technical resources should be easy to 
use, well documented, accessible from 
anywhere 
• Avoid reinventing the wheel
Requirements 
• Interoperability = connect different resources 
• Flexibility = easy to deploy and adapt 
• Modularity = allow different combinations of tools 
• Usability = simple to use for non-technical users 
• Re-usability = easy to share with others 
• Scalability = suitable for large-scale processing
• Sustainability = resources are simple to preserve
• Transparency = tools can be evaluated separately
• Distributed development and deployment
IMPACT Interoperability Framework (IIF)
• Modules: 
- Java Wrapper for command line tools 
- Web Services (incl. format converters) 
- Taverna Workflow Engine 
- Client interfaces 
- Repository connectors
Sources 
https://github.com/impactcentre/interoperability-framework
IIF Command Line Wrapper 
• Java project, built with Maven 2
• Creates a web service project from 
a given tool description (XML) 
• Web service exposes SOAP & REST 
endpoints and Java API interface 
• Requirements: the tool can be called from the command line and needs no direct user interaction
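The core of the wrapper, turning a declarative tool description into a concrete, non-interactive command-line call, can be sketched in a few lines of Python. This is not the IIF code: the XML layout (`<tool>`, `<command>`, `<param name default>`) is a hypothetical stand-in for the real tool description schema, ImageMagick's `convert` is only an example tool, and generating the SOAP/REST service project is out of scope.

```python
import shlex
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical tool description; the real IIF XML schema differs.
TOOL_XML = """
<tool name="convert">
  <command>convert ${input} -resize ${size} ${output}</command>
  <param name="input"/>
  <param name="output"/>
  <param name="size" default="50%"/>
</tool>
"""

def build_command(tool_xml: str, values: dict) -> list:
    """Substitute parameter values (or their defaults) into the command template."""
    root = ET.fromstring(tool_xml.strip())
    template = root.findtext("command").strip()
    params = {}
    for p in root.findall("param"):
        name = p.get("name")
        if name in values:
            params[name] = values[name]
        elif p.get("default") is not None:
            params[name] = p.get("default")
        else:
            raise ValueError(f"missing required parameter: {name}")
    # Split the template first so each substituted value stays one argument.
    args = []
    for token in shlex.split(template):
        if token.startswith("${") and token.endswith("}"):
            args.append(str(params[token[2:-1]]))
        else:
            args.append(token)
    return args

def run_tool(args: list) -> str:
    # Non-interactive call: no stdin, captured output, error on non-zero exit.
    result = subprocess.run(args, stdin=subprocess.DEVNULL,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `build_command(TOOL_XML, {"input": "in.tif", "output": "out.png"})` yields `["convert", "in.tif", "-resize", "50%", "out.png"]`, which `run_tool` would then execute. Building an argument list instead of a shell string avoids quoting problems with file names.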
IIF Web Services 
• Web services are described by a WSDL 
• Input/output data structures 
• Data is referenced by URL 
• Annotations 
• Default values
[Screenshots: REST and SOAP interfaces]
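Because input data is referenced by URL rather than uploaded, a REST call to such a service reduces to a request whose query string carries the input URL plus any parameters that override the defaults from the service description. A minimal sketch; the endpoint, parameter names and defaults are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical service description: parameter -> default (None = required).
SERVICE_PARAMS = {"inputUrl": None, "language": "deu", "outputFormat": "txt"}
ENDPOINT = "http://example.org/iif/ocr"  # placeholder endpoint

def build_request_url(endpoint: str, params: dict, **overrides) -> str:
    """Combine defaults and caller overrides into a REST request URL."""
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    query = {}
    for name, default in params.items():
        value = overrides.get(name, default)
        if value is None:
            raise ValueError(f"required parameter missing: {name}")
        query[name] = value
    # urlencode percent-encodes the input URL so it survives as one value.
    return f"{endpoint}?{urlencode(query)}"
```

Calling `build_request_url(ENDPOINT, SERVICE_PARAMS, inputUrl="http://example.org/scan.tif")` returns a URL that could be passed to `urllib.request.urlopen`; the service then fetches the image itself.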
IIF Workflows 
• What is a workflow? (Yahoo Pipes, etc.) 
• Different kinds of workflows: a single command, an application, or a chain of processes
• Main benefit: Encapsulation, Reuse 
• Workflows as “components”: include link 
to WS endpoint, sample input data and 
documentation = ready-to-use resource 
• Web 2.0 workflow registry: myExperiment
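The "component" idea (endpoint link, sample input and documentation bundled together) can be illustrated with a small in-process sketch in which components are checked on their own sample data and then chained. The field names and example components are hypothetical, and a plain Python function stands in for the remote web service call.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Component:
    name: str
    endpoint: str               # link to the web service (placeholder URL)
    documentation: str
    sample_input: str
    call: Callable[[str], str]  # stand-in for invoking the endpoint

    def self_test(self) -> str:
        """Run the component on its bundled sample input."""
        return self.call(self.sample_input)

def chain(components: List[Component]) -> Callable[[str], str]:
    """Compose components so that each output feeds the next input."""
    def workflow(data: str) -> str:
        for component in components:
            data = component.call(data)
        return data
    return workflow

tokenize = Component("tokenize", "http://example.org/ws/tokenize",
                     "Splits text on whitespace, one token per line.",
                     "a b  c", lambda text: "\n".join(text.split()))
count = Component("count", "http://example.org/ws/count",
                  "Counts lines.", "x\ny",
                  lambda text: str(len(text.splitlines())))
```

Here `chain([tokenize, count])("to be or not")` returns `"4"`, and each component's `self_test()` lets a user of the registry check it before reuse.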
Why workflows? 
• “In-silico experimentation” 
• Clear structure for the experiment setup:
– Challenge/Research question 
– Dataset definition 
– Processing with algorithms 
– Evaluation/Provenance 
– Presentation of results 
• All this can be modelled into a workflow
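Modelled as a workflow, an experiment becomes an ordered list of named steps whose inputs and outputs can be recorded, which is what makes a re-run comparable. A minimal sketch; the step names, toy data and the hash-based provenance record are assumptions for illustration, not Taverna's provenance model.

```python
import hashlib
import json
from typing import Any, Callable, List, Tuple

def digest(value: Any) -> str:
    """Short content hash so runs can be compared without storing all data."""
    data = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]

def run_experiment(steps: List[Tuple[str, Callable[[Any], Any]]], data: Any):
    """Run steps in order and record a provenance entry for each one."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({"step": name, "input": digest(data),
                           "output": digest(result)})
        data = result
    return data, provenance

# Hypothetical toy experiment: dataset definition, processing, evaluation.
steps = [
    ("dataset", lambda _: ["Leipzig", "Berlin", "Leipzig"]),
    ("process", lambda items: sorted(set(items))),
    ("evaluate", lambda items: {"distinct": len(items)}),
]
```

`run_experiment(steps, None)` returns `{"distinct": 2}` plus one provenance entry per step; running it again yields identical hashes, so a changed step would show up as a differing entry.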
Integration into Taverna 
• Web Services (SOAP and REST) 
• Command line tools (SH and SSH) 
• BeanShell scripts (can import Java libraries)
• R (statistics) 
• Excel, CSV 
• Additional service types can be added 
through dedicated plug-ins
Taverna flavours 
• Workbench – local GUI client for Linux, Windows and OS X
• Command line tool – run workflows from 
the command line 
• Server – Webapp with REST API and 
Java/Ruby client libs 
• Web-Wf-Designer – JavaScript version for designing workflows in a browser
[Screenshots: Workbench, Webapp, Workflow registry]
Client interfaces
• Web service client: creates a simple HTML form from a given web service description
• Taverna client: creates a simple HTML form from a given Taverna workflow description
→ integration into production and presentation environments via iframes
[Screenshots: WS-client, T2-client]
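Both clients boil down to emitting one input field per parameter, pre-filled with its default. A minimal sketch, assuming the web service or workflow description has already been parsed into (name, default) pairs, a hypothetical simplification of WSDL or Taverna workflow parsing; the action URL is a placeholder.

```python
from html import escape
from typing import List, Optional, Tuple

def build_form(action_url: str,
               params: List[Tuple[str, Optional[str]]]) -> str:
    """Return an HTML form with one text field per parameter."""
    lines = [f'<form method="get" action="{escape(action_url)}">']
    for name, default in params:
        value = "" if default is None else default
        required = " required" if default is None else ""
        lines.append(
            f'  <label>{escape(name)} '
            f'<input type="text" name="{escape(name)}" '
            f'value="{escape(value)}"{required}></label>'
        )
    lines.append('  <input type="submit" value="Run">')
    lines.append("</form>")
    return "\n".join(lines)
```

The output of `build_form("http://example.org/iif/ocr", [("inputUrl", None), ("language", "deu")])` can be served as a standalone page and embedded elsewhere with an iframe. Escaping every value keeps parameter names and defaults from breaking the markup.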
Repositories 
• Accessible via web service API 
– Fedora Commons 
– WebDAV 
– PRImA
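For the WebDAV case, listing a collection is a single `PROPFIND` request (defined in RFC 4918) with a `Depth: 1` header. The sketch below only builds the request with the standard library; the repository URL is a placeholder, and authentication, which a real repository connector would need, is omitted.

```python
import urllib.request

PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<d:propfind xmlns:d="DAV:">
  <d:prop><d:displayname/><d:getcontentlength/></d:prop>
</d:propfind>"""

def propfind_request(collection_url: str) -> urllib.request.Request:
    """Build a WebDAV PROPFIND request for a collection's direct members."""
    return urllib.request.Request(
        collection_url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": "1", "Content-Type": "application/xml"},
    )
```

Sending it with `urllib.request.urlopen(propfind_request("http://example.org/dav/books/"))` would return a `207 Multi-Status` XML document listing the members and the requested properties.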
Architecture
Examples 
• Use case 1: OCR (IMPACT) 
• Start: Images (scanned documents) 
• Processing: OCR, NLP, Evaluation 
• Result: Full text, Entities, Sentiments
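The evaluation step of an OCR workflow typically compares OCR output with a ground-truth transcription. One common measure, shown here as an illustration rather than the project's own evaluation tool, is the character error rate: the Levenshtein edit distance divided by the length of the ground truth.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)
```

For example, `character_error_rate("Lelpzig", "Leipzig")` is 1/7, since one character was misrecognised.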
Examples 
• Use case 2: Preservation (SCAPE) 
• Start: Document collection preparation 
• Processing: Hadoop, Hive 
• Result: Statistics
Reading image metadata
• Jp2PathCreator: find all JP2 images on the NAS (e.g. /NAS/Z119585409/00000001.jp2, /NAS/Z119585409/00000002.jp2, …); 1.4 GB path list, ~5 h
• HadoopStreamingExiftoolRead: read the image width of each page (e.g. Z119585409/00000001 2345, Z119585409/00000002 2340, …); 1.2 GB, ~38 h
• Total, reading files from the NAS: ~5 h + ~38 h = ~43 h for 60,000 books (24 million pages)
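The HadoopStreamingExiftoolRead step turns each JP2 path into a page ID and an image width. A minimal sketch of such a Hadoop Streaming style mapper, assuming ExifTool is installed and that `-s3 -ImageWidth` prints the bare width value; the width reader is injectable so the mapping logic can be checked without ExifTool.

```python
import subprocess
import sys
from pathlib import PurePosixPath
from typing import Callable, Iterable, Iterator

def page_id(path: str) -> str:
    """/NAS/Z119585409/00000001.jp2 -> Z119585409/00000001"""
    p = PurePosixPath(path.strip())
    return f"{p.parent.name}/{p.stem}"

def exiftool_width(path: str) -> int:
    # Assumption: ExifTool is on the PATH; -s3 prints only the tag value.
    out = subprocess.run(["exiftool", "-s3", "-ImageWidth", path],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())

def mapper(lines: Iterable[str],
           read_width: Callable[[str], int] = exiftool_width) -> Iterator[str]:
    """One path per input line in, 'page_id<TAB>width' out."""
    for line in lines:
        path = line.strip()
        if path:
            yield f"{page_id(path)}\t{read_width(path)}"

def main() -> None:
    """Entry point for the script that Hadoop Streaming runs as the mapper."""
    for record in mapper(sys.stdin):
        print(record)
```

In a real job the script would end by calling `main()` and be passed to Hadoop Streaming via its `-mapper` option, with the path list from the previous step as input; the call is left out here so the functions can be tested in isolation.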
Sequence file creation
• HtmlPathCreator: find all HTML (hOCR) files on the NAS (e.g. /NAS/Z119585409/00000707.html, /NAS/Z119585409/00000708.html, …); 1.4 GB path list, ~5 h
• SequenceFileCreator: pack the files into a Hadoop SequenceFile keyed by page ID (e.g. Z119585409/00000707, Z119585409/00000708, …); 997 GB (uncompressed), ~24 h
• Total, reading files from the NAS: ~5 h + ~24 h = ~29 h for 60,000 books (24 million pages)
HTML parsing
• HadoopAvBlockWidthMapReduce: SequenceFile in, text file out; ~6 h for 60,000 books (24 million pages)
• Map: emit one (page ID, block width) pair per text block, e.g.
Z119585409/00000001 2100
Z119585409/00000001 2200
Z119585409/00000001 2300
Z119585409/00000001 2400
…
• Reduce: average block width per page, e.g.
Z119585409/00000001 2250
Z119585409/00000002 2250
Z119585409/00000003 2250
…
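The map and reduce steps of HadoopAvBlockWidthMapReduce (per-block widths in, one average per page out) can be sketched in plain Python. Two assumptions are labelled here: blocks are taken to be hOCR elements of class `ocr_carea` with a `bbox` in their `title`, and the average is rounded to an integer. In the real job Hadoop sorts and groups the map output by key; `groupby` on pre-sorted pairs imitates that.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# Assumption: a "block" is an hOCR element of class ocr_carea whose title
# attribute starts with "bbox x0 y0 x1 y1".
BLOCK = re.compile(
    r"""class=["']ocr_carea["'][^>]*?title=["']bbox (\d+) (\d+) (\d+) (\d+)""")

def map_page(page_id: str, hocr: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (page id, block width) for every block on the page."""
    for x0, _y0, x1, _y1 in BLOCK.findall(hocr):
        yield page_id, int(x1) - int(x0)

def reduce_average(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce: average block width per page (pairs must be sorted by key)."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        widths = [w for _, w in group]
        yield key, round(sum(widths) / len(widths))
```

With the widths from the slide (2100, 2200, 2300, 2400 for page Z119585409/00000001), `reduce_average` yields 2250 for that page.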
Analytic Queries
• HiveLoadExifData & HiveLoadHocrData: load the image widths (jp2width) and the average block widths (htmlwidth) into Hive tables; ~6 h for 60,000 books (24 million pages)

CREATE TABLE htmlwidth (hid STRING, hwidth INT);
CREATE TABLE jp2width (jid STRING, jwidth INT);

htmlwidth
hid                  hwidth
Z119585409/00000001  1870
Z119585409/00000002  2100
Z119585409/00000003  2015
Z119585409/00000004  1350
Z119585409/00000005  1700

jp2width
jid                  jwidth
Z119585409/00000001  2250
Z119585409/00000002  2150
Z119585409/00000003  2125
Z119585409/00000004  2125
Z119585409/00000005  2250
Analytic Queries
• HiveSelect: join image width and average block width per page; ~6 h for 60,000 books (24 million pages)

SELECT jid, jwidth, hwidth FROM jp2width INNER JOIN htmlwidth ON jid = hid;

jid                  jwidth  hwidth
Z119585409/00000001  2250    1870
Z119585409/00000002  2150    2100
Z119585409/00000003  2125    2015
Z119585409/00000004  2125    1350
Z119585409/00000005  2250    1700
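The join can be tried out locally without a Hadoop cluster. This sketch uses Python's built-in SQLite as a stand-in for Hive (the SQL is close but not identical; for instance, `STRING` becomes `TEXT`) and loads the sample rows shown on the slides.

```python
import sqlite3

# Sample rows from the slides.
JP2 = [("Z119585409/00000001", 2250), ("Z119585409/00000002", 2150),
       ("Z119585409/00000003", 2125), ("Z119585409/00000004", 2125),
       ("Z119585409/00000005", 2250)]
HTML = [("Z119585409/00000001", 1870), ("Z119585409/00000002", 2100),
        ("Z119585409/00000003", 2015), ("Z119585409/00000004", 1350),
        ("Z119585409/00000005", 1700)]

def join_widths():
    """Join image width and average block width per page in an in-memory DB."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE jp2width (jid TEXT, jwidth INT)")
    con.execute("CREATE TABLE htmlwidth (hid TEXT, hwidth INT)")
    con.executemany("INSERT INTO jp2width VALUES (?, ?)", JP2)
    con.executemany("INSERT INTO htmlwidth VALUES (?, ?)", HTML)
    rows = con.execute(
        "SELECT jid, jwidth, hwidth FROM jp2width "
        "INNER JOIN htmlwidth ON jid = hid ORDER BY jid").fetchall()
    con.close()
    return rows
```

`join_widths()` returns the five rows of the result table, starting with `("Z119585409/00000001", 2250, 1870)`.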
Examples 
• Use case 3: Curation (GDZ) 
• Start: Get documents from repository 
• Processing: Enrichment 
(OCR, Entities, GeoNames) 
• Result: Online presentation
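As an illustration of the GeoNames enrichment step, a place name found in the OCR text can be looked up through the GeoNames `searchJSON` web service. The sketch only builds the request URL and parses a response; the service requires a registered account name (a placeholder here), and picking the first hit is a simplifying assumption rather than a disambiguation strategy.

```python
import json
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"

def geonames_query_url(place: str, username: str, max_rows: int = 1) -> str:
    """Build a GeoNames search URL for a place name."""
    return GEONAMES_SEARCH + "?" + urlencode(
        {"q": place, "maxRows": max_rows, "username": username})

def first_match(response_text: str):
    """Return (name, geonameId, lat, lng) of the first hit, or None."""
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    return top["name"], top["geonameId"], float(top["lat"]), float(top["lng"])
```

`geonames_query_url("Göttingen", "your-account")` percent-encodes the umlaut as UTF-8, so historical place names with diacritics can be passed through unchanged.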
ROPEN 
(= Resource Oriented Presentation ENvironment)
Scalability 
• Multiple options: 
- Service parallelization 
- Cloud 
- Grid 
- Hadoop
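Service parallelization, the simplest of these options, only requires that inputs be independent, so that each page or document can be sent to the service in its own request. A minimal sketch using a thread pool (threads suit I/O-bound web service calls); `call_service` is a placeholder for the actual client call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

def process_parallel(call_service: Callable[[str], str],
                     input_urls: Iterable[str],
                     max_workers: int = 8) -> List[str]:
    """Send independent inputs to a service concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_service, input_urls))
```

`pool.map` returns results in input order even though the calls overlap, so downstream workflow steps see the same ordering as in a sequential run.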
Compatibility 
• Taverna ↔ UIMA
• Taverna ↔ Galaxy
• Taverna ↔ Kepler
• Taverna ↔ WebLicht
• Taverna ↔ SEASR
But… 
• Multi-layered approach increases 
complexity (debugging, maintenance) 
• Diverse set of endpoints (OS, CPU, etc.) 
• Multiple dependencies 
• Shared responsibilities 
• Authentication & Authorization 
• Error handling / Fail-over / Monitoring
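Part of the error handling and fail-over burden can be taken on by the client. A minimal sketch, assuming several interchangeable endpoints for the same service (hypothetical) and treating every exception as transient; monitoring and authentication are not covered.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def call_with_failover(endpoints: Sequence[str],
                       call: Callable[[str], T],
                       retries_per_endpoint: int = 2,
                       backoff_seconds: float = 0.5) -> T:
    """Try each endpoint in turn, retrying failures with exponential backoff."""
    errors = []
    for endpoint in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return call(endpoint)
            except Exception as exc:  # in practice, catch specific errors
                errors.append(f"{endpoint} attempt {attempt + 1}: {exc}")
                if attempt + 1 < retries_per_endpoint:
                    time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all endpoints failed:\n" + "\n".join(errors))
```

Collecting every error message before giving up keeps the failure report useful for debugging, one of the pain points listed above for a multi-layered setup.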
Demo(s)
Discussion 
• Potential use cases in DH?
• Tools/features to make available? 
• Questions, comments or remarks?
Thank you!

 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 

Recently uploaded (20)

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 

Collaborative Workflow Development and Experimentation in the Digital Humanities

  • 1. A Service-Oriented Architecture for Collaborative Workflow Development and Experimentation eHumanities Seminar 2012 University of Leipzig 10-10-2012 Clemens Neudecker, KB @cneudecker Zeki Mustafa Dogan, SUB-DL Sven Schlarb, ÖNB @SvenSchlarb Juan Garcés, GCDH @juan_garces
  • 2. Idea • Provide web-based versions of tools (web services) • Package web services, data and documentation into ready-to-run “components” (encapsulation) • Chain the components to create workflows via drag-and-drop operation • Share and use workflows to re-run experiments and to demonstrate results
  • 3. Background • High diversity not only in research topics but also in the tools and frameworks used • Technical resources should be easy to use, well documented and accessible from anywhere • Avoid reinventing the wheel
  • 4. Requirements • Interoperability = connect different resources • Flexibility = easy to deploy and adapt • Modularity = allow different combinations of tools • Usability = simple to use for non-technical users • Reusability = easy to share with others • Scalability = suitable for large-scale processing • Sustainability = resources that are simple to preserve • Transparency = tools can be evaluated separately • Distributed development and deployment
  • 5. IMPACT Interoperability Framework (IIF) • Modules: - Java wrapper for command line tools - Web services (incl. format converters) - Taverna workflow engine - Client interfaces - Repository connectors
  • 7. IIF Command Line Wrapper • Java project, built with Maven 2 • Generates a web service project from an XML tool description • The web service exposes SOAP and REST endpoints and a Java API • Requirements: the tool must be callable from the command line and must not require direct user interaction
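A minimal Python sketch of the wrapper idea, assuming a made-up XML vocabulary (tool, command, param with position, flag and default attributes) rather than the real IIF tool-description schema. It only turns a request into an argument list and runs it without user interaction; generating the SOAP/REST project with Maven is not shown, and tesseract serves purely as an example command.

```python
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical tool description; element and attribute names are made up
# for this sketch and are not the IIF schema.
TOOL_XML = """
<tool name="tesseract">
  <command>tesseract</command>
  <param name="input" position="1"/>
  <param name="output" position="2"/>
  <param name="lang" flag="-l" default="eng"/>
</tool>
"""


def build_command(tool_xml: str, values: dict) -> list:
    """Turn a tool description plus request values into an argv list."""
    root = ET.fromstring(tool_xml)
    argv = [root.findtext("command").strip()]
    positional, options = [], []
    for param in root.findall("param"):
        name = param.get("name")
        value = values.get(name, param.get("default"))
        if value is None:
            raise ValueError(f"missing required parameter: {name}")
        if param.get("position") is not None:
            positional.append((int(param.get("position")), value))
        elif param.get("flag") is not None:
            options.extend([param.get("flag"), value])
        else:
            raise ValueError(f"parameter {name} needs a position or a flag")
    # Positional arguments in declared order, then flagged options.
    argv.extend(value for _, value in sorted(positional))
    argv.extend(options)
    return argv


def run_tool(argv: list) -> str:
    """Run non-interactively: stdin is /dev/null, so the tool cannot wait for input."""
    result = subprocess.run(
        argv, stdin=subprocess.DEVNULL, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, build_command(TOOL_XML, {"input": "page.tif", "output": "page"}) yields ["tesseract", "page.tif", "page", "-l", "eng"], with the language taken from the declared default.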
  • 12. IIF Web Services • Web services are described by a WSDL • Input/output data structures • Data is referenced by URL • Annotations • Default values
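To illustrate "data is referenced by URL" and "default values", the sketch below merges request values with defaults and builds a REST GET URL in which the input document is passed as a link for the service to fetch rather than uploaded. The endpoint and the parameter names (inputImageUrl, language, outputFormat) are hypothetical and not taken from an actual IIF WSDL.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical service description: parameter -> default (None means required).
OCR_SERVICE = {
    "endpoint": "http://example.org/iif/rest/ocr",
    "params": {"inputImageUrl": None, "language": "eng", "outputFormat": "txt"},
    "by_reference": {"inputImageUrl"},  # parameters that carry a data URL
}


def build_request(service: dict, values: dict) -> str:
    """Merge request values with defaults and return a REST GET URL."""
    query = {}
    for name, default in service["params"].items():
        value = values.get(name, default)
        if value is None:
            raise ValueError(f"missing required parameter: {name}")
        if name in service["by_reference"] and urlparse(value).scheme not in ("http", "https"):
            # The service fetches the data itself, so it needs a resolvable URL.
            raise ValueError(f"{name} must be an http(s) URL")
        query[name] = value
    return service["endpoint"] + "?" + urlencode(query)
```

Passing data by reference keeps requests small and lets the same input URL be reused across several services in a workflow.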
  • 13. REST
  • 14. SOAP
  • 15. IIF Workflows • What is a workflow? (Yahoo Pipes, etc.) • Different kinds of workflows: for a single command, a whole application or a chain of processes • Main benefits: encapsulation, reuse • Workflows as “components”: include a link to the WS endpoint, sample input data and documentation = ready-to-use resource • Web 2.0 workflow registry: myExperiment
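A sketch of a workflow "component" as described above: a callable bundled with its endpoint, sample input and documentation, plus a chain helper that composes components left to right. The endpoint URLs are placeholders and the two components are local stand-ins for real web service calls, so this shows the encapsulation idea rather than how Taverna represents workflows.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """A ready-to-use unit: service call plus sample data and documentation."""
    name: str
    endpoint: str  # placeholder URL, kept for documentation only
    sample_input: str
    doc: str
    run: Callable[[str], str]

    def self_test(self) -> str:
        # Running on the bundled sample shows the component works as shipped.
        return self.run(self.sample_input)


def chain(components: List[Component]) -> Callable[[str], str]:
    """Compose components left to right into a single workflow."""
    def workflow(data: str) -> str:
        for component in components:
            data = component.run(data)
        return data
    return workflow


# Local stand-ins for remote services (hypothetical).
clean = Component("clean", "http://example.org/ws/clean", "  Die   Zeitung ",
                  "Collapse runs of whitespace.", lambda text: " ".join(text.split()))
tokenize = Component("tokenize", "http://example.org/ws/tokenize", "Die Zeitung",
                     "Split into tokens joined by '|'.", lambda text: "|".join(text.split()))
pipeline = chain([clean, tokenize])
```

Because each component carries its own sample input, a user who finds it in a registry can check that it runs before wiring it into a longer chain.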
  • 17. Why workflows? • “In-silico experimentation” • Clear structure for the experiment setup: – Challenge/research question – Dataset definition – Processing with algorithms – Evaluation/provenance – Presentation of results • All of this can be modelled as a workflow
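The experiment structure above maps onto a sequence of named steps. This hypothetical sketch runs such steps in order and records a provenance entry for each one (step name plus short SHA-256 hashes of its input and output), so a re-run can be compared with an earlier run. The hashing scheme is an assumption made for illustration, not Taverna's provenance model.

```python
import hashlib
import json
from typing import Any, Callable, List, Tuple


def digest(value: Any) -> str:
    """Short content hash identifying a JSON-serialisable input or output."""
    data = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]


def run_experiment(dataset: Any, steps: List[Tuple[str, Callable[[Any], Any]]]):
    """Run named steps in order and record one provenance entry per step."""
    provenance = []
    data = dataset
    for name, step in steps:
        result = step(data)
        provenance.append({"step": name, "input": digest(data), "output": digest(result)})
        data = result
    return data, provenance
```

Since each step's output hash equals the next step's input hash, the log doubles as a simple check that a shared workflow was re-run on the same data.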
  • 18. Integration into Taverna • Web services (SOAP and REST) • Command line tools (local shell and SSH) • BeanShell scripts (can import Java libraries) • R (statistics) • Excel, CSV • Additional service types can be added through dedicated plug-ins
  • 19. Taverna flavours • Workbench – local GUI client for Linux, Windows, OS X • Command line tool – runs workflows from the command line • Server – web app with a REST API and Java/Ruby client libraries • Web-Wf-Designer – JavaScript version for designing workflows in a browser
  • 23. Client interfaces • Web service client: creates a simple HTML form from a given web service description • Taverna client: creates a simple HTML form from a given Taverna workflow description • The forms can be integrated into production and presentation environments via iframes
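A minimal sketch of generating an HTML form from a service description and embedding it elsewhere via an iframe. The (name, default) parameter list stands in for what a real client would read from a WSDL or a Taverna workflow description; the function names and URLs are illustrative only.

```python
from html import escape
from typing import List, Optional, Tuple


def build_form(action: str, params: List[Tuple[str, Optional[str]]]) -> str:
    """Render a minimal HTML form with one text field per service parameter."""
    lines = [f'<form method="get" action="{escape(action)}">']
    for name, default in params:
        value = "" if default is None else escape(str(default))
        lines.append(f'  <label>{escape(name)} <input name="{escape(name)}" value="{value}"></label>')
    lines.append('  <button type="submit">Run</button>')
    lines.append("</form>")
    return "\n".join(lines)


def embed(src: str, height: int = 400) -> str:
    """Snippet for embedding a hosted form in another page via an iframe."""
    return f'<iframe src="{escape(src)}" width="100%" height="{int(height)}"></iframe>'
```

Escaping every interpolated value keeps defaults or parameter names taken from a service description from injecting markup into the generated page.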
  • 26. Repositories • Accessible via web service API – Fedora Commons – WebDAV – PRImA
  • 28. Examples • Use case 1: OCR (IMPACT) • Start: Images (scanned documents) • Processing: OCR, NLP, Evaluation • Result: Full text, Entities, Sentiments
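The slide does not say which evaluation measure was used. As an assumption, a common choice for OCR is the character error rate (CER): the Levenshtein edit distance between ground truth and OCR output, divided by the length of the ground truth. A plain-Python sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """CER = edit distance / number of ground-truth characters."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)
```

For instance, reading "Zeitung" as "Zeituug" is one substitution in seven characters, a CER of about 0.14.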
  • 33. Examples • Use case 2: Preservation (SCAPE) • Start: Document collection preparation • Processing: Hadoop, Hive • Result: Statistics
  • 36. Reading image metadata • Jp2PathCreator (find): lists the JP2 files on the NAS (/NAS/Z119585409/00000001.jp2, /NAS/Z119585409/00000002.jp2, /NAS/Z119585409/00000003.jp2, … /NAS/Z117655409/00000001.jp2, …) • HadoopStreamingExiftoolRead: reads the image width of each page (Z119585409/00000001 2345, Z119585409/00000002 2340, Z119585409/00000003 2543, … Z117655409/00000001 2300, …) • Reading files from NAS: 1.4 GB, 1.2 GB • Runtime: ~5 h + ~38 h = ~43 h • 60,000 books, 24 million pages
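A sketch of what a Hadoop Streaming mapper in the spirit of HadoopStreamingExiftoolRead might look like: it reads JP2 paths from standard input and emits tab-separated "book/page width" records in the key format shown on the slide. Reading the width with exiftool -s3 -ImageWidth is an assumption about how the original step worked; the width reader is injectable so the key logic can be tested without exiftool installed.

```python
import posixpath
import subprocess
import sys
from typing import Callable, Iterable, Iterator


def page_key(path: str) -> str:
    """'/NAS/Z119585409/00000001.jp2' -> 'Z119585409/00000001'."""
    book = posixpath.basename(posixpath.dirname(path))
    page = posixpath.splitext(posixpath.basename(path))[0]
    return f"{book}/{page}"


def exiftool_width(path: str) -> int:
    """Read the image width; -s3 makes exiftool print the bare value."""
    out = subprocess.run(["exiftool", "-s3", "-ImageWidth", path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def map_widths(lines: Iterable[str],
               read_width: Callable[[str], int] = exiftool_width) -> Iterator[str]:
    """Emit 'key<TAB>width' for every non-empty path line."""
    for line in lines:
        path = line.strip()
        if path:
            yield f"{page_key(path)}\t{read_width(path)}"


def main() -> None:
    """Entry point when used as a streaming mapper (paths arrive on stdin)."""
    for record in map_widths(sys.stdin):
        print(record)
```

On a cluster, a script that calls main() would be passed as the -mapper of a streaming job over the path list; Hadoop Streaming treats the text before the first tab as the key by default.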
  • 38. Sequence file creation • HtmlPathCreator (find): lists the hOCR HTML files on the NAS (/NAS/Z119585409/00000707.html, /NAS/Z119585409/00000708.html, /NAS/Z119585409/00000709.html, … /NAS/Z138682341/00000707.html, …) • SequenceFileCreator: writes the pages into a SequenceFile keyed by page ID (Z119585409/00000707, Z119585409/00000708, Z119585409/00000709, …) • Reading files from NAS: 1.4 GB, 997 GB (uncompressed) • Runtime: ~5 h + ~24 h = ~29 h • 60,000 books, 24 million pages
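Packing millions of small HTML files into a few large files avoids the per-file overhead that HDFS and MapReduce incur with many small inputs, which is the purpose of the SequenceFile step. The sketch below illustrates the principle with a simple length-prefixed key/value container of its own; it is not the Hadoop SequenceFile binary format, which in practice is written through Hadoop's own libraries.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List, Tuple

HEADER = struct.Struct(">II")  # key length, value length (big-endian uint32)


def write_records(stream: BinaryIO, records: Iterable[Tuple[str, bytes]]) -> int:
    """Append (key, value) records; returns how many were written."""
    count = 0
    for key, value in records:
        key_bytes = key.encode("utf-8")
        stream.write(HEADER.pack(len(key_bytes), len(value)))
        stream.write(key_bytes)
        stream.write(value)
        count += 1
    return count


def read_records(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield records back in the order they were written."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, value_len = HEADER.unpack(header)
        key_bytes = stream.read(key_len)
        value = stream.read(value_len)
        if len(key_bytes) != key_len or len(value) != value_len:
            raise ValueError("truncated record body")
        yield key_bytes.decode("utf-8"), value


def pack(records: Iterable[Tuple[str, bytes]]) -> bytes:
    buffer = io.BytesIO()
    write_records(buffer, records)
    return buffer.getvalue()


def unpack(data: bytes) -> List[Tuple[str, bytes]]:
    return list(read_records(io.BytesIO(data)))
```

Keying each record by book/page ID means later jobs never need the original file paths.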
  • 40. HTML parsing • HadoopAvBlockWidthMapReduce reads the SequenceFile (keys Z119585409/00000001 … Z119585409/00000005, …) • Map: emits the width of each text block per page (Z119585409/00000001 2100, 2200, 2300, 2400; Z119585409/00000002 2100, 2200, 2300, 2400; …) • Reduce: average block width per page (Z119585409/00000001 2250, Z119585409/00000002 2250, … Z119585409/00000005 2250), written to a text file • Runtime: ~6 h • 60,000 books, 24 million pages
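The map and reduce logic of HadoopAvBlockWidthMapReduce can be sketched in Python as follows. It assumes the pages are hOCR and that a "block" is an element with class ocr_carea whose title carries "bbox x0 y0 x1 y1", so its width is x1 minus x0; the original job may have used a different element class. In Hadoop the grouping by key happens in the shuffle; here reduce_average groups in memory for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, Iterable, Iterator, List, Tuple

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class BlockWidthParser(HTMLParser):
    """Collect widths of hOCR elements with class 'ocr_carea' (assumed block class)."""

    def __init__(self) -> None:
        super().__init__()
        self.widths: List[int] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        match = BBOX.search(attributes.get("title") or "")
        if "ocr_carea" in classes and match:
            x0, _, x1, _ = map(int, match.groups())
            self.widths.append(x1 - x0)


def map_page(key: str, html: str) -> Iterator[Tuple[str, int]]:
    """Map step: one (page key, block width) pair per block."""
    parser = BlockWidthParser()
    parser.feed(html)
    parser.close()
    for width in parser.widths:
        yield key, width


def reduce_average(pairs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Reduce step: average block width per page key."""
    totals = defaultdict(lambda: [0, 0])
    for key, width in pairs:
        totals[key][0] += width
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

With the four block widths shown on the slide (2100, 2200, 2300, 2400), the reducer yields 2250 for that page.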
  • 42. Analytic Queries • HiveLoadExifData & HiveLoadHocrData load the results into two Hive tables • CREATE TABLE htmlwidth (hid STRING, hwidth INT) • CREATE TABLE jp2width (jid STRING, jwidth INT) • htmlwidth (hid, hwidth): Z119585409/00000001 1870, Z119585409/00000002 2100, Z119585409/00000003 2015, Z119585409/00000004 1350, Z119585409/00000005 1700 • jp2width (jid, jwidth): Z119585409/00000001 2250, Z119585409/00000002 2150, Z119585409/00000003 2125, Z119585409/00000004 2125, Z119585409/00000005 2250 • Runtime: ~6 h • 60,000 books, 24 million pages
  • 43. Analytic Queries • HiveSelect joins the two tables: select jid, jwidth, hwidth from jp2width inner join htmlwidth on jid = hid • Result (jid, jwidth, hwidth): Z119585409/00000001 2250 1870, Z119585409/00000002 2150 2100, Z119585409/00000003 2125 2015, Z119585409/00000004 2125 1350, Z119585409/00000005 2250 1700 • Runtime: ~6 h • 60,000 books, 24 million pages
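The HiveQL join on slide 43 is plain SQL, so its effect can be reproduced locally with Python's built-in sqlite3 module and the sample rows shown on slides 42 and 43. SQLite's TEXT/INTEGER types stand in for Hive's STRING/INT, and the ORDER BY is added only to make the output deterministic; this is a way to check the query logic, not a substitute for running it on Hive.

```python
import sqlite3

# Sample rows as shown on slides 42 and 43.
JP2_ROWS = [
    ("Z119585409/00000001", 2250), ("Z119585409/00000002", 2150),
    ("Z119585409/00000003", 2125), ("Z119585409/00000004", 2125),
    ("Z119585409/00000005", 2250),
]
HTML_ROWS = [
    ("Z119585409/00000001", 1870), ("Z119585409/00000002", 2100),
    ("Z119585409/00000003", 2015), ("Z119585409/00000004", 1350),
    ("Z119585409/00000005", 1700),
]


def join_widths(jp2_rows, html_rows):
    """Run the slide's inner join on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE jp2width (jid TEXT, jwidth INTEGER)")
        conn.execute("CREATE TABLE htmlwidth (hid TEXT, hwidth INTEGER)")
        conn.executemany("INSERT INTO jp2width VALUES (?, ?)", jp2_rows)
        conn.executemany("INSERT INTO htmlwidth VALUES (?, ?)", html_rows)
        return conn.execute(
            "SELECT jid, jwidth, hwidth FROM jp2width "
            "INNER JOIN htmlwidth ON jid = hid ORDER BY jid"
        ).fetchall()
    finally:
        conn.close()
```

An inner join keeps only pages present in both tables, so pages missing either an image width or an hOCR width silently drop out of the comparison.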
  • 44. Examples • Use case 3: Curation (GDZ) • Start: Get documents from repository • Processing: Enrichment (OCR, Entities, GeoNames) • Result: Online presentation
  • 52. ROPEN (= Resource Oriented Presentation ENvironment)
  • 53. Scalability • Multiple options: - Service parallelization - Cloud - Grid - Hadoop
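Service parallelization can be as simple as issuing several independent service calls at once. A minimal sketch with a thread pool, assuming call_service is any function that sends one input to a web service and returns its result; threads suit this I/O-bound case, and Executor.map returns results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_parallel(call_service: Callable[[str], str],
                     inputs: Iterable[str], workers: int = 4) -> List[str]:
    """Send inputs to a service concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_service, inputs))
```

The worker count is the main knob: it should match what the remote service can handle, since more parallel requests than the server accepts only adds queueing.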
  • 54. Compatibility • Taverna with UIMA, Galaxy, Kepler, WebLicht and SEASR
  • 55. But… • Multi-layered approach increases complexity (debugging, maintenance) • Diverse set of endpoints (OS, CPU, etc.) • Multiple dependencies • Shared responsibilities • Authentication & Authorization • Error handling / Fail-over / Monitoring
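One way to address the error handling and fail-over point is to retry a call a few times and then move on to an alternative endpoint. A hedged sketch, assuming mirrored endpoints that offer the same service and that network failures surface as OSError (which covers ConnectionError and TimeoutError); monitoring is reduced to a log callback.

```python
import time
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_failover(
    call: Callable[[str], T],
    endpoints: Sequence[str],
    attempts_per_endpoint: int = 2,
    delay_seconds: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    log: Callable[[str], None] = print,
) -> T:
    """Try each endpoint a few times, in order, before giving up."""
    errors: List[str] = []
    for endpoint in endpoints:
        for attempt in range(1, attempts_per_endpoint + 1):
            try:
                return call(endpoint)
            except retry_on as exc:
                errors.append(f"{endpoint} (attempt {attempt}): {exc!r}")
                log(errors[-1])  # hook for monitoring
                if attempt < attempts_per_endpoint:
                    time.sleep(delay_seconds * attempt)  # linear back-off
    raise RuntimeError("all endpoints failed:\n" + "\n".join(errors))
```

Only the listed exception types trigger a retry, so programming errors in the caller still fail immediately instead of being masked by fail-over.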
  • 57. Discussion • Potential and use cases in DH? • Which tools/features should be made available? • Questions, comments or remarks?