A Service-Oriented Architecture for Collaborative Workflow Development and Experimentation in the Digital Humanities
2012 Leipzig eHumanities Seminar, 10 October 2012, Leipzig, Germany.
Building scalable REST service using Akka HTTP (datamantra)
Akka HTTP is a toolkit for building scalable REST services in Scala. It provides a high-level API built on top of Akka actors and Akka Streams for writing asynchronous, non-blocking and resilient microservices. The document discusses Akka HTTP's architecture, routing DSL, directives, and testing, plus additional features like file uploads and WebSockets. It also compares Akka HTTP to other Scala frameworks and outlines the pros and cons of using Akka HTTP for building REST APIs.
Journey into Reactive Streams and Akka Streams (Kevin Webber)
Are streams just collections? What's the difference between Java 8 streams and Reactive Streams? How do I implement Reactive Streams with Akka? Pub/sub, dynamic push/pull, non-blocking, non-dropping; these are some of the other concepts covered. We'll also discuss how to leverage streams in a real-world application.
Spark is used to perform in-memory transformations on customer data collected by Totango to generate analytics and insights. Luigi is used as a workflow engine to manage dependencies between batch processing tasks like metrics generation, health scoring, and alerting. The tasks are run on Spark and output to S3. A custom Gameboy controller provides monitoring and management of the Luigi workflow.
Using Riak for Events storage and analysis at Booking.com (Damien Krotkine)
At Booking.com, we have a constant flow of events coming from various applications and internal subsystems. This critical data needs to be stored for real-time, medium- and long-term analysis. Events are schema-less, making it difficult to use standard analysis tools. This presentation will explain how we built a storage and analysis solution based on Riak. The talk will cover: data aggregation and serialization, Riak configuration, solutions for lowering the network usage, and finally, how Riak's advanced features are used to perform real-time data crunching on the cluster nodes.
Icinga 2 provides many new features compared to Icinga 1 including Icinga Web 2, modules that can be enabled or disabled, new programming language features like constants and operators, improved time granularity, templates to reduce typing, flexible commands, and powerful apply and tag features to selectively configure hosts and services. It also supports new notification methods, distributed monitoring zones, and the monitoring-plugins package.
This document provides an overview and introduction to Akka HTTP, a Scala library built on Akka Streams for HTTP-based applications. Some key points:
- Akka HTTP uses Akka Streams to model HTTP requests and responses as streaming data flows.
- It allows building both HTTP clients and servers by composing stream processing stages together.
- Common directives and operations like routing, marshalling, validation, and testing are supported through a high-level API.
- Examples demonstrate basic usage like creating a route that returns XML, running a server, and writing tests against routes.
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne..., confluent)
At Stitch Fix, we maintain a distributed Kafka Connect cluster running several hundred connectors. Over the years, we've learned invaluable lessons for keeping our connectors going 24/7. As many conference goers probably know, event driven applications require a new way of thinking. With this new paradigm comes unique operational considerations, which I will delve into. Specifically, this talk will be an overview of: 1) Our deployment model and use case (we have a large distributed Kafka Connect cluster that powers a self-service data integration platform tailored to the needs of our Data Scientists). 2) Our favorite operational tools that we have built for making things run smoothly (the jobs, alerts and dashboards we find most useful. A quick run down of the admin service we wrote that sits on top of Kafka Connect). 3) Our approach to end-to-end integrity monitoring (our tracer bullet system that we built to constantly monitor all our sources and sinks). 4) Lessons learned from production issues and painful migrations (why, oh why did we not use schemas from the beginning?? Pausing connectors doesn't do what you think it does... rebalancing is tricky... jar hell problems are a thing of the past, upgrade and use plugin.path!). 5) Future areas of improvement. The target audience member is an engineer who is curious about Kafka Connect or currently maintains a small to medium sized Kafka Connect cluster. They should walk away from the talk with increased confidence in using and maintaining a large Kafka Connect cluster, and should be armed with the hard won experiences of our team. For the most part, we've been very happy with our Kafka Connect powered data integration platform, and we'd love to share our lessons learned with the community in order to drive adoption.
Spark Streaming has quickly established itself as one of the more popular streaming engines running on the Hadoop ecosystem. Not only does it provide integration with many types of message brokers and stream sources, but it also provides the ability to leverage other major modules in Spark, like Spark SQL and MLlib, in conjunction. This allows businesses and developers to make use of data in ways they couldn’t hope to in the past.
However, while building a Spark Streaming pipeline, it’s not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring poses a lot of operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, we’ll go over some of the main steps you’ll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shut down your application, perform upgrades, set up monitoring, apply various useful Spark configurations, and more.
This document discusses the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. It describes each component and how they work together to parse, index, and visualize log data. Logstash is used to parse logs from various sources and apply filters before indexing the data into Elasticsearch. Kibana then allows users to visualize the indexed data through interactive dashboards and charts. The document also covers production deployments, monitoring, and security options for the ELK stack.
We will introduce Airflow, an Apache Project for scheduling and workflow orchestration. We will discuss use cases, applicability and how best to use Airflow, mainly in the context of building data engineering pipelines. We have been running Airflow in production for about 2 years, we will also go over some learnings, best practices and some tools we have built around it.
Speakers: Robert Sanders, Shekhar Vemuri
A talk about Open Source logging and monitoring tools, using the ELK stack (ElasticSearch, Logstash, Kibana) to aggregate logs, how to track metrics from systems and logs, and how Drupal.org uses the ELK stack to aggregate and process billions of logs a month.
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py... (Kaxil Naik)
Apache Airflow allows users to programmatically author, schedule, and monitor workflows or directed acyclic graphs (DAGs) using Python. It is an open-source workflow management platform developed by Airbnb that is used to orchestrate data pipelines. The document provides an overview of Airflow including what it is, its architecture, and concepts like DAGs, tasks, and operators. It also includes instructions on setting up Airflow and running tutorials on basic and dynamic workflows.
This document discusses open source logging and metrics tools. It provides an introduction to customizing logs from common daemons and focuses on log aggregation, parsing, and search. It describes a demo setup using the ELK stack to aggregate and visualize logs and metrics from a Drupal site. The document discusses shipping logs with rsyslog and logstash, and parsing different log formats. It also covers monitoring performance with tools like Graphite and Grafana.
Monitoring with ELK - Elasticsearch - Logstash - Kibana (Waldemar Neto)
The document discusses the ELK stack which includes Elasticsearch, Logstash, and Kibana. It describes the workflow of using Logstash to parse and filter logs, Elasticsearch to index the data, and Kibana to visualize the indexed data. It provides examples of how the ELK stack can be used for log parsing, real-time metrics monitoring, and anomaly detection. The document also mentions options for running the ELK stack in the cloud or as a hosted service.
This presentation deals with logging in the course of mobile development, describing the open source logging environment built with the ELK stack (Elasticsearch, Logstash and Kibana).
Presentation by Igor Rudyk (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
Talk given by Thomas Widhalm at Icinga Camp San Francisco 2016 - https://www.icinga.org/community/events/archive/2016-archive/icinga-camp-san-francisco/
Declarative benchmarking of Cassandra and its data models (Monal Daxini)
Monal Daxini presented on the declarative benchmarking tool NDBench and its Cassandra plugin. The tool allows users to define performance test profiles that specify the Cassandra schema, queries, load patterns, and other parameters. It executes the queries against Cassandra clusters and collects metrics to analyze performance. The plugin supports all Cassandra data types and allows testing different versions. Netflix uses it to validate data models and certify Cassandra upgrades. Future enhancements include adding more data generators and supporting other data stores.
OSMC 2014: Current state of Icinga | Icinga Team (NETWAYS)
A lot has happened since the first preview at OSMC 2012, and Icinga 2 has been finished for several months. Besides improved performance, a flexible architecture, and centralized as well as decentralized clustering, Icinga 2 stands out above all for its simplified configuration. There has also been plenty of progress on Icinga 1 as well as on the various add-ons and web interfaces. The highlight of the talk will be the introduction of the new web interface Icinga Web 2. Along with a status update, the talk gives an overview of upcoming developments and shows the current versions in live demos.
Reactive Streams: Handling Data-Flow the Reactive Way (Roland Kuhn)
Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.
The document discusses Reactive Slick, a new version of the Slick database access library for Scala that provides reactive capabilities. It allows parallel database execution and streaming of large query results using Reactive Streams. Reactive Slick is suitable for composite database tasks, combining async tasks, and processing large datasets through reactive streams.
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads (Flink Forward)
http://flink-forward.org/kb_sessions/dynamic-scaling-how-apache-flink-adapts-to-changing-workloads/
Modern stream processing engines not only have to process millions of events per second at sub-second latency but also have to cope with constantly changing workloads. Due to the dynamic nature of stream applications, where the number of incoming events can strongly vary with time, systems cannot reliably predetermine the amount of required resources. In order to meet guaranteed SLAs as well as utilize system resources as efficiently as possible, frameworks like Apache Flink have to adapt their resource consumption dynamically. In this talk, we will take a look under the hood and explain how Flink scales stateful applications in and out. Starting with the concept of key groups and partitionable state, we will cover ways to detect bottlenecks in streaming jobs and discuss efficient strategies for scaling out operators with minimal downtime.
In recent times, Reactive Programming has gained a lot of popularity. It is not a “silver bullet”, nor is it a solution for every problem. Yet, it is a paradigm for building applications which are non-blocking, event-driven and asynchronous and require a small number of threads to scale. Spring Framework 5 embraces Reactive Streams and Reactor for its own reactive use, as well as in many of its core APIs. It also adds the ability to code in a declarative way, as opposed to imperatively, resulting in more responsive and resilient applications. On top of that, you are given an amazing toolbox of functions to combine, create and filter any data stream. It becomes easy to support input and output streaming scenarios for microservices, scatter/gather, data ingestion, and so on. This presentation is about the support and building blocks for reactive programming that come with the latest versions of Spring Framework 5 and Spring Boot 2.
This document discusses reactive programming and its applications. It introduces reactive programming concepts like Observables and Subscribers. It then covers implementing reactive backends with frameworks like RxJava and Spring, including reactive databases, services, and third party APIs. It also discusses reactive web frontends using Angular, covering reactive HTTP requests, Server-Sent Events through Observables, and wrapping WebSockets in Observables. The document advocates that reactive programming allows building flexible, scalable systems and is available to use with frameworks like Spring and Angular today.
This document discusses the author's experience with the ELK stack and Kibana. The author has been using ELK since 2012 and has published content on Logstash and written chapters about ELK in their book. The document then provides an overview of Kibana, describing its core components and features like dashboards, visualizations, and search functionality. It also outlines some custom panels the author created for Kibana through custom development, including range, percentile, and map panels. Lastly, it discusses the author's solution for adding authentication to Kibana.
We're talking about serious log crunching and intelligence gathering with Elastic, Logstash, and Kibana.
ELK is an end-to-end stack for gathering structured and unstructured data from servers. It delivers insights in real time using the Kibana dashboard, giving unprecedented horizontal visibility. The visualization and search tools will make your day-to-day hunting a breeze.
During this brief walkthrough of the setup, configuration, and use of the toolset, we will show you how to find the trees in the forest of today's modern cloud environments and beyond.
Flink Forward San Francisco 2018: Stefan Richter - "How to build a modern str... (Flink Forward)
Stream processing has evolved quickly in a short time: a few years ago, stream processing was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications have complex logic, strict correctness guarantees, high performance, low latency, and maintain large state without databases. Stream processing has become much more sophisticated because the stream processors – the systems that run the application code, coordinate the distributed execution, route the data streams, and ensure correctness in the face of failures and crashes – have become much more technologically advanced. In this talk, we walk through some of the techniques and innovations behind Apache Flink, one of the most powerful open source stream processors. In particular, we plan to discuss: The evolution of fault tolerance in stream processing, Flink’s approach of distributed asynchronous snapshots, and how that approach looks today after multiple years of collaborative work with users running large scale stream processing deployments. How Flink supports applications with terabytes of state and offers efficient snapshots, fast recovery, rescaling, and high throughput. How to build end-to-end consistency (exactly-once semantics) and transactional integration with other systems. How batch and streaming can both run on the same execution model with best-in-class performance.
Large scale preservation workflows with Taverna – SCAPE Training event, Guima... (SCAPE Project)
Sven Schlarb of the Austrian National Library gave this introduction to large scale preservation workflows with Taverna at the first SCAPE Training event, ‘Keeping Control: Scalable Preservation Environments for Identification and Characterisation’, in Guimarães, Portugal on 6-7 December 2012.
A Practical Guide To End-to-End Tracing In Event Driven Architectures (HostedbyConfluent)
"Can you determine how a given event came to be? Is it an aggregation, a combination of multiple events with different sources? What are its origins?
As event driven architectures become more sophisticated, with features such as stateful stream processing, data joining, and multi-cluster flows, it becomes harder to trace the path of an event, its origins and touch points. At the same time, it also becomes more important.
Using code examples and usage scenarios we will dive into the tracing capabilities of OpenTelemetry for Kafka clients, including those using the Consumer/Producer and Kafka Streams libraries, as well as the Connect and ksqlDB platforms. This will culminate in an end-to-end tracing pipeline demonstration.
This talk will cover the following topics:
- Distributed tracing concepts, including context propagation and the OpenTelemetry implementation stack
- OpenTelemetry’s Kafka instrumentation, what is supported out of the box, code examples, edge cases, challenges and solutions
- A demonstration of an end-to-end tracing implementation
In this session, you will gain an understanding of the importance of end-to-end traceability, and several tools & examples for improving observability in your own distributed event driven applications."
This document discusses building stream processing as a service (SPaaS) using Apache Flink. It introduces Flink's stream processing capabilities and describes how to build a SPaaS offering with different levels of complexity and ease of use. It also covers the Keystone router for simple stream routing, building custom Flink jobs, and techniques for recovering from failures using backfill from Hive or rewinding the Flink job.
This document provides an agenda and overview of SubQuery, an open source project that allows developers to index, transform, and query Substrate chain data. It discusses problems with current query speeds and parachain diversity that SubQuery aims to address. The basics of SubQuery are explained, including the components needed and key concepts like the manifest file, GraphQL schema, and mapping files that define how chain data is extracted, transformed, and persisted. Hands-on exercises are proposed to build a sample project. Production infrastructure for hosting SubQuery projects is also introduced.
The document provides details about a ksqlDB workshop including the agenda, speakers, and logistical information. The agenda includes talks on Kafka, Kafka Streams, and ksqlDB as well as hands-on labs. Attendees are encouraged to ask questions during the Q&A session and provide feedback through an online survey.
Running Airflow Workflows as ETL Processes on Hadoop (clairvoyantllc)
While working with Hadoop, you'll eventually encounter the need to schedule and run workflows to perform various operations like ingesting data or performing ETL. There are a number of tools available to assist you with this type of requirement, and one such tool that we at Clairvoyant have been looking to use is Apache Airflow. Apache Airflow is an Apache Incubator project that allows you to programmatically create workflows through a Python script. This provides a flexible and effective way to design your workflows with little code and setup. In this talk, we will discuss Apache Airflow and how we at Clairvoyant have utilized it for ETL pipelines on Hadoop.
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre... (Flink Forward)
AirStream is a realtime stream computation framework that supports Flink as one of its processing engines. It allows engineers and data scientists at Airbnb to easily leverage Flink to build real time data pipelines and feedback loops. Multiple mission critical applications have been built on top of it. In this talk, we will start with an overview of AirStream, and describe how we have designed Airstream to leverage SQL support in Flink to allow users to easily build real time data pipelines. We will go over a few production use cases such as building a user activity profiler and building user identity mapping in realtime. We will also cover how we have integrated Airstream into the data infrastructure ecosystem at Airbnb through easily configurable connectors such as Kafka and Hive that allow users to easily leverage these components in their pipelines.
Pivotal - Advanced Analytics for Telecommunications (Hortonworks)
Innovative mobile operators need to mine the vast troves of unstructured data now available to them to help develop compelling customer experiences and uncover new revenue opportunities. In this webinar, you’ll learn how HDB’s in-database analytics enable advanced use cases in network operations, customer care, and marketing for better customer experience. Join us, and get started on your advanced analytics journey today!
- Angular is an open-source web application framework originally developed in 2009 that uses HTML as its template language. It is commonly used to create single-page applications.
- Angular uses MVC architecture and two-way data binding between models and views. It has built-in dependency injection which allows for testable and scalable code.
- Key Angular concepts include directives, filters, controllers and scopes which allow manipulating DOM elements and binding data. Modules are used to package code into reusable components.
Top 10 Kubernetes Native Java Quarkus Features (jclingan)
This document discusses the top 10 Kubernetes features supported by Quarkus, an open source Kubernetes-native Java framework. These include one-step deployments to Kubernetes, live coding directly in Kubernetes clusters, accessing ConfigMaps and secrets, exposing health endpoints, using the Kubernetes client library, OpenMetrics and OpenTracing support, Knative deployments, Functions as a Service with Funqy, and efficiency with fast startup times and low memory utilization. The full source code is available on GitHub.
Service Mesh @Lara Camp Myanmar - 02 Sep 2023 (Hello Cloud)
Sai Linnthu is a founding partner at HelloCloud.io and discusses service meshes and Istio. Istio provides a framework-agnostic approach for managing communication policies and observability across cloud-native microservices. While Istio addresses many challenges of microservices, its complexity makes it difficult to use and manage across multiple clouds without additional capabilities like centralized metrics, access logging and lifecycle management.
Web Scale Reasoning and the LarKC Project (Saltlux Inc.)
The LarKC project aims to build an integrated pluggable platform for large-scale reasoning. It supports parallelization, distribution, and remote execution. The LarKC platform provides a lightweight core that gives standardized interfaces for combining plug-in components, while the real work is done in the plug-ins. There are three types of LarKC users: those building plug-ins, configuring workflows, and using workflows.
This document summarizes an SDN and cloud computing presentation given by Affan Basalamah and Dr.-Ing. Eueung Mulyana from Institut Teknologi Bandung. It discusses SDN and cloud computing research activities at ITB, including implementing OpenFlow networks, developing SDN courses, and student projects involving OpenFlow, OpenStack, and IPsec VPNs. It also describes forming an SDN research group at ITB to facilitate collaboration between academia, network operators, and vendors on SDN topics.
The document provides an introduction to Typesafe Activator and the Play Framework. It discusses how Activator is a tool that helps developers get started with the Typesafe Reactive Platform and Play applications. It also covers some core features of Play like routing, templates, assets, data access with Slick and JSON, and concurrency with Futures, Actors, and WebSockets.
The document summarizes the state of Icinga monitoring software. It discusses Icinga 1 and the introduction of Icinga 2 which is written in C++ and supports clustering out of the box. Icinga Web 2 is also introduced as a new lightweight web interface. The roadmap focuses on integrating with more third party tools and providing configuration interfaces for Icinga 2 and cluster health monitoring. Attendees are encouraged to try out and provide feedback on Icinga 2 and Icinga Web 2.
- Relay is a library that uses GraphQL as its query language and is designed to efficiently manage data fetching and caching for React applications.
- GraphQL provides an alternative to REST APIs that focuses on only fetching the necessary data in a single request. It allows clients to define precisely the data they need through a type system.
- Relay colocates queries and components to optimize data fetching. It caches data to improve performance and allows components to declare their data requirements through fragments.
Integrating Taverna Player into Scratchpads (Robert Haines)
Scratchpads, developed as part of the ViBRANT project, are an online virtual research environment for biodiversity, allowing anyone to share their data and create their own research networks. Sites are hosted at the Natural History Museum London, and offered freely to any scientist.
Sites can focus on specific taxonomic groups, or the biodiversity of a biogeographic region, or indeed any aspect of natural history. Scratchpads are also suitable for societies or for managing and presenting projects. Key features of Scratchpads include: tools to manage biological classifications, bibliography management, media (images, video and audio), rich taxon pages (with structured descriptions, specimen records, and distribution data), and character matrices. Scratchpads support various ways of communicating with site members and visitors such as blogs, forums, newsletters and a commenting system. There are currently 568 Scratchpads with 6,759 active users.
Taverna Player, developed as part of the BioVeL project, enables the running of a workflow within a Ruby-on-rails application. Taverna Player has a REST API that allows inputs to the workflow to be specified, a run to be started and monitored, and the resultant outputs to be retrieved. Any interactions the workflow includes are presented to the user for them to complete. Taverna Player has been released in the RubyGems registry and is used within the BioVeL Portal to run a wide range of biodiversity workflows.
As part of a collaboration between BioVeL and ViBRANT, Taverna Player has been integrated into Scratchpads in two ways. Firstly, workflows can be embedded in a page in the same way a video from YouTube would be embedded; the workflow itself is running on the BioVeL Portal but all set up and interaction is done in the embedded widget within the Scratchpads site. Secondly, the Scratchpads can use the Taverna Player REST API directly; this allows workflows to be run with a higher degree of control and results to be ingested back into the Scratchpads for further analysis. In both cases data can be automatically injected into the workflow run from the host Scratchpads site.
Security is handled at the individual Scratchpads level; each Scratchpads site has its own credentials to access the BioVeL Portal and run workflows. This allows the community within a Scratchpads site to create and share workflow runs that all members have access to by default while preserving privacy if required.
Similar to Collaborative Workflow Development and Experimentation in the Digital Humanities
EuropeanaTech x AI: Qurator.ai @ Berlin State Library (cneudecker)
The EuropeanaTech Community and Europeana Foundation are delighted to introduce a new webinar series to explore the opportunities and challenges of working with Artificial Intelligence in the cultural heritage and arts sector.
Digitisation and Digital Humanities - what is the role of Libraries? (cneudecker)
The document discusses the role of libraries in digitization and digital humanities. It provides an overview of the Berlin State Library's digitization efforts including its in-house digitization center that produces 1.7M images annually. It also describes the library's digital collections portal containing over 180,000 digitized documents. Additionally, it outlines several projects involving newspaper digitization, optical character recognition improvement, named entity recognition, and developing an experimental space for digital research.
Multimodal Perspectives for Digitised Historical Newspapers (cneudecker)
This document discusses challenges and opportunities in analyzing digitized historical newspapers. It describes several projects aimed at improving OCR accuracy using deep learning models, extracting structural information using computer vision and heuristics, and establishing standards for metadata and evaluation. Key challenges include the need for more granular and representative ground truth newspaper data, methods that combine machine learning and domain knowledge, and community efforts around shared tasks, seminars, and an atlas of digitized newspapers to advance interdisciplinary research. The overall goal is to make cultural heritage collections more accessible online through improved digitization and analysis of newspapers.
OCR-D: An end-to-end open source OCR framework for historical printed documents (cneudecker)
OCR-D is an open source framework for optical character recognition (OCR) of historical printed documents. It consists of a coordination project and 8 module projects that develop technical solutions for challenges in OCR of historical prints. The goals are to standardize metadata, annotations, and formats to enable large-scale OCR of historical texts. OCR-D provides specifications, reference implementations, ground truth data, and scientific workflows to support development and evaluation of OCR tools and methods for historical documents.
Extrablatt: The Latest News on Newspaper Digitisation in Europe (cneudecker)
This document summarizes recent developments in newspaper digitization projects across Europe. It discusses Germany's efforts to establish a national newspaper portal and increase availability of digitized newspapers through a DFG funding call. It also briefly outlines newspaper digitization work in other countries like the UK, Sweden, Denmark, and Switzerland. Finally, it provides an overview of the Europeana Newspapers project and efforts to find a new home for its 10TB of digitized newspaper data, as well as growing interest from digital humanities researchers in utilizing digitized historical newspapers.
The Europeana Newspapers project digitized over 1,000 newspaper titles containing 3.3 million issues from 12 European libraries in 40 languages from 1618-2016. The newspapers were run through optical character recognition to make 12 million pages searchable by keyword. Metadata and scans were made public domain and searchable through the TEL Historic Newspaper Browser, which allows browsing by newspaper, date, and other facets. Researchers have used the collection for various studies and it will relaunch in 2018 with improved search and an interface directly on Europeana, supporting further annotation and transcription of the newspapers.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that they are both building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to immerse yourself in a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training courses. She previously worked on LibreOffice migrations and training for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (hence her nickname deneb_alpha).
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Things to Consider When Choosing a Website Developer for your Website | FODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing and budget, reputation and reviews, and post-launch support. Make an informed decision to ensure your website meets your business goals.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing and ingesting data to the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf (Techgropse Pvt.Ltd.)
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Collaborative Workflow Development and Experimentation in the Digital Humanities
1. A Service-Oriented Architecture
for Collaborative Workflow
Development and
Experimentation
eHumanities Seminar 2012
University of Leipzig
10-10-2012
Clemens Neudecker, KB @cneudecker
Zeki Mustafa Dogan, SUB-DL
Sven Schlarb, ÖNB @SvenSchlarb
Juan Garcés, GCDH @juan_garces
2. Idea
• Provide web-based versions of tools
(web services)
• Package web services, data and
documentation into ready-to-run
“components” (encapsulation)
• Chain the components to create workflows
via drag-and-drop operation
• Share and use workflows to re-run
experiments and to demonstrate results
3. Background
• High degree of diversity in research topics,
but also tools and frameworks being used
• Technical resources should be easy to
use, well documented, accessible from
anywhere
• Prevent re-inventing the wheel
4. Requirements
• Interoperability = connect different resources
• Flexibility = easy to deploy and adapt
• Modularity = allow different combinations of tools
• Usability = simple to use for non-technical users
• Re-usability = easy to share with others
• Scalability = apt for large-scale processing
• Sustainability = resources simple to preserve
• Transparency = tools evaluated separately
• Distributed development and deployment
5. Interoperability Framework (IIF)
• Modules:
- Java Wrapper for command line tools
- Web Services (incl. format converters)
- Taverna Workflow Engine
- Client interfaces
- Repository connectors
7. IIF Command Line Wrapper
• Java project, built using Maven 2
• Creates a web service project from
a given tool description (XML)
• Web service exposes SOAP & REST
endpoints and Java API interface
• Requirements: command line call,
no direct user interaction
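A minimal sketch of the idea behind the wrapper, using only JDK classes: a command line tool (here the Unix `file` command, chosen for illustration) is exposed over HTTP. The `/identify` endpoint and the `file` query parameter are assumptions for this example; the real IIF wrapper instead generates a complete SOAP/REST web service project from the XML tool description, so this shows the concept rather than the generated code.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

// Conceptual sketch: expose a command line tool as a web service.
// The wrapped command and the "/identify" endpoint are hypothetical.
public class ToolWrapperSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/identify", exchange -> {
            // e.g. GET /identify?file=/data/page1.tif
            String query = exchange.getRequestURI().getQuery();
            String file = query == null ? "" : query.replaceFirst("^file=", "");
            // Run the wrapped tool; in IIF the command line template
            // would come from the tool description XML.
            Process p = new ProcessBuilder("file", "--brief", file).start();
            byte[] out = p.getInputStream().readAllBytes();
            exchange.sendResponseHeaders(200, out.length);
            exchange.getResponseBody().write(out);
            exchange.close();
        });
        server.start();
    }
}
```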
12. IIF Web Services
• Web services are described by a WSDL
• Input/output data structures
• Data is referenced by URL
• Annotations
• Default values
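Because data is referenced by URL, a client never uploads the payload itself: it sends a URL pointing at the input and receives a reference to the result. A hedged sketch of such a call from Java follows; the service endpoint, its `input` parameter, and the assumption that the response body carries a result URL are all illustrative, not part of a documented IIF API.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical call against an IIF-style REST endpoint that takes
// its input by reference and (we assume) returns a result URL.
public class ReferenceCallSketch {
    public static void main(String[] args) throws Exception {
        String inputRef = URLEncoder.encode(
                "http://repository.example.org/images/page1.tif",
                StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://services.example.org/binarise?input=" + inputRef))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Result reference: " + response.body());
    }
}
```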
15. IIF Workflows
• What is a workflow? (Yahoo Pipes, etc.)
• Different kinds of workflows: for a single
command, application, chain of processes
• Main benefit: Encapsulation, Reuse
• Workflows as “components”: include link
to WS endpoint, sample input data and
documentation = ready-to-use resource
• Web 2.0 workflow registry: myExperiment
17. Why workflows?
• “In-silico experimentation”
• Good structuring of experiment setup:
– Challenge/Research question
– Dataset definition
– Processing with algorithms
– Evaluation/Provenance
– Presentation of results
• All this can be modelled into a workflow
18. Integration into Taverna
• Web Services (SOAP and REST)
• Command line tools (SH and SSH)
• Beanshells (can import Java libraries)
• R (statistics)
• Excel, CSV
• Additional service types can be added
through dedicated plug-ins
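As a small illustration of the Beanshell service type: Taverna binds each input port to a script variable of the same name and reads output ports back from variables the script assigns. The port names below (`pageText`, `wordCount`) are made up for the example.

```java
// Beanshell script inside a Taverna workflow (Java syntax).
// Input port "pageText" arrives as a String variable; assigning
// "wordCount" populates the output port of that name.
import java.util.StringTokenizer;

StringTokenizer tokens = new StringTokenizer(pageText);
wordCount = String.valueOf(tokens.countTokens());
```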
19. Taverna flavours
• Workbench – local GUI client for Linux,
Windows, OSX
• Command line tool – run workflows from
the command line
• Server – Webapp with REST API and
Java/Ruby client libs
• Web-Wf-Designer – JavaScript version for
designing workflows in a browser
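With the Server flavour, a run is created and started over the REST API. The sketch below follows the resource layout described in the Taverna Server 2.x documentation (POST the workflow to the run collection, then set the run status); treat the exact paths and media types as assumptions to verify against your installation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Sketch of driving Taverna Server over REST; paths and media types
// are based on the Taverna Server 2.x documentation and may differ.
public class TavernaServerSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Create a run by POSTing the workflow definition.
        HttpRequest create = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/taverna-server/rest/runs"))
                .header("Content-Type", "application/vnd.taverna.t2flow+xml")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("workflow.t2flow")))
                .build();
        HttpResponse<Void> created = client.send(create, HttpResponse.BodyHandlers.discarding());
        String runUrl = created.headers().firstValue("Location").orElseThrow();

        // 2. Start the run by setting its status to Operating.
        HttpRequest start = HttpRequest.newBuilder()
                .uri(URI.create(runUrl + "/status"))
                .header("Content-Type", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString("Operating"))
                .build();
        client.send(start, HttpResponse.BodyHandlers.discarding());
    }
}
```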
23. Client interfaces
• Web service client: create a simple HTML
form from a given web service description
• Taverna client: create a simple HTML form
from a given Taverna workflow description
• Integration into production and
presentation environments via iframes
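Generating such a form starts from the service description itself. Below is a sketch of the first step, assuming the WSDL4J library (`javax.wsdl`, wsdl4j on Maven Central): it walks a WSDL and emits one input field per message part. The WSDL URL is hypothetical, and a real generator would also map types, defaults and annotations onto the form.

```java
import javax.wsdl.Definition;
import javax.wsdl.Operation;
import javax.wsdl.PortType;
import javax.wsdl.factory.WSDLFactory;
import javax.wsdl.xml.WSDLReader;

// Sketch: derive HTML form fields from a WSDL using WSDL4J.
public class WsdlFormSketch {
    public static void main(String[] args) throws Exception {
        WSDLReader reader = WSDLFactory.newInstance().newWSDLReader();
        Definition def = reader.readWSDL("http://services.example.org/binarise?wsdl");
        for (Object pt : def.getPortTypes().values()) {
            for (Object op : ((PortType) pt).getOperations()) {
                Operation operation = (Operation) op;
                System.out.println("<fieldset><legend>" + operation.getName() + "</legend>");
                // One input field per part of the operation's input message.
                for (Object partName : operation.getInput().getMessage().getParts().keySet()) {
                    System.out.println("  <input name=\"" + partName + "\"/>");
                }
                System.out.println("</fieldset>");
            }
        }
    }
}
```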