A presentation of the SciPipe workflow library, written in Go (Golang), inspired by Flow-based programming, at an internal workshop at Uppsala University, Department of Pharmaceutical Biosciences.
Node.js: Basic Concepts and Introduction (Kanika Gera)
Node.js is a JavaScript runtime built on Chrome's V8 engine that allows JavaScript to run on the server side. It uses asynchronous and event-driven programming to handle thousands of concurrent connections with minimal overhead. The presentation introduces Node.js and its architecture, explaining how it uses a single thread with non-blocking I/O and an event loop to handle asynchronous operations efficiently. Examples are provided to illustrate synchronous vs asynchronous code. Common use cases for Node.js include real-time applications, chat/messaging, and high concurrency applications, while it is less suitable for heavy computation or large web apps.
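The synchronous-versus-asynchronous contrast described above can be sketched as follows — here in Python's asyncio rather than Node.js, purely as an illustration of the same single-threaded event-loop idea; the function names are invented for this example:

```python
import asyncio
import time

def fetch_blocking(delay):
    # Synchronous: each call blocks the single thread until it finishes.
    time.sleep(delay)
    return delay

async def fetch_async(delay):
    # Asynchronous: awaiting hands control back to the event loop,
    # so other tasks can run while this one waits.
    await asyncio.sleep(delay)
    return delay

def run_blocking():
    # Three blocking waits run one after another: ~0.3 s total.
    start = time.monotonic()
    results = [fetch_blocking(0.1) for _ in range(3)]
    return results, time.monotonic() - start

async def run_async():
    # Three asynchronous waits overlap on one thread: ~0.1 s total.
    start = time.monotonic()
    results = await asyncio.gather(*(fetch_async(0.1) for _ in range(3)))
    return list(results), time.monotonic() - start
```

The same contrast is what Node.js exploits: while one request waits on I/O, the event loop services other requests instead of blocking the thread.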
Nowadays we all seem to be working with small independent services that need to talk with numerous other services. This is a problem because when developing your service, you need to have a working environment—but bringing up all your dependencies is often not an option.
In this talk, I will take you through our journey of creating a mock server to increase dev speed, and how it allowed us to write better tests.
Memory Management: What You Need to Know When Moving to Java 8 (AppDynamics)
This presentation will compare and contrast application behavior in Java 7 with Java 8, particularly focusing on memory management and usage. Several code examples are presented to show how to recognize and respond to common pitfalls.
Scoutbee - Knowledge graphs at Scoutbee with Neo4j (Neo4j)
Scoutbee is building a knowledge graph with Neo4j to power its supplier discovery and intelligence platform. The knowledge graph will integrate Scoutbee's data with external sources to create rich supplier profiles connected by relationships. This will allow Scoutbee to provide explainable answers to customer questions about suppliers and their relationships. The knowledge graph is part of Scoutbee's effort to democratize data access and build a foundation for continual data-driven process improvements.
This is a basic presentation that can help you understand the core concepts of GraphQL, how it can be used to solve frontend integration problems in projects, and how it helps reduce data-fetching time.
The presentation also explains GraphQL's core features, why it is a strong alternative to REST APIs, and the procedure for integrating it into your projects.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help regulate emotions and stress levels.
This document provides an overview of graph databases and their use cases. It begins with definitions of graphs and graph databases. It then gives examples of how graph databases can be used for social networking, network management, and other domains where data is interconnected. It provides Cypher examples for creating and querying graph patterns in a social networking and IT network management scenario. Finally, it discusses the graph database ecosystem and how graphs can be deployed for both online transaction processing and batch processing use cases.
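The kind of graph-pattern query described above can be sketched in plain Python as a toy friend-of-a-friend lookup; the data and names are invented for this example, and the Cypher-style pattern in the docstring is an illustration, not taken from the deck:

```python
# A toy social graph as an adjacency map; a graph database stores the
# same relationships natively and traverses them without join tables.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

def friends_of_friends(person):
    """People two hops away who are not already direct friends.

    Roughly what a Cypher pattern such as
      MATCH (p:Person {name: $name})-[:FRIEND]->()-[:FRIEND]->(fof)
    expresses declaratively in a graph database.
    """
    direct = FRIENDS.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= FRIENDS.get(friend, set())
    # Exclude the person themselves and their direct friends.
    return two_hops - direct - {person}
```

In a graph database the traversal follows stored relationships directly, which is why such queries stay fast as the interconnected data grows.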
YAML Engineering: why we need a new paradigm (Raphaël PINSON)
YAML has become the de facto standard for expressing resources in many fields linked to DevOps practices. What are YAML's strengths and weaknesses, and what are the other options going forward?
The Science of a Great Career in Data Science (Kate Matsudaira)
A data scientist's job is all about details, but a data scientist's career path is much more ambiguous. When you're working in a hot, brand new field, the traditional career ladder just doesn't apply.
So how do you succeed when there is no clear path for success? How can you be amazing at your job when "amazing" is still being defined? It starts with knowing exactly why your job is so different from others (there are no right answers), and learning how to explain your complicated work in an uncomplicated way.
In this talk, you'll learn how to achieve success by leveraging your unique role to create the career you really want.
Business Enterprise Mapping discusses the purpose of having a process for your business and making sure that each process is satisfying a customer need, and providing customers value.
Drupalcon 2023 - How Drupal builds your pages (Luca Lusso)
Have you ever wondered what happens when an HTTP request reaches your Drupal website? How does Drupal find the correct code to execute? Which parts of the page come from the cache, and which ones are built from scratch? Which queries are executed against the database? And, why not, how much time and memory does the request require to be converted into a response?
Whether you are a contrib developer or simply a curious person, the answers to those questions will help you better understand how Drupal 10 works.
The WebProfiler module can help you discover how all the different subsystems of Drupal 10 interact to take a request and return a response. WebProfiler collects data during the build of each page of the site and lets you easily explore the internals of Drupal 10.
Structuring Spark: DataFrames, Datasets, and Streaming (Databricks)
This document discusses how Spark provides structured APIs like SQL, DataFrames, and Datasets to organize data and computation. It describes how these APIs allow Spark to optimize queries by understanding their structure. The document outlines how Spark represents data internally and how encoders translate between this format and user objects. It also introduces Spark's new structured streaming functionality, which allows batch queries to run continuously on streaming data using the same API.
Heard about graph databases? Curious about what they are and how they work? Want to know where they're best used? Then this is the session for you!
In this workshop we will:
- Introduce you to graph databases
- Cover approaches for identifying graph-shaped problems
- Get our hands on our very first graph database experience where we will load and query data, using Neo4j Aura Free
We'll also cover what resources are available, and how to continue your graph journey.
The document discusses keys to the future of mobile video advertising, including pre-cached ads that load instantly, value-added ads that give users benefits in exchange for their attention, and making these ad formats available programmatically. It argues that these approaches can improve the user experience of mobile advertising by reducing load times, giving users control and incentives to engage with ads, and addressing issues like ad blocking.
Neo4j Bloom is a breakthrough graph communication and visualization product that gives graph novices and experts alike the ability to communicate and share their work, thoughts, and plans with peers, managers, and executives. Its illustrative, codeless search-to-storyboard design makes it the ideal interface for non-technical project participants to share in the innovative work of their graph analytics and development teams.
Hexagonal architecture (a.k.a. Ports and Adapters) is a fabulous pattern that has more advantages than the ones for which it was originally created.
One orthodox view holds that patterns do not evolve, and that it is important to keep Alistair Cockburn's pattern as it was described back in the day.
Another view holds that some patterns may evolve, and that Hexagonal Architecture has more facets than we think. This session will present both the original pattern in detail and some alternative versions (related to Domain-Driven Design).
Session given at SoCraTes Soltau 2022
https://www.socrates-conference.de/foundations
Download a full version of the report at:
www.psfk.com/report/future-of-work-2016/
The PSFK Future of Work Report deep dives into the talent and development landscape to identify the conditions and qualities that cultivate tomorrow’s leaders in the workplace. In return for investing in greater opportunity and education, employers will reap the rewards of increased efficiency, engagement and entrepreneurship—reducing mistrust, stress and ultimately turnover across teams.
Additionally, PSFK has developed six workplace visions that were inspired by 10 strategies to develop a new era of internal leadership. These boundary-pushing product and workplace concepts reimagine how teams can onboard employees, expand the office, and prevent miscommunication.
GraphQL has made an excellent entrance onto the API scene. It reintroduces the original concepts of RPC-style architecture with a revolutionary, API-consumer-oriented approach. It brought a new option to the stagnant waters of RESTful APIs. But more importantly, GraphQL brought back the principal question: what is the right API architectural style for my project?
If you are building an API, this talk should give you enough theoretical background to make the right API decision for your product.
In this talk, we will take a critical look at the predominant API architectural style, RESTful APIs, and contrast it with GraphQL and Hypermedia APIs. We will discuss the expected properties of distributed systems and the consequences of choosing a particular API style, and reflect these findings in the pros and cons of the popular approaches.
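The consumer-oriented idea at the heart of GraphQL — the client names exactly the fields it wants, and the server returns only those — can be illustrated with a toy resolver. This is not a real GraphQL implementation; the record and function are invented for this sketch:

```python
# Toy illustration of GraphQL-style field selection: instead of a fixed
# REST payload, the client supplies a "selection" describing the fields
# (and nested sub-fields) it actually needs.
USER_RECORD = {
    "id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "address": {"city": "London", "street": "1 Example Road"},
}

def resolve(record, selection):
    """Return only the requested fields, recursing into sub-selections.

    `selection` maps field names to either None (a leaf field) or a
    nested selection dict (a sub-object).
    """
    out = {}
    for field, sub in selection.items():
        value = record[field]
        out[field] = resolve(value, sub) if sub else value
    return out
```

A fixed REST endpoint would return the whole record regardless; here the client asking for `{"name": None, "address": {"city": None}}` gets back only the name and city, which is the over-fetching argument in miniature.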
Today we all live and work in the Internet Century, where technology is roiling the business landscape, and the pace of change is only accelerating.
In their new book How Google Works, Google Executive Chairman and ex-CEO Eric Schmidt and former SVP of Products Jonathan Rosenberg share the lessons they learned over the course of a decade running Google.
Covering topics including corporate culture, strategy, talent, decision-making, communication, innovation, and dealing with disruption, the authors illustrate management maxims with numerous insider anecdotes from Google’s history.
In an era when everything is speeding up, the best way for businesses to succeed is to attract smart-creative people and give them an environment where they can thrive at scale. How Google Works is a new book that explains how to do just that.
This is a visual preview of How Google Works. You can pick up a copy of the book at www.howgoogleworks.net
15 Quotes To Nurture Your Creative Soul! (DesignMantic)
Every now and then, we all crave inspiration to get started. But oftentimes, inspiration is hardest to find when it is needed the most. Powerful words almost always do the trick; their power is undeniable. So for all the creative souls out there, here we share some remarkable sayings from legends to feed your mind and strengthen your design game.
Remember, sharing is caring! :)
14 Tips to Entrepreneurs to Start the Right Stuff (Patrick Stähler)
14 tips for entrepreneurs on how to develop the Right Thing from an idea. The Right Thing is loved by your customers, gives meaning to you and your employees, and is profitable. Finding, and later doing, the Right Thing is an agile and iterative learning journey. With these 14 tips you can profit from the experience of successful entrepreneurs, since you do not have to experience every failure yourself. Hopefully, the slide deck helps other entrepreneurs.
Indonesia's PPP - Bidding Process - Investor Guide (H2O Management)
This document provides an overview of public-private partnerships (PPP) for infrastructure investment in Indonesia, including:
1) PPPs play a key role in meeting Indonesia's infrastructure investment needs estimated at $150 billion from 2010-2014, with the government targeting $94 billion from private investors.
2) The guide outlines the principal parties involved in PPPs - the government contracting agency, the inter-ministerial Policy Committee for Accelerating Infrastructure Provision, and the Public-Private Partnership Central Unit.
3) It provides a high-level overview of Indonesia's legal framework and key features for PPPs, as well as the multi-step PPP development and implementation process.
Unit tests are not limited to application code; tests can also be run on the data and schemas of databases.
Talk given at the PostgreSQL meetup on 22 June 2016 in Nantes.
AddisDev Meetup II: Golang and Flow-based Programming (Samuel Lampa)
The document discusses flow-based programming (FBP), its history and concepts. FBP defines applications as networks of processes that exchange data through message passing over predefined connections. This allows the processes to be reconnected without changing their code. The document provides examples of FBP networks and components implemented in various languages like Go, Java and JavaScript. It also discusses the benefits of FBP and its growing popularity with implementations like NoFlo.
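The FBP idea summarized above — black-box processes exchanging data over predefined connections that can be rewired without touching the process code — can be sketched in a few lines. This is an illustrative minimal network (here in Python with threads and queues, not taken from the deck), with invented process names:

```python
import queue
import threading

# A minimal flow-based-programming sketch: two "black box" processes
# connected only by queues. Rewiring the network means swapping queues,
# not editing the process bodies.
DONE = object()  # sentinel marking end-of-stream

def uppercase(inport, outport):
    # Process 1: transforms each packet and forwards it downstream.
    while (item := inport.get()) is not DONE:
        outport.put(item.upper())
    outport.put(DONE)

def collect(inport, sink):
    # Process 2: drains its input port into a result list.
    while (item := inport.get()) is not DONE:
        sink.append(item)

def run_network(items):
    a, b = queue.Queue(), queue.Queue()
    sink = []
    t1 = threading.Thread(target=uppercase, args=(a, b))
    t2 = threading.Thread(target=collect, args=(b, sink))
    t1.start(); t2.start()
    for item in items:
        a.put(item)
    a.put(DONE)
    t1.join(); t2.join()
    return sink
```

In Go the same shape falls out naturally from goroutines and channels, which is part of why Go is a popular host language for FBP-style libraries such as SciPipe.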
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse (Samuel Lampa)
Contains brief background on the Semantic Web, and shows how Prolog is intended to be used from inside the Bioclipse research software for RDF data handling.
Hooking up Semantic MediaWiki with external tools via SPARQL (Samuel Lampa)
This document discusses integrating Semantic MediaWiki (SMW) with external tools using the RDFIO extension. It describes the motivation for RDFIO as allowing manual schema exploration, automated data generation, and community collaboration. RDFIO solves problems with SMW by allowing the choice of wiki page titles for RDF entities and exporting RDF in the original import format. Real-world uses of RDFIO include visualizing data on SMW pages and pulling data from R into SMW using SPARQL queries. The integration of SMW and Bioclipse is also discussed.
This document provides a cheat sheet overview of key concepts in the IRODS rule language, including numeric and string literals, arithmetic and comparison operators, functions for strings, lists, tuples, if/else statements, foreach loops, defining functions and rules, handling errors, and inductive data types. It describes syntax for defining data types using constructors, and using pattern matching to define functions over data types.
This document discusses the advantages of using Make and Snakemake for bioinformatics workflows compared to other options like shell scripts and Perl. It outlines key features of Make like its focus on files, ability to restart safely, and support for parallelization. The document then introduces Snakemake as building upon Make's strengths while addressing weaknesses like lack of control flow and cluster support. It provides examples of using Snakemake with cluster resources and utilities like logs, dry-runs and workflow diagrams. Finally, it discusses best practices like versioning, caching, integration with R, leaving paper trails and ensuring reproducibility.
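The file-focused, safely restartable style described above is visible in the shape of a single Snakemake rule. The file names below are hypothetical, chosen only to show the input/output/shell structure:

```
rule count_lines:
    input:
        "data/{sample}.txt"
    output:
        "results/{sample}.count"
    shell:
        "wc -l < {input} > {output}"
```

Because the rule declares its inputs and outputs as files, Snakemake can skip work whose outputs are already up to date, run independent samples in parallel, and resume safely after a failure — the Make strengths the talk highlights, without Make's limitations around control flow and cluster execution.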
RDFIO is an RDF import and query extension for MediaWiki. It allows users to import RDF triples into MediaWiki and query the triples using SPARQL. The architecture includes an in-memory RDF store to hold the triples and a SPARQL endpoint for querying. Future plans include enhancing editing capabilities via templates and importing triples on a per-page basis. Samuel Lampa presented RDFIO and is looking for additional ideas to improve the extension.
Continuous modeling - automating model building on high-performance e-Infrast... (Ola Spjuth)
This document discusses continuous modeling and automating model building on high-performance infrastructures. It notes the new challenges of data management, analysis, scaling, and automation posed by high-throughput technologies. The author's research focuses on enabling high-throughput biology through large-scale predictive modeling, evaluating performance, and automating model re-building. Predictive toxicology and pharmacology are becoming data-intensive due to more data sources. The document explores modeling large datasets on high-performance computing infrastructures and whether workflow systems or cloud/Big Data frameworks could improve modeling.
Samuel Lampa presented his MSc thesis on integrating SWI-Prolog as a semantic querying tool in Bioclipse. He demonstrated [1] how SWI-Prolog can be used for semantic querying of biological data in RDF format within Bioclipse, [2] examples of SPARQL and Prolog code used to perform semantic queries, and [3] benchmarking of Prolog's performance as a semantic querying tool. The work adds new semantic querying functionality to Bioclipse using SWI-Prolog and demonstrates its ability to efficiently query biological data.
Reproducible bioinformatics pipelines with Docker and Anduril (Christian Frech)
Christian Frech presented on using Docker and Anduril to create reproducible bioinformatics pipelines. Anduril is a pipeline framework that aims to make pipelines modular, bundled with their execution environment, and able to be run on clusters. It uses a proprietary scripting language but can embed other languages. Frech demonstrated an RNA-seq analysis pipeline built in Anduril, which generated QC plots, differential expression results, network and enrichment analyses. While adoption of Anduril has been limited by its scripting language, Docker can be used to containerize components and make pipelines fully reproducible and portable.
Agile large-scale machine-learning pipelines in drug discovery (Ola Spjuth)
This document discusses challenges in scaling machine learning for drug discovery as data grows. The author describes their work developing automated workflows and pipelines for building predictive models on large datasets using techniques like Hadoop, Spark, and cloud computing. Their goal is to enable non-experts to build accurate models and make predictions in real-time as structures are modified. The document outlines several projects applying these techniques to problems like site-of-metabolism prediction, target prediction, and next-generation sequencing analysis. It evaluates challenges in scaling modeling to many datasets and targets on high performance computing clusters and private clouds.
Neuro4j Workflow is an open-source Java workflow engine that allows for faster development through a drag-and-drop interface. It provides an eclipse-based editor to graphically create workflows and allows developers to easily identify bottlenecks. Neuro4j compiles workflows into Java classes that can be run on the JVM independently of other frameworks. It supports debugging workflows locally or remotely in Neuro4j Studio.
This document provides an overview of flow-based programming (FBP). FBP is a programming paradigm where applications are defined as networks of black box processes that exchange data through predefined connections. These connections can be redefined without changing the internal processes, allowing for endless reconfiguration. FBP was invented in the 1960s and has seen a resurgence of interest with tools like NoFlo that allow building distributed applications as connected processes. The document discusses several open source FBP implementations and frameworks and provides examples of how FBP has been used to build applications and bioinformatics libraries.
This document discusses flow-based programming with Elixir. It begins with an introduction to the speaker and overview of topics to be covered. It then covers conventional programming versus flow-based programming, using the "telegram problem" as an example. The document discusses using Elixir's new GenStage feature to implement the telegram problem in a flow-based manner, with independent, asynchronous components communicating via message passing. It also discusses how GenStage dispatchers allow for data routing and partitioning in flow-based applications.
Vagrant, Ansible and Docker - How they fit together for productive flexible d... (Samuel Lampa)
A very quick overview of how Vagrant, Ansible and Docker fit nicely together as a very productive and flexible solution for creating automated development environments.
Using Mikko Koppanen's PHP ZMQ extension we will look at how you can easily distribute work to background processes, provide flexible service brokering for your next service oriented architecture, and manage caches efficiently and easily with just PHP and the ZeroMQ libraries. Whether the problem is asynchronous communication, message distribution, process management or just about anything, ZeroMQ can help you build an architecture that is more resilient, more scalable and more flexible, without introducing unnecessary overhead or requiring a heavyweight queue manager node.
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems (Jonas Bonér)
The document discusses the need for new tools and approaches for building event-driven, scalable, resilient, and responsive systems. It notes that the demands on applications have changed with the rise of mobile devices, multi-core architectures, and cloud computing. Systems now need to be interactive, responsive, and collaborative. The document advocates building systems that react to events, load, failure, and users using asynchronous messaging and avoiding shared mutable state. It discusses various reactive programming approaches like actors, agents, futures, and reactive extensions that enable building such systems.
Reactive Streams: Handling Data-Flow the Reactive Way (Roland Kuhn)
Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.
This document discusses Apache Airflow, a workflow management platform. It provides an overview of Airflow including its anatomy with Python job definitions, a rich CLI, and web UI. It also discusses Qubole's evaluation of options like Oozie, Pinball, and Luigi that led them to select Airflow. Key aspects of productionizing Airflow at Qubole included availability, reliability, security, and usability.
Jupyter notebooks have arrived to stay as a means to document the scientific analysis protocol, as well as to provide executable recipes shared seamlessly among the community. This has triggered the rise of a plethora of complementary tools and services associated with them. This talk will cover different possibilities to use Jupyter notebooks and the JupyterLab interface. We will start with a description of their basic functionalities, as well as functionality extensions not widely known by the community. We will describe how to take advantage of their cross-language capabilities to enhance collaborative work, and also how to use them as complementary assets in the paper publication process to provide reproducibility of the results. Other aspects of how to deal with modularity and scalability of long complex notebooks will be covered, and we will see several platforms for rendering and execution other than the browser and the local desktop. We will finish with how they are actually being used together with Docker and Binder as part of the versioned executable documentation of a project like Gammapy.
Performance optimization techniques for Java code (Attila Balazs)
The presentation covers the basics of performance optimization for real-world Java code. It starts with a theoretical overview of the concepts, followed by several live demos showing how performance bottlenecks can be diagnosed and eliminated. The demos include some non-trivial multi-threaded examples inspired by real-world applications.
This document summarizes a workshop on the Tulipp project, which aims to develop ubiquitous low-power image processing platforms. The workshop covered shortcomings of existing platforms, introduced the Maestro real-time operating system as the reference platform, and described the concept of the Tulipp project to provide an operating system and tools to support heterogeneous architectures including FPGA and multi-core processors. Attendees participated in hands-on labs demonstrating how to build applications with Maestro, leverage OpenMP for parallelism, and use SDSoC tools to automatically accelerate functions in FPGA hardware.
This document provides an overview of IGA Workflow, a web-based tool for managing samples in a wet lab from receipt to sequencing results. Key features include tracking samples, libraries, and pools; configuring sequencing runs; integrating analysis pipelines; and developing a genome browser to view results. The tool was created using Django and integrates technologies like Celery, Redis, PostgreSQL, and nginx to provide scalable pipeline management and optimize computational resources for analysis of hundreds of samples. Future plans include direct sample input from customers and integration of barcodes.
How to write a well-behaved Python command line application (gjcross)
Tutorial #1 from PyCon AU 2012
Python is a fantastic scripting language. It is easy to hack up quick scripts for all sorts of problems, and without a lot more effort such a hack can become a robust, easily maintained command line application that your users love.
This tutorial covers how to write useful, well-behaved command line applications that are a joy to use:
* Easily process command line options
* Write a script that can be used interactively or as a filter
* Display help to the user
* Gracefully handle and report errors, to the user and the shell
* Trap and process signals in a robust manner
* Create an easily configured application
* Use a range of the Python standard library modules for easier command line scripting
* Test your application
* Set up the supporting files that any well-behaved application should have, e.g. a man page
* Package your application for other people to use
This tutorial will assume a very basic knowledge of Python and some familiarity with the command line environment of your choice.
The document provides an introduction and overview of FireWorks workflow software. Some key points:
- FireWorks is an open-source, Python-based workflow management software that uses MongoDB and is pip-installable.
- It is used by several large DOE projects and materials science groups for tasks like materials modeling, machine learning, and document processing. Over 100 million CPU-hours have been used with everyday production use.
- FireWorks allows for very dynamic workflows that can modify themselves intelligently and add/remove tasks over long periods of time in response to results. It also features job detection and status persistence.
PyCon AU 2012 - Debugging Live Python Web Applications (Graham Dumpleton)
Monitoring tools record the result of what happened to your web application when a problem arises, but for some classes of problems, monitoring systems are only a starting point. Sometimes it is necessary to take more intrusive steps to plan for the unexpected by embedding mechanisms that will allow you to interact with a live deployed web application and extract even more detailed information.
Two popular tools for doing machine learning on top of the JVM ecosystem are H2O and SparkML. This presentation compares these two tools as machine learning libraries (it does not consider Spark's data-munging perspective). This work was done during June of 2018.
Slides from the 30 minutes long version of the "Wait, IPython can do that?!" presentation. I'm talking about some basic and advanced uses of IPython. For a longer, 45 minutes long version of the slides, check: https://www.slideshare.net/SebastianWitowski/wait-ipython-can-do-that-154464752
High-Performance Networking Using eBPF, XDP, and io_uring (ScyllaDB)
Bryan McCoid discusses using eBPF, XDP, and io_uring for high performance networking. XDP allows programs to process packets in the kernel without loading modules. AF_XDP sockets use eBPF to route packets between kernel and userspace via ring buffers. McCoid is building a Rust runtime called Glommio to interface with these techniques. The runtime integrates with io_uring and allows multiple design patterns for receiving packets from AF_XDP sockets.
1) Callgraph analysis of ATLAS software identified clusters of heavily called functions that could benefit from inlining to reduce instruction counts. Inlining requires changes to code and use of link-time optimization with profile guidance.
2) Avoiding position independent code may improve performance but reduce code sharing. Static libraries could allow link-time optimization.
3) Tools like IgProf, SystemTap and perf events can profile memory and performance, but a visualizer is needed to analyze object-oriented software. Sampling branch records may improve basic block counts.
Presentation from the 4th Athens Gophers Meetup.
At a glance we present:
- why we introduced a new language in the organization and why that was Go
- how we approached the transition
- some of the projects we built in Go
- the challenges we faced and the lessons we learned in the process
Building Your First Apache Apex Application (Apache Apex)
This document provides an overview of building an Apache Apex application, including key concepts like DAGs, operators, and ports. It also includes an example "word count" application and demonstrates how to define the application and operators, and build Apache Apex from source code. The document outlines the sample application workflow and includes information on resources for learning more about Apache Apex.
Great Tools Heavily Used In Japan, You Don't Know. (Junichi Ishida)
The document discusses Japanese Perl developers who attended YAPC::EU 2015. It introduces many popular Perl modules created by Japanese developers, such as WebService::Simple for making web service requests, Riji for creating blogs, and GrowthForecast for visualizing metrics graphs. It encourages attendees to talk to the Japanese developers about their work or any questions. It emphasizes that Japanese developers prioritize speed and simplicity in their modules due to their culture of valuing efficiency.
Timothy Spann provides an overview of Apache NiFi, an open source dataflow software. Some key points about NiFi include:
- It provides guaranteed data delivery, buffering, prioritized queuing, and data provenance.
- It supports over 60 source connectors and has hundreds of processors for handling different data formats.
- The architecture includes repositories for storing metadata and provenance data, and supports clustering.
- Spann discusses best practices for using NiFi such as avoiding spaghetti flows, leveraging parameters and templates, and upgrading to the latest version. He also demonstrates how to consume data from sources like MQTT and FTP.
This document compares Apache Spark and Apache Flink. Both are open-source platforms for distributed data processing. Spark was created in 2009 at UC Berkeley and donated to the Apache Foundation in 2013. It uses resilient distributed datasets (RDDs) and lazy evaluation. Flink was started in 2010 as a collaboration between universities in Germany and became an Apache project in 2014. It uses cyclic data flows and supports both batch and stream processing. While Spark is currently more mature with more components and community support, Flink claims to be faster for stream and batch processing. Overall, both platforms continue to evolve and improve.
Similar to SciPipe - A light-weight workflow library inspired by flow-based programming (20)
Using Flow-based programming to write tools and workflows for Scientific Comp... (Samuel Lampa)
The document summarizes Samuel Lampa's talk on using flow-based programming for scientific computing. It provides biographical information on Samuel Lampa, including his background in pharmaceutical bioinformatics and current work. It then gives an overview of flow-based programming, describing it as using black box processes connected by data flows, with connections specified separately from processes. Benefits mentioned include easy testing, monitoring, and changing connections without rewriting components. Examples of using FBP in Go are also presented.
Linked Data for improved organization of research data (Samuel Lampa)
Slides for a talk at a Farmbio BioScience Seminar on May 18, 2018 (http://farmbio.uu.se), introducing Linked Data as a way to manage research data that better keeps track of provenance, makes its semantics more explicit, and makes it more easily integrated with other data and consumed by others, both humans and machines.
How to document computational research projects (Samuel Lampa)
These slides are from an internal meeting at pharmb.io where we discussed ways to improve documentation of our internal computational research projects. The winning solution turned out to be markdown files, versioned with git. The slides explain a little bit about why.
Reproducibility in Scientific Data Analysis - BioScience Seminar (Samuel Lampa)
Slides for a talk held at BioScience Seminar at Dept. of Pharmaceutical BioSciences at Uppsala University on December 16, 2016.
The event webpage: http://www.farmbio.uu.se/calendar/kalendarium-detaljsida/?eventId=22496
Structure of the talk:
Reproducibility in Scientific Data Analysis ...
● What is it?
● Why is it important?
● Why is it a problem?
● What can we do about it?
● What does pharmb.io do about it?
First encounter with Elixir - Some random things (Samuel Lampa)
The document discusses Samuel Lampa's first encounter with the programming language Elixir. It covers calculating GC ratios in DNA sequences, provides a DNA sequence example file, and compares Elixir processes to Go channels, noting that an Elixir process's mailbox is named and tied to that process, while Go channels are anonymous and separate from goroutines. The document is authored by Samuel Lampa from Uppsala University.
Profiling Go code - a beginner's tutorial (Samuel Lampa)
This document summarizes a presentation on profiling Go code. It introduces pprof, together with Dave Cheney's profile package that makes setting up profiling in Go easier. It demonstrates profiling a string processing program and shows the performance improvements from various optimizations. It recommends resources for learning more about profiling Go programs with pprof and about high-performance Go programming.
My lightning talk at Go Stockholm meetup Aug 6th 2013 (Samuel Lampa)
This document discusses flow-based programming, an approach to programming invented in the 1970s where the flow of data between components is emphasized. It was successfully used in several domains including data analysis, banking software, and digital signal processing. New implementations of flow-based programming include NoFlo for Node.js and GoFlow, an open-source implementation in Go. More information on flow-based programming can be found on the listed websites.
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse (Samuel Lampa)
This document summarizes Samuel Lampa's 2010 degree project on integrating SWI-Prolog for semantic reasoning in Bioclipse. It compares SWI-Prolog to other semantic tools like Jena and Pellet in terms of speed and expressiveness when querying biochemical data. Prolog code is presented for querying NMR spectrum data that finds molecules with peak values near a search value. SPARQL queries for the same use case are also shown. Observations indicate Prolog is fastest while SPARQL is easier to understand but Prolog allows easier parameter changes and logic reuse. A final presentation was planned for April 28, 2010.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How it can help today's businesses, and its benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes and functional/test users
- Real-world examples and best practices you can apply immediately
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
SciPipe - A light-weight workflow library inspired by flow-based programming
1. SciPipe
A light-weight workflow library
inspired by flow-based
programming
Samuel Lampa, @smllmp, bionics.it
Dept. Pharm. Biosci. UU, 2016-04-28
2. Top light-weight workflow tools
Snakemake
● Great for short one-off explorative stuff
● Tricky for complex graphs
Bpipe
● Easy to use for highly linear workflows
● Not so easy with branching workflows
Nextflow
● Dataflow means dynamic scheduling possible(!)
● Own way of organizing outputs
● No “re-usable components” support
3. SciLuigi and SciPipe
SciLuigi
● Great re-usable components story
● Highly customizable output file naming
● Easy to extend API
● No dynamic scheduling :(
● Performance problems with more than 64 workers
SciPipe
● (Same benefits as SciLuigi)
● Also: Allows dynamic scheduling
● Also: Much lower resource usage
(1000s of workers is OK)
● Also: Simpler, less code, less maintenance
● Also: High-performance for in-line components
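The "1000s of workers" point comes down to goroutines being far cheaper than OS threads. The sketch below is not SciPipe code, just a plain-Go illustration of fanning jobs out to thousands of goroutine workers over channels; the function name doubleSum is made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// doubleSum fans n jobs out to `workers` goroutines and sums the
// doubled values. Spawning thousands of goroutines like this is
// cheap, which is what makes workflows with 1000s of concurrent
// workers feasible in Go.
func doubleSum(n, workers int) int {
	jobs := make(chan int, 100)
	results := make(chan int, 100)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- j * 2 // stand-in for real work
			}
		}()
	}

	// Feed the jobs, then signal there are no more.
	go func() {
		for i := 1; i <= n; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	// Close the results channel once all workers are done.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(doubleSum(10000, 5000)) // prints 100010000
}
```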
4. SciPipe in brief
● Website: scipipe.org
● Simple, very little code => maintainable
● Write workflows in a subset of Go(lang)
● Execute readable .go-files:
go run myworkflow.go
● Optional compilation to static executable files:
go build; ./myworkflow
● No new language. Use existing Go tooling:
● Editors, Debuggers, Linters, Profilers ...
6. Flow-based programming principles
● Separate network definition
(separate from process definitions)
● Named ports
● Channels with bounded buffers
● Information packets (IPs) with defined lifetimes
● More info:
en.wikipedia.org/wiki/Flow-based_programming
www.jpaulmorrison.com/fbp
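These principles map naturally onto Go. The sketch below is a minimal plain-Go illustration (not SciPipe's actual API): processes are structs with named channel ports, and the network wiring lives in main, separate from the process definitions. The names Doubler and Printer are made up for this example.

```go
package main

import "fmt"

// A process is a black box with named ports (struct fields).
type Doubler struct {
	In  chan int // named in-port
	Out chan int // named out-port
}

func (p *Doubler) Run() {
	defer close(p.Out)
	for v := range p.In {
		p.Out <- v * 2
	}
}

type Printer struct {
	In   chan int
	Done chan struct{}
}

func (p *Printer) Run() {
	for v := range p.In {
		fmt.Println(v)
	}
	close(p.Done)
}

func main() {
	// The network definition lives here, separate from the process
	// definitions above: bounded-buffer channels connect named
	// ports, and can be rewired without touching the processes.
	dbl := &Doubler{In: make(chan int, 16), Out: make(chan int, 16)}
	prn := &Printer{In: dbl.Out, Done: make(chan struct{})}

	go dbl.Run()
	go prn.Run()

	for _, v := range []int{1, 2, 3} {
		dbl.In <- v
	}
	close(dbl.In)
	<-prn.Done // prints 2, 4, 6
}
```

Note how the buffered channels play the role of FBP's bounded buffers, and how closing a channel gives each information packet stream a defined lifetime.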
12. Architecture: Basic Components
● scipipe.SciProcess
● Long-running
● Typically one per operation
● Typically spawns one task per input
● scipipe.SciTask
● Short lived
● Executes just one shell command or custom Go function
● Typically one per operation/set of in-data files
● scipipe.FileTarget
● Most common data type passed between processes
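A rough plain-Go sketch of this process/task split, under the assumption that a process is a long-running goroutine spawning one short-lived task per incoming file target. The type and method names below only loosely mirror scipipe.SciProcess / SciTask / FileTarget and are made up for this illustration; Execute stands in for running a shell command or custom Go function.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// FileTarget: the most common data type passed between processes.
type FileTarget struct{ Path string }

// Task: short-lived, executes one unit of work on one in-data file.
type Task struct{ Target FileTarget }

func (t Task) Execute() string {
	// Stand-in for a shell command or custom Go function.
	return strings.ToUpper(t.Target.Path)
}

// Process: long-running, typically one per operation; spawns one
// task per input received on its in-port.
type Process struct {
	In  chan FileTarget
	Out chan string
}

func (p *Process) Run() {
	defer close(p.Out)
	var wg sync.WaitGroup
	for ft := range p.In {
		wg.Add(1)
		go func(t Task) { // one short-lived task per input
			defer wg.Done()
			p.Out <- t.Execute()
		}(Task{Target: ft})
	}
	wg.Wait()
}

func main() {
	p := &Process{In: make(chan FileTarget, 4), Out: make(chan string, 4)}
	go p.Run()
	p.In <- FileTarget{Path: "reads.fastq"}
	close(p.In)
	for res := range p.Out {
		fmt.Println(res) // prints READS.FASTQ
	}
}
```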