The data science team at Zymergen is applying machine learning techniques to identify genetic targets, work that is supported by extensive analytical automation that systematically identifies outliers, removes process-related bias, and quantifies performance improvements. We’re using Apache Airflow to construct robust data pipelines that allow us to produce clean, reliable inputs to our predictive models. In this talk, I’ll discuss the unique data processing challenges we face in working with high-throughput, biological data and provide an overview of how we’re using Apache Airflow to meet those challenges.
6. Learning how to efficiently navigate the genome is the mission of data science at Zymergen.
7. Blocker: process failure
Orchestrating complex experiments with robots is hard, and there are process failures. These failures often cause sporadic, extreme measurement values.
9. Blocker: different interpretations of results
We’re building a platform that can support any microbe and any molecule. Sometimes that results in a proliferation of solutions, with disagreement on which is best.
10. Processing pipeline
1. Identify process failures (outlier detection)
2. Quantify and remove process-related bias (normalization)
3. Identify strains that show improvement using consistent criteria (hit detection)
The output of the pipeline: clean model inputs.
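The slides don’t show how the outlier-detection step is implemented; as a minimal sketch of the idea, flagging sporadic extreme measurements with a robust per-group statistic, something like the following could work (the column names, grouping key, and threshold are all invented for illustration):

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, value_col: str = "titer",
                  group_col: str = "plate", threshold: float = 3.5) -> pd.Series:
    """Flag extreme measurements within each group using a robust
    z-score based on the median absolute deviation (MAD), which
    tolerates the heavy tails that process failures produce."""
    def robust_z(values: pd.Series) -> pd.Series:
        med = values.median()
        mad = (values - med).abs().median()
        if mad == 0:  # degenerate group: no spread, nothing to flag
            return pd.Series(0.0, index=values.index)
        return 0.6745 * (values - med) / mad  # 0.6745 ~ inverse-normal(0.75)

    z = df.groupby(group_col)[value_col].transform(robust_z)
    return z.abs() > threshold
```

A MAD-based score is a common choice for this kind of data because a handful of failure-driven extreme values would inflate an ordinary standard deviation and mask themselves.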
11. Rolling our own ETL pipeline
There are many ways to measure the concentration of a molecule. Any microbe, any molecule… any experiment, many data formats.
13. Airflow
https://airflow.incubator.apache.org/
“Airflow is a platform to programmatically author, schedule and monitor workflows.”
Airflow gives us the flexibility to apply a common set of processing steps to variable data inputs and to schedule complex processing workflows; it has also become a delivery mechanism for our products.
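The slides don’t include DAG code; a minimal sketch of applying one common sequence of steps on a schedule, using the Airflow 1.x-era API that matches the incubator URL above (the DAG name and callables are hypothetical stand-ins for the steps on slide 10), might look like:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x path

# Stubs standing in for the real processing steps named on slide 10.
def detect_outliers():
    pass

def normalize():
    pass

def detect_hits():
    pass

dag = DAG(
    dag_id="strain_measurements",  # hypothetical name
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

outliers = PythonOperator(task_id="outlier_detection",
                          python_callable=detect_outliers, dag=dag)
normalization = PythonOperator(task_id="normalization",
                               python_callable=normalize, dag=dag)
hits = PythonOperator(task_id="hit_detection",
                      python_callable=detect_hits, dag=dag)

# The same sequence of steps runs regardless of the input's source format.
outliers >> normalization >> hits
```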
18. Airflow + PyStan
With Bayesian hierarchical models we estimate (and monitor) the distribution of batch effects, i.e. experimental bias.
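The actual Stan model isn’t shown on the slide; a minimal hierarchical sketch of the idea, one additive effect per batch drawn from a shared distribution whose spread we can monitor (variable names and model structure are illustrative, using the PyStan 2.x API), could be:

```python
import pystan  # PyStan 2.x API

# Illustrative hierarchical model: each batch contributes an additive
# effect drawn from a shared normal distribution; sigma_batch captures
# how much experimental bias batches introduce.
model_code = """
data {
  int<lower=1> N;                // measurements
  int<lower=1> B;                // batches
  int<lower=1, upper=B> batch[N];
  vector[N] y;
}
parameters {
  real mu;                       // grand mean
  vector[B] batch_effect;        // per-batch bias
  real<lower=0> sigma_batch;     // spread of batch effects
  real<lower=0> sigma;           // measurement noise
}
model {
  batch_effect ~ normal(0, sigma_batch);
  y ~ normal(mu + batch_effect[batch], sigma);
}
"""

model = pystan.StanModel(model_code=model_code)
# fit = model.sampling(data=..., iter=2000, chains=4)  # data supplied by the DAG task
```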
19. Dropbox
• Scientists at Zymergen work with data using many different tools, including JMP, SQL, and Excel.
• We use a custom Dropbox hook to build quick data ingestion pipelines.
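Airflow doesn’t ship a Dropbox hook, so the hook mentioned here is custom. A hypothetical sketch built on the official `dropbox` SDK, not Zymergen’s actual implementation (storing the access token in the connection’s password field is an assumption), might look like:

```python
from airflow.hooks.base_hook import BaseHook  # Airflow 1.x location
import dropbox  # official Dropbox SDK

class DropboxHook(BaseHook):
    """Hypothetical sketch of a custom Dropbox hook; the connection
    handling is deliberately simplified."""

    def __init__(self, dropbox_conn_id="dropbox_default"):
        self.conn_id = dropbox_conn_id

    def get_conn(self):
        conn = self.get_connection(self.conn_id)
        # Assumption: the OAuth access token lives in the password field.
        return dropbox.Dropbox(conn.password)

    def download(self, remote_path, local_path):
        """Fetch a file scientists dropped into Dropbox onto local disk."""
        client = self.get_conn()
        client.files_download_to_file(local_path, remote_path)
```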
23. Pairs well with Superset!
“Apache Superset is a modern, enterprise-ready business intelligence web application.”
https://github.com/apache/incubator-superset
25. Fairflow: Functional Airflow
• The core of Fairflow is an abstract base class, foperator, that takes care of instantiating your Airflow operators and setting their dependencies.
• In Fairflow, DAGs are constructed from foperators that create the upstream operators when the final foperator is called.
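Fairflow’s real API isn’t reproduced on the slide; the following is a sketch in the spirit of the described design, where calling the final foperator recursively builds and wires its upstream operators (the class name and structure are my own, not Fairflow’s):

```python
import abc

class FOperator(abc.ABC):
    """Sketch of the foperator idea: each node knows its input
    foperators and, when called with a DAG, builds those inputs first,
    then its own Airflow operator, wiring the dependencies."""

    def __init__(self, *inputs):
        self.inputs = inputs

    @abc.abstractmethod
    def build_operator(self, dag):
        """Return the concrete Airflow operator for this node."""

    def __call__(self, dag):
        # Recursively build upstream operators. (A real implementation
        # would memoize so a shared input is only built once.)
        upstream = [f(dag) for f in self.inputs]
        op = self.build_operator(dag)
        for u in upstream:
            u >> op
        return op
```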
27. Defining ML workflows
In the DAG definition, create an instance of the task. Then, instantiate a DAG as usual and call the compare task on the DAG.
28. Defining ML workflows
The design allows for simple creation of complicated experimental workflows with arbitrary sets of models, parameters, and evaluation metrics.
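Slides 27–28 describe the usage pattern without showing the code; a hypothetical example following the FOperator sketch above (`TrainModel`, `CompareModels`, and all task names are invented) could read:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def _noop():
    pass  # placeholder for real training / comparison logic

class TrainModel(FOperator):  # FOperator from the sketch above
    def __init__(self, name):
        super().__init__()  # no upstream inputs
        self.name = name

    def build_operator(self, dag):
        return PythonOperator(task_id="train_" + self.name,
                              python_callable=_noop, dag=dag)

class CompareModels(FOperator):
    def build_operator(self, dag):
        return PythonOperator(task_id="compare",
                              python_callable=_noop, dag=dag)

# In the DAG definition, create an instance of the comparison task...
compare = CompareModels(TrainModel("random_forest"), TrainModel("gbm"))

# ...then instantiate a DAG as usual and call the compare task on it.
dag = DAG(dag_id="model_comparison", start_date=datetime(2017, 1, 1))
compare(dag)  # calling the final foperator builds the whole upstream graph
```

Swapping in a different set of models, parameters, or evaluation metrics then only means changing the foperator instances, not rewiring the DAG by hand.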
29. Is Airflow for you?
Do you have heterogeneous data sources? Do you have complex dependencies between processing tasks? Do you have data with different velocities? Do you have constraints on your time?
Probably!