deathstar is a tool for seamlessly running R code across local and cloud computing nodes. It uses ZeroMQ to distribute data and code across nodes. The deathstar daemon runs on each node and listens for work. The R package exposes a zmq.cluster.lapply function similar to regular lapply to distribute jobs. Examples show using deathstar to estimate pi across nodes and calculate stock betas in a distributed manner. deathstar allows scaling R code to multiple cores locally or launching EC2 instances on AWS for more capacity.
Kernel Recipes 2017 - Modern Key Management with GPG - Werner Koch - Anne Nicolas
Although GnuPG 2 has been around for nearly 15 years, the old 1.4 version was still in wide use. With Debian and others making 2.1 the default, many interesting things can now be done. In this talk he will explain the advantages of modern key algorithms, like ed25519, and why gpg relaxed some of its more paranoid defaults. The new --quick commands of gpg for easily scriptable key management will be described, as well as the new key discovery methods. Finally, hints for integrating gpg into other programs will be given.
Werner Koch, g10code
re:Invent 2019 BPF Performance Analysis at Netflix - Brendan Gregg
Talk by Brendan Gregg at AWS re:Invent 2019. Abstract: "Extended BPF (eBPF) is an open source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix's Brendan Gregg tours BPF tracing capabilities, including many new open source performance analysis tools he developed for his new book "BPF Performance Tools: Linux System and Application Observability." The talk includes examples of using these tools in the Amazon EC2 cloud."
Performance Wins with BPF: Getting Started - Brendan Gregg
Keynote by Brendan Gregg for the eBPF Summit 2020. How to get started finding performance wins using the BPF (eBPF) technology. This short talk covers the quickest and easiest way to find performance wins using BPF observability tools on Linux.
Video games are written as a main loop: process player input, update the state of the game, render a new frame to the screen, repeat. They do this 60 times a second, with millisecond timing. Most monitoring tools are also written as loops: send a probe, wait for the response, update a data store, sleep. Often this is done pretty slowly, maybe once a second! In video games, if you can’t update fast enough, you skip the rendering step and the frame rate drops. With monitoring tools, if your loop takes too long you also stop logging data as often, and instead of choppy gameplay you get gaps in your graphs, often when you need that data the most!
Let’s use ping as an example and see how we can rewrite its main loop to function more like a video game, keeping a high frame rate.
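The loop described above can be sketched as a fixed-cadence scheduler that computes each deadline from the schedule rather than from "now", so one slow iteration does not delay every later one. This is a minimal Python sketch under stated assumptions: `tick` is a stand-in for a hypothetical probe function, not any real ping implementation.

```python
import time

def fixed_rate_loop(tick, interval, frames):
    """Call tick() at a fixed cadence, like a game's frame loop.

    Deadlines are absolute (schedule-based), so an overrunning
    iteration does not shift the whole schedule: the loop catches
    up instead of letting gaps accumulate in the data.
    """
    results = []
    next_deadline = time.monotonic()
    for _ in range(frames):
        results.append(tick())      # fire the probe; don't block on replies here
        next_deadline += interval   # next slot in the fixed schedule
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)   # on time: wait for the next frame
        # else: this frame overran; start the next one immediately
    return results
```

A ping rewritten this way would send the ICMP request in `tick` and collect replies asynchronously, so a lost or slow reply never stalls the send schedule.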
University of Virginia
cs4414: Operating Systems
http://rust-class.org
Scheduling in Linux, 2002-2014
Energy and Scheduling
OSX Mavericks Timer Coalescing
Scheduling Web Servers
Healthcare.gov
For embedded notes, see: http://rust-class.org/class-12-scheduling-in-linux-and-web-servers.html
Talk by Brendan Gregg for All Things Open 2018. "At over one thousand code commits per week, it's hard to keep up with Linux developments. This keynote will summarize recent Linux performance features, for a wide audience: the KPTI patches for Meltdown, eBPF for performance observability and the new open source tools that use it, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more. This is about exposure: knowing what exists, so you can learn and use it later when needed. Get the most out of your systems with the latest Linux kernels and exciting features."
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ... - Databricks
Big companies typically integrate their data from various heterogeneous systems when building a data lake as a single point for accessing data. To achieve this goal, technical teams often deal with data defined by complex schemas and various data formats. Spark SQL Datasets are currently compatible with data formats such as XML, Avro, and Parquet by providing primitive and complex data types such as structs and arrays.
Although the Dataset API offers a rich set of functions, general manipulation of arrays and deeply nested data structures is lacking. We will demonstrate this fact by providing examples of data that is currently very hard to process in Spark efficiently. We designed and developed an extension of the Dataset API to allow developers to work with array and complex type elements in a more straightforward and consistent way. The extension should help users dealing with complex and structured big data to use Apache Spark as a truly generic processing framework.
Kernel Recipes 2017: Performance Analysis with BPF - Brendan Gregg
Talk by Brendan Gregg at Kernel Recipes 2017 (Paris): "The in-kernel Berkeley Packet Filter (BPF) has been enhanced in recent kernels to do much more than just filtering packets. It can now run user-defined programs on events, such as on tracepoints, kprobes, uprobes, and perf_events, allowing advanced performance analysis tools to be created. These can be used in production as the BPF virtual machine is sandboxed and will reject unsafe code, and are already in use at Netflix.
Beginning with the bpf() syscall in 3.18, enhancements have been added in many kernel versions since, with major features for BPF analysis landing in Linux 4.1, 4.4, 4.7, and 4.9. Specific capabilities these provide include custom in-kernel summaries of metrics, custom latency measurements, and frequency counting kernel and user stack traces on events. One interesting case involves saving stack traces on wake up events, and associating them with the blocked stack trace: so that we can see the blocking stack trace and the waker together, merged in kernel by a BPF program (that particular example is in the kernel as samples/bpf/offwaketime).
This talk will discuss the new BPF capabilities for performance analysis and debugging, and demonstrate the new open source tools that have been developed to use it, many of which are in the Linux Foundation iovisor bcc (BPF Compiler Collection) project. These include tools to analyze the CPU scheduler, TCP performance, file system performance, block I/O, and more."
YOW2018 Cloud Performance Root Cause Analysis at Netflix - Brendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment."
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
A 2015 performance study by Brendan Gregg, Nitesh Kant, and Ben Christensen. Original is in https://github.com/Netflix-Skunkworks/WSPerfLab/tree/master/test-results
Mechanisms and tools of development and monitoring in the Linux Kernel
A brief presentation about the tools and mechanisms of Linux Kernel development, to help understand what is required.
Tools that create an execution profile of the Linux Kernel code, or instrument it through static or dynamic methods, will be presented.
Also discussed will be the GDB debugger and how, through a remote virtual serial connection to a virtual machine, it can be used to debug a live Kernel and Linux Kernel modules. Also demonstrated will be how a deeper understanding of the code can be attained by attaching the memory locations used by a Kernel module to the GDB session.
Lastly, some of the Kernel execution contexts, such as interrupts, deferrable work, and so on, are presented.
UM2019 Extended BPF: A New Type of Software - Brendan Gregg
Keynote for Ubuntu Masters 2019 by Brendan Gregg, Netflix. Video https://www.youtube.com/watch?v=7pmXdG8-7WU&feature=youtu.be . "Extended BPF is a new type of software, and the first fundamental change to how kernels are used in 50 years. This new type of software is already in use by major companies: Netflix has 14 BPF programs running by default on all of its cloud servers, which run Ubuntu Linux. Facebook has 40 BPF programs running by default. Extended BPF is composed of an in-kernel runtime for executing a virtual BPF instruction set through a safety verifier and with JIT compilation. So far it has been used for software defined networking, performance tools, security policies, and device drivers, with more uses planned and more we have yet to think of. It is changing how we use and think about systems. This talk explores the past, present, and future of BPF, with BPF performance tools as a use case."
Physical replication, logical replication, synchronous, asynchronous, multi-master, scale-out reads, sharding... so many concepts related to replication. In this talk we will review the fundamental ideas behind each of them, and in which cases you will need one or the other. We will compare different architecture designs, discussing their advantages and disadvantages. The practical part of the talk is focused on how PostgreSQL works. The ideas and concepts can be applied to other database engines.
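For the physical (streaming) replication case, the primary's side of the setup fits in a few postgresql.conf lines; this is an illustrative sketch, and the standby name `standby1` is an assumption:

```
# primary server, postgresql.conf -- streaming physical replication
wal_level = replica           # ship enough WAL for standbys to replay
max_wal_senders = 5           # max concurrent WAL-streaming connections

# Replication is asynchronous by default: COMMIT does not wait for
# standbys. To make it synchronous, require confirmation from one
# named standby before COMMIT returns:
synchronous_standby_names = 'FIRST 1 (standby1)'
```

Leaving `synchronous_standby_names` empty keeps replication asynchronous; listing standbys trades commit latency for stronger durability guarantees.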
Linux kernel tracing superpowers in the cloud - Andrea Righi
The Linux 4.x series introduced a powerful new programmable tracing engine (BPF) that lets you actually look inside the kernel at runtime. This talk will show you how to exploit this engine to debug problems or identify performance bottlenecks in a complex environment like a cloud. It will cover the latest Linux superpowers that let you see what is happening “under the hood” of the Linux kernel at runtime. I will explain how to exploit these “superpowers” to measure and trace complex events at runtime in a cloud environment. For example, we will see how to measure the latency distribution of filesystem I/O, details of storage device operations such as individual block I/O request timeouts, or TCP buffer allocations, investigate the stack traces of certain events, identify memory leaks and performance bottlenecks, and a whole lot more.
Trip down the GPU lane with Machine Learning - Renaldas Zioma
What a Machine Learning professional should know about GPUs!
Brief outline of the deck:
* GPU architecture explained with simple images
* memory bandwidth cheat-sheets for common hardware configurations
* overview of the GPU programming model
* under-the-hood peek at the main building block of ML: matrix multiplication
* effect of mini-batch size on performance
Originally I gave this talk at an internal Machine Learning Workshop at Unity Seattle.
HIGH QUALITY pdf slides: http://bit.ly/2iQxm7X (on Dropbox)
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ... - StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent... - DataStax Academy
Wait! Back away from the Cassandra 2ndary index. It’s ok for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadin’s videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart’s content. Take our hand. We will show you how.
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise - Patrick McFadin
SQL Server Machine Learning Services is an embedded, predictive analytics and data science engine that can execute R and Python code within a SQL Server database as stored procedures, as T-SQL script containing R or Python statements, or as R or Python code containing T-SQL.
The key value proposition of Machine Learning Services is the power of its proprietary packages to deliver advanced analytics at scale, and the ability to bring calculations and processing to where the data resides, eliminating the need to pull data across the network.
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes - MongoDB
MongoDB Ops Manager is an enterprise-grade end-to-end database management, monitoring, and backup solution. Kubernetes has clearly won the orchestration-platform "wars". In this session we'll take a deep dive on how you can leverage both these technologies to host your MongoDB deployments within your Kubernetes infrastructure whether that's OpenShift, PKS, Azure AKS, or just upstream. This talk will review the core technologies, such as containers, Kubernetes, and MongoDB Ops Manager. You'll also have a chance to see real-live demos of MongoDB running on Kubernetes and managed with MongoDB Ops Manager with the MongoDB Enterprise Kubernetes Operator.
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go - ScyllaDB
Go gives us net.Dial and net.Listen for sending and receiving data at Layer 4. Now you will see how to send and receive raw packets directly to and from the NIC at Layer 1, to get timestamp information both from timestamping-enabled NICs and from the points where packets enter and leave the Linux kernel. Capturing these timestamps gives us better granularity when measuring latency and jitter than relying on time.Now() in userspace, where measurements are subject to additional delay introduced by the OS and Go runtime schedulers.
Presented by: Jason Mimick
Technical Director, MongoDB
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Deathstar
1. deathstar – seamless cloud computing for R
Whit Armstrong
armstrong.whit@gmail.com
KLS Diversified Asset Management
May 11, 2012
deathstar – seamless cloud computing for R 1 / 27
2. What is deathstar?
deathstar is a dead simple way to run lapply across EC2 nodes or local compute nodes
deathstar uses ZMQ (via the rzmq package) to deliver both data and code over the wire
deathstar consists of two components: a daemon and an R package
simplicity is the key theme
3. deathstar – visual
user – user workstation
node0,node1 – local or cloud machines
n0 workers, n1 workers – deathstar workers
4. deathstar daemon – details
deathstar is a daemon that listens on two ports:
the 'exec' port listens for incoming work
the 'status' port fulfills queries on the current workload capacity of the server (this is what allows dynamic job allocation)
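Purely as an illustration (deathstar's actual client logic lives in the R package), the decision the status port enables looks like this: query each node for its free capacity, then send the next job to a node that reports open slots. A minimal Python sketch of that choice, with `statuses` standing in for the replies from each node's status port:

```python
def pick_node(statuses):
    """Pick the node reporting the most free worker slots, or None if all are busy.

    `statuses` is a hypothetical {node: free_slots} mapping standing in for
    the replies a client would collect from each node's status port.
    """
    best = max(statuses, key=statuses.get)
    return best if statuses[best] > 0 else None

print(pick_node({"node0": 2, "node1": 5}))  # a node with free capacity
print(pick_node({"node0": 0, "node1": 0}))  # cluster saturated
```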
5. deathstar – queue device (about 100 lines of c++)
void deathstar(zmq::socket_t& frontend, zmq::socket_t& backend,
               zmq::socket_t& worker_signal, zmq::socket_t& status) {
  int number_of_workers(0);
  // loosely copied from lazy pirate (ZMQ guide)
  while (1) {
    zmq_pollitem_t items[] = {
      { worker_signal, 0, ZMQ_POLLIN, 0 },
      { status, 0, ZMQ_POLLIN, 0 },
      { backend, 0, ZMQ_POLLIN, 0 },
      { frontend, 0, ZMQ_POLLIN, 0 }
    };
    // poll frontend only if we have available workers;
    // otherwise poll only worker_signal, status, and backend
    int rc = zmq_poll(items, number_of_workers ? 4 : 3, -1);
    // interrupted
    if (rc == -1) {
      break;
    }
    // worker_signal -- a worker came online
    if (items[0].revents & ZMQ_POLLIN) {
      process_worker_signal(worker_signal, number_of_workers);
    }
    // status -- client status request
    if (items[1].revents & ZMQ_POLLIN) {
      process_status_request(status, number_of_workers);
    }
    // backend -- send data from worker to client
    if (items[2].revents & ZMQ_POLLIN) {
      process_backend(frontend, backend);
    }
    // frontend -- send data from client to worker
    if (items[3].revents & ZMQ_POLLIN) {
      process_frontend(frontend, backend, number_of_workers);
    }
  }
}
6. deathstar – worker (25 SLOC of R)
#!/usr/bin/env Rscript
library(rzmq)

worker.id <- paste(Sys.info()["nodename"], Sys.getpid(), sep=":")
cmd.args <- commandArgs(trailingOnly=TRUE)
print(cmd.args)
work.endpoint <- cmd.args[1]
ready.endpoint <- cmd.args[2]
log.file <- cmd.args[3]
sink(log.file)

context <- init.context()
ready.socket <- init.socket(context, "ZMQ_PUSH")
work.socket <- init.socket(context, "ZMQ_REP")
connect.socket(ready.socket, ready.endpoint)
connect.socket(work.socket, work.endpoint)

while (TRUE) {
  ## send control message to indicate worker is up
  send.null.msg(ready.socket)
  ## wait for work
  msg <- receive.socket(work.socket)
  index <- msg$index
  fun <- msg$fun
  args <- msg$args
  print(system.time(result <- try(do.call(fun, args), silent=TRUE)))
  send.socket(work.socket, list(index=index, result=result, node=worker.id))
  print(gc(verbose=TRUE))
}
7. deathstar package – exposes zmq.cluster.lapply
code is a little too long for a slide, but it does three basic things:
check node capacity; if capacity > 0, submit a job
collect completed jobs
node life-cycle monitoring (future feature)
So, what can it do?
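As an illustrative sketch only (not deathstar's implementation), the submit/collect cycle above amounts to scatter-gather: tag each job with its index before dispatch so completions, which may arrive out of order, can be reassembled into lapply-style order. The `capacity` mapping below is a hypothetical stand-in for the per-node status replies:

```python
import random

def cluster_lapply(xs, fun, capacity):
    """Toy scatter-gather sketch of the submit/collect cycle:
    tag each job with its index, dispatch to nodes with free slots,
    and reassemble completions (which may arrive out of order) by index.
    `capacity` maps a node name to the worker slots it reports."""
    pending = list(enumerate(xs))      # (index, argument) pairs
    results = [None] * len(pending)
    completed = {node: 0 for node in capacity}
    while pending:
        # one dispatch round: each node takes up to its reported capacity
        batch = []
        for node, slots in capacity.items():
            for _ in range(slots):
                if pending:
                    batch.append((node, pending.pop(0)))
        random.shuffle(batch)          # simulate out-of-order completion
        for node, (index, arg) in batch:
            results[index] = fun(arg)  # a worker runs do.call(fun, args)
            completed[node] += 1
    return results, completed

res, rep = cluster_lapply(range(10), lambda x: x * x, {"node0": 3, "node1": 2})
```

Because each result is written back at its original index, the caller sees results in submission order regardless of which node finished first.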
8. Stochastic estimation of Pi
Borrowed from JD Long:
http://www.vcasmo.com/video/drewconway/8468
or
http://joefreeman.co.uk/blog/2009/07/estimating-pi-with-monte-carlo-methods
estimatePi <- function(seed, draws) {
  set.seed(seed)
  r <- .5
  x <- runif(draws, min=-r, max=r)
  y <- runif(draws, min=-r, max=r)
  inCircle <- ifelse(sqrt(x^2 + y^2) < r, 1, 0)
  sum(inCircle) / length(inCircle) * 4
}
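The estimator works because a point drawn uniformly from the unit square lands inside the inscribed circle of radius 0.5 with probability pi * 0.5^2 / 1 = pi/4, so four times the observed fraction estimates pi. A quick Python port of the same logic (illustrative only; the package distributes the R version above):

```python
import math
import random

def estimate_pi(seed, draws):
    """Monte Carlo pi: 4 times the fraction of uniform points in the inscribed circle."""
    rng = random.Random(seed)
    r = 0.5
    in_circle = 0
    for _ in range(draws):
        x = rng.uniform(-r, r)
        y = rng.uniform(-r, r)
        if math.hypot(x, y) < r:
            in_circle += 1
    return 4.0 * in_circle / draws

print(estimate_pi(seed=1, draws=100_000))
```

With 100,000 draws the standard error of the estimate is roughly 0.005, so independent seeds agree with pi to about two decimal places.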
13. So, what have we learned?
deathstar looks pretty much like lapply, or parLapply
it’s pretty fast
not much configuration
14. into the cloud!
Basic steps to customize your EC2 instance.
The deathstar daemon can be downloaded here:
http://github.com/downloads/armstrtw/deathstar.core/deathstar_0.0.1-1_amd64.deb
## launch EC2 instance from web console
## assume instance public dns is: ec2-23-20-55-67.compute-1.amazonaws.com

## from local workstation
scp rzmq_0.6.6.tar.gz ec2-23-20-55-67.compute-1.amazonaws.com:/home/ubuntu
scp ~/.s3cfg ec2-23-20-55-67.compute-1.amazonaws.com:/var/lib/deathstar

## from EC2 node
sudo apt-get install r-base-core r-base-dev libzmq-dev s3cmd
sudo R CMD INSTALL rzmq_0.6.6.tar.gz
sudo R CMD INSTALL AWS.tools_0.0.6.tar.gz
sudo dpkg -i deathstar_0.0.1-1_amd64.deb

## from local workstation
## check that the right ports are open (6000, 6001)
## and that the AWS firewall is configured properly
nmap ec2-23-20-55-67.compute-1.amazonaws.com
15. a little more aggressive – estimatePi on EC2
c1.xlarge is an 8 core machine with 7 GB of RAM.
require(deathstar)
require(AWS.tools)
## start the EC2 cluster
cluster <- startCluster(ami = "ami-1c05a075", key = "kls-ec2",
                        instance.count = 4, instance.type = "c1.xlarge")
nodes <- get.nodes(cluster)
pi.ec2.runtime <- system.time(pi.ec2 <- zmq.cluster.lapply(cluster = nodes,
                                                           1:10000, estimatePi, draws = 1e+05))
16. estimatePi on EC2 – workload distribution
print(mean(unlist(pi.ec2)))
## [1] 3.142
print(pi.ec2.runtime)
## user system elapsed
## 6.285 1.044 122.751
print(node.summary(attr(pi.ec2, "execution.report")))
## node jobs.completed
## 1 ip-10-115-54-77 2510
## 2 ip-10-34-239-135 2526
## 3 ip-10-39-10-26 2521
## 4 ip-10-78-49-250 2443
17. mixed mode – use local + cloud resources
## the EC2 cluster is still running from the last example
cloud.nodes <- get.nodes(cluster)
nodes <- c(c("krypton", "xenon", "node1", "mongodb", "research"), cloud.nodes)
pi.mixed.runtime <- system.time(pi.mixed <- zmq.cluster.lapply(cluster = nodes,
                                                               1:10000, estimatePi, draws = 1e+05))
## turn the cluster off
terminateCluster(cluster)
## [,1] [,2] [,3] [,4]
## [1,] "INSTANCE" "i-d35d82b5" "running" "shutting-down"
## [2,] "INSTANCE" "i-d15d82b7" "running" "shutting-down"
## [3,] "INSTANCE" "i-d75d82b1" "running" "shutting-down"
## [4,] "INSTANCE" "i-d55d82b3" "running" "shutting-down"
18. workload distribution for local + cloud
print(mean(unlist(pi.mixed)))
## [1] 3.142
print(pi.mixed.runtime)
## user system elapsed
## 5.812 0.708 36.957
print(node.summary(attr(pi.mixed, "execution.report")))
## node jobs.completed
## 1 ip-10-115-54-77 815
## 2 ip-10-34-239-135 810
## 3 ip-10-39-10-26 826
## 4 ip-10-78-49-250 827
## 5 krypton 1470
## 6 mongodb 1648
## 7 node1 1051
## 8 research 940
## 9 xenon 1613
19. distributed memory regression
Borrowing from Thomas Lumley's presentation [1], one can calculate regression coefficients in a distributed-memory environment via these steps:
compute XᵀX and Xᵀy in chunks
aggregate the chunks
reduce via β̂ = (XᵀX)⁻¹ Xᵀy
[1] http://faculty.washington.edu/tlumley/tutorials/user-biglm.pdf
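The chunked normal-equations trick works because XᵀX and Xᵀy are sums over rows, so per-chunk partial sums can be computed anywhere and added together before the final solve. A small self-contained Python sketch for a two-column X (illustrative only; deathstar farms out the per-chunk step to workers):

```python
def chunked_ols(chunks):
    """Normal-equations OLS over chunks of (x1, x2, y) rows: accumulate
    X'X (2x2) and X'y (2x1) per chunk, sum them, then solve the 2x2 system.
    Chunking does not change the result because X'X and X'y are row sums."""
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for chunk in chunks:                   # each chunk fits in one worker's memory
        for x1, x2, y in chunk:
            xtx[0][0] += x1 * x1; xtx[0][1] += x1 * x2
            xtx[1][0] += x2 * x1; xtx[1][1] += x2 * x2
            xty[0] += x1 * y;     xty[1] += x2 * y
    # reduce: beta = (X'X)^{-1} X'y via the closed-form 2x2 inverse
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    b1 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det
    b2 = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det
    return b1, b2

# synthetic data with y = 2*x1 - 3*x2 exactly, split into two chunks;
# splitting must not change the fitted coefficients
rows = [(i, (i * 7) % 5 + 1, 2 * i - 3 * ((i * 7) % 5 + 1)) for i in range(1, 9)]
beta = chunked_ols([rows[:4], rows[4:]])
```

Since y here lies exactly in the column space of X, the recovered coefficients are (2, -3) regardless of how the rows are partitioned.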
20. distributed memory regression – data
Airline on-time performance
http://stat-computing.org/dataexpo/2009/
The data:
The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, taking up 1.6 gigabytes compressed and 12 gigabytes uncompressed. To make sure that you're not overwhelmed by the size of the data, we've provided two brief introductions to some useful tools: Linux command line tools and SQLite, a simple SQL database.
22. distributed memory regression – more setup
What are we estimating?
We supply 'get.y' and 'get.x', which are applied to each chunk.
'get.y' returns the log of the departure delay in minutes.
'get.x' returns a matrix of the departure time in hours and the log(distance) of the flight.
require(AWS.tools)
get.y <- function(X) {
  log(ifelse(!is.na(X[, "DepDelay"]) & X[, "DepDelay"] > 0, X[, "DepDelay"], 1e-06))
}
get.x <- function(X) {
  as.matrix(cbind(round(X[, "DepTime"]/100, 0) + X[, "DepTime"] %% 100/60,
                  log(ifelse(X[, "Distance"] > 0, X[, "Distance"], 0.1))))
}
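The first column of 'get.x' converts DepTime from HHMM form to decimal hours (e.g. 1330 becomes 13.5). A one-line Python equivalent of that transform, for illustration only (note it uses integer division for the hour, which matches the HHMM intent; the slide's round() would bump the hour up whenever the minutes are 50 or more):

```python
def hhmm_to_hours(dep_time):
    """Convert a departure time in HHMM form (e.g. 1330) to decimal hours."""
    return dep_time // 100 + (dep_time % 100) / 60

print(hhmm_to_hours(1330))
```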
23. distributed memory regression – local execution
## research is an 8 core, 64GB server: 64/8 ~ 8GB per worker
## mongodb is a 12 core, 32GB server: 32/12 ~ 2.6GB per worker
nodes <- c("research", "mongodb", "krypton")
coefs.local.runtime <- system.time(coefs.local <- distributed.reg(bucket = "s3://airline.data",
                                                                  nodes, y.transform = get.y, x.transform = get.x))
print(as.vector(coefs.local))
## [1] 0.1702 -1.4925
print(coefs.local.runtime)
## user system elapsed
## 0.10 0.02 402.44
print(node.summary(attr(coefs.local, "execution.report")))
## node jobs.completed
## 1 krypton 1
## 2 mongodb 12
## 3 research 8
24. distributed memory regression – cloud execution
Make sure you pick an instance type that provides adequate RAM per core.
m1.large is a 2 core machine with 7.5 GB of RAM, i.e. 3.75 GB per core (which is more than we need for this chunk size).
require(AWS.tools)
cluster <- startCluster(ami = "ami-1c05a075", key = "kls-ec2",
                        instance.count = 11, instance.type = "m1.large")
nodes <- get.nodes(cluster)
coefs.ec2.runtime <- system.time(coefs.ec2 <- distributed.reg(bucket = "s3://airline.data",
                                                              nodes, y.transform = get.y, x.transform = get.x))
## turn the cluster off
res <- terminateCluster(cluster)
print(as.vector(coefs.ec2))
## [1] 0.1702 -1.4925
print(coefs.ec2.runtime)
## user system elapsed
## 0.152 0.088 277.166
print(node.summary(attr(coefs.ec2, "execution.report")))
## node jobs.completed
## 1 ip-10-114-150-31 2
## 2 ip-10-116-235-189 1
## 3 ip-10-12-119-154 2
## 4 ip-10-12-123-135 2
## 5 ip-10-204-111-140 2
## 6 ip-10-204-150-93 2
## 7 ip-10-36-33-101 2
## 8 ip-10-40-62-117 2
## 9 ip-10-83-131-140 2
## 10 ip-10-85-151-177 2
## 11 ip-10-99-21-228 2
25. Final points
consuming cloud resources should not require arduous configuration steps (in contrast to MPI, SLURM, Sun Grid Engine, etc.)
deathstar allows cloud access with minimal headache and maximum simplicity
26. Conclusion
The next generation of R programmers needs something better than MPI... Please
help me continue to build that tool.
27. Thanks!
Many people contributed ideas and helped debug work in progress as the package
was being developed.
Kurt Hornik for putting up with my packaging.
John Laing for initial debugging.
Gyan Sinha for ideas and initial testing.
Prof Brian Ripley for just being himself.
My wife for enduring many solitary hours while I built these packages.