This document discusses automating analytics pipelines and workflows using a workflow engine. It describes the challenges of managing workflows across multiple cloud services and database technologies. It then introduces a multi-cloud workflow engine called Digdag that can automate workflows, handle errors, enable parallel execution, and support modularization and parameterization. Examples are given of using Digdag to define and run workflows across services like BigQuery, Treasure Data, Redshift, and Tableau. Key features of Digdag such as loops, parameters, parallel tasks, and pushing workflows to servers with Docker are also summarized.
Fighting Against Chaotically Separated Values with Embulk - Sadayuki Furuhashi
We created a plugin-based data collection tool that can read any chaotically formatted file called "CSV" by guessing its schema automatically.
Talked at csv,conf,v2 in Berlin
http://csvconf.com/
Scripting Embulk plugins makes plugin development drastically easier. You can develop, test, and productionize data integrations using any scripting language. It's the most suitable way to integrate data with SaaS using vendor-provided SDKs.
https://techplay.jp/event/781988
Treasure Data provides Embulk (an open-source bulk load tool) as a hosted bulk-load service.
These slides cover our use case, our relationship with the community, and the architecture.
Build a Complex, Realtime Data Management App with Postgres 14! - Jonathan Katz
Congratulations: you've been selected to build an application that will manage reservations for rooms!
On the surface, this sounds simple, but you are building a system to manage a high-traffic reservation web page, so we know that a lot of people will be accessing it. Therefore, we need to ensure that the system can handle all of the eager users flooding the website to check each room's availability.
Fortunately, PostgreSQL is prepared for this! And even better, we will be using Postgres 14 to make the problem even easier!
We will explore the following PostgreSQL features:
* Data types and their functionality, such as:
* Date/Time types
* Ranges / Multiranges
* Indexes, such as:
* GiST
* Common Table Expressions and recursion (though multiranges will make things easier!)
* Set-returning functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily in SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all users made possible by the innovation of PostgreSQL!
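To make the range/multirange idea concrete, here is a sketch in plain Python (not PostgreSQL) of what a multirange availability check does: a room's free time is a set of disjoint ranges, and a requested booking succeeds only if it fits entirely inside one free range. The `Range` type and helper names are illustrative, not a real API; in Postgres 14 this is what `tstzmultirange` and the `@>` operator give you for free.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    start: int  # e.g. hours since some epoch; Postgres would use tstzrange
    end: int    # half-open interval: [start, end)

def contains(free: Range, want: Range) -> bool:
    """True if `want` fits entirely inside `free` (Postgres: free @> want)."""
    return free.start <= want.start and want.end <= free.end

def is_available(free_ranges: list[Range], want: Range) -> bool:
    """A multirange is just a list of disjoint ranges; check each one."""
    return any(contains(free, want) for free in free_ranges)

availability = [Range(9, 12), Range(13, 18)]      # free 9-12 and 13-18
print(is_available(availability, Range(10, 11)))  # True: inside 9-12
print(is_available(availability, Range(11, 14)))  # False: spans the gap
```

The half-open convention matches Postgres's default range bounds (`[)`), which is what prevents two bookings from colliding on a shared endpoint.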
Tuning Apache Spark for Large-Scale Workloads - Gaoxiang Liu and Sital Kedia, Databricks
Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the various tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you’ll hear about important configuration parameters and their experiments tuning these parameters on large-scale production workloads. You’ll also learn about Facebook’s new efforts towards automatically tuning several important configurations based on the nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.
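As a flavor of the kinds of parameters such tuning touches, here is a hedged `spark-defaults.conf` fragment; the values are illustrative placeholders, not Facebook's actual settings:

```
# Executor sizing: right-sizing memory and cores avoids OOMs and wasted slots.
spark.executor.memory            8g
spark.executor.cores             4
spark.executor.memoryOverhead    2g

# Shuffle behavior: the external shuffle service lets executors be reclaimed
# without losing shuffle files; partition count controls task granularity.
spark.shuffle.service.enabled    true
spark.sql.shuffle.partitions     2000

# Re-launch straggler tasks speculatively.
spark.speculation                true
```

Auto-tuning efforts like the one described typically adjust exactly these knobs per job based on observed input sizes and past runs.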
Natural Language Query and Conversational Interface to Apache Spark - Databricks
Apache Spark has been a great technology for processing and analyzing Big Data. However, it is not accessible to business users, who don’t have technical or programming skills. In this talk, I’ll discuss recent efforts in the space of “Conversational analytics”. This paradigm allows any user to ask text and voice questions of their data, in natural language, to a bot and receive back a natural language and visual result. A key technology is natural language to SQL translation, where we translate natural language queries from a user into Spark SQL queries that can go against a Databricks system, and that can be easily trained on different schemas and databases.
This NLP technology needs to be further combined with dialog management, natural-language generation/narration, data understanding and modeling, augmented analytics and automated visualization generation in order to achieve the goal of “Conversational Analytics”. Using such a technology, a user can ask, in plain English, “How many cases of Covid were there in the last 2 months in states that had no social distancing mandates by type of transmission”, and then dig deeper into the results in a conversational manner to uncover hidden insights from Covid datasets in a Spark instance. We believe that having access to such data and insights at their fingertips can help users make appropriate decisions quickly, improve data literacy and even overcome the scourge of fake news for the general public.
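The translation step described above is, at its simplest, a mapping from language patterns to SQL templates. Real systems (including the one in this talk) use trained models rather than regexes; the toy sketch below only illustrates the input/output contract, and the patterns and table names are invented for illustration:

```python
import re

# (pattern, SQL template) pairs; a trained NL-to-SQL model replaces this table.
PATTERNS = [
    (re.compile(r"how many (\w+) in (\w+)", re.I),
     "SELECT COUNT(*) FROM {0} WHERE region = '{1}'"),
    (re.compile(r"list all (\w+)", re.I),
     "SELECT * FROM {0}"),
]

def to_sql(question: str) -> str:
    """Translate a natural-language question into a SQL string."""
    for pattern, template in PATTERNS:
        m = pattern.search(question)
        if m:
            return template.format(*m.groups())
    raise ValueError("no pattern matched")

print(to_sql("How many cases in california"))
# SELECT COUNT(*) FROM cases WHERE region = 'california'
```

The hard parts the talk addresses — schema grounding, dialog context, and narration of results — are exactly what this naive lookup cannot do.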
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401) - Amazon Web Services
Learn how to leverage new workflow management tools to simplify complex data pipelines and ETL jobs spanning multiple systems. In this technical deep dive from Treasure Data, company founder and chief architect walks through the codebase of DigDag, our recently open-sourced workflow management project. He shows how workflows can break large, error-prone SQL statements into smaller blocks that are easier to maintain and reuse. He also demonstrates how a system using ‘last good’ checkpoints can save hours of computation when restarting failed jobs and how to use standard version control systems like Github to automate data lifecycle management across Amazon S3, Amazon EMR, Amazon Redshift, and Amazon Aurora. Finally, you see a few examples where SQL-as-pipeline-code gives data scientists both the right level of ownership over production processes and a comfortable abstraction from the underlying execution engines. This session is sponsored by Treasure Data.
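The ‘last good’ checkpoint behavior described above comes from Digdag's named-task model, sketched below as a hypothetical workflow file; the task names, SQL paths, and alert script are illustrative, not the actual DEV401 code:

```yaml
timezone: UTC

# Each +task is a named step; Digdag records which tasks succeeded, so
# `digdag retry` resumes from the last good checkpoint instead of
# re-running the whole pipeline.
+load:
  redshift>: queries/load_raw.sql

+clean:
  redshift>: queries/clean.sql

+report:
  redshift>: queries/daily_report.sql

# Runs only if some task above fails.
_error:
  sh>: scripts/alert.sh
```

Because each block is a small SQL file under version control, a failed step can be fixed and retried without touching the steps that already succeeded.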
AWS Competency Partner
A technical introduction to Yaetos, an open-source tool for data engineers, scientists, and analysts to easily create data pipelines in Python and SQL and put them into production in the cloud.
DCEU 18: App-in-a-Box with Docker Application Packages - Docker, Inc.
Michael Irwin - Application Architect, Virginia Tech
Docker Application Packages is an experimental tool that makes it easy to share multi-service applications. Create a Compose file, package it in an image, and voilà! You now have an "app-in-a-box"! Not convinced yet? No worries! It took a while for me to be convinced too! In this session, we'll start off by diving into how Docker Application Packages actually works, which will help us understand the use cases. We'll see how dev environments can hook in to this app-in-a-box by replacing the service being worked on with a dev container. Then we'll move on to see how end-to-end functional tests are much easier to run. And, finally, we'll see how to maintain an "app-in-a-box" with the latest versions of each component in a CI/CD pipeline, allowing for a unique app-in-a-box for each feature branch under development. Lots of good material! And lots of live demos!
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes - DataWorks Summit
The importance of ingesting and processing streaming data in the telecommunications industry is ever increasing. At SK Telecom, Korea's number-one telecommunications provider, we face the challenge of using infrastructure resources more efficiently. Apache Druid supports auto scaling for data ingestion, but only on AWS EC2, so we cannot rely on that feature in our private cloud.
In this talk, we introduce auto scale-out/in on Kubernetes. This approach improves on Druid's own scaling implementation in several ways. First, it can be used anywhere: on a private cloud or on (managed) Kubernetes in Azure, AWS, and GKE. Second, AWS EC2 startup and termination take a few minutes, while our approach takes a few seconds. Last, the scaling mechanism is decoupled from Druid's source code. We will also share the development of a Druid Helm chart, rolling updates, and custom-metric usage for horizontal auto scaling.
Below are the detailed benefits compared with Druid's auto scaling approach:
1. Druid's auto scaling is only available on AWS, but our approach has no such obstacle. It can be used on a private cloud (on-premise) or on (managed) Kubernetes in Azure, AWS, and GKE.
2. An AWS EC2 instance is a virtual machine, so startup is slower than for a Docker container: EC2 startup or termination takes a few minutes, while a lightweight Docker container takes a few seconds.
3. Druid's auto scaling is tightly coupled with the AWS API because the Druid engine code calls it directly. Our scale-out/in algorithm is conceptually equivalent to Druid's, but we decoupled the dependency: Kubernetes communicates with the dispatcher node (i.e., the Overlord node) over a REST API.
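Kubernetes expresses this kind of scaling declaratively with a HorizontalPodAutoscaler. The fragment below is a hedged sketch for Druid ingestion workers; the resource names and the CPU target are illustrative, and the talk's actual mechanism may instead use custom metrics such as pending ingestion tasks:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-middlemanager
spec:
  scaleTargetRef:            # what gets scaled: the ingestion worker pods
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-middlemanager
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Because scaling lives entirely in this Kubernetes object, no Druid source change is needed, which is the decoupling benefit described in point 3.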
Google Cloud Next '22 Recap: Serverless & Data edition - Daniel Zivkovic
See what's new in #Serverless and #Data at GCP. Our guest, Guillaume Blaquiere, Stack Overflow contributor and #GCP #Developer Expert from France, covered the best #GoogleCloudNext announcements, practically demoed how to benefit from #BigQuery Remote Functions, and answered many questions.
The meetup recording with TOC for easy navigation is at https://youtu.be/AuZZTwHIcdY
P.S. For more interactive lectures like this, go to http://youtube.serverlesstoronto.org/ or sign up for our upcoming live events at https://www.meetup.com/Serverless-Toronto/events/
Machine learning services with SQL Server 2017 - Mark Tabladillo
SQL Server 2017 introduces Machine Learning Services with two independent technologies: R and Python. The purpose of this presentation is 1) to describe major features of this technology for technology managers; 2) to outline use cases for architects; and 3) to provide demos for developers and data scientists.
Learn how to use AWS services to automate manual tasks, help teams manage complex environments at scale, and keep engineers in control of the high velocity that is enabled by DevOps. In this session, we will provide an overview of the various AWS development and deployment services and when best to use them. We will show how to build a fully automated infrastructure and software delivery pipeline with AWS CodePipeline, AWS CodeBuild, AWS CloudFormation and AWS CodeDeploy. At the end of the session, a GitHub repository of AWS CloudFormation templates will be provided so you can quickly deploy the same pipeline to your AWS account(s).
In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:
* Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
* Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
* Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
* Minimize divergence between development and production, enabling continuous deployment for maximum agility;
* And can scale up without significant changes to tooling, architecture, or development practices.
The twelve-factor methodology can be applied to apps written in any programming language, and which use any combination of backing services (database, queue, memory cache, etc).
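One of the twelve factors (factor III, "Config") is easy to show concretely: deploy-specific settings come from the environment rather than from code, so the same build runs unchanged in dev, staging, and production. A minimal Python sketch, with illustrative variable names:

```python
import os

def load_config() -> dict:
    """Read all deploy-specific settings from environment variables."""
    return {
        "database_url": os.environ.get("DATABASE_URL", "sqlite:///dev.db"),
        "cache_url": os.environ.get("CACHE_URL", "memory://"),
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }

# Simulate two environments without changing any code:
os.environ.pop("DATABASE_URL", None)   # no override -> development fallback
os.environ["DEBUG"] = "true"           # a platform-provided setting
config = load_config()
print(config["database_url"])  # sqlite:///dev.db
print(config["debug"])         # True
```

The point is the strict separation: credentials and endpoints never appear in the repository, and a new environment is just a new set of variables.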
Serverless technologies and capabilities are here and are accessible now more than ever.
The power of infinite scale and system capabilities has never been more accessible. This also affects traditional front-end development, as serverless technologies allow backend support for any frontend to be constructed with ease.
In this talk, we will demonstrate how to build a fully functional GraphQL endpoint for frontend applications using the Apollo Server and Client libraries, utilizing different cloud providers. We will also demonstrate using the Serverless.com framework to set up the required infrastructure as code to simplify and support this setup.
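The infrastructure-as-code side of such a setup can be sketched as a Serverless Framework `serverless.yml`; the service name, handler path, and runtime below are assumptions for illustration, not the speaker's actual configuration:

```yaml
service: graphql-backend
provider:
  name: aws
  runtime: nodejs18.x
  region: us-east-1
functions:
  graphql:
    handler: src/server.graphqlHandler   # an Apollo Server Lambda handler
    events:
      - httpApi:
          path: /graphql
          method: POST
```

Running `serverless deploy` against a file like this provisions the function and the HTTP endpoint in one step, which is what makes a GraphQL backend cheap to stand up for a frontend team.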
The video of the presentation (Hebrew):
https://youtu.be/8ba4cpdtK-8
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into... - Amazon Web Services
Organizations around the globe are leveraging the cloud to accomplish world-changing missions. This session will address how AWS can help organizations put more money toward their mission and scale outreach and operations to achieve more with less. Hear some of AWS’s most advanced customers on how their organizations handle DevOps, continuous integration and deployment. Learn how these practices allow them to rapidly develop, iterate, test and deploy highly-scalable web applications and core operational systems on AWS. The discussion will focus on best practices, lessons learned, and the specific technologies and services they use.
BP101 - 10 Things to Consider when Developing & Deploying Applications in Lar... - Martijn de Jong
Many common development techniques can cause dramatic effects when your application is rolled out over hundreds of servers. As a developer, you need a good understanding of certain parts of the infrastructure to build an application designed for wide-scale deployment. System administrators who review applications before deployment should know what to look for in the code to prevent problems when rolled out to production. This session takes a look at the area where Application Development and System Administration come together. You will hear about real-life problems, view examples of bad code as well as good code, and learn what you should consider when you have to develop or deploy an application which will be rolled out in a large-scale deployment, or how to "harden" your code to support large quantities of documents.
Similar to Automating Workflows for Analytics Pipelines
Talk at RubyKaigi 2015.
Plugin architecture is a known technique for bringing extensibility to a program. Ruby has good language features for plugins, and RubyGems.org is an excellent platform for plugin distribution. However, creating a plugin architecture is not as easy as writing code without one: it requires a plugin loader, packaging, a loosely coupled API, and good performance. Loading two versions of a gem remains an unsolved challenge in Ruby, one that Java, by contrast, has solved.
I have designed open-source software such as Fluentd and Embulk, which provide most of their functionality through plugins. I will talk about their plugin-based architecture.
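The core of such an architecture is a registry: plugins register themselves under a name, and the core looks them up at runtime, so new formats can be added without touching core code. A minimal sketch (in Python for brevity; Fluentd and Embulk implement this pattern in Ruby and Java, and the class names here are invented):

```python
PLUGINS: dict[str, type] = {}

def register(name: str):
    """Class decorator: add the plugin class to the registry under `name`."""
    def wrap(cls):
        PLUGINS[name] = cls
        return cls
    return wrap

class Parser:
    def parse(self, line: str) -> list[str]:
        raise NotImplementedError

@register("csv")
class CsvParser(Parser):
    def parse(self, line: str) -> list[str]:
        return line.split(",")

@register("tsv")
class TsvParser(Parser):
    def parse(self, line: str) -> list[str]:
        return line.split("\t")

def get_parser(name: str) -> Parser:
    # Real systems resolve and load a gem/jar here, which is the hard part
    # (versioning, isolation) discussed in the talk.
    return PLUGINS[name]()

print(get_parser("csv").parse("a,b,c"))  # ['a', 'b', 'c']
```

The loosely coupled part is the `Parser` interface: the core only ever sees that contract, never a concrete plugin class.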
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite - Google
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
Navigating the Metaverse: A Journey into Virtual Evolution - Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
Accelerate Enterprise Software Engineering with Platformless - WSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
First Steps with Globus Compute Multi-User Endpoints - Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
How Recreation Management Software Can Streamline Your Operations - wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
An Enterprise Resource Planning (ERP) system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules; going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Enhancing Research Orchestration Capabilities at ORNL - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Enhancing Project Management Efficiency: Leveraging AI Tools like ChatGPT - Jay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Prosigns: Transforming Business with Tailored Technology Solutions - Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
2. Sadayuki Furuhashi
A founder of Treasure Data, Inc., located in Silicon Valley.
An open-source hacker.
OSS projects I founded:
GitHub: @frsyuki
3. What's a Workflow Engine?
• Automates your manual operations.
• Load data → Clean up → Analyze → Build reports
• Get customer list → Generate HTML → Send email
• Monitor server status → Restart on abnormal
• Backup database → Alert on failure
• Run test → Package it → Deploy
(Continuous Delivery)
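Chains like the ones above are exactly what a workflow definition expresses: each step becomes a task, and tasks run top to bottom. A minimal sketch in Digdag's syntax, assuming hypothetical shell scripts for each step:

```yaml
# deploy.dig — a hypothetical "run test → package → deploy" pipeline
+run_tests:
  sh>: ./run_tests.sh
+package:
  sh>: ./build_package.sh
+deploy:
  sh>: ./deploy_to_production.sh
```

If any task fails, the following tasks do not run, which is the basis for the retry and resume behavior discussed later.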
6. Challenge: Multiple DB technologies
Amazon S3
Amazon Redshift
Amazon EMR
> Hi! I'm a new technology!
7. Challenge: Modern complex data analytics
Ingest → Enrich → Model → Load → Utilize
Ingest: application logs, user attribute data, ad impressions, 3rd-party cookie data
Enrich: removing bot access, geo location from IP address, parsing User-Agent, JOIN user attributes to event logs
Model: A/B testing, funnel analysis, segmentation analysis, machine learning
Load: creating indexes, data partitioning, data compression, statistics collection
Utilize: recommendation API, realtime ad bidding, visualize using BI applications
8. Traditional "false" solution
#!/bin/bash
./run_mysql_query.sh
./load_facebook_data.sh
./rsync_apache_logs.sh
./start_emr_cluster.sh
for query in emr/*.sql; do
./run_emr_hive "$query"
done
./shutdown_emr_cluster.sh
./run_redshift_queries.sh
./call_finish_notification.sh
> Poor error handling
> Write once, Nobody reads
> No alerts on failure
> No alerts on too long run
> No retrying on errors
> No resuming
> No parallel execution
> No distributed execution
> No log collection
> No visualized monitoring
> No modularization
> No parameterization
9. Solution: Multi-Cloud Workflow Engine
Solves
> Poor error handling
> Write once, Nobody reads
> No alerts on failure
> No alerts on too long run
> No retrying on errors
> No resuming
> No parallel execution
> No distributed execution
> No log collection
> No visualized monitoring
> No modularization
> No parameterization
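Several of these fixes live directly in the workflow definition. A sketch, assuming Digdag's `_retry` and `_error` directives and hypothetical script and address names:

```yaml
+load_data:
  _retry: 3                     # re-run this task up to 3 times before failing
  sh>: ./load_facebook_data.sh
_error:                         # this group runs once if the workflow fails
  +notify:
    mail>: error_notice.txt
    subject: workflow failed
    to: [ops@example.com]
```

Retrying, alerting on failure, and resuming from the failed task come from the engine rather than from hand-written shell logic.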
10. Example in our case
1. Dump data to BigQuery
2. Load all tables to Treasure Data
3. Run queries
4. Create reports on Tableau Server (on-premises)
5. Notify on Slack
13. Unite Engineering & Analytic Teams
Powerful for Engineers
> Comfortable for advanced users
Friendly for Analysts
> Still straightforward for analysts to understand & leverage workflows
+wait_for_arrival:
  s3_wait>: bucket/www_${session_date}.csv
+load_table:
  redshift>: scripts/copy.sql
+ is a task
> is an operator
${...} is a variable
14. Operator library
_export:
  td:
    database: workflow_temp
+task1:
  td>: queries/open.sql
  create_table: daily_open
+task2:
  td>: queries/close.sql
  create_table: daily_close
Standard libraries:
redshift>: runs Amazon Redshift queries
emr>: creates/shuts down a cluster & runs steps
s3_wait>: waits until a file is put on S3
pg>: runs PostgreSQL queries
td>: runs Treasure Data queries
td_for_each>: repeats tasks for result rows
mail>: sends an email
Open-source libraries:
You can release & use open-source operator libraries.
16. Loops & Parameters
+send_email_to_active_users:
  td_for_each>: list_active.sql
  _do:
    +send:
      mail>: template.txt
      to: ${td.for_each.addr}
Parameter
A task can propagate parameters to following tasks.
Loop
Generate subtasks dynamically so that Digdag applies the same set of operators to different data sets.
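Parameter propagation can also be sketched with the `td>` operator's `store_last_results` option, which exposes the first row of a query's result to later tasks (the query file and column name here are hypothetical):

```yaml
+count_users:
  td>: queries/count_users.sql
  store_last_results: true      # stores the first row into ${td.last_results.*}
+report:
  sh>: echo "active users: ${td.last_results.cnt}"
```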
19. Pushing workflows to a server with Docker image
timezone: Asia/Tokyo
schedule:
  daily>: 01:30:00
_export:
  docker:
    image: my_image:latest
+task:
  sh>: ./run_in_docker
Digdag server
> Develop on laptop, push it to a server.
> Workflows run periodically on a server.
> Backfill
> Web editor & monitor
Docker
> Install scripts & dependencies in a Docker image, not on a server.
> Workflows can run anywhere, including a developer's laptop.
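The develop-then-push cycle goes through the digdag CLI; a sketch with hypothetical workflow, project, and endpoint names:

```shell
digdag run my_workflow.dig                            # run once on the laptop
digdag push my_project --endpoint digdag.example.com  # upload to the server
```

Once pushed, the server's scheduler takes over the periodic runs defined by `schedule:`.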
22. Digdag is production-ready
Stateless servers + replicated DB:
> Digdag clients talk to Digdag servers through an HTTP load balancer (HA).
> Each Digdag server runs the API, scheduler, and executor, and serves the visual UI.
> All task state is stored in PostgreSQL.
26. Learning from my OSS projects
• Make it pluggable!
> 700+ plugins in 6 years (input/output, and filter)
> 200+ plugins in 3 years (input/output, parser/formatter, decoder/encoder, filter, and executor)
> 70+ implementations in 8 years
27. Digdag also has a plugin architecture
32 operators
7 schedulers
2 command executors
1 error notification module