This document discusses strategies for improving the reliability and performance of scheduled crawling jobs. It proposes using Sidekiq to schedule jobs instead of cron, storing job definitions and schedules in a database table. Jobs would be invoked by a CronJobWorker that runs every minute, querying the database to find and enqueue jobs due for execution. This avoids issues with cron like unreliable scheduling. It also allows prioritizing popular jobs and dealing with throttling from target servers by rate limiting job queues.
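The CronJobWorker pattern described above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, assuming a jobs table that stores each job's interval and last-run timestamp (the schema and job names are invented for illustration, not taken from the document):

```python
import sqlite3

# Hypothetical schema: each row stores a job name, its interval in
# seconds, and the last time it was enqueued.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (name TEXT, interval_s INTEGER, last_run REAL)")
db.execute("INSERT INTO jobs VALUES ('crawl_popular', 60, 0), ('crawl_rest', 300, 0)")

def due_jobs(now):
    """Return names of jobs whose interval has elapsed since last_run."""
    rows = db.execute(
        "SELECT name FROM jobs WHERE ? - last_run >= interval_s", (now,)
    ).fetchall()
    return [name for (name,) in rows]

def enqueue_due(now):
    """One per-minute worker tick: find due jobs, mark them as run,
    and return them (a real worker would enqueue them to Sidekiq)."""
    names = due_jobs(now)
    for name in names:
        db.execute("UPDATE jobs SET last_run = ? WHERE name = ?", (now, name))
    return names
```

Because the schedule lives in a table rather than a crontab, popular jobs can be given shorter intervals or routed to higher-priority queues without redeploying.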
Scala in-practice-3-years by Patric Fornasier, Springr, presented at Pune Sca... - Thoughtworks
3 years ago, Springer decided to use Scala on a large, strategic project. This talk is about the journey the development teams made. Why did they choose Scala in the first place? Did they get what they hoped for? What challenges and surprises did they encounter along the way? And, most importantly, are they still happy with their choice?
A chronicle of my attempt to create a real time web app using pure clojure at every layer of the stack, from the client to the styles to the web server
This document summarizes a presentation about optimizing server-side performance. It discusses measuring performance metrics like time to first byte, optimizing databases through techniques like adding indexes and reducing joins, using caching with Memcached and APC, choosing fast web servers like Nginx and Lighttpd, and using load testing tools like JMeter to test performance before deployment. The presentation was given by a senior engineer at Wayfair to discuss their experiences optimizing their platform.
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014 - datafundamentals
This document discusses two approaches to ETL jobs in Hadoop: a manual "special snowflake" approach and an automated approach. The manual approach involves a team spending a year copying and pasting code for 15 jobs. This leads to spaghetti code and is not sustainable. The automated approach involves designing reusable templates and rules to automate the ETL process. This frees up the developer Brent to focus on design rather than manual work. It results in code that is clean, consistent, easy to maintain and passes the "10 minute test" of being idempotent. The document demonstrates generating ETL code from metadata and deploying the automated jobs to Hadoop.
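The "generate ETL code from metadata" idea can be illustrated with a toy generator: each job is a small metadata record, and one shared template renders it into a script. The table and column names below are made up for illustration:

```python
# One template, many jobs: changing the template fixes every job at once,
# which is what makes the generated code consistent and maintainable.
ETL_TEMPLATE = """\
-- job: {name}
INSERT INTO {target}
SELECT {columns}
FROM {source};
"""

def generate_job(meta):
    """Render one ETL job from its metadata record."""
    return ETL_TEMPLATE.format(
        name=meta["name"],
        target=meta["target"],
        source=meta["source"],
        columns=", ".join(meta["columns"]),
    )

jobs_metadata = [
    {"name": "load_orders", "source": "staging.orders",
     "target": "warehouse.orders", "columns": ["id", "amount"]},
]
scripts = [generate_job(m) for m in jobs_metadata]
```

Adding a sixteenth job then means adding one metadata record, not copy-pasting a fifteenth script.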
Promise of a better future by Rahul Goma Phulore and Pooja Akshantal, Thought... - Thoughtworks
With the recent, vivid trend towards multicore hardware and ever-growing application requirements, concurrency is no longer the niche area it used to be, and is slowly becoming the norm. In this talk, we will talk about promises/futures, one of the concurrency models that has risen to the occasion. We will look at what they are, and how they're implemented and used in Java and JavaScript. We will see how Scala, with its functional paradigm and greater abstraction capabilities, avoids the "callback hell" typically associated with the model, allows writing concurrent code in "direct style", and thereby greatly reduces the cognitive burden, allowing you to focus better on application logic.
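The "direct style" point can be sketched briefly. The talk covers Scala, Java, and JavaScript; this Python version with `concurrent.futures` just illustrates the shape, and the function names are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# The callback version would nest "on completion" handlers; with futures
# you can compose the steps top-to-bottom instead.
def fetch_user(user_id):
    return {"id": user_id, "name": "ada"}

def fetch_orders(user):
    return [{"user": user["name"], "total": 42}]

with ThreadPoolExecutor() as pool:
    user_future = pool.submit(fetch_user, 1)
    # Blocking on .result() keeps this example readable; Scala's
    # for-comprehensions over Futures get the same linear shape
    # without blocking.
    orders = pool.submit(fetch_orders, user_future.result()).result()
```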
Developer-friendly taskqueues: What you should ask yourself before choosing one - Sylvain Zimmer
This document summarizes key considerations for choosing a task queue system. It discusses task properties like idempotency and reentrancy. It covers performance factors like latency and throughput as well as consistency models. Common task queue systems like Celery, RQ, and MRQ are evaluated based on factors like performance, complexity, community support, and future plans. The document emphasizes thinking carefully about specific needs before choosing a system and being grateful for open source software.
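Idempotency, one of the task properties mentioned, means a task can be retried or redelivered safely: running it twice has the same effect as running it once. A common trick is to key each task by a unique id and record completed ids. A minimal sketch, with all names invented for illustration:

```python
# Queue systems often guarantee at-least-once delivery, so the task
# itself must tolerate duplicates.
completed = set()
balance = {"acct": 0}

def credit(task_id, amount):
    """Apply a credit exactly once, even if the queue redelivers it."""
    if task_id in completed:
        return balance["acct"]          # duplicate delivery: no-op
    balance["acct"] += amount
    completed.add(task_id)
    return balance["acct"]
```

In production the `completed` set would live in durable storage shared by all workers, not in process memory.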
This document summarizes Azkaban, an open source workflow scheduler that was created at LinkedIn to manage Hadoop jobs and their dependencies. Key features of Azkaban include defining job dependencies in a simple interface, retry functionality, scheduling, and viewing logs and execution details in the web UI. The document also discusses how the author uses Azkaban to manage Python batch jobs at their company, including writing job files in YAML format and using the Azkaban API. In conclusion, the author finds Azkaban simple to use and sees no reason to replace it, though he hopes for more active development.
Getting Started with Apache Camel at DevNation 2014 - Claus Ibsen
Get off to a good start with Apache Camel. This session will give you an introduction to Apache Camel and teach you:
- How Camel is related to enterprise integration patterns (EIPs).
- How to use EIPs in Camel routes written in Java code or XML files.
- How to get started developing with Camel, including how to set up new projects from scratch using Maven and Eclipse.
- With a live demo, how to build Camel applications in Java, Spring, and OSGi Blueprint.
- How ready-to-use features make integration much easier.
- About the web console tools that give you insight into your running Apache Camel applications, including visual route diagrams with tracing, debugging, and profiling capabilities.
- Useful resources to learn more about Camel.
This session will be taught with a 50/50 mix of slides and live demos, and it will conclude with Q&A time.
George Wilson presented on modern cloud architecture and automation for websites built with content management systems like Joomla. He demonstrated how to automate the deployment of a Joomla site on AWS using just 7 commands and a configuration file. This included uploading the code, creating the application version, and provisioning the environment. Wilson discussed the rise of using CLIs and APIs to manage websites and their content programmatically. Documenting APIs with OpenAPI/Swagger was presented as a best practice. While these techniques may not apply to all Joomla sites, Wilson argued they are relevant for many sites in Joomla's target markets that prioritize agility and automation.
This document provides tips for writing LotusScript code for large systems with a focus on logging, performance, code reuse, and handling weird situations. Some key points include:
- Logging is important for stability and managing large systems. Recommends using OpenLog or creating and emailing log documents to avoid performance impacts.
- Views with click-sorted columns and unnecessary views hurt performance. Recommends minimizing views and avoiding click-sort.
- Agents need to be well-behaved to avoid overloading servers. Suggests profiling agents, breaking large tasks into multiple runs, and not relying on Agent Manager to kill misbehaving agents.
- Code reuse is important for maintenance. Recommends creating
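One of the tips above, breaking large tasks into multiple runs, can be sketched as a worker that processes a bounded batch per run and checkpoints where it stopped. The talk targets LotusScript agents; this Python version merely illustrates the pattern, and the "work" is a stand-in:

```python
# The checkpoint would normally be persisted (e.g. in a profile
# document) so the next scheduled run can resume.
checkpoint = {"pos": 0}

def run_once(items, batch_size):
    """Process at most batch_size items, resuming from the checkpoint."""
    start = checkpoint["pos"]
    batch = items[start:start + batch_size]
    processed = [item.upper() for item in batch]   # stand-in for real work
    checkpoint["pos"] = start + len(batch)
    return processed
```

Each scheduled run does a bounded amount of work, so no single run monopolizes the server or gets killed for overrunning its time limit.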
Van Wilson
Senior Consultant with Cardinal Solutions
Find more by Van Wilson: https://speakerdeck.com/vjwilson
All Things Open
October 26-27, 2016
Raleigh, North Carolina
Enterprise Integration Patterns with Apache Camel - Ioan Eugen Stan
This document discusses Enterprise Integration Patterns (EIPs) using Apache Camel, a Java framework for integration and mediation. It provides an overview of common EIPs like content-based routing, normalization, and the transactional client pattern. It also demonstrates how to implement EIPs like these using the Java and Spring DSLs in Camel. Key features of Camel like components, exchanges, processors and error handling are explained. Tools for working with Camel like Fuse IDE and Hawt.io are also introduced.
Developing Microservices with Apache Camel - Claus Ibsen
Red Hat Microservices Architecture Day - New York, November 2015. Presented by Claus Ibsen.
Apache Camel is a very popular integration library that works very well with microservice architecture. This talk introduces you to Apache Camel and how you can easily get started with Camel on your computer. Then we cover how to create new Camel projects from scratch as microservices, which you can boot using Camel or Spring Boot, or other micro containers such as Jetty or fat JARs. We then take a look at what options you have for monitoring and managing your Camel microservices using tooling such as Jolokia, and hawtio web console.
A 2-hour session covering what Apache Camel is and the latest news on the upcoming Camel v3; the main topic of the talk is the new Camel K sub-project for running integrations natively on the cloud with Kubernetes. The last part of the talk is about running Camel with GraalVM / Quarkus to achieve natively compiled binaries with impressive startup time and footprint.
Modernizing Legacy Applications in PHP, by Paul Jones - iMasters
Paul Jones, creator of "Aura for PHP" and author of "Modernizing Legacy App in PHP", spoke about 'Modernizing Legacy Applications in PHP' at iMasters PHP Experience 2015.
iMasters PHP Experience 2015 took place on April 25, 2015, at the Hotel Renaissance in São Paulo-SP - http://phpexperience.imasters.com.br/
Camel Day Italy 2021 - What's new in Camel 3 - Claus Ibsen
Slides for the 50-minute presentation at Camel Day Italy 2021, where Claus Ibsen and Andrea Cosentino had the opportunity to give a deeper-dive talk about the journey towards Camel 3, and what was done to re-architect the Camel core in v3 to make it great for microservices, cloud native, Kubernetes, Quarkus, GraalVM, Knative, and Apache Kafka.
Camel Day Italy 2021: https://www.meetup.com/it-IT/red-hat-developers-italy/events/275332376/
Camel K allows building and deploying Apache Camel integration applications on Kubernetes in about 1 second. It provides a lightweight runtime for Camel on Kubernetes that enables low-code/no-code integration using Camel's Java DSL. Camel K applications can take advantage of serverless capabilities provided by Knative like autoscaling and scaling to zero. Quarkus is a Kubernetes-native Java stack that provides a minimal footprint and container-first experience for building microservices. It works well with Camel/Camel K by enabling native compilation of Camel routes for very fast startup times and low memory usage.
The document discusses Reactive Xamarin, which combines Rx (Reactive Extensions) and Xamarin. It introduces key Rx concepts like Observables, LINQ, and Schedulers. Observables represent asynchronous push-based collections and address concurrency using Schedulers. LINQ allows querying Observables. Reactive UI and RxLite provide UI frameworks that integrate these Rx concepts into Xamarin apps through bindings and commands. In summary, Reactive Xamarin leverages Rx to build responsive and concurrent Xamarin apps in a reactive and declarative manner.
The document describes the journey of automating large scale enterprise crash dump analysis. It details how manual crash analysis used to be a slow and difficult process involving passing large files between experts. Through four steps of automation - automating analysis, adding a web frontend, integrating workflows, and enabling deep analysis in the browser - a tool called SuperDump was created that transformed the process. SuperDump reduced analysis time from days to minutes, enabled non-experts to analyze crashes, and improved productivity, security, and quality by making analysis scalable and easy.
This document discusses tools and techniques for optimizing Ruby performance. It begins by looking at common expensive tasks like database operations, network access, and inefficient algorithms. It then discusses tools for benchmarking and profiling Ruby code like Benchmark, benchmark-ips, and stackprof. The document provides examples of optimizing ActiveRecord queries and using caching and memoization. It also discusses optimizing the environment through server, database, and caching configuration. Finally, it notes that in some CPU-intensive or async tasks, Ruby may not be the best tool.
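Memoization, one of the techniques mentioned, is language-neutral even though the talk focuses on Ruby tooling; a minimal sketch in Python, with the expensive work stood in by a trivial computation:

```python
from functools import lru_cache

# Cache the result of an expensive call so repeated requests with the
# same argument skip the real work entirely.
calls = {"count": 0}

@lru_cache(maxsize=None)
def expensive_lookup(key):
    calls["count"] += 1          # track how often real work happens
    return key * 2

expensive_lookup(21)
expensive_lookup(21)             # second call is served from the cache
```

The trade-off is memory for CPU, which is usually a win for hot, repeated lookups but should be bounded (`maxsize`) for unbounded key spaces.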
Developer day - AWS: Fast Environments = Fast Deployments - Matthew Cwalinski
The document discusses how AWS enables fast and flexible deployments through automation. It outlines problems with manual and unique server deployments like breakages and lack of change management. The solution presented is to automate the entire process through continuous integration and deployment tools like Jenkins, GitHub, Grunt, and AWS CloudFormation. This treats servers as identical and deployable resources, ensures all code is tested and production-ready, and allows for boring but successful automated deployments on demand.
Apache Camel Introduction & What's in the box - Claus Ibsen
Slides from JavaBin talk in Grimstad Norway, presented by Claus Ibsen in February 2016.
This slide deck is fully up to date with the latest Apache Camel 2.16.2 release and includes additional slides presenting many of the features that Apache Camel provides out of the box.
Konrad Malawski gave a talk at Scala Days CPH 2017 about the current state and future direction of Akka. He discussed how Akka is moving from the actor model to reactive streams and Akka Streams for better concurrency and distribution capabilities. Akka Cluster provides robust membership and fault tolerance for distributed actors across many nodes, while Cluster Sharding enables easy sharding of data and work across a cluster. The talk outlined Akka's past successes and hinted at upcoming improvements to further "do better than that."
The document discusses different tools for automating processes in Salesforce - Workflow, Process Builder, and Apex - and provides guidance on which tool to use for different situations. It notes that Workflow is quick but limited, Process Builder can meet most needs but has scalability risks, and Apex can do anything but requires more effort. Key considerations for each tool include ease of use, functionality, scalability, debugging, and time to deploy.
ApacheCon EU 2016 - Apache Camel the integration library - Claus Ibsen
This presentation will demonstrate to developers involved with integration how the Apache Camel project can make your life much easier.
We start with an introduction to what Apache Camel is, and how you can use Camel to make integration much easier, allowing you to focus on your business logic rather than low-level messaging protocols and transports.
You will hear how Apache Camel is related to Enterprise Integration Patterns, which you can use in your architectural designs as well as in Java or XML code, running on the JVM with Camel.
You will also hear what other features Camel provides out of the box, which can make integration much easier for you.
1) Ansible is being used at Backbase to automate the provisioning of different server configurations for testing their Customer Experience Platform (CXP).
2) A REST API and UI allow users to easily provision new environments from available server stacks configured with Ansible for testing.
3) This enables Backbase to implement continuous delivery practices like automated testing of new versions without affecting production environments.
The document discusses using Apache Camel and Apache Karaf to build distributed, asynchronous systems in a similar way to AKKA. It provides examples of building a dynamic routing system using Camel routing and JMS, as well as a modular ETL system for processing CSV files using a configurable, hot-deployable mutation framework. The examples demonstrate how to achieve scalability, modularity, and asynchronous behavior without deep knowledge of the underlying technologies through an event-driven architecture based on messaging.
This document presents a summary of the teaching-innovation project PID 11-145 for the course "Fundamentos de la Programación" in the Degree in Information and Documentation at the University of Granada. The project consisted of producing audiovisual materials to support the teaching of basic programming elements, instructions, and operators.
The document describes three steps for modifying and adding information to a spreadsheet. First, add two rows at the end to show the total and the average of the amounts. Second, add a new table with the zone names and the number of clients. Third, add another new table with the zone names and the maximum amount.
El documento describe las partes internas y externas del computador. Entre las partes internas se encuentran el disco duro, la memoria RAM, la unidad de disquete, la ventiladora, la fuente de poder, la tarjeta madre, el procesador y la batería. Las partes externas incluyen el ratón, la CPU, el monitor y el teclado. El documento proporciona una breve descripción de cada parte y su función dentro del computador.
Numerosas especies de animales se encuentran en peligro de extinción debido a factores como el cambio climático, la destrucción de hábitat y la caza furtiva. Entre ellas se encuentran ballenas, tiburones, osos polares, elefantes pigmeos y otros que ya sólo podemos ver en fotografías. El cóndor andino y el manatí también están amenazados; el cóndor es el ave más imponente de los Andes y el manatí es un mamífero acuático que se alimenta de vegetación y se
George Steiner, um filósofo e ensaísta de 88 anos, defende que os jovens precisam da liberdade de errar para se tornarem adultos completos. Ele critica a educação atual por não permitir que as crianças cometam erros ou sonhem com utopias. Steiner também acredita que a cultura clássica pode estar morrendo, sendo substituída por novas formas culturais mais inclusivas.
This document discusses how to break bad habits by using GitLab CI to automate routine tasks. It provides examples of automating tests, packaging code, and deploying artifacts and websites. Specifically, it shows how to:
1. Run automated tests with GitLab CI
2. Package code into downloadable artifacts
3. Deploy packages and websites to AWS S3 and GitLab Pages
4. Separate testing and production using environments
5. Allow multiple developers to work on the same project simultaneously
6. Avoid mistakes by not deploying directly to production
El documento presenta información sobre alternativas de mitigación para el cambio climático en Bucaramanga, Colombia. Describe las principales manifestaciones del cambio climático observadas en la ciudad como aumento de temperatura, escasez de agua, incendios forestales y enfermedades virales. Luego detalla varias medidas de adaptación y mitigación en sectores como agua, agricultura, salud, turismo y energía, las cuales buscan reducir los efectos del cambio climático mediante el uso eficiente de recursos y tecnologías limpias. Finalmente
SearchLove Boston 2016 | Kindra Hall | Storytelling: The Secret of Irresistib...Distilled
It's no secret; in marketing, whoever tells the best story wins. The problem? ‘Storytelling’ has surpassed buzzword status and now everyday marketers are missing opportunities to connect with their customers because they simply don't know what a good story is anymore. In this engaging and immediately applicable presentation, strategic storytelling consultant Kindra Hall will reveal specific storytelling strategies to create great content to win customers without a fight.
Introducing Cloudera Director at Big Data BashAndrei Savu
My slide deck for Big Data Bash. This is a quick introduction on Cloudera Director and it ends with a list of open questions around some interesting future problems we are planning to work on.
The document discusses improving the onboarding process for new engineers. It describes current problems with README-driven onboarding like errors, things not working, and a lack of helpful guidance. It then provides suggestions for a nicer onboarding process like using automated setup scripts, provisioning consistent development environments with Vagrant, providing example projects and tasks, documenting best practices, and ensuring new engineers get help and have time for questions. The overall message is that the onboarding process should be made easier and more successful for new engineers.
The document discusses asynchronous programming using async and await in C#. It begins by explaining what asynchronous programming is and why it is useful for improving app responsiveness and simplifying asynchronous code. It then describes how async and await works by generating state machines and using continuation tasks. The document covers some gotchas with async code as well as best practices like naming conventions. It provides references for further reading on asynchronous patterns, tasks, and unit testing asynchronous code.
1) The document discusses the challenges of building a non-serverless lottery application and how the team transitioned to a serverless architecture using AWS Lambda and other serverless technologies.
2) It describes the process of mapping out the value stream for the original non-serverless application versus the serverless approach.
3) The document then outlines how the team implemented the serverless lottery application including using Lambda, API Gateway, MongoDB Atlas, SQS, and other services and how they addressed challenges like cold starts and resiliency.
Startup DevOps - Jon Milsom Pitchero - Leeds DevOps - August 2014Jon Milsom
Presentation at Leeds DevOps by Jon Milsom (Co-Founder & CTO, Pitchero), August 2014
http://www.pitchero.com/
http://www.leedsdevops.org.uk/
https://twitter.com/jonmilsom
https://twitter.com/leedsDevops
Continuous Integration, the minimum viable productJulian Simpson
What does it mean to 'do' Continuous Integration? It used to be enough to execute your unit tests in CI. But the bar is steadily raising for engineering practices. In the last decade we've seen tremendous improvements inacceptance testing. JavaScript is now a platform in it's own right. Cloudcomputing is now vital. There's growing interest in deployment to prod.So Continuous Integration is under more pressure than ever. As the bar slowly raises for engineering practices, we ll present 2011's minimum viable feature set for Continuous Integration
Asynchronous Processing with Ruby on Rails (RailsConf 2008)Jonathan Dahl
The document discusses asynchronous processing and provides recommendations for when and how to implement it. It describes asynchronous processing as running tasks without blocking normal execution flow. Common uses include sending emails, processing images, and database synchronization. It recommends using a background job queue like Delayed Job for general purpose asynchronous tasks and message queues like SQS with custom workers for distributed processing tasks requiring high speed and scalability.
Writing Asynchronous Programs with Scala & AkkaYardena Meymann
The document provides an overview of Yardena Meymann's background and experience working with asynchronous programming in Scala. It discusses some of the common tools and approaches for writing asynchronous programs in Scala, including Futures, Actors, Streams, HTTP clients/servers, and integration with Kafka. It highlights some of the challenges of asynchronous programming and how different tools address issues like error handling, retries, and backpressure.
Building source code level profiler for C++.pdfssuser28de9e
1. The document describes building a source code level profiler for C++ applications. It outlines 4 milestones: logging execution time, reducing macros, tracking function hit counts, and call path profiling using a radix tree.
2. Key aspects discussed include using timers to log function durations, storing profiling data in a timed entry class, and maintaining a call tree using a radix tree with nodes representing functions and profiling data.
3. The goal is to develop a customizable profiler to identify performance bottlenecks by profiling execution times and call paths at the source code level.
Managing Thousands of Spark Workers in Cloud Environment with Yuhao Zheng and...Databricks
At DataVisor, we fight online fraud, abuse, and money laundering using unsupervised machine learning approach that clusters millions of users. In order to support the computationally intensive workload, DataVisor uses Spark as the mainstay of its computation infrastructure. The scalability and portability of our Spark infrastructure is critical to our company when we expand our business. In this talk, we will present our story of how we manage our Spark infrastructure at scale.
At peak time, we have 2000+ Spark workers online, and we group these workers into ~50 clusters of various size. The benefits of this, on one hand, is data isolation, which is critical to DataVisor as we are processing multi-customer data. On the other hand, this is for cost and performance consideration, as we want to provide just enough resources to each Spark application. When under-provision, Spark application will fail due to out-of-memory or out-of-disk. However we want to avoid unnecessary over-provision as it dramatically increases our cloud cost.
Next, we will present our DataVisor SparkGenerator (DSG), which is designed to automatically manage our Spark infrastructure. The responsibility of DSG includes (a) launching and shutting down Spark cluster, to maximize concurrency and minimize cost, (b) assigning Spark applications to the proper clusters intelligently, according to the Spark application profile, and (c) managing the dependency among Spark applications, to make our pipeline run smoothly and efficiently, and (d) running all of the Spark worker on Spot instances, reducing the cloud computation cost versus on-demand by over 80%.
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVMManuel Bernhardt
The document discusses the need for reactive and functional programming approaches to build scalable applications that can take advantage of many-core processors and distributed systems. It introduces key concepts like immutability, functions, and declarative programming. Specific frameworks like Scala, Play and Akka are presented as tools that support this reactive, functional style for building web applications that can horizontally scale across multiple cores and nodes. The talk promotes adopting these approaches to build systems that can better handle concurrency, distribution and failure.
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...Amazon Web Services
Most developers write them and every company has them – a vast library of small and large scripts that are designed to run on a scheduled basis. These background angels help keep the lights on and the doors open. They’ve been built up over time and are forgotten little heroes that are only remembered when the machines they live on fail. They are scattered throughout a company’s IT infrastructure and do important things.
In this session, we will explain how to use Ruby on Simple Workflow to quickly build a system that schedules scripts, runs them on time, retries them if they fail, and stores the history of their execution. You will walk away from this session with an understanding of how Simple Workflow brings resiliency, concurrency, and tracking to your applications.
This document discusses various techniques for improving Rails application performance, including reducing roundtrips through CSS sprites and data URIs, using tools like Firebug and NewRelic to diagnose issues, avoiding N+1 queries, and leveraging caching, monitoring, and scaling. It also briefly mentions plugins like Bullet and tools like RubyProf that can help optimize applications.
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
Presentation covering 25 years worth of lessons learned while performance benchmarking applications and databases. Presented at Percona Live London in November 2014.
Advanced technic for OS upgrading in 3 minutesHiroshi SHIBATA
This document discusses strategies for rapidly automating operating system upgrades and application deployments at scale. It proposes a two-phase image creation strategy using official OS images and Packer to build minimal and role-specific images. Automated tools like Puppet, Capistrano, Consul and Fluentd are configured to allow deployments to complete within 30 minutes through infrastructure-as-code practices. Continuous integration testing with Drone and Serverspec is used to refactor configuration files and validate server configurations.
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive DataSumit Rangwala
The “People You May Know” (PYMK) recommendation service helps LinkedIn’s members identify other members that they might want to connect to and is the major driver for growing LinkedIn's social network. The principal challenge in developing a service like PYMK is dealing with the sheer scale of computation needed to make precise recommendations with a high recall. PYMK service at LinkedIn has been operational for over a decade, during which it has evolved from an Oracle-backed system that took weeks to compute recommendations to a Hadoop backed system that took a few days to compute recommendations to its most modern embodiment where it can compute recommendations in near real time.
This talk will present the evolution of PYMK to its current architecture. We will focus on various systems we built along the way, with an emphasis on systems we built for our most recent architecture, namely Gaia, our real-time graph computing capability, and Venice our online feature store with scoring capability, and how we integrate these individual systems to generate recommendations in a timely and agile manner, while still being cost-efficient. We will briefly talk about the lessons learned about scalability limits of our past and current design choices and how we plan to tackle the scalability challenges for the next phase of growth.
https://qcon.ai/qconai2019/presentation/people-you-may-know-fast-recommendations-over-massive-data
The document discusses using queue systems to execute tasks asynchronously in the background to improve application performance and scalability. It provides an overview of different types of queue systems including dedicated job queues like Gearman and Beanstalkd, message queues like RabbitMQ, and software-as-a-service queues like Amazon SQS. It also discusses using databases like Redis as queues. The document then dives deeper into examples of using Gearman and Beanstalkd in PHP applications and compares their performance. It also discusses using queue abstraction layers and best practices for queueing jobs.
"Drupal is always so fast!" ... said no one, ever.
Drupal has a reputation as being a slow CMS, but that reputation is undeserved; there are many small things that impact a Drupal site's performance in sometimes substantial ways. This session will highlight many 'quick wins' that will get your site performing like a champ in no time!
Then we'll take a demonstration site that has many elements of real-world 'slow' Drupal sites, show how to do a quick performance evaluation/triage, and change the site from loading in 4-5 seconds to loading in less than a second, and maxing out at 2 requests per second to a speedy 4,000+ requests per second!
The session will also discuss the importance of a plan, benchmarking, and assumptions when you do performance work on your own Drupal site.
Serverless in production, an experience report (FullStack 2018)Yan Cui
This document discusses considerations for making serverless applications production ready. It covers topics like testing, monitoring, logging, deployment pipelines, performance optimization, and security. The document emphasizes principles over specific tools, and recommends focusing on shipping working software through practices like embracing external services for testing instead of mocking.
WinOps Conf 2016 - Michael Greene - Release PipelinesWinOps Conf
There are benefits to be gained when patterns and practices from developer techniques are applied to operations. Notably, a fully automated solution where infrastructure is managed as code and all changes are automatically validated before reaching production. This is a process shift that is recognized among industry innovators. For organizations already leveraging these processes, it should be clear how to leverage Microsoft platforms. For organizations that are new to the topic, it should be clear how to bring this process to your environment and what it means to your organizational culture. This presentation explains the components of a Release Pipeline for configuration as code, the value to operations, and solutions that are used when designing a new Release Pipeline architecture.
Similar to Building Efficient and Reliable Crawler System With Sidekiq Enterprise (20)
What is Augmented Reality Image Trackingpavan998932
Augmented Reality (AR) Image Tracking is a technology that enables AR applications to recognize and track images in the real world, overlaying digital content onto them. This enhances the user's interaction with their environment by providing additional information and interactive elements directly tied to physical images.
WhatsApp offers simple, reliable, and private messaging and calling services for free worldwide. With end-to-end encryption, your personal messages and calls are secure, ensuring only you and the recipient can access them. Enjoy voice and video calls to stay connected with loved ones or colleagues. Express yourself using stickers, GIFs, or by sharing moments on Status. WhatsApp Business enables global customer outreach, facilitating sales growth and relationship building through showcasing products and services. Stay connected effortlessly with group chats for planning outings with friends or staying updated on family conversations.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Atelier - Innover avec l’IA Générative et les graphes de connaissancesNeo4j
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Allez au-delà du battage médiatique autour de l’IA et découvrez des techniques pratiques pour utiliser l’IA de manière responsable à travers les données de votre organisation. Explorez comment utiliser les graphes de connaissances pour augmenter la précision, la transparence et la capacité d’explication dans les systèmes d’IA générative. Vous partirez avec une expérience pratique combinant les relations entre les données et les LLM pour apporter du contexte spécifique à votre domaine et améliorer votre raisonnement.
Amenez votre ordinateur portable et nous vous guiderons sur la mise en place de votre propre pile d’IA générative, en vous fournissant des exemples pratiques et codés pour démarrer en quelques minutes.
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfUndress Baby
The quest for the best AI face swap solution is marked by an amalgamation of technological prowess and artistic finesse, where cutting-edge algorithms seamlessly replace faces in images or videos with striking realism. Leveraging advanced deep learning techniques, the best AI face swap tools meticulously analyze facial features, lighting conditions, and expressions to execute flawless transformations, ensuring natural-looking results that blur the line between reality and illusion, captivating users with their ingenuity and sophistication.
Web:- https://undressbaby.com/
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
15. • Introduction to Statementdog
• Data behind Statementdog
• Past practice of Statementdog
• Problems of the past practice
• How we design our system to solve the problems.
16. Focus on:
• More reliable job scheduling
• Dealing with throttling issue
37. Yearly - dividend, remuneration of directors and supervisors
Quarterly - quarterly financial statements
Monthly - Revenue
Weekly -
Daily - closing price
Hourly - stock news from Yahoo stock feed
Minutely - important news from Taiwan Market Observation Post System
39. Something like this, but written in PHP

A super long running process (1 hour+) that loops from the first stock to the last one:

Stock.find_each do |stock|
  # download XML financial report data
  …
  # extract XML data
  …
  # calculate advanced data
  …
end
43. A super long running process for quarterly report
A super long running process for monthly revenue
A super long running process for daily price
A super long running process for news
…
56. • Inherent problems of Unix Cron:
• Unreliable scheduling
• High availability is not easy
• Hard to prioritize jobs by popularity
• Not easy to deal with bandwidth throttling issues
70. Sidekiq Pro: Batches, Enhanced Reliability, Search in Web UI, Worker Metrics, Expiring Jobs
Sidekiq Enterprise: Rate Limiting, Periodic Jobs, Unique Jobs, Historical Metrics, Multi-process, Encryption
73. • Really slow
• Inefficient - unable to retry only the failed one
• Unpredictable server loading
• Not easy to scale out
74. • Efficient - able to retry only the failed one
• Predictable server loading
• Easy to scale out
80. Keep the state of cron executions in
the most robust part of our system - the database
All scheduled jobs are invoked by a single job
executed every minute
82. Create a table for storing cron settings (table name: cron_jobs)

create_table :cron_jobs do |t|
  t.string :klass, null: false                        # worker class name
  t.string :cron_expression, null: false              # something like 0 */2 * * *
  t.timestamp :next_run_at, null: false, index: true  # when the job should next be executed
end
87. # Add to your cron setting
every :minute do
  runner 'CronJobWorker.perform_async'
end

Cron now schedules only a single job, every minute
88. CronJobWorker invokes all of your crawlers

class CronJobWorker
  include Sidekiq::Worker

  def perform
    # Find the jobs that are due for execution
    CronJob.where("next_run_at <= ?", Time.now).find_each do |job|
      # Push each due job onto its Sidekiq queue
      Sidekiq::Client.push(
        class: job.klass.constantize,
        args: ['foo', 'bar']
      )
      # Set up the next execution time from the cron expression
      x = Sidekiq::CronParser.new(job.cron_expression)
      job.update!(next_run_at: x.next.to_time)
    end
  end
end
92. Missed job executions will simply be
picked up and executed at the next minute
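The scheduling semantics above can be sketched in plain Ruby (no Rails or Sidekiq; `CronJob` here is an illustrative stand-in for the database model, and the worker names are made up). Because the per-minute worker selects every job whose `next_run_at` lies in the past, a tick missed while the scheduler was down is picked up on the next run instead of being lost:

```ruby
# Minimal in-memory sketch of "find all jobs that are due".
CronJob = Struct.new(:klass, :next_run_at)

def due_jobs(jobs, now)
  jobs.select { |job| job.next_run_at <= now }
end

jobs = [
  CronJob.new('PriceCrawlerWorker', Time.utc(2024, 1, 1, 10, 0)),
  CronJob.new('NewsCrawlerWorker',  Time.utc(2024, 1, 1, 10, 5)),
]

# The scheduler was down between 10:00 and 10:06;
# the 10:07 tick still finds both overdue jobs.
now = Time.utc(2024, 1, 1, 10, 7)
puts due_jobs(jobs, now).map(&:klass).inspect
# => ["PriceCrawlerWorker", "NewsCrawlerWorker"]
```

Updating each job's `next_run_at` after enqueueing (as the worker above does) is what keeps a job from being enqueued twice for the same tick.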
94. Drawbacks solved
• Inherent problems of Unix Cron:
• Unreliable scheduling
• Hard to prioritize jobs by popularity
• High availability is not easy
• Not easy to deal with bandwidth throttling issues
104. However, your target server doesn't
always allow you to crawl at an
unlimited rate
105. Insert 2000 jobs into the queue at the same time
Stock.pluck(:id).each do |stock_id|
  SomeWorker.perform_async(stock_id)
end
If you want to crawl data for your 2000 stocks
106. Assume the target server accepts requests at a
maximum rate of
1 request per second
108. Improvement 1
Schedule jobs with incremental delays
Stock.pluck(:id).each_with_index do |stock_id, index|
  # Delay the i-th job by i seconds to stay under 1 request/second
  SomeWorker.perform_in(index.seconds, stock_id)
end
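The one-second spacing is the special case of a general rule: at a maximum rate of r requests per second, the i-th job can be delayed by i / r seconds. A small helper (the name staggered_delays is made up for this sketch):

```ruby
# Compute staggered delays (in seconds) so that `count` jobs respect a
# maximum rate of `rate` requests per second.
def staggered_delays(count, rate)
  (0...count).map { |i| i / rate.to_f }
end

staggered_delays(5, 1.0)  # => [0.0, 1.0, 2.0, 3.0, 4.0]
staggered_delays(4, 2.0)  # => [0.0, 0.5, 1.0, 1.5]
```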
111. Workable, but…
Timeline (seconds): job1 at t = 1, job2 at t = 2, job3 at t = 3, …, job2000 at t = 2000
If the target server becomes unreachable, the remaining delays elapse during
the outage, so job3~job2000 will still all execute at the same time
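The pile-up is easy to see numerically: once the delays elapse during an outage, every remaining job is due at the same instant. A quick plain-Ruby check, assuming jobs 1 and 2 ran before an outage starting at t = 3:

```ruby
# Jobs scheduled one second apart, as in Improvement 1.
scheduled_at = (1..2000).to_a

# Jobs 1 and 2 succeed, then the target server goes down at t = 3 and
# comes back at t = 2600. Every remaining job's delay has elapsed
# during the outage, so all of them are due at the same instant.
outage_start = 3
recovery     = 2600
piled_up = scheduled_at.count { |t| t >= outage_start && t <= recovery }
piled_up  # => 1998 jobs runnable simultaneously at recovery
```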
112. • Limit your worker threads so that a specific job
is performed at a bounded rate
• Sidekiq Enterprise provides two types of rate
limiting APIs
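Sidekiq Enterprise's limiters are a commercial API, so as a rough stand-in only, here is a minimal token-bucket rate limiter in plain Ruby (the class name, fake clock, and parameters are all invented for this sketch; the Enterprise API is different):

```ruby
# Minimal token-bucket rate limiter: allows `capacity` operations per
# `interval` seconds, refilling the bucket as time passes. The injectable
# clock makes the behavior deterministic for testing.
class TokenBucket
  def initialize(capacity, interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity
    @refill_rate = capacity / interval.to_f  # tokens per second
    @tokens = capacity.to_f
    @clock = clock
    @last = clock.call
  end

  # Consume a token and return true if allowed; false when throttled.
  def allow?
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last) * @refill_rate].min
    @last = now
    return false if @tokens < 1
    @tokens -= 1
    true
  end
end

# With a fake clock: 2 requests per second.
t = 0.0
bucket = TokenBucket.new(2, 1, clock: -> { t })
bucket.allow?  # => true
bucket.allow?  # => true
bucket.allow?  # => false (bucket empty)
t = 1.0        # one second later, the bucket has refilled
bucket.allow?  # => true
```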
116. You must fine-tune the limiter parameters
for each data source to get good performance
117. So far, you have already gained better performance.
However, the throttling policy of your target server
may not always be static:
many websites throttle dynamically.
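How a crawler decides it is being throttled is site-specific; a common heuristic (an assumption here, not from the talk) is to treat HTTP 429, HTTP 503, or a Retry-After header as throttling signals:

```ruby
# Decide whether a response indicates we are being throttled.
# `status` is the HTTP status code, `headers` a Hash of response headers.
def throttled?(status, headers = {})
  return true if [429, 503].include?(status)  # Too Many Requests / Unavailable
  headers.key?('Retry-After')                 # explicit back-off hint
end

throttled?(200)                         # => false
throttled?(429)                         # => true
throttled?(200, 'Retry-After' => '30')  # => true
```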
127. class SomeWorker
  include Sidekiq::Worker

  def perform
    # try to crawl something
    # ...
    if throttled   # set by the crawl attempt above
      # Pause our own queue, and schedule a resume in 30 seconds
      queue_name = self.class.get_sidekiq_options['queue']
      queue = Sidekiq::Queue.new(queue_name)
      queue.pause!
      ResumeJobQueueWorker.perform_in(30.seconds, queue_name)
    end
  end
end
129. (same SomeWorker as slide 127, paired with the resume worker below)
class ResumeJobQueueWorker
  include Sidekiq::Worker
  sidekiq_options queue: :queue_control, unique: :until_executed

  def perform(queue_name)
    queue = Sidekiq::Queue.new(queue_name)
    queue.unpause! if queue.paused?
  end
end
130. The queue for ResumeJobQueueWorker
MUST NOT be the paused queue itself:
a resume job sitting on a paused queue would never run.
That is why we use a dedicated queue (queue_control) for ResumeJobQueueWorker.
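Why the dedicated queue is mandatory can be shown with a toy queue model in plain Ruby (no Sidekiq involved; ToyBroker and its methods are invented for this sketch): a resume job placed on the paused queue never runs, so the queue stays paused forever.

```ruby
# Toy model of queues and pausing: paused queues are skipped by the
# worker loop, so a job on a paused queue never runs.
class ToyBroker
  def initialize
    @queues = Hash.new { |h, k| h[k] = [] }
    @paused = {}
  end

  def push(queue, job)
    @queues[queue] << job
  end

  def pause!(queue)
    @paused[queue] = true
  end

  def unpause!(queue)
    @paused.delete(queue)
  end

  def paused?(queue)
    @paused.key?(queue)
  end

  # One pass of the worker pool: run (and discard) every job on every
  # unpaused queue; a resume job unpauses its target queue.
  def drain
    @queues.each do |name, jobs|
      next if paused?(name)
      jobs.each { |job| unpause!(job[:resumes]) if job[:resumes] }
      jobs.clear
    end
  end
end

# Wrong: the resume job is pushed onto the very queue it should resume.
broker = ToyBroker.new
broker.pause!('crawler')
broker.push('crawler', resumes: 'crawler')
broker.drain
broker.paused?('crawler')  # => true -- stuck forever

# Right: the resume job lives on a dedicated, never-paused control queue.
broker.push('queue_control', resumes: 'crawler')
broker.drain
broker.paused?('crawler')  # => false
```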
145. • With Sidekiq (Enterprise) and a proper design, the following problems
are solved:
• Slow crawler
• Inefficient: unable to retry only the failed jobs
• Unpredictable server loading
• Scaling out is not easy
• Inherent problems of Unix cron
• Not easy to deal with bandwidth throttling issues