The document discusses using asyncio to perform asynchronous web scraping in Python. It begins with an overview of common Python web scraping tools such as Requests, BeautifulSoup, and Scrapy. It then covers key asyncio concepts: event loops, coroutines, tasks, and futures. It provides examples of building an asynchronous web crawler and downloader using these concepts. Finally, it discusses alternatives to asyncio for concurrent programming, such as ThreadPoolExecutor and ProcessPoolExecutor.
5. Agenda
▶ Web scraping Python tools
▶ Requests vs aiohttp
▶ Introduction to asyncio
▶ Async client/server
▶ Building a web crawler with asyncio
▶ Alternatives to asyncio
10. Web scraping with Python
1. Download the web page with an HTTP module (requests, urllib, aiohttp)
2. Parse the page with BeautifulSoup/lxml
3. Select elements with regular expressions, XPath, or CSS selectors
4. Store the results in a database, CSV, or JSON
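As a minimal sketch of these four steps in plain synchronous code (assuming the requests and BeautifulSoup libraries; httpbin.org is a placeholder target):

import json
import requests
from bs4 import BeautifulSoup

# 1. Download the web page with an HTTP module
html = requests.get("http://httpbin.org/html").text
# 2. Parse the page with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
# 3. Select elements, here with a CSS selector
titles = [h1.text for h1 in soup.select("h1")]
# 4. Store the results as JSON
with open("results.json", "w") as f:
    json.dump(titles, f)

The asynchronous versions later in the talk replace only step 1; parsing and storage stay the same.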
13. BeautifulSoup functions
▪ find_all('a') → returns all links
▪ find('title') → returns the first <title> element
▪ get('href') → returns the value of the href attribute
▪ (element).text → returns the text inside an element
for link in soup.find_all('a'):
    print(link.get('href'))
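A small self-contained sketch of these calls (the markup is hypothetical, just for illustration):

from bs4 import BeautifulSoup

html = '<html><head><title>Demo</title></head><body><a href="/a">first</a><a href="/b">second</a></body></html>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find('title'))        # first <title> element
print(soup.find('title').text)   # text inside the element: 'Demo'
for link in soup.find_all('a'):  # all links
    print(link.get('href'))      # href attribute values: /a, /b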
22. Spiders / crawlers
▶ A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. A Web crawler may also be called a Web spider.
https://en.wikipedia.org/wiki/Web_crawler
25. Scrapy
▶ Uses a mechanism based on XPath expressions called XPath selectors.
▶ Uses the lxml parser to find elements.
▶ Uses Twisted for asynchronous operations.
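For context, a minimal spider sketch along these lines (a sketch only; the URL and item fields are illustrative, not from the talk):

import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # XPath selector, evaluated by the lxml-backed engine
        for href in response.xpath('//a/@href').extract():
            yield {'link': response.urljoin(href)}

Running it with scrapy runspider spider.py -o links.json exports the items straight to JSON, one of the formats noted on the next slide.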
26. Scrapy advantages
▶ Faster than mechanize because it uses Twisted for asynchronous operations.
▶ Better support for HTML parsing.
▶ Better support for Unicode characters, redirections, gzipped responses, and encodings.
▶ You can export the extracted data directly to JSON, XML, and CSV.
29. The concurrency problem
▶ Different approaches:
▶ Multiple processes
▶ Threads
▶ Separate distributed machines
▶ Asynchronous programming (event loop)
30. Requests problems
▶ Requests operations block the main thread.
▶ The program pauses until the operation completes.
▶ We need one thread per request if we want non-blocking operations.
38. Requests vs aiohttp
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession

async def hello():
    async with ClientSession() as session:
        async with session.get("http://httpbin.org/headers") as response:
            body = await response.read()
            print(body)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello())

import requests

def hello():
    return requests.get("http://httpbin.org/get")

print(hello())
39. Event Loop
▶ An event loop allows us to write asynchronous code using callbacks or coroutines.
▶ The event loop works like a task switcher, just the way operating systems switch between active tasks on the CPU.
▶ The idea is that the event loop runs until all scheduled tasks are completed.
▶ Futures and tasks are created through the event loop.
40. Event Loop
▶ An event loop is used to orchestrate the execution of the coroutines.
▶ loop = asyncio.get_event_loop()
▶ loop.run_until_complete(coroutine_or_future)
▶ loop.run_forever()
▶ loop.stop()
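A minimal sketch of this pre-Python-3.7 loop API in action (the coroutine body is a placeholder):

import asyncio

async def greet():
    await asyncio.sleep(1)
    return 'hello'

loop = asyncio.get_event_loop()            # obtain the event loop
result = loop.run_until_complete(greet())  # drive the coroutine to completion
print(result)                              # 'hello'
loop.close()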
42. Coroutines
▶ Coroutines are functions that allow for multitasking without requiring multiple threads or processes.
▶ Coroutines are like functions, but they can be suspended or resumed at certain points in the code.
▶ Coroutines allow writing asynchronous code that combines the efficiency of callbacks with the readability of multithreaded code.
43. Coroutines 3.4 vs 3.5
# Python 3.4: generator-based coroutine (these are method bodies; self.session is assumed)
import asyncio

@asyncio.coroutine
def fetch(self, url):
    response = yield from self.session.get(url)
    body = yield from response.read()

# Python 3.5: native async/await syntax
import asyncio

async def fetch(self, url):
    response = await self.session.get(url)
    body = await response.read()
45. Requests in event loop
import asyncio
import requests
from aiohttp import ClientSession

loop = asyncio.get_event_loop()

async def getpage_with_requests(url):
    # run the blocking requests call in the default thread-pool executor
    return await loop.run_in_executor(None, requests.get, url)

# equivalent method with native asyncio support
async def getpage_with_aiohttp(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()
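A brief usage sketch under those definitions (httpbin.org is a placeholder URL):

body = loop.run_until_complete(getpage_with_aiohttp('http://httpbin.org/get'))
print(body[:60])  # first bytes of the response body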
46. Tasks
▶ The asyncio.Task class is a subclass of asyncio.Future that encapsulates and manages coroutines.
▶ Tasks allow independently running coroutines to execute concurrently with other tasks on the same event loop.
▶ When a coroutine is wrapped in a task, the task is connected to the event loop.
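A small sketch of wrapping coroutines in tasks so they run concurrently (fetch_number is a stand-in for real work):

import asyncio

async def fetch_number(n):
    await asyncio.sleep(1)  # simulate network latency
    return n * 2

loop = asyncio.get_event_loop()
# wrapping each coroutine in a Task schedules it on the loop
tasks = [asyncio.ensure_future(fetch_number(n)) for n in range(3)]
results = loop.run_until_complete(asyncio.gather(*tasks))
print(results)  # [0, 2, 4], after roughly 1s in total rather than 3s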
51. Futures
▶ To create a Future object in asyncio, we declare the following:
▶ import asyncio
▶ future = asyncio.Future()
▶ https://docs.python.org/3/library/asyncio-task.html#future
▶ https://docs.python.org/3/library/concurrent.futures.html
52. Futures
▶ The asyncio.Future class is essentially a promise of a result.
▶ A Future returns its result when it is available, and once it receives the result, it passes it along to all the registered callbacks.
▶ Each future is a task to be executed in the event loop.
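A minimal sketch of that promise behavior with a registered callback (the produce coroutine is illustrative):

import asyncio

async def produce(future):
    await asyncio.sleep(1)
    future.set_result('page body')  # fulfil the promise

loop = asyncio.get_event_loop()
future = asyncio.Future()
# registered callbacks fire once the result is available
future.add_done_callback(lambda f: print('callback got:', f.result()))
asyncio.ensure_future(produce(future))
loop.run_until_complete(future)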
54. Semaphores
▶ Adding synchronization
▶ Limiting the number of concurrent requests.
▶ The argument indicates the number of simultaneous requests we want to allow.
sem = asyncio.Semaphore(5)
with (await sem):
    page = await get(url, compress=True)
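Putting this together, a hedged sketch of capping concurrent aiohttp downloads with a semaphore (async with sem is the modern spelling of with (await sem); the URL is a placeholder):

import asyncio
from aiohttp import ClientSession

async def bounded_get(sem, session, url):
    async with sem:  # at most 5 requests in flight
        async with session.get(url) as response:
            return await response.read()

async def crawl(urls):
    sem = asyncio.Semaphore(5)
    async with ClientSession() as session:
        return await asyncio.gather(*(bounded_get(sem, session, u) for u in urls))

loop = asyncio.get_event_loop()
pages = loop.run_until_complete(crawl(['http://httpbin.org/get'] * 10))
print(len(pages))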
55. Async Client / Server
▶ asyncio.start_server
▶ server = asyncio.start_server(handle_connection, host=HOST, port=PORT)
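A minimal echo-server sketch built on that call (HOST and PORT values are placeholders):

import asyncio

HOST, PORT = '127.0.0.1', 8888

async def handle_connection(reader, writer):
    data = await reader.read(1024)  # read one chunk from the client
    writer.write(data)              # echo it back
    await writer.drain()
    writer.close()

loop = asyncio.get_event_loop()
server = loop.run_until_complete(
    asyncio.start_server(handle_connection, host=HOST, port=PORT))
try:
    loop.run_forever()
finally:
    server.close()
    loop.run_until_complete(server.wait_closed())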
59. Async Web crawler
▶ Send asynchronous requests to all the links on a web page and add the responses to a queue to be processed as we go.
▶ Coroutines allow running independent tasks and processing their results in three ways:
▶ Using asyncio.as_completed → process the results as they come in.
▶ Using asyncio.gather → process the results only once they have all finished loading.
▶ Using asyncio.ensure_future
60. Async Web crawler
import asyncio
import random

@asyncio.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield from asyncio.sleep(wait_time)
    print('Done: URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time

@asyncio.coroutine
def process_results_as_come_in():
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    for coroutine in asyncio.as_completed(coroutines):
        url, wait_time = yield from coroutine
        print('Coroutine for {} is done'.format(url))

def main():
    loop = asyncio.get_event_loop()
    print("Process results as they come in:")
    loop.run_until_complete(process_results_as_come_in())

if __name__ == '__main__':
    main()

asyncio.as_completed
62. Async Web crawler
import asyncio
import random

@asyncio.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield from asyncio.sleep(wait_time)
    print('Done: URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time

@asyncio.coroutine
def process_once_everything_ready():
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    results = yield from asyncio.gather(*coroutines)
    print(results)

def main():
    loop = asyncio.get_event_loop()
    print("Process results once they are all ready:")
    loop.run_until_complete(process_once_everything_ready())

if __name__ == '__main__':
    main()

asyncio.gather
63. asyncio.gather
From the Python documentation, this is what asyncio.gather does:
asyncio.gather(*coros_or_futures, loop=None, return_exceptions=False)
Return a future aggregating results from the given coroutine objects or futures.
All futures must share the same event loop. If all the tasks are done successfully, the returned future's result is the list of results (in the order of the original sequence, not necessarily the order of results arrival). If return_exceptions is True, exceptions in the tasks are treated the same as successful results, and gathered in the result list; otherwise, the first raised exception will be immediately propagated to the returned future.
64. Async Web crawler
import asyncio
import random

@asyncio.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield from asyncio.sleep(wait_time)
    print('Done: URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time

@asyncio.coroutine
def process_ensure_future():
    tasks = [asyncio.ensure_future(get_url(url)) for url in ['URL1', 'URL2', 'URL3']]
    results = yield from asyncio.wait(tasks)  # returns (done, pending) sets of tasks
    print(results)

def main():
    loop = asyncio.get_event_loop()
    print("Process ensure future:")
    loop.run_until_complete(process_ensure_future())

if __name__ == '__main__':
    main()

asyncio.ensure_future
73. Parallel Python
▶ SMP (symmetric multiprocessing): architectures with multiple cores in the same machine
▶ Distributing tasks across multiple machines
▶ Clusters
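As a contrast with the event loop, a minimal sketch of the executor-based alternative mentioned in the overview (assuming the requests library; the URLs are placeholders):

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

URLS = ['http://httpbin.org/get', 'http://httpbin.org/headers']

def download(url):
    return url, requests.get(url).status_code

# five worker threads stand in for the event loop's concurrency
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(download, url) for url in URLS]
    for future in as_completed(futures):
        url, status = future.result()
        print(url, status)

Swapping in ProcessPoolExecutor distributes the same work across processes, which suits CPU-bound parsing better than I/O-bound downloading.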