We will be using Beautiful Soup to Webscrape the IMDB website and create a function that will allow you to create a dictionary object on specific metadata of the IMDB profile for any IMDB ID you pass through as an argument.
What etagwerker worked on for the open source days that we had in Ombu Labs on December 2015.
EmailSpec, DatabaseCleaner, Bitpagos, Oca Epak library news and an update on the latest articles in the Ombu Labs blog.
Hack information of any website using webkillerSoniakohli6
For hacking any website or web application, information gathering phase about the target is must. Hackers use different tools for collecting unique information about the target. Web killer is another information-gathering tool with nice options to scan the target. In this tool, we have all the option to perform information gathering and this tool is completely built on the python programming language.
Taking a piece of code from your laptop to a discoverable open source package is challenging. This deck shows what's involved in getting v0.1.0 of a Pub/Sub package up and running.
Bonus points if you can guess all 5 movies the slide backgrounds were taken from.
What etagwerker worked on for the open source days that we had in Ombu Labs on December 2015.
EmailSpec, DatabaseCleaner, Bitpagos, Oca Epak library news and an update on the latest articles in the Ombu Labs blog.
Hack information of any website using webkillerSoniakohli6
For hacking any website or web application, information gathering phase about the target is must. Hackers use different tools for collecting unique information about the target. Web killer is another information-gathering tool with nice options to scan the target. In this tool, we have all the option to perform information gathering and this tool is completely built on the python programming language.
Taking a piece of code from your laptop to a discoverable open source package is challenging. This deck shows what's involved in getting v0.1.0 of a Pub/Sub package up and running.
Bonus points if you can guess all 5 movies the slide backgrounds were taken from.
Among the #1 complaints of Python in a data analysis context is the presence of the Global Interpreter Lock, or GIL. At its core, it means that a given Python program cannot easily utilize more than one core of a multi-core machine to do computation in parallel. However, fear not! To beat the GIL, you just need to be willing to adopt a little magic -- and this talk will tell you how.
This slide explains how to call Python from Windows Task Scheduler. What is special in this contents is that the log will be output into a log file, so if something happens, you will be able to track what happened.
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
Start guide to web scraping with Scrapy, one of best python modules to do web scraping, with Scrapy everything is more easy.
This presentation covers the key concepts of scrapy and the process of criation of spiders.
It's the first draft version and will be other versions, until the last version, if you see something that you want to be improved, give feedback and I will take that in consideration.
I also talk about some alternatives to scrapy like lxml, newspapers and others.
In the final i give you acess to the code used on this presentation, so you cant test easy and fast the concepts talked on this presentation.
I hope you like it :D
A quick start guide to start working with Robot Framework.
End to End flow form installation to test case automation to verifying result, using both GUI and Command Prompt options.
Among the #1 complaints of Python in a data analysis context is the presence of the Global Interpreter Lock, or GIL. At its core, it means that a given Python program cannot easily utilize more than one core of a multi-core machine to do computation in parallel. However, fear not! To beat the GIL, you just need to be willing to adopt a little magic -- and this talk will tell you how.
This slide explains how to call Python from Windows Task Scheduler. What is special in this contents is that the log will be output into a log file, so if something happens, you will be able to track what happened.
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
Start guide to web scraping with Scrapy, one of best python modules to do web scraping, with Scrapy everything is more easy.
This presentation covers the key concepts of scrapy and the process of criation of spiders.
It's the first draft version and will be other versions, until the last version, if you see something that you want to be improved, give feedback and I will take that in consideration.
I also talk about some alternatives to scrapy like lxml, newspapers and others.
In the final i give you acess to the code used on this presentation, so you cant test easy and fast the concepts talked on this presentation.
I hope you like it :D
A quick start guide to start working with Robot Framework.
End to End flow form installation to test case automation to verifying result, using both GUI and Command Prompt options.
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyLITTINRAJAN
Web Scraping Introduction. It will cover cover all the most available libraries and the way they can be handled to scrape our required data. Created by Littin Rajan
The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.
The 5 most common reasons for a slow WordPress site and how to fix them – ext...Otto Kekäläinen
Presentation given in WP Meetup in October 2019.
Includes fresh new tips from summer/fall 2019!
A Must read for all WordPress site owners and developers.
https://youtu.be/zMWYC2ai91w
https://gist.github.com/daniyalb/42106c74afe20f2c442b796dba727d28
Web Scraping workshop 2023-02-22
Daniyal Bokhari on February 22, 2023
CSC108 level of knowledge required
Wed 22nd, 5pm
What you’ll learn:
How to use python requests
* Use BS4 to parse through HTML and find info
* Use python data structures to manage info
* Make a price finder?
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
At this workshop, you will build your own messaging insights system - data ingestion from a live data source (Reddit), queueing, deploying a machine learning model, and serving messages with insights to your mobile phone!
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
In the same way that we need to make assertions about how code functions, we need to make assertions about data, and unit testing is a promising framework. In this talk, we'll explore what is unique about unit testing data, and see how Two Sigma's open source library Marbles addresses these unique challenges in several real-world scenarios.
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
TileDB is an open-source storage manager for multi-dimensional sparse and dense array data. It has a novel architecture that addresses some of the pain points in storing array data on “big-data” and “cloud” storage architectures. This talk will highlight TileDB’s design and its ability to integrate with analysis environments relevant to the PyData community such as Python, R, Julia, etc.
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings to understand how data science skill-sets have transformed over the last 3 years using our large corpus of jobs. The key takeaway is that these models can enrich analysis of specialized datasets.
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, written in Python and hosted on Google Cloud Platform.
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
However, the the graph theory jargon can make graph analytics seem more intimidating for self-study than is necessary. In this talk, the audience will be exposed to some of the basic concepts of graph theory (no prerequisite math knowledge needed!) and a few of the Python tools available for graph analysis.
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
To productionize data science work (and have it taken seriously by software engineers, CTOs, clients, or the open source community), you need to write tests! Except… how can you test code that performs nondeterministic tasks like natural language parsing and modeling? This talk presents an approach to testing probabilistic functions in code, illustrated with concrete examples written for Pytest.
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
Those of us who use TensorFlow often focus on building the model that's most predictive, not the one that's most deployable. So how to put that hard work to work? In this talk, we'll walk through a strategy for taking your machine learning models from Jupyter Notebook into production and beyond.
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
In September 2017, dockless bikeshare joined the transportation options in the District of Columbia. In March 2018, scooter share followed. During the pilot of these technologies, Python has helped District Department of Transportation answer some critical questions. This talk will discuss how Python was used to answer research questions and how it supported the evaluation of this demonstration.
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
There are many stories of developers creating databases that don't operate at scale. The application is good, but the database won't work the realistic volumes of data. It's like a horror movie where they never looked behind the door, ran into the dark forest and night, and discovered the database was the monster killing their application. How can we leverage Python to avoid scaling problems?
Machine learning often requires us to think spatially and make choices about what it means for two instances to be close or far apart. So which is best - Euclidean? Manhattan? Cosine? It all depends! In this talk, we'll explore open source tools and visual diagnostic strategies for picking good distance metrics when doing machine learning on text.
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
The recent advances in machine learning and artificial intelligence are amazing! Yet, in order to have real value within a company, data scientists must be able to get their models off of their laptops and deployed within a company’s data pipelines and infrastructure. In this session, I'll demonstrate how one-off experiments can be transformed into scalable ML pipelines with minimal effort.
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
This talk describes an experimental approach to time series modeling using 1D convolution filter layers in a neural network architecture. This approach was developed at System1 for forecasting marketplace value of online advertising categories.
Extending Pandas with Custom Types - Will AydPyData
Pandas v.0.23 brought to life a new extension interface through which you can extend NumPy's type system. This talk will explain what that means in more detail and provide practical examples of how the new interface can be leveraged to drastically improve your reporting.
Machine learning models are increasingly used to make decisions that affect people’s lives. With this power comes a responsibility to ensure that model predictions are fair. In this talk I’ll introduce several common model fairness metrics, discuss their tradeoffs, and finally demonstrate their use with a case study analyzing anonymized data from one of Civis Analytics’s client engagements.
What's the Science in Data Science? - Skipper SeaboldPyData
The gold standard for validating any scientific assumption is to run an experiment. Data science isn’t any different. Unfortunately, it’s not always possible to design the perfect experiment. In this talk, we’ll take a realistic look at measurement using tools from the social sciences to conduct quasi-experiments with observational data.
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
Forecasting time-series data has applications in many fields, including finance, health, etc. There are potential pitfalls when applying classic statistical and machine learning methods to time-series problems. This talk will give folks the basic toolbox to analyze time-series data and perform forecasting using statistical and machine learning models, as well as interpret and convey the outputs.
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
A historical text may now be unreadable, because its language is unknown, or its script forgotten (or both), or because it was deliberately enciphered. Deciphering needs two steps: Identify the language, then map the unknown script to a familiar one. I’ll present an algorithm to solve a cartoon version of this problem, where the language is known, and the cipher is alphabet rearrangement.
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
Artificial intelligence is emerging as a new paradigm in materials science. This talk describes how physical intuition and (insightful) machine learning can solve the complicated task of structure recognition in materials at the nanoscale.
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
Rasa NLU & Rasa Core are the leading open source libraries for building machine learning-based chatbots and voice assistants. In this live-coding workshop you will learn the fundamentals of conversational AI and how to build your own using the Rasa Stack.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
2. Web Scraping
Extracting information from a website (usually HTML) and parsing it in a readable
format (this is called getting soup).
We will be using a package in the Python Library called Beautiful Soup which parses
HTML and XML.
Some Best Practices:
- Check the Robots.txt to make sure you can web scrape the website. Look for
User Agent: *
- Scraping too many pages at once quickly can get your IP blocked
- Use a browser developer tool to determine the classes of objects (in Chrome
you select Inspect)
- Use an LXML parser which is faster than HTML parser
5. Our Web Scraping Exercise
1. Check the Robots.txt of IMDB
2. Save your Python File and import the packages that you need (Beautiful Soup &
Requests)
3. Parse the HTML of 1 url of IMDB of a movie: Monty Python Holy Grail’s IMDB.
4. Let’s web scrape the Title, Content Rating, Description of this movie
5. Let’s create a function where we pass through an ID of an IMDB movie to add
the previous information we web scraped in the form of a dictionary.