A tour of Python: slides from presentation given in 2012.
[Some slides are not properly rendered in SlideShare: the original is still available at http://www.aleksa.org/2015/04/python-presentation_7.html.]
This document provides an overview of the Python programming language. It covers topics such as syntax, types and objects, operators and expressions, functions, classes and object-oriented programming, modules and packages, input/output, and the Python execution environment. It also discusses generators and iterators versus regular functions, namespaces and scopes, and classes in Python.
The document provides a cheat sheet on the pandas DataFrame object. It discusses importing pandas, creating DataFrames from various data sources like CSVs, Excel, and dictionaries. It covers common operations on DataFrames like selecting, filtering, and transforming columns; handling indexes; and saving DataFrames. The DataFrame is a two-dimensional data structure with labeled columns that can be manipulated using various methods.
This document describes ggTimeSeries, an R package that provides extensions to ggplot2 for creating time series plots. It includes examples of using functions from ggTimeSeries to create calendar heatmaps, horizon graphs, steam graphs, and marimekko plots from time series data. The examples demonstrate how to generate sample time series data, create basic plots, and add formatting customizations.
This document discusses building machine learning models to predict whether Titanic passengers survived, using a dataset from Kaggle. It outlines the steps of exploratory data analysis; data processing including encoding, standardization, and imputation; feature selection; splitting the data into training and test sets; building models such as logistic regression and random forest; and evaluating them on the test set using metrics such as accuracy, the confusion matrix, and the classification report. Random forest is found to have the best performance, with an accuracy of 77% on the test set.
1sequences and sampling. Suppose we went to sample the x-axis from X.pdf (rushabhshah600)
1. Sequences and sampling. Suppose we want to sample the x-axis from Xmin to Xmax using a step size of step.
A) Draw a picture of what is going on.
B) Write an expression for n, the total number of samples involved (in terms of Xmin, Xmax, and step).
C) Write out the sequence of x-samples.
D) Write a direct, general expression for xi that captures the sequence.
E) Write a recursive expression for the sequence.
F) Write a program to compute and store the x-samples over the range -5 ≤ x ≤ 5 using a step size of 0.1; do everything in main().
2. We talked about the following string functions that are available in C (as long as you include string.h):
int strlen(char str[])
void strcpy(char str1[], char str2[])
void strcat(char str1[], char str2[])
Write your own versions of these functions; for example: int paul_strlen(char str[]). Hint: for your version of the strlen function, start at the first character in the array and keep counting until you find the '\0' character (use a while loop for this). Note: use your version of the strlen function in the strcpy and strcat functions.
9. We want to insert a number into an array.
(a) Formulate the problem mathematically with two sequences: x and y.
(b) Write a function of the form:
void insertNumIntoArray(int n, int array[], int num, int index)
The function inserts num into the array at the specified index; the rest of the array then follows. For example, if num = 9, index = 3, and array = [7 2 8 8 3 1 2], then the function will produce:
array = [7 2 8 9 8 3 1 2]
Note: assume that array is properly dimensioned to have at least one extra space for storage.
10. Repeat #9 for the delete operation; that is, we want to delete a single element (at a specified index) from an array. For example, suppose index = 3 and array = [50 70 10 90 60 20]; then the result will be:
array = [50 70 10 60 20]
11. Repeat #9 for an insert operation where we are inserting several values into the array. The function should be of the form:
int insertArrayIntoArray(int n, int inArray[],
int nInsert, int insertArray[], int outArray[], int index)
The dimension of outArray is returned (explicitly). For example:
inArray: [7 2 8 6 3 9]
insertArray: [50 60 70]
index: 2
outArray: [7 2 50 60 70 8 6 3 9]
Assume that outArray is large enough to hold all n + nInsert values.
Solution
#include <stdio.h>

//Simulates the strlen() library function
int paul_strlen(char str[])
{
    int l;
    for (l = 0; str[l] != '\0'; l++)
        ;
    return l;
}

//Simulates the strcpy() library function: copies str1 into str2
void paul_strcpy(char str1[], char str2[])
{
    int c;
    for (c = 0; str1[c] != '\0'; c++)
        str2[c] = str1[c];
    str2[c] = '\0';
    printf("\nOriginal String: %s", str1);
    printf("\nCopied String: %s", str2);
}

//Simulates the strcat() library function: appends str2 to str1
void paul_strcat(char str1[], char str2[])
{
    int i = paul_strlen(str1);  //per the note: reuse paul_strlen
    int j;
    for (j = 0; str2[j] != '\0'; i++, j++)
        str1[i] = str2[j];
    str1[i] = '\0';
    printf("\nConcatenated String: %s", str1);
}

int main()
{
    char data1[20] = "Hello", data2[20];
    printf("\nLength of data1: %d", paul_strlen(data1));
    paul_strcpy(data1, data2);      //copy data1 into data2
    paul_strcat(data1, " world");   //append to data1
    return 0;
}
This document provides an overview of using R for financial modeling. It covers basic R commands for calculations, vectors, matrices, lists, data frames, and importing/exporting data. Graphical functions like plots, bar plots, pie charts, and boxplots are demonstrated. Advanced topics discussed include distributions, parameter estimation, correlations, linear and nonlinear regression, technical analysis packages, and practical exercises involving financial data analysis and modeling.
The document discusses clustering and numpy arrays in Python. It shows how to create arrays using numpy, perform operations like summing and finding min/max values, and access elements and slices. It also introduces Cython and demonstrates compiling a simple "Hello World" Cython program and using Cython to optimize a Python prime number generation function for improved performance.
R is a programming language and software environment for statistical analysis and graphics. It allows users to analyze data, create visualizations, and perform statistical tests. Common R commands include functions to get and set the working directory, list objects in the workspace, remove objects, view and set options, save and load the command history, and save and load the entire workspace. R supports various data structures like vectors, arrays, matrices, data frames, and lists to store and manipulate different types of data. Data can be input into R from files, databases, and Excel spreadsheets. Graphs and visualizations created in R can be exported to file formats like PNG, JPEG, PDF and others.
1. The document provides examples of various functions in R including string functions, mathematical functions, statistical probability functions and other statistical functions. Examples are given for functions like substr, grep, sub, paste etc. to manipulate strings and functions like mean, sd, median etc. for statistical calculations.
2. Examples are shown for commonly used probability distribution functions like dnorm, pnorm, qnorm, rnorm etc. Other examples include functions for binomial, Poisson and uniform distributions.
3. The document also lists various other useful statistical functions like range, sum, diff, min, max etc. with examples. Examples are provided to illustrate the use of these functions through loops and to create a matrix.
The document describes implementing and demonstrating several machine learning algorithms:
1) The FIND-S algorithm is used to find the most specific hypothesis from a training data set.
2) The Candidate Elimination algorithm outputs all hypotheses consistent with the training examples.
3) The ID3 decision tree algorithm builds a classification tree from a training data set.
4) Backpropagation for an artificial neural network is demonstrated on a sample data set.
5) Naive Bayes classification is implemented and test accuracy computed on sample data.
The document contains a list of 16 programs related to arrays in Swift. The programs cover topics like printing arrays, searching arrays, sorting arrays, inserting/deleting elements from arrays, and performing operations on elements in arrays. Functions are defined to implement each program with arrays and size passed as parameters.
This document provides a summary of a seminar presentation on robotic process automation and virtual internships. It introduces popular Python libraries for data science like NumPy, SciPy, Pandas, matplotlib and Seaborn. It covers reading, exploring and manipulating data frames; filtering and selecting data; grouping; descriptive statistics. It also discusses missing value handling and aggregation functions. The goal is to provide an overview of key Python tools and techniques for data analysis.
The Pandas library provides easy-to-use data structures and analysis tools for Python. It uses NumPy as a foundation and allows import as pd. Pandas has two main data structures: Series for 1D labeled arrays and DataFrame for 2D labeled data with columns that can be different types. DataFrames allow selection of data by label, position, and filtering with Boolean conditions. Data can be manipulated with arithmetic operations and alignment between Series/DataFrames.
The document provides an overview of the Pandas library in Python for working with data structures like Series and DataFrames. It covers common operations for selecting, filtering, sorting, applying functions, handling missing data, and reading/writing data to files and databases. These operations allow for easy manipulation and analysis of data in Pandas.
Using R in financial modeling provides an introduction to using R for financial applications. It discusses importing stock price data from various sources and visualizing it using basic graphs and technical indicators. It also covers topics like calculating returns, estimating distributions of returns, correlations, volatility modeling, and value at risk calculations. The document provides examples of commands and functions in R to perform these financial analytics tasks on sample stock price data.
This document provides an overview of essential data wrangling tasks in R, including importing, exploring, indexing/subsetting, reshaping, merging, aggregating, and repeating/looping data. It discusses functions for reading different file types like CSV, Excel, and plain text. It also covers exploring data structure and summary statistics, subsetting vectors, data frames and matrices, reshaping between wide and long format, performing different types of joins to merge data, and using loops and sequences to repeat operations.
This document discusses parameters and graphics in Python. It covers:
- Using constants by declaring variables at the top of code
- Drawing graphics using the DrawingPanel module
- Defining functions with parameters and default parameter values
- Drawing shapes, lines, polygons using methods like create_rectangle, create_oval, and animation using the sleep function.
Matplotlib is a Python library used for 2D plotting. It can create publication-quality figures in both hardcopy and interactive formats across platforms. The basic steps for creating plots with Matplotlib are: 1) prepare data, 2) create a plot, 3) add elements to the plot, 4) customize the plot, 5) save the plot, and 6) show the plot. Matplotlib provides various functions for plotting different types of data, customizing figures, axes, and layouts.
The NumPy library provides multidimensional array objects and tools for working with these arrays. It allows users to create arrays, perform arithmetic operations on arrays, manipulate array shapes, combine and split arrays, and more. NumPy arrays can be inspected, saved/loaded from files, sorted, and copied.
The NumPy library provides multidimensional array objects and tools for working with these arrays. It allows users to create arrays, perform arithmetic operations on arrays, manipulate array shapes, combine and split arrays, and more. NumPy arrays can be inspected, saved to disk, sorted, and loaded from text files for data analysis.
The NumPy library provides multidimensional array objects and tools for working with these arrays. It allows users to create arrays, perform arithmetic operations on arrays, manipulate array shapes, combine and split arrays, and more. NumPy arrays can be saved to and loaded from files to allow for inspection and persistence of array data.
This document provides an overview of machine learning algorithms and scikit-learn. It begins with an introduction and table of contents. Then it covers topics like dataset loading from files, pandas, scikit-learn datasets, preprocessing data like handling missing values, feature selection, dimensionality reduction, training and test sets, supervised and unsupervised learning models, and saving/loading machine learning models. For each topic, it provides code examples and explanations.
Stork Product Overview: An AI-Powered Autonomous Delivery Fleet (Vince Scalabrino)
Imagine a world where, instead of blue and brown trucks dropping parcels on our porches, a buzzing drove of drones delivered our goods. Now imagine those drones are controlled by three purpose-built AIs designed to ensure all packages are delivered as quickly and as economically as possible. That's what Stork is all about.
The Ultimate Guide to Top 36 DevOps Testing Tools for 2024 (kalichargn70th171)
Testing is pivotal in the DevOps framework, serving as a linchpin for early bug detection and the seamless transition from code creation to deployment.
DevOps teams frequently adopt a Continuous Integration/Continuous Deployment (CI/CD) methodology to automate processes. A robust testing strategy empowers them to confidently deploy new code, backed by assurance that it has passed rigorous unit and performance tests.
The Role of DevOps in Digital Transformation (mohitd6)
DevOps plays a crucial role in driving digital transformation by fostering a collaborative culture between development and operations teams. This approach enhances the speed and efficiency of software delivery, ensuring quicker deployment of new features and updates. DevOps practices like continuous integration and continuous delivery (CI/CD) streamline workflows, reduce manual errors, and increase the overall reliability of software systems. By leveraging automation and monitoring tools, organizations can improve system stability, enhance customer experiences, and maintain a competitive edge. Ultimately, DevOps is pivotal in enabling businesses to innovate rapidly, respond to market changes, and achieve their digital transformation goals.
Ensuring Efficiency and Speed with Practical Solutions for Clinical Operations (OnePlan Solutions)
Clinical operations professionals encounter unique challenges. Balancing regulatory requirements, tight timelines, and the need for cross-functional collaboration can create significant internal pressures. Our upcoming webinar will introduce key strategies and tools to streamline and enhance clinical development processes, helping you overcome these challenges.
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandaries Explored) (Paul Brebner)
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
Implementing Odoo, a robust and all-inclusive business management software, can significantly improve your organisation. To get the most out of it and ensure a smooth implementation, it is important to have a strategic plan. This blog shares essential tips for a successful Odoo ERP implementation, from planning and customisation to training and support. Adopting new software can be challenging, so these tips are tailored to help you avoid common mistakes and achieve the best results. Whether you run a small business or a large enterprise, and whether you are new to Odoo or looking to improve your current setup, they will help you streamline operations, boost productivity, and drive growth. Implementing Odoo doesn't have to be difficult: with the right approach and guidance, you can use this software to elevate your business. Read on to discover the secrets of a successful Odoo implementation.
Why is successful Odoo implementation crucial?
Implementing Odoo effectively can transform your business by making processes smoother, increasing efficiency, and providing useful insights. It helps align your operations with best practices, boosting productivity and aiding better decision-making. A well-executed implementation ensures you get the most out of your investment, while a poor one can cause disruptions, higher costs, and frustration among employees.
Hyperledger Besu Quick-Start Tutorial (Private Networks) (wonyong hwang)
This is a hands-on tutorial on Hyperledger Besu's Private Networks. The main content is excerpted from the official documentation at https://besu.hyperledger.org/private-networks/tutorials and covers both Privacy-Enabled Networks and Permissioned Networks.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
How GenAI Can Improve Supplier Performance Management.pdfZycus
Data Collection and Analysis with GenAI enables organizations to gather, analyze, and visualize vast amounts of supplier data, identifying key performance indicators and trends. Predictive analytics forecast future supplier performance, mitigating risks and seizing opportunities. Supplier segmentation allows for tailored management strategies, optimizing resource allocation. Automated scorecards and reporting provide real-time insights, enhancing transparency and tracking progress. Collaboration is fostered through GenAI-powered platforms, driving continuous improvement. NLP analyzes unstructured feedback, uncovering deeper insights into supplier relationships. Simulation and scenario planning tools anticipate supply chain disruptions, supporting informed decision-making. Integration with existing systems enhances data accuracy and consistency. McKinsey estimates GenAI could deliver $2.6 trillion to $4.4 trillion in economic benefits annually across industries, revolutionizing procurement processes and delivering significant ROI.
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
Orca: Nocode Graphical Editor for Container OrchestrationPedro J. Molina
Tool demo on CEDI/SISTEDES/JISBD2024 at A Coruña, Spain. 2024.06.18
"Orca: Nocode Graphical Editor for Container Orchestration"
by Pedro J. Molina PhD. from Metadev
Software Test Automation - A Comprehensive Guide on Automated Testing.pdfkalichargn70th171
Moving to a more digitally focused era, the importance of software is rapidly increasing. Software tools are crucial for upgrading life standards, enhancing business prospects, and making a smart world. The smooth and fail-proof functioning of the software is very critical, as a large number of people are dependent on them.
These are the slides of the presentation given during the Q2 2024 Virtual VictoriaMetrics Meetup. View the recording here: https://www.youtube.com/watch?v=hzlMA_Ae9_4&t=206s
Topics covered:
1. What is VictoriaLogs
Open source database for logs
● Easy to setup and operate - just a single executable with sane default configs
● Works great with both structured and plaintext logs
● Uses up to 30x less RAM and up to 15x disk space than Elasticsearch
● Provides simple yet powerful query language for logs - LogsQL
2. Improved querying HTTP API
3. Data ingestion via Syslog protocol
* Automatic parsing of Syslog fields
* Supported transports:
○ UDP
○ TCP
○ TCP+TLS
* Gzip and deflate compression support
* Ability to configure distinct TCP and UDP ports with distinct settings
* Automatic log streams with (hostname, app_name, app_id) fields
4. LogsQL improvements
● Filtering shorthands
● week_range and day_range filters
● Limiters
● Log analytics
● Data extraction and transformation
● Additional filtering
● Sorting
5. VictoriaLogs Roadmap
● Accept logs via OpenTelemetry protocol
● VMUI improvements based on HTTP querying API
● Improve Grafana plugin for VictoriaLogs -
https://github.com/VictoriaMetrics/victorialogs-datasource
● Cluster version
○ Try single-node VictoriaLogs - it can replace 30-node Elasticsearch cluster in production
● Transparent historical data migration to object storage
○ Try single-node VictoriaLogs with persistent volumes - it compresses 1TB of production logs from
Kubernetes to 20GB
● See https://docs.victoriametrics.com/victorialogs/roadmap/
Try it out: https://victoriametrics.com/products/victorialogs/
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...kalichargn70th171
Visual testing plays a vital role in ensuring that software products meet the aesthetic requirements specified by clients in functional and non-functional specifications. In today's highly competitive digital landscape, users expect a seamless and visually appealing online experience. Visual testing, also known as automated UI testing or visual regression testing, verifies the accuracy of the visual elements that users interact with.
Streamlining End-to-End Testing Automation with Azure DevOps Build & Release Pipelines
Automating end-to-end (e2e) test for Android and iOS native apps, and web apps, within Azure build and release pipelines, poses several challenges. This session dives into the key challenges and the repeatable solutions implemented across multiple teams at a leading Indian telecom disruptor, renowned for its affordable 4G/5G services, digital platforms, and broadband connectivity.
Challenge #1. Ensuring Test Environment Consistency: Establishing a standardized test execution environment across hundreds of Azure DevOps agents is crucial for achieving dependable testing results. This uniformity must seamlessly span from Build pipelines to various stages of the Release pipeline.
Challenge #2. Coordinated Test Execution Across Environments: Executing distinct subsets of tests using the same automation framework across diverse environments, such as the build pipeline and specific stages of the Release Pipeline, demands flexible and cohesive approaches.
Challenge #3. Testing on Linux-based Azure DevOps Agents: Conducting tests, particularly for web and native apps, on Azure DevOps Linux agents lacking browser or device connectivity presents specific challenges in attaining thorough testing coverage.
This session delves into how these challenges were addressed through:
1. Automate the setup of essential dependencies to ensure a consistent testing environment.
2. Create standardized templates for executing API tests, API workflow tests, and end-to-end tests in the Build pipeline, streamlining the testing process.
3. Implement task groups in Release pipeline stages to facilitate the execution of tests, ensuring consistency and efficiency across deployment phases.
4. Deploy browsers within Docker containers for web application testing, enhancing portability and scalability of testing environments.
5. Leverage diverse device farms dedicated to Android, iOS, and browser testing to cover a wide range of platforms and devices.
6. Integrate AI technology, such as Applitools Visual AI and Ultrafast Grid, to automate test execution and validation, improving accuracy and efficiency.
7. Utilize AI/ML-powered central test automation reporting server through platforms like reportportal.io, providing consolidated and real-time insights into test performance and issues.
These solutions not only facilitate comprehensive testing across platforms but also promote the principles of shift-left testing, enabling early feedback, implementing quality gates, and ensuring repeatability. By adopting these techniques, teams can effectively automate and execute tests, accelerating software delivery while upholding high-quality standards across Android, iOS, and web applications.
2. Pandas
Cheat Sheet
Pandas provides data analysis tools for Python. All of the
following code examples refer to the dataframe below.
Create a series:
s = pd.Series([1, 2, 3],
index=['A', 'B', 'C'],
name='col1')
Create a dataframe:
data = [[1, 4], [2, 5], [3, 6]]
index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
columns=['col1', 'col2'])
Load a dataframe:
df = pd.read_csv('filename.csv', sep=',',
names=['col1', 'col2'],
index_col=0,
encoding='utf-8',
nrows=3)
Selecting rows and columns
Select single column:
df['col1']
Select multiple columns:
df[['col1', 'col2']]
Show first n rows:
df.head(2)
Show last n rows:
df.tail(2)
Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]
Select rows by position:
df.iloc[1]
df.iloc[1:]
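Putting the selection snippets together, a minimal runnable sketch using the reference dataframe (label-based .loc vs. position-based .iloc):

```python
import pandas as pd

# The reference dataframe used throughout this cheat sheet
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])

print(df['col1'].tolist())           # single column -> [1, 2, 3]
print(df.loc['A', 'col2'])           # label-based lookup -> 4
print(df.iloc[1:]['col1'].tolist())  # position-based slice -> [2, 3]
```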
Data wrangling
Filter by value:
df[df['col1'] > 1]
Sort by columns:
df.sort_values(['col1', 'col2'],
ascending=[False, True])
Identify duplicate rows:
df.duplicated()
Unique values in a column:
df['col1'].unique()
Swap rows and columns:
df = df.transpose()
df = df.T
Drop a column:
df = df.drop('col1', axis=1)
Clone a data frame:
clone = df.copy()
Connect multiple data frames vertically:
df2 = df + 5 #new dataframe
pd.concat([df,df2])
Merge multiple data frames horizontally:
df3 = pd.DataFrame([[1, 7],[8,9]],
index=['B', 'D'],
columns=['col1', 'col3'])
#df3: new dataframe
Only merge complete rows (INNER JOIN):
df.merge(df3)
Left column stays complete (LEFT OUTER JOIN):
df.merge(df3, how='left')
Right column stays complete (RIGHT OUTER JOIN):
df.merge(df3, how='right')
Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')
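A quick sketch of how the join types differ, using the reference dataframe and df3 as defined above (merge joins on the shared column col1 by default):

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'], columns=['col1', 'col2'])
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'], columns=['col1', 'col3'])

inner = df.merge(df3)               # only col1 values present in both -> 1 row
left = df.merge(df3, how='left')    # all rows of df kept -> 3 rows
outer = df.merge(df3, how='outer')  # all col1 values from both -> 4 rows
print(len(inner), len(left), len(outer))  # 1 3 4
```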
Merge rows by index:
df.merge(df3,left_index=True,
right_index=True)
Fill NaN values:
df.fillna(0)
Apply your own function:
def func(x):
return 2**x
df.apply(func)
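Applied to the reference dataframe, the function above acts element-wise, since 2**x broadcasts over each column Series:

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'], columns=['col1', 'col2'])

def func(x):
    return 2 ** x

powers = df.apply(func)             # func receives one column at a time
print(powers['col1'].tolist())      # [2, 4, 8]
print(powers['col2'].tolist())      # [16, 32, 64]
```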
Arithmetics and statistics
Add to all values:
df + 10
Sum over columns:
df.sum()
Cumulative sum over columns:
df.cumsum()
Mean over columns:
df.mean()
Standard deviation over columns:
df.std()
Count unique values:
df['col1'].value_counts()
Summarize descriptive statistics:
df.describe()
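The arithmetic and statistics calls above, run on the reference dataframe as a sanity check:

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'], columns=['col1', 'col2'])

shifted = df + 10        # add to all values
sums = df.sum()          # column sums: col1 -> 6, col2 -> 15
means = df.mean()        # column means: col1 -> 2.0, col2 -> 5.0
running = df.cumsum()    # cumulative sums down each column

print(shifted['col1'].tolist())  # [11, 12, 13]
print(sums['col2'])              # 15
print(running['col1'].tolist())  # [1, 3, 6]
```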
Getting Started
Import pandas:
import pandas as pd
The reference dataframe df (axis 0 runs down the rows, axis 1 across the columns):
   col1  col2
A     1     4
B     2     5
C     3     6
3. Hierarchical indexing
Create hierarchical index:
df.stack()
Dissolve hierarchical index:
df.unstack()
Aggregation
Create group object:
g = df.groupby('col1')
Iterate over groups:
for i, group in g:
print(i, group)
Aggregate groups:
g.sum()
g.prod()
g.mean()
g.std()
g.describe()
Select columns from groups:
g['col2'].sum()
g[['col2', 'col3']].sum()
Transform values:
import numpy as np
g.transform(np.log)  # np.log is vectorized; math.log would fail on a Series
Apply a function on each group:
def strsum(group):
    return ''.join(str(x) for x in group)
g['col2'].apply(strsum)
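A runnable aggregation sketch; it uses a hypothetical dataframe with repeated keys (the reference dataframe has no duplicate col1 values, so every group would hold a single row):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b'],
                   'col2': [1, 2, 3]})
g = df.groupby('col1')

print(g['col2'].sum().to_dict())   # {'a': 3, 'b': 3}

def strsum(group):
    # group is the Series of col2 values for one key
    return ''.join(str(x) for x in group)

print(g['col2'].apply(strsum).to_dict())  # {'a': '12', 'b': '3'}
```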
Box-and-whisker plot:
df.plot.box()
Histogram over one column:
df['col1'].plot.hist(bins=3)
Histogram over all columns:
df.plot.hist(bins=3, alpha=0.5)
Set tick marks:
labels = ['A', 'B', 'C', 'D']
positions = [1, 2, 3, 4]
plt.xticks(positions, labels)
plt.yticks(positions, labels)
Select area to plot:
plt.axis([0, 2.5, 0, 10]) # [from x, to x, from y, to y]
Label diagram and axes:
plt.title('Correlation')
plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer')
Save most recent diagram:
plt.savefig('plot.png')
plt.savefig('plot.png',dpi=300)
plt.savefig('plot.svg')
Data export
Data as NumPy array:
df.values
Save data as CSV file:
df.to_csv('output.csv', sep=",")
Format a dataframe as tabular string:
df.to_string()
Convert a dataframe to a dictionary:
df.to_dict()
Save a dataframe as an Excel table:
df.to_excel('output.xlsx')
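A small round-trip sketch of the export calls; it writes CSV to an in-memory buffer (io.StringIO) instead of a file on disk, which is an illustration choice, not a pandas requirement:

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'], columns=['col1', 'col2'])

# to_dict() returns {column -> {index -> value}}
d = df.to_dict()
print(d['col1'])  # {'A': 1, 'B': 2, 'C': 3}

# CSV round trip through a text buffer
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0)
print(df2.equals(df))  # True
```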
Visualization
Import matplotlib:
import matplotlib.pyplot as plt
Start a new diagram:
plt.figure()
Scatter plot:
df.plot.scatter(x='col1', y='col2',
color='r')
Bar plot:
df.plot.bar(x='col1', y='col2',
width=0.7)
Area plot:
df.plot.area(stacked=True,
alpha=1.0)
Find practical examples in these
guides I made:
- Pandas Guide for Excel Users(link)
- Data Wrangling Guide (link)
- Regular Expression Guide (link)
Made by Frank Andrade frank-andrade.medium.com
4. NumPy
Cheat Sheet
Getting Started
Import numpy:
import numpy as np
Create arrays:
a = np.array([1,2,3])
b = np.array([(1.5,2,3), (4,5,6)], dtype=float)
c = np.array([[(1.5,2,3), (4,5,6)],
[(3,2,1), (4,5,6)]],
dtype = float)
Initial placeholders:
np.zeros((3,4)) #Create an array of zeros
np.ones((2,3,4),dtype=np.int16)
d = np.arange(10,25,5)
np.linspace(0, 2, 9)
e = np.full((2,2), 7)
f = np.eye(2)
np.random.random((2,2))
np.empty((3,2))
Saving & Loading On Disk:
np.save('my_array', a)
np.savez('array.npz', a, b)
np.load('my_array.npy')
Saving & Loading Text Files
np.loadtxt('my_file.txt')
np.genfromtxt('my_file.csv',
delimiter=',')
np.savetxt('myarray.txt', a,
delimiter= ' ')
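The text-file round trip above, sketched with a temporary file so nothing is left on disk (the 'myarray.txt' path in the snippet is just a placeholder name):

```python
import os
import tempfile
import numpy as np

a = np.array([1, 2, 3])

path = os.path.join(tempfile.mkdtemp(), 'myarray.txt')
np.savetxt(path, a, delimiter=' ')   # writes values as text
loaded = np.loadtxt(path)            # reads them back as floats
print(np.allclose(a, loaded))        # True
```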
Inspecting Your Array
a.shape
len(a)
b.ndim
e.size
b.dtype #data type
b.dtype.name
b.astype(int) #change data type
Data Types
np.int64
np.float32
np.complex128
np.bool_
np.object_
np.bytes_
np.str_
Array Mathematics
Arithmetic Operations
>>> g = a-b
array([[-0.5,  0. ,  0. ],
       [-3. , -3. , -3. ]])
>>> np.subtract(a,b)
>>> b+a
array([[2.5, 4. , 6. ],
       [5. , 7. , 9. ]])
>>> np.add(b,a)
>>> a/b
array([[0.66666667, 1.        , 1.        ],
       [0.25      , 0.4       , 0.5       ]])
>>> np.divide(a,b)
>>> a*b
array([[ 1.5,  4. ,  9. ],
       [ 4. , 10. , 18. ]])
>>> np.multiply(a,b)
>>> np.exp(b)
>>> np.sqrt(b)
>>> np.sin(a)
>>> np.log(a)
>>> e.dot(f)
Aggregate functions:
a.sum()
a.min()
b.max(axis=0)
b.cumsum(axis=1) #Cumulative sum
a.mean()
np.median(b)
np.corrcoef(a) #Correlation coefficient
np.std(b) #Standard deviation
Copying arrays:
h = a.view() #Create a view
np.copy(a)
h = a.copy() #Create a deep copy
Sorting arrays:
a.sort() #Sort an array
c.sort(axis=0)
Array Manipulation
Transposing Array:
i = np.transpose(b)
i.T
Changing Array Shape:
b.ravel()
g.reshape(3,-2)
Adding/removing elements:
h.resize((2,6))
np.append(h,g)
np.insert(a, 1, 5)
np.delete(a,[1])
Combining arrays:
np.concatenate((a,d),axis=0)
np.vstack((a,b)) #stack vertically
np.hstack((e,f)) #stack horizontally
Splitting arrays:
np.hsplit(a,3) #Split horizontally
np.vsplit(c,2) #Split vertically
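A quick shape check on the manipulation calls above, using the reference arrays a, b, e, f as defined earlier:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
e = np.full((2, 2), 7)
f = np.eye(2)

print(np.transpose(b).shape)    # (3, 2) - rows and columns swapped
print(b.ravel().shape)          # (6,)   - flattened
print(np.vstack((a, b)).shape)  # (3, 3) - a stacked on top of b
print(np.hstack((e, f)).shape)  # (2, 4) - e and f side by side
```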
Subsetting
b[1,2]
Slicing:
a[0:2]
Boolean Indexing:
a[a<2]
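The subsetting, slicing, and boolean-indexing snippets above, run on the reference arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

print(b[1, 2])            # 6.0  (row 1, column 2)
print(a[0:2].tolist())    # [1, 2]
print(a[a < 2].tolist())  # [1]  (boolean mask keeps values < 2)
```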
NumPy provides tools for working with arrays. All of the
following code examples refer to the arrays below.
NumPy Arrays
1D array a: [1 2 3]
2D array b (axis 0 runs down the rows, axis 1 across the columns):
[[1.5 2.  3. ]
 [4.  5.  6. ]]
5. Scikit-Learn
Cheat Sheet
Sklearn is a free machine learning library for Python. It features various
classification, regression and clustering algorithms.
Getting Started
The code below demonstrates the basic steps of using sklearn to create and run a model
on a set of data.
The steps in the code include loading the data, splitting it into train and test sets, scaling
the sets, creating the model, fitting the model on the training data, using the trained model
to make predictions on the test set, and finally evaluating the performance of the model.
from sklearn import neighbors,datasets,preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = datasets.load_iris()
X,y = iris.data[:,:2], iris.target
X_train, X_test, y_train, y_test=train_test_split(X,y)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
knn = neighbors.KNeighborsClassifier(n_neighbors = 5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)
Loading the Data
The data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices
(numeric arrays such as pandas DataFrames are also fine).
>>> import numpy as np
>>> X = np.random.random((10,5))
array([[0.21,0.33],
[0.23, 0.60],
[0.48, 0.62]])
>>> y = np.array(['A','B','A'])
array(['A', 'B', 'A'])
Training and Test Data
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,
random_state = 0)#Splits data into training and test set
Preprocessing The Data
Standardization
Standardizes the features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
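As a sanity check on what StandardScaler computes, a minimal NumPy sketch of the same standardization (subtract each column's mean, divide by its standard deviation); the training matrix here is hypothetical, and this is an illustration rather than scikit-learn's implementation:

```python
import numpy as np

# Hypothetical training matrix: 3 samples, 2 features
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

mean = X_train.mean(axis=0)   # per-feature mean
std = X_train.std(axis=0)     # per-feature standard deviation
standardized = (X_train - mean) / std

print(standardized.mean(axis=0))  # ~[0. 0.]
print(standardized.std(axis=0))   # ~[1. 1.]
```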
Normalization
Each sample (row of the data matrix) with at least one non-zero component is
rescaled independently of other samples so that its norm equals one.
from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
normalized_X = scaler.transform(X_train)
normalized_X_test = scaler.transform(X_test)
Binarization
Binarize data (set feature values to 0 or 1) according to a threshold.
from sklearn.preprocessing import Binarizer
binarizer = Binarizer(threshold = 0.0).fit(X)
binary_X = binarizer.transform(X_test)
Encoding Categorical Features
Encode target labels with values between 0 and n_classes-1.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit_transform(X_train)
Imputing Missing Values
Imputation transformer for completing missing values.
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=0, strategy='mean')
imp.fit_transform(X_train)
Generating Polynomial Features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(5)
poly.fit_transform(X)
Find practical examples in these
guides I made:
- Scikit-Learn Guide (link)
- Tokenize text with Python (link)
- Predicting Football Games (link)
6. Create Your Model
Supervised Learning Models
Linear Regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()  # scale features beforehand; 'normalize' was removed in recent scikit-learn
Support Vector Machines (SVM)
from sklearn.svm import SVC
svc = SVC(kernel = 'linear')
Naive Bayes
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
KNN
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors = 5)
Unsupervised Learning Models
Principal Component Analysis (PCA)
from sklearn.decomposition import PCA
pca = PCA(n_components = 0.95)
K means
from sklearn.cluster import KMeans
k_means = KMeans(n_clusters = 3, random_state = 0)
Model Fitting
Fitting supervised and unsupervised learning models onto data.
Supervised Learning
lr.fit(X, y) #Fit the model to the data
knn.fit(X_train,y_train)
svc.fit(X_train,y_train)
Unsupervised Learning
k_means.fit(X_train) #Fit the model to the data
pca_model = pca.fit_transform(X_train)#Fit to data,then transform
Prediction
Predict Labels
y_pred = lr.predict(X_test) #Supervised Estimators
y_pred = k_means.predict(X_test) #Unsupervised Estimators
Estimate probability of a label
y_pred = knn.predict_proba(X_test)
Evaluate Your Model’s Performance
Classification Metrics
Accuracy Score
knn.score(X_test,y_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
Confusion Matrix
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test,y_pred))
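To make the classification metrics concrete, a small NumPy sketch that computes accuracy and a confusion matrix by hand on hypothetical labels (an illustration of what the sklearn functions report, not their implementation):

```python
import numpy as np

# Hypothetical true and predicted labels
y_test = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Accuracy = fraction of matching labels
accuracy = np.mean(y_test == y_pred)
print(accuracy)  # 0.8

# 2x2 confusion matrix: rows = true class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_test, y_pred):
    cm[t, p] += 1
print(cm)  # [[2 0]
           #  [1 2]]
```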
Regression Metrics
Mean Absolute Error
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test,y_pred)
Mean Squared Error
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,y_pred)
R² Score
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
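The regression metrics above, computed by hand with NumPy on hypothetical values as a sanity check (MAE is the mean absolute error, MSE the mean squared error, and R² is 1 minus residual over total sum of squares):

```python
import numpy as np

# Hypothetical true values and predictions
y_test = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

mae = np.mean(np.abs(y_test - y_pred))   # (1 + 0 + 2) / 3 = 1.0
mse = np.mean((y_test - y_pred) ** 2)    # (1 + 0 + 4) / 3 = 1.6667
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                 # 1 - 5/8 = 0.375

print(mae, mse, r2)
```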
Clustering Metrics
Adjusted Rand Index
from sklearn.metrics import adjusted_rand_score
adjusted_rand_score(y_test,y_pred)
Homogeneity
from sklearn.metrics import homogeneity_score
homogeneity_score(y_test,y_pred)
V-measure
from sklearn.metrics import v_measure_score
v_measure_score(y_test,y_pred)
Tune Your Model
Grid Search
from sklearn.model_selection import GridSearchCV
params = {'n_neighbors':np.arange(1,3),
'metric':['euclidean','cityblock']}
grid = GridSearchCV(estimator = knn, param_grid = params)
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_estimator_.n_neighbors)
7. Matplotlib
Workflow
The basic steps to creating plots with matplotlib are: Prepare Data, Plot, Customize Plot, Save Plot, and Show Plot.
import matplotlib.pyplot as plt
Example with lineplot
Prepare data
x = [2017, 2018, 2019, 2020, 2021]
y = [43, 45, 47, 48, 50]
Plot & Customize Plot
plt.plot(x,y,marker='o',linestyle='--',
color='g', label='USA')
plt.xlabel('Years')
plt.ylabel('Population (M)')
plt.title('Years vs Population')
plt.legend(loc='lower right')
plt.yticks([41, 45, 48, 51])
Save Plot
plt.savefig('example.png')
Show Plot
plt.show()
Markers: '.', 'o', 'v', '<', '>'
Line Styles: '-', '--', '-.', ':'
Colors: 'b', 'g', 'r', 'y' #blue, green, red, yellow
Barplot
x = ['USA', 'UK', 'Australia']
y = [40, 50, 33]
plt.bar(x, y)
plt.show()
Piechart
plt.pie(y, labels=x, autopct='%.0f %%')
plt.show()
Histogram
ages = [15, 16, 17, 30, 31, 32, 35]
bins = [15, 20, 25, 30, 35]
plt.hist(ages, bins, edgecolor='black')
plt.show()
Boxplots
ages = [15, 16, 17, 30, 31, 32, 35]
plt.boxplot(ages)
plt.show()
Scatterplot
a = [1, 2, 3, 4, 5, 4, 3, 2, 5, 6, 7]
b = [7, 2, 3, 5, 5, 7, 3, 2, 6, 3, 2]
plt.scatter(a, b)
plt.show()
Subplots
Add the code below to make multiple plots with 'n'
number of rows and columns.
fig, ax = plt.subplots(nrows=1,
ncols=2,
sharey=True,
figsize=(12, 4))
Plot & Customize Each Graph
ax[0].plot(x, y, color='g')
ax[0].legend()
ax[1].plot(a, b, color='r')
ax[1].legend()
plt.show()
Data Viz
Cheat Sheet
Seaborn
Workflow
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
Lineplot
plt.figure(figsize=(10, 5))
flights = sns.load_dataset("flights")
may_flights=flights.query("month=='May'")
ax = sns.lineplot(data=may_flights,
x="year",
y="passengers")
ax.set(xlabel='x', ylabel='y',
title='my_title', xticks=[1,2,3])
ax.legend(title='my_legend',
title_fontsize=13)
plt.show()
Barplot
tips = sns.load_dataset("tips")
ax = sns.barplot(x="day",
y="total_bill",
data=tips)
Histogram
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins,
x="flipper_length_mm")
Boxplot
tips = sns.load_dataset("tips")
ax = sns.boxplot(x=tips["total_bill"])
Scatterplot
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips,
x="total_bill",
y="tip")
Figure aesthetics
sns.set_style('darkgrid') #styles
sns.set_palette('husl', 3) #palettes
sns.color_palette('husl') #colors
Fontsize of the axes title, x and y labels, tick labels
and legend:
plt.rc('axes', titlesize=18)
plt.rc('axes', labelsize=14)
plt.rc('xtick', labelsize=13)
plt.rc('ytick', labelsize=13)
plt.rc('legend', fontsize=13)
plt.rc('font', size=13)
Matplotlib is a Python 2D plotting library that produces
figures in a variety of formats.
Find practical examples in these
guides I made:
- Matplotlib & Seaborn Guide (link)
- Wordclouds Guide (link)
- Comparing Data Viz libraries(link)
8. HTML for Web Scraping
Let's take a look at the HTML element syntax.
This is a single HTML element, but the HTML code behind a
website has hundreds of them.
The HTML code is structured with "nodes". Each item in the
node tree below is an element, attribute, or text node.
XPath
We need to learn XPath to scrape with Selenium or
Scrapy.
//article[@class="main-article"]
//h1
//div[@class="full-script"]
XPath Special Characters
“Siblings” are nodes with the same parent.
A node’s children and its children’s children are
called its “descendants”. Similarly, a node’s parent
and its parent’s parent are called its “ancestors”.
It's recommended to locate elements in this order:
a. ID
b. Class name
c. Tag name
d. Xpath
Beautiful Soup
Workflow
Importing the libraries
from bs4 import BeautifulSoup
import requests
Fetch the pages
result = requests.get("https://www.google.com")
result.status_code #get status code
result.headers #get the headers
Page content
content = result.text
Create soup
soup = BeautifulSoup(content,"lxml")
HTML in a readable format
print(soup.prettify())
Find an element
soup.find(id="specific_id")
Find elements
soup.find_all("a")
soup.find_all("a","css_class")
soup.find_all("a",class_="my_class")
soup.find_all("a",attrs={"class":
"my_class"})
Get inner text
sample = element.get_text()
sample = element.get_text(strip=True,
separator= ' ')
Get specific attributes
sample = element.get('href')
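To see what finding elements and reading attributes boils down to, a stdlib-only sketch that collects every href from <a> tags with html.parser (Beautiful Soup does far more; this is just an illustration of the idea):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, roughly what soup.find_all('a') enables."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

html = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>'
parser = LinkCollector()
parser.feed(html)
print(parser.links)  # ['/one', '/two']
```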
Web Scraping
Cheat Sheet
Web Scraping is the process of extracting data from a
website. Before studying Beautiful Soup and Selenium, it's
good to review some HTML basics first.
Anatomy of an HTML element: a start tag with the tag name,
an attribute (name and value), the affected content, and an end tag.
<h1 class="title"> Titanic (1997) </h1>
HTML code example
<article class="main-article">
<h1> Titanic (1997) </h1>
<p class="plot"> 84 years later ... </p>
<div class="full-script"> 13 meters. You ... </div>
</article>
Node tree of the example (element, attribute and text nodes):
- Root element <article> with attribute class="main-article"
  - Element <h1> with text "Titanic (1997)"
  - Element <p> with attribute class="plot" and text "84 years later ..."
  - Element <div> with attribute class="full-script" and text "13 meters. You ..."
<article> is the parent node; <h1>, <p> and <div> are siblings.
XPath Syntax
An XPath usually contains a tag name, attribute
name, and attribute value.
//tagName[@AttributeName="Value"]
Let’s check some examples to locate the article,
title, and transcript elements of the HTML code we
used before.
XPath Functions and Operators
XPath functions
//tag[contains(@AttributeName, "Value")]
XPath Operators: and, or
//tag[(expression 1) and (expression 2)]
/    Selects the children from the node set on the
     left side of this character
//   Specifies that the matching node set should
     be located at any level within the document
.    Specifies the current context should be used
     (refers to present node)
..   Refers to a parent node
*    A wildcard character that selects all
     elements or attributes regardless of names
@    Selects an attribute
()   Groups an XPath expression
[n]  Indicates that a node with index "n" should
     be selected
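To try XPath-style expressions without Selenium or Scrapy, Python's stdlib xml.etree.ElementTree supports a limited XPath subset (note it needs a leading ".//" rather than "//"); a sketch against the earlier example markup:

```python
import xml.etree.ElementTree as ET

# The example markup from earlier, as well-formed XML
html = '''<article class="main-article">
<h1>Titanic (1997)</h1>
<p class="plot">84 years later ...</p>
<div class="full-script">13 meters. You ...</div>
</article>'''

root = ET.fromstring(html)
print(root.find('.//h1').text)                         # Titanic (1997)
print(root.find('.//div[@class="full-script"]').text)  # 13 meters. You ...
```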
9. Find practical examples in these guides I
made:
- Web Scraping Complete Guide (link)
- Web Scraping with Selenium (link)
- Web Scraping with Beautiful Soup (link)
Selenium
Workflow
from selenium import webdriver
web = "https://www.google.com"
path='introduce chromedriver path'
driver = webdriver.Chrome(path)
driver.get(web)
Find an element
driver.find_element_by_id('name')
Find elements
driver.find_elements_by_class_name()
driver.find_elements_by_css_selector()
driver.find_elements_by_xpath()
driver.find_elements_by_tag_name()
driver.find_elements_by_name()
Quit driver
driver.quit()
Getting the text
data = element.text
Implicit Waits
driver.implicitly_wait(2) # applies to every element lookup
Hard wait:
import time
time.sleep(2)
Explicit Waits
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID,
'id_name'))) #Wait 5 seconds until an element is clickable
Options: Headless mode, change window size
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
options.add_argument('window-size=1920x1080')
driver=webdriver.Chrome(path,options=options)
Scrapy
Scrapy is the most powerful web scraping framework in Python, but it's a bit
complicated to set up, so check my guide or its documentation to set it up.
Creating a Project and Spider
To create a new project, run the following command in the terminal.
scrapy startproject my_first_spider
To create a new spider, first change the directory.
cd my_first_spider
Create a spider
scrapy genspider example example.com
The Basic Template
When you create a spider, you obtain a template with the following content.
import scrapy
class ExampleSpider(scrapy.Spider):
name = 'example'
allowed_domains = ['example.com']
start_urls = ['http://example.com/']
def parse(self, response):
pass
Class
Parse method
The class is built with the data we introduced in the previous command, but the
parse method needs to be built by us. To build it, use the functions below.
Finding elements
To find elements in Scrapy, use the response argument from the parse method
response.xpath('//tag[@AttributeName="Value"]')
Getting the text
To obtain the text element we use text() and either .get() or .getall(). For example:
response.xpath('//h1/text()').get()
response.xpath('//tag[@Attribute="Value"]/text()').getall()
Return data extracted
To see the data extracted we have to use the yield keyword
def parse(self, response):
    title = response.xpath('//h1/text()').get()
    # Return data extracted
    yield {'titles': title}
Run the spider and export data to CSV or JSON
scrapy crawl example
scrapy crawl example -o name_of_file.csv
scrapy crawl example -o name_of_file.json