PRESENTATION: PYTHON TOPIC: DEVELOPMENT OF SCRAPPING API
TALA KHOURY, Computer and Communication Engineering Student, Notre Dame University,
Lebanon. INTERN WITH AIIHC INTERNATIONAL LTD. AUGUST-NOVEMBER 2023.
SUPERVISOR: RAMZI EL FEGHALI
DESCRIPTION WITH EXAMPLES:
BASICS:
Python has a clean and easy-to-understand syntax. Statements are terminated with line breaks. The
print() function is used to display output.
>>> print("Hello World")
Hello World
>>>
Variables store data values. Python has various data types, including integers, floats, strings, and
more.
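As a small illustration of the data types mentioned above, the sketch below binds a few values and inspects their types with type(). The variable names and values are made up for the example.

```python
# Each value has a type; type() reports it.
count = 10          # int
price = 3.5         # float
name = "Alice"      # str
is_ready = True     # bool

for value in (count, price, name, is_ready):
    print(value, type(value).__name__)

# A variable is just a name: it can be rebound to a value of another type.
count = "ten"
print(type(count).__name__)
```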
OPERATORS:
• Arithmetic operators (+, -, *, /, %): These operators perform basic arithmetic: addition,
subtraction, multiplication, division, and modulus (remainder).
>>> i = 10
>>> j = 3
>>> print('sum:', i + j)
sum: 13
>>> print('subtraction:', i - j)
subtraction: 7
>>> print('multiplication:', i * j)
multiplication: 30
>>> print('division:', i / j)
division: 3.3333333333333335
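The text mentions modulus, so here is a short sketch of it next to two related operators, floor division (//) and exponentiation (**), reusing the same values of i and j. The result variable names are chosen only for this example.

```python
i = 10
j = 3

remainder = i % j    # modulus: what is left after dividing 10 by 3
quotient = i // j    # floor division: the whole-number part of 10 / 3
power = i ** j       # exponentiation: 10 to the power of 3

print('modulus:', remainder)
print('floor division:', quotient)
print('power:', power)
```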
• Assignment operators (+=, -=, *=, /=): Augmented assignment operators apply an operation to a
variable and store the result back in it; a += b is equivalent to a = a + b.
>>> a=10
>>> b=2
>>> a += b
>>> print(a)
12
>>> a-=b
>>> print(a)
10
>>> a*=b
>>> print(a)
20
>>> a/=b
>>> print(a)
10.0
• Comparison operators (==, >, <, <=, >=, !=): Comparison operators compare two
values and return a Boolean result (True or False).
>>> p = 9
>>> k = 8
>>> print('p == k:', p == k)
p == k: False
>>> print('p > k:', p > k)
p > k: True
>>> print('p < k:', p < k)
p < k: False
>>> print('p <= k:', p <= k)
p <= k: False
>>> print('p >= k:', p >= k)
p >= k: True
>>> print('p != k:', p != k)
p != k: True
• Logical operators (and, or, not): Logical operators combine and negate
Boolean values.
>>> s=9
>>> r=8
>>> print (s>8 and r>7)
True
>>> print (s>8 or r>7)
True
>>> print( not s<8)
True
• Identity operators (is, is not): Identity operators test whether two names refer to the
same object in memory, not whether two objects hold equal values.
>>> q= 4
>>> e= 3
>>> x= 'HI'
>>> y= 'Hi'
>>> r= [1,2,3]
>>> t= [1,2,3]
>>> print (q is not e)
True
>>> print (x is y)
False
>>> print (r is t)
False
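The difference between identity and equality is easiest to see with two lists that hold the same values. The sketch below reuses r and t from the example and adds a third name, u, which is an assumption introduced here to show aliasing.

```python
r = [1, 2, 3]
t = [1, 2, 3]
u = r                    # u is a second name for the same list object as r

same_value = (r == t)    # equality compares contents
same_object = (r is t)   # identity compares the objects themselves
alias = (u is r)

print(same_value, same_object, alias)

# Because u and r are the same object, a change made through u is visible through r.
u.append(4)
print(r)
```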
• Membership operators (in, not in): Membership operators test whether a value is
present in a container such as a string, list, tuple, or dictionary. For a dictionary, the keys are tested.
>>> x = 'HELLO'
>>> G = {1: "A", 2: "B"}
>>> print("H" in x)
True
>>> print("HELLO" not in x)
False
>>> print(1 in G)
True
VARIABLES:
• Global variables: Variables defined outside of any function, at the top level of the
program, have global scope. They can be read from anywhere in the program, both
inside and outside functions. To rebind a global variable inside a function, declare it with the
global keyword.
>>> def f():
...     global p
...     print(p)
...     p = "hello"
...
>>> p = "world"
>>> f()
world
>>> print(p)
hello
• Local variables: Variables defined inside a function have local scope. They are only
accessible within that function. When the function returns, its local variables go out of
scope.
>>> def f():
...     s = "hello"
...     print(s)
...
>>> f()
hello
• Instance variables (attributes):
Instance variables belong to individual instances (objects) of a class. They are usually assigned
inside a method, most often __init__, through self, so each instance holds its own values.
Names assigned in the class body outside any method are class variables, which all instances share.
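A minimal sketch of the distinction, assuming a hypothetical Student class with one class variable (school) and two instance variables (name, student_id). None of these names come from the original slides.

```python
class Student:
    school = "NDU"  # class variable: one value shared by every instance

    def __init__(self, name, student_id):
        # Instance variables: each object stores its own copy.
        self.name = name
        self.student_id = student_id


stud1 = Student("Alice", 987)
stud2 = Student("Bob", 654)

# Changing an instance variable affects only that object.
stud1.student_id = 111

print(stud1.name, stud1.student_id, stud1.school)
print(stud2.name, stud2.student_id, stud2.school)
```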
TABLES: A "table" in Python generally refers to a data structure that stores information
in rows and columns, much like a spreadsheet or a database table. A simple way to build one is a
list of dictionaries, where each dictionary represents a row and contains key-value pairs for each
column. The pandas DataFrame is the most common dedicated table type; in the example that
follows it is built from a dictionary that maps each column name to a list of values.
import pandas as pd
data = {'Name': ['John', 'Jane', 'Alice'],
'Age': [25, 30, 22]}
df = pd.DataFrame(data)
avg_age = df['Age'].mean()
print(df)
print("Average Age:", avg_age)
Output:
    Name  Age
0   John   25
1   Jane   30
2  Alice   22
Average Age: 25.666666666666668
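For comparison, the same table can be held as a plain list of dictionaries, one dictionary per row, with no third-party library. This is a sketch using the data from the example above; the names rows and older_than_24 are introduced only for illustration.

```python
# One dictionary per row; the keys act as column names.
rows = [
    {"Name": "John", "Age": 25},
    {"Name": "Jane", "Age": 30},
    {"Name": "Alice", "Age": 22},
]

# Column-wise work becomes a loop or comprehension over the rows.
avg_age = sum(row["Age"] for row in rows) / len(rows)
older_than_24 = [row["Name"] for row in rows if row["Age"] > 24]

for row in rows:
    print(row["Name"], row["Age"])
print("Average Age:", avg_age)
print("Older than 24:", older_than_24)
```

If pandas is available, pd.DataFrame(rows) accepts such a list directly and produces the same columns.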
LOOPS:
Python has for and while loops for iteration.
>>> names = ["justin", "Taylor", "pierre"]
>>> for name in names:
...     print(name)
...
justin
Taylor
pierre
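The for loop above visits each item of a list. A while loop instead repeats for as long as a condition holds, as in this small countdown sketch (the names n and countdown are illustrative).

```python
countdown = []
n = 3

# Repeat until the condition n > 0 becomes false.
while n > 0:
    countdown.append(n)
    n -= 1  # without this update the loop would never end

print(countdown)
```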
FUNCTIONS:
Functions are reusable blocks of code. They are defined using the def keyword and run only when
they are called.
>>> fname = "paul"
>>> lname = "jean"
>>> def my_function(fname, lname):
...     print(fname, lname)
...
>>> my_function(fname, lname)
paul jean
CLASSES: A class is defined using the class keyword, followed by the class name and a colon.
Inside the class, you can define attributes and methods.
>>> class Student:
...     s_id = 20209080    # class attribute, shared by all instances
...
>>> stud1 = Student()
>>> stud2 = Student()
>>> stud1.studid = 987     # instance attribute, set on stud1 only
>>> print(f"Student ID: {stud1.studid}")
Student ID: 987
WIDGETS: Python libraries and frameworks provide widgets for creating interactive applications.
Tkinter is a commonly used built-in library for creating simple GUI applications, while libraries
like PyQt and PyGTK offer more advanced features and a wider range of widgets.
>>> import tkinter as tk
>>> window = tk.Tk()
>>> window.title("My Tkinter Window")
''
>>> window.mainloop()
LIBRARIES:
NumPy:
Description: A library for numerical computations with support for large, multi-dimensional arrays
and matrices.
import numpy as np
array = np.array([1, 2, 3])
print(array)
Pandas:
Description: A powerful library for data manipulation and analysis, providing data structures like
DataFrame for tabular data.
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
Django:
Description: A high-level web framework for building robust and scalable web applications.
from django.http import HttpResponse
def hello(request):
    return HttpResponse("Hello, Django!")
Python_fetching_scrapping:
1. Auto Export Data into Excel from SQL using Python Pyodbc:
This task automates extracting data from an SQL database and exporting it to an Excel file
using Python and the pyodbc library, which connects Python to databases through ODBC drivers.
Python SQL Automation:
Python is a versatile programming language often used for data manipulation, analysis, and
automation, which makes it a good fit for scripting repetitive SQL tasks such as this export.
Task Scheduler:
Task Scheduler is a Windows utility for scheduling and automating tasks on
your computer. Here it can run the Python script automatically at specific times
or intervals.
Example:
import pandas as pd
import pyodbc
# Database connection settings
server = 'your_server_name'
database = 'mydb'
username = 'your_username'
password = 'your_password'
# Create a connection to the SQL Server database
conn = pyodbc.connect(
    f'DRIVER={{SQL Server}};SERVER={server};DATABASE={database};'
    f'UID={username};PWD={password}'
)
# SQL query to fetch data from the database
sql_query = 'SELECT * FROM mytable'
# Use Pandas to read data from SQL into a DataFrame
df = pd.read_sql_query(sql_query, conn)
# Close the database connection
conn.close()
# Export the DataFrame to an Excel file
excel_file = 'output_data.xlsx'
df.to_excel(excel_file, index=False)
print(f'Data exported to {excel_file}')
2. Extract text, links, images, and tables from PDF with Python (PyMuPDF, PyPdf, PdfPlumber):
Text Extraction with PyMuPDF:
PyMuPDF (imported as fitz) is a powerful library for working with PDF files. To extract
text from a PDF using PyMuPDF, open the document and call get_text() on each page:
import fitz  # PyMuPDF
pdf_document = "example.pdf"
# Open the PDF file
pdf = fitz.open(pdf_document)
# Iterate through pages and extract text
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    text = page.get_text()
    print(text)
# Close the PDF file
pdf.close()
Extracting Links with PyMuPDF:
PyMuPDF can be used to extract links from a PDF as well. Links are typically represented
as annotations.
import fitz
pdf_document = "example.pdf"
pdf = fitz.open(pdf_document)
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    links = page.get_links()
    for link in links:
        # Internal links have no "uri" key, so get() returns None for them
        print("Link:", link.get("uri"))
pdf.close()
Image Extraction with PyMuPDF:
You can also extract images from a PDF using PyMuPDF.
import fitz
pdf_document = "example.pdf"
pdf = fitz.open(pdf_document)
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    images = page.get_images(full=True)
    for img_index, img in enumerate(images):
        xref = img[0]  # cross-reference number of the image object
        base_image = pdf.extract_image(xref)
        image_data = base_image["image"]
        # Use the image's real format (png, jpeg, ...) instead of assuming PNG
        extension = base_image["ext"]
        with open(f"image_{page_num}_{img_index}.{extension}", "wb") as f:
            f.write(image_data)
pdf.close()
Table Extraction with PdfPlumber:
PdfPlumber is a library well suited to extracting tables from PDFs. extract_table() returns the
largest table on a page as a list of rows, or None if no table is found.
import pdfplumber
pdf_document = "example.pdf"
with pdfplumber.open(pdf_document) as pdf:
    for page in pdf.pages:
        table = page.extract_table()
        if table:
            for row in table:
                print(row)
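Since an extracted table arrives as a list of rows, a common next step is to turn it into a list of dictionaries keyed by the header row. The sketch below assumes the first row holds the column names; the helper table_to_records and the sample data are hypothetical and stand in for real extract_table() output, so the example runs without a PDF.

```python
def table_to_records(table):
    """Convert [header_row, data_row, ...] into a list of dictionaries."""
    if not table:
        # Covers both None (no table found) and an empty list.
        return []
    header, *data_rows = table
    return [dict(zip(header, row)) for row in data_rows]


# Made-up stand-in for the result of page.extract_table().
sample_table = [
    ["Name", "Age"],
    ["John", "25"],
    ["Jane", "30"],
]

records = table_to_records(sample_table)
for record in records:
    print(record)
```

Cell values stay as strings here; converting them to numbers or dates is left to the step that stores the data.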
3. Fetching all data from a MySQL database using Python
Install the MySQL Connector Library (if not already installed):
pip install mysql-connector-python
Import Required Libraries:
import mysql.connector
Establish a Connection to the MySQL Database:
connection = mysql.connector.connect(
host="your_host_name",
user="your_username",
password="your_password",
database="your_database_name"
)
Create a Cursor Object:
cursor = connection.cursor()
Execute SQL Query to Fetch Data:
query = "SELECT * FROM your_table_name"
cursor.execute(query)
Fetch Data and Process It:
After executing the query, you can fetch the data using methods like fetchall(), fetchone(), or
fetchmany():
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here
    print(row)
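The three fetch methods differ in how many rows they take from the result set. The sketch below shows this with the standard-library sqlite3 module, swapped in for MySQL so that it runs without a database server; the cursor methods follow the same Python DB-API. The table demo and its contents are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT)")
cursor.executemany(
    "INSERT INTO demo (label) VALUES (?)",
    [("a",), ("b",), ("c",), ("d",)],
)

cursor.execute("SELECT id, label FROM demo ORDER BY id")
first_row = cursor.fetchone()    # one row as a tuple
next_two = cursor.fetchmany(2)   # a list with up to 2 further rows
remaining = cursor.fetchall()    # a list with every row not yet fetched
after_end = cursor.fetchone()    # None once the result set is exhausted

print(first_row, next_two, remaining, after_end)

cursor.close()
connection.close()
```

For large tables, fetchmany() in a loop keeps memory use bounded, whereas fetchall() loads every row at once.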
Close the Cursor and Database Connection:
Close the cursor and the database connection when you are done so that server resources are
released. The complete script, with the cleanup at the end:
import mysql.connector
# Establish a connection to the MySQL database
connection = mysql.connector.connect(
host="your_host_name",
user="your_username",
password="your_password",
database="your_database_name"
)
# Create a cursor object
cursor = connection.cursor()
# Execute SQL query to fetch all data from a table
query = "SELECT * FROM your_table_name"
cursor.execute(query)
# Fetch all data and process it
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here (e.g., print it)
    print(row)
# Close the cursor and database connection
cursor.close()
connection.close()
PACKAGES
Definition:
A package is a collection of modules organized into a directory hierarchy. It allows you to create a
structured and organized codebase by grouping related functionality together.
Creating Packages:
To create a package, you need to create a directory (folder) and place a special __init__.py file in it.
This file can be empty or contain initialization code for the package.
Subpackages:
Packages can contain subpackages, forming a hierarchical structure. Subpackages are simply
directories within the main package directory, each with its own __init__.py file.
Using Packages:
You can import modules from a package using dot notation (package.module). Importing from
subpackages follows the same pattern.
Example:
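Suppose a project has a package named mypackage containing the subpackages utils and
models. Based on the imports used here, the directory structure would look like this:
mypackage/
    __init__.py
    utils/
        __init__.py
        math_operations.py
    models/
        __init__.py
        user.py
The modules can then be imported and used as follows:
from mypackage.utils.math_operations import add
from mypackage.models.user import User
result = add(5, 3)
user = User("Alice", 25)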
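To try the package layout without creating files by hand, the sketch below writes mypackage into a temporary directory and then imports from it. The bodies of add and User are assumptions made for this example, since the slides do not show them.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents for the package described above.
files = {
    "mypackage/__init__.py": "",
    "mypackage/utils/__init__.py": "",
    "mypackage/utils/math_operations.py": "def add(a, b):\n    return a + b\n",
    "mypackage/models/__init__.py": "",
    "mypackage/models/user.py": (
        "class User:\n"
        "    def __init__(self, name, age):\n"
        "        self.name = name\n"
        "        self.age = age\n"
    ),
}

# Write the package tree into a fresh temporary directory.
root = Path(tempfile.mkdtemp())
for relative_path, source in files.items():
    path = root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)

# Make the directory importable, then use dot notation as usual.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mypackage.utils.math_operations import add
from mypackage.models.user import User

result = add(5, 3)
user = User("Alice", 25)
print(result, user.name, user.age)
```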
SKETCH (SCHEMA - DRAWING) OF YOUR DEVELOPMENT
Input (Files, Databases) ====> Python Scraping API ====> Output (MySQL database)
PROGRAMS YOU WILL USE TO DEVELOP THE TOOLS OF THE PROJECT
Python IDLE, MySQL, Django, JupyterLite, command line
Class for scrapping:
{
"metadata": {
"language_info": {
"codemirror_mode": "sql",
"file_extension": "",
"mimetype": "",
"name": "sql",
"version": "3.32.3"
},
"kernelspec": {
"name": "SQLite",
"display_name": "SQLite",
"language": "sql"
}
},
"nbformat_minor": 4,
"nbformat": 4,
"cells": [
{
"cell_type": "code",
"source": "from pymysql import connect\nimport pandas as pd",
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": [],
"collapsed": true,
"trusted": true
},
"execution_count": 2,
"outputs": [
{
"ename": "Error",
"evalue": "Please load a database to perform operations",
"traceback": [
"Error: Please load a database to perform operations"
],
"output_type": "error"
}
]
},
{
"cell_type": "code",
"source": "data_base = connect(host='localhost',\n                    user='root',\n                    passwd='')\ncur = data_base.cursor()\n\nquery = 'SHOW DATABASES'\ncur.execute(query)\ndatabases = cur.fetchall()\nfor data in databases:\n    print(data)\n",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "import mysql.connector\nfrom openpyxl import load_workbook\nimport PyPDF2\nimport os",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "mysql_config = {\n    'host': 'localhost',\n    'user': 'root',\n    'password': '',\n    'database': 'aiihcschema'\n}",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def connect_to_mysql():\n    return mysql.connector.connect(**mysql_config)",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def extract_from_database(database_name):\n    pass",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def extract_from_spreadsheet(file_path):\n    pass",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def extract_from_files(self, file_path):\n    pass",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def process_and_store_data(self, data):\n    pass",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "def close_connections(self):\n    if self.db_connection:\n        self.db_connection.close()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "info_extractor = InformationExtraction(mysql_config)\ninfo_extractor.connect_to_mysql()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "info_extractor.extract_from_database('aiihcschema')",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "info_extractor.extract_from_spreadsheet('path/to/your/spreadsheet.xlsx')",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "info_extractor.extract_from_files('path/to/your/file.pdf')\ninfo_extractor.extract_from_files('path/to/your/file.docx')\ninfo_extractor.extract_from_files('path/to/your/file.txt')",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "data_to_store = \ninfo_extractor.process_and_store_data(data_to_store)\n\ninfo_extractor.close_connections()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "mycursor.execute('''CREATE TABLE extracted_data (\n    id INT AUTO_INCREMENT PRIMARY KEY,\n    column1 VARCHAR(255),\n    column2 INT)''')\n",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": "mycursor.execute('show tables')\nfor x in mycursor:\n    print(x)\n",
"metadata": {},
"execution_count": null,
"outputs": []
}
]
}
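The notebook calls an InformationExtraction class that it never defines, and several of its stubs already take self. One possible way to gather those stubs into a class is sketched below; it is an illustration under stated assumptions, not the project's actual implementation. The constructor takes a connection factory (an assumption made here) so the example can run against an in-memory SQLite database; the method is named connect_to_database rather than connect_to_mysql for that reason. Only plain-text files are handled, and each non-empty line is stored as column1 with its length as column2, which is an invented mapping.

```python
import os
import sqlite3
import tempfile


class InformationExtraction:
    """Hypothetical consolidation of the notebook's function stubs."""

    def __init__(self, connect):
        # `connect` is any zero-argument callable returning a DB-API connection.
        self._connect = connect
        self.db_connection = None

    def connect_to_database(self):
        self.db_connection = self._connect()
        cursor = self.db_connection.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS extracted_data ("
            "id INTEGER PRIMARY KEY, column1 VARCHAR(255), column2 INT)"
        )
        cursor.close()

    def extract_from_files(self, file_path):
        # Sketch: plain-text files only, one (text, length) record per non-empty line.
        with open(file_path, encoding="utf-8") as handle:
            lines = [line.strip() for line in handle]
        return [(line, len(line)) for line in lines if line]

    def process_and_store_data(self, data):
        cursor = self.db_connection.cursor()
        # sqlite3 uses "?" placeholders; mysql.connector expects "%s" instead.
        cursor.executemany(
            "INSERT INTO extracted_data (column1, column2) VALUES (?, ?)", data
        )
        self.db_connection.commit()
        cursor.close()

    def close_connections(self):
        if self.db_connection:
            self.db_connection.close()
            self.db_connection = None


# Usage with a temporary text file and an in-memory SQLite database.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first line\n\nsecond\n")

info_extractor = InformationExtraction(lambda: sqlite3.connect(":memory:"))
info_extractor.connect_to_database()
records = info_extractor.extract_from_files(tmp.name)
info_extractor.process_and_store_data(records)
stored = info_extractor.db_connection.execute(
    "SELECT column1, column2 FROM extracted_data ORDER BY id"
).fetchall()
info_extractor.close_connections()
os.remove(tmp.name)
print(stored)
```

To target MySQL, the factory could be lambda: mysql.connector.connect(**mysql_config), with the placeholders changed to %s and the table definition taken from the SQL section of this document.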
SQL code for MySQL database:
CREATE DATABASE aiihcschema;
USE aiihcschema;
CREATE TABLE extracted_data (
id INT AUTO_INCREMENT PRIMARY KEY,
column1 VARCHAR(255),
column2 INT);
INSERT INTO extracted_data (column1, column2) VALUES ('value1', 123);
SELECT * FROM extracted_data;
UPDATE extracted_data SET column1 = 'new_value' WHERE id = 1;
DELETE FROM extracted_data WHERE id = 1;

More Related Content

Similar to Pres_python_talakhoury_26_09_2023.pdf

Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Yashpatel821746
 
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
Yashpatel821746
 
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docxJLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
vrickens
 
Functions2.pptx
Functions2.pptxFunctions2.pptx
Functions2.pptx
AkhilTyagi42
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptx
vishnupriyapm4
 
Python Lecture 4
Python Lecture 4Python Lecture 4
Python Lecture 4
Inzamam Baig
 
Functions.pdf
Functions.pdfFunctions.pdf
Functions.pdf
kailashGusain3
 
Functionscs12 ppt.pdf
Functionscs12 ppt.pdfFunctionscs12 ppt.pdf
Functionscs12 ppt.pdf
RiteshKumarPradhan1
 
cbse class 12 Python Functions2 for class 12 .pptx
cbse class 12 Python Functions2 for class 12 .pptxcbse class 12 Python Functions2 for class 12 .pptx
cbse class 12 Python Functions2 for class 12 .pptx
tcsonline1222
 
Fundamentals of functions in C program.pptx
Fundamentals of functions in C program.pptxFundamentals of functions in C program.pptx
Fundamentals of functions in C program.pptx
Chandrakant Divate
 
Functions
FunctionsFunctions
Functions
PralhadKhanal1
 
Functions_21_22.pdf
Functions_21_22.pdfFunctions_21_22.pdf
Functions_21_22.pdf
paijitk
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdf
simenehanmut
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
Thesis Scientist Private Limited
 
python lab programs.pdf
python lab programs.pdfpython lab programs.pdf
python lab programs.pdf
CBJWorld
 
III MCS python lab (1).pdf
III MCS python lab (1).pdfIII MCS python lab (1).pdf
III MCS python lab (1).pdf
srxerox
 
Python Programming - II. The Basics
Python Programming - II. The BasicsPython Programming - II. The Basics
Python Programming - II. The Basics
Ranel Padon
 
PPt Revision of the basics of python1.pptx
PPt Revision of the basics of python1.pptxPPt Revision of the basics of python1.pptx
PPt Revision of the basics of python1.pptx
tcsonline1222
 
Objects and Graphics
Objects and GraphicsObjects and Graphics
Objects and Graphics
Edwin Flórez Gómez
 

Similar to Pres_python_talakhoury_26_09_2023.pdf (20)

Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
 
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
 
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docxJLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
JLK Chapter 5 – Methods and ModularityDRAFT January 2015 Edition.docx
 
Functions2.pptx
Functions2.pptxFunctions2.pptx
Functions2.pptx
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptx
 
Python Lecture 4
Python Lecture 4Python Lecture 4
Python Lecture 4
 
Functions.pdf
Functions.pdfFunctions.pdf
Functions.pdf
 
Functionscs12 ppt.pdf
Functionscs12 ppt.pdfFunctionscs12 ppt.pdf
Functionscs12 ppt.pdf
 
cbse class 12 Python Functions2 for class 12 .pptx
cbse class 12 Python Functions2 for class 12 .pptxcbse class 12 Python Functions2 for class 12 .pptx
cbse class 12 Python Functions2 for class 12 .pptx
 
Fundamentals of functions in C program.pptx
Fundamentals of functions in C program.pptxFundamentals of functions in C program.pptx
Fundamentals of functions in C program.pptx
 
Functions
FunctionsFunctions
Functions
 
Functions_21_22.pdf
Functions_21_22.pdfFunctions_21_22.pdf
Functions_21_22.pdf
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdf
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
 
python lab programs.pdf
python lab programs.pdfpython lab programs.pdf
python lab programs.pdf
 
III MCS python lab (1).pdf
III MCS python lab (1).pdfIII MCS python lab (1).pdf
III MCS python lab (1).pdf
 
Python Programming - II. The Basics
Python Programming - II. The BasicsPython Programming - II. The Basics
Python Programming - II. The Basics
 
PPt Revision of the basics of python1.pptx
PPt Revision of the basics of python1.pptxPPt Revision of the basics of python1.pptx
PPt Revision of the basics of python1.pptx
 
Chapter04.pptx
Chapter04.pptxChapter04.pptx
Chapter04.pptx
 
Objects and Graphics
Objects and GraphicsObjects and Graphics
Objects and Graphics
 

Recently uploaded

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 

Recently uploaded (20)

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 

Pres_python_talakhoury_26_09_2023.pdf

  • 1. PRESENTATION: PYTHON TOPIC: DEVELOPMENT OF SCRAPPING API TALA KHOURY, Computer and Communication Engineering Student, Notre Dame University, Lebanon. INTERN WITH AIIHC INTERNATIONAL LTD. AUGUST-NOVEMBER 2023. SUPERVISOR: RAMZI EL FEGHALI DESCRIPTION WITH EXAMPLES: BASICS: Python has a clean and easy-to-understand syntax. Statements are terminated with line breaks. The print() function is used to display output. >>> print("Hello World") Hello World >>> Variables store data values. Python has various data types, including integers, floats, strings, and more. OPERATORS:  Arithmetic operators(+,-,*,/): These operators are used for basic arithmetic operations like addition, subtraction, multiplication, division, and modulus (remainder). >>> i= 10 >>> j=3 >>> print('sum: ', i+j) sum: 13 >>> print('subtraction: ', i-j) subtraction: 7 >>> print('multiplication: ', i*j) multiplication: 30 >>> print('division: ', i/j) division: 3.3333333333333335  Assignment operators( +=, -=, *=, /=): Assignment operators are used to assign values to variables. >>> a=10 >>> b=2 >>> a += b >>> print(a) 12 >>> a-=b >>> print(a) 10 >>> a*=b >>> print(a) 20 >>> a/=b >>> print(a) 10.0
  • 2.  Comparison operators(==, >,<,<=,>=, !=) Comparison operators are used to compare two values and return a Boolean result (True or False). >>> p=9 >>> k=8 >>> print ('a==b: ', a==b) a==b: False >>> print ('a>b; ', a>b) a>b; True >>> print ('a<b; ', a<b) a<b; False >>> print ('a<=b: ', a<=b) a<=b: False >>> print ('a>=b: ', a>=b) a>=b: True >>> print ('a!=b: ', a!=b) a!=b: True  Logical operators (and, or, not): Logical operators are used to combine and manipulate Boolean values. >>> s=9 >>> r=8 >>> print (s>8 and r>7) True >>> print (s>8 or r>7) True >>> print( not s<8) True  Identity operators( is, is not): Identity operators are used to compare the memory addresses of two objects >>> q= 4 >>> e= 3 >>> x= 'HI' >>> y= 'Hi' >>> r= [1,2,3] >>> t= [1,2,3] >>> print (q is not e) True >>> print (x is y) False >>> print (r is t) False  Membership operators( in, not in): Membership operators are used to test if a value is present in a sequence (like a string, list, or tuple). >>> x= 'HELLO' >>> G={1:"A", 2:"B"} >>> print ("H" in x) True >>> print ("HELLO" not in x) False
  • 3. VARIABLES:  Global variables: Variables defined outside of any function, at the highest level of the program, have global scope. They can be accessed from anywhere within the program, both inside and outside functions >>> def f(): global p print(p) p = "hello" >>> p ="world" >>> f() world >>>  Local variables: Variables defined inside a function have local scope. They are only accessible within that function. When the function execution completes, the local variables are destroyed. >>> def f(): s =" hello" print (s) >>> f() Hello  Instance Variables (Attributes): Instance variables are specific to instances (objects) of a class. They are defined within the class but outside any method. Each instance of the class has its own copy of instance variables. TABLES: Certainly, a "table" in Python generally refers to a data structure that stores information in rows and columns, much like a spreadsheet or a database table. One of the most commonly used types of tables in Python is a list of dictionaries, where each dictionary represents a row and contains key-value pairs for each column. import pandas as pd data = {'Name': ['John', 'Jane', 'Alice'], 'Age': [25, 30, 22]} df = pd.DataFrame(data) avg_age = df['Age'].mean() print(df) print("Average Age:", avg_age) Output: Name Age 0 John 25 1 Jane 30 2 Alice 22 Average Age: 25.666666666666668 LOOPS : Python has for and while loops for iteration. >>> Names =[ "justin", "Taylor", "pierre"] >>> for Names in Names: print(Names)
  • 4. justin Taylor pierre FUNCTIONS; Functions are reusable blocks of code. They are defined using the def keyword. >>> fname ="paul" >>> lname ="jean" >>> def my_function(fname, lname): print(fname, lname) >>> >>> print(fname+" "+ lname) Paul jean CLASSES: A class is defined using the class keyword, followed by the class name and a colon. Inside the class, you can define attributes and methods. >>> class Student: s_id=20209080 >>> stud1= Student() >>> stud2 = Student() >>> stud1.studid= 987 >>> print(f"Student ID: {stud1.studid}") Student ID: 987 WIDGETS: Python libraries and frameworks provide widgets for creating interactive applications. Tkinter is a commonly used built-in library for creating simple GUI applications, while libraries like PyQt and PyGTK offer more advanced features and a wider range of widgets. >>> import tkinter as tk >>> window = tk.Tk() >>> window.title("My Tkinter Window") '' >>> window.mainloop() LIBRARIES; NumPy: Description: A library for numerical computations with support for large, multi-dimensional arrays and matrices. import numpy as np array = np.array([1, 2, 3]) print(array) Pandas: Description: A powerful library for data manipulation and analysis, providing data structures like DataFrame for tabular data. import pandas as pd data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
  • 5.
df = pd.DataFrame(data)
print(df)
Django:
Description: A high-level web framework for building robust and scalable web applications.
from django.http import HttpResponse

def hello(request):
    return HttpResponse("Hello, Django!")
Python_fetching_scrapping:
1. Auto Export Data into Excel from SQL using Python Pyodbc:
Automate the process of extracting data from an SQL database and exporting it into an Excel file using Python and the Pyodbc library. Pyodbc is used for connecting to SQL databases from Python.
Python SQL Automation: Python is a versatile language often used for data manipulation, analysis, and automation, which makes it a good fit for automating SQL-related tasks such as this export.
Task Scheduler: Task Scheduler is a Windows utility for scheduling and automating tasks on your computer. Here it can be used to run the Python script automatically at specific times or intervals.
Example:
import pandas as pd
import pyodbc

# Database connection settings
server = 'your_server_name'
database = 'mydb'
username = 'your_username'
password = 'your_password'

# Create a connection to the SQL Server database
conn = pyodbc.connect(f'DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}')

# SQL query to fetch data from the database
sql_query = 'SELECT * FROM mytable'

# Use Pandas to read data from SQL into a DataFrame
df = pd.read_sql_query(sql_query, conn)

# Close the database connection
conn.close()

# Export the DataFrame to an Excel file
excel_file = 'output_data.xlsx'
  • 6.
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
df.to_excel(excel_file, index=False)
print(f'Data exported to {excel_file}')
2. Extracting text, links, images, and tables from PDFs with Python (PyMuPDF, PyPdf, PdfPlumber):
Text Extraction with PyMuPDF:
PyMuPDF (also known as Fitz) is a powerful library for working with PDF files. To extract text from a PDF using PyMuPDF, open the file and call get_text() on each page:
import fitz  # PyMuPDF

pdf_document = "example.pdf"

# Open the PDF file
pdf = fitz.open(pdf_document)

# Iterate through pages and extract text
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    text = page.get_text()
    print(text)

# Close the PDF file
pdf.close()
Extracting Links with PyMuPDF:
PyMuPDF can be used to extract links from a PDF as well. Links are typically represented as annotations.
import fitz

pdf_document = "example.pdf"
pdf = fitz.open(pdf_document)
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    links = page.get_links()
    for link in links:
        # "uri" is present for external links; internal links print None
        print("Link:", link.get("uri"))
pdf.close()
Image Extraction with PyMuPDF:
You can also extract images from a PDF using PyMuPDF.
import fitz

pdf_document = "example.pdf"
  • 7.
pdf = fitz.open(pdf_document)
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    images = page.get_images(full=True)
    for img_index, img in enumerate(images):
        xref = img[0]
        base_image = pdf.extract_image(xref)
        image_data = base_image["image"]
        # Save with the image's actual format (for example png or jpeg)
        ext = base_image["ext"]
        with open(f"image_{page_num}_{img_index}.{ext}", "wb") as f:
            f.write(image_data)
pdf.close()
Table Extraction with PdfPlumber:
PdfPlumber is well suited to extracting tables from PDFs. extract_table() returns the page's table as a list of rows, or None when no table is found.
import pdfplumber

pdf_document = "example.pdf"
with pdfplumber.open(pdf_document) as pdf:
    for page in pdf.pages:
        table = page.extract_table()
        if table:
            for row in table:
                print(row)
3. Fetching all data from a database using Python with MySQL:
Install the MySQL Connector Library (if not already installed):
pip install mysql-connector-python
Import Required Libraries:
import mysql.connector
Establish a Connection to the MySQL Database:
connection = mysql.connector.connect(
    host="your_host_name",
    user="your_username",
    password="your_password",
    database="your_database_name"
)
Create a Cursor Object:
cursor = connection.cursor()
Execute SQL Query to Fetch Data:
query = "SELECT * FROM your_table_name"
cursor.execute(query)
Fetch Data and Process It:
  • 8. After executing the query, you can fetch the data using methods like fetchall(), fetchone(), or fetchmany():
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here
    print(row)
Close the Cursor and Database Connection:
It's important to close the cursor and the database connection when you're done:
cursor.close()
connection.close()
Complete example:
import mysql.connector

# Establish a connection to the MySQL database
connection = mysql.connector.connect(
    host="your_host_name",
    user="your_username",
    password="your_password",
    database="your_database_name"
)

# Create a cursor object
cursor = connection.cursor()

# Execute SQL query to fetch all data from a table
query = "SELECT * FROM your_table_name"
cursor.execute(query)

# Fetch all data and process it
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here (e.g., print it)
    print(row)

# Close the cursor and database connection
cursor.close()
connection.close()
PACKAGES
Definition:
A package is a collection of modules organized into a directory hierarchy. It lets you structure a codebase by grouping related functionality together.
Creating Packages:
To create a regular package, create a directory (folder) and place a special __init__.py file in it. This file can be empty or contain initialization code for the package.
Subpackages:
  • 9. Packages can contain subpackages, forming a hierarchical structure. Subpackages are simply directories within the main package directory, each with its own __init__.py file.
Using Packages:
You can import modules from a package using dot notation (package.module). Importing from subpackages follows the same pattern.
Example:
Let's say you have a project with a package named mypackage containing subpackages utils and models. The directory structure implied by the imports below would look like this:
mypackage/
    __init__.py
    utils/
        __init__.py
        math_operations.py
    models/
        __init__.py
        user.py
The function add and the class User are defined in math_operations.py and user.py respectively:
from mypackage.utils.math_operations import add
from mypackage.models.user import User

result = add(5, 3)
user = User("Alice", 25)
SKETCH (SCHEMA - DRAWING) OF YOUR DEVELOPMENT
Input (Files, Databases) ====> Python Scraping API ====> Output (MySQL database)
PROGRAMS YOU WILL USE TO DEVELOP THE TOOLS OF THE PROJECT
Python IDLE, MySQL, Django, JupyterLite, command line
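The input-to-output flow from the sketch on this slide could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the standard-library sqlite3 module stands in for MySQL so the example runs without a server, the input is assumed to be a CSV file with column1 and column2 headers, and the function names extract_rows, store_rows, and run_pipeline are hypothetical. The table layout follows the extracted_data table used elsewhere in this presentation.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_rows(file_path):
    """Input stage: read (column1, column2) pairs from a CSV file."""
    with open(file_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(row["column1"], int(row["column2"])) for row in reader]


def store_rows(connection, rows):
    """Output stage: insert the extracted rows into extracted_data."""
    # SQLite syntax; MySQL would use INT AUTO_INCREMENT and VARCHAR(255)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS extracted_data ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, column1 TEXT, column2 INTEGER)"
    )
    connection.executemany(
        "INSERT INTO extracted_data (column1, column2) VALUES (?, ?)", rows
    )
    connection.commit()


def run_pipeline(file_path, connection):
    """Input -> scraping step -> output; returns the number of rows stored."""
    rows = extract_rows(file_path)
    store_rows(connection, rows)
    return len(rows)


# Demo with a temporary input file and an in-memory database
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("column1,column2\nvalue1,123\nvalue2,456\n", encoding="utf-8")
    conn = sqlite3.connect(":memory:")
    stored = run_pipeline(sample, conn)
    pipeline_rows = conn.execute(
        "SELECT column1, column2 FROM extracted_data ORDER BY id"
    ).fetchall()
    conn.close()

print(stored, pipeline_rows)
```

Keeping extraction and storage in separate functions means a new input type (spreadsheet, PDF, another database) only needs its own extract function that returns rows in the same shape.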
  • 11. Class for scrapping:
The class was written as a Jupyter notebook (nbformat 4.4) in JupyterLite with the SQLite kernel selected. Under that kernel the first cell, the only one that was executed, failed with "Error: Please load a database to perform operations"; the code needs a Python kernel. The notebook's code cells, in order:
from pymysql import connect
import pandas as pd

# List the databases on the local MySQL server
data_base = connect(host='localhost',
                    user='root',
                    passwd='')  # password left blank in the original
cur = data_base.cursor()
query = "SHOW DATABASES"
cur.execute(query)
databases = cur.fetchall()
for data in databases:
    print(data)

import mysql.connector
from openpyxl import load_workbook
import PyPDF2
import os

mysql_config = {
    'host': 'localhost',
    'user': 'root',
    'password': '',
    'database': 'aiihcschema'
}

class InformationExtraction:
    # The class statement and constructor are assumed from the call
    # InformationExtraction(mysql_config); the original cells show only the methods.
    def __init__(self, config):
        self.config = config
        self.db_connection = None

    def connect_to_mysql(self):
        self.db_connection = mysql.connector.connect(**self.config)
        return self.db_connection

    def extract_from_database(self, database_name):
        pass

    def extract_from_spreadsheet(self, file_path):
        pass

    def extract_from_files(self, file_path):
        pass

    def process_and_store_data(self, data):
        pass

    def close_connections(self):
        if self.db_connection:
            self.db_connection.close()

info_extractor = InformationExtraction(mysql_config)
info_extractor.connect_to_mysql()

info_extractor.extract_from_database('aiihcschema')

info_extractor.extract_from_spreadsheet('path/to/your/spreadsheet.xlsx')

info_extractor.extract_from_files('path/to/your/file.pdf')
info_extractor.extract_from_files('path/to/your/file.docx')
info_extractor.extract_from_files('path/to/your/file.txt')

data_to_store = ...  # value left blank in the original
info_extractor.process_and_store_data(data_to_store)

# Create the target table and list the tables
# (the cursor is assumed to come from the open connection)
mycursor = info_extractor.db_connection.cursor()
mycursor.execute('CREATE TABLE extracted_data ('
                 ' id INT AUTO_INCREMENT PRIMARY KEY,'
                 ' column1 VARCHAR(255),'
                 ' column2 INT)')

mycursor.execute('SHOW TABLES')
for x in mycursor:
    print(x)

info_extractor.close_connections()
  • 15. SQL code for MySQL database:
CREATE DATABASE aiihcschema;
USE aiihcschema;
CREATE TABLE extracted_data (
    id INT AUTO_INCREMENT PRIMARY KEY,
    column1 VARCHAR(255),
    column2 INT
);
INSERT INTO extracted_data (column1, column2) VALUES ('value1', 123);
SELECT * FROM extracted_data;
UPDATE extracted_data SET column1 = 'new_value' WHERE id = 1;
DELETE FROM extracted_data WHERE id = 1;
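The SQL statements above can also be issued from Python with query parameters instead of values written into the SQL text. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the MySQL server so that it runs anywhere; the column types are adapted to SQLite, and the variable names after_insert, after_update, and after_delete are only illustrative. With mysql.connector the same pattern applies, but the placeholder is %s rather than ?.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL schema
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE (SQLite spelling of the AUTO_INCREMENT primary key)
cur.execute(
    "CREATE TABLE extracted_data ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, column1 TEXT, column2 INTEGER)"
)

# INSERT: values are passed separately from the SQL text
cur.execute(
    "INSERT INTO extracted_data (column1, column2) VALUES (?, ?)",
    ("value1", 123),
)
conn.commit()
after_insert = cur.execute("SELECT * FROM extracted_data").fetchall()

# UPDATE the row with id 1
cur.execute(
    "UPDATE extracted_data SET column1 = ? WHERE id = ?",
    ("new_value", 1),
)
conn.commit()
after_update = cur.execute("SELECT * FROM extracted_data").fetchall()

# DELETE the row with id 1
cur.execute("DELETE FROM extracted_data WHERE id = ?", (1,))
conn.commit()
after_delete = cur.execute("SELECT * FROM extracted_data").fetchall()

conn.close()

print(after_insert)
print(after_update)
print(after_delete)
```

Passing values as parameters lets the database driver handle quoting, which matters when the stored data comes from scraped files rather than from the programmer.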