Pres_python_talakhoury_26_09_2023.pdf

PRESENTATION: PYTHON
TOPIC: DEVELOPMENT OF A SCRAPING API
TALA KHOURY, Computer and Communication Engineering Student, Notre Dame University,
Lebanon. INTERN WITH AIIHC INTERNATIONAL LTD. AUGUST-NOVEMBER 2023.
SUPERVISOR: RAMZI EL FEGHALI
DESCRIPTION WITH EXAMPLES:
BASICS:
Python has a clean, easy-to-read syntax. Each statement ends at a line break (no semicolons are needed), and code blocks are defined by indentation. The print() function displays output.
>>> print("Hello World")
Hello World
>>>
Variables store data values. Python has several built-in data types, including integers (int), floating-point numbers (float), strings (str), Booleans (bool), and lists (list).
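A minimal sketch of these built-in types; type() reports the type of any value. The variable names and values are illustrative only.

```python
# Each variable holds a value of a different built-in type
count = 42              # int: whole number
price = 19.99           # float: decimal number
name = "Python"         # str: text
is_ready = True         # bool: True or False
scores = [90, 85, 77]   # list: ordered, mutable sequence

for value in (count, price, name, is_ready, scores):
    # type(value).__name__ is the type's name, e.g. "int"
    print(type(value).__name__, value)
```

Python infers the type from the assigned value, so no type declaration is needed.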
OPERATORS:
• Arithmetic operators (+, -, *, /, %): These operators perform basic arithmetic: addition, subtraction, multiplication, division, and modulus (remainder).
>>> i = 10
>>> j = 3
>>> print('sum:', i + j)
sum: 13
>>> print('subtraction:', i - j)
subtraction: 7
>>> print('multiplication:', i * j)
multiplication: 30
>>> print('division:', i / j)
division: 3.3333333333333335
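The modulus operator mentioned above is not shown in the transcript. This short sketch, using the same values i = 10 and j = 3, adds it together with two related operators, floor division (//) and exponentiation (**).

```python
i = 10
j = 3
print('modulus:', i % j)          # remainder of 10 divided by 3
print('floor division:', i // j)  # quotient rounded down to a whole number
print('power:', i ** j)           # 10 raised to the power 3
```

This prints modulus: 1, floor division: 3, and power: 1000. Note that / always returns a float, while // on two integers returns an integer.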
• Assignment operators (=, +=, -=, *=, /=): The = operator assigns a value to a variable, and the augmented forms update it in place; for example, a += b is shorthand for a = a + b.
>>> a=10
>>> b=2
>>> a += b
>>> print(a)
12
>>> a-=b
>>> print(a)
10
>>> a*=b
>>> print(a)
20
>>> a/=b
>>> print(a)
10.0

• Comparison operators (==, !=, >, <, >=, <=): Comparison operators compare two values and return a Boolean result (True or False).
>>> p = 9
>>> k = 8
>>> print('p == k:', p == k)
p == k: False
>>> print('p > k:', p > k)
p > k: True
>>> print('p < k:', p < k)
p < k: False
>>> print('p <= k:', p <= k)
p <= k: False
>>> print('p >= k:', p >= k)
p >= k: True
>>> print('p != k:', p != k)
p != k: True
• Logical operators (and, or, not): Logical operators are used to combine and negate Boolean values.
>>> s=9
>>> r=8
>>> print (s>8 and r>7)
True
>>> print (s>8 or r>7)
True
>>> print( not s<8)
True
• Identity operators (is, is not): Identity operators check whether two names refer to the same object in memory, not whether their values are equal.
>>> q= 4
>>> e= 3
>>> x= 'HI'
>>> y= 'Hi'
>>> r= [1,2,3]
>>> t= [1,2,3]
>>> print (q is not e)
True
>>> print (x is y)
False
>>> print (r is t)
False
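The difference between identity and equality is easiest to see with two equal lists. This sketch reuses the lists r and t from above and adds a hypothetical third name, u, bound to the same object as r.

```python
r = [1, 2, 3]
t = [1, 2, 3]
u = r  # u is a second name for the same list object as r

print(r == t)          # True: the two lists hold equal values
print(r is t)          # False: they are two separate objects
print(r is u)          # True: both names refer to one object
print(id(r) == id(u))  # True: "is" compares identity, as id() values do

u.append(4)  # changing the object through u is visible through r
print(r)     # [1, 2, 3, 4]
```

Because of this, == is the right choice for comparing values, while is is mainly used for identity checks such as x is None.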
• Membership operators (in, not in): Membership operators test whether a value is present in a sequence (such as a string, list, or tuple).
>>> x= 'HELLO'
>>> G={1:"A", 2:"B"}
>>> print("H" in x)
True
>>> print("HELLO" not in x)
False
>>> print(1 in G)   # for a dictionary, membership tests the keys
True

VARIABLES:
• Global variables: Variables defined outside any function, at the top level of the program, have global scope. They can be read from anywhere in the program, both inside and outside functions; to assign a new value to a global variable inside a function, it must first be declared with the global keyword.
>>> def f():
...     global p
...     print(p)
...     p = "hello"
...
>>> p = "world"
>>> f()
world
>>> print(p)
hello
>>>
• Local variables: Variables defined inside a function have local scope. They are only accessible within that function, and they are discarded when the function finishes executing.
>>> def f():
...     s = "hello"
...     print(s)
...
>>> f()
hello
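• Instance variables (attributes): Instance variables belong to individual instances (objects) of a class. They are usually created inside a method, typically __init__, by assigning to self.<name>, so each instance has its own copy. Variables defined in the class body outside any method are class variables, which are shared by all instances.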
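A minimal sketch contrasting the two kinds of variables: instance variables are normally created in __init__ through self, while a name assigned directly in the class body is a class variable shared by every instance. The class, names, and ID values below are hypothetical.

```python
class Student:
    school = "Example University"  # class variable: shared by all instances

    def __init__(self, name, s_id):
        # Instance variables: each Student object gets its own copy
        self.name = name
        self.s_id = s_id

stud1 = Student("Alice", 1001)
stud2 = Student("Bob", 1002)
print(stud1.name, stud1.s_id, stud1.school)  # Alice 1001 Example University
print(stud2.name, stud2.s_id, stud2.school)  # Bob 1002 Example University

# vars() shows only the attributes stored on the instance itself
print(vars(stud1))  # {'name': 'Alice', 's_id': 1001}
```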
TABLES: A "table" in Python generally refers to a data structure that stores information in rows and columns, much like a spreadsheet or a database table. A common pure-Python representation is a list of dictionaries, where each dictionary is a row and its key-value pairs hold the column values. For data analysis, the pandas library provides the DataFrame, a dedicated table structure:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Alice'],
'Age': [25, 30, 22]}
df = pd.DataFrame(data)
avg_age = df['Age'].mean()
print(df)
print("Average Age:", avg_age)
Output:
    Name  Age
0   John   25
1   Jane   30
2  Alice   22
Average Age: 25.666666666666668
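The list-of-dictionaries representation can be sketched without pandas, using only built-in Python. The rows repeat the example data above; the column widths in the printout are an arbitrary choice.

```python
# Each dictionary is one row; the keys are the column names
people = [
    {"Name": "John", "Age": 25},
    {"Name": "Jane", "Age": 30},
    {"Name": "Alice", "Age": 22},
]

# Print the table row by row with aligned columns
print(f"{'Name':<6}{'Age':>4}")
for row in people:
    print(f"{row['Name']:<6}{row['Age']:>4}")

# Average of the "Age" column
avg_age = sum(row["Age"] for row in people) / len(people)
print("Average Age:", avg_age)
```

pandas can build the same DataFrame directly from this structure with pd.DataFrame(people), which is convenient when data arrives row by row, for example from a database cursor.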
LOOPS:
Python has for and while loops for iteration.
>>> names = ["justin", "Taylor", "pierre"]
>>> for name in names:
...     print(name)
...
justin
Taylor
pierre
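The while loop is mentioned but not shown. This sketch walks the same list with a while loop, where the index has to be advanced manually, and then shows enumerate(), which gives a for loop each item's position.

```python
names = ["justin", "Taylor", "pierre"]

# A while loop repeats as long as its condition is true
index = 0
while index < len(names):
    print(names[index])
    index += 1  # without this line the loop would never end

# enumerate() yields (position, item) pairs; start=1 numbers from 1
for position, name in enumerate(names, start=1):
    print(position, name)
```

A for loop is usually preferred for walking a collection; while fits cases where the number of repetitions is not known in advance.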
FUNCTIONS:
Functions are reusable blocks of code. They are defined using the def keyword and run only when they are called.
>>> fname = "paul"
>>> lname = "jean"
>>> def my_function(fname, lname):
...     print(fname, lname)
...
>>> my_function(fname, lname)
paul jean
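The function above only prints its result. A minimal sketch of a function that returns a value and has a default parameter; the name full_name and the sep parameter are hypothetical.

```python
def full_name(fname, lname, sep=" "):
    """Return the first and last name joined by sep, first name capitalized."""
    return fname.capitalize() + sep + lname

print(full_name("paul", "jean"))           # uses the default separator
print(full_name("paul", "jean", sep="-"))  # overrides the default
```

Returning the value instead of printing it lets the caller store, combine, or test the result.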
CLASSES: A class is defined using the class keyword, followed by the class name and a colon.
Inside the class, you can define attributes and methods.
>>> class Student:
...     s_id = 20209080
...
>>> stud1 = Student()
>>> stud2 = Student()
>>> stud1.s_id = 987
>>> print(f"Student ID: {stud1.s_id}")
Student ID: 987
>>> print(f"Student ID: {stud2.s_id}")
Student ID: 20209080
WIDGETS: Python libraries and frameworks provide widgets for creating interactive applications.
Tkinter is a commonly used built-in library for creating simple GUI applications, while libraries
like PyQt and PyGTK offer more advanced features and a wider range of widgets.
>>> import tkinter as tk
>>> window = tk.Tk()
>>> window.title("My Tkinter Window")
''
>>> window.mainloop()
LIBRARIES:
NumPy:
Description: A library for numerical computations with support for large, multi-dimensional arrays
and matrices.
import numpy as np
array = np.array([1, 2, 3])
print(array)
Pandas:
Description: A powerful library for data manipulation and analysis, providing data structures like
DataFrame for tabular data.
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}

df = pd.DataFrame(data)
print(df)
Django:
Description: A high-level web framework for building robust and scalable web applications.
from django.http import HttpResponse

def hello(request):
    return HttpResponse("Hello, Django!")
Python fetching and scraping:
1. Automatically exporting data from SQL to Excel with Python and pyodbc:
Automate the process of extracting data from an SQL database and exporting it to an Excel file using Python and the pyodbc library, which connects Python to SQL databases through ODBC drivers. Writing .xlsx files with pandas also requires an Excel engine such as openpyxl.
Python SQL Automation:
The goal is to automate SQL-related tasks, such as recurring queries and data exports, with Python, a versatile language widely used for data manipulation, analysis, and automation.
Task Scheduler:
Task Scheduler is a Windows utility that runs programs automatically at specified times or intervals. In this context, it can run the Python export script on a schedule without manual intervention.
Example:
import pandas as pd
import pyodbc
# Database connection settings
server = 'your_server_name'
database = 'mydb'
username = 'your_username'
password = 'your_password'
# Create a connection to the SQL Server database
conn = pyodbc.connect(f'DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}')
# SQL query to fetch data from the database
sql_query = 'SELECT * FROM mytable'
# Use Pandas to read data from SQL into a DataFrame
df = pd.read_sql_query(sql_query, conn)
# Close the database connection
conn.close()
# Export the DataFrame to an Excel file
excel_file = 'output_data.xlsx'

df.to_excel(excel_file, index=False)
print(f'Data exported to {excel_file}')
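Because pyodbc needs a running SQL Server, this sketch shows the same extract-and-export flow with Python's built-in sqlite3 module and a CSV file (which Excel can open) standing in for SQL Server and .xlsx. The table mytable, its columns, and output_data.csv are hypothetical; this is a substitute technique, not the pyodbc workflow itself.

```python
import csv
import sqlite3

# In-memory SQLite database standing in for the SQL Server database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [("John", 25), ("Jane", 30)])

# Run the query; cursor.description holds the column names
cursor = conn.execute("SELECT * FROM mytable")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

# Write a header row followed by the data rows
csv_file = "output_data.csv"
with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)
print(f"Data exported to {csv_file}")
```

Saved as a .py file, a script like this is what Task Scheduler would be pointed at to run the export periodically.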
2. Extracting text, links, images, and tables from PDFs with PyMuPDF, PyPDF, and pdfplumber:
Text Extraction with PyMuPDF:
PyMuPDF (imported as fitz) is a powerful library for working with PDF files. To extract text from a PDF with PyMuPDF, open the document and call get_text() on each page:
import fitz  # PyMuPDF
pdf_document = "example.pdf"
# Open the PDF file
pdf = fitz.open(pdf_document)
# Iterate through pages and extract text
for page_num in range(pdf.page_count):
    page = pdf[page_num]
    text = page.get_text()
    print(text)
# Close the PDF file
pdf.close()
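Extracting Links with PyMuPDF:
PyMuPDF can also extract links from a PDF. Links are stored as link annotations on each page, and page.get_links() returns them as dictionaries; external web links carry a "uri" key.
import fitz  # PyMuPDF
pdf = fitz.open("example.pdf")
for page in pdf:
    for link in page.get_links():
        # Only external (URI) links have a "uri" entry
        if "uri" in link:
            print("Link:", link["uri"])
pdf.close()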
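Image Extraction with PyMuPDF:
You can also extract images from a PDF using PyMuPDF. Each image is saved in its original format, which extract_image() reports under the "ext" key.
import fitz  # PyMuPDF
pdf = fitz.open("example.pdf")
for page_num, page in enumerate(pdf):
    for img_index, img in enumerate(page.get_images(full=True)):
        xref = img[0]  # cross-reference number of the image
        base_image = pdf.extract_image(xref)
        ext = base_image["ext"]  # e.g. "png" or "jpeg"
        with open(f"image_{page_num}_{img_index}.{ext}", "wb") as f:
            f.write(base_image["image"])
pdf.close()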
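Table Extraction with pdfplumber:
pdfplumber is an excellent library for extracting tables from PDFs. page.extract_table() returns the largest table on a page as a list of rows, each row being a list of cell values.
import pdfplumber
pdf_document = "example.pdf"
with pdfplumber.open(pdf_document) as pdf:
    for page in pdf.pages:
        table = page.extract_table()
        if table:
            for row in table:
                print(row)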
3. Fetching all data from a MySQL database with Python:
Install the MySQL Connector Library (if not already installed):
pip install mysql-connector-python
Import Required Libraries:
import mysql.connector
Establish a Connection to the MySQL Database:
connection = mysql.connector.connect(
host="your_host_name",
user="your_username",
password="your_password",
database="your_database_name"
)
Create a Cursor Object:
cursor = connection.cursor()
Execute SQL Query to Fetch Data:
query = "SELECT * FROM your_table_name"
cursor.execute(query)
Fetch Data and Process It:

After executing the query, you can fetch the data using methods like fetchall(), fetchone(), or
fetchmany():
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here
    print(row)
Close the Cursor and Database Connection:
It's important to close the cursor and the database connection when you're done, with cursor.close() and connection.close(). The complete script, including this cleanup, is:
import mysql.connector
# Establish a connection to the MySQL database
connection = mysql.connector.connect(
host="your_host_name",
user="your_username",
password="your_password",
database="your_database_name"
)
# Create a cursor object
cursor = connection.cursor()
# Execute SQL query to fetch all data from a table
query = "SELECT * FROM your_table_name"
cursor.execute(query)
# Fetch all data and process it
all_data = cursor.fetchall()
for row in all_data:
    # Process each row of data here (e.g., print it)
    print(row)
# Close the cursor and database connection
cursor.close()
connection.close()
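The fetch methods differ in how many rows they return: fetchone() returns one row (or None when no rows remain), fetchmany(size) returns up to size rows, and fetchall() returns all remaining rows. mysql.connector and Python's built-in sqlite3 both follow the DB-API (PEP 249), so this sketch uses sqlite3 with an in-memory database to run without a MySQL server; the items table is hypothetical. It also closes the connection in a finally block so cleanup happens even if a query fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE items (id INTEGER, label TEXT)")
    cursor.executemany("INSERT INTO items VALUES (?, ?)",
                       [(n, f"item{n}") for n in range(1, 8)])

    cursor.execute("SELECT * FROM items ORDER BY id")
    first = cursor.fetchone()        # the first row only
    batches = []
    while True:
        batch = cursor.fetchmany(3)  # at most 3 rows per call
        if not batch:                # an empty list means no rows are left
            break
        batches.append(batch)
    cursor.close()
finally:
    conn.close()                     # always release the connection

print(first)                     # (1, 'item1')
print([len(b) for b in batches])  # [3, 3]
```

Reading in batches with fetchmany() keeps memory use bounded when a table is too large to load at once with fetchall().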
PACKAGES
Definition:
A package is a collection of modules organized into a directory hierarchy. It allows you to create a
structured and organized codebase by grouping related functionality together.
Creating Packages:
To create a package, you need to create a directory (folder) and place a special __init__.py file in it.
This file can be empty or contain initialization code for the package.
Subpackages:

Packages can contain subpackages, forming a hierarchical structure. Subpackages are simply
directories within the main package directory, each with its own __init__.py file.
Using Packages:
You can import modules from a package using dot notation (package.module). Importing from
subpackages follows the same pattern.
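Example:
Let's say you have a project with a package named mypackage containing subpackages utils and models. The directory structure would look like this:
mypackage/
    __init__.py
    utils/
        __init__.py
        math_operations.py
    models/
        __init__.py
        user.py
The modules can then be imported with dot notation: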
from mypackage.utils.math_operations import add
from mypackage.models.user import User
result = add(5, 3)
user = User("Alice", 25)
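To try this layout without creating the folders by hand, the sketch below writes the mypackage tree into a temporary directory and imports from it. The bodies of add() and User are assumptions, since their contents are not shown; only the file layout and import lines follow the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents for the modules in the example layout
files = {
    "mypackage/__init__.py": "",
    "mypackage/utils/__init__.py": "",
    "mypackage/utils/math_operations.py": "def add(a, b):\n    return a + b\n",
    "mypackage/models/__init__.py": "",
    "mypackage/models/user.py": (
        "class User:\n"
        "    def __init__(self, name, age):\n"
        "        self.name = name\n"
        "        self.age = age\n"
    ),
}

# Write each file, creating the package folders as needed
root = Path(tempfile.mkdtemp())
for relative_path, source in files.items():
    path = root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)

# Make the temporary project importable and refresh the import caches
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mypackage.utils.math_operations import add
from mypackage.models.user import User

result = add(5, 3)
user = User("Alice", 25)
print(result, user.name, user.age)  # 8 Alice 25
```

In a real project the folders already sit next to the main script, so the sys.path and cache steps are unnecessary.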
SKETCH (SCHEMA - DRAWING) OF YOUR DEVELOPMENT
Input (Files, Databases) ====> Python Scraping API ====> Output (MySQL database)
PROGRAMS YOU WILL USE TO DEVELOP THE TOOLS OF THE PROJECT
Python IDLE, MySQL, Django, JupyterLite, and the command line

Class for scraping:
The notebook cells, collected into one script and grouped into the InformationExtraction class; the extraction methods are still placeholders (pass) to be implemented.
# Check the MySQL server by listing its databases with PyMySQL
from pymysql import connect
import pandas as pd

data_base = connect(host='localhost', user='root', password='')
cur = data_base.cursor()
query = "SHOW DATABASES"
cur.execute(query)
databases = cur.fetchall()
for data in databases:
    print(data)
data_base.close()

import os
import mysql.connector
from openpyxl import load_workbook
import PyPDF2

mysql_config = {
    'host': 'localhost',
    'user': 'root',
    'password': '',
    'database': 'aiihcschema'
}

class InformationExtraction:
    def __init__(self, config):
        self.config = config
        self.db_connection = None

    def connect_to_mysql(self):
        # Open the connection and keep it on the instance for later use
        self.db_connection = mysql.connector.connect(**self.config)
        return self.db_connection

    def extract_from_database(self, database_name):
        pass  # to be implemented

    def extract_from_spreadsheet(self, file_path):
        pass  # to be implemented (e.g. with openpyxl's load_workbook)

    def extract_from_files(self, file_path):
        pass  # to be implemented for .pdf, .docx and .txt files

    def process_and_store_data(self, data):
        pass  # to be implemented: store the extracted data in MySQL

    def close_connections(self):
        if self.db_connection:
            self.db_connection.close()

info_extractor = InformationExtraction(mysql_config)
info_extractor.connect_to_mysql()

info_extractor.extract_from_database('aiihcschema')
info_extractor.extract_from_spreadsheet('path/to/your/spreadsheet.xlsx')
info_extractor.extract_from_files('path/to/your/file.pdf')
info_extractor.extract_from_files('path/to/your/file.docx')
info_extractor.extract_from_files('path/to/your/file.txt')

data_to_store = ...  # placeholder: the data gathered by the extract_* methods
info_extractor.process_and_store_data(data_to_store)

# Create the target table and list the tables while the connection is open
mycursor = info_extractor.db_connection.cursor()
mycursor.execute('''CREATE TABLE extracted_data (
    id INT AUTO_INCREMENT PRIMARY KEY,
    column1 VARCHAR(255),
    column2 INT)''')
mycursor.execute('SHOW TABLES')
for x in mycursor:
    print(x)
mycursor.close()

info_extractor.close_connections()
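The extract_from_files placeholder has to handle .pdf, .docx, and .txt files. One possible structure, offered as an assumption rather than the project's implementation, is to dispatch on the file extension through a dictionary of reader functions. Only plain-text reading is filled in; the helper names read_txt, read_pdf, read_docx, and READERS are hypothetical, and the function is shown standalone rather than as a method.

```python
import os
import tempfile

def read_txt(file_path):
    with open(file_path, encoding="utf-8") as f:
        return f.read()

def read_pdf(file_path):
    raise NotImplementedError("PDF extraction (e.g. with PyMuPDF) goes here")

def read_docx(file_path):
    raise NotImplementedError("Word document extraction goes here")

# Map each supported extension to its reader function
READERS = {".txt": read_txt, ".pdf": read_pdf, ".docx": read_docx}

def extract_from_files(file_path):
    extension = os.path.splitext(file_path)[1].lower()
    reader = READERS.get(extension)
    if reader is None:
        raise ValueError(f"Unsupported file type: {extension}")
    return reader(file_path)

# Usage with a temporary .txt file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Extracted text")
text = extract_from_files(tmp.name)
print(text)
os.remove(tmp.name)
```

Adding support for a new format then only means writing one reader function and registering it in the dictionary.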
SQL code for MySQL database:
CREATE DATABASE aiihcschema;
USE aiihcschema;
CREATE TABLE extracted_data (
    id INT AUTO_INCREMENT PRIMARY KEY,
    column1 VARCHAR(255),
    column2 INT
);
INSERT INTO extracted_data (column1, column2) VALUES ('value1', 123);
SELECT * FROM extracted_data;
UPDATE extracted_data SET column1 = 'new_value' WHERE id = 1;
DELETE FROM extracted_data WHERE id = 1;
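The same statements can be issued from Python. This sketch uses the built-in sqlite3 module with an in-memory database so it runs without a MySQL server; SQLite spells the auto-increment key INTEGER PRIMARY KEY AUTOINCREMENT and uses ? placeholders, whereas mysql.connector uses %s. Passing values as parameters, rather than formatting them into the SQL string, protects against SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE extracted_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    column1 VARCHAR(255),
    column2 INT)""")

# INSERT with parameters instead of string formatting
cur.execute("INSERT INTO extracted_data (column1, column2) VALUES (?, ?)",
            ("value1", 123))
conn.commit()
inserted = cur.execute("SELECT * FROM extracted_data").fetchall()
print(inserted)  # [(1, 'value1', 123)]

# UPDATE, then DELETE, the row with id = 1
cur.execute("UPDATE extracted_data SET column1 = ? WHERE id = ?", ("new_value", 1))
updated = cur.execute("SELECT * FROM extracted_data").fetchall()
print(updated)   # [(1, 'new_value', 123)]
cur.execute("DELETE FROM extracted_data WHERE id = ?", (1,))
conn.commit()
remaining = cur.execute("SELECT * FROM extracted_data").fetchall()
print(remaining)  # []
conn.close()
```

With mysql.connector, autocommit is off by default, so connection.commit() is likewise needed after INSERT, UPDATE, and DELETE for the changes to be saved.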
