Visvesvaraya Technological University
“Jnana Sangam”, Belagavi, Karnataka, India
SECAB Institute of Engineering & Technology, Vijayapur
Department of Master of Computer Applications, 2022-2023
A Seminar on
Web Scraping and Numerical Analysis
By
Mohammad Azeem Maniyar (2SA22MC013)
Course Coordinator
Prof. Nazeera Madabhavi
Web Scraping
• Web scraping is a technique for extracting data from websites
programmatically, and Python is a common language for it. It is a
valuable skill in data analytics because it lets you collect large
amounts of data from the web for analysis.
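As a minimal sketch of what "extracting data" means in practice, the example below pulls every link out of a small HTML snippet. It uses only the standard library's html.parser so that it runs without extra packages. The names LinkCollector, extract_links, and SAMPLE_HTML, and the sample page itself, are hypothetical and are not part of the seminar. A real scraper would first download the page, for example with Requests.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Listings</h1>
  <a href="/property/35237">Property 35237</a>
  <a href="/property/32238">Property 32238</a>
</body></html>
"""


def extract_links(html):
    """Return the list of link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```

The same idea scales up with the dedicated scraping libraries: download the page, parse it into a tree, then select the elements that hold the data.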
Several Python libraries are commonly used for web scraping.
Some of the most popular are:
• Beautiful Soup
• lxml
• Requests
• Scrapy
• Selenium
• html5lib
Parsing XML with lxml.objectify
<?xml version="1.0" encoding="UTF-8"?>
<root>
<room>
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</room>
<price>
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id>
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>
Program
from lxml import objectify
import pandas as pd

# Parse the XML file and get its root element
xml_data = objectify.parse('properties.xml')
root = xml_data.getroot()

# Each child of the root becomes one column: its tag is the column
# name and the text of its own children gives the column values.
# iterchildren() replaces the deprecated getchildren().
data = []
cols = []
for child in root.iterchildren():
    data.append([subchild.text for subchild in child.iterchildren()])
    cols.append(child.tag)

# Build the DataFrame and transpose it so each tag becomes a column
df = pd.DataFrame(data).T

# Set column names
df.columns = cols

# Print DataFrame
print(df)
Output
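To inspect the values the table should contain without installing lxml or pandas, the following sketch rebuilds the same columns with the standard library's xml.etree.ElementTree. This is a swapped-in parser, not the lxml.objectify API used in the program. The names XML_TEXT and xml_to_columns are hypothetical. Converting each value to float is an assumption based on the type="number" attribute; the program above keeps the values as text.

```python
import math
import xml.etree.ElementTree as ET

# Same data as properties.xml from the slide (XML declaration omitted).
XML_TEXT = """<root>
<room>
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</room>
<price>
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id>
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>"""


def xml_to_columns(xml_text):
    """Map each child tag of the root to a list of float values."""
    root = ET.fromstring(xml_text)
    columns = {}
    for child in root:
        # Assumption: every leaf holds a number; "nan" becomes float('nan')
        columns[child.tag] = [float(sub.text) for sub in child]
    return columns


columns = xml_to_columns(XML_TEXT)
for name, values in columns.items():
    print(name, values)

# Count missing room values (NaN never compares equal, so use math.isnan)
missing_rooms = sum(math.isnan(v) for v in columns["room"])
print("missing room values:", missing_rooms)
```

Each dictionary key corresponds to one DataFrame column in the program, and each list position corresponds to one property.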