Advanced Programming (DS40108):
Working with File Paths and Directories in
Python
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
What is a File Path?
● A file path tells you where a file or directory is located in your system
● Two types:
○ Absolute path: Starts from the root (C:\Users\..., /home/user/...)
○ Relative path: Relative to the current working directory
# Absolute path
"C:/Users/John/Documents/file.txt"
# Relative path
"./data/file.txt"
Working Directory
● Python starts running from a "current working directory"
Using os:
import os
print(os.getcwd())
Using pathlib:
from pathlib import Path
print(Path.cwd())
● Changing the Working Directory
os.chdir(path) – change the current working directory
os.chdir('C:/Users/John/Desktop')
print(os.getcwd()) # Check new location
Building File Paths
Using os.path.join():
folder = "data"
filename = "report.csv"
path = os.path.join(folder, filename)
print(path) # "data/report.csv"
Using pathlib:
p = Path("data") / "report.csv"
print(p) # data/report.csv
Directories
● Checking File/Directory Existence
os.path.exists() and pathlib.Path.exists()
# Using os
os.path.exists("data/report.csv")
# Using pathlib
Path("data/report.csv").exists()
● Creating Directories
# os
os.mkdir("my_folder")
# pathlib
Path("my_folder").mkdir()
- Use exist_ok=True to avoid errors if folder exists
Path("my_folder").mkdir(exist_ok=True)
Directories
● Creating Nested Directories
# With os
os.makedirs("projects/2025/reports")
# With pathlib
Path("projects/2025/reports").mkdir(parents=True, exist_ok=True)
parents=True creates all intermediate folders
● Listing Files in a Directory
# os
os.listdir(".")
# pathlib
list(Path(".").iterdir())
You can also filter files:
[p for p in Path(".").iterdir() if p.is_file()]
Directories
● File vs Directory Check
# os
os.path.isfile("example.txt")
os.path.isdir("folder")
# pathlib
p = Path("example.txt")
p.is_file()
p.is_dir()
● Deleting Files and Directories
# Deleting files
os.remove("old.txt")
Path("old.txt").unlink()
# Deleting empty directories
os.rmdir("empty_folder")
Path("empty_folder").rmdir()
For non-empty folders, use shutil.rmtree()
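A minimal sketch using shutil.rmtree() (the folder name is illustrative):
import shutil
shutil.rmtree("old_project")  # recursively deletes the folder and everything inside it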
Directories
● Cross-Platform Paths
Use os.path or pathlib to avoid hardcoding path separators like / or \
# Good
Path("data") / "file.csv"
# Bad
"data\file.csv" # Windows-only
● Temporary Directories
Use tempfile module when working with temporary files/folders
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
print("Temporary folder created at:", tmpdir)
Common Errors
Error | Typical cause
FileNotFoundError | Wrong path or missing file
PermissionError | Lack of write permission
OSError | Path doesn't exist
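A minimal sketch of catching these errors (the file path is illustrative):
try:
    with open("data/report.csv") as f:
        print(f.read())
except FileNotFoundError:
    print("Wrong path or missing file")
except PermissionError:
    print("No permission to access this file")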
Lab Activity
Problem 1: Move all .txt files from Downloads/ to TextFiles/
Problem 2: Create a folder Reports2025, create subfolders: Jan, Feb, Mar, create an empty file
summary.txt in each and finally list all .txt files in Reports2025
from pathlib import Path
src = Path("Downloads")
dst = Path("TextFiles")
dst.mkdir(exist_ok=True)
for file in src.glob("*.txt"):
    file.rename(dst / file.name)
Solutions
import os
import shutil
# Define source and destination directories
source_dir = 'Downloads'
dest_dir = 'TextFiles'
# Create destination directory if it doesn't exist
os.makedirs(dest_dir, exist_ok=True)
# Iterate through all files in the source directory
for filename in os.listdir(source_dir):
    if filename.endswith('.txt'):
        src_path = os.path.join(source_dir, filename)
        dest_path = os.path.join(dest_dir, filename)
        # Move the file
        shutil.move(src_path, dest_path)
        print(f"Moved: {filename}")
Solutions (using os)
import os
# Step 1: Create main directory
main_folder = 'Reports2025'
os.makedirs(main_folder, exist_ok=True)
# Step 2: Create subfolders
subfolders = ['Jan', 'Feb', 'Mar']
for month in subfolders:
    path = os.path.join(main_folder, month)
    os.makedirs(path, exist_ok=True)
Solutions
# Step 3: Create an empty summary.txt in each subfolder
for month in subfolders:
    summary_path = os.path.join(main_folder, month, 'summary.txt')
    open(summary_path, 'w').close()  # Creates an empty file
# Step 4: List all .txt files in Reports2025
print("List of .txt files in Reports2025:")
for root, dirs, files in os.walk(main_folder):
    for file in files:
        if file.endswith('.txt'):
            print(os.path.join(root, file))
Solutions (using Path)
from pathlib import Path
# Step 1: Create main directory
main_folder = Path('Reports2025')
main_folder.mkdir(exist_ok=True)
# Step 2: Create subfolders
subfolders = ['Jan', 'Feb', 'Mar']
for month in subfolders:
    month_folder = main_folder / month
    month_folder.mkdir(exist_ok=True)
    # Step 3: Create an empty summary.txt in each subfolder
    summary_file = month_folder / 'summary.txt'
    summary_file.touch(exist_ok=True)  # Creates empty file if it doesn't exist
# Step 4: List all .txt files in Reports2025
print("List of .txt files in Reports2025:")
for txt_file in main_folder.rglob('*.txt'):
    print(txt_file)
Advanced Programming (DS40108):
Multiprocessing and Threading in
Python
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Introduction to Concurrency and Parallelism
Concurrency & Parallelism
● Difference between concurrency and parallelism
Term | Meaning | Example
Concurrency | Multiple tasks making progress together | Switching between downloads
Parallelism | Tasks running at the same time on different cores | Downloading multiple files at once
Concurrency and Parallelism
Threading in Python
● Why Use Threading?
- Useful for I/O-bound tasks such as network requests, file reading/writing, and user interaction
- More time efficient: threads can overlap I/O waits, reducing total elapsed time
● Creating a Thread
import threading
def print_msg():
    print("Hello from a thread!")
t = threading.Thread(target=print_msg)
t.start()
t.join()
- start() runs the thread
- join() waits for it to finish
Threading in Python
● Creating Multiple Threads:
import threading
def count():
    for i in range(10000):
        print("Counting:", i)
# Launch two threads
t1 = threading.Thread(target=count)
t2 = threading.Thread(target=count)
t1.start()
t2.start()
t1.join()
t2.join()
● May see interleaved output (non-deterministic)
Problems with Threading in Python
● Race condition: several threads reading and updating a shared variable concurrently can lead to unexpected results.
Race Condition
import threading
# Shared variable
counter = 0
def increment():
    global counter
    for _ in range(100000):
        counter += 1
# Create multiple threads
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
# Start threads
t1.start()
t2.start()
# Wait for them to finish
t1.join()
t2.join()
print("Expected counter = 200000")
print("Actual counter =", counter)
Solving Race Conditions
lock = threading.Lock()
def safe_increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1
● Python’s Global Interpreter Lock (GIL) ensures that only one thread executes Python
bytecode at a time, but it does not guarantee atomicity of operations like counter += 1.
● While the GIL does not prevent race conditions on all operations, it can prevent race
conditions when you're working with atomic operations on built-in types (like append() on a list
or += on small integers under certain conditions) — but only in CPython, and it's still not
something we should rely on for correctness.
Threading in Python (Execution Time)
import time
def print_fib(number: int) -> None:
    def fib(n: int) -> int:
        if n == 1:
            return 0
        elif n == 2:
            return 1
        else:
            return fib(n - 1) + fib(n - 2)
    print("fib(", number, ") is", fib(number))
start = time.time()
print_fib(40)
print_fib(41)
end = time.time()
print("Completed in", end - start, "seconds")
Threading in Python (Execution Time)
import threading
import time
def print_fib(number: int) -> None:
    def fib(n: int) -> int:
        if n == 1:
            return 0
        elif n == 2:
            return 1
        else:
            return fib(n - 1) + fib(n - 2)
    print("fib(", number, ") is", fib(number))
Threading in Python (Execution Time)
def fibs_with_threads():
    fortieth_thread = threading.Thread(target=print_fib, args=(40,))
    forty_first_thread = threading.Thread(target=print_fib, args=(41,))
    fortieth_thread.start()
    forty_first_thread.start()
    fortieth_thread.join()
    forty_first_thread.join()
start_threads = time.time()
fibs_with_threads()
end_threads = time.time()
print("Threads took", end_threads - start_threads, "seconds.")
Lab Activity
● Create 3 threads that each count down from a given number to 0,
with a delay of 1 second between prints.
● Simulate 3 threads that “download” different files (just sleep for a few
seconds) and print progress messages.
Solutions
import threading
import time
def countdown(n, name):
    while n > 0:
        print(name, "counting down:", n)
        time.sleep(1)
        n -= 1
    print(f"{name} finished!")
Solutions
# Create and start threads
t1 = threading.Thread(target=countdown, args=(5, "Thread-A"))
t2 = threading.Thread(target=countdown, args=(3, "Thread-B"))
t3 = threading.Thread(target=countdown, args=(4, "Thread-C"))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
print("All countdowns completed!")
Solutions
import threading
import time
def download_file(file_name, duration):
    print(f"Starting download: {file_name}")
    time.sleep(duration)
    print(f"Finished download: {file_name}")
Solutions
# List of mock files with "download times"
files = [("file1.txt", 3), ("file2.jpg", 5), ("file3.mp4", 2),]
threads = []
for file_name, duration in files:
    t = threading.Thread(target=download_file, args=(file_name, duration))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print("All downloads completed!")
Advanced Programming (DS40108):
Multiprocessing in Python
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Multiprocessing in Python
Why Use Multiprocessing?
● Bypasses the GIL
● Ideal for CPU-bound tasks like:
○ Image processing
○ Data crunching
○ Simulations
Comparison of Threads and Processes
Aspect | Thread | Process
Memory | Shares memory with other threads | Independent memory space
Speed | Faster to start, less overhead | Slower to start due to memory isolation
Use case | I/O-bound tasks | CPU-bound tasks
Crash isolation | If one thread crashes, it can affect the others | A crash in one process can be isolated from the others
Communication | Shared memory (faster but risky: race conditions) | Queues (slower but safer)
GIL impact | Affected by the GIL | Bypasses the GIL
Processes
● Creating a Process
from multiprocessing import Process
def say_hi():
    print("Hello from a process!")
p = Process(target=say_hi)
p.start()
p.join()
● Multiple Processes Example
def compute():
    for _ in range(5):
        print("Computing...")
p1 = Process(target=compute)
p2 = Process(target=compute)
p1.start()
p2.start()
p1.join()
p2.join()
Processes
● Process with Arguments
def square(n):
    print(f"{n}^2 =", n * n)
p = Process(target=square, args=(5,))
p.start()
● Using Process Pool
from multiprocessing import Pool
def square(x):
    return x * x

with Pool(4) as pool:
    results = pool.map(square, [1, 2, 3, 4])
print(results)
● Pool handles worker processes
● Automatically distributes workload
Processes
● Inter-Process Communication
Use multiprocessing.Queue or multiprocessing.Pipe
from multiprocessing import Queue
def producer(q):
    q.put("data")
q = Queue()
p = Process(target=producer, args=(q,))
p.start()
print(q.get())
p.join()
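multiprocessing.Pipe, mentioned above, works with a pair of connected endpoints; a minimal sketch (function and variable names are illustrative):
from multiprocessing import Process, Pipe

def producer(conn):
    conn.send("data")  # send a message through the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=producer, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints "data"
    p.join()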
Lab Activity
1. Write a function greet(name) that prints "Hello, <name>!". Create
threads for Alice, Bob, and Charlie.
2. Use multiprocessing.Pool to compute squares of a list of numbers.
3. Use 3 processes to "write" to different files (simulate using print and
time.sleep())
Solutions
from multiprocessing import Pool
def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
    print("Squares:", results)
Solutions
from multiprocessing import Process
import time
def write_file(file_name):
    print(f"Writing to {file_name}...")
    time.sleep(2)
    print(f"Finished writing to {file_name}")
Solutions
if __name__ == "__main__":
    files = ["file1.txt", "file2.txt", "file3.txt"]
    processes = []
    for f in files:
        p = Process(target=write_file, args=(f,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print("All files processed.")
Advanced Programming
(DS40108): Asynchronous I/O
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Introduction and Motivation
1. What is I/O?
● I/O (Input/Output) refers to communication between the computer
and the outside world (keyboard, disk, network, etc.).
● Examples: reading a file from disk, fetching data from a website,
sending a message to a server.
2. Why is I/O often slow?
● Most I/O operations involve waiting on external systems: Disk
read/write: mechanical delays, Network I/O: latency and server
delays
● CPU is idle while waiting for I/O to complete
Introduction and Motivation
Traditional Program Flow (Synchronous I/O)
1. Do Task A
2. Wait for I/O
3. Do Task B
Problem: Entire program blocks during I/O
Wasted time = Inefficiency
import time
def read_file():
    print("Reading file...")
    time.sleep(2)
    print("File read complete!")
read_file()
print("Next task")
Introduction and Motivation
Asynchronous I/O
● I/O operations are non-blocking.
● Useful for handling multiple I/O-bound tasks efficiently.
● asyncio in Python
import asyncio
async def read_file():
    print("Reading file...")
    await asyncio.sleep(2)
    print("File read complete!")

async def main():
    await read_file()
    print("Next task")
asyncio.run(main())
Comparison
Feature | Synchronous | Asynchronous
Execution | Blocking | Non-blocking
Efficiency | Less efficient for I/O | More efficient for I/O
Complexity | Easy | Requires event loop, async/await
Use case | Simple scripts | Web servers, I/O-heavy apps
The Event Loop & Syntax
What is an Event Loop?
● Central controller of async programs
● Runs and manages all async tasks
async / await Syntax:
import asyncio
async def greet():
    print("Hello")
    await asyncio.sleep(1)
    print("World")
asyncio.run(greet())
async def defines an async function
await pauses function until result is ready
asyncio.run() starts the event loop and runs the coroutine
asyncio Basics
asyncio.sleep(): Simulates non-blocking delay
async def main():
    print("Start")
    await asyncio.sleep(2)
    print("End")
asyncio.run(main())
asyncio Basics
asyncio.gather(): run multiple coroutines concurrently
async def task(name, delay):
    print("Starting", name)
    await asyncio.sleep(delay)
    print(name, "done")

async def main():
    await asyncio.gather(
        task("A", 2),
        task("B", 1)
    )
asyncio.run(main())
asyncio Basics
● asyncio.create_task()
async def task(name, delay):
    await asyncio.sleep(delay)
    print(name, "done")

async def main():
    t1 = asyncio.create_task(task("Task1", 2))
    t2 = asyncio.create_task(task("Task2", 1))
    await t1
    await t2
asyncio.run(main())
Advanced Programming (DS40108):
GPU Computing in Python
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Introduction to GPU Computing in Python: CUDA vs. OpenCL
Understand the role and architecture of GPUs in modern computing
Understand the use of CuPy, a GPU-accelerated library with NumPy-like syntax.
Use Numba to write CUDA programs in Python; PyCUDA is an alternative.
Use PyOpenCL for OpenCL programming in Python
Compare CUDA and OpenCL based on performance, portability, and ease of
use
CuPy: GPU-accelerated computing with NumPy-like syntax
What is CuPy?
● NumPy-compatible array library that runs on NVIDIA GPUs
● Developed by Preferred Networks
● Uses CUDA under the hood
Why CuPy?
● No new syntax – uses NumPy-like API
● Automatically dispatches computations to GPU
● Great for vectorized math, matrix ops, FFTs, and more
Use Cases:
● GPU-accelerated data science
● Deep learning preprocessing
● Replacing slow NumPy CPU code
CuPy vs Numpy
Feature | NumPy | CuPy
Runs on | CPU | GPU
Syntax | Standard NumPy | Almost identical
Performance | Lower for large arrays | Higher for large arrays
Dependencies | None | Requires CUDA
CuPy Basics
Import and Array Creation:
import cupy as cp
a = cp.array([1, 2, 3])
b = cp.arange(10)
c = cp.random.rand(3, 3)
CuPy to/from NumPy:
import numpy as np
a = np.array([1, 2, 3])
b = cp.asarray(a)  # NumPy → CuPy
c = cp.asnumpy(b)  # CuPy → NumPy
CuPy : Vector Addition
N = 1_000_000
a = cp.random.rand(N).astype(cp.float32)
b = cp.random.rand(N).astype(cp.float32)
import time
cp.cuda.Device(0).synchronize()
start = time.time()
c = a + b
cp.cuda.Device(0).synchronize()
end = time.time()
print("CuPy vector addition time: {:.3f} ms".format((end - start) * 1000))
Compare with Numpy
a_cpu = a.get()
b_cpu = b.get()
start = time.time()
c_cpu = a_cpu + b_cpu
end = time.time()
print("NumPy vector addition time: {:.3f} ms".format((end - start) * 1000))
CuPy Matrix Multiplication
A = cp.random.rand(1024, 1024)
B = cp.random.rand(1024, 1024)
cp.cuda.Device(0).synchronize()
start = time.time()
C = cp.matmul(A, B)
cp.cuda.Device(0).synchronize()
end = time.time()
print("CuPy matrix multiplication time: {:.3f} ms".format((end - start) * 1000))
Performance comparison of CuPy and Numpy
● Numpy equivalent of vector addition
a_np = cp.asnumpy(a)
b_np = cp.asnumpy(b)
start = time.time()
c_np = a_np + b_np
end = time.time()
print("NumPy vector addition time: {:.3f} ms".format((end - start) * 1000))
● NumPy Equivalent for Matrix Multiplication:
A_np = cp.asnumpy(A)
B_np = cp.asnumpy(B)
start = time.time()
C_np = np.matmul(A_np, B_np)
end = time.time()
print("NumPy matrix multiplication time: {:.3f} ms".format((end - start) * 1000))
Key Observations
● CuPy demonstrates significant performance improvements over
NumPy for large-scale computations due to GPU acceleration.
● For smaller datasets, the overhead of data transfer between CPU and
GPU may negate performance gains.
● CuPy offers a seamless transition for NumPy users to leverage GPU
acceleration.
● Ideal for large-scale numerical computations, machine learning
preprocessing, and scientific simulations.
CPU vs GPU
● CPU: Few powerful cores (good for serial tasks)
● GPU: Many simple cores (great for parallel tasks)
Applications:
● Deep learning (PyTorch, TensorFlow)
● Simulations (climate, physics)
● Image & signal processing
Numba
Numba is a Just-In-Time (JIT) compiler that can compile Python
functions to optimized machine code using LLVM (Low Level Virtual
Machine).
With @cuda.jit, you can run Python code directly on the GPU (using
CUDA).
CUDA Programming Model
CUDA (Compute Unified Device Architecture)
Thread hierarchy: gridDim, blockDim, threadIdx, blockIdx (see the index sketch after this list)
Memory hierarchy:
● Global
● Shared
● Constant
● Local
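In Numba, cuda.grid(1) combines these built-in indices into a global thread index; a minimal sketch of the equivalent manual computation (the kernel name is illustrative):
from numba import cuda

@cuda.jit
def index_demo(out):
    # Global index = block offset plus thread offset within the block
    i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    # Equivalent shorthand: i = cuda.grid(1)
    if i < out.size:
        out[i] = i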
CPU vs GPU
● Compute the element-wise square of a matrix (CPU)
import numpy as np
import time
# Simple element-wise square
N = 1_000_000
a = np.arange(N)
start = time.time()
b = a * a
print("CPU time:", time.time() - start)
CPU vs GPU
from numba import cuda
@cuda.jit  # Compiles square_gpu into a GPU kernel
def square_gpu(a, b):
    i = cuda.grid(1)
    # cuda.grid(1) computes the global thread index in 1D: thread 0 handles
    # element 0, thread 1 handles element 1, and so on.
    if i < a.size:
        b[i] = a[i] * a[i]
# Allocate arrays
a_gpu = np.arange(N, dtype=np.float32)
b_gpu = np.zeros_like(a_gpu)
CPU vs GPU
# Set up thread/block config
threads_per_block = 256
blocks_per_grid = (a_gpu.size + threads_per_block - 1) // threads_per_block
# GPUs run threads in groups called blocks.
# We choose 256 threads per block (a common convention).
# Time the GPU kernel
start_gpu = time.time()
square_gpu[blocks_per_grid, threads_per_block](a_gpu, b_gpu)
# This launch syntax tells Numba to run the GPU kernel in parallel.
cuda.synchronize()  # waits until the GPU is done computing before recording the time
end_gpu = time.time()
print("GPU Time:", end_gpu - start_gpu)
CUDA vector addition
from numba import cuda
import numpy as np
import time
@cuda.jit
def vector_add(a, b, c):
    idx = cuda.grid(1)
    if idx < a.size:
        c[idx] = a[idx] + b[idx]
CUDA vector addition
N = 1_000_000
a = np.arange(N, dtype=np.float32)
b = np.arange(N, dtype=np.float32)
c = np.zeros_like(a)
threads_per_block = 256
blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block
start = time.time()
vector_add[blocks_per_grid, threads_per_block](a, b, c)
cuda.synchronize()
print("CUDA time:", time.time() - start)
print("c[0] =", c[0]) # Should be 0 + 0
CUDA vector multiplication
@cuda.jit
def elementwise_multiply(a, b, out):
    i = cuda.grid(1)
    if i < a.size:
        out[i] = a[i] * b[i]
N = 1000000
a = np.full(N, 2.0, dtype=np.float32)
b = np.full(N, 3.0, dtype=np.float32)
out = np.zeros_like(a)
elementwise_multiply[blocks_per_grid, threads_per_block](a, b, out)
cuda.synchronize()
print("out[0] =", out[0]) # should be 6.0
CUDA matrix Addition
@cuda.jit
def matrix_add(A, B, C):
    row, col = cuda.grid(2)
    if row < A.shape[0] and col < A.shape[1]:
        C[row, col] = A[row, col] + B[row, col]
rows, cols = 512, 512
A = np.ones((rows, cols), dtype=np.float32)
B = np.ones((rows, cols), dtype=np.float32)
C = np.zeros((rows, cols), dtype=np.float32)
CUDA matrix addition
threads_per_block = (16, 16)
blocks_per_grid_x = (A.shape[0] + threads_per_block[0] - 1) // threads_per_block[0]
blocks_per_grid_y = (A.shape[1] + threads_per_block[1] - 1) // threads_per_block[1]
matrix_add[(blocks_per_grid_x, blocks_per_grid_y), threads_per_block](A, B, C)
cuda.synchronize()
print("C[0, 0] =", C[0, 0]) # Should be 2.0
PyCUDA
● CUDA: NVIDIA's parallel computing architecture
● PyCUDA: Python wrapper for CUDA (via pycuda library)
● Offers GPU acceleration with Python using NVIDIA GPUs
PyCUDA
● Native CUDA in Python
● High-level APIs + access to raw kernels
● Fast prototyping for Python users
Features:
● Uses numpy arrays on host
● Device functions written in CUDA C
● Easy memory transfer
PyCUDA Setup and Syntax
● Basic Setup
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
● Kernel in Cuda
mod = SourceModule("""
__global__ void add(float *a, float *b, float *c) {
int idx = threadIdx.x;
c[idx] = a[idx] + b[idx];
}
""")
add_func = mod.get_function("add")
Full Example
a = np.random.randn(256).astype(np.float32)
b = np.random.randn(256).astype(np.float32)
c = np.empty_like(a)
add_func(
cuda.In(a), cuda.In(b), cuda.Out(c),
block=(256,1,1), grid=(1,1)
)
print(c[:5])
PyCUDA Memory Management
Memory transfer helpers:
● cuda.In() – Host to device
● cuda.Out() – Device to host
● cuda.InOut() – Bidirectional
Manual allocation:
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
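Copying results back uses the mirror call, cuda.memcpy_dtoh(); a minimal sketch of the full round trip (array contents are illustrative):
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

a = np.random.randn(256).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)  # allocate device memory
cuda.memcpy_htod(a_gpu, a)        # host -> device
result = np.empty_like(a)
cuda.memcpy_dtoh(result, a_gpu)   # device -> host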
Comparison with Numba
Feature | PyCUDA | Numba
Language | CUDA C + Python | Pure Python
Control | More control | Simpler syntax
Compilation | Ahead-of-time | Just-in-time
PyCUDA - Vector Addition
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
mod = SourceModule("""
__global__ void vec_add(float *a, float *b, float *c, int N) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx < N)
c[idx] = a[idx] + b[idx];
}
""")
PyCUDA Vector Addition
N = 1_000_000
a = np.random.rand(N).astype(np.float32)
b = np.random.rand(N).astype(np.float32)
c = np.empty_like(a)
func = mod.get_function("vec_add")
start = cuda.Event()
end = cuda.Event()
start.record()
func(cuda.In(a), cuda.In(b), cuda.Out(c), np.int32(N), block=(256,1,1), grid=((N + 255) // 256, 1))
end.record()
end.synchronize()
time_ms = start.time_till(end)
print("Vector addition (PyCUDA) time: {:.3f} ms".format(time_ms))
PyCUDA Vector Multiplication
mod = SourceModule("""
__global__ void vec_mul(float *a, float *b, float *c, int N) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx < N)
c[idx] = a[idx] * b[idx];
}
""")
a = np.random.rand(N).astype(np.float32)
b = np.random.rand(N).astype(np.float32)
c = np.empty_like(a)
PyCUDA vector multiplication
func = mod.get_function("vec_mul")
start = cuda.Event()
end = cuda.Event()
start.record()
func(cuda.In(a), cuda.In(b), cuda.Out(c), np.int32(N), block=(256,1,1), grid=((N + 255) // 256, 1))
end.record()
end.synchronize()
time_ms = start.time_till(end)
print("Vector multiplication (PyCUDA) time: {:.3f} ms".format(time_ms))
PyCUDA Matrix Addition
mod = SourceModule("""
__global__ void mat_add(float *A, float *B, float *C, int width, int height) {
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
int idx = row * width + col;
if (row < height && col < width)
C[idx] = A[idx] + B[idx];
}
""")
width, height = 1024, 1024
size = width * height
PyCUDA Matrix Addition
A = np.random.rand(height, width).astype(np.float32)
B = np.random.rand(height, width).astype(np.float32)
C = np.empty_like(A)
func = mod.get_function("mat_add")
start = cuda.Event()
end = cuda.Event()
start.record()
func(cuda.In(A), cuda.In(B), cuda.Out(C),
np.int32(width), np.int32(height),
block=(16,16,1), grid=(width//16, height//16))
end.record()
end.synchronize()
time_ms = start.time_till(end)
print("Matrix addition (PyCUDA) time: {:.3f} ms".format(time_ms))
OpenCL (Open Computing Language)
A framework for writing parallel programs that run on CPUs, GPUs, and other accelerators
PyOpenCL
● Python wrapper for the OpenCL API
● Enables GPU and parallel programming from Python
● Supports CPUs, GPUs, FPGAs across vendors (AMD, Intel, NVIDIA)
● Combines flexibility of Python with power of OpenCL
Advantages of PyOpenCL
● Vendor-neutral (runs on many types of devices)
● Portable: Write once, run anywhere
● Fine-grained control over device, kernel, memory
● Great for:
○ Heterogeneous computing
○ Custom GPU kernel development
○ Research prototypes in Python
Advantages of PyOpenCL over Numba
● Cross-Platform Compatibility:
○ Runs on GPUs, CPUs, FPGAs from multiple vendors
○ Ideal for heterogeneous computing
● Explicit Device and Memory Control:
○ Better control over buffer allocation and kernel dispatch
● Support for Non-NVIDIA Hardware:
○ Works with Intel, AMD, Apple M-series GPUs
● Standards-Based:
○ Follows Khronos OpenCL specification
● Advanced Kernel Features:
○ Access to OpenCL-specific tuning like workgroups, barriers, local
memory
● OpenCL Interoperability:
○ Can be integrated into C/C++ or multi-language systems
First PyOpenCL program
Perform parallel computation on a GPU (or any OpenCL-supported device) from Python
using PyOpenCL.
Set up an OpenCL context and command queue.
Transfer data between host (CPU) and device (GPU).
Define a simple GPU kernel in OpenCL C that adds two vectors.
Execute that kernel on the GPU.
Retrieve and verify the result on the host.
First PyOpenCL program
import pyopencl as cl
import numpy as np
# Create OpenCL context using any available platform and device
devices = cl.get_platforms()[0].get_devices()
ctx = cl.Context(devices=devices)
# Create a command queue to submit work to the device
queue = cl.CommandQueue(ctx)
# Prepare input arrays (on host/CPU)
a = np.array([1, 2, 3, 4], dtype=np.float32)
b = np.array([5, 6, 7, 8], dtype=np.float32)
c = np.empty_like(a) # output array (initially empty)
# Memory flags for buffer creation
mf = cl.mem_flags
First PyOpenCL program
# Transfer data from host to device memory (READ_ONLY + COPY_HOST_PTR)
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
# Allocate device memory for output array (WRITE_ONLY)
c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
# Define the OpenCL kernel for vector addition
program = cl.Program(ctx, """
__kernel void vector_add(__global const float *a,
__global const float *b,
__global float *c) {
int gid = get_global_id(0); // unique thread index
c[gid] = a[gid] + b[gid]; // element-wise addition
}
""").build()
First PyOpenCL program
# Launch the kernel: (global work size = size of input array)
program.vector_add(queue, a.shape, None, a_buf, b_buf, c_buf)
# Copy result from device (GPU) memory back to host (CPU) memory
cl.enqueue_copy(queue, c, c_buf)
# Print the result array
print("Result:", c)
Key Concepts in an OpenCL Program
● Context: Environment for kernel execution
● CommandQueue: Submits work to device
● Buffers: Transfer data to/from GPU memory
● Kernel: C-like OpenCL function compiled at runtime
● Global ID: Index of the current work item
Vector Squaring
program = cl.Program(ctx, """
__kernel void square(__global float *a, __global float *b) {
int gid = get_global_id(0);
b[gid] = a[gid] * a[gid];
}
""").build()
program.square(queue, a.shape, None, a_buf, b_buf)
cl.enqueue_copy(queue, b, b_buf)
Vector Multiplication
program = cl.Program(ctx, """
__kernel void multiply(__global float *a, __global float *b, __global float *out) {
int gid = get_global_id(0);
out[gid] = a[gid] * b[gid];
}
""").build()
program.multiply(queue, a.shape, None, a_buf, b_buf, out_buf)
cl.enqueue_copy(queue, out, out_buf)
2D Matrix Addition
program = cl.Program(ctx, """
__kernel void mat_add(__global float* A, __global float* B, __global float* C, int width) {
int row = get_global_id(0);
int col = get_global_id(1);
int idx = row * width + col;
C[idx] = A[idx] + B[idx];
}
""").build()
program.mat_add(queue, (rows, cols), None, A_buf, B_buf, C_buf, np.int32(cols))
cl.enqueue_copy(queue, C, C_buf)
GPU Acceleration using Tensorflow
● TensorFlow automatically utilizes available GPU for computation
● Supports NVIDIA GPUs via CUDA and cuDNN
● Use tf.config.list_physical_devices('GPU') to check availability
Tensorflow basics
pip install tensorflow
● Alternatively: tensorflow-gpu (for legacy versions)
● Requires CUDA Toolkit and cuDNN installed (see TensorFlow compatibility chart)
Check GPU Access
import tensorflow as tf
print("Num GPUs Available:",len(tf.config.list_physical_devices('GPU')))
Tensorflow - Vector Addition
import tensorflow as tf
import time
N = 1000000
a = tf.random.normal([N])
b = tf.random.normal([N])
start = time.time()
c = tf.add(a, b)
tf.print("Vector Addition Time (GPU):", time.time() - start)
Tensorflow - Matrix Addition
import tensorflow as tf
import time
M, N = 512, 512
a = tf.random.normal([M, N])
b = tf.random.normal([M, N])
start = time.time()
c = tf.add(a, b)
tf.print("Matrix Addition Time (GPU):", time.time() - start)
Tensorflow - Matrix Multiplication
import tensorflow as tf
import time
a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
start = time.time()
c = tf.matmul(a, b)
tf.print("Matrix Multiplication Time (GPU):", time.time() - start)
Further features of Tensorflow
import tensorflow as tf

# Pin an operation to a specific device
with tf.device('/GPU:0'):
    result = tf.matmul(a, b)
with tf.device('/CPU:0'):
    result = tf.matmul(a, b)
Enable XLA (Accelerated Linear Algebra) for further speed-up:
# Enable XLA globally
tf.config.optimizer.set_jit(True)
@tf.function(jit_compile=True)
def matmul_xla(a, b):
    return tf.matmul(a, b)
a = tf.random.normal([512, 512])
b = tf.random.normal([512, 512])
tf.print(matmul_xla(a, b)[0][0])
Tensorflow Advantages
● TensorFlow abstracts GPU usage – easy to deploy without writing
kernels
● Ideal for ML and deep learning
● Supports vectorized ops: addition, multiplication, matrix ops
● Use TensorBoard for advanced profiling
● Supports mixed precision and distributed training on multi-GPU
setups
Advanced Programming (DS40108):
Data Extraction from Web in Python
(BeautifulSoup + Scrapy)
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Introduction to Web Scraping
What is Web Scraping?
Web scraping is the process of automatically retrieving data from websites using scripts or software. It allows you to:
● Collect data from HTML pages
● Monitor prices, news, job boards, or sports scores
● Extract structured information from unstructured sources
Legal and Ethical Considerations:
● Always check the site's robots.txt:
Example → https://example.com/robots.txt
● Respect Terms of Service
● Avoid scraping personal or copyrighted data
● Implement rate limiting, sleep delays, and user-agent headers to avoid blocking (see the sketch below)
pip install requests beautifulsoup4 scrapy lxml
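A minimal sketch of polite scraping with a custom User-Agent header and a delay between requests (the URL and header value are illustrative):
import time
import requests

headers = {"User-Agent": "course-scraper/1.0 (contact: student@example.com)"}
for page in range(1, 4):
    res = requests.get(f"https://example.com/page/{page}", headers=headers)
    print(page, res.status_code)
    time.sleep(1)  # rate limiting: pause between requests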
What is ‘robots.txt’
A robots.txt file is a text file that webmasters create to tell search engine crawlers which parts of their website are
allowed and not allowed to be crawled and indexed.
It's essentially a set of instructions for bots, helping to manage their activities and prevent them from overloading
the site.
● Simple Text File: It's a plain text file, usually located in the root directory of a
website.
● Directives: The file contains directives like User-agent, Disallow, Allow, Crawl-delay, and Sitemap.
● User-agent: Specifies the bot that the rule applies to (e.g., Googlebot).
● Disallow: Instructs the bot not to crawl specific URLs or directories.
● Allow: Allows the bot to crawl specific URLs or directories.
● Crawl-delay: Suggests the bot wait a specified amount of time before crawling
(Googlebot doesn't honor this, but it can be used as a guideline).
● Sitemap: Specifies the location of a sitemap file, which helps crawlers discover
all pages on the site.
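Python's standard library can check these rules programmatically; a minimal sketch using urllib.robotparser (the URLs and bot name are illustrative):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the robots.txt file
print(rp.can_fetch("MyBot", "https://example.com/private/page"))  # True or False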
BeautifulSoup – Parsing Static HTML
Step-by-Step Workflow
1. Send a GET request
2. Parse HTML using BeautifulSoup
3. Navigate or search for data
4. Extract the text or attributes
Get All Blog Post Titles
import requests
from bs4 import BeautifulSoup
url = "https://realpython.github.io/fake-jobs/"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
titles = soup.find_all('h2', class_='title')
for title in titles:
    print(title.text.strip())
BeautifulSoup – Parsing Static HTML
Get all image URLs in a Page
images = soup.find_all('img')
for img in images:
    print(img.get('src'))
Extract Table Data into a List of Dicts
table = soup.find("table")
rows = table.find_all("tr")
data = []
for row in rows[1:]:  # skip header
    cols = row.find_all("td")
    data.append({
        'Name': cols[0].text.strip(),
        'Email': cols[1].text.strip()
    })
BeautifulSoup – Parsing Static HTML
Nested Navigation
quote = soup.find("div", class_="quote")
text = quote.find("span", class_="text").text
author = quote.find("small", class_="author").text
tags = [tag.text for tag in quote.find_all("a", class_="tag")]
print(text, author, tags)
Accessing elements inside other elements by chaining find(), find_all(), or using CSS selectors
to drill down through the HTML structure.
It allows you to navigate the DOM hierarchy step by step — much like how you’d inspect
elements in browser developer tools.
BeautifulSoup – Parsing Static HTML
What will be the output for this web page?
<div class="quote">
<span class="text">"Talk is cheap. Show me the code."</span>
<span>
<small class="author">Linus Torvalds</small>
<a class="tag" href="/tag/code">code</a>
<a class="tag" href="/tag/linux">linux</a>
</span>
</div>
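For checking your answer: the nested-navigation snippet above, applied to this HTML, should print:
"Talk is cheap. Show me the code." Linus Torvalds ['code', 'linux']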
BeautifulSoup – Parsing Static HTML
Using lambda function in BeautifulSoup
1. Find all tags with text longer than 20 characters
soup.find_all(lambda tag: tag.name == 'p' and len(tag.text) > 20)
Use Case: Filter <p> tags with substantial content.
2. Find all tags that have an href attribute containing “login”
soup.find_all(lambda tag: tag.has_attr('href') and 'login' in tag['href'])
Use Case: Find login links like <a href="/user/login">
BeautifulSoup – Parsing Static HTML
Using lambda function in BeautifulSoup
3. Find elements with a certain text pattern
soup.find_all(lambda tag: tag.string and 'Python' in tag.string)
Use Case: Locate any tag whose direct text contains "Python".
4. Filter div tags whose ID starts with “section-”
soup.find_all(lambda tag: tag.name == 'div' and tag.get('id', '').startswith('section-'))
Use Case: Scraping page sections like section-1, section-2.
Use of Regular Expressions in Beautifulsoup
What are Regular Expressions?
● A way to match patterns in text
● Used for searching, filtering, extracting or validating strings
Why use them in BeautifulSoup?
● To match tags or attributes when values are dynamic or inconsistent
● More flexible than exact string matching
Commonly Used RegEx in Python
Regex usage in Python
import re
text = "user_123_atleast123#@example.com"
# Match the leading run of word characters
match = re.match(r"\w+", text)  # \w matches letters, digits, and underscore
print(match.group())  # Output: user_123_atleast123
Basic Functions:
● re.search() – finds first match
● re.findall() – returns all matches
● re.sub() – substitutes matched pattern
Regex usage in Python
import re
text = "The price is $199.99 and the discount is 25%."
# Extract dollar amount
match = re.search(r"\$\d+\.\d+", text)
print(match.group())  # Output: $199.99
# Extract all numbers
numbers = re.findall(r"\d+", text)
print(numbers)  # Output: ['199', '99', '25']
Regex in validation and Filtering
# Validate email address
email = "user@example.com"
pattern = r"^[\w.-]+@[\w.-]+\.\w{2,}$"
if re.match(pattern, email):
    print("Valid email")

# Filter lines that start with numbers
lines = ["1. Start", "A. Skip", "2. Continue"]
filtered = list(filter(lambda l: re.match(r"^\d", l), lines))
print(filtered)  # Output: ['1. Start', '2. Continue']
Regex Usage with BeautifulSoup
from bs4 import BeautifulSoup
import re
html = '''
<a href="index.html">Home</a>
<a href="contact.html">Contact</a>
<a href="resume.pdf">Resume</a>
'''
soup = BeautifulSoup(html, 'html.parser')
# Find only <a> tags with href ending in .html
html_links = soup.find_all('a', href=re.compile(r'\.html$'))
for link in html_links:
    print(link.text, '→', link['href'])
Web Crawling
What is Web Crawling?
● Systematically visiting and extracting data from multiple web pages
● Often involves following links and recursively scraping data
Crawling with BeautifulSoup:
● Use requests to fetch page content
● Use soup.find_all('a') to find links
● Follow and scrape each link recursively or iteratively
Web Crawling - Example
import requests
from bs4 import BeautifulSoup
def crawl_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    print("Page Title:", soup.title.text)
    for link in soup.find_all('a'):
        href = link.get('href')
        if href and href.startswith('http'):
            print("Found link:", href)

crawl_page('https://quotes.toscrape.com')
Crawling through various sites on internet
Crawling Models
1. Breadth-First Search (BFS) Model
● Crawls all links on a page before moving deeper
● Suitable for shallow but wide websites (a BFS sketch follows the DFS example below)
2. Depth-First Search (DFS) Model
● Follows links as deep as possible before backtracking
● Useful when deep data structures or hierarchies are present
3. Focused Crawling
● Targets pages based on keywords or link patterns
● Filters irrelevant pages early using rules or regex
4. Incremental Crawling
● Only crawls pages that have changed since the last visit
● Reduces load and improves efficiency (requires timestamps or hashes)
DFS Crawling
def dfs_crawl(url, depth, visited=set()):
    if depth == 0 or url in visited:
        return
    try:
        res = requests.get(url)
        soup = BeautifulSoup(res.text, 'html.parser')
        print("[DFS] Visited:", url)
        visited.add(url)
        for link in soup.find_all('a'):
            href = link.get('href')
            if href and href.startswith('http'):
                dfs_crawl(href, depth - 1, visited)
    except requests.RequestException:
        pass
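The BFS model from the list above can be sketched the same way with a FIFO queue (the names and page limit are illustrative):
from collections import deque
import requests
from bs4 import BeautifulSoup

def bfs_crawl(start_url, max_pages=10):
    visited = set()
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        try:
            res = requests.get(url)
            soup = BeautifulSoup(res.text, 'html.parser')
            print("[BFS] Visited:", url)
            visited.add(url)
            for link in soup.find_all('a'):
                href = link.get('href')
                if href and href.startswith('http'):
                    queue.append(href)
        except requests.RequestException:
            pass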
Introduction to Scrapy
What is Scrapy?
● An open-source web crawling and scraping framework written in
Python
● Designed for fast, large-scale data extraction
● Built-in support for following links, handling pagination, exporting data
Why Scrapy over BeautifulSoup?
● Asynchronous and fast
● Built-in crawling support
● Better suited for structured, multi-page scraping
Python Spider (Web Spider)
● A web spider is known by several other names: web crawler, automatic indexer, or simply crawler/spider.
● A web spider is actually a bot that is programmed for crawling websites.
● The primary duty of a web spider is to generate indices for websites and these indices
can be accessed by other software.
● For instance, the indices generated by a spider could be used by another party to assign
ranks for websites.
pip install scrapy
scrapy startproject quotespider
cd quotespider
● quotespider/spiders/ → your spider scripts
● items.py → define fields
● settings.py → configure behavior
Creating your first Spider
● Create a new spider in the spiders directory:
scrapy genspider quotes quotes.toscrape.com
● Edit quotes.py:
import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes"  # Unique name used to run the spider
    start_urls = ['https://quotes.toscrape.com']  # Start scraping from this URL

    def parse(self, response):
        for quote in response.css('div.quote'):  # Loop through each quote container
            yield {
                'text': quote.css('span.text::text').get(),       # Extract quote text
                'author': quote.css('small.author::text').get(),  # Extract author name
                'tags': quote.css('a.tag::text').getall()         # Get list of tags
            }
Understanding .css in Scrapy
The .css() method in Scrapy is used to select HTML elements using CSS selectors, similar to how
elements are targeted in web development (like in browser DevTools or jQuery).
response.css('selector') # returns a SelectorList
response.css('selector::text') # returns text inside the tag
response.css('selector::attr(href)') # returns value of an attribute (e.g., href)
Examples:
● response.css('div.quote'): selects all <div> elements with class "quote" from the HTML response.
● quote.css('span.text::text'): selects the text inside <span class="text">...</span> within each quote block.
● quote.css('a.tag::text').getall(): extracts all tag texts from multiple <a class="tag"> inside the quote.
● response.css('li.next a::attr(href)'): grabs the href attribute from the <a> tag inside a <li class="next">, used for pagination.
Why Use .css()?
● Cleaner and more readable than XPath for most use cases
● Fast, efficient, and very similar to what developers use in HTML/CSS
Running the Spider and Export Results
● Run the spider
scrapy crawl quotes
● Export to JSON/CSV:
scrapy crawl quotes -o quotes.json
scrapy crawl quotes -o quotes.csv
Extracting from more complex data.
Scrape quotes from https://quotes.toscrape.com along with additional details (birthdate, bio) from each
author's profile page.
def parse(self, response):
    # This method handles the response from the start_urls or any followed links.
    for quote in response.css('div.quote'):
        # Loop through all divs with class 'quote' to process each quote block.
        item = {
            'text': quote.css('span.text::text').get(),      # Extracts the quote text.
            'author': quote.css('small.author::text').get()  # Extracts the author's name.
        }
        author_url = quote.css('span a::attr(href)').get()
        # Gets the relative link to the author's bio page.
Extracting from more complex data
        if author_url:
            # Ensures the URL is valid before following it.
            yield response.follow(author_url, self.parse_author, meta={'item': item})
            # Makes a new request to the author's page and passes the item using meta.

def parse_author(self, response):
    # This method is called for each author's page visited via response.follow.
    item = response.meta['item']
    # Retrieves the item passed from the parse method.
    item['birthdate'] = response.css('span.author-born-date::text').get()
    # Extracts the author's birth date from the author's page.
    item['bio'] = response.css('div.author-description::text').get()
    # Extracts the author's biography from the author's page.
    yield item
    # Final step: yields the complete item containing quote, author, birthdate, and bio.
Scraping with Pagination
Scrape all quotes across multiple pages on https://quotes.toscrape.com by following "Next" page links.
def parse(self, response):
    # Handles the response and parses the current page's content
    for quote in response.css('div.quote'):
        # Iterates through each quote block on the page
        yield {
            'text': quote.css('span.text::text').get(),       # Extracts quote text
            'author': quote.css('small.author::text').get(),  # Extracts author name
            'tags': quote.css('a.tag::text').getall()         # Extracts all associated tags
        }
    next_page = response.css('li.next a::attr(href)').get()
    # Finds the link to the next page, if available
    if next_page:
        # If a next page exists, schedule a follow-up request
        yield response.follow(next_page, self.parse)
        # Recursively calls the same parse method for the next page
Scrapy Items and Item loaders
Define a structured format (fields) to store extracted data (quotes, authors, tags).
import scrapy
class QuoteItem(scrapy.Item):
    text = scrapy.Field()    # Field for the quote text
    author = scrapy.Field()  # Field for the author's name
    tags = scrapy.Field()    # Field for tags associated with the quote

from quotespider.items import QuoteItem
item = QuoteItem()
item['text'] = quote.css('span.text::text').get()  # Assigns extracted quote text to the item field
Scrapy Pipelines
Clean or transform data before storing or exporting (e.g., trim whitespace).
# In settings.py
ITEM_PIPELINES = {
    'quotespider.pipelines.QuotespiderPipeline': 300,
    # Registers the QuotespiderPipeline class and sets its priority
}
# In pipelines.py
class QuotespiderPipeline:
    def process_item(self, item, spider):
        item['text'] = item['text'].strip()  # Cleans whitespace from the quote text
        return item  # Returns the processed item to the pipeline chain
Changing Scrapy Settings
Configure how Scrapy behaves during crawling (speed, headers, robots.txt adherence).
ROBOTSTXT_OBEY = True      # Ensures the crawler respects robots.txt
DOWNLOAD_DELAY = 1         # Adds a delay between requests to avoid overloading the server
CONCURRENT_REQUESTS = 16   # Sets the maximum number of concurrent requests
USER_AGENT = 'Mozilla/5.0' # Custom user-agent to mimic a real browser
● These settings help manage crawler behavior and compliance with web etiquette.
● Middleware can modify requests and responses; useful for adding headers, retry logic, or proxy support (a sketch follows below).
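A minimal sketch of a downloader middleware that adds a header to every request (the class name and header are illustrative; the registration key assumes the quotespider project from earlier):
# In middlewares.py
class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        # Add a header to every outgoing request
        request.headers.setdefault('Accept-Language', 'en')
        return None  # continue normal processing

# In settings.py
DOWNLOADER_MIDDLEWARES = {
    'quotespider.middlewares.CustomHeaderMiddleware': 543,
}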
Introduction to Selenium
What is Selenium?
● A browser automation library used for testing and web scraping.
● Works with real browsers like Chrome, Firefox, Edge.
Use Cases:
● Scraping content that is dynamically rendered by JavaScript.
● Automating form submissions and button clicks.
Installation:
pip install selenium
BeautifulSoup vs Scrapy vs Selenium
Javascript and Web scraping
Why JavaScript matters:
● Many modern websites use JavaScript to dynamically load content
after the page initially loads. Eg. Google, Wikipedia, Facebook,
Twitter and many more are JS enabled websites.
● Example: News feeds, product listings, comment sections are often
loaded via JS.
Impact on scraping:
● BeautifulSoup and Scrapy cannot see JS-generated content.
● Selenium renders the page like a real browser, allowing access to
dynamic data.
Javascript and Selenium
How Selenium helps:
● Executes JavaScript as part of page rendering.
● Waits for JavaScript-based elements to load.
● Can interact with elements that only appear after JS execution (e.g.,
modals, infinite scroll).
Example Use Cases:
● Scraping content behind login modals.
● Clicking “Load More” buttons.
● Capturing interactive charts or dynamic tables.
Launching a Browser and Opening a Page
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Launch browser
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://example.com")
print("Page Title:", driver.title)
● Launches Chrome browser
● Opens the target webpage
● Displays the page title
Javascript Example: Load More Button
● How to automate scrolling through a JS-powered infinite scroll page and capture content?
● This example shows how Selenium interacts with JS-generated content on a real website.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Load a dynamic site
driver.get("https://infinite-scroll.com/demo/full-page/")
# Scroll down or simulate clicking to load more content
import time
last_height = driver.execute_script("return document.body.scrollHeight")
Javascript example: Load More Button
while True:
    # Scroll to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Wait for content to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # No more content to load
    last_height = new_height
# Optionally, capture loaded items
items = driver.find_elements(By.CLASS_NAME, "post")
print(f"Loaded {len(items)} posts.")
More Selenium Usage
● Finding Elements (Extract visible text or attributes from page elements)
from selenium.webdriver.common.by import By
element = driver.find_element(By.TAG_NAME, "h1")
print("Heading:", element.text)
# Find multiple elements (e.g., all links)
links = driver.find_elements(By.TAG_NAME, "a")
for link in links:
    print(link.get_attribute("href"))
More Selenium Usage
● Working with Forms and Inputs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Launch the browser
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# Open a webpage with a search box
driver.get("https://www.google.com")
# Locate the search box using its 'name' attribute (commonly 'q' on many search engines)
search_box = driver.find_element(By.NAME, "q")
# Simulate typing text into the search input field
search_box.send_keys("Python Selenium")
# Submit the form that the input belongs to (triggers the search action)
search_box.submit()
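Instead of fixed sleeps, WebDriverWait (imported in the infinite-scroll example) can wait for JS-rendered elements; a minimal sketch (the locator is illustrative, and driver is assumed from the launch example above):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block for up to 10 seconds until the element appears, then return it
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "results"))
)
print(element.text)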
Advanced Programming (DS40108):
Sample MCQs from topics after Quiz-1 for Exam
Preparation
Level: 400
Credit: 2
Domain: Data Science
Instructor: Manjish Pal
Multiprocessing and threading
Q1. What would be the output of the following code?
from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(2) as p:
        print(p.map(f, [1, 2, 3, 4]))
A) [1, 4, 9, 16]
B) May vary based on number of cores
C) Error due to function not being pickleable
D) Only first two results returned due to pool size
Q2. Which of the following is true about multiprocessing.Process?
A) It shares memory space with the main process
B) It runs in a new thread
C) It runs in a separate process with its own memory
D) It cannot run on Windows systems
A- Explanation: Pool size limits concurrency, but all inputs are processed.
C- Explanation: Processes are independent with separate memory space.
Multiprocessing and threading
Q3. Why doesn’t Python threading.Thread improve performance on CPU-bound tasks?
A) Threads are not real in Python
B) GIL prevents true parallel execution
C) CPU-bound tasks can't be threaded
D) Python threads can't access CPU
Q4. Which of the following is true about multiprocessing.Process?
A) It shares memory space with the main process
B) It runs in a new thread
C) It runs in a separate process with its own memory
D) It cannot run on Windows systems
Q5. Which of the following can result in a race condition?
A) Accessing a list in a single thread
B) Two threads updating a shared variable without locks
C) Forking a child process
D) All of the above
B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads.
C - Explanation: Processes are independent with separate memory space.
B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
Multiprocessing and threading
Q6. Why doesn’t Python threading.Thread improve performance on CPU-bound tasks?
A) Threads are not real in Python
B) GIL prevents true parallel execution
C) CPU-bound tasks can't be threaded
D) Python threads can't access CPU
Q7. Which of the following is true about multiprocessing.Process?
A) It shares memory space with the main process
B) It runs in a new thread
C) It runs in a separate process with its own memory
D) It cannot run on Windows systems
Q8. Which of the following can result in a race condition?
A) Accessing a list in a single thread
B) Two threads updating a shared variable without locks
C) Forking a child process
D) All of the above
B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads.
C - Explanation: Processes are independent with separate memory space.
B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
Asynchronous I/O
Q9. What does await asyncio.sleep(1) do?
A) Pauses the thread
B) Sleeps the coroutine
C) Blocks I/O
D) Makes the coroutine finish instantly
Q10. Which of the following is a valid asyncio pattern?
A) await asyncio.run(...)
B) asyncio.run(await foo())
C) await foo() inside a regular function
D) Use async def for coroutine definitions
B - Explanation: await asyncio.sleep(1) suspends the coroutine for 1 second without blocking the event loop
D - Explanation: async def defines an asynchronous coroutine.
Q11. When is asyncio.run() used?
A) Inside every coroutine
B) Only in child coroutines
C) To start the event loop for top-level coroutine
D) To run multiple coroutines in a loop
C- Explanation: asyncio.run() is used to run the top-level coroutine and manage the event loop.
Multiprocessing and threading
Q12. Why doesn’t Python threading.Thread improve performance on CPU-bound tasks?
A) Threads are not real in Python
B) GIL prevents true parallel execution
C) CPU-bound tasks can't be threaded
D) Python threads can't access CPU
Q13. Which of the following is true about multiprocessing.Process?
A) It shares memory space with the main process
B) It runs in a new thread
C) It runs in a separate process with its own memory
D) It cannot run on Windows systems
Q14. Which of the following can result in a race condition?
A) Accessing a list in a single thread
B) Two threads updating a shared variable without locks
C) Forking a child process
D) All of the above
B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads.
C - Explanation: Processes are independent with separate memory space.
B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
GPU Computing in Python
Q15. What does PyCUDA primarily provide?
A) Python bindings to OpenCL
B) A compiler for Python GPU code
C) Interface for writing CUDA kernels in Python
D) JIT for vector operations
Q16. Why must PyCUDA code define __global__ functions?
A) It helps PyCUDA run on CPUs
B) These are kernel functions callable from host
C) Only global variables are used in GPU
D) It is deprecated syntax
Q17. What is the purpose of drv.InOut(a) in PyCUDA?
A) It copies a to GPU only
B) It makes a mutable and GPU-accessible
C) It prevents memory leaks
D) It converts a to float32
C - Explanation: PyCUDA allows Python to interface with CUDA kernels written in C/C++
B - Explanation: __global__ marks a function as a GPU kernel callable from host.
B- Explanation: InOut makes a NumPy array available for reading and writing by the GPU.
GPU Computing in Python
Q18. What’s a key difference between PyOpenCL and PyCUDA?
A) PyOpenCL uses C++
B) PyOpenCL is cross-platform and not NVIDIA-specific
C) PyOpenCL can't handle memory buffers
D) PyOpenCL works only with AMD GPUs
Q19. What does context = cl.create_some_context() do?
A) Creates a CPU context
B) Finds a random GPU and creates context
C) Raises error if no NVIDIA GPU
D) Compiles OpenCL kernel
Q20. Which one compiles OpenCL kernels in PyOpenCL?
A) cl.KernelBuild()
B) cl.Program(context, source).build()
C) compile.opencl()
D) launch(source)
B - Explanation: PyOpenCL works across different vendors (AMD, Intel, NVIDIA) unlike PyCUDA.
B - Explanation: Explanation: It auto-selects an available device to create the execution context.
B- Explanation: PyOpenCL uses cl.Program(...).build() to compile kernel source.
GPU Computing in Python
Q21. What does @numba.jit do?
A) Interprets Python at runtime
B) Compiles Python to C++
C) Compiles annotated function to optimized machine code
D) Optimizes only NumPy functions
Q22. What mode does @jit(nopython=True) enforce?
A) Python fallback on error
B) Full JIT optimization with no Python object overhead
C) Debug mode
D) Safe JIT sandboxing
Q23. What does cp.asarray() do?
A) Copies NumPy array to CPU
B) Converts NumPy array to GPU CuPy array
C) Frees memory
D) Flattens CuPy array
C- Explanation: Numba compiles functions to fast native code using LLVM.
B - Explanation: nopython=True forces use of only native types for best performance.
B- Explanation: It creates a CuPy array on the GPU from a NumPy array.
GPU Computing in Python
Q24. CuPy array operations:
A) Use NumPy backend
B) Are CPU-bound
C) Are run asynchronously on GPU
D) Block GPU context
Q25. What is @tf.function used for?
A) Declares TensorFlow constants
B) Converts Python code to a computational graph
C) Enables multi-GPU training
D) Uses PyTorch API
Q26. How do you run tensorflow training on GPU?
A) Use tf.device('CPU')
B) Use cuda=True flag
C) Ensure CUDA/cuDNN and GPU drivers are installed
D) Install CuPy
C- Explanation: Operations are GPU-accelerated and executed asynchronously.
B - Explanation: TensorFlow compiles annotated Python code into an efficient static graph.
C- Explanation: TensorFlow auto-detects GPU if drivers and CUDA/cuDNN are installed.
RegEX in Python
Q27. What does the regular expression r'\b\w{4}\b' match?
A) Words longer than 4 characters
B) Exactly 4-letter words
C) Any 4 characters
D) All uppercase 4-letter words
Q28. What will re.findall(r'[a-z]{3,}', 'abc defg h ijkl') return?
A) ['abc', 'defg', 'ijkl']
B) ['abc', 'defg', 'h', 'ijkl']
C) ['abc', 'defg', 'ij']
D) ['abc', 'defg']
Q29. What is the output of re.sub(r'(\d+)', r'#\1#', 'abc123xyz456')?
A) abc123xyz456
B) abc#123#xyz#456#
C) abc#xyz#
D) abc#1#xyz#2#
B - Explanation: \b is a word boundary and \w{4} matches exactly 4 word characters, so the regex matches words that are exactly 4 letters long.
A - Explanation: [a-z]{3,} matches sequences of lowercase letters that are 3 or more characters long.
B - Explanation: \d+ matches runs of digits; \1 references the captured digits, and the replacement wraps them with #.
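A quick runnable check of all three answers (the sample sentence in Q27 is illustrative):
import re
print(re.findall(r'\b\w{4}\b', 'This code runs fast'))
# ['This', 'code', 'runs', 'fast'] -- only the exactly-4-letter words (Q27)
print(re.findall(r'[a-z]{3,}', 'abc defg h ijkl'))
# ['abc', 'defg', 'ijkl'] (Q28)
print(re.sub(r'(\d+)', r'#\1#', 'abc123xyz456'))
# abc#123#xyz#456# (Q29)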
BeautifulSoup, Scrapy and Selenium
Q30. What does soup.find('p') return?
A) All <p> tags
B) First <p> tag
C) Raises error
D) List of tags
Q31. What does soup.select('.title') do?
A) Finds tags with class="title"
B) Selects titles from database
C) Parses XML only
D) Finds all h1 elements
Q32. Which parser is fastest in most cases?
A) html.parser
B) lxml
C) html5lib
D) xml.sax
B - Explanation: find() returns the first occurrence of the specified tag.
A - Explanation: .select() allows CSS-style selection for classes and tags.
B - Explanation: lxml is usually the fastest and most efficient parser.
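For Q30–Q32, a minimal sketch on an inline HTML snippet (uses the lxml parser; substitute 'html.parser' if lxml is not installed):
from bs4 import BeautifulSoup
html = """
<p class="title">First</p>
<p>Second</p>
<h2 class="title">Heading</h2>
"""
soup = BeautifulSoup(html, 'lxml')
print(soup.find('p'))         # first <p> tag only (Q30)
print(soup.select('.title'))  # every tag with class="title" (Q31)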
BeautifulSoup, Scrapy and Selenium
Q33. What is parse() in a Scrapy spider?
A) Compiles CSS
B) Parses the settings
C) Default callback for start_requests
D) Runs before spider starts
Q34. What does yield scrapy.Request(...) do?
A) Starts a new thread
B) Schedules a request to be processed
C) Parses settings file
D) Restarts spider
Q35. How do you extract multiple values from a selector?
A) Use extract_first()
B) Use extract_all()
C) Use get()
D) Use extract()
C - Explanation: parse() is the default callback that handles responses to the requests generated from start_urls.
B - Explanation: Scrapy schedules the yielded request and processes it asynchronously.
D - Explanation: .extract() (or .getall()) fetches all matches from a selector.
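For Q33–Q35, a minimal spider sketch against the same quotes.toscrape.com site used earlier in the course (the spider name quotes_demo is illustrative):
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes_demo"
    start_urls = ['https://quotes.toscrape.com']

    # parse() is the default callback for responses to the start requests (Q33)
    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'tags': quote.css('a.tag::text').getall(),  # all matches; .extract() is the older alias (Q35)
            }
        next_page = response.css('li.next a::attr(href)').get()
        if next_page:
            # yielding a Request schedules it on Scrapy's asynchronous engine (Q34)
            yield response.follow(next_page, self.parse)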

Advanced Programming (DS40108)_ Working with File Paths and Directories in Python.pptx

  • 1.
    Advanced Programming (DS40108): Workingwith File Paths and Directories in Python Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 2.
    What is aFile path ● A file path tells you where a file or directory is located in your system ● Two types: ○ Absolute path: Starts from the root (C:Users..., /home/user/...) ○ Relative path: Relative to the current working directory # Absolute path "C:/Users/John/Documents/file.txt" # Relative path "./data/file.txt"
  • 3.
    Working Directory ● Pythonstarts running from a "current working directory" Using os: import os print(os.getcwd()) Using pathlib: from pathlib import Path print(Path.cwd()) ● Changing the Working Directory os.chdir(path) – change the current working directory os.chdir('C:/Users/John/Desktop') print(os.getcwd()) # Check new location
  • 4.
    Building File Paths Usingos.path.join(): folder = "data" filename = "report.csv" path = os.path.join(folder, filename) print(path) # "data/report.csv" Using pathlib: p = Path("data") / "report.csv" print(p) # data/report.csv Using os.path.join(): folder = "data" filename = "report.csv" path = os.path.join(folder, filename) print(path) # "data/report.csv" Using pathlib: p = Path("data") / "report.csv" print(p) # data/report.csv
  • 5.
    Directories ● Checking File/DirectoryExistence os.path.exists() and pathlib.Path.exists() # Using os os.path.exists("data/report.csv") # Using pathlib Path("data/report.csv").exists() ● Creating Directories # os os.mkdir("my_folder") # pathlib Path("my_folder").mkdir() - Use exist_ok=True to avoid errors if folder exists Path("my_folder").mkdir(exist_ok=True)
  • 6.
    Directories ● Creating NestedDirectories # With os os.makedirs("projects/2025/reports") # With pathlib Path("projects/2025/reports").mkdir(parents=True, exist_ok=True) parents=True creates all intermediate folders ● Listing Files in a Directory # os os.listdir(".") # pathlib list(Path(".").iterdir()) You can also filter files: [p for p in Path(".").iterdir() if p.is_file()]
  • 7.
    Directories ● File vsDirectory Check # os os.path.isfile("example.txt") os.path.isdir("folder") # pathlib p = Path("example.txt") p.is_file() p.is_dir() ● Deleting Files and Directories # Deleting files os.remove("old.txt") Path("old.txt").unlink() # Deleting empty directories os.rmdir("empty_folder") Path("empty_folder").rmdir() For non-empty folders, use shutil.rmtree()
  • 8.
    Directories ● Cross-Platform Paths Useos.path or pathlib to avoid hardcoding path separators like / or # Good Path("data") / "file.csv" # Bad "datafile.csv" # Windows-only ● Temporary Directories Use tempfile module when working with temporary files/folders import tempfile with tempfile.TemporaryDirectory() as tmpdir: print("Temporary folder created at:", tmpdir) FileNotFoundError Wrong path or missing file PermissionError Lack of write permission OSError Path doesn’t exist
  • 9.
    Lab Activity Problem 1:Move all .txt files from Downloads/ to TextFiles/ Problem 2: Create a folder Reports2025, create subfolders: Jan, Feb, Mar, create an empty file summary.txt in each and finally list all .txt files in Reports2025 from pathlib import Path src = Path("Downloads") dst = Path("TextFiles") dst.mkdir(exist_ok=True) for file in src.glob("*.txt"): file.rename(dst / file.name)
  • 10.
    solutions import os import shutil #Define source and destination directories source_dir = 'Downloads' dest_dir = 'TextFiles' # Create destination directory if it doesn't exist os.makedirs(dest_dir, exist_ok=True) # Iterate through all files in the source directory for filename in os.listdir(source_dir): if filename.endswith('.txt'): src_path = os.path.join(source_dir, filename) dest_path = os.path.join(dest_dir, filename) # Move the file shutil.move(src_path, dest_path) print(f"Moved: {filename}")
  • 11.
    Solutions (using os) importos # Step 1: Create main directory main_folder = 'Reports2025' os.makedirs(main_folder, exist_ok=True) # Step 2: Create subfolders subfolders = ['Jan', 'Feb', 'Mar'] for month in subfolders: path = os.path.join(main_folder, month) os.makedirs(path, exist_ok=True)
  • 12.
    solutions # Step 3:Create an empty summary.txt in each subfolder summary_path = os.path.join(path, 'summary.txt') open(summary_path, 'w').close() # Creates an empty file # Step 4: List all .txt files in Reports2025 print("List of .txt files in Reports2025:") for root, dirs, files in os.walk(main_folder): for file in files: if file.endswith('.txt'): print(os.path.join(root, file))
  • 13.
    Solutions (using Path) frompathlib import Path # Step 1: Create main directory main_folder = Path('Reports2025') main_folder.mkdir(exist_ok=True) # Step 2: Create subfolders subfolders = ['Jan', 'Feb', 'Mar'] for month in subfolders: month_folder = main_folder / month month_folder.mkdir(exist_ok=True) # Step 3: Create an empty summary.txt in each subfolder summary_file = month_folder / 'summary.txt' summary_file.touch(exist_ok=True) # Creates empty file if it doesn't exist # Step 4: List all .txt files in Reports2025 print("List of .txt files in Reports2025:") for txt_file in main_folder.rglob('*.txt'): print(txt_file)
  • 14.
    Advanced Programming (DS40108): Multiprocessingand Threading in Python Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 15.
  • 16.
    Concurrency & Parallelism ●Difference between concurrency and parallelism Term Meaning Example Concurrency Multiple tasks making progress together Switching between downloads Parallelism Tasks running at the same time on cores Downloading multiple files at once
  • 17.
  • 18.
    Threading in Python ●Why Use Threading? - Useful for I/O-bound tasks like: Network requests, File reading/writing, User interaction - More time efficient. ● Creating a Thread import threading def print_msg(): print("Hello from a thread!") t = threading.Thread(target=print_msg) t.start() t.join() - start() runs the thread - join() waits for it to finish
  • 19.
    Threading in Python ●Creating Multiple Threads: import threading def count(): for i in range(10000): print(“Counting: ”, i) # Launch two threads t1 = threading.Thread(target=count) t2 = threading.Thread(target=count) t1.start() t2.start() t1.join() t2.join() ● May see interleaved output (non-deterministic)
  • 20.
    Problems with Threadingin Python ● Race Condition : Several threads trying to access a common variable can lead to unexpected results.
  • 21.
    Race Condition import threading #Shared variable counter = 0 def increment(): global counter for _ in range(100000): counter += 1 # Create multiple threads t1 = threading.Thread(target=increment) t2 = threading.Thread(target=increment) # Start threads t1.start() t2.start() # Wait for them to finish t1.join() t2.join() print("Expected counter = 200000") print("Actual counter = “,counter)
  • 22.
    Solving Race Conditions lock= threading.Lock() def safe_increment(): global counter for _ in range(100000): with lock: counter += 1 ● Python’s Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time, but it does not guarantee atomicity of operations like counter += 1. ● While the GIL does not prevent race conditions on all operations, it can prevent race conditions when you're working with atomic operations on built-in types (like append() on a list or += on small integers under certain conditions) — but only in CPython, and it's still not something we should rely on for correctness.
  • 23.
    Threading in Python(Execution Time) import time def print_fib(number: int) -> None: def fib(n: int) -> int: if n == 1: return 0 elif n == 2: return 1 else: return fib(n - 1) + fib(n - 2) print(“fib(“,number,”) is”, fib(number)) start = time.time() print_fib(40) print_fib(41) end = time.time() print(“Completed in”, end – start, “seconds”)
  • 24.
    Threading in Python(Execution Time) import threading import time def print_fib(number: int) -> None: def fib(n: int) -> int: if n == 1: return 0 elif n == 2: return 1 else: return fib(n - 1) + fib(n - 2)
  • 25.
    Threading in Python(Execution Time) def fibs_with_threads(): fortieth_thread = threading.Thread(target=print_fib, args=(40,)) forty_first_thread = threading.Thread(target=print_fib, args=(41,)) fortieth_thread.start() forty_first_thread.start() fortieth_thread.join() forty_first_thread.join() start_threads = time.time() fibs_with_threads() end_threads = time.time() print(“Threads took”, end_threads - start_threads, “seconds.”)
  • 26.
    Lab Activity ● Create3 threads that each count down from a given number to 0, with a delay of 1 second between prints. ● Simulate 3 threads that “download” different files (just sleep for a few seconds) and print progress messages.
  • 27.
    Solutions import threading import time defcountdown(n, name): while n > 0: print(name,”counting down:",n) time.sleep(1) n -= 1 print(f"{name} finished!")
  • 28.
    Solutions # Create andstart threads t1 = threading.Thread(target=countdown, args=(5, "Thread-A")) t2 = threading.Thread(target=countdown, args=(3, "Thread-B")) t3 = threading.Thread(target=countdown, args=(4, "Thread-C")) t1.start() t2.start() t3.start() t1.join() t2.join() t3.join() print("All countdowns completed!")
  • 29.
    Solutions import threading import time defdownload_file(file_name, duration): print(f"Starting download: {file_name}") time.sleep(duration) print(f"Finished download: {file_name}")
  • 30.
    Solutions # List ofmock files with "download times" files = [("file1.txt", 3), ("file2.jpg", 5), ("file3.mp4", 2),] threads = [] for file_name, duration in files: t = threading.Thread(target=download_file, args=(file_name, duration)) threads.append(t) t.start() for t in threads: t.join() print("All downloads completed!")
  • 31.
    Advanced Programming (DS40108): MultiprocessingPython Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 32.
    Multiprocessing in Python WhyUse Multiprocessing? ● Bypasses the GIL ● Ideal for CPU-bound tasks like: ○ Image processing ○ Data crunching ○ Simulations
  • 33.
    Comparison of Threadsand Processes Aspect Thread Process Memory Shares memory with other threads Independent memory space Speed Faster to start and less overhead Slower due to memory isolation. Use case I/O bound task CPU bound task Crash Isolation If one thread crashes it affects others There is provision to avoid other processes being affected due to crash of one process Communication Shared Memory (faster but risky-race conditions) Use queues (slower but safer) GIL impact Affected by GIL Bypasses GIL
  • 34.
    Processes ● Creating aProcess from multiprocessing import Process def say_hi(): print("Hello from a process!") p = Process(target=say_hi) p.start() p.join() ● Multiple Processes Example def compute(): for _ in range(5): print("Computing...") p1 = Process(target=compute) p2 = Process(target=compute) p1.start() p2.start() p1.join() p2.join()
  • 35.
    Processes ● Process withArguments def square(n): print("{n}^2 = “, n*n) p = Process(target=square, args=(5,)) p.start() ● Using Process Pool from multiprocessing import Pool def square(x): return x * x with Pool(4) as pool: results = pool.map(square, [1, 2, 3, 4]) print(results) ● Pool handles worker processes ● Automatically distributes workload
  • 36.
    Processes ● Inter-Process Communication Usemultiprocessing.Queue or multiprocessing.Pipe from multiprocessing import Queue def producer(q): q.put("data") q = Queue() p = Process(target=producer, args=(q,)) p.start() print(q.get()) p.join()
  • 37.
    Lab Activity 1. Writea function greet(name) that prints "Hello, <name>!". Create threads for Alice, Bob, and Charlie. 2. Use multiprocessing.Pool to compute squares of a list of numbers. 3. Use 3 processes to "write" to different files (simulate using print and time.sleep())
  • 38.
    solns. from multiprocessing importPool def square(x): return x * x if __name__ == "__main__": with Pool(4) as pool: results = pool.map(square, [1, 2, 3, 4, 5]) print("Squares:", results)
  • 39.
    Solns from multiprocessing importProcess import time def write_file(file_name): print(f"Writing to {file_name}...") time.sleep(2) print(f"Finished writing to {file_name}")
  • 40.
    Solns if __name__ =="__main__": files = ["file1.txt", "file2.txt", "file3.txt"] processes = [] for f in files: p = Process(target=write_file, args=(f,)) processes.append(p) p.start() for p in processes: p.join() print("All files processed.")
  • 41.
    Advanced Programming (DS40108): AsynchronousI/O Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 42.
    Introduction and Motivation 1.What is I/O? ● I/O (Input/Output) refers to communication between the computer and the outside world (keyboard, disk, network, etc.). ● Examples: reading a file from disk, fetching data from a website, sending a message to a server. 1. Why is I/O often slow? ● Most I/O operations involve waiting on external systems: Disk read/write: mechanical delays, Network I/O: latency and server delays ● CPU is idle while waiting for I/O to complete
  • 43.
    Introduction and Motivation TraditionalProgram Flow (Synchronous I/O) 1. Do Task A 2. Wait for I/O 3. Do Task B Problem: Entire program blocks during I/O Wasted time = Inefficiency import time def read_file(): print("Reading file...") time.sleep(2) print("File read complete!") read_file() print("Next task")
  • 44.
    Introduction and Motivation AsynchronousI/O ● I/O operations are non-blocking. ● Useful for handling multiple I/O-bound tasks efficiently. ● asyncio in Python import asyncio async def read_file(): print("Reading file...") await asyncio.sleep(2) print("File read complete!") async def main(): await read_file() print("Next task") asyncio.run(main())
  • 45.
    Comparison Features Synchronous Asynchronous ExecutionBlocking Non Blocking Efficiency Less efficient for I/O More Efficient for I/O Complexity Easy Requires event loop, async wait Use case Simple scripts Web servers, I/O heavy apps
  • 46.
    The Event Loop& Syntax What is an Event Loop? ● Central controller of async programs ● Runs and manages all async tasks async / await Syntax: import asyncio async def greet(): print("Hello") await asyncio.sleep(1) print("World") asyncio.run(greet()) async def defines an async function await pauses function until result is ready asyncio.run() starts the event loop and runs the coroutine
  • 47.
    asyncio Basics asyncio.sleep(): Simulatesnon-blocking delay async def main(): print("Start") await asyncio.sleep(2) print("End") asyncio.run(main())
  • 48.
    asyncio Basics asyncio.gather() —Run in parallel async def task(name, delay): print("Starting”, name) await asyncio.sleep(delay) print(name,”done") async def main(): await asyncio.gather( task("A", 2), task("B", 1) ) asyncio.run(main())
  • 49.
    asyncio Basics ● asyncio.create_task() asyncdef task(name, delay): await asyncio.sleep(delay) print(name, “done") async def main(): t1 = asyncio.create_task(task("Task1", 2)) t2 = asyncio.create_task(task("Task2", 1)) await t1 await t2 asyncio.run(main())
  • 50.
    Advanced Programming (DS40108): GPUComputing in Python Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 51.
    Introduction to GPUComputing in Python: CUDA vs. OpenCL Understand the role and architecture of GPUs in modern computing Understand the use of CuPy which is GPU accelerated Numpy like syntax. Use Numba to write CUDA programs in Python. We can also also use PyCUDA. Use PyOpenCL for OpenCL programming in Python Compare CUDA and OpenCL based on performance, portability, and ease of use
  • 52.
    CuPy: GPU acceleratedcomputing with Numpy like syntax What is CuPy? ● NumPy-compatible array library that runs on NVIDIA GPUs ● Developed by Preferred Networks ● Uses CUDA under the hood Why CuPy? ● No new syntax – uses NumPy-like API ● Automatically dispatches computations to GPU ● Great for vectorized math, matrix ops, FFTs, and more Use Cases: ● GPU-accelerated data science ● Deep learning preprocessing ● Replacing slow NumPy CPU cod
  • 53.
    CuPy vs Numpy FeatureNumpy CuPy Runs on CPU GPU Syntax Standard Numpy Almost identical Performance Lower for Large arrays Higher for Large Arrays Dependencies None Requires CUDA
  • 54.
    CuPy Basics Import andArray Creation: import cupy as cp a = cp.array([1, 2, 3]) b = cp.arange(10) c = cp.random.rand(3, 3) CuPy to/from NumPy: import numpy as np a = np.array([1, 2, 3]) b = cp.asarray(a) # NumPy CuPy → c = cp.asnumpy(b) # CuPy NumPy →
  • 55.
    CuPy : VectorAddition N = 1_000_000 a = cp.random.rand(N).astype(cp.float32) b = cp.random.rand(N).astype(cp.float32) import time cp.cuda.Device(0).synchronize() start = time.time() c = a + b cp.cuda.Device(0).synchronize() end = time.time() print("CuPy vector addition time: {:.3f} ms".format((end - start) * 1000))
  • 56.
    Compare with Numpy a_cpu= a.get() b_cpu = b.get() start = time.time() c_cpu = a_cpu + b_cpu end = time.time() print("NumPy vector addition time: {:.3f} ms".format((end - start) * 1000))
  • 57.
    CuPy Matrix Multiplication A= cp.random.rand(1024, 1024) B = cp.random.rand(1024, 1024) cp.cuda.Device(0).synchronize() start = time.time() C = cp.matmul(A, B) cp.cuda.Device(0).synchronize() end = time.time() print("CuPy matrix multiplication time: {:.3f} ms".format((end - start) * 1000))
  • 58.
    Performance comparison ofCuPy and Numpy ● Numpy equivalent of vector addition a_np = cp.asnumpy(a) b_np = cp.asnumpy(b) start = time.time() c_np = a_np + b_np end = time.time() print("NumPy vector addition time: {:.3f} ms".format((end - start) * 1000)) ● NumPy Equivalent for Matrix Multiplication: A_np = cp.asnumpy(A) B_np = cp.asnumpy(B) start = time.time() C_np = np.matmul(A_np, B_np) end = time.time() print("NumPy matrix multiplication time: {:.3f} ms".format((end - start) * 1000))
  • 59.
    Key Observations ● CuPydemonstrates significant performance improvements over NumPy for large-scale computations due to GPU acceleration. ● For smaller datasets, the overhead of data transfer between CPU and GPU may negate performance gains. ● CuPy offers a seamless transition for NumPy users to leverage GPU acceleration. ● Ideal for large-scale numerical computations, machine learning preprocessing, and scientific simulations.
  • 60.
    CPU vs GPU ●CPU: Few powerful cores (good for serial tasks) ● GPU: Many simple cores (great for parallel tasks) Applications: ● Deep learning (PyTorch, TensorFlow) ● Simulations (climate, physics) ● Image & signal processing
  • 61.
    Numba Numba is aJust-In-Time (JIT) compiler that can compile Python functions to optimized machine code using LLVM (Low Level Virtual Machine). With @cuda.jit, you can run Python code directly on the GPU (using CUDA).
  • 62.
    CUDA Programming Model CUDA(Compute Unified Device Architecture) Thread hierarchy: gridDim, blockDim, threadIdx, blockIdx Memory hierarchy: ● Global ● Shared ● Constant ● Local
  • 63.
    CPU vs GPU ●Compute the element-wise square of a matrix (CPU) import numpy as np import time # Simple element-wise square N = 1_000_000 a = np.arange(N) start = time.time() b = a * a print("CPU time:", time.time() - start)
  • 64.
    CPU vs GPU fromnumba import cuda @cuda.jit //Converts the square_gpu function into a GPU kernel. def square_gpu(a, b): i = cuda.grid(1) // Computes the global thread index in 1D. For example, thread 0 handles element 0, thread 1 handles element 1, etc. if i < a.size: b[i] = a[i] * a[i] # Allocate arrays a_gpu = np.arange(N, dtype=np.float32) b_gpu = np.zeros_like(a_gpu)
  • 65.
    CPU vs GPU #Set up thread/block config threads_per_block = 256 blocks_per_grid = (a_gpu.size + threads_per_block - 1) // threads_per_block //GPUs run threads in groups called blocks. //We choose 256 threads per block (common convention). # Time the GPU kernel start_gpu = time.time() square_gpu[blocks_per_grid, threads_per_block](a_gpu, b_gpu) //This syntax tells Numba to run the GPU function in parallel. cuda.synchronize() //waits until the GPU is done computing before recording the time. end_gpu = time.time() print("GPU Time:", end_gpu - start_gpu)
  • 66.
    CUDA vector addition fromnumba import cuda import numpy as np import time @cuda.jit def vector_add(a, b, c): idx = cuda.grid(1) if idx < a.size: c[idx] = a[idx] + b[idx]
  • 67.
    CUDA vector addition N= 1_000_000 a = np.arange(N, dtype=np.float32) b = np.arange(N, dtype=np.float32) c = np.zeros_like(a) threads_per_block = 256 blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block start = time.time() vector_add[blocks_per_grid, threads_per_block](a, b, c) cuda.synchronize() print("CUDA time:", time.time() - start) print("c[0] =", c[0]) # Should be 0 + 0
  • 68.
    CUDA vector multiplication @cuda.jit defelementwise_multiply(a, b, out): i = cuda.grid(1) if i < a.size: out[i] = a[i] * b[i] N = 1000000 a = np.full(N, 2.0, dtype=np.float32) b = np.full(N, 3.0, dtype=np.float32) out = np.zeros_like(a) elementwise_multiply[blocks_per_grid, threads_per_block](a, b, out) cuda.synchronize() print("out[0] =", out[0]) # should be 6.0
  • 69.
    CUDA matrix Addition @cuda.jit defmatrix_add(A, B, C): row, col = cuda.grid(2) if row < A.shape[0] and col < A.shape[1]: C[row, col] = A[row, col] + B[row, col] rows, cols = 512, 512 A = np.ones((rows, cols), dtype=np.float32) B = np.ones((rows, cols), dtype=np.float32) C = np.zeros((rows, cols), dtype=np.float32)
  • 70.
    CUDA matrix addition threads_per_block= (16, 16) blocks_per_grid_x = (A.shape[0] + threads_per_block[0] - 1) // threads_per_block[0] blocks_per_grid_y = (A.shape[1] + threads_per_block[1] - 1) // threads_per_block[1] matrix_add[(blocks_per_grid_x, blocks_per_grid_y), threads_per_block](A, B, C) cuda.synchronize() print("C[0, 0] =", C[0, 0]) # Should be 2.0
  • 71.
    PyCuda ● CUDA: NVIDIA'sparallel computing architecture ● PyCUDA: Python wrapper for CUDA (via pycuda library) ● Offers GPU acceleration with Python using NVIDIA GPUs
  • 72.
    PyCUDA ● Native CUDAin Python ● High-level APIs + access to raw kernels ● Fast prototyping for Python users Features: ● Uses numpy arrays on host ● Device functions written in CUDA C ● Easy memory transfer
  • 73.
    Pycuda Setup andSyntax ● Basic Setup import pycuda.autoinit import pycuda.driver as cuda from pycuda.compiler import SourceModule import numpy as np ● Kernel in Cuda mod = SourceModule(""" __global__ void add(float *a, float *b, float *c) { int idx = threadIdx.x; c[idx] = a[idx] + b[idx]; } """) add_func = mod.get_function("add")
  • 74.
    Full Example a =np.random.randn(256).astype(np.float32) b = np.random.randn(256).astype(np.float32) c = np.empty_like(a) add_func( cuda.In(a), cuda.In(b), cuda.Out(c), block=(256,1,1), grid=(1,1) ) print(c[:5])
  • 75.
    PyCUDA Memory Management Typesof Memory: ● cuda.In() – Host to device ● cuda.Out() – Device to host ● cuda.InOut() – Bidirectional Manual allocation: a_gpu = cuda.mem_alloc(a.nbytes) cuda.memcpy_htod(a_gpu, a)
  • 76.
    Comparison with Numba FeaturePyCUDA Numba Language CUDA C + Python Pure Python Control More Control Simpler Syntax Compilation Ahead of Time Just in Time
  • 77.
    PyCUDA - VectorAddition import pycuda.autoinit import pycuda.driver as cuda from pycuda.compiler import SourceModule import numpy as np mod = SourceModule(""" __global__ void vec_add(float *a, float *b, float *c, int N) { int idx = threadIdx.x + blockIdx.x * blockDim.x; if (idx < N) c[idx] = a[idx] + b[idx]; } """)
  • 78.
    PyCUDA Vector Addition N= 1_000_000 a = np.random.rand(N).astype(np.float32) b = np.random.rand(N).astype(np.float32) c = np.empty_like(a) func = mod.get_function("vec_add") start = cuda.Event() end = cuda.Event() start.record() func(cuda.In(a), cuda.In(b), cuda.Out(c), np.int32(N), block=(256,1,1), grid=(N // 256,1)) end.record() end.synchronize() time_ms = start.time_till(end) print("Vector addition (PyCUDA) time: {:.3f} ms".format(time_ms))
  • 79.
    PyCUDA Vector Multiplication mod= SourceModule(""" __global__ void vec_mul(float *a, float *b, float *c, int N) { int idx = threadIdx.x + blockIdx.x * blockDim.x; if (idx < N) c[idx] = a[idx] * b[idx]; } """) a = np.random.rand(N).astype(np.float32) b = np.random.rand(N).astype(np.float32) c = np.empty_like(a)
  • 80.
    PyCUDA vector multiplication func= mod.get_function("vec_mul") start = cuda.Event() end = cuda.Event() start.record() func(cuda.In(a), cuda.In(b), cuda.Out(c), np.int32(N), block=(256,1,1), grid=(N // 256,1)) end.record() end.synchronize() time_ms = start.time_till(end) print("Vector multiplication (PyCUDA) time: {:.3f} ms".format(time_ms))
  • 81.
    PyCUDA Matrix Addition mod= SourceModule(""" __global__ void mat_add(float *A, float *B, float *C, int width, int height) { int row = blockIdx.y * blockDim.y + threadIdx.y; int col = blockIdx.x * blockDim.x + threadIdx.x; int idx = row * width + col; if (row < height && col < width) C[idx] = A[idx] + B[idx]; } """) width, height = 1024, 1024 size = width * height
  • 82.
    PyCUDA Matrix Addition A= np.random.rand(height, width).astype(np.float32) B = np.random.rand(height, width).astype(np.float32) C = np.empty_like(A) func = mod.get_function("mat_add") start = cuda.Event() end = cuda.Event() start.record() func(cuda.In(A), cuda.In(B), cuda.Out(C), np.int32(width), np.int32(height), block=(16,16,1), grid=(width//16, height//16)) end.record() end.synchronize() time_ms = start.time_till(end) print("Matrix addition (PyCUDA) time: {:.3f} ms".format(time_ms))
  • 83.
    OpenCL (Open ComputingLanguage) A framework for CPUs and GPUs
  • 84.
    PyOpenCL ● Python wrapperfor the OpenCL API ● Enables GPU and parallel programming from Python ● Supports CPUs, GPUs, FPGAs across vendors (AMD, Intel, NVIDIA) ● Combines flexibility of Python with power of OpenCL
  • 85.
    Advantages of PyOpenCL ●Vendor-neutral (runs on many types of devices) ● Portable: Write once, run anywhere ● Fine-grained control over device, kernel, memory ● Great for: ○ Heterogeneous computing ○ Custom GPU kernel development ○ Research prototypes in Python
  • 86.
    Advantages of PyOpenCLover Numba ● Cross-Platform Compatibility: ○ Runs on GPUs, CPUs, FPGAs from multiple vendors ○ Ideal for heterogeneous computing ● Explicit Device and Memory Control: ○ Better control over buffer allocation and kernel dispatch ● Support for Non-NVIDIA Hardware: ○ Works with Intel, AMD, Apple M-series GPUs ● Standards-Based: ○ Follows Khronos OpenCL specification ● Advanced Kernel Features: ○ Access to OpenCL-specific tuning like workgroups, barriers, local memory ● OpenCL Interoperability: ○ Can be integrated into C/C++ or multi-language systems
  • 87.
    First PyOpenCL program Performparallel computation on a GPU (or any OpenCL-supported device) from Python using PyOpenCL. Set up an OpenCL context and command queue. Transfer data between host (CPU) and device (GPU). Define a simple GPU kernel in OpenCL C that adds two vectors. Execute that kernel on the GPU. Retrieve and verify the result on the host.
  • 88.
    First PyOpenCL program importpyopencl as cl import numpy as np # Create OpenCL context using any available platform and device devices = cl.get_platforms()[0].get_devices() ctx = cl.Context(devices=devices) # Create a command queue to submit work to the device queue = cl.CommandQueue(ctx) # Prepare input arrays (on host/CPU) a = np.array([1, 2, 3, 4], dtype=np.float32) b = np.array([5, 6, 7, 8], dtype=np.float32) c = np.empty_like(a) # output array (initially empty) # Memory flags for buffer creation mf = cl.mem_flags
  • 89.
    First PyOpenCL program #Transfer data from host to device memory (READ_ONLY + COPY_HOST_PTR) a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a) b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b) # Allocate device memory for output array (WRITE_ONLY) c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes) # Define the OpenCL kernel for vector addition program = cl.Program(ctx, """ __kernel void vector_add(__global const float *a, __global const float *b, __global float *c) { int gid = get_global_id(0); // unique thread index c[gid] = a[gid] + b[gid]; // element-wise addition } """).build()
  • 90.
    First PyOpenCL program #Launch the kernel: (global work size = size of input array) program.vector_add(queue, a.shape, None, a_buf, b_buf, c_buf) # Copy result from device (GPU) memory back to host (CPU) memory cl.enqueue_copy(queue, c, c_buf) # Print the result array print("Result:", c)
  • 91.
    Key Concepts inan OpenCl program ● Context: Environment for kernel execution ● CommandQueue: Submits work to device ● Buffers: Transfer data to/from GPU memory ● Kernel: C-like OpenCL function compiled at runtime ● Global ID: Index of the current work item
  • 92.
    Vector Squaring program =cl.Program(ctx, """ __kernel void square(__global float *a, __global float *b) { int gid = get_global_id(0); b[gid] = a[gid] * a[gid]; } """).build() program.square(queue, a.shape, None, a_buf, b_buf) cl.enqueue_copy(queue, b, b_buf)
  • 93.
    Vector Multiplication program =cl.Program(ctx, """ __kernel void multiply(__global float *a, __global float *b, __global float *out) { int gid = get_global_id(0); out[gid] = a[gid] * b[gid]; } """).build() program.multiply(queue, a.shape, None, a_buf, b_buf, out_buf) cl.enqueue_copy(queue, out, out_buf)
  • 94.
    2D Matrix Addition program= cl.Program(ctx, """ __kernel void mat_add(__global float* A, __global float* B, __global float* C, int width) { int row = get_global_id(0); int col = get_global_id(1); int idx = row * width + col; C[idx] = A[idx] + B[idx]; } """).build() program.mat_add(queue, (rows, cols), None, A_buf, B_buf, C_buf, np.int32(cols)) cl.enqueue_copy(queue, C, C_buf)
  • 95.
    GPU Acceleration usingTensorflow ● TensorFlow automatically utilizes available GPU for computation ● Supports NVIDIA GPUs via CUDA and cuDNN ● Use tf.config.list_physical_devices('GPU') to check availability
  • 96.
    Tensorflow basics pip installtensorflow ● Alternatively: tensorflow-gpu (for legacy versions) ● Requires CUDA Toolkit and cuDNN installed (see TensorFlow compatibility chart) Check GPU Access import tensorflow as tf print("Num GPUs Available:",len(tf.config.list_physical_devices('GPU')))
  • 97.
    Tensorflow - VectorAddition import tensorflow as tf import time N = 1000000 a = tf.random.normal([N]) b = tf.random.normal([N]) start = time.time() c = tf.add(a, b) tf.print("Vector Addition Time (GPU):", time.time() - start)
  • 98.
    Tensorflow - MatrixAddition import tensorflow as tf import time M, N = 512, 512 a = tf.random.normal([M, N]) b = tf.random.normal([M, N]) start = time.time() c = tf.add(a, b) tf.print("Matrix Addition Time (GPU):", time.time() - start)
  • 99.
    Tensorflow - MatrixMultiplication import tensorflow as tf import time a = tf.random.normal([1000, 1000]) b = tf.random.normal([1000, 1000]) start = time.time() c = tf.matmul(a, b) tf.print("Matrix Multiplication Time (GPU):", time.time() - start)
  • 100.
    Further features ofTensorflow with tf.device('/GPU:0'): result = tf.matmul(a, b) with tf.device('/CPU:0'): result = tf.matmul(a, b) import tensorflow as tf Enable XLA (Accelerated Linear Algebra) for further speed-up: # Enable XLA globally tf.config.optimizer.set_jit(True) @tf.function(jit_compile=True) def matmul_xla(a, b): return tf.matmul(a, b) a = tf.random.normal([512, 512]) b = tf.random.normal([512, 512]) tf.print(matmul_xla(a, b)[0][0])
  • 101.
    Tensorflow Advantages ● TensorFlowabstracts GPU usage – easy to deploy without writing kernels ● Ideal for ML and deep learning ● Supports vectorized ops: addition, multiplication, matrix ops ● Use TensorBoard for advanced profiling ● Supports mixed precision and distributed training on multi-GPU setups
  • 102.
    Advanced Programming (DS40108): DataExtraction from Web in Python (BeautifulSoup + Scrapy) Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 103.
    Introduction to WebScraping What is Web Scraping? Web scraping is the process of automatically retrieving data from websites using scripts or software. It allows you to: ● Collect data from HTML pages ● Monitor prices, news, job boards, or sports scores ● Extract structured information from unstructured sources Legal and Ethical Considerations: ● Always check the site's robots.txt: Example → https://example.com/robots.txt ● Respect Terms of Service ● Avoid scraping personal or copyrighted data ● Implement rate limiting, sleep delays, and user-agent headers to avoid blocking pip install requests beautifulsoup4 scrapy lxml
  • 104.
    What is ‘robots.txt’ Arobots.txt file is a text file that webmasters create to tell search engine crawlers which parts of their website are allowed and not allowed to be crawled and indexed. It's essentially a set of instructions for bots, helping to manage their activities and prevent them from overloading the site. ● Simple Text File: It's a plain text file, usually located in the root directory of a website. ● Directives: The file contains directives like User-agent, Disallow, Allow, Crawl- delay, and Sitemap. ● User-agent: Specifies the bot that the rule applies to (e.g., Googlebot). ● Disallow: Instructs the bot not to crawl specific URLs or directories. ● Allow: Allows the bot to crawl specific URLs or directories. ● Crawl-delay: Suggests the bot wait a specified amount of time before crawling (Googlebot doesn't honor this, but it can be used as a guideline). ● Sitemap: Specifies the location of a sitemap file, which helps crawlers discover all pages on the site.
  • 105.
    BeautifulSoup – ParsingStatic HTML Step-by-Step Workflow 1. Send a GET request 2. Parse HTML using BeautifulSoup 3. Navigate or search for data 4. Extract the text or attributes Get All Blog Post Titles import requests from bs4 import BeautifulSoup url = "https://realpython.github.io/fake-jobs/" res = requests.get(url) soup = BeautifulSoup(res.text, 'lxml') titles = soup.find_all('h2', class_='title') for title in titles: print(title.text.strip())
  • 106.
    BeautifulSoup – ParsingStatic HTML Get all image URLs in a Page images = soup.find_all('img') for img in images: print(img.get('src')) Extract Table Data into a List of Dicts table = soup.find("table") rows = table.find_all("tr") data = [] for row in rows[1:]: # skip header cols = row.find_all("td") data.append({ 'Name': cols[0].text.strip(), 'Email': cols[1].text.strip() })
  • 107.
    BeautifulSoup – ParsingStatic HTML Nested Navigation quote = soup.find("div", class_="quote") text = quote.find("span", class_="text").text author = quote.find("small", class_="author").text tags = [tag.text for tag in quote.find_all("a", class_="tag")] print(text, author, tags) Accessing elements inside other elements by chaining find(), find_all(), or using CSS selectors to drill down through the HTML structure. It allows you to navigate the DOM hierarchy step by step — much like how you’d inspect elements in browser developer tools.
  • 108.
    BeautifulSoup – ParsingStatic HTML What will be the output for this web page ? <div class="quote"> <span class="text">"Talk is cheap. Show me the code."</span> <span> <small class="author">Linus Torvalds</small> <a class="tag" href="/tag/code">code</a> <a class="tag" href="/tag/linux">linux</a> </span> </div>
  • 109.
    BeautifulSoup – ParsingStatic HTML Using lambda function in BeautifulSoup 1. Find all tags with text longer than 20 characters soup.find_all(lambda tag: tag.name == 'p' and len(tag.text) > 20) Use Case: Filter <p> tags with substantial content. 2. Find all tags that have an href attribute containing “login” soup.find_all(lambda tag: tag.has_attr('href') and 'login' in tag['href']) Use Case: Find login links like <a href="/user/login">
  • 110.
    BeautifulSoup – ParsingStatic HTML Using lambda function in BeautifulSoup 3. Find elements with a certain text pattern soup.find_all(lambda tag: tag.string and 'Python' in tag.string) Use Case: Locate any tag with exact text containing "Python". 4. Filter div tags whose ID starts with “section-” soup.find_all(lambda tag: tag.name == 'div' and tag.get('id', '').startswith('section-')) Use Case: Scraping page sections like section-1, section-2.
  • 111.
    Use of RegularExpressions in Beautifulsoup What are Regular Expressions? ● A way to match patterns in text ● Used for searching, filtering, extracting or validating strings Why use them in BeautifulSoup? ● To match tags or attributes when values are dynamic or inconsistent ● More flexible than exact string matching
  • 112.
  • 113.
    Regex usage inPython import re text = "user_123_atleast123#@example.com" # Match word characters before the '@' match = re.match(r"w+", text) #w matches all alphanumeric and underscore print(match.group()) # Output: user_123 Basic Functions: ● re.search() – finds first match ● re.findall() – returns all matches ● re.sub() – substitutes matched pattern
  • 114.
    Regex usage inPython import re text = "The price is $199.99 and the discount is 25%." # Extract dollar amount match = re.search(r"$d+.d+", text) print(match.group()) # Output: $199.99 # Extract all numbers numbers = re.findall(r"d+", text) print(numbers) # Output: ['199', '99', '25']
  • 115.
    Regex in validationand Filtering # Validate email address email = "user@example.com" pattern = r"^[w.-]+@[w.-]+.w{2,}$" if re.match(pattern, email): print("Valid email") # Filter lines that start with numbers lines = ["1. Start", "A. Skip", "2. Continue"] filtered = list(filter(lambda l: re.match(r"^d", l), lines)) print(filtered) # Output: ['1. Start', '2. Continue']
  • 116.
    Regex Usage withBeautifulSoup from bs4 import BeautifulSoup import re html = ''' <a href="index.html">Home</a> <a href="contact.html">Contact</a> <a href="resume.pdf">Resume</a> ''' soup = BeautifulSoup(html, 'html.parser') # Find only <a> tags with href ending in .html html_links = soup.find_all('a', href=re.compile(r'.html$')) for link in html_links: print(link.text, ' ', link['href']) →
  • 117.
    Web Crawling What isWeb Crawling? ● Systematically visiting and extracting data from multiple web pages ● Often involves following links and recursively scraping data Crawling with BeautifulSoup: ● Use requests to fetch page content ● Use soup.find_all('a') to find links ● Follow and scrape each link recursively or iteratively
  • 118.
    Web Crawling -Example import requests from bs4 import BeautifulSoup def crawl_page(url): response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') print("Page Title:", soup.title.text) for link in soup.find_all('a'): href = link.get('href') if href and href.startswith('http'): print("Found link:", href) crawl_page('https://quotes.toscrape.com')
  • 119.
    Crawling through varioussites on internet
  • 120.
    Crawling Models 1. Breadth-FirstSearch (BFS) Model ● Crawls all links on a page before moving deeper ● Suitable for shallow but wide websites 2. Depth-First Search (DFS) Model ● Follows links as deep as possible before backtracking ● Useful when deep data structures or hierarchies are present 3. Focused Crawling ● Targets pages based on keywords or link patterns ● Filters irrelevant pages early using rules or regex 4. Incremental Crawling ● Only crawls pages that have changed since the last visit ● Reduces load and improves efficiency (requires timestamps or hashes)
  • 121.
    DFS Crawling def dfs_crawl(url,depth, visited=set()): if depth == 0 or url in visited: return try: res = requests.get(url) soup = BeautifulSoup(res.text, 'html.parser') print("[DFS] Visited:", url) visited.add(url) for link in soup.find_all('a'): href = link.get('href') if href and href.startswith('http'): dfs_crawl(href, depth - 1, visited) except: pass
  • 122.
    Introduction to Scrapy Whatis Scrapy? ● An open-source web crawling and scraping framework written in Python ● Designed for fast, large-scale data extraction ● Built-in support for following links, handling pagination, exporting data Why Scrapy over BeautifulSoup? ● Asynchronous and fast ● Built-in crawling support ● Better suited for structured, multi-page scraping
  • 123.
    Python Spider (WebSpider) ● Web spiders are called by technocrats using different names. ● The other names of web spider are web crawler, automatic indexer, crawler or simply spider. ● A web spider is actually a bot that is programmed for crawling websites. ● The primary duty of a web spider is to generate indices for websites and these indices can be accessed by other software. ● For instance, the indices generated by a spider could be used by another party to assign ranks for websites. pip install scrapy scrapy startproject quotespider cd quotespider ● quotespider/spiders/ → your spider scripts ● items.py → define fields ● settings.py → configure behavior
  • 124.
    Creating your firstSpider ● Create a new spider in the spiders directory: scrapy genspider quotes quotes.toscrape.com ● Edit quotes.py: import scrapy class QuotesSpider(scrapy.Spider): name = "quotes" # Unique name to run the spider start_urls = ['https://quotes.toscrape.com'] # Start scraping from this URL def parse(self, response): for quote in response.css('div.quote'): # Loop through each quote container yield { 'text': quote.css('span.text::text').get(), # Extract quote text 'author': quote.css('small.author::text').get(), # Extract author name 'tags': quote.css('a.tag::text').getall() # Get list of tags }
  • 125.
    Understanding .css inScrapy The .css() method in Scrapy is used to select HTML elements using CSS selectors, similar to how elements are targeted in web development (like in browser DevTools or jQuery). response.css('selector') # returns a SelectorList response.css('selector::text') # returns text inside the tag response.css('selector::attr(href)') # returns value of an attribute (e.g., href) Examples: ● response.css('div.quote'): selects all <div> elements with class "quote" from the HTML response. ● quote.css('span.text::text'): selects the text inside <span class="text">...</span> within each quote block. ● quote.css('a.tag::text').getall(): extracts all tag texts from multiple <a class="tag"> inside the quote. ● response.css('li.next a::attr(href)'): grabs the href attribute from the <a> tag inside a <li class="next">, used for pagination. Why Use .css()? ● Cleaner and more readable than XPath for most use cases ● Fast, efficient, and very similar to what developers use in HTML/CSS
  • 126.
    Running the Spiderand Export Results ● Run the spider scrapy crawl quotes ● Export to JSON/CSV: scrapy crawl quotes -o quotes.json scrapy crawl quotes -o quotes.csv
  • 127.
    Extracting from morecomplex data. Scrape quotes from https://quotes.toscrape.com along with additional details (birthdate, bio) from each author's profile page. def parse(self, response): # This method handles the response from the start_urls or any followed links. for quote in response.css('div.quote'): # Loops through all divs with class 'quote' to process each quote block. item = {'text': quote.css('span.text::text').get(), # Extracts the quote text. 'author': quote.css('small.author::text').get() # Extracts the author's name. } author_url = quote.css('span a::attr(href)').get() # Gets the relative link to the author's bio page.
  • 128.
    Extracting from morecomplex data if author_url: # Ensures the URL is valid before following it. yield response.follow(author_url, self.parse_author, meta={'item': item}) # Makes a new request to the author's page and passes the item using meta. def parse_author(self, response): # This function is called for each author's page visited by response.follow. item = response.meta['item'] # Retrieves the item passed from the previous parse function. item['birthdate'] = response.css('span.author-born-date::text').get() # Extracts author's birth date from the author's page. item['bio'] = response.css('div.author-description::text').get() # Extracts author's biography/description from the author's page. yield item # Final step: yields the complete item containing quote, author, birthdate, and bio.
  • 129.
    Scraping with Pagination Scrapeall quotes across multiple pages on https://quotes.toscrape.com by following "Next" page links. def parse(self, response): # Handles the response and parses current page content for quote in response.css('div.quote'): # Iterates through each quote block on the page yield { 'text': quote.css('span.text::text').get(), # Extracts quote text 'author': quote.css('small.author::text').get(), # Extracts author name 'tags': quote.css('a.tag::text').getall() # Extracts all associated tags } next_page = response.css('li.next a::attr(href)').get() # Finds the link to the next page, if available if next_page: # If a next page exists, schedule a follow-up request yield response.follow(next_page, self.parse) # Recursively call the same parse function for the next page
  • 130.
    Scrapy Items andItem loaders Define a structured format (fields) to store extracted data (quotes, authors, tags). import scrapy class QuoteItem(scrapy.Item): text = scrapy.Field() # Defines a field for the quote text author = scrapy.Field() # Defines a field for the author's name tags = scrapy.Field() # Defines a field for tags associated with the quote from quotespider.items import QuoteItem item = QuoteItem() item['text'] = quote.css('span.text::text').get() # Assigns extracted quote text to the item field
  • 131.
    Scrapy Pipelines Clean ortransform data before storing or exporting (e.g., trim whitespace). # In settings.py ITEM_PIPELINES = { 'quotespider.pipelines.QuotespiderPipeline': 300, # Registers the QuotespiderPipeline class and sets its priority } # In pipelines.py class QuotespiderPipeline: def process_item(self, item, spider): item['text'] = item['text'].strip() # Cleans whitespace from the quote text return item # Returns the processed item to the pipeline chain
  • 132.
    Changing Scrapy Settings Configurehow Scrapy behaves during crawling (speed, headers, robots.txt adherence). ROBOTSTXT_OBEY = True # Ensures crawler respects robots.txt DOWNLOAD_DELAY = 1 # Adds delay between requests to avoid overloading the server CONCURRENT_REQUESTS = 16 # Sets max concurrent requests USER_AGENT = 'Mozilla/5.0' # Custom user-agent to mimic a real browser ● These settings help manage crawler behavior and compliance with web etiquette. ● Middleware can modify requests and responses; useful for adding headers, retry logic, or proxy support.
  • 133.
    Introduction to Selenium Whatis Selenium? ● A browser automation library used for testing and web scraping. ● Works with real browsers like Chrome, Firefox, Edge. Use Cases: ● Scraping content that is dynamically rendered by JavaScript. ● Automating form submissions and button clicks. Installation: pip install selenium
  • 134.
  • 135.
    Javascript and Webscraping Why JavaScript matters: ● Many modern websites use JavaScript to dynamically load content after the page initially loads. Eg. Google, Wikipedia, Facebook, Twitter and many more are JS enabled websites. ● Example: News feeds, product listings, comment sections are often loaded via JS. Impact on scraping: ● BeautifulSoup and Scrapy cannot see JS-generated content. ● Selenium renders the page like a real browser, allowing access to dynamic data.
  • 136.
    Javascript and Selenium HowSelenium helps: ● Executes JavaScript as part of page rendering. ● Waits for JavaScript-based elements to load. ● Can interact with elements that only appear after JS execution (e.g., modals, infinite scroll). Example Use Cases: ● Scraping content behind login modals. ● Clicking “Load More” buttons. ● Capturing interactive charts or dynamic tables.
  • 137.
    Launching a Browserand Opening a Page from selenium import webdriver from selenium.webdriver.chrome.service import Service from webdriver_manager.chrome import ChromeDriverManager # Launch browser driver = webdriver.Chrome(service=Service(ChromeDriverManager().install())) driver.get("https://example.com") print("Page Title:", driver.title) ● Launches Chrome browser ● Opens the target webpage ● Displays the page title
  • 138.
    Javascript Example: LoadMore Button ● How to automate scrolling through a JS-powered infinite scroll page and capture content ? ● This example shows how Selenium interacts with JS-generated content on real website from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC # Load a dynamic site driver.get("https://infinite-scroll.com/demo/full-page/") # Scroll down or simulate clicking to load more content import time last_height = driver.execute_script("return document.body.scrollHeight")
  • 139.
    Javascript example: LoadMore Button while True: # Scroll to bottom driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") time.sleep(2) # Wait for content to load new_height = driver.execute_script("return document.body.scrollHeight") if new_height == last_height: break # No more content to load last_height = new_height # Optionally, capture loaded items items = driver.find_elements(By.CLASS_NAME, "post") print(f"Loaded {len(items)} posts.")
  • 140.
    More Selenium Usage ●Finding Elements (Extract visible text or attributes from page elements) from selenium.webdriver.common.by import By element = driver.find_element(By.TAG_NAME, "h1") print("Heading:", element.text) # Find multiple elements (e.g., all links) links = driver.find_elements(By.TAG_NAME, "a") for link in links: print(link.get_attribute("href"))
  • 141.
    More Selenium Usage ●Working with Forms and Inputs from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.chrome.service import Service from webdriver_manager.chrome import ChromeDriverManager # Launch the browser driver = webdriver.Chrome(service=Service(ChromeDriverManager().install())) # Open a webpage with a search box driver.get("https://www.google.com ") # Locate the search box using its 'name' attribute (commonly 'q' on many search engines) search_box = driver.find_element(By.NAME, "q") # Simulate typing text into the search input field search_box.send_keys("Python Selenium") # Submit the form that the input belongs to (triggers the search action) search_box.submit()
  • 142.
    Advanced Programming (DS40108): SampleMCQs from topics after Quiz-1 for Exam Preparation Level: 400 Credit: 2 Domain: Data Science Instructor: Manjish Pal
  • 143.
    Multiprocessing and theading Q1.What would be the output of the following code? from multiprocessing import Pool def f(x): return x*x if __name__ == '__main__': with Pool(2) as p: print(p.map(f, [1, 2, 3, 4])) A) [1, 4, 9, 16] B) May vary based on number of cores C) Error due to function not being pickleable D) Only first two results returned due to pool size Q2. Which of the following is true about multiprocessing.Process? A) It shares memory space with the main process B) It runs in a new thread C) It runs in a separate process with its own memory D) It cannot run on Windows systems A- Explanation: Pool size limits concurrency, but all inputs are processed. C- Explanation: Processes are independent with separate memory space.
  • 144.
    Multiprocessing and threading Q3.Why doesn’t Python threading.Thread improve performance on CPU-bound tasks? A) Threads are not real in Python B) GIL prevents true parallel execution C) CPU-bound tasks can't be threaded D) Python threads can't access CPU Q4. Which of the following is true about multiprocessing.Process? A) It shares memory space with the main process B) It runs in a new thread C) It runs in a separate process with its own memory D) It cannot run on Windows systems Q5. Which of the following can result in a race condition? A) Accessing a list in a single thread B) Two threads updating a shared variable without locks C) Forking a child process D) All of the above B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads. D - Explanation: join() blocks until the thread terminates B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
  • 145.
    Asynchronous I/O Q6. Whydoesn’t Python threading.Thread improve performance on CPU-bound tasks? A) Threads are not real in Python B) GIL prevents true parallel execution C) CPU-bound tasks can't be threaded D) Python threads can't access CPU Q7. Which of the following is true about multiprocessing.Process? A) It shares memory space with the main process B) It runs in a new thread C) It runs in a separate process with its own memory D) It cannot run on Windows systems Q8. Which of the following can result in a race condition? A) Accessing a list in a single thread B) Two threads updating a shared variable without locks C) Forking a child process D) All of the above B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads. D - Explanation: join() blocks until the thread terminates B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
  • 146.
    Multiprocessing and threading Q9.What does await asyncio.sleep(1) do? A) Pauses the thread B) Sleeps the coroutine C) Blocks I/O D) Makes the coroutine finish instantly Q10. Which of the following is a valid asyncio pattern? A) await asyncio.run(...) B) asyncio.run(await foo()) C) await foo() inside a regular function D) Use async def for coroutine definitions B - Explanation: await asyncio.sleep(1) suspends the coroutine for 1 second without blocking the event loop D - Explanation: async def defines an asynchronous coroutine. Q11. When is asyncio.run() used? A) Inside every coroutine B) Only in child coroutines C) To start the event loop for top-level coroutine ✅ D) To run multiple coroutines in a loop C- Explanation: asyncio.run() is used to run the top-level coroutine and manage the event loop.
  • 147.
    Asynchronous I/O Q12. Whydoesn’t Python threading.Thread improve performance on CPU-bound tasks? A) Threads are not real in Python B) GIL prevents true parallel execution C) CPU-bound tasks can't be threaded D) Python threads can't access CPU Q13. Which of the following is true about multiprocessing.Process? A) It shares memory space with the main process B) It runs in a new thread C) It runs in a separate process with its own memory D) It cannot run on Windows systems Q14. Which of the following can result in a race condition? A) Accessing a list in a single thread B) Two threads updating a shared variable without locks C) Forking a child process D) All of the above B - Explanation: The Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound threads. D - Explanation: join() blocks until the thread terminates B- Explanation: Unsynchronized access to shared variables by threads causes race conditions.
  • 148.
    GPU Computing inPython Q15. What does PyCUDA primarily provide? A) Python bindings to OpenCL B) A compiler for Python GPU code C) Interface for writing CUDA kernels in Python D) JIT for vector operations Q16. Why must PyCUDA code define __global__ functions? A) It helps PyCUDA run on CPUs B) These are kernel functions callable from host C) Only global variables are used in GPU D) It is deprecated syntax Q17. What is the purpose of drv.InOut(a) in PyCUDA? A) It copies a to GPU only B) It makes a mutable and GPU-accessible C) It prevents memory leaks D) It converts a to float32 C - Explanation: PyCUDA allows Python to interface with CUDA kernels written in C/C++ B - Explanation: __global__ marks a function as a GPU kernel callable from host. B- Explanation: InOut makes a NumPy array available for reading and writing by the GPU.
GPU Computing in Python

Q18. What's a key difference between PyOpenCL and PyCUDA?
A) PyOpenCL uses C++
B) PyOpenCL is cross-platform and not NVIDIA-specific
C) PyOpenCL can't handle memory buffers
D) PyOpenCL works only with AMD GPUs
Answer: B - Explanation: PyOpenCL works across vendors (AMD, Intel, NVIDIA), unlike PyCUDA.

Q19. What does context = cl.create_some_context() do?
A) Creates a CPU context
B) Finds a random GPU and creates context
C) Raises error if no NVIDIA GPU
D) Compiles OpenCL kernel
Answer: B - Explanation: It auto-selects an available device to create the execution context.

Q20. Which one compiles OpenCL kernels in PyOpenCL?
A) cl.KernelBuild()
B) cl.Program(context, source).build()
C) compile.opencl()
D) launch(source)
Answer: B - Explanation: PyOpenCL uses cl.Program(...).build() to compile kernel source.
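A minimal PyOpenCL sketch of Q19-Q20 (assumes an OpenCL driver for your CPU or GPU is installed; the kernel name double_them is illustrative):

import numpy as np
import pyopencl as cl

a = np.arange(16, dtype=np.float32)

ctx = cl.create_some_context()       # auto-selects an available device
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

# Compile the kernel source for the chosen device
prg = cl.Program(ctx, """
__kernel void double_them(__global float *a) {
    int i = get_global_id(0);
    a[i] *= 2.0f;
}
""").build()

prg.double_them(queue, a.shape, None, buf)   # one work-item per element
cl.enqueue_copy(queue, a, buf)               # copy the result back to the host
print(a)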
GPU Computing in Python

Q21. What does @numba.jit do?
A) Interprets Python at runtime
B) Compiles Python to C++
C) Compiles annotated function to optimized machine code
D) Optimizes only NumPy functions
Answer: C - Explanation: Numba compiles the function to fast native code using LLVM.

Q22. What mode does @jit(nopython=True) enforce?
A) Python fallback on error
B) Full JIT optimization with no Python object overhead
C) Debug mode
D) Safe JIT sandboxing
Answer: B - Explanation: nopython=True forces the use of only native types for best performance.

Q23. What does cp.asarray() do?
A) Copies NumPy array to CPU
B) Converts NumPy array to GPU CuPy array
C) Frees memory
D) Flattens CuPy array
Answer: B - Explanation: It creates a CuPy array on the GPU from a NumPy array.
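Two minimal sketches of Q21-Q23 (the Numba part runs on the CPU; the CuPy part assumes a CUDA-capable GPU; the function name total is illustrative):

import numpy as np
from numba import jit

@jit(nopython=True)          # compile to native code; no Python object overhead
def total(arr):
    s = 0.0
    for x in arr:
        s += x
    return s

print(total(np.arange(1_000_000, dtype=np.float64)))

# CuPy: NumPy-style API, but the arrays live on the GPU
import cupy as cp
a_gpu = cp.asarray(np.arange(10))   # copy a NumPy array to the GPU
b_gpu = a_gpu * 2                   # computed on the GPU
print(cp.asnumpy(b_gpu))            # copy the result back to the host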
GPU Computing in Python

Q24. CuPy array operations:
A) Use NumPy backend
B) Are CPU-bound
C) Are run asynchronously on GPU
D) Block GPU context
Answer: C - Explanation: Operations are GPU-accelerated and executed asynchronously.

Q25. What is @tf.function used for?
A) Declares TensorFlow constants
B) Converts Python code to a computational graph
C) Enables multi-GPU training
D) Uses PyTorch API
Answer: B - Explanation: TensorFlow compiles the annotated Python code into an efficient static graph.

Q26. How do you run TensorFlow training on GPU?
A) Use tf.device('CPU')
B) Use cuda=True flag
C) Ensure CUDA/cuDNN and GPU drivers are installed
D) Install CuPy
Answer: C - Explanation: TensorFlow auto-detects the GPU if drivers and CUDA/cuDNN are installed.
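A minimal @tf.function sketch of Q25-Q26 (the function name affine is illustrative; TensorFlow picks up a GPU automatically when CUDA/cuDNN are installed):

import tensorflow as tf

@tf.function                 # traces the Python code into a computational graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.random.normal((2, 3))
w = tf.random.normal((3, 4))
b = tf.zeros((4,))
print(affine(x, w, b))

# Lists the GPUs TensorFlow can see; an empty list means CPU-only execution
print(tf.config.list_physical_devices('GPU'))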
RegEx in Python

Q27. What does the regular expression r'\b\w{4}\b' match?
A) Words longer than 4 characters
B) Exactly 4-letter words
C) Any 4 characters
D) All uppercase 4-letter words
Answer: B - Explanation: \b is a word boundary and \w{4} matches exactly 4 word characters, so the regex matches words that are exactly 4 characters long.

Q28. What will re.findall(r'[a-z]{3,}', 'abc defg h ijkl') return?
A) ['abc', 'defg', 'ijkl']
B) ['abc', 'defg', 'h', 'ijkl']
C) ['abc', 'defg', 'ij']
D) ['abc', 'defg']
Answer: A - Explanation: [a-z]{3,} matches runs of lowercase letters that are 3 or more characters long.

Q29. What is the output of re.sub(r'(\d+)', r'#\1#', 'abc123xyz456')?
A) abc123xyz456
B) abc#123#xyz#456#
C) abc#xyz#
D) abc#1#xyz#2#
Answer: B - Explanation: \d+ matches each run of digits; \1 references the captured digits and wraps them with #.
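The three questions can be checked directly with the re module (the sample sentence is illustrative):

import re

text = 'This is a demo with four word test'
print(re.findall(r'\b\w{4}\b', text))
# ['This', 'demo', 'with', 'four', 'word', 'test']

print(re.findall(r'[a-z]{3,}', 'abc defg h ijkl'))   # ['abc', 'defg', 'ijkl']
print(re.sub(r'(\d+)', r'#\1#', 'abc123xyz456'))     # abc#123#xyz#456#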
BeautifulSoup, Scrapy and Selenium

Q30. What does soup.find('p') return?
A) All <p> tags
B) First <p> tag
C) Raises error
D) List of tags
Answer: B - Explanation: find() returns the first occurrence of the specified tag.

Q31. What does soup.select('.title') do?
A) Finds tags with class="title"
B) Selects titles from database
C) Parses XML only
D) Finds all h1 elements
Answer: A - Explanation: select() allows CSS-style selection by class, id, or tag.

Q32. Which parser is fastest in most cases?
A) html.parser
B) lxml
C) html5lib
D) xml.sax
Answer: B - Explanation: lxml is usually the fastest and most efficient parser.
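A minimal BeautifulSoup sketch of Q30-Q32 (the HTML snippet is illustrative; the lxml parser must be installed separately):

from bs4 import BeautifulSoup

html = """
<html><body>
  <p class="title">First paragraph</p>
  <p>Second paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, "lxml")  # lxml is usually the fastest parser

print(soup.find('p'))               # first <p> tag only
print(soup.select('.title'))        # list of tags with class="title"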
BeautifulSoup, Scrapy and Selenium

Q33. What is parse() in a Scrapy spider?
A) Compiles CSS
B) Parses the settings
C) Default callback for start_requests
D) Runs before spider starts
Answer: C - Explanation: parse() is the default callback that handles responses in a spider.

Q34. What does yield scrapy.Request(...) do?
A) Starts a new thread
B) Schedules a request to be processed
C) Parses settings file
D) Restarts spider
Answer: B - Explanation: Scrapy handles the yielded request asynchronously.

Q35. How to extract multiple values from a selector?
A) Use extract_first()
B) Use extract_all()
C) Use get()
D) Use extract()
Answer: D - Explanation: .extract() (or .getall()) fetches all matches from a selector.
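A minimal Scrapy spider sketch of Q33-Q35 (the site and CSS selectors are illustrative; run it with scrapy runspider):

import scrapy

class QuoteSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # parse() is the default callback for responses from start_urls
        for text in response.css("span.text::text").getall():  # .getall() == .extract()
            yield {"quote": text}

        # Yielding a Request schedules it; Scrapy fetches it asynchronously
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)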