1. TOP RATED POST SELECTOR
BY ANJANEY MITRA, DIVYANSH CHAUHAN, SHREY JAIN AND PUSHKAR SINGH
2. WHAT DOES OUR PROGRAM DO?
• This script uses Python as its base programming language.
• It uses the Python libraries PRAW, pandas, urllib and
os.
• The script downloads and saves the top-rated posts from any
subreddit. The number of posts to download can be set
manually by the user.
• Subreddits are communities within Reddit; the script reaches
them through the Reddit API.
3. REDDIT AND SUBREDDITS
• Reddit is a network of communities where people can dive into
their interests, passions, and hobbies.
• These communities are called subreddits.
• Each subreddit contains posts submitted by users around the
world. These posts can be upvoted or downvoted by the users
of that subreddit/community.
4. OVERVIEW OF THE SCRIPT
• Our script accesses Reddit and a particular subreddit
chosen by the user.
• Any subreddit can be accessed as long as we know its name.
• The subreddit is accessed through PRAW (Python Reddit API
Wrapper), a Python library built on top of the Reddit API.
• Once the subreddit is accessed, the script finds the
top-rated (most upvoted) posts, selects them and downloads
them to our device.
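The "find the top posts and select them" step can be illustrated without contacting Reddit at all. The sketch below is a hypothetical helper, not the project's code: `Post` and `select_top_posts` are illustrative names. In practice Reddit's "top" listing, as queried through PRAW, already returns posts sorted by score, so this only shows the selection idea on posts already in hand.

```python
from dataclasses import dataclass
import heapq


@dataclass
class Post:
    """Hypothetical stand-in for a Reddit post with only the fields we need."""
    title: str
    url: str
    score: int  # net votes (upvotes minus downvotes)


def select_top_posts(posts, limit):
    """Return the `limit` highest-scoring posts, best first."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # heapq.nlargest avoids sorting the whole list when limit is small
    return heapq.nlargest(limit, posts, key=lambda p: p.score)


if __name__ == "__main__":
    sample = [
        Post("cat", "https://i.redd.it/a.jpg", 120),
        Post("dog", "https://i.redd.it/b.png", 450),
        Post("frog", "https://i.redd.it/c.gif", 75),
    ]
    for post in select_top_posts(sample, 2):
        print(post.score, post.title)
```

With the sample data this prints the "dog" post (450) followed by the "cat" post (120).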
5. WHAT IS AN API
• API stands for “Application Programming Interface.” An API is a software
intermediary that allows two applications to talk to each other. In other
words, an API is the messenger that delivers your request to the provider
that you’re requesting it from and then delivers the response back to you.
• In simpler words, an API is like a waiter who fetches food from
the kitchen (the source the information is requested from) and
serves it to the client (the one who requested the information).
• APIs allow data to be transmitted from system to system,
creating a connected experience.
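To make the waiter analogy concrete, the sketch below builds (but does not send) a request for a subreddit's top posts using only the standard library. It assumes Reddit's public JSON listing URL of the form `https://www.reddit.com/r/<name>/top.json` with `limit` and `t` query parameters. The project itself goes through PRAW instead, and the `build_top_request` name and the User-Agent string are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_top_request(subreddit, limit=10, time_filter="day"):
    """Build (but do not send) a request for a subreddit's top posts.

    The Request object is the "order" the client hands to the API;
    the server's JSON reply would be the "dish" brought back.
    """
    query = urlencode({"limit": limit, "t": time_filter})
    url = f"https://www.reddit.com/r/{subreddit}/top.json?{query}"
    # Reddit asks clients to identify themselves with a descriptive User-Agent
    return Request(url, headers={"User-Agent": "top-post-selector/0.1 (example)"})


if __name__ == "__main__":
    req = build_top_request("memes", limit=5)
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), which returns the JSON reply
```

Keeping construction separate from sending makes the request easy to inspect before any network traffic happens.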
6. PYTHON REDDIT API WRAPPER
• PRAW (Python Reddit API Wrapper) is a Python library that allows you to easily
access the Reddit API and interact with the Reddit website. With PRAW, you can
read and write data to and from Reddit, including creating, deleting, and updating
posts and comments, and getting information about subreddits and users.
• To implement PRAW, you will first need to install it using pip. Open a command
prompt or terminal window and run the following command:
pip install praw
• Once you have installed PRAW, you will need to create a Reddit account and
obtain your API credentials. You can do this by going to
https://www.reddit.com/prefs/apps and creating a new app. Choose the "script"
option and fill in the required fields.
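Once the credentials exist, a read-only PRAW client and a "top posts" query can look roughly like this. It is a sketch, not the project's getmemes.py: the helper names `make_reddit` and `fetch_top_posts` are hypothetical, while `praw.Reddit(client_id=..., client_secret=..., user_agent=...)`, `reddit.subreddit(name)` and `.top(time_filter=..., limit=...)` are PRAW calls. Because `fetch_top_posts` relies only on that interface, it can also be exercised with a stand-in object.

```python
def make_reddit(client_id, client_secret, user_agent):
    """Create a read-only PRAW client from "script" app credentials."""
    import praw  # imported here so the rest of the module works without PRAW

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def fetch_top_posts(reddit, subreddit_name, limit, time_filter="all"):
    """Return title, URL, score and permalink of a subreddit's top posts."""
    posts = []
    for submission in reddit.subreddit(subreddit_name).top(
        time_filter=time_filter, limit=limit
    ):
        posts.append(
            {
                "title": submission.title,
                "url": submission.url,  # direct link, e.g. an i.redd.it image
                "score": submission.score,
                # permalink is relative, e.g. "/r/memes/comments/<id>/<slug>/"
                "permalink": "https://www.reddit.com" + submission.permalink,
            }
        )
    return posts
```

For example, `fetch_top_posts(make_reddit("<client id>", "<client secret>", "<user agent>"), "memes", 10)` would return the ten highest-scoring posts of all time in r/memes; passing `time_filter="day"` or `"week"` narrows the window.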
7. PANDAS
• Pandas is a popular Python library for data manipulation and analysis. It provides
data structures and functions for efficiently handling and processing structured
data. Pandas is widely used in data science and machine learning for tasks such
as data cleaning, transformation, aggregation, and visualization.
• The two primary data structures in Pandas are the Series and DataFrame. A
Series is a one-dimensional array-like object that can hold any data type, such as
numbers, strings, or dates. A DataFrame is a two-dimensional table-like data
structure, where each column can have a different data type.
• DataFrames are similar to spreadsheets in that they have rows and columns, and
you can perform operations such as selecting, filtering, and transforming data.
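A short illustration of both structures, using made-up post data (the titles, scores and `nsfw` column are invented for the example):

```python
import pandas as pd

# A Series: one-dimensional labelled values (scores keyed by a post id)
scores = pd.Series([450, 120, 75], index=["p1", "p2", "p3"], name="score")

# A DataFrame: a table whose columns can hold different data types
posts = pd.DataFrame(
    {
        "title": ["dog", "cat", "frog"],  # strings
        "score": [450, 120, 75],          # integers
        "nsfw": [False, False, True],     # booleans
    }
)

# Spreadsheet-like operations: filter rows, select columns, sort
popular = posts[posts["score"] > 100][["title", "score"]]
ranked = posts.sort_values("score", ascending=False)

print(scores["p2"])              # 120
print(popular)
print(ranked["title"].tolist())  # ['dog', 'cat', 'frog']
```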
8. PANDAS
• Pandas provides a wide range of functions for data
manipulation, such as merging, grouping, pivoting, and
reshaping data. It also has built-in functions for handling
missing data, applying mathematical operations to data, and
working with dates and times. Additionally, Pandas can read
and write data from various file formats, such as CSV, Excel,
SQL, and JSON.
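A few of these functions in action on another small, invented table: filling a missing value, grouping, and a CSV round trip. An in-memory `io.StringIO` buffer stands in for a file on disk so the sketch has no side effects.

```python
import io

import pandas as pd

# Invented data: one score is missing (None is stored as NaN)
posts = pd.DataFrame(
    {
        "subreddit": ["memes", "memes", "funny"],
        "title": ["dog", "cat", "frog"],
        "score": [450, None, 75],
    }
)

# Missing data: treat an unknown score as 0
posts["score"] = posts["score"].fillna(0)

# Grouping: total score per subreddit
totals = posts.groupby("subreddit")["score"].sum()

# Write to CSV and read it back
buffer = io.StringIO()
posts.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(totals.to_dict())
print(restored)
```

The same `to_csv` and `read_csv` calls accept a file path in place of the buffer.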
9. URLLIB
• urllib is Python's package for handling URLs (Uniform
Resource Locators). Its urlopen function (in urllib.request)
can fetch URLs over a variety of protocols, such as HTTP,
HTTPS, FTP and local files.
• urllib collects several modules for working with URLs:
• urllib.request for opening and reading URLs
• urllib.parse for parsing URLs
• urllib.error for the exceptions raised by urllib.request
• urllib.robotparser for parsing robots.txt files
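The sketch below pairs `urllib.parse` (to take a file name from an image URL) with `urllib.request.urlopen` (to fetch and save the bytes). The helper names are hypothetical, and the project's own download code is not shown in these slides. The demo uses a `file://` URL, which `urlopen` also accepts, so it runs offline; for real Reddit images you would pass the post's `https://` URL instead.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Use the last path segment of a URL as the file name, e.g. 'abc.jpg'."""
    return os.path.basename(urlparse(url).path)  # query string is ignored


def download(url, dest_path, timeout=30):
    """Fetch `url` with urllib.request and save the raw bytes to `dest_path`."""
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return len(data)  # number of bytes written


if __name__ == "__main__":
    print(filename_from_url("https://i.redd.it/abc123.jpg?width=640"))  # abc123.jpg
    # Offline demo: urlopen also understands file:// URLs
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "source.png")
        src.write_bytes(b"fake image bytes")
        print(download(src.as_uri(), Path(tmp, "copy.png")))  # 16
```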
10. OS
• The os module provides functions for interacting with the
operating system. It is part of Python's standard library and
offers a portable way of using operating-system-dependent
functionality. The *os* and *os.path* modules include many
functions for working with the file system.
• The os module also provides simple functions for getting
information about the operating system and, to a limited
extent, for controlling processes.
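In a downloader, `os` mostly handles folders and paths. A minimal sketch, assuming images are saved into one folder per subreddit (a layout chosen for illustration, not taken from the project); `prepare_output_dir` and `already_downloaded` are hypothetical helpers:

```python
import os
import tempfile


def prepare_output_dir(base_dir, subreddit):
    """Create <base_dir>/<subreddit> if needed and return its path."""
    folder = os.path.join(base_dir, subreddit)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return folder


def already_downloaded(path):
    """True if the file is already on disk, so the download can be skipped."""
    return os.path.isfile(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = prepare_output_dir(tmp, "memes")
        path = os.path.join(folder, "abc123.jpg")
        print(already_downloaded(path))  # False
        open(path, "wb").close()         # simulate a finished download
        print(already_downloaded(path))  # True
        print(os.listdir(folder))        # ['abc123.jpg']
```

Using `os.path.join` instead of string concatenation keeps the paths correct on both Windows and Unix-like systems.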
11. PROJECT FILES
• We have 3 project files: main.py, getmemes.py and keys.py
• main.py
• getmemes.py
• keys.py
12. MAIN.py
• main.py is the main file; it holds the user-entered
parameters, such as the number of memes to download and the
subreddit to download them from.
• To set the limit and the subreddit, open main.py and change
those two variables.
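A sketch of what main.py might look like. The slides do not name the two variables, so `SUBREDDIT` and `LIMIT` are assumed names, and `validate_settings` is a hypothetical helper that checks the values before anything is downloaded.

```python
# main.py: user-editable settings (variable names are assumptions)
SUBREDDIT = "memes"  # which subreddit to download from
LIMIT = 10           # how many top posts to download


def validate_settings(subreddit, limit):
    """Normalise and check the two settings before any network call."""
    name = subreddit.strip()
    if name.lower().startswith("r/"):
        name = name[2:]  # accept "r/memes" as well as "memes"
    if not name:
        raise ValueError("SUBREDDIT must not be empty")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("LIMIT must be a positive integer")
    return name, limit


if __name__ == "__main__":
    subreddit, limit = validate_settings(SUBREDDIT, LIMIT)
    print(f"Downloading the top {limit} posts from r/{subreddit}")
    # In the real project, this is where the code in getmemes.py would be called
```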
13. GETMEMES.py
• getmemes.py holds the brains of the program: the main code
that uses PRAW to get the image URLs of the posts.
• It then uses urllib to download those memes.
• It also creates a CSV file with each meme's title, URL, post
karma (score), etc., so that if we want to find a meme's
original post later, we can look it up in the CSV file.
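The CSV step can be sketched with pandas as below. This is a hypothetical reconstruction, not the project's getmemes.py: `is_image_url` simply checks for a common image extension (an assumption that skips galleries and videos), and `save_index` expects post dicts with `title`, `url`, `score` and `permalink` keys, such as those a PRAW query could produce.

```python
import os
import tempfile

import pandas as pd

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_image_url(url):
    """Rough check: does the URL (ignoring any query string) end in an image extension?"""
    path = url.split("?", 1)[0].lower()
    return path.endswith(IMAGE_EXTENSIONS)


def save_index(posts, csv_path):
    """Write title, URL, score and permalink of image posts to a CSV file.

    Returns the DataFrame that was written, so the caller can reuse it.
    """
    images = [p for p in posts if is_image_url(p["url"])]
    df = pd.DataFrame(images, columns=["title", "url", "score", "permalink"])
    df.to_csv(csv_path, index=False)
    return df


if __name__ == "__main__":
    sample = [
        {"title": "dog", "url": "https://i.redd.it/abc.jpg", "score": 450,
         "permalink": "https://www.reddit.com/r/memes/comments/abc/dog/"},
        {"title": "clip", "url": "https://v.redd.it/xyz", "score": 300,
         "permalink": "https://www.reddit.com/r/memes/comments/xyz/clip/"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        table = save_index(sample, os.path.join(tmp, "memes.csv"))
        print(table[["title", "score"]])  # only the .jpg post is kept
```

Keeping the permalink column is what lets the CSV lead back to each meme's original post.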
14. KEYS.py
• keys.py is the file where we store all keys, such as the API
secret, client tokens, etc.
• Keeping them in a separate file makes the tokens easy to
change, and because they are not in the other files, sharing
those files does not leak our keys.
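A minimal keys.py might look like the following. Reading the values from environment variables (the names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` are assumptions) is a variation on the slide's approach that keeps real secrets out of the file entirely; the placeholder defaults must be replaced before use, and `missing_keys` is a hypothetical convenience check.

```python
# keys.py: credentials kept in one file, separate from the rest of the code.
# The environment-variable names below are assumptions for this sketch.
import os

CLIENT_ID = os.environ.get("REDDIT_CLIENT_ID", "your-client-id")
CLIENT_SECRET = os.environ.get("REDDIT_CLIENT_SECRET", "your-client-secret")
USER_AGENT = os.environ.get("REDDIT_USER_AGENT", "top-post-selector/0.1")

_PLACEHOLDERS = {"your-client-id", "your-client-secret"}


def missing_keys():
    """Return the names of credentials that are still placeholders."""
    values = {"CLIENT_ID": CLIENT_ID, "CLIENT_SECRET": CLIENT_SECRET}
    return [name for name, value in values.items() if value in _PLACEHOLDERS]


if __name__ == "__main__":
    missing = missing_keys()
    print("Missing:", missing if missing else "none")
```

Other files would then use `from keys import CLIENT_ID, CLIENT_SECRET, USER_AGENT`, and listing keys.py in `.gitignore` keeps it out of a shared Git repository.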