3. Abstract
Analyzing a single resume manually is feasible, but analyzing a large
collection of resumes manually is not. Moreover, finding a particular piece of
information in a collection of resumes by hand is impractical.
Storing resumes in a defined format and retrieving the required information is
easier once the resumes have been parsed and refined.
Resumes come in a number of document formats, so the goal is not only to
convert this unstructured data into a structured format for better storage but
also to allow fast extraction of that information.
4. Architecture
Resume Summarizer is divided into two parts:
- Resume Parser
- Search Resume
Resume Parser:
Parses resumes and converts the unstructured resume files into a
structured collection of resumes.
Search Resume:
Searches the collection of structured resumes for the required
information and displays it.
6. Technology Used
- Java: Resumes are parsed in Java into a simple format that Hadoop can
use for mapping and reducing the data.
- Hadoop: The system uses Hadoop's map-and-reduce approach to operate
on resumes. Several mappers are used, one for each file format, but only
a single reducer is implemented.
- MySQL: A MySQL database is used to store the structured data locally.
7. Working
The project works with two types of input.
The first input is a resume, which is processed to extract information from
the unstructured file and convert it into a structured format. The input can be
a single resume supplied live by the user or a collection of resumes.
The second input is a search query, which is run against the collection of
structured resumes in storage.
8. Input:
An interface is provided where the user can upload a resume and, optionally,
specify its format.
If the user specifies a format, the format of the uploaded resume is checked
against it. If they match, the resume is processed by the resume parser and the
results are stored in the database. Otherwise, the uploaded resume's format is
checked against the formats the parser supports; if it matches one of them, the
resume is parsed and stored in the database in a structured format.
The difficulty is that resumes need not follow any structured format, so
processing such resumes and storing them in the database is a difficult task.
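As a rough illustration of the format check described above, the following Java sketch distinguishes the two paths (declared format matches, or the upload matches some supported format). The class name, the `SUPPORTED` set, and extension-based detection are all assumptions made for this example; the slides do not list the supported formats or say how a format is detected.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; names and the supported-format list are assumptions.
class FormatCheck {
    enum Outcome { DECLARED_MATCH, SUPPORTED_MATCH, UNSUPPORTED }

    // Assumed set of parseable formats (not enumerated in the source).
    static final Set<String> SUPPORTED = Set.of("pdf", "docx", "txt");

    // Detects the format from the file extension (a simplification).
    static String detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // declaredFormat may be null because specifying a format is optional.
    static Outcome check(String fileName, String declaredFormat) {
        String detected = detect(fileName);
        boolean parseable = SUPPORTED.contains(detected);
        if (declaredFormat != null && declaredFormat.equalsIgnoreCase(detected) && parseable) {
            return Outcome.DECLARED_MATCH;
        }
        // No declared format, or it did not match: fall back to the supported formats.
        return parseable ? Outcome.SUPPORTED_MATCH : Outcome.UNSUPPORTED;
    }
}
```

In this sketch both matching outcomes would lead to parsing and storage; only `UNSUPPORTED` uploads would be rejected.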
9. Working
Processing Resume:
Resumes from either kind of input are processed as follows.
First, the resume is parsed for information according to the format of the
resume file.
Second, the data obtained from parsing is given as input to a mapper. There
are multiple mappers, one for each supported file format, and each maps the
data accordingly.
Lastly, the mapped data is passed to the reducer, which reduces it into the
desired format and stores it in the storage.
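The parse, map, and reduce steps can be sketched in plain Java as below. This is only an illustration of the "one mapper per format, single reducer" idea and deliberately does not use the Hadoop API; the two sample formats (`txt`, `kv`), the field names, and the rule for joining repeated fields are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of the flow; formats and field names are hypothetical.
class ResumePipeline {
    // One mapper per supported format: raw text -> list of (field, value) pairs.
    static final Map<String, Function<String, List<String[]>>> MAPPERS = new LinkedHashMap<>();
    static {
        // "txt": one "field: value" entry per line
        MAPPERS.put("txt", raw -> splitPairs(raw, "\n", ":"));
        // "kv": "field=value" entries separated by semicolons
        MAPPERS.put("kv", raw -> splitPairs(raw, ";", "="));
    }

    static List<String[]> splitPairs(String raw, String itemSep, String kvSep) {
        List<String[]> pairs = new ArrayList<>();
        for (String item : raw.split(itemSep)) {
            int i = item.indexOf(kvSep);
            if (i > 0) {
                String field = item.substring(0, i).trim().toLowerCase();
                String value = item.substring(i + kvSep.length()).trim();
                pairs.add(new String[] {field, value});
            }
        }
        return pairs;
    }

    // Single reducer: merges mapped pairs into one structured record.
    static Map<String, String> reduce(List<String[]> pairs) {
        Map<String, String> record = new LinkedHashMap<>();
        for (String[] p : pairs) {
            record.merge(p[0], p[1], (a, b) -> a + "; " + b); // join repeated fields
        }
        return record;
    }

    static Map<String, String> process(String format, String raw) {
        Function<String, List<String[]>> mapper = MAPPERS.get(format);
        if (mapper == null) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return reduce(mapper.apply(raw));
    }
}
```

Calling `ResumePipeline.process("txt", text)` picks the mapper for that format and hands its output to the shared reducer, so supporting a new format would only require registering another mapper.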
10. Working
Resume Search:
The same interface provides a second view in which the user can query the
database (i.e. a search query) using keywords or data elements that are
expected to exist in the resumes.
After the user enters a search query, operations are performed on the database
using the conditions and filters in the user's query.
The results are then refined, ranked, and displayed as the response to the
user's query.
The workflow looks as follows:
[Figure: resume search workflow]
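The filter-and-rank step of the search can be illustrated with a small in-memory Java sketch. In the described system the filtering would run against the database; here a list of records stands in for it, and the ranking rule (number of query keywords found in a resume) is an assumption, since the slides do not specify how the rank is computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory search; the scoring rule is an assumption.
class ResumeSearch {
    // Score = how many query keywords occur anywhere in the record's values.
    static int score(Map<String, String> resume, List<String> keywords) {
        String text = String.join(" ", resume.values()).toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String k : keywords) {
            if (text.contains(k.toLowerCase(Locale.ROOT))) {
                hits++;
            }
        }
        return hits;
    }

    // Keeps resumes matching at least one keyword, ordered by descending score.
    static List<Map<String, String>> search(List<Map<String, String>> resumes,
                                            List<String> keywords) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> r : resumes) {
            if (score(r, keywords) > 0) {
                matches.add(r);
            }
        }
        matches.sort(Comparator
                .comparingInt((Map<String, String> r) -> score(r, keywords))
                .reversed());
        return matches;
    }
}
```

A real implementation would more likely push the filtering into SQL and rank only the returned rows; the sketch keeps everything in memory to stay self-contained.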
12. Working
Storage & Display:
Data is stored in a tuple-based structure after the resumes are processed.
Each tuple stores one resume, and the fields of the resume are stored as
comma-separated columns.
Each column of a tuple stores a specific piece of information, so each field
from the resume is matched to a column of the tuple. Data that could not be
parsed successfully, along with low-priority information, is stored in the last
column of the tuple.
When the results of a search query are displayed, the data is retrieved from
storage and the resumes are shown in order of the rank each one obtained when
searched for the relevant fields.
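To make the tuple layout concrete, here is a minimal Java sketch that turns a parsed record into one comma-separated tuple, with a final catch-all column for anything that does not map to a known column. The column names (`name`, `email`, `skills`) and the CSV-style quoting are assumptions; the slides only state that fields become comma-separated columns and that unparsed or low-priority data goes in the last column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical tuple builder; column names are assumed for illustration.
class ResumeTuple {
    static final List<String> COLUMNS = List.of("name", "email", "skills");

    // Known fields fill their columns; everything else goes into the last column.
    static String toTuple(Map<String, String> fields) {
        List<String> cells = new ArrayList<>();
        for (String column : COLUMNS) {
            cells.add(quote(fields.getOrDefault(column, "")));
        }
        StringBuilder other = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!COLUMNS.contains(e.getKey())) {
                if (other.length() > 0) {
                    other.append("; ");
                }
                other.append(e.getKey()).append('=').append(e.getValue());
            }
        }
        cells.add(quote(other.toString()));
        return String.join(",", cells);
    }

    // Quotes a cell so embedded commas do not break the column layout.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
}
```

Quoting each cell is one way to keep values such as "Java, Hadoop" inside a single column; a fixed set of database columns would avoid the issue altogether.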
14. Future Work
Resume Summarizer operates on a collection of resumes in certain formats and
stores them in a structured format. The project currently parses resumes in a
limited number of file formats; future work could extend this to more
complicated file structures, such as images.
The process of identifying the elements could also be improved by applying
machine learning in the summarizer after parsing.