International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1152
Extract and Analyze Data from PDF File and Web : A Review
Darshana Jadhav1, Dhanashree Jadhav2, Pooja More3, Harshali Nikam4
1 Darshana Jadhav, Dept. of Computer Engineering, MET, Nashik
2 Dhanashree Jadhav, Dept. of Computer Engineering, MET, Nashik
3 Pooja More, Dept. of Computer Engineering, MET, Nashik
4 Harshali Nikam, Dept. of Computer Engineering, MET, Nashik
Assistant Professor : Ms.Tusharsaheb Patil
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - A survey of current practice shows that the result
gazette declared by universities (e.g. Pune University) for
engineering is published in PDF format. The PDF contains details
such as seat number, centre, permanent registration number (PRN),
name, subjects, and marks. At present the PDF is converted to
Excel format so that the various reports required at
department, college, and university level can be prepared, and
this conversion is partly manual. The existing process has
several limitations: it is only semi-automated, it has no GUI,
it supports neither an SMS gateway nor an e-mail gateway, and
it offers no graphical analysis of the data. The existing
applications we surveyed are either semi-automated or automated
with restrictions, so none of them fully automates result
analysis in a proper format. To overcome these drawbacks, we
propose a new, automated system for result analysis. Its
features include automatic output generation in formats such as
Excel, PDF, and MySQL (selected by the user, for compatibility
with other ERP systems), an active SMS gateway, an active e-mail
gateway, an interactive and user-friendly GUI, and graphical
result analysis with text. The proposed system targets these
limitations to provide an effective solution for result
analysis. It will also work with the current grade system and
will maintain a student database that shows the complete status
of each student. The automation will make examination
department activities more efficient by addressing the main
drawbacks of the manual system, namely speed, precision, and
simplicity. The system is also intended to be general enough to
support any type and format of PDF file. A centralized system
will ensure that examination-related activities can be managed
effectively, while also making them more accessible and
convenient for both staff and students.
Key Words: Information Extraction, Pattern Matching,
Data Mining, Web Mining.
1. INTRODUCTION
Result evaluation and analysis require a great deal of manual
work, so a system that supports automation is needed. Our
system works on university results. In most engineering
colleges today, staff fill an Excel sheet manually for each
student from the PDF file provided by the university. Many
formulas are then used to categorize students as toppers, pass,
fail, droppers, etc. This process is entirely manual, so the
chance of mistakes is high. Similarly, diploma college results
are declared online, so the data is taken from the web, entered
into an Excel sheet manually, and then evaluated and analyzed
as required for the result reports. This process is very time
consuming. To ease the work of the people doing this analysis,
we propose a system that automates result evaluation and
analysis. The system takes the PDF file provided by the
university as input and saves its data into a database; once
the data is stored, the required information can be retrieved
using various queries.
2. LITERATURE SURVEY
In the existing system, data is sorted and analyzed manually.
The user has to copy the contents of the PDF file into Excel
sheets and sort them manually to rank students. The proposed
system automates these processes. Several researchers have
worked on extracting required data from unstructured sources
such as PDF. This section describes the tools most closely
related to the proposed system. In reference [1], the authors
use the PDFBox technique to extract references from PDF
documents; it converts the PDF data into text and retrieves the
required information from that text. In reference [2], the
authors use LA-PDFText, a command-line utility that extracts
text from a PDF given only the path of the PDF file. In
reference [3], the authors present a technique for extracting
data from structured web pages. In reference [4], the authors
use a technique called tag injection, which inserts format
information into a text
document in the form of tags. This helps to transform the
text into semi-structured data; the reference discusses data
extraction in detail.
3. PROPOSED SYSTEM
The following figure shows a detailed view of the proposed
system:
Fig: Detailed View
3.1 PDFBox :
The input to the system is a PDF file, so the system must first
extract data from it. Here the PDF file is the result gazette
provided by the university, so it does not contain any diagrams
or images. To extract the data, we use the PDFBox technique.
PDFBox is a PDF processing library that supports the creation
and conversion of PDF documents and also provides command-line
utilities for performing various operations. It can extract the
contents of PDF documents quickly and accurately. In our
implementation, we include the iTextSharp package. iText
provides APIs for .NET, Android, GAE, and Java developers to
enhance their applications with PDF functionality such as PDF
generation, PDF manipulation, and PDF form filling. After the
package is included, PdfReader is used to read the PDF file and
PdfTextExtractor is used to extract its text.
3.2 Sorting of data :
The text extracted from the PDF file is stored in a text file.
The proposed system categorizes the data by department. This
separation is done using string manipulation operations.
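The department-wise separation can be sketched with plain string operations. This is only an illustration: the "BRANCH:" header marker, the function name split_by_department, and the sample records are assumptions, since the layout of the actual gazette text is not shown here.

```python
from collections import defaultdict


def split_by_department(text, marker="BRANCH:"):
    """Group extracted lines under the department header that precedes them.

    The header marker is a hypothetical convention, not the real gazette layout.
    """
    departments = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.upper().startswith(marker.upper()):
            # A header line starts a new department block.
            current = line[len(marker):].strip().title()
            continue
        if current is not None:
            departments[current].append(line)
    return dict(departments)


# Made-up sample text standing in for the extracted PDF content.
sample = """BRANCH: COMPUTER ENGINEERING
S1001 ASHA PATIL
S1002 RAVI KALE
BRANCH: MECHANICAL ENGINEERING
S2001 NEHA JOSHI
"""
groups = split_by_department(sample)
```

Lines that appear before the first header are ignored, which seems a reasonable default for cover-page text.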
3.3 Remove Noisy/Redundant Data :
After the essential data is obtained from the extracted PDF
text, the data that is not required is filtered out. For this
purpose we use a parsing technique that processes the text line
by line. The PDF also contains a lot of redundant data; e.g.
the input PDF file repeats the same subject list for every
student of a particular department. Such redundant data is
removed, and only a single copy is stored in the database.
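A minimal sketch of this line-by-line cleaning step follows. The noise prefixes, the "SUBJECT:" marker, and the sample lines are assumptions made for illustration; the real filter rules depend on the gazette layout.

```python
def clean_lines(lines, noise_prefixes=("PAGE ", "UNIVERSITY OF")):
    """Drop blank lines and lines that start with a known noise prefix."""
    cleaned = []
    for raw in lines:
        line = " ".join(raw.split())  # collapse runs of whitespace
        if line and not line.upper().startswith(noise_prefixes):
            cleaned.append(line)
    return cleaned


def unique_subjects(lines, marker="SUBJECT:"):
    """Return each subject once, in first-seen order."""
    subjects = []
    for line in lines:
        if line.startswith(marker):
            name = line[len(marker):].strip()
            if name not in subjects:
                subjects.append(name)
    return subjects


# Made-up input: the same subject list is repeated for every student.
raw = [
    "UNIVERSITY OF EXAMPLE   RESULT",
    "Page 1",
    "",
    "SEAT NO: S1001",
    "SUBJECT:  Database Systems",
    "SUBJECT: Web Mining",
    "SEAT NO: S1002",
    "SUBJECT: Database Systems",
    "SUBJECT: Web Mining",
]
lines = clean_lines(raw)
subjects = unique_subjects(lines)
```

Only the subject list is de-duplicated here; student lines are kept as they are, because two students may legitimately have identical lines.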
3.4 Web Extraction :
The web extractor recognizes the relevant data on a web page
and extracts two kinds of data from it: the source code and the
plain text displayed on the page.
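The two outputs of the web extractor can be illustrated with the html.parser module from the Python standard library. This is a sketch under assumptions: the class name TextExtractor, the function extract, and the sample page are hypothetical, and a real extractor would first download the page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping script and style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract(source):
    """Return the page source unchanged together with its plain text."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return source, " ".join(parser.chunks)


# Made-up result page used in place of a downloaded one.
page = (
    "<html><head><style>td{color:red}</style></head>"
    "<body><h1>Result</h1><p>Seat No: S1001</p></body></html>"
)
source_code, plain_text = extract(page)
```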
3.5 DOM Parser for Web Mining :
DOM is the Document Object Model, which is used to organize the
nodes extracted from web pages into a tree structure.
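As an illustration of walking such a tree, the sketch below uses xml.dom.minidom from the Python standard library. It assumes the markup is well formed (minidom is an XML parser and rejects typical loose HTML), and the table layout and helper names are hypothetical.

```python
from xml.dom import minidom


def cell_text(node):
    """Join the text-node children of an element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def table_rows(markup):
    """Parse well-formed markup into a DOM tree and read each table row."""
    dom = minidom.parseString(markup)
    rows = []
    for tr in dom.getElementsByTagName("tr"):
        rows.append([cell_text(td) for td in tr.getElementsByTagName("td")])
    return rows


# Made-up result table standing in for a fragment of a result page.
page = """<table>
  <tr><td>S1001</td><td>Database Systems</td><td>78</td></tr>
  <tr><td>S1002</td><td>Database Systems</td><td>64</td></tr>
</table>"""
rows = table_rows(page)
```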
3.6 Pattern Mining :
The system uses a pattern mining method to find the essential
data in the extracted document. The plain text extracted by the
web extractor is checked against the specified pattern, and the
matching data is mined accordingly.
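One simple way to check text against a specified pattern is a regular expression. The sketch below assumes a hypothetical record layout ("Seat No: ... Name: ... Total: ..."); the pattern, field names, and sample text are illustrative and not taken from an actual result page.

```python
import re

# Hypothetical record layout; the real pattern depends on the page text.
RECORD = re.compile(
    r"Seat No:\s*(?P<seat>\w+)\s+"
    r"Name:\s*(?P<name>[A-Za-z ]+?)\s+"
    r"Total:\s*(?P<total>\d+)"
)


def mine_records(text):
    """Return one dictionary per record that matches the pattern."""
    return [
        {"seat": m["seat"], "name": m["name"], "total": int(m["total"])}
        for m in RECORD.finditer(text)
    ]


# Made-up plain text as it might come from the web extractor.
text = (
    "Seat No: S1001 Name: Asha Patil Total: 412 "
    "Seat No: S1002 Name: Ravi Kale Total: 365"
)
records = mine_records(text)
```

Text that does not match the pattern is skipped without error, so malformed records need a separate check.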
3.7 Read and Analyze required data :
After the noisy and redundant data is eliminated, the system is
left with the actual data, which is then accessed for each
student. The system analyzes each student's data. It first
divides the data by department, then reads the subject list of
each department and separates the subjects into theory,
practical, term-work, oral, online exam, and in-sem exam heads
in order to generate the final result of every individual. The
system also reads the personal information of each student from
the text extracted from the PDF.
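The per-student analysis can be sketched as follows. The tuple layout, the function name analyze, and in particular the 40% passing rule per head are assumptions made for illustration; the actual passing criteria are defined by the university.

```python
from collections import defaultdict


def analyze(marks, pass_percent=40):
    """Group one student's marks by head and derive a final result.

    marks: list of (subject, head, obtained, maximum) tuples.
    pass_percent: assumed minimum percentage per head (hypothetical rule).
    """
    by_head = defaultdict(list)
    failed = []
    for subject, head, obtained, maximum in marks:
        by_head[head].append((subject, obtained))
        if obtained * 100 < pass_percent * maximum:
            failed.append((subject, head))
    total = sum(obtained for _, _, obtained, _ in marks)
    return {
        "by_head": dict(by_head),
        "total": total,
        "failed": failed,
        "result": "FAIL" if failed else "PASS",
    }


# Made-up marks for a single student.
student = [
    ("Database Systems", "theory", 62, 100),
    ("Database Systems", "practical", 40, 50),
    ("Web Mining", "theory", 35, 100),
    ("Web Mining", "term-work", 20, 25),
]
report = analyze(student)
```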
3.8 Database design and storage of extracted data :
All the gathered data that is useful needs to be stored in the
database. The system designs the database dynamically by
reading the contents of the PDF file. After the database is
designed, department-wise tables are generated, and the
analyzed data is stored in these tables.
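Creating department-wise tables at run time can be sketched with the sqlite3 module. SQLite is used here only as a self-contained stand-in for the database; the column set, the "dept_" naming scheme, and the helper names are assumptions.

```python
import re
import sqlite3


def table_name(department):
    """Turn a department name into a safe SQL identifier."""
    slug = re.sub(r"\W+", "_", department.strip().lower()).strip("_")
    return "dept_" + slug


def store(conn, department, rows):
    """Create the department table if needed and insert (seat, name, total) rows."""
    table = table_name(department)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        "seat_no TEXT PRIMARY KEY, name TEXT NOT NULL, total INTEGER NOT NULL)"
    )
    # Values are always passed as parameters; only the sanitized
    # table name is interpolated into the statement.
    conn.executemany(f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?, ?)', rows)
    conn.commit()
    return table


conn = sqlite3.connect(":memory:")
table = store(
    conn,
    "Computer Engineering",
    [("S1001", "Asha Patil", 412), ("S1002", "Ravi Kale", 365)],  # made-up rows
)
count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Because the table name comes from text read out of the PDF, it is reduced to letters, digits, and underscores before it is placed in a statement.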
3.9 Reports generation :
Reports are generated from the data stored in the database. The
result reports are generated according to the
requirements. Reports include college topper, department-wise
topper, subject-wise topper, ATKT students, dropper students,
etc. The system generates the result reports and sends them by
e-mail to the respective departments and students.
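Topper reports of this kind reduce to simple queries once the data is in the database. The sketch below uses sqlite3 and, for brevity, a single hypothetical results table instead of department-wise tables; the schema and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (seat_no TEXT, name TEXT, department TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("S1001", "Asha Patil", "Computer", 412),
        ("S1002", "Ravi Kale", "Computer", 365),
        ("S2001", "Neha Joshi", "Mechanical", 398),
    ],
)

# College topper: the highest total overall (ties broken by seat number).
college_topper = conn.execute(
    "SELECT name, total FROM results ORDER BY total DESC, seat_no LIMIT 1"
).fetchone()

# Department-wise toppers: rows whose total equals their department's maximum.
department_toppers = conn.execute(
    """
    SELECT r.department, r.name, r.total
    FROM results AS r
    WHERE r.total = (SELECT MAX(total) FROM results WHERE department = r.department)
    ORDER BY r.department
    """
).fetchall()
```

If two students share the highest total in a department, the second query returns both of them.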
4. CONCLUSIONS
On request, the system sorts all the data by students' marks
and grades. For this we use data mining, PDF extraction, and
data fetching and sorting techniques, which let the user
simplify the data easily and prepare result reports along with
graphical representations (pie charts and graphs). Students can
conveniently receive their results through the SMS and e-mail
gateways. In this way the result data is well organized, which
makes the result records easy to manage.
ACKNOWLEDGEMENT
We express our sincere gratitude to Prof. Mr. Tusharsaheb
Patil (Assistant Professor, MET BKC IOE) for his support and
guidance. We would also like to thank Prof. Mr. Pankaj Deore
(Asst. Professor, MET BKC IOE) for his valuable words of
advice. We are also extremely grateful to our respected
H.O.D. Dr. M. U. Kharat and Principal Dr. V. P. Wani for
providing all facilities and every help for the smooth progress
of the project work. We are thankful to our family members and
friends for motivating us.
REFERENCES
[1] A Strategy for Automatically Extracting References from
PDF Documents. Neide Ferreira Alves, Universidade do
Estado do Amazonas, Manaus, Brazil; Rafael Dueire Lins,
Universidade Federal de Pernambuco, Recife, Brazil; Maria
Lencastre, Universidade de Pernambuco, Recife.
[2] Automatic classification of scientific papers in PDF for
populating ontologies. Juan C. Redon-Miranda, Julia Y. Arana-Llanes,
Juan G. González-Serna and Nimrod González-Franco,
Department of Computer Science, National Center for
Research and Technological Development, CENIDET,
Cuernavaca, México {juancarlos, juliaarana, gabriel,
[3] HWPDE: Novel Approach for Data Extraction from
Structured Web Pages. Manpreet Singh Sehgal, Department
of Information Technology, Apeejay College of Engineering,
Sohna, Gurgaon; Anuradha, PhD, Department of Computer
Engineering, YMCA University of Sc. & Technology,
Faridabad.
[4] A new method of information extraction from PDF
files. FANG YUAN1,2, BO LIU, College of Mathematics and
Computer Science, Hebei University, Baoding, 071002
P.R. China; College of Information Science and Engineering,
Northeastern University, Shenyang, 110004 P.R. China.
BIOGRAPHIES
Darshana Jadhav is pursuing her computer degree
course at MET’s Institute of Engineering, Nashik. Her
interests include database systems.
Dhanashree Jadhav is pursuing her computer degree
course at MET’s Institute of Engineering, Nashik. Her
interests include database systems.
Pooja More is pursuing her computer degree course at
MET’s Institute of Engineering, Nashik. Her interests
include database systems, data mining, and web mining.
Harshali Nikam is pursuing her computer degree
course at MET’s Institute of Engineering, Nashik. Her
interests include database systems and web mining.

More Related Content

What's hot

Student Administration System
Student Administration SystemStudent Administration System
Student Administration System
ijtsrd
 
Systems Development & Procurement
Systems Development & Procurement Systems Development & Procurement
Systems Development & Procurement
Julius Noble Ssekazinga
 
Project proposal of Library Management System.
Project proposal of Library Management System. Project proposal of Library Management System.
Project proposal of Library Management System.
Arjishman Roy
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
Ranjan Ranjan
 
library management system
library management systemlibrary management system
library management system
aniket chauhan
 
Project for Student Result System
Project for Student Result SystemProject for Student Result System
Project for Student Result System
KuMaR AnAnD
 
Introduction to files and db systems 1.0
Introduction to files and db systems 1.0Introduction to files and db systems 1.0
Introduction to files and db systems 1.0
Dr. C.V. Suresh Babu
 
Library management system project
Library management system projectLibrary management system project
Library management system project
AJAY KUMAR
 
IP Final project 12th
IP Final project 12thIP Final project 12th
IP Final project 12th
SantySS
 
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
Ravindu Sandeepa
 
Library Management System
Library  Management  SystemLibrary  Management  System
Library Management System
Priyatham Bollimpalli
 
Libary management system
Libary management system Libary management system
Libary management system
praladh timsina
 
c++ library management
c++ library managementc++ library management
c++ library management
shivani menon
 
Database system structure
Database system structureDatabase system structure
Database system structure
Dr. C.V. Suresh Babu
 
Library management project
Library management projectLibrary management project
Library management project
Sumedh Kumar Singh
 
Online Library management system proposal by Banuka Dananjaya Subasinghe
Online Library management system proposal by Banuka Dananjaya SubasingheOnline Library management system proposal by Banuka Dananjaya Subasinghe
Online Library management system proposal by Banuka Dananjaya Subasinghe
BanukaSubasinghe
 
Software Development Methodologies Library Management System (Part-1)
Software Development Methodologies Library Management System (Part-1)Software Development Methodologies Library Management System (Part-1)
Software Development Methodologies Library Management System (Part-1)
Totan Banik
 
IRJET- Identify the Human or Bots Twitter Data using Machine Learning Alg...
IRJET-  	  Identify the Human or Bots Twitter Data using Machine Learning Alg...IRJET-  	  Identify the Human or Bots Twitter Data using Machine Learning Alg...
IRJET- Identify the Human or Bots Twitter Data using Machine Learning Alg...
IRJET Journal
 

What's hot (18)

Student Administration System
Student Administration SystemStudent Administration System
Student Administration System
 
Systems Development & Procurement
Systems Development & Procurement Systems Development & Procurement
Systems Development & Procurement
 
Project proposal of Library Management System.
Project proposal of Library Management System. Project proposal of Library Management System.
Project proposal of Library Management System.
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
 
library management system
library management systemlibrary management system
library management system
 
Project for Student Result System
Project for Student Result SystemProject for Student Result System
Project for Student Result System
 
Introduction to files and db systems 1.0
Introduction to files and db systems 1.0Introduction to files and db systems 1.0
Introduction to files and db systems 1.0
 
Library management system project
Library management system projectLibrary management system project
Library management system project
 
IP Final project 12th
IP Final project 12thIP Final project 12th
IP Final project 12th
 
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
System requirement specification report(srs) T/TN/Gomarankadawala Maha vidyal...
 
Library Management System
Library  Management  SystemLibrary  Management  System
Library Management System
 
Libary management system
Libary management system Libary management system
Libary management system
 
c++ library management
c++ library managementc++ library management
c++ library management
 
Database system structure
Database system structureDatabase system structure
Database system structure
 
Library management project
Library management projectLibrary management project
Library management project
 
Online Library management system proposal by Banuka Dananjaya Subasinghe
Online Library management system proposal by Banuka Dananjaya SubasingheOnline Library management system proposal by Banuka Dananjaya Subasinghe
Online Library management system proposal by Banuka Dananjaya Subasinghe
 
Software Development Methodologies Library Management System (Part-1)
Software Development Methodologies Library Management System (Part-1)Software Development Methodologies Library Management System (Part-1)
Software Development Methodologies Library Management System (Part-1)
 
IRJET- Identify the Human or Bots Twitter Data using Machine Learning Alg...
IRJET-  	  Identify the Human or Bots Twitter Data using Machine Learning Alg...IRJET-  	  Identify the Human or Bots Twitter Data using Machine Learning Alg...
IRJET- Identify the Human or Bots Twitter Data using Machine Learning Alg...
 

Similar to Extract and Analyze Data from PDF File and Web : A Review

School management System
School management SystemSchool management System
School management System
HATIM Bhagat
 
Development of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLPDevelopment of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLP
IRJET Journal
 
IRJET- Intelligence Extraction using Machine Learning Technics
IRJET- Intelligence Extraction using Machine Learning TechnicsIRJET- Intelligence Extraction using Machine Learning Technics
IRJET- Intelligence Extraction using Machine Learning Technics
IRJET Journal
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
IRJET Journal
 
IRJET- PDF Extraction using Data Mining Techniques
IRJET- PDF Extraction using Data Mining TechniquesIRJET- PDF Extraction using Data Mining Techniques
IRJET- PDF Extraction using Data Mining Techniques
IRJET Journal
 
IRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction FrameworkIRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction Framework
IRJET Journal
 
Database Engine Control though Web Portal Monitoring Configuration
Database Engine Control though Web Portal Monitoring ConfigurationDatabase Engine Control though Web Portal Monitoring Configuration
Database Engine Control though Web Portal Monitoring Configuration
IRJET Journal
 
IRJET- E-Attendance Manager: A Review
IRJET-  	  E-Attendance Manager: A ReviewIRJET-  	  E-Attendance Manager: A Review
IRJET- E-Attendance Manager: A Review
IRJET Journal
 
Algorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code MiningAlgorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code Mining
IRJET Journal
 
Content Migration -FileNet Image Service to P8
Content Migration -FileNet Image Service to P8Content Migration -FileNet Image Service to P8
Content Migration -FileNet Image Service to P8
IRJET Journal
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Piyush Kumar
 
IRJET- Efficient Student Faculty Management System
IRJET- Efficient Student Faculty Management SystemIRJET- Efficient Student Faculty Management System
IRJET- Efficient Student Faculty Management System
IRJET Journal
 
IRJET- Placemate - Sakec Portal
IRJET-  	  Placemate - Sakec PortalIRJET-  	  Placemate - Sakec Portal
IRJET- Placemate - Sakec Portal
IRJET Journal
 
CV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNINGCV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNING
IRJET Journal
 
Cloud Computing Based System Integration in Education
Cloud Computing Based System Integration in EducationCloud Computing Based System Integration in Education
Cloud Computing Based System Integration in Education
IRJET Journal
 
Student database management system
Student database management systemStudent database management system
Student database management system
Snehal Raut
 
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedEfficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
IRJET Journal
 
Job portal
Job portalJob portal
Job portal
Arman Ahmed
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configuration
ijcsit
 
IRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data VisualizationIRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data Visualization
IRJET Journal
 

Similar to Extract and Analyze Data from PDF File and Web : A Review (20)

School management System
School management SystemSchool management System
School management System
 
Development of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLPDevelopment of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLP
 
IRJET- Intelligence Extraction using Machine Learning Technics
IRJET- Intelligence Extraction using Machine Learning TechnicsIRJET- Intelligence Extraction using Machine Learning Technics
IRJET- Intelligence Extraction using Machine Learning Technics
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
 
IRJET- PDF Extraction using Data Mining Techniques
IRJET- PDF Extraction using Data Mining TechniquesIRJET- PDF Extraction using Data Mining Techniques
IRJET- PDF Extraction using Data Mining Techniques
 
IRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction FrameworkIRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction Framework
 
Database Engine Control though Web Portal Monitoring Configuration
Database Engine Control though Web Portal Monitoring ConfigurationDatabase Engine Control though Web Portal Monitoring Configuration
Database Engine Control though Web Portal Monitoring Configuration
 
IRJET- E-Attendance Manager: A Review
IRJET-  	  E-Attendance Manager: A ReviewIRJET-  	  E-Attendance Manager: A Review
IRJET- E-Attendance Manager: A Review
 
Algorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code MiningAlgorithm Procedure and Pseudo Code Mining
Algorithm Procedure and Pseudo Code Mining
 
Content Migration -FileNet Image Service to P8
Content Migration -FileNet Image Service to P8Content Migration -FileNet Image Service to P8
Content Migration -FileNet Image Service to P8
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
IRJET- Efficient Student Faculty Management System
IRJET- Efficient Student Faculty Management SystemIRJET- Efficient Student Faculty Management System
IRJET- Efficient Student Faculty Management System
 
IRJET- Placemate - Sakec Portal
IRJET-  	  Placemate - Sakec PortalIRJET-  	  Placemate - Sakec Portal
IRJET- Placemate - Sakec Portal
 
CV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNINGCV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNING
 
Cloud Computing Based System Integration in Education
Cloud Computing Based System Integration in EducationCloud Computing Based System Integration in Education
Cloud Computing Based System Integration in Education
 
Student database management system
Student database management systemStudent database management system
Student database management system
 
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data FeedEfficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
Efficiently Detecting and Analyzing Spam Reviews Using Live Data Feed
 
Job portal
Job portalJob portal
Job portal
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configuration
 
IRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data VisualizationIRJET- Plug-In based System for Data Visualization
IRJET- Plug-In based System for Data Visualization
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 




Extract and Analyze Data from PDF File and Web : A Review

International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056, p-ISSN: 2395-0072
Volume: 04 Issue: 02 | Feb-2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1152

Extract and Analyze Data from PDF File and Web: A Review
Darshana Jadhav1, Dhanashree Jadhav2, Pooja More3, Harshali Nikam4
1 Darshana Jadhav, Dept. of Computer Engineering, MET, Nashik
2 Dhanashree Jadhav, Dept. of Computer Engineering, MET, Nashik
3 Pooja More, Dept. of Computer Engineering, MET, Nashik
4 Harshali Nikam, Dept. of Computer Engineering, MET, Nashik
Assistant Professor: Ms. Tusharsaheb Patil

Abstract - A survey of current practice shows that the result gazette declared by universities (e.g. Pune University) for engineering is in PDF format. The PDF contains details such as seat number, centre, permanent registration number (PRN), name, subjects, marks, etc. At present the PDF is converted to Excel format in order to produce the various reports required by the department, college, or university at different levels, and this conversion involves a partly manual process. These operations have several limitations: the process is only semi-automated, there is no GUI, SMS and e-mail gateways are not supported, and, above all, graphical analysis of the data is not available. In our survey we came across existing applications that are semi-automated, or automated with restrictions that do not allow full automation of result analysis in a proper format. Thus none of the applications supported full automation.
To overcome these drawbacks, we propose a new system for result analysis. It is automated, with features such as automatic output generation in different formats (Excel, PDF, or MySQL, as selected by the user) for compatibility with other ERP systems, an active SMS gateway, an active e-mail gateway, an interactive and user-friendly GUI, and graphical result analysis with text. The proposed system targets the limitations listed above to provide an effective solution for result analysis. The system will also work with the current grade system, and we will maintain a database of students that shows each student's complete status. The automated solutions provided by the system will make exam department activities more efficient by addressing the most important drawbacks of the manual system, namely speed, precision and simplicity. It will also work as a generalized system that supports any type and format of PDF file. A centralized system will ensure that the activities in the context of an examination can be managed effectively, while also making it more accessible and convenient for both staff and students.

Key Words: Information Extraction, Pattern Matching, Data Mining, Web Mining.

1. INTRODUCTION
Result evaluation and analysis require a great deal of manual work, so to reduce this effort we need a system that supports automation. Our system will work for university results. At present, the traditional method in most engineering colleges is to fill an Excel sheet manually for each student from the PDF file provided by the university. Many formulas are used to categorize students as toppers, pass, fail, droppers, etc. This is a completely manual process in which the chance of mistakes is high. Similarly, diploma college results are declared online, so the data is taken from the web and entered into an Excel sheet manually, and then evaluated and analyzed as required for the result reports.
This process is very time consuming. To make the analysis easier for the people who carry it out, we propose a system that automates the process of result evaluation and analysis. The system takes the PDF file provided by the university as input and saves the data into a database; once the data is stored in the database, information can be retrieved using various queries.

2. LITERATURE SURVEY
In the existing system the data is sorted and analyzed by manual processes. The user has to copy and paste the PDF content into Excel sheets and sort it manually to rank students. The proposed system will automate these processes. Several researchers have worked on extracting required data from unstructured sources such as PDF. In this section we describe the tools that are closely related to the proposed system. In reference [1] the authors used the PDFBox technique to extract references from PDF; it converts the PDF data into text and obtains the required information from that text. In reference [2] the authors used the LA-PDFText technique, a command-line utility that extracts text from a PDF given only the path of the PDF file. In [3] the authors use a technique for extracting data from structured web pages. In reference [4] the authors use a technique called tag injection, which inserts format information into the text document in the form of tags. It helps to transform text into semi-structured data; complete details of the data extraction are discussed there.

3. PROPOSED SYSTEM
The following figure shows the detailed view of the proposed system:
Fig: Detailed View

3.1 PDF Box:
A PDF file is the input to the system, so the system first has to extract data from PDF files. Here the PDF file is the result gazette provided by the university, so it does not contain any diagrams or images. To extract data from PDF files, we are going to use the PDF Box technique. PDF Box is a PDF processing library; it supports the development and conversion of PDF documents and also provides a command-line utility for performing various operations. PDF Box can quickly and accurately extract the contents of PDF documents. In our implementation the extraction is done with the iTextSharp package (the .NET port of iText, a library distinct from Apache PDFBox). iText provides API in languages such as .net, android, JAE, java so that developers can enhance their applications with PDF functionality. It provides functionality such as PDF generation, PDF manipulation, and PDF form filling. After the package is included, PdfReader is used to read the PDF file and PdfTextExtractor is then used to extract the text of the document.

3.2 Sorting of data:
Text extracted from the PDF files is stored in a text file. The proposed system categorizes the data by department. This separation is done with string manipulation operations.

3.3 Remove Noisy/Redundant Data:
After the essential data has been obtained from the extracted PDF data, the data that is not required is filtered out. For this purpose we use a parsing technique that parses the text line by line. The PDF also contains a lot of redundant data; e.g. the input PDF file contains the same subject list for each student of a particular department. Such redundant data is removed and only a single copy is stored in the database system.

3.4 WEB Extraction:
The web extractor recognizes the relevant data in the web page and extracts two different types of data from it: the source code and the plain text displayed on the web page.

3.5 DOM Parser for Web Mining:
DOM is the Document Object Model, used to organize the nodes extracted from web pages into a tree structure.

3.6 Pattern Mining:
The system uses a pattern mining method to find the essential data in the extracted document. The plain text extracted by the web extractor is checked against the specified pattern and the data is mined accordingly.

3.7 Read and Analyze required data:
After the noisy and redundant data has been eliminated, the system has the actual data it needs. This data is then accessed for each student, and the system analyzes each student's data. The system first divides the data by department, then reads the subject list of each department, separates the subjects into theory, practical, term-work, oral, online exam and in-sem exam, and generates the final result of every individual. The system also reads the personal information of each student from the text extracted from the PDF.

3.8 Database designed and extracted data filled in the system:
All gathered data that is useful needs to be stored in the database system. The system therefore designs the database dynamically by reading the contents of the PDF file. After the database is designed, department-wise tables are generated, and the analyzed data is stored in these tables.

3.9 Reports generation:
Reports are generated using the data stored in the database. The result reports will be generated according to
requirements, for example college topper, department-wise topper, subject-wise topper, ATKTs, dropper students, etc. The system will generate result reports, which are sent via mail to the respective department/students.

4. CONCLUSIONS
The system will sort all the data according to students' marks and grades if requested by the user. For this we use data mining techniques, PDF extraction, and data fetching and sorting techniques, which let the user simplify the data easily and produce result reports accordingly, along with graphical representation (using pie charts and graphs). It will be convenient for students to receive results through the SMS and e-mail gateways. In this way the result data will be well organized, which makes the result records easy to manage.

ACKNOWLEDGEMENT
We express our sincere gratitude to Prof. Mr. Tusharsaheb Patil (Assistant Professor, MET BKC IOE) for his support and guidance. We would also like to thank Prof. Mr. Pankaj Deore (Asst. Professor, MET BKC IOE) for his valuable words of advice. We are also extremely grateful to our respected H.O.D. Dr. M. U. Kharat and Principal Dr. V. P. Wani for providing all facilities and every help for the smooth progress of the project work. We are thankful to our family members and friends for motivating us.

REFERENCES
[1] A Strategy for Automatically Extracting References from PDF Documents. Neide Ferreira Alves, Universidade do Estado do Amazonas, Manaus, Brazil; Rafael Dueire Lins, Universidade Federal de Pernambuco, Recife, Brazil; Maria Lencastre, Universidade de Pernambuco, Recife.
[2] Automatic classification of scientific papers in PDF for populating ontologies. Juan C. Redon-Miranda, Julia Y. Arana-Llanes, Juan G. González-Serna and Nimrod González-Franco, Department of Computer Science, National Center for Research and Technological Development, CENIDET, Cuernavaca, México.
[3] HWPDE: Novel Approach for Data Extraction from Structured Web Pages. Manpreet Singh Sehgal, Department of Information Technology, Apeejay College of Engineering, Sohna, Gurgaon; Anuradha, PhD, Department of Computer Engineering, YMCA University of Sc. & Technology, Faridabad.
[4] A new method of information extraction from pdf files. Fang Yuan, Bo Liu, College of Mathematics and Computer Science, Hebei University, Baoding, 071002, P.R. China; College of Information Science and Engineering, Northeastern University, Shenyang, 110004, P.R. China.

BIOGRAPHIES
Darshana Jadhav: Pursuing her computer degree course in MET’s Institute of Engineering, Nashik. Her interests include database systems.
Dhanashree Jadhav: Pursuing her computer degree course in MET’s Institute of Engineering, Nashik. Her interests include database systems.
Pooja More: Pursuing her computer degree course in MET’s Institute of Engineering, Nashik. Her interests include database systems, data mining and web mining.
Harshali Nikam: Pursuing her computer degree course in MET’s Institute of Engineering, Nashik. Her interests include database systems and web mining.
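
The line-by-line parsing of Section 3.3 and the per-student reading of Section 3.7 could be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation (which uses iTextSharp). The `SEAT NO` / `NAME` / `PRN` line layout, the six-digit subject codes and the function name `parse_result_text` are assumptions made for the example; a real result PDF will have a different layout and need different patterns.

```python
import re

# Hypothetical line layouts, assumed for illustration only.
STUDENT_RE = re.compile(
    r"^SEAT NO:\s*(?P<seat>\w+)\s+NAME:\s*(?P<name>.+?)\s+PRN:\s*(?P<prn>\w+)$"
)
MARK_RE = re.compile(r"^(?P<code>\d{6})\s+(?P<subject>.+?)\s+(?P<marks>\d{1,3})$")


def parse_result_text(text):
    """Parse extracted text line by line and return (subjects, students)."""
    subjects = {}   # subject code -> subject name, stored only once
    students = []   # one dict per student, in order of appearance
    current = None  # student whose mark lines are being read

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue

        match = STUDENT_RE.match(line)
        if match:
            current = {
                "seat": match["seat"],
                "name": match["name"],
                "prn": match["prn"],
                "marks": {},
            }
            students.append(current)
            continue

        match = MARK_RE.match(line)
        if match and current is not None:
            # The subject name repeats for every student; keep a single copy.
            subjects.setdefault(match["code"], match["subject"])
            current["marks"][match["code"]] = int(match["marks"])
        # Any other line (page headers, footers, ...) is treated as noise.

    return subjects, students
```

Keeping subject names in one dictionary and storing only the subject code with each student's marks mirrors the idea of holding a single copy of the repeated subject list; lines that match neither pattern are simply dropped as noise.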