1. Department of Artificial Intelligence
Project Phase –I Title Finalization Seminar
Winter 2022 (Session: 2022-2023)
G H Raisoni College of Engineering, Nagpur
Presented By:
1. Prajwal Kolhe - 45(A)
2. Faizan Khan - 68(A)
3. Apoorva Dhimole - 65(A)
4. Lakshya Chauraghade - 39(A)
Guide:-
Pranali Dhawas
Assistant Professor
GHRCE, Nagpur
Date: 5th Aug 2022
Title of the Project:-
Document Analyzer using Deep Learning
2. Introduction
• Every organization, such as a college,
school, or company, holds data in
many different forms.
• In this research, our objective is to
build a prediction model for analyzing
and classifying these documents.
• Sorting documents by hand is tedious
work, often done by low-wage workers.
It is time-consuming but necessary.
• Similar to other methods of analysis
in qualitative research, document
analysis requires repeated review,
examination, and interpretation of
the data in order to gain meaning and
empirical knowledge of the construct
being studied.
3. Abstract
• Many companies and large organizations hold
documents in bulk and are required to keep
them sorted into different clusters.
• In recent years, this job has become
time-consuming as the number of documents
and articles has increased.
• The objective of this study is to identify
documents and classify them accordingly.
• Documentary analysis is a type of qualitative
research in which the analyst reviews documents
to assess an appraisal theme. Analyzing
documents involves coding content into themes,
much as focus group or interview transcripts
are analyzed. A rubric can also be used to
grade or score a document.
4. Objectives
• To analyze and classify documents using a CNN.
• To extract features from the documents using feature-extraction algorithms.
• To create a working model that classifies documents on the basis of the
features that are extracted.
• The model will use image segmentation and a CNN to identify the
articles.
5. Literature Survey (Survey of existing products)
• N. Chen and D. Blostein. A survey of document image classification: Problem
statement, classifier architecture and performance evaluation. IJDAR, 10(1):1–16,
2007.
• K. Collins-Thompson and R. Nickolov. A clustering-based algorithm for
automatic document separation. In SIGIR, pages 1–8, 2002.
• CNNs are trained to perform a classification task, but a CNN trained on
classification can also be exploited for retrieval. The activations of one of its
later layers serve as feature vectors for the document images. These feature
vectors are high-dimensional, but their dimensionality can be reduced
significantly via principal component analysis without significantly affecting
their discriminative power. Ranking the training images by their similarity to a
query then returns a sorted list of documents.
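The retrieval idea described in this slide can be sketched in a few lines. This is a minimal illustration, not the surveyed authors' implementation: random vectors stand in for CNN activations, the sizes (200 documents, 512 dimensions, 32 components) are arbitrary assumptions, and the helper names `pca_reduce` and `rank_by_similarity` are hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-vector features onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

def rank_by_similarity(query, gallery):
    """Return gallery indices sorted from most to least cosine-similar."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    return np.argsort(-scores), scores

# Assumption: random vectors stand in for CNN activations
# (200 documents, 512-dimensional features).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))

reduced, mean, components = pca_reduce(features, n_components=32)

# Use document 0 as the query; project it with the same mean and components.
query = (features[0] - mean) @ components.T
order, scores = rank_by_similarity(query, reduced)
```

Here `order` lists the documents from most to least similar to the query. In a real system the feature matrix would come from a trained CNN rather than a random generator.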
6. Proposed Methodology/System Architecture
• The type of a document is determined by many characteristics, such as the
design of the document, the header and footer, the body of the document, and
how the writing is formatted within it. All of these factors help in
identifying the type of document.
• Some types of documents also share common features. For example,
government certificates carry a government seal and/or logo, which can help
classify them.
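To illustrate how the header, body, and footer can each contribute to identifying a document, the sketch below splits a page image into three horizontal bands and measures how much ink each contains. It is a hand-crafted stand-in for illustration only, not the proposed CNN. The band fractions (15% header, 15% footer), the darkness threshold, and the function names are all assumptions.

```python
import numpy as np

def split_regions(page, header_frac=0.15, footer_frac=0.15):
    """Split a 2-D grayscale page into header, body and footer bands."""
    height = page.shape[0]
    top = int(round(height * header_frac))
    bottom = height - int(round(height * footer_frac))
    return {"header": page[:top], "body": page[top:bottom], "footer": page[bottom:]}

def ink_density(region, threshold=128):
    """Fraction of pixels darker than the threshold (0 = blank, 1 = solid ink)."""
    return float((region < threshold).mean())

def layout_features(page):
    """One ink-density value per band, in header/body/footer order."""
    regions = split_regions(page)
    return [ink_density(regions[name]) for name in ("header", "body", "footer")]

# Synthetic 100x80 white page with a dark letterhead strip in the top 10 rows.
page = np.full((100, 80), 255, dtype=np.uint8)
page[:10, :] = 0

features = layout_features(page)
```

A page with a heavy letterhead gives a high header value and low body and footer values. A CNN would learn such layout cues from the image directly instead of relying on fixed bands.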
8. Conclusion
Our proposed solution is a model that will accurately classify documents and articles.
The proposed model is built using a CNN and image feature extraction. Fine-tuning
the features extracted from document images pushed results even higher.
The CNN approach to document image representation exceeds the power of
hand-crafted alternatives.