Automated data extraction from newspaper documents


The document discusses information extraction (IE) from printed documents so that the data is easier to maintain and reprocess. It proposes a system that gathers information from sources such as newspapers and presents it to users in a simple way. The system would classify documents stored in a database and combine optical character recognition (OCR) with document classification to extract structured data from unstructured documents. The pipeline consists of cropping and image processing, OCR, summarization, and classification, which together digitize newspaper content and extract useful information for users.


POWERED BY: GROUP 1
NIA/DESI/POPON/JIMMY/RASYID

-Information extraction (IE) is important for archiving data from printed paper documents so that the data can be easily maintained and reprocessed
-Gathering this information manually consumes too much time and energy

-To create a system that gathers information and provides it to users in a simple way
-To classify documents in a database
-To provide a good algorithm for information extraction
-To provide an application that helps the Sub-Directorate of Public Opinion at BPS archive data from newspapers
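The archiving and "classify documents in a database" objectives could be served by a small relational store for processed articles. This is only a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` module, and the `articles` table, its columns, and the helpers `create_archive`, `archive_article`, and `articles_by_category` are hypothetical names, not part of the original design.

```python
import sqlite3


def create_archive(path=":memory:"):
    """Open (or create) an archive database with a hypothetical `articles` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            source    TEXT NOT NULL,  -- newspaper name
            published TEXT,           -- ISO date string, if one was extracted
            category  TEXT NOT NULL,  -- label assigned by the classification step
            summary   TEXT,
            body      TEXT NOT NULL   -- full OCR plain text
        )
        """
    )
    return conn


def archive_article(conn, source, published, category, summary, body):
    """Insert one processed article and return its row id."""
    with conn:  # commits on success, rolls back if the INSERT fails
        cur = conn.execute(
            "INSERT INTO articles (source, published, category, summary, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (source, published, category, summary, body),
        )
    return cur.lastrowid


def articles_by_category(conn, category):
    """Return (id, source, published, summary) rows for one category, oldest first."""
    cur = conn.execute(
        "SELECT id, source, published, summary FROM articles "
        "WHERE category = ? ORDER BY published",
        (category,),
    )
    return cur.fetchall()
```

Storing dates as ISO strings keeps `ORDER BY published` chronological without extra parsing; passing a file path instead of `":memory:"` would make the archive persistent.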

-Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured documents
-OCR is a system used to recognize text that has been typed or printed on paper; the scanned text is then processed by a particular algorithm into characters that can be recognized and turned into information
-Document classification is the last step
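To make the IE definition concrete, here is a minimal sketch that turns OCR plain text into a structured record. It assumes a hypothetical article layout: the headline on the first non-empty line, followed by a dateline such as `JAKARTA, 5 Mei 2014 - ...` using Indonesian month names. The `ArticleRecord` type and `extract_record` function are illustrative names; a real system would need patterns matched to each newspaper's actual layout.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Indonesian month names (assumed to appear in the dateline) mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["januari", "februari", "maret", "april", "mei", "juni", "juli",
         "agustus", "september", "oktober", "november", "desember"],
        start=1,
    )
}

# Hypothetical dateline layout: "CITY, D Month YYYY - first sentence ..."
DATELINE = re.compile(r"^([A-Z][A-Z ]+),\s+(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\s*-\s*")


@dataclass
class ArticleRecord:
    headline: str
    city: Optional[str]
    published: Optional[date]
    body: str


def extract_record(plain_text: str) -> ArticleRecord:
    """Turn unstructured OCR output into a structured ArticleRecord."""
    lines = [ln.strip() for ln in plain_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("OCR output is empty")
    headline, rest = lines[0], " ".join(lines[1:])
    city, published, body = None, None, rest

    match = DATELINE.match(rest)
    if match and match.group(3).lower() in MONTHS:
        day, month_name, year = match.group(2), match.group(3).lower(), match.group(4)
        try:
            published = date(int(year), MONTHS[month_name], int(day))
        except ValueError:
            pass  # impossible date, e.g. an OCR misread like "31 Februari": keep body as-is
        else:
            city = match.group(1).strip().title()
            body = rest[match.end():]  # drop the dateline from the body text
    return ArticleRecord(headline, city, published, body)
```

Rule-based patterns like this are easy to audit but brittle; if OCR garbles the dateline, the record simply keeps `city` and `published` as `None` instead of guessing.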

1. Input (Newspaper/e-paper)
2. Cropping + Image Processing
3. OCR
4. Summarizing
5. Classification
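Step 2 (cropping and image processing) is typically done before OCR so that the recognizer sees only the article region as clean dark-on-light text. The source does not name a specific image-processing method; this sketch assumes a grayscale image stored as a list of rows of 0-255 integers, crops a rectangular region, and binarizes it with Otsu's threshold, which is a common choice but our own substitution. The function names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Return the rectangular region of `image` (a list of pixel rows)."""
    return [row[left:left + width] for row in image[top:top + height]]


def otsu_threshold(pixels):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0  # number of pixels with value <= t (usually ink)
    sum_dark = 0     # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, threshold):
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    return [[255 if p > threshold else 0 for p in row] for row in image]
```

A real pipeline would load the scan with an imaging library such as Pillow (`Image.open(...).convert("L")` yields an 8-bit grayscale image) and pass the binarized crop to an OCR engine.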
[Diagram: Input (newspaper, e-paper, etc.) → Process (OCR; get information) → Output (plain text)]
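Once OCR has produced plain text, the summarizing and classification steps can be sketched in pure Python. The slides do not specify the algorithms here, so this sketch substitutes two simple, well-known techniques: a word-frequency extractive summarizer and a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The ASCII-only tokenizer, the training examples, and all function and class names are assumptions for illustration; real use would need stopword removal, Indonesian-aware tokenization, and a labeled training corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase ASCII word tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, n_sentences=2):
    """Extractive summary: keep the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.class_totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(class) + sum of log P(word | class); words outside the vocabulary are skipped
            logp = math.log(n_label / n_docs)
            for w in tokenize(doc):
                if w in self.vocab:
                    logp += math.log(
                        (self.word_counts[label][w] + 1) / (self.class_totals[label] + v)
                    )
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied; the predicted label would then be stored with the summary as the document's category.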



THANK YOU FOR YOUR ATTENTION
