Digital Content Creation
Rupesh Kumar A
Email: a.rupeshkumar@gmail.com
Digitization
• Digitization refers to the process of translating information, such as
books, journal articles, sound recordings, pictures, audio tapes or video
recordings, into bits.
• Bits (binary digits) are the fundamental units of information in a computer
system.
• Converting information into these binary digits is called digitization.
• For printed and other physical documents, the first step in digitization is scanning.
• When an object is scanned, it is converted into a digital image.
• A digital image is composed of a set of pixels (picture elements),
arranged in a grid with a predefined number of columns and rows.
• An image file can be managed as a regular computer file and can be
retrieved, printed and modified using appropriate software.
• Images containing text can be converted into text files using a
process called Optical Character Recognition (OCR).
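To make the pixel grid concrete, here is a minimal Python sketch. It assumes an 8-bit grayscale image held in memory as a list of rows; the names image, image_size, pixel_to_bits and image_to_bits are illustrative, not from any library. Real scans are normally saved in formats such as TIFF or JPEG, with colour, resolution and compression details that this ignores.

```python
"""Sketch: a tiny grayscale 'scanned' image as a grid of pixels, and its bits."""

# A hypothetical 3 x 4 grayscale image: 3 rows, 4 columns.
# Each pixel is an intensity from 0 (black) to 255 (white), i.e. one byte.
image = [
    [255, 255,   0, 255],
    [255,   0,   0, 255],
    [255, 255,   0, 255],
]

def image_size(img):
    """Return (rows, columns) of a rectangular pixel grid."""
    return len(img), len(img[0])

def pixel_to_bits(value):
    """Encode one 8-bit pixel value as a string of 8 binary digits."""
    return format(value, "08b")

def image_to_bits(img):
    """Walk the grid row by row and concatenate every pixel's 8 bits."""
    return "".join(pixel_to_bits(v) for row in img for v in row)

rows, cols = image_size(image)
bits = image_to_bits(image)
print(rows, cols)          # 3 4
print(pixel_to_bits(255))  # 11111111
print(len(bits))           # 96  (12 pixels x 8 bits)
```

The same idea scales up: a 300 dpi scan simply has many more rows and columns, and colour images store several values per pixel.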
OCR
• Optical Character Recognition, or OCR, is a technology that
enables a user to convert different types of documents, such as
scanned paper documents, PDF files or images captured by a
digital camera, into editable and searchable data.
• OCR is the mechanical or electronic conversion of images of typed,
handwritten or printed text into machine-encoded text, whether
from a scanned document, a photo of a document or a scene photo.
Techniques in OCR
• Pre-processing
• Character recognition
• Post-processing
Pre-processing
• Pre-processing involves tasks that improve the accuracy of character
recognition.
• Pre-processing includes:
• De-skewing: rotating the image so that the text lines are perfectly
horizontal or vertical if they are slanted
• Despeckling: removing positive and negative spots, smoothing edges
• Binarization: converting colour or greyscale images to black and white
• Line removal: clearing non-character lines and boxes
• Line and word detection
• Script recognition: recognizing the script (writing system) of the text
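Binarization and despeckling can be sketched in a few lines of Python. This is an illustration under simple assumptions: a grayscale image stored as lists of 0 to 255 values, a fixed threshold of 128 (production OCR engines usually choose the threshold adaptively), and a despeckle rule that only removes isolated dark (positive) spots. The function names are hypothetical.

```python
"""Sketch of two OCR pre-processing steps: binarization and despeckling."""

def binarize(gray, threshold=128):
    """Global thresholding: pixels darker than `threshold` become 1 (ink),
    all others become 0 (background)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def despeckle(binary):
    """Remove isolated ink pixels: a 1 with no ink among its 8 neighbours
    is treated as noise and set to 0."""
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            has_neighbour = any(
                binary[r + dr][c + dc] == 1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if not has_neighbour:
                out[r][c] = 0
    return out

# A hypothetical 4 x 5 grayscale scan of a vertical stroke with one stray speck.
scan = [
    [250, 240,  30, 245, 250],
    [245,  20,  25, 240, 250],
    [250, 240,  35, 250,  60],  # the 60 at the far right is a stray speck
    [250, 245, 250, 250, 250],
]
clean = despeckle(binarize(scan))
for row in clean:
    print("".join("#" if v else "." for v in row))
```

Despeckling after binarization keeps the rule simple, since "ink" and "background" are already unambiguous at that point.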
Character Recognition
• Character recognition may involve:
• Matrix matching: comparing an image to a stored glyph on a
pixel-by-pixel basis.
• It is also known as “pattern matching” or “image correlation”.
• Feature extraction: decomposing (dividing) glyphs into
features such as lines, closed loops, line direction and line
intersections.
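Matrix matching can be sketched as follows, assuming the glyph has already been isolated and scaled to the same 5 x 3 grid as a small set of hypothetical templates. The score is simply the fraction of pixels that agree, and the template with the highest score wins.

```python
"""Sketch of matrix matching: compare an unknown glyph with stored templates
pixel by pixel and return the best-scoring character."""

# Hypothetical 5 x 3 binary templates ("1" = ink). Real systems store many
# more templates per character, at higher resolution and in several fonts.
TEMPLATES = {
    "I": ["010",
          "010",
          "010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010",
          "010",
          "010"],
}

def similarity(glyph, template):
    """Fraction of pixel positions where glyph and template agree."""
    total = agree = 0
    for g_row, t_row in zip(glyph, template):
        for g, t in zip(g_row, t_row):
            total += 1
            agree += (g == t)
    return agree / total

def recognize(glyph):
    """Return (character, score) for the template with the highest similarity."""
    return max(((ch, similarity(glyph, tpl)) for ch, tpl in TEMPLATES.items()),
               key=lambda pair: pair[1])

# A slightly noisy "T": the bottom-right pixel is wrongly inked.
unknown = ["111",
           "010",
           "010",
           "010",
           "011"]
print(recognize(unknown))
```

The weakness of this approach is visible in the design: any change in font, size or position lowers every score, which is why it works best on uniform, well-normalized print.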
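Feature extraction computes properties of a glyph instead of comparing raw pixels. As one illustration, the sketch below counts closed loops by flood-filling background regions and counting those that never reach the image edge. The glyphs and the function name are hypothetical, and real systems combine many such features before classifying a character.

```python
"""Sketch of one feature used in feature extraction: the number of closed
loops (holes) in a binary glyph. "0" has one, "8" has two, "1" has none."""

from collections import deque

def count_closed_loops(glyph):
    """Count background regions ("0") completely enclosed by ink ("1").

    Background is explored with a 4-connected flood fill; a region that
    reaches the edge of the image is open space, not a loop."""
    rows, cols = len(glyph), len(glyph[0])
    seen = [[False] * cols for _ in range(rows)]
    loops = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] != "0" or seen[r][c]:
                continue
            touches_edge = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_edge = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and glyph[ny][nx] == "0"):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_edge:
                loops += 1
    return loops

ZERO = ["0110", "1001", "1001", "0110"]
EIGHT = ["0110", "1001", "0110", "1001", "0110"]
ONE = ["010", "010", "010"]
print(count_closed_loops(ZERO), count_closed_loops(EIGHT), count_closed_loops(ONE))
```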
Post-processing
• The output stream may be a plain text stream or a file of
characters.
• More sophisticated OCR systems can preserve the original
layout of the page.
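As a rough illustration of producing a plain text stream while keeping the page's reading order, the sketch below assumes a hypothetical recognizer output: a list of (word, x, y) tuples with pixel coordinates. Words are grouped into lines by their y position and sorted left to right; full layout preservation (columns, tables, fonts) needs much more than this.

```python
"""Sketch: turning recognized words with positions into a plain-text stream
that keeps the page's reading order (top to bottom, left to right)."""

def to_plain_text(words, line_tolerance=5):
    """`words` is a list of (text, x, y) tuples, where (x, y) is a hypothetical
    top-left coordinate of each recognized word, in pixels. A word whose y is
    within `line_tolerance` of the current line's first word joins that line."""
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order words by x position and join with spaces.
    return "\n".join(" ".join(t for _, t in sorted(items)) for _, items in lines)

recognized = [("World", 60, 12), ("Hello", 10, 10), ("Line", 10, 40), ("two", 50, 41)]
print(to_plain_text(recognized))
```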
OCR Software
• Tesseract
• Screenworm
• ABBYY FineReader
• FreeOCR
• SimpleOCR
• OmniPage
• GOCR
• Microsoft Office OneNote
Electronic Document
• Any electronic media content that is intended to be used either in
electronic form or as printed output.
• E-documents do not include computer programs or system files.
• E-documents come in a variety of file formats.
• Today, most e-document file formats have at least one file
viewer (e.g. Adobe Reader for PDF files).
• File format incompatibility poses a challenge for e-documents.
• Developing non-proprietary, standardized file formats (e.g. HTML,
OpenDocument) is one way to tackle incompatibility.
File Formats (in digitization)
• Several file formats are used for documents to be included in
digital libraries.
• The most common format is PDF.
• Other formats include:
– TIFF: Tagged Image File Format
– JPG (JPEG): Joint Photographic Experts Group
– PNG: Portable Network Graphics
– GIF: Graphics Interchange Format
– PS or EPS: PostScript or Encapsulated PostScript
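Because digitized collections mix these formats, it can help to identify a file from its first bytes (its signature or "magic number") instead of trusting the file-name extension. A minimal sketch, assuming only the formats listed above; the function names are illustrative.

```python
"""Sketch: identify a digitized file's format from its leading bytes."""

# Signatures for the formats listed above (PostScript/EPS handled separately).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xc5\xd0\xd3\xc6", "EPS (DOS binary header)"),
]

def detect_format(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown'."""
    if header.startswith(b"%!PS"):
        # Encapsulated PostScript declares "EPSF" on its first line.
        first_line = header.splitlines()[0]
        return "EPS" if b"EPSF" in first_line else "PostScript"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

def detect_file_format(path: str) -> str:
    """Read the first 64 bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return detect_format(f.read(64))
```

For example, detect_file_format("scan_0001.tif") would be expected to report one of the TIFF variants for a genuine TIFF scan, whatever the file happens to be named.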
Portable Document Format
• A file format used to present documents in a manner
independent of application software, hardware and operating systems.
• A PDF file encapsulates a complete description of a fixed-layout
flat document, including the text, fonts, graphics and other
information needed to display it.
• A PDF file looks the same on a variety of computers,
irrespective of operating system.
History
• PDF was developed by Adobe Systems in the early 1990s and first released in 1993.
• Before the World Wide Web and HTML became widespread, PDF
was popular mainly in desktop publishing (DTP).
• PDF was a proprietary format controlled by Adobe until 2008.
• On July 1, 2008, it was released as an open standard and
published by ISO as
ISO 32000-1:2008.
Technical Aspects of PDF
• PDF uses the following technologies:
– A subset of the PostScript page description language, for generating
the layout and graphics.
– A font-embedding/replacement system to allow fonts to travel
with the documents.
– A structured storage system to bundle these elements and any
associated content into a single file, with data compression where
appropriate.
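To see how these pieces fit into one file, the sketch below writes a minimal one-page PDF by hand: a PostScript-derived content stream for the page, a font reference, and the object and cross-reference structure that bundles everything together. It assumes Latin-1 text, leaves the stream uncompressed, and references the standard Helvetica font instead of embedding one, so it is an illustration rather than a production PDF writer. The function names are hypothetical.

```python
"""Sketch: build a minimal one-page PDF by hand to show its file structure."""

def _escape(text: str) -> str:
    # Backslashes and parentheses must be escaped inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def build_minimal_pdf(text: str) -> bytes:
    # Page description: a content stream using PDF's PostScript-derived operators
    # (BT/ET = begin/end text, Tf = set font, Td = move, Tj = show string).
    content = f"BT /F1 24 Tf 72 720 Td ({_escape(text)}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # A standard Type 1 font referenced by name; a real document would
        # usually embed its fonts instead.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    # Cross-reference table: lets a reader jump straight to any object.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Saving the returned bytes to a .pdf file should produce a one-page document that common PDF viewers can open; real PDF writers add compression, embedded fonts and many optional structures on top of this skeleton.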
Special Features
• PDF files may contain interactive elements such as
annotations, form fields, video and Flash animation. Such
files are called “Rich Media PDF”.
• A PDF file may be encrypted for security, or digitally signed
for authentication.
• PDF documents can contain display settings, including the
page display layout and zoom level.
Born-digital and legacy documents
• Born-digital documents are resources or items created and
managed in digital form.
• They may be digital photographs, digital documents,
harvested Web content, digital manuscripts, electronic
records, static data sets, digital art or digital media publications.
• Born-digital documents can be processed easily for inclusion
in a digital library because they are natively in digital format.
Legacy documents
• Legacy documents are resources or items which are originally in ‘non-digital’
form and have to be converted into ‘digital’ form for inclusion in a digital
library.
• Photographs, documents, manuscripts, print records, art and media publications
are examples of legacy documents.
• The process of converting legacy documents into digital form to make them
compatible with digital libraries is known as ‘digitization’.
• Legacy documents pose a greater challenge for digital libraries because
converting them into digital form is very tedious.
Scholarly Communication
• Scholarly communication is the process by which academics,
scholars and researchers share and publish their research findings
so that they are available to the wider academic community and
beyond.
• Scholarly communication is “the system through which research
and other scholarly writings are created, evaluated for quality,
disseminated to the scholarly community, and preserved for
future use.”
Scholarly Literature
• Writings in scholarly journals, books and e-journals
• Reviews, preprints and working papers
• Writings in encyclopaedias and dictionaries, and annotated
content and data
• Blogs, discussion forums, professional and scholarly hubs, and
conference papers
• Sound and video recordings
Terminology in Scholarly Communication
• Manuscript: a scholarly document that has not yet been
submitted for publication.
• Preprint: a scholarly document accepted for publication in a
journal or book but not yet published; also, material accepted for
presentation at a conference.
• Article: a scholarly document that has been published.
• Paper: a scholarly document or material that has been
presented at a conference.
• E-Script: an electronic manuscript.
Electronic Publishing
• E-publishing includes the digital publication of e-books and digital
magazines, and the development of digital libraries and
catalogues.
• The electronic publishing process follows some aspects of the
traditional paper-based publishing process but differs from
traditional publishing in two ways:
– 1) it does not use an offset printing press to print the final
product, and
– 2) it avoids distributing a physical product (e.g., paper books,
paper magazines or paper newspapers).
• Because the content is electronic, it may be distributed over
the Internet and through electronic bookstores.
• Users can read the material on a range of electronic and
digital devices, including desktop computers, laptops, tablet
computers, smartphones or e-reader tablets.
E-Journal
• Electronic journals, also known as ejournals, e-journals and electronic
serials, are scholarly journals or intellectual magazines that can be
accessed via electronic transmission.
• An e-journal closely resembles a print journal in structure but is in
electronic format.
• Often a journal article is available for download in two formats: as a
PDF and in HTML format.
• E-journals allow new types of content to be included in journals, for
example video material or the data sets on which the research is
based.
E-book
• An electronic book (or e-book) is a book publication made
available in digital form, consisting of text, images, or both,
readable on the flat-panel display of computers or other
electronic devices.
• An e-book may be an e-only book or an electronic version of a
printed book.
E-book file formats
• PDF (.pdf)
• Open eBook (.opf)
• EPUB (.epub)
• Compiled HTML (.chm)
• DjVu (.djvu)
• Mobipocket (.mobi)
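EPUB files are ZIP archives with a fixed layout: an uncompressed "mimetype" entry first, and a META-INF/container.xml file pointing to the package document (the .opf file, a format inherited from Open eBook). The sketch below builds and checks only that outer container using Python's standard zipfile module; the package document and content files that a real e-book needs are omitted, and the function names are hypothetical.

```python
"""Sketch: the container layout that makes a file an EPUB, built and checked
with Python's standard zipfile module."""

import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def build_epub_skeleton() -> bytes:
    """Return the bytes of an EPUB-style ZIP container (package contents omitted)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The "mimetype" entry must come first and be stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()

def looks_like_epub(data: bytes) -> bool:
    """Check the container rules: the first entry is an uncompressed 'mimetype'
    file holding the EPUB media type, and container.xml exists."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return False
        if entries[0].compress_type != zipfile.ZIP_STORED:
            return False
        if zf.read("mimetype") != b"application/epub+zip":
            return False
        return "META-INF/container.xml" in zf.namelist()
```

Storing "mimetype" first and uncompressed means its text sits at a fixed position near the start of the file, so tools can recognize an EPUB without unpacking it.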

More Related Content

What's hot

Greenstone Digital Library
Greenstone Digital LibraryGreenstone Digital Library
Greenstone Digital LibraryImran Mansuri
 
Networking And Resource Sharing In Library And Information
Networking And Resource Sharing In Library And InformationNetworking And Resource Sharing In Library And Information
Networking And Resource Sharing In Library And InformationBaguio Central University
 
Ict uses in libraries
Ict uses in librariesIct uses in libraries
Ict uses in librariesLiaquat Rahoo
 
Library networks and consortium
Library networks and consortiumLibrary networks and consortium
Library networks and consortiumSunilKumar5028
 
Chain indexing
Chain indexingChain indexing
Chain indexingsilambu111
 
Electronic Resources Management(ERM): Issues and Challenges
Electronic Resources Management(ERM): Issues and ChallengesElectronic Resources Management(ERM): Issues and Challenges
Electronic Resources Management(ERM): Issues and ChallengesDr Trivedi
 
Web 2.0 in Libraries
Web 2.0 in LibrariesWeb 2.0 in Libraries
Web 2.0 in LibrariesAnupama Saini
 
E resources
E resourcesE resources
E resourcesavid
 
eprints digital library software
eprints digital library softwareeprints digital library software
eprints digital library softwaresonia naomi bandao
 
N-LIST program of INFLIBNET
N-LIST program of INFLIBNETN-LIST program of INFLIBNET
N-LIST program of INFLIBNETTapan Barui
 
Dds
Dds Dds
Dds drrst
 
Digital libraries power point
Digital libraries power pointDigital libraries power point
Digital libraries power pointckdozier
 
Library automation history Anandraj.L
Library automation history Anandraj.LLibrary automation history Anandraj.L
Library automation history Anandraj.Lanujessy
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservationsmtcd
 
Information products
Information products Information products
Information products Mohit Kumar
 

What's hot (20)

Greenstone Digital Library
Greenstone Digital LibraryGreenstone Digital Library
Greenstone Digital Library
 
Networking And Resource Sharing In Library And Information
Networking And Resource Sharing In Library And InformationNetworking And Resource Sharing In Library And Information
Networking And Resource Sharing In Library And Information
 
Ict uses in libraries
Ict uses in librariesIct uses in libraries
Ict uses in libraries
 
Library networks and consortium
Library networks and consortiumLibrary networks and consortium
Library networks and consortium
 
Chain indexing
Chain indexingChain indexing
Chain indexing
 
Electronic Resources Management(ERM): Issues and Challenges
Electronic Resources Management(ERM): Issues and ChallengesElectronic Resources Management(ERM): Issues and Challenges
Electronic Resources Management(ERM): Issues and Challenges
 
Web 2.0 in Libraries
Web 2.0 in LibrariesWeb 2.0 in Libraries
Web 2.0 in Libraries
 
E resources
E resourcesE resources
E resources
 
NISCAIR.pptx
NISCAIR.pptxNISCAIR.pptx
NISCAIR.pptx
 
eprints digital library software
eprints digital library softwareeprints digital library software
eprints digital library software
 
N-LIST program of INFLIBNET
N-LIST program of INFLIBNETN-LIST program of INFLIBNET
N-LIST program of INFLIBNET
 
DELNET.pptx
DELNET.pptxDELNET.pptx
DELNET.pptx
 
Dds
Dds Dds
Dds
 
Digital libraries power point
Digital libraries power pointDigital libraries power point
Digital libraries power point
 
Digital library softaware greenstone & dsapce
Digital library softaware greenstone & dsapceDigital library softaware greenstone & dsapce
Digital library softaware greenstone & dsapce
 
Library automation history Anandraj.L
Library automation history Anandraj.LLibrary automation history Anandraj.L
Library automation history Anandraj.L
 
Delnet
DelnetDelnet
Delnet
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
Information products
Information products Information products
Information products
 
Library networks
Library networksLibrary networks
Library networks
 

Similar to Digital Content Creation

Cs8092 computer graphics and multimedia unit 4
Cs8092 computer graphics and multimedia unit 4Cs8092 computer graphics and multimedia unit 4
Cs8092 computer graphics and multimedia unit 4SIMONTHOMAS S
 
e-Services to Keep Your Digital Files Current
e-Services to Keep Your Digital Files Currente-Services to Keep Your Digital Files Current
e-Services to Keep Your Digital Files Currentpbajcsy
 
Chapter 7 : MAKING MULTIMEDIA
Chapter 7 : MAKING MULTIMEDIAChapter 7 : MAKING MULTIMEDIA
Chapter 7 : MAKING MULTIMEDIAazira96
 
chapter7-151010022348-lva1-app6892 (1).pptx
chapter7-151010022348-lva1-app6892 (1).pptxchapter7-151010022348-lva1-app6892 (1).pptx
chapter7-151010022348-lva1-app6892 (1).pptxJayasheelanP
 
Digitalization manual (2).pptx
Digitalization manual (2).pptxDigitalization manual (2).pptx
Digitalization manual (2).pptxawokeyirdaw1
 
Multimedia Presentation and Authoring
Multimedia Presentation and AuthoringMultimedia Presentation and Authoring
Multimedia Presentation and AuthoringTamanna Sehgal
 
Multimedia tech.sec a & b
Multimedia tech.sec a & bMultimedia tech.sec a & b
Multimedia tech.sec a & bSonu Sharma
 
Chapter 7
Chapter 7 Chapter 7
Chapter 7 carnillr
 
computer literacy chapter2.pptx
computer literacy chapter2.pptxcomputer literacy chapter2.pptx
computer literacy chapter2.pptxToobaFarooq10
 
Digitization in theory and practice
Digitization in theory and practiceDigitization in theory and practice
Digitization in theory and practiceHelen Nneka Okpala
 
Daunted by digitization by Bruce Covington
Daunted by digitization by Bruce CovingtonDaunted by digitization by Bruce Covington
Daunted by digitization by Bruce CovingtonBruce Covington
 
Crossmedia Workflows
Crossmedia WorkflowsCrossmedia Workflows
Crossmedia WorkflowsDwight Kelly
 
INPUT , OUTPUT AND STORAGE
INPUT , OUTPUT AND STORAGEINPUT , OUTPUT AND STORAGE
INPUT , OUTPUT AND STORAGERajihah Razali
 

Similar to Digital Content Creation (20)

Cs8092 computer graphics and multimedia unit 4
Cs8092 computer graphics and multimedia unit 4Cs8092 computer graphics and multimedia unit 4
Cs8092 computer graphics and multimedia unit 4
 
e-Services to Keep Your Digital Files Current
e-Services to Keep Your Digital Files Currente-Services to Keep Your Digital Files Current
e-Services to Keep Your Digital Files Current
 
Digitization
DigitizationDigitization
Digitization
 
DIGITAL LIBRARY
DIGITAL LIBRARYDIGITAL LIBRARY
DIGITAL LIBRARY
 
Chapter 7
Chapter 7Chapter 7
Chapter 7
 
Chapter 7 : MAKING MULTIMEDIA
Chapter 7 : MAKING MULTIMEDIAChapter 7 : MAKING MULTIMEDIA
Chapter 7 : MAKING MULTIMEDIA
 
chapter7-151010022348-lva1-app6892 (1).pptx
chapter7-151010022348-lva1-app6892 (1).pptxchapter7-151010022348-lva1-app6892 (1).pptx
chapter7-151010022348-lva1-app6892 (1).pptx
 
Digitalization manual (2).pptx
Digitalization manual (2).pptxDigitalization manual (2).pptx
Digitalization manual (2).pptx
 
Multimedia Presentation and Authoring
Multimedia Presentation and AuthoringMultimedia Presentation and Authoring
Multimedia Presentation and Authoring
 
Digital library software
Digital library softwareDigital library software
Digital library software
 
MULTMEDIA DATABASE.ppt
MULTMEDIA DATABASE.pptMULTMEDIA DATABASE.ppt
MULTMEDIA DATABASE.ppt
 
Multimedia tech.sec a & b
Multimedia tech.sec a & bMultimedia tech.sec a & b
Multimedia tech.sec a & b
 
Chapter 7
Chapter 7 Chapter 7
Chapter 7
 
computer literacy chapter2.pptx
computer literacy chapter2.pptxcomputer literacy chapter2.pptx
computer literacy chapter2.pptx
 
Digitization in theory and practice
Digitization in theory and practiceDigitization in theory and practice
Digitization in theory and practice
 
new one
new onenew one
new one
 
Daunted by digitization by Bruce Covington
Daunted by digitization by Bruce CovingtonDaunted by digitization by Bruce Covington
Daunted by digitization by Bruce Covington
 
Unit 78 technical file
Unit 78 technical fileUnit 78 technical file
Unit 78 technical file
 
Crossmedia Workflows
Crossmedia WorkflowsCrossmedia Workflows
Crossmedia Workflows
 
INPUT , OUTPUT AND STORAGE
INPUT , OUTPUT AND STORAGEINPUT , OUTPUT AND STORAGE
INPUT , OUTPUT AND STORAGE
 

More from Dept of Library and Information Science Tumkur University

More from Dept of Library and Information Science Tumkur University (19)

Institutional Repositories and Open Access Movement
Institutional Repositories and Open Access MovementInstitutional Repositories and Open Access Movement
Institutional Repositories and Open Access Movement
 
Digital Library Software
Digital Library SoftwareDigital Library Software
Digital Library Software
 
Digital Content Management
Digital Content ManagementDigital Content Management
Digital Content Management
 
Digital Library Architecture
Digital Library ArchitectureDigital Library Architecture
Digital Library Architecture
 
Interoperability in Digital Libraries
Interoperability in Digital LibrariesInteroperability in Digital Libraries
Interoperability in Digital Libraries
 
International Digital Library Initiatives
International Digital Library InitiativesInternational Digital Library Initiatives
International Digital Library Initiatives
 
Evolution of Digital Libraries
Evolution of Digital LibrariesEvolution of Digital Libraries
Evolution of Digital Libraries
 
Digital Library Initiatives in India
Digital Library Initiatives in IndiaDigital Library Initiatives in India
Digital Library Initiatives in India
 
Digital Library Conferences
Digital Library ConferencesDigital Library Conferences
Digital Library Conferences
 
Types of Libraries
Types of LibrariesTypes of Libraries
Types of Libraries
 
Resource Sharing and Networking
Resource Sharing and NetworkingResource Sharing and Networking
Resource Sharing and Networking
 
Basics of Research
Basics of ResearchBasics of Research
Basics of Research
 
Historical Method of Research
Historical Method of ResearchHistorical Method of Research
Historical Method of Research
 
Five Laws of Library Science
Five Laws of Library ScienceFive Laws of Library Science
Five Laws of Library Science
 
Library Classification
Library ClassificationLibrary Classification
Library Classification
 
How to create a filter for mails in GMail
How to create a filter for mails in GMailHow to create a filter for mails in GMail
How to create a filter for mails in GMail
 
How to add custom signature in GMail
How to add custom signature in GMailHow to add custom signature in GMail
How to add custom signature in GMail
 
How to attach a file with a mail in GMail
How to attach a file with a mail in GMailHow to attach a file with a mail in GMail
How to attach a file with a mail in GMail
 
How to create a new email account using GMail
How to create a new email account using GMailHow to create a new email account using GMail
How to create a new email account using GMail
 

Recently uploaded

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsNbelano25
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfPondicherry University
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17Celine George
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 

Recently uploaded (20)

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf arts
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 

Digital Content Creation

  • 2. Digitization • Digitization refers to the process of translating a piece of information such as a book, journal articles, sound recordings, pictures, audio tapes or video recordings, etc. into bits. • Bits are the fundamental units of information in a computer system. • Converting information into these binary digits (bits) is called digitisation. • Thefirst step in digitizationis scanning.
  • 3. • Whenanobjectisscanned,itisconverted intoadigitalimage. • A digital image is composed of a set of pixels (picture elements), arrangedaccording toapre-definedratioofcolumnsandrows. • An image file can be managed as regular computer file and can be retrieved,printedandmodifiedusing appropriatesoftware. • Images containing text can be converted into text files using a process calledOpticalCharacterRecognition(OCR).
  • 4. OCR • Optical Character Recognition, or OCR, is a technology that enables a user to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digitalcameraintoeditableand searchable data. • The mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether fromascanned document,aphotoofadocument,ascene-photo.
  • 5. Techniquesin OCR • Pre-processing • Character recognition • Post-processing
  • 6. Pre-processing • Pre-processing involves certain tasks to improve character recognition and its accuracy. • Pre-processing includes • de-skewing: setting the characters perfectly horizontal or vertical if they are slant • Despeckle: removing positive and negative spots, smoothing edges • Binarization:converting images to b&w • Line removal: clearing non-character lines andboxes • Line and word detection • Script recognition: recognizing the script of the text
  • 7. CharacterRecognition • Character recognition may involve: • Matrix matching:comparing an image to a stored glyph on a pixel-by-pixelbasis. • It is also knownas “patternmatching” or“image correlation”. • Featureextraction: decomposing (dividing) glyphs into featureslikelines, closed loops, linedirection and line intersections.
  • 10. Post-processing • The output stream may be a plain text stream or fileof characters. • More sophisticated OCR systems can preserve the original layoutof thepage.
  • 11. OCR Software • Tesseract • Screenworm • ABBYY FineReader • FreeOCR • SimpleOCR • OmniPage • GOCR • Microsoft OfficeOnenote
  • 12. ElectronicDocument • Any electronic media content which is intended to be used in either electronic form or as printed output. • E-documents donot include computer programs or system files. • E-documents come in a varietyof file formats. • Today, most e-docs in different file formats will have at least one file viewer (e.g. Adobe Reader for PDFfiles). • File format incompatibility poses achallenge for e-docs. • Development of non-proprietary, standardized file formats is a solution to tackle incompatibility (e.g. HTML, OpenDocument).
  • 13. FileFormats (in digitization) • Several fileformats are used for documentsto be included in digital libraries. • Most common formatis PDF. • Other formats include: – TIFF: Tagged Image File Format – JPG (JPEG): Joint Photographic Experts Group – PNG: Portable Network Graphics – GIF: Graphics Interchange Format – PS or EPS: PostScript or Encapsulated PostScript
  • 14. PortableDocumentFormat • A file format used to present documents in a manner independentof software, hardware, and operating systems. • PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other informationneededtodisplay it. • A PDF file will look the same way on a variety of computers irrespective of operating systems.
  • 15. History
    • PDF was developed by Adobe Systems in the early 1990s.
    • Before the World Wide Web and HTML became widespread, PDF was popular in desktop publishing (DTP).
    • PDF was a proprietary format controlled by Adobe until 2008.
    • On July 1, 2008, it was released as an open standard and published by ISO as ISO 32000-1:2008.
  • 16. Technical Aspects of PDF
    • PDF uses the following technologies:
      – A subset of the PostScript page description programming language, for generating the layout and graphics
      – A font-embedding/replacement system that allows fonts to travel with the documents
      – A structured storage system that bundles these elements and any associated content into a single file, with data compression where appropriate
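The sketch below writes a one-page PDF by hand to make these technologies concrete. It is a teaching illustration under stated assumptions, not how production PDF libraries work:
- The page content uses PDF's PostScript-derived text operators (BT, Tf, Td, Tj, ET).
- The font is Helvetica, one of the standard 14 fonts that viewers supply themselves, so nothing is actually embedded.
- The structured storage appears as numbered objects plus a cross-reference (xref) table of byte offsets.

Compression is omitted, and the function name `minimal_pdf` is hypothetical.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF that draws `text` in 24-point Helvetica."""
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Page description: begin text, select font F1, move to (72, 720), show text.
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Font reference only: a standard 14 font, so no font program is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode()
    return bytes(out)


pdf = minimal_pdf("Hello, PDF")
print(pdf[:8])  # prints: b'%PDF-1.4'
```

Writing the returned bytes to a file such as `hello.pdf` should produce a page that common viewers can open. The xref table is what lets a reader jump straight to any object without parsing the whole file.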
  • 17. Special Features
    • PDF files may contain interactive elements such as annotations, form fields, video and Flash animation. Such files are called “Rich Media PDF”.
    • A PDF file may be encrypted for security or digitally signed for authentication.
    • PDF documents can contain display settings, including the page display layout and zoom level.
  • 18. Born-digital and legacy documents
    • Born-digital documents are resources or items created and managed in digital form.
    • They may be digital photographs, digital documents, harvested Web content, digital manuscripts, electronic records, static data sets, digital art or digital media publications.
    • Born-digital documents can easily be processed for inclusion in a digital library, as they are natively in digital format.
  • 19. Legacy documents
    • Legacy documents are resources or items that are originally in ‘non-digital’ form and have to be converted into ‘digital’ form for inclusion in a digital library.
    • Photographs, documents, manuscripts, print records, art and media publications are examples of legacy documents.
    • The process of converting legacy documents into digital form to make them compatible with digital libraries is known as ‘digitization’.
    • Legacy documents pose a greater challenge for digital libraries, as converting them to digital form is very tedious.
  • 20. Scholarly Communication
    • Scholarly communication is the process by which academics, scholars and researchers share and publish their research findings so that they are available to the wider academic community and beyond.
    • Scholarly communication is “the system through which research and other scholarly writings are created, evaluated for quality, disseminated to the scholarly community, and preserved for future use.”
  • 21. Scholarly Literature
    • Writings in scholarly journals and books, and e-journals
    • Reviews, preprints and working papers
    • Writings in encyclopaedias and dictionaries, annotated content and data
    • Blogs, discussion forums, professional and scholarly hubs, and conference papers
    • Sound and video recordings
  • 22. Terminology in Scholarly Communication
    • Manuscript: a scholarly document that has not yet been submitted for publication.
    • Preprint: a scholarly document accepted for publication in a journal or book; material accepted to be used in a presentation at a conference.
    • Article: a scholarly document that has been published.
    • Paper: a scholarly document or material that has been presented at a conference.
    • E-Script: an electronic manuscript.
  • 23. Electronic Publishing
    • E-publishing includes the digital publication of e-books and digital magazines, and the development of digital libraries and catalogues.
    • The electronic publishing process follows some aspects of the traditional paper-based publishing process but differs from traditional publishing in two ways:
      – 1) it does not use an offset printing press to print the final product, and
      – 2) it avoids distributing a physical product (e.g. paper books, paper magazines or paper newspapers).
  • 24.
    • Because the content is electronic, it may be distributed over the Internet and through electronic bookstores.
    • Users can read the material on a range of electronic and digital devices, including desktop computers, laptops, tablet computers, smartphones or e-reader tablets.
  • 25. E-Journal
    • Electronic journals, also known as e-journals, ejournals and electronic serials, are scholarly journals or intellectual magazines that can be accessed via electronic transmission.
    • An e-journal closely resembles a print journal in structure, but is in electronic format.
    • A journal article is often available for download in two formats: as a PDF and in HTML format.
    • E-journals allow new types of content to be included in journals, for example video material or the data sets on which the research is based.
  • 26. E-book
    • An electronic book (or e-book) is a book publication made available in digital form, consisting of text, images or both, readable on the flat-panel display of computers or other electronic devices.
    • An e-book may be an e-only book or an electronic version of a printed book.
  • 27. E-book file formats
    • PDF (.pdf)
    • Open eBook (.opf)
    • EPUB (.epub)
    • Compiled HTML (.chm)
    • DjVu (.djvu)
    • Mobipocket (.mobi)
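EPUB is easy to inspect because an EPUB file is a ZIP archive with a fixed layout. An uncompressed `mimetype` entry comes first. It is followed by `META-INF/container.xml`, which points to the package (.opf) document. The sketch below builds only that container skeleton with Python's standard `zipfile` module. The `OEBPS/content.opf` path and the placeholder package text are assumptions. A real e-book would also need a valid package document, a navigation document and content files.

```python
import io
import zipfile

# container.xml tells a reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def epub_skeleton(package_opf: str) -> bytes:
    """Return the bytes of a ZIP archive laid out as an EPUB container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 'mimetype' must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", package_opf,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


data = epub_skeleton("<package/>")  # placeholder, not a valid package document
print(zipfile.ZipFile(io.BytesIO(data)).namelist())
# ['mimetype', 'META-INF/container.xml', 'OEBPS/content.opf']
```

The `mimetype` entry is stored first and uncompressed. As a result, the text `application/epub+zip` appears at a fixed position near the very start of the file, where tools can recognize it without unpacking the archive.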
