Transformation of PDF Textbooks into Interactive Educational Resources

Isaac Alpizar-Chacon, Max van der Hart, Zef S. Wiersma, Lorenzo S.J. Theunissen,
and Sergey Sosnovsky

7-7-2020
Motivation
Digital textbooks are a standard medium to distribute educational content
Most digital textbooks are digital copies of their printed counterparts
Creation of intelligent textbooks requires effort and expertise

The INTelligent TEXTBOOKS system: complete transformation of PDF textbooks into online intelligent educational resources

The Intextbooks system
• Extracts a semantic model from a PDF textbook
• Converts the PDF textbook into an HTML/CSS representation
• Enriches the HTML with a fine-grained DOM (Document Object Model) connected to the semantic model
• Every content/layout/structure element of the textbook is identifiable in the DOM
• Any object of a textbook can become an object of targeted interaction
• Every element of a textbook’s DOM is also identifiable within the semantic model extracted from the textbook
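The two-way link between DOM elements and the semantic model can be pictured as a pair of lookups. The following is a minimal sketch, not the Intextbooks implementation: the class names, field names, and identifiers (`SemanticEntry`, `LinkedTextbook`, `dom_id`, `model_id`) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticEntry:
    """One element of the semantic model (hypothetical shape)."""
    model_id: str
    kind: str   # e.g. "section", "word", "term"
    text: str


@dataclass
class LinkedTextbook:
    """Keeps DOM element ids and semantic-model entries resolvable both ways."""
    model: dict = field(default_factory=dict)         # model_id -> SemanticEntry
    dom_to_model: dict = field(default_factory=dict)  # DOM element id -> model_id

    def link(self, dom_id: str, entry: SemanticEntry) -> None:
        """Register a model entry and attach one DOM element to it."""
        self.model[entry.model_id] = entry
        self.dom_to_model[dom_id] = entry.model_id

    def entry_for_dom(self, dom_id: str):
        """DOM -> model: what does this page element mean?"""
        model_id = self.dom_to_model.get(dom_id)
        return self.model.get(model_id) if model_id is not None else None

    def dom_ids_for(self, model_id: str) -> list:
        """Model -> DOM: where does this model entry appear on the page?"""
        return [d for d, m in self.dom_to_model.items() if m == model_id]


book = LinkedTextbook()
term = SemanticEntry(model_id="term-1", kind="term", text="sample mean")
book.link("dom-w-17", term)
book.link("dom-w-42", term)  # the same term occurring a second time
```

With such a structure, a click on `dom-w-17` resolves to the term entry, and the term entry resolves back to every place where it is rendered.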

Intelligent Textbooks
Each textbook becomes an integrated resource where content elements and pieces of domain knowledge are interlinked on both the presentation and the knowledge levels

Architecture: Ingestion (offline)

Architecture: Web-reader (online)

Textbook model extractor
rule-based approach
7 steps, 17 tasks, 55 rules

TEI Textbook Model
• Structure (sections)
• Content (words, lines, titles, etc.)
• Domain knowledge (terms)
• + RDFa attributes
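To make the combination of structure, content, domain knowledge, and RDFa attributes more concrete, here is a small sketch that builds a TEI-like fragment with Python's standard library. The element names `div`, `head`, `p`, and `term` exist in TEI, and `about` and `property` are RDFa attributes, but the way they are combined here, the identifiers, the `skos:prefLabel` property, and the sample text are assumptions for illustration and not the actual Intextbooks schema.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for the standard xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_section_fragment() -> ET.Element:
    """Build one section with a title and a paragraph containing a term."""
    # Structure: a section of the textbook.
    section = ET.Element("div", {"type": "section", XML_ID: "sec-1"})

    # Content: the section title.
    head = ET.SubElement(section, "head")
    head.text = "Sample mean"

    # Content: a paragraph of running text.
    para = ET.SubElement(section, "p")
    para.text = "The "

    # Domain knowledge: a term annotated with (hypothetical) RDFa attributes.
    term = ET.SubElement(
        para,
        "term",
        {XML_ID: "term-1", "about": "#sample-mean", "property": "skos:prefLabel"},
    )
    term.text = "sample mean"
    term.tail = " summarises a sample."
    return section


fragment = build_section_fragment()
xml_text = ET.tostring(fragment, encoding="unicode")
```

The `xml:id` values are what a synchronizer could later use to tie each model element to its counterpart in the HTML.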

PDF to HTML converter
• Several open libraries available:
• pdf2htmlEX, PDFMiner, pdf2html, Xpdf, etc.
• pdf2htmlEX:
• preserves the layout perfectly across very different types of documents
• produces the same structure across different documents
• fast, stable, and scalable
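As an illustration of the conversion step, the sketch below wraps the pdf2htmlEX command-line tool from Python. It assumes that `pdf2htmlEX` is installed and on the `PATH`; the helper names (`build_command`, `convert`) and the file paths are hypothetical, and the options the Intextbooks pipeline actually passes are not stated in the slides.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path: str, out_dir: str) -> list:
    """Assemble a pdf2htmlEX call: options, input PDF, output file name."""
    pdf = Path(pdf_path)
    return [
        "pdf2htmlEX",
        "--dest-dir", str(out_dir),  # directory that receives the HTML output
        str(pdf),
        pdf.stem + ".html",
    ]


def convert(pdf_path: str, out_dir: str) -> Path:
    """Run the conversion and return the path of the generated HTML file."""
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX was not found on PATH")
    subprocess.run(build_command(pdf_path, out_dir), check=True)
    return Path(out_dir) / (Path(pdf_path).stem + ".html")


command = build_command("stats.pdf", "out")
```

Keeping command construction separate from execution makes the call easy to inspect or log before a batch of textbooks is converted.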

TEI-HTML synchronizer

Other components (planned)
Student model: keeps an internal representation of the learning progress of the students
Monitoring engine: logs every action of the students
Adaptation engine: uses the student model and the activity log to provide personalization to the students

Validation
Test the accuracy of the matching algorithm for the TEI-HTML synchronization
• 70 university-level textbooks
• domains: statistics, computer science, web programming, literature, history
• 3 versions of the algorithm: use of a threshold to sort and merge close lines (subscripts and superscripts)
• evaluation metric: percentage of words that were matched between the TEI and HTML representations
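The evaluation metric can be sketched as follows. This is a simplified stand-in that assumes a word counts as matched when an identical, not yet used word exists on the HTML side; the actual matching algorithm presumably also takes position into account, and the function name and sample words are hypothetical.

```python
from collections import Counter


def matched_word_percentage(tei_words: list, html_words: list) -> float:
    """Percentage of TEI words that find an unused identical word in the HTML."""
    if not tei_words:
        return 0.0
    available = Counter(html_words)  # how many copies of each word remain unused
    matched = 0
    for word in tei_words:
        if available[word] > 0:
            available[word] -= 1
            matched += 1
    return 100.0 * matched / len(tei_words)


# A superscript split off in one representation but fused in the other:
# "x" and "2" on the TEI side cannot be matched against "x2" on the HTML side.
tei = ["the", "mean", "x", "2"]
html = ["the", "mean", "x2"]
rate = matched_word_percentage(tei, html)
```

The example shows why subscripts and superscripts matter for this metric: a single differently tokenized expression already lowers the matching rate.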

Results
No threshold: 87.16%
Fixed threshold: 88.76%
Dynamic threshold: 87.09%

Analysis
Results are very similar across variants of the algorithm
Textbooks consisting mostly of text get a very high matching rate (~100%)
Textbooks with figures, tables, and graphs get a lower matching rate
Words with subscripts and superscripts are not matched correctly
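The threshold variants from the validation merge text fragments whose vertical positions are close, so that a raised superscript or lowered subscript stays on the line it belongs to. A minimal sketch of a fixed-threshold version is given below; the fragment format `(y, x, text)`, the function name, and the sample coordinates are assumptions, and how the dynamic threshold is computed is not described in the slides.

```python
def merge_close_lines(fragments: list, threshold: float) -> list:
    """Group (y, x, text) fragments into lines and return the line texts.

    A fragment joins the current line when its y coordinate is within
    `threshold` of the first fragment of that line; otherwise it starts
    a new line. Within a line, fragments are ordered by x.
    """
    lines = []
    for y, x, text in sorted(fragments, key=lambda f: f[0]):
        if lines and abs(y - lines[-1]["y"]) <= threshold:
            lines[-1]["items"].append((x, text))
        else:
            lines.append({"y": y, "items": [(x, text)]})
    return [" ".join(t for _, t in sorted(line["items"])) for line in lines]


# "2" is a superscript placed slightly above the baseline of "x".
fragments = [
    (100.0, 10, "x"),
    (97.0, 18, "2"),
    (100.0, 30, "+ y"),
    (120.0, 10, "next"),
]
without_threshold = merge_close_lines(fragments, threshold=0)
with_threshold = merge_close_lines(fragments, threshold=5)
```

Without a threshold the superscript ends up on a line of its own; with a threshold of 5 it is sorted back into reading order next to its base.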

Summary
• We have presented the Intextbooks system, which can:
• extract high-quality semantic models of the textbooks
• create HTML representations of the same textbooks that are connected to their semantic models using fine-grained DOM structures
• match around 88% of all the words in the textbooks to individual elements in the HTML resource
• We have designed a web interface to interact with the textbooks
• We have plans to incorporate a student model and an adaptation mechanism

Future work
• Extend and improve the components of the system:
• improve the matching algorithm
• add the missing components
• Better define the semantics of knowledge that is extracted from the textbooks, and potential applications
• Test the textbook modeling technology with different textbook formats across different domains (e.g., medicine)
• Evaluate the system’s effectiveness in a user study with real students from a target group
