Transformation of PDF Textbooks into Interactive Educational Resources

We present Intextbooks, a system for the automated conversion of PDF-based textbooks into interactive, intelligent Web resources. The paper focuses on a new component of Intextbooks that is responsible for transforming PDF-based content into semantically annotated HTML/CSS. It presents the architecture of the system, the design of the client application that renders the resulting textbooks, and a short validation experiment demonstrating the quality of the transformation workflow. Demo video:

Transformation of PDF Textbooks into Interactive Educational Resources

  1. Isaac Alpizar-Chacon, Max van der Hart, Zef S. Wiersma, Lorenzo S.J. Theunissen, and Sergey Sosnovsky. Transformation of PDF Textbooks into Interactive Educational Resources
  2. Motivation (7-7-2020)
     • Digital textbooks are a standard medium to distribute educational content
     • Most digital textbooks are digital copies of their printed counterparts
     • Creation of intelligent textbooks requires effort and expertise
  5. The INTelligent TEXTBOOKS system: complete transformation of PDF textbooks into online intelligent educational resources
  6. The Intextbooks system
     • Extracts a semantic model from a PDF textbook
     • Converts the PDF textbook into an HTML/CSS representation
     • Enriches the HTML with a fine-grained DOM (Document Object Model) connected to the semantic model
     • Every content, layout, and structure element of the textbook is identifiable in the DOM, so any object of a textbook can become an object of targeted interaction
     • Every element of a textbook’s DOM is also identifiable within the semantic model extracted from the textbook
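The two-way identifiability described on this slide can be pictured with a minimal Python sketch. It assumes, purely for illustration, that every HTML element carries an `id` attribute and that the semantic model is a plain dictionary keyed by those ids. The ids, the `MODEL` entries, and the `IdCollector` class are hypothetical and are not taken from Intextbooks.

```python
from html.parser import HTMLParser

# Hypothetical semantic model: element id -> entry describing that element.
MODEL = {
    "sec-1": {"kind": "section", "title": "Introduction"},
    "w-17": {"kind": "word", "text": "variance", "term": "variance"},
}

# Hypothetical HTML fragment with one id per element.
HTML = '<div id="sec-1"><span id="w-17">variance</span> <span id="w-18">is</span></div>'


class IdCollector(HTMLParser):
    """Collect the id attribute of every element in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id is not None:
            self.ids.append(element_id)


def link_dom_to_model(html, model):
    """Return (linked, unlinked): ids that have a model entry and ids that do not."""
    collector = IdCollector()
    collector.feed(html)
    linked = {i: model[i] for i in collector.ids if i in model}
    unlinked = [i for i in collector.ids if i not in model]
    return linked, unlinked


linked, unlinked = link_dom_to_model(HTML, MODEL)
print(sorted(linked))  # ids that resolve to a semantic-model entry
print(unlinked)        # ids present only in the DOM
```

In this sketch, any element whose id resolves in the model could be made a target of interaction, while the `unlinked` list shows where the two representations are out of step.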
  7. Intelligent Textbooks: each textbook becomes an integrated resource where content elements and pieces of domain knowledge are interlinked on both the presentation and the knowledge levels
  8. Architecture: Ingestion (offline)
  9. Architecture: Web-reader (online)
  10. Textbook model extractor: rule-based approach with 7 steps, 17 tasks, and 55 rules
  11. TEI Textbook Model
      • Structure (sections)
      • Content (words, lines, titles, etc.)
      • Domain knowledge (terms)
      • Plus RDFa attributes
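To make the three layers of the model concrete, here is a rough Python sketch that builds a small TEI-like fragment with the standard library. The element layout, the `build_section` helper, the choice of the `property` and `resource` RDFa attributes, and the `example.org` IRI are all assumptions made for illustration; the slides do not show the actual schema used by Intextbooks.

```python
import xml.etree.ElementTree as ET

# xml:id written in the Clark notation that ElementTree uses for namespaced names.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_section(section_id, title, words, terms):
    """Build a small TEI-like section element.

    terms maps a word to a (hypothetical) IRI identifying the domain term.
    """
    section = ET.Element("div", {"type": "section", XML_ID: section_id})
    head = ET.SubElement(section, "head")      # structure: section title
    head.text = title
    paragraph = ET.SubElement(section, "p")    # content: running text
    for index, word in enumerate(words):
        w = ET.SubElement(paragraph, "w", {XML_ID: f"{section_id}-w{index}"})
        w.text = word
        if word in terms:                      # domain knowledge: RDFa link
            w.set("property", "dcterms:subject")
            w.set("resource", terms[word])
    return section


section = build_section(
    "sec-2-1",
    "Measures of spread",
    ["The", "variance", "measures", "spread"],
    {"variance": "http://example.org/terms/variance"},
)
print(ET.tostring(section, encoding="unicode"))
```

Giving every word its own identifier is what would let a later step align the model with individual HTML elements.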
  12. PDF to HTML converter
      • Several open libraries are available: pdf2htmlEX, PDFMiner, pdf2html, Xpdf, etc.
      • pdf2htmlEX:
        • preserves the layout perfectly across very different types of documents
        • produces the same structure across different documents
        • fast, stable, and scalable
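As a hedged illustration of how such a converter might be driven from a pipeline, the sketch below only assembles a pdf2htmlEX command line in Python. It assumes pdf2htmlEX is installed and uses its `--dest-dir` option; the file paths and the `build_pdf2htmlex_command` helper are hypothetical, and the slides do not state which options Intextbooks actually passes.

```python
import shlex
from pathlib import Path


def build_pdf2htmlex_command(pdf_path, dest_dir):
    """Assemble a pdf2htmlEX call: options, input PDF, output file name."""
    pdf = Path(pdf_path)
    return [
        "pdf2htmlEX",
        "--dest-dir", str(dest_dir),  # folder that receives the generated HTML
        str(pdf),                     # input PDF
        pdf.stem + ".html",           # output file name inside dest_dir
    ]


command = build_pdf2htmlex_command("textbooks/statistics.pdf", "html_out")
print(shlex.join(command))

# To actually run the conversion (requires pdf2htmlEX on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Building the argument list separately from running it keeps the conversion step easy to inspect and to batch over many textbooks.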
  13. TEI-HTML synchronizer
  14. TEI-HTML synchronizer
  15. Other components (planned)
      • Student model: keeps an internal representation of the learning progress of the students
      • Monitoring engine: logs every action of the students
      • Adaptation engine: uses the student model and the activity log to provide personalization to the students
  16. Validation
      • Goal: test the accuracy of the matching algorithm for the TEI-HTML synchronization
      • Data: 70 university-level textbooks
      • Domains: statistics, computer science, web programming, literature, history
      • 3 versions of the algorithm, differing in the use of a threshold to sort and merge close lines (subscripts and superscripts)
      • Evaluation metric: percentage of words that were matched between the TEI and HTML representations
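The two ideas on this slide, merging lines that sit close together vertically and scoring the share of matched words, can be sketched in Python as follows. This is only an illustration under stated assumptions: lines are modeled as hypothetical `(y, x, text)` tuples, the merge rule compares each line with the first line of the current group, and the multiset comparison in `match_percentage` is a stand-in, since the slides do not describe the actual matching algorithm.

```python
from collections import Counter


def merge_close_lines(lines, threshold):
    """Merge lines whose y position is within `threshold` of the group's first line.

    lines is a list of (y, x, text) tuples; merged text is ordered by x.
    """
    groups = []
    for y, x, text in sorted(lines):
        if groups and abs(y - groups[-1]["y"]) <= threshold:
            groups[-1]["parts"].append((x, text))
        else:
            groups.append({"y": y, "parts": [(x, text)]})
    return [" ".join(t for _, t in sorted(g["parts"])) for g in groups]


def match_percentage(tei_words, html_words):
    """Percentage of TEI words that find a counterpart among the HTML words."""
    if not tei_words:
        return 0.0
    available = Counter(html_words)
    matched = 0
    for word in tei_words:
        if available[word] > 0:
            available[word] -= 1
            matched += 1
    return 100.0 * matched / len(tei_words)


# A superscript "2" sits slightly above the baseline of the line it belongs to.
lines = [(100.0, 10.0, "x"), (97.0, 18.0, "2"), (100.0, 30.0, "+ y"), (120.0, 10.0, "next line")]
print(merge_close_lines(lines, threshold=0))
print(merge_close_lines(lines, threshold=5))
print(match_percentage(["a", "b", "c", "d"], ["a", "b", "x"]))
```

With a threshold of 0 the superscript stays on its own line, while a threshold of 5 folds it back into the line it belongs to. A fixed threshold would pass the same value for every document, whereas a dynamic one would presumably derive it per document, for example from the font size.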
  17. Results
      • No threshold: 87.16%
      • Fixed threshold: 88.76%
      • Dynamic threshold: 87.09%
  18. Analysis
      • Results are very similar across variants of the algorithm
      • Textbooks consisting mostly of text get a very high matching rate (~100%)
      • Textbooks with figures, tables, and graphs get a lower matching rate
      • Words with subscripts and superscripts are not matched correctly
  19. Summary
      • We have presented the Intextbooks system, which can:
        • extract high-quality semantic models of the textbooks
        • create HTML representations of the same textbooks that are connected to their semantic models using fine-grained DOM structures
        • match around 88% of all the words in the textbooks to individual elements in the HTML resource
      • We have designed a web interface to interact with the textbooks
      • We plan to incorporate a student model and an adaptation mechanism
  20. Future work
      • Extend and improve the components of the system:
        • improve the matching algorithm
        • add the missing components
      • Better define the semantics of the knowledge that is extracted from the textbooks, and its potential applications
      • Test the textbook modeling technology with different textbook formats across different domains (e.g., medicine)
      • Evaluate the system’s effectiveness in a user study with real students from a target group