SMART-GS: A Tool for Studying Digitized Historical Manuscripts
1. SMART-GS: A Tool for Studying Digitized Historical Manuscripts
Yuta Hashimoto
PhD student, Department of Humanistic Informatics
Kyoto University
March 15, 2015 @ University of Michigan
2. Introduction
• Who am I
• A PhD student studying DH at Kyoto University
• Research interest: Digital History
• Background: History of Science
• Also an iOS/Android Developer
• Kin Digi Reader (近デジリーダー)
• A mobile reader for the Kindai Digital Library
• In this talk, I will…
• Introduce an application named SMART-GS
• And its possible contributions to Japanese studies
4. What is SMART-GS?
• A transcription/annotation suite for digitized
historical manuscripts
• Developed at Kyoto University since 2007
• An open source project
• SMART-GS is NOT
• An OCR application for handwritten texts
• A language-dependent application
8. Problems with Paper-based Research
1. Paper is heavy and takes up space
2. Difficult to share the “metadata” added to the manuscripts with co-workers
3. Organizing information is also difficult
• Searching, grouping, indexing, etc…
11. Markup Functions for Texts and Images
• Various ways of marking up image regions:
• Enclosing a region in a rectangle or polygon
• Drawing an arrow from one region to another
• Attaching a comment to a region
• etc.
• HTML markup for texts:
• Highlighting a certain word
or phrase
• Adding a link to an external
website
12. Linking Markups
• Any two markups can be
linked to each other
• These links are one-to-many
and bidirectional
• A link itself can also be annotated
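The linking model on this slide can be sketched as a small data structure. This is only an illustration, not SMART-GS's actual implementation: the names `Markup`, `Link`, and `LinkStore` and all identifiers such as `page-012` are hypothetical. Each link is indexed under both of its endpoints, so it can be followed in either direction, and one markup can take part in many links.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Markup:
    """A marked-up image region or text span (hypothetical model)."""
    markup_id: str
    kind: str    # e.g. "rectangle", "polygon", "text-highlight"
    target: str  # the page image or transcription the markup belongs to


@dataclass
class Link:
    """A link between two markups; `note` annotates the link itself."""
    a: str
    b: str
    note: str = ""


class LinkStore:
    def __init__(self):
        self._by_markup = defaultdict(list)

    def link(self, a, b, note=""):
        lk = Link(a.markup_id, b.markup_id, note)
        # Index the link under both endpoints so it is bidirectional.
        self._by_markup[a.markup_id].append(lk)
        self._by_markup[b.markup_id].append(lk)
        return lk

    def neighbours(self, m):
        """Ids of all markups linked to `m`, in the order they were linked."""
        links = self._by_markup.get(m.markup_id, [])
        return [lk.b if lk.a == m.markup_id else lk.a for lk in links]


store = LinkStore()
word = Markup("m1", "rectangle", "page-012")
text = Markup("m2", "text-highlight", "transcription-012")
other = Markup("m3", "polygon", "page-040")

first = store.link(word, text, note="same word in image and transcription")
store.link(word, other)

print(store.neighbours(word))   # ['m2', 'm3']  (one-to-many)
print(store.neighbours(other))  # ['m1']        (followed backwards)
```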
13. Word Spotting for Handwritten Text (DSC Search)
Search results for query “Scheler” (a German philosopher’s name)
14. How DSC Search indexes images
1. Separate the image into
lines
2. Divide each line into thin
slits
3. Compute a gradient vector
for each pixel in each slit
4. Accumulate these gradient
vectors (which will be used
as “feature vectors”)
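Steps 2 to 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual DSC Search code: the slide does not say how the gradient vectors are accumulated, so the sketch assumes a magnitude-weighted histogram of gradient directions per slit. The function name `slit_features` and the parameters `slit_width` and `bins` are hypothetical, and step 1 (line separation) is assumed to have been done already.

```python
import math


def slit_features(line_img, slit_width=4, bins=8):
    """Return one feature vector per slit for a single text-line image.

    line_img: list of equal-length rows of grayscale values.
    Assumption: "accumulating" gradients means building a histogram of
    gradient directions, weighted by gradient magnitude, for each slit.
    """
    h, w = len(line_img), len(line_img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return line_img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    features = []
    for x0 in range(0, w, slit_width):          # step 2: thin vertical slits
        hist = [0.0] * bins
        for x in range(x0, min(x0 + slit_width, w)):
            for y in range(h):
                # Step 3: central-difference gradient at this pixel.
                gx = (px(y, x + 1) - px(y, x - 1)) / 2.0
                gy = (px(y + 1, x) - px(y - 1, x)) / 2.0
                mag = math.hypot(gx, gy)
                if mag == 0.0:
                    continue
                angle = math.atan2(gy, gx) % (2 * math.pi)
                b = int(angle / (2 * math.pi) * bins) % bins
                hist[b] += mag                  # step 4: accumulate
        features.append(hist)
    return features


# Tiny 4x8 test image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(4)]
features = slit_features(image, slit_width=4, bins=8)
print(len(features), len(features[0]))  # 2 8
```

The result is a sequence of feature vectors, one per slit, read along the line.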
15. How DSC Search Finds Similar Images
Query image
Candidate Image
• DSC Search calculates the
“distances” between the query
and candidate images by
comparing their feature vector
sequences
• The smaller the distance, the
more likely the two images have
similar shapes
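One way to compare two feature vector sequences is sketched below. The slides do not name the comparison method, so this sketch swaps in dynamic time warping (DTW), a common choice for sequences of different lengths; it should not be read as the method DSC Search actually uses. The function names and the toy data are hypothetical.

```python
import math


def euclidean(u, v):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three possible alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return (name, distance) pairs, smallest distance first."""
    scored = [(name, dtw_distance(query, seq)) for name, seq in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


query = [[0, 1], [1, 0]]
candidates = {
    "same": [[0, 1], [1, 0]],
    "stretched": [[0, 1], [0, 1], [1, 0]],  # same shape, written wider
    "different": [[1, 0], [0, 1]],
}
ranked = rank_candidates(query, candidates)
for name, dist in ranked:
    print(name, round(dist, 3))
```

In this toy example the stretched candidate still gets distance 0, which is the property that makes a warping distance attractive for handwriting of varying width.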
16. Pros and Cons of DSC Search
• Pros
• Can be applied to any type of document, regardless of
its language or text direction
• No machine learning is needed
• Cons
• Requires users to preprocess images to separate the lines
• Not accurate for manuscripts written by multiple authors
18. Transcription Project of Kuratomi’s Diary
• Baron Yuzaburo Kuratomi
(1853-1943)
• An elite bureaucrat-politician of the
Meiji, Taisho, and early Showa eras
• Project goal
• To publish a complete transcription of
Kuratomi’s diary,
• which consists of more than 300
notebooks
19. Team-based Transcription with SMART-GS
[Diagram labels: WebDAV Server, gsx file]
1. Create draft transcriptions
2. Add annotations
3. Revise and finalize transcription texts
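The slide shows a gsx file and a WebDAV server. Since WebDAV is built on HTTP, sharing one project file among team members could look roughly like the sketch below. How SMART-GS itself synchronizes files is not described in the slides; the server URL, the file name, and the helpers `build_download` and `build_upload` are hypothetical, and the `If-Match` check only helps where the server supports conditional requests. The sketch builds the requests without sending them.

```python
import urllib.request


def build_download(url):
    """GET request that would fetch the shared project file."""
    return urllib.request.Request(url, method="GET")


def build_upload(url, data, etag=None):
    """PUT request that would write the edited file back.

    If the ETag from the earlier download is passed, If-Match asks the
    server to reject the upload when a co-worker changed the file meanwhile.
    """
    headers = {"Content-Type": "application/octet-stream"}
    if etag is not None:
        headers["If-Match"] = etag
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Hypothetical server and file name, for illustration only.
PROJECT_URL = "https://dav.example.org/projects/diary/notebook-001.gsx"

download = build_download(PROJECT_URL)
upload = build_upload(PROJECT_URL, b"...edited project data...", etag='"abc123"')
print(download.get_method(), upload.get_method())  # GET PUT
# Passing either request to urllib.request.urlopen() would actually send it.
```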
20. Transcription of Hajime Tanabe’s Lecture Notebooks
• Hajime Tanabe (1885-1962)
• One of the prominent philosophers
of the Kyoto School
• Tanabe’s lecture notebooks
• Written in Japanese, German,
Latin, Greek, and English
• And written in extremely bad
handwriting
22. Transcription of Earthquake Recordings
◀ Teibi Shinsai Roku (丁未震災録):
A record of a large earthquake that
took place in 1847
▲Reading Group of Earthquake Recordings
(古地震研究会)
26. As a Platform for International Collaboration
• A large-scale project of NIJL (National Institute of Japanese Literature)
• Titled “Construction of the International Collaborative
Network on Japanese Classical Books”
• 300,000 books will be digitized and published on the
web by 2024
27. Our Current Attempts
• To have NIJL use SMART-GS as their official
transcription tool
• And to make SMART-GS a global platform for
Japanese studies
• So that scholars all over the world can cooperate
through the network on the same platform
29. Conclusion
• More and more digital images of historical
manuscripts have become available on the web
• SMART-GS provides a set of features to handle
these digital images effectively
• And it offers ways to collaborate with other
scholars through the network
• Our next attempt is to make SMART-GS a global
platform where scholars can collaborate with each
other
30. Thank you for listening!
(ご清聴ありがとうございました)
Any questions or comments?