Analyzing Japanese Art with Node.js and Computer Vision
John Resig
Lot 55: 20 Japanese Woodblock Prints
Each depicting a female/Geisha figure with
calligraphy throughout each print. Prints
measure 13.75" H x 9.375" W. Toning to
each print, some losses around edges.
Estimated Price: $400 - $600
Step 1: Acquire and read tons of expensive books.
Step 2: Learn to read Japanese. *
* Japanese from the 17th to 19th century.
You’re not going to learn this from Rosetta Stone.
Step 3: Learn to read Japanese calligraphy.
Solution: A fast-loading, responsive, i18ned website: Ukiyo-e.org
https://github.com/jeresig/i18n-node-2
var greeting = i18n.__('Hello %s, how are you today?', 'Marcus');
i18n.__n('%s cat', '%s cats', 3);
Node i18n 2 (npm install i18n-2)
setLocaleFromSubdomain([request])
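As a rough illustration of what choosing a locale from a subdomain could involve, here is a minimal sketch. The `localeFromHost` helper, the host names, and the locale list are hypothetical; this is not i18n-2's implementation.

```javascript
// Hypothetical helper: choose a locale from the first label of the host name.
// Not taken from i18n-2; it only illustrates the idea of subdomain locales.
function localeFromHost(host, locales, fallback) {
  // Drop an optional port, then take the left-most label,
  // e.g. "ja" in "ja.ukiyo-e.org".
  const label = host.split(":")[0].split(".")[0].toLowerCase();
  return locales.includes(label) ? label : fallback;
}

console.log(localeFromHost("ja.ukiyo-e.org", ["en", "ja"], "en"));
console.log(localeFromHost("ukiyo-e.org", ["en", "ja"], "en"));
```

An unknown or missing subdomain falls back to the default locale rather than failing the request.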
{
  "Hello": "Hello",
  "Hello %s, how are you today?": "Hello %s, how are you today?",
  "weekend": "weekend",
  "Hello %s, how are you today? How was your %s.": "Hello %s, how are you today? How was your %s.",
  "Hi": "Hi",
  "Howdy": "Howdy",
  "%s cat": {
    "one": "%s cat",
    "other": "%s cats"
  },
  "There is one monkey in the %%s": {
    "one": "There is one monkey in the %%s",
    "other": "There are %d monkeys in the %%s"
  },
  "tree": "tree"
}
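To make the "one"/"other" entries in a catalog like this concrete, here is a minimal sketch of a plural lookup. The `translatePlural` function and the simple "count equals 1" rule are assumptions for illustration, not the library's actual code.

```javascript
// Hypothetical catalog fragment in the same shape as the locale file.
const catalog = {
  "%s cat": { one: "%s cat", other: "%s cats" }
};

// Pick the "one" or "other" template, then substitute the count for %s.
// Falls back to the strings passed in when the key is not in the catalog.
function translatePlural(catalog, singular, plural, count) {
  const entry = catalog[singular] || { one: singular, other: plural };
  const template = count === 1 ? entry.one : entry.other;
  return template.replace("%s", String(count));
}

console.log(translatePlural(catalog, "%s cat", "%s cats", 3));
console.log(translatePlural(catalog, "%s cat", "%s cats", 1));
```

Keying the catalog on the singular source string means an untranslated phrase still renders in the source language.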
[Architecture diagram. Hosting: Digital Ocean, Amazon S3, Amazon CloudFront. Server stack: nginx (w/ cache), two node.js + express processes, naught, mongodb, Elasticsearch, scraper. Content: images, JS, CSS, and data (HTML, XML, JSON).]
https://github.com/jeresig/jquery-imgscrubber
Collecting Tons of Woodblock Print Data
[Diagram: a search lists several pages; each page yields HTML and an image.]
Queue-based Crawling using PhantomJS
[Pipeline diagram. Stages: Processing Queue; Some Website (WebKit, PhantomJS, CasperJS, SpookyJS); Save Data (XML files, Mongo log); Extract Data (libxml + xpath, MongoDB); Process Data (artists, images); Correct Artist and Date; Add to Site!]
module.exports = function() {
    return {
        scrape: [
            {
                start: "http://ukiyo-e.org/search",
                visit: "//a[@class='img']",
                next: "//a[contains(@rel,'next')]"
            },
            {
                extract: {
                    "title": "//p[contains(@class, 'title')]//span",
                    "dateCreated": "//p[contains(@class, 'date')]//span",
                    "artists[]": "//p[contains(@class, 'artist')]//a",
                    "images[]": "//div[contains(@class,'imageholder')]//a/@href"
                }
            }
        ]
    };
};
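The two-level configuration above (a listing level with `visit` and `next`, then an extraction level) maps naturally onto a work queue. The sketch below shows one way such a queue could be drained; it is not Stack Scraper's code. The `fakeSite` object, its URLs, and the `crawl` function are hypothetical, and plain arrays stand in for what the XPath queries would return from real pages.

```javascript
// Hypothetical in-memory stand-in for a website. In a real crawl the
// `visit` and `next` links would come from XPath queries on fetched pages.
const fakeSite = {
  "/search?page=1": { visit: ["/print/1", "/print/2"], next: "/search?page=2" },
  "/search?page=2": { visit: ["/print/3"], next: null },
  "/print/1": { title: "Print 1" },
  "/print/2": { title: "Print 2" },
  "/print/3": { title: "Print 3" }
};

function crawl(site, start) {
  const queue = [{ url: start, isListing: true }];
  const seen = new Set([start]); // never queue the same URL twice
  const records = [];

  const enqueue = (url, isListing) => {
    if (!seen.has(url)) {
      seen.add(url);
      queue.push({ url, isListing });
    }
  };

  while (queue.length > 0) {
    const { url, isListing } = queue.shift();
    const page = site[url];
    if (!page) continue; // unknown URL: skip it
    if (isListing) {
      // Listing level: queue every detail page, then the next listing page.
      page.visit.forEach((link) => enqueue(link, false));
      if (page.next) enqueue(page.next, true);
    } else {
      // Detail level: "extract" the fields we care about.
      records.push({ url, title: page.title });
    }
  }
  return records;
}

const records = crawl(fakeSite, "/search?page=1");
console.log(records.map((r) => r.url));
```

Keeping the queue explicit makes it straightforward to persist it and resume a long crawl after a failure.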
"surname" : "Hashimoto",
"surname_kana" : "はしもと",
"name" : "Hashimoto Okiie",
"ascii" : "Hashimoto Okiie",
"plain" : "Hashimoto Okiie",
"kana" : "はしもとおきいえ",
"_id" : ObjectId("530c0825d9a80976b2000437")
}
],
"names" : [
{
"original" : "Hashimoto Okiie (橋本興家)",
"locale" : "ja",
"kanji" : "橋本興家",
"given" : "Okiie",
"given_kana" : "おきいえ",
"surname" : "Hashimoto",
"surname_kana" : "はしもと",
"given_kanji" : "興家",
"surname_kanji" : "橋本",
"name" : "Hashimoto Okiie",
"ascii" : "Hashimoto Okiie",
"plain" : "Hashimoto Okiie",
"kana" : "はしもとおきいえ",
"_id" : ObjectId("530c0825d9a80976b2000439")
}
],
"extract" : [
"53dfc997cbf9fa7501d78e4820b24a9c"
],
"created" : ISODate("2014-02-25T03:04:05Z"),
"__v" : 0
}
“Stack Scraper”
https://github.com/jeresig/stack-scraper
https://github.com/jeresig/ukiyoe-scrapers
Image Similarity
https://github.com/jeresig/node-matchengine
Image Similarity Search
Idyll: Offline Image Cropping
• https://github.com/jeresig/idyll
• Crop images offline and on a mobile device.
• Saves the selections back to a server.
• Data is synced and saved using HTML 5 appcache.
• https://github.com/jeresig/node-appcache-glob
by David Chester at Shutterstock
https://github.com/dchester/perl-image-crop-calibration-target
http://www.ersatzlabs.com/
Aiding Woodblock Print Studies with Image Analysis
Correcting Print Data
Japanese Names
• Utagawa Hiroshige
• Ando Hiroshige
• Andō Hiroshige
• Hiroshige
• 歌川広重
• 広重
安土, 安堂, 安島, 安東, 安籐, 安藤, 安道, 安達, 阿藤 → Andō
安藤 → andō, antō, anzō, yasuzuka
A many-to-many mapping!
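One way to hold a many-to-many mapping like this is a pair of indexes, one per direction. The sketch below uses a few pairs transcribed from the slide. Treat the pairs as illustrative rather than authoritative; the `buildIndex` helper is hypothetical.

```javascript
// Kanji/reading pairs transcribed from the slide (illustrative only).
const pairs = [
  ["安東", "andō"],
  ["安藤", "andō"],
  ["安藤", "antō"],
  ["安藤", "anzō"],
  ["安藤", "yasuzuka"]
];

// Build one lookup table per direction so either side can be queried.
function buildIndex(pairs) {
  const byKanji = new Map();
  const byRomaji = new Map();
  for (const [kanji, romaji] of pairs) {
    if (!byKanji.has(kanji)) byKanji.set(kanji, new Set());
    if (!byRomaji.has(romaji)) byRomaji.set(romaji, new Set());
    byKanji.get(kanji).add(romaji);
    byRomaji.get(romaji).add(kanji);
  }
  return { byKanji, byRomaji };
}

const index = buildIndex(pairs);
console.log([...index.byRomaji.get("andō")]); // kanji spellings for one reading
console.log([...index.byKanji.get("安藤")]);   // readings for one spelling
```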
Sharaku Toshusai
東洲斎写楽
Is this the family name?
Where are the stress marks?
How do you “split” this name?
Which name parts correlate?
Tools (all are Node modules!)
• https://github.com/lovell/hepburn
• https://github.com/jeresig/node-enamdict
• https://github.com/jeresig/node-ndlna
• https://github.com/jeresig/node-romaji-name
[Diagram: ndlna, hepburn, and enamdict feed into romaji-name.]
Hepburn
• https://github.com/lovell/hepburn
• Takes in the romanized (English-alphabet) form of a Japanese word.
• Returns it written in Hiragana or Katakana (the phonetic Japanese scripts).
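To show the kind of conversion involved, here is a deliberately tiny sketch of romaji-to-hiragana mapping using greedy longest-match on syllables. It does not use the hepburn module: `sketchToHiragana` and its eight-entry table are hypothetical and cover only the example name.

```javascript
// Deliberately tiny syllable table: just enough for the example name.
// A real converter covers the full syllabary, long vowels, and more.
const SYLLABLES = {
  u: "う", ta: "た", ga: "が", wa: "わ",
  hi: "ひ", ro: "ろ", shi: "し", ge: "げ"
};

function sketchToHiragana(romaji) {
  const input = romaji.toLowerCase();
  let out = "";
  let i = 0;
  while (i < input.length) {
    let matched = false;
    // Try the longest candidate first so "shi" is read as one syllable.
    for (let len = 3; len >= 1; len--) {
      const chunk = input.slice(i, i + len);
      if (chunk.length === len &&
          Object.prototype.hasOwnProperty.call(SYLLABLES, chunk)) {
        out += SYLLABLES[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Pass through anything the table does not know (such as spaces).
      out += input[i];
      i += 1;
    }
  }
  return out;
}

console.log(sketchToHiragana("Utagawa Hiroshige"));
```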
Utagawa Hiroshige → うたがわひろしげ
Enamdict
• https://github.com/jeresig/node-enamdict
• Downloads and queries the ENAMDICT database
• (A mapping of Japanese proper names to Hiragana and English.)
• Used to correct typos and figure out surname/given name.
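The surname/given-name step can be pictured as a dictionary lookup that decides whether a two-part name needs to be swapped. This is a minimal sketch, not node-enamdict's API: the `KNOWN` table is a hypothetical stand-in built only from names that appear in this talk, and `splitName` is an illustrative helper.

```javascript
// Hypothetical mini-dictionary; ENAMDICT itself is far larger.
const KNOWN = {
  hashimoto: "surname",
  okiie: "given",
  toshusai: "surname",
  sharaku: "given"
};

function splitName(name) {
  const parts = name.trim().split(/\s+/);
  if (parts.length !== 2) return { surname: null, given: null };
  const [first, second] = parts;
  const kind = (part) => KNOWN[part.toLowerCase()];
  if (kind(first) === "given" || kind(second) === "surname") {
    // Written given-name first: swap into surname-first order.
    return { surname: second, given: first };
  }
  // Default assumption: already in Japanese (surname-first) order.
  return { surname: first, given: second };
}

console.log(splitName("Sharaku Toshusai"));
console.log(splitName("Hashimoto Okiie"));
```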
NDLNA
• https://github.com/jeresig/node-ndlna
• Queries the NDLNA database
• Finds the correct Kanji for an English name.
• Or the correct English for a Kanji name.
{
"original" : "Sharaku Toshusai (東洲斎写楽 )",
"locale" : "ja",
"kanji" : "東洲斎写楽",
"given" : "Sharaku",
"given_kana" : "しゃらく",
"surname" : "Tōshūsai",
"surname_kana" : "とおしゅうさい",
"surname_kanji" : "東洲斎",
"given_kanji" : "写楽",
"name" : "Tōshūsai Sharaku",
"ascii" : "Tooshuusai Sharaku",
"plain" : "Toshusai Sharaku",
"kana" : "とおしゅうさいしゃらく"
}
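The "name", "ascii", and "plain" fields in a record like this differ only in how long vowels are written. Here is a small sketch of how the last two could be derived from the first; `toAscii` and `toPlain` are hypothetical helpers, not romaji-name's API, and they handle only the long vowels ō and ū.

```javascript
// Long vowels spelled out as doubled letters (only ō and ū are covered).
const LONG_VOWELS = { "ō": "oo", "ū": "uu", "Ō": "Oo", "Ū": "Uu" };

function toAscii(name) {
  // Normalize first so macron letters are single code points.
  return name.normalize("NFC").replace(/[ōūŌŪ]/g, (ch) => LONG_VOWELS[ch]);
}

function toPlain(name) {
  // Decompose accented letters, then drop the combining marks.
  return name.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(toAscii("Tōshūsai Sharaku"));
console.log(toPlain("Tōshūsai Sharaku"));
```

Storing all three forms lets a search match a query however the user chose to type the long vowels.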
Dates
• https://github.com/jeresig/node-yearrange
var yr = require("yearrange");

yr.parse("1877")
// {"start": 1877, "end": 1877}

yr.parse("1847-48")
// {"start": 1847, "end": 1848}

yr.parse("ca. 1810-20s")
// {"start": 1810, "end": 1829, "circa": true}

yr.parse("18th–19th century")
// {"start": 1700, "end": 1899}

yr.parse("Meiji era")
// {"start": 1868, "end": 1912}
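To give a feel for the parsing involved, here is a much-reduced sketch that handles only single years, abbreviated ranges, decade suffixes, and a "ca." prefix. It is not yearrange's implementation: `parseYears` is a hypothetical function, and it ignores centuries and era names entirely.

```javascript
// Hypothetical, much-reduced parser for strings such as "1847-48".
function parseYears(text) {
  // "ca.", "circa", or "c." directly before the first digit marks an estimate.
  const circa = /^\s*(?:ca\.?|circa|c\.)\s*(?=\d)/i.test(text);
  const match = text.match(/(\d{4})(?:\s*[-–]\s*(\d{1,4}))?(s)?/);
  if (!match) return null;

  const start = parseInt(match[1], 10);
  let end = start;
  if (match[2]) {
    // Abbreviated end years borrow the leading digits of the start year,
    // so "1847-48" ends in 1848.
    const prefix = match[1].slice(0, 4 - match[2].length);
    end = parseInt(prefix + match[2], 10);
  }
  // A trailing "s" widens the end to the whole decade (1820s ends in 1829).
  if (match[3]) end += 9;

  const result = { start, end };
  if (circa) result.circa = true;
  return result;
}

console.log(parseYears("1877"));
console.log(parseYears("1847-48"));
console.log(parseYears("ca. 1810-20s"));
```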
Artist Rectification
Miyagawa Shuntei
Printed in 1897
Sold for: $550
Prints sell for $100-$400 individually
True Estimate: $2100 - $8400 *
* You just have to find
someone willing to buy them!
• http://ejohn.org/research/

• http://ukiyo-e.org/
• https://github.com/jeresig

EmpireJS: Hacking Art with Node js and Image Analysis