1
About me
マーク・バーンズ (Mark Burns)
about.me/mark.burns
Japanese-speaking Ruby developer
On holiday from England
I love Ruby and startups
2
Introduction
Jim Breen's (Monash University) Japanese-English online dictionary
wwwjdic.com
Data freely available
Accepts user contributions
3
wwwjdic
(rewrite)
https://github.com/markburns/wwwjdic
4
Current interaction
GET http://wwwjdic.com
301 -> http://www.edrdg.org/cgi-bin/wwwjdic/wwjdic?1C
POST http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E
BODY: dsrchkey=%CD%F1&dicsel=1
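The `%CD%F1` in that body appears to be 卵 encoded as two EUC-JP bytes rather than as UTF-8. Here is a minimal Ruby sketch, using only the standard library, of building (but not sending) an equivalent request. The EUC-JP reading is inferred from this one sample body, and `legacy_search_request` and `decode_legacy_key` are hypothetical names, not part of the rewrite.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: build (but do not send) the form POST that the
# legacy CGI expects. The search key goes out as percent-encoded EUC-JP
# bytes, so 卵 becomes %CD%F1 rather than its UTF-8 form %E5%8D%B5.
def legacy_search_request(term, dicsel: 1)
  uri = URI("http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/x-www-form-urlencoded"
  # With no target encoding given, encode_www_form percent-encodes each
  # value's own bytes, which here are EUC-JP.
  request.body = URI.encode_www_form([["dsrchkey", term.encode("EUC-JP")],
                                      ["dicsel", dicsel]])
  request
end

# Reverse direction: turn a legacy form value back into a UTF-8 string.
def decode_legacy_key(value)
  URI.decode_www_form_component(value, Encoding::EUC_JP).encode("UTF-8")
end
```

Every client of the current interface has to know about this encoding detail, which a plain GET with a UTF-8 path avoids.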
5
Response
6
Aims
JSON API
Cleaner UI
Nice features: e.g. autocomplete
Easily extensible open source codebase
7
JSON API
GET http://localhost:4000/卵.json
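One way a `/<word>.json` URL like this could be handled is sketched below in plain Ruby: percent-decode the path (browsers usually send 卵 as `%E5%8D%B5`), then split off an optional format extension. `parse_lookup_path` is a hypothetical helper rather than the project's actual routing, and falling back to HTML when there is no extension is an assumption based on the example URL on the next slide.

```ruby
# Hypothetical helper (not the repo's router): split a request path such as
# "/卵.json", or its percent-encoded form, into [search term, format].
# Paths without a recognised extension default to HTML.
def parse_lookup_path(path)
  # Percent-decode on the raw bytes, then reinterpret the result as UTF-8
  decoded = path.b.gsub(/%(\h\h)/) { [Regexp.last_match(1)].pack("H2") }
  term = decoded.force_encoding(Encoding::UTF_8).delete_prefix("/")
  if (match = term.match(/\A(.+)\.(json|html)\z/))
    [match[1], match[2]]
  else
    [term, "html"]
  end
end
```

Keeping the word itself as the whole path is what makes the URL easy to remember and to type by hand.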
8
Simpler UI
(Example)
GET http://localhost:4000/卵
9
Autocomplete
10
Trie index
http://oldblog.antirez.com/post/autocomplete-with-redis.html
Autocomplete
11
Trie index
Time: O(log(N)), N ≈ 150,000
Space: N*(Ma+1) ≈ 51MB, where Ma is the average word length (5.6)
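As a rough sanity check of the space figure, using the average word length from the speaker notes (M_a ≈ 5.6) and reading 51MB as 51 × 10^6 bytes (an assumption):

```latex
\begin{aligned}
N\,(M_a + 1) &\approx 150{,}000 \times (5.6 + 1) = 990{,}000 \text{ members} \\
\frac{51 \times 10^{6}\ \text{bytes}}{990{,}000\ \text{members}} &\approx 52 \text{ bytes per member}
\end{aligned}
```

N(M_a + 1) counts each word's prefixes plus its starred full word, so it is an upper bound on distinct members (shared prefixes such as "eg" are stored once). Either way, around 50 bytes per member is well above the few characters each member holds, so most of the space is presumably Redis's per-member overhead.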
12
TRIE
13
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
14
["eg", "ega", "egal", "egali", "egalit",
"egalita", "egalitar", "egalitari", "egalitaria",
"egalitarian", "egalitarian*", "egg", "egg ",
"egg (", "egg (e"]
15
["egg dish", "egg dishe", "egg dishes",
"egg dishes*", "egg l", "egg la", "egg lai",
"egg laid", "egg laid ", "egg laid i",
"egg laid in", "egg laid in ", "egg laid in w",
"egg laid in wi", "egg laid in win"]
16
["egg laid in wint", "egg laid in winte",
"egg laid in winter", "egg laid in winter*", "egg m",
"egg me", "egg mem", "egg memb",
"egg membr", "egg membra", "egg membran",
"egg membrane", "egg membrane*", "egg s",
"egg sa"]
17
"walr" vs. "walt"
"walrus"
["walr", "walru", "walrus", "walrus*",
"walruse", "walruses", "walruses*",
"walt", "waltz", "waltz ", "waltz (",
"waltz (c", "waltz (co", "waltz (com",
"waltz (comp"]
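A minimal, self-contained Ruby sketch of the lookup described in antirez's post, with a sorted Array standing in for the Redis sorted set so it runs without a server. Each word contributes all of its prefixes plus a `*`-terminated full word; a lookup starts at the prefix's position and walks forward until an entry no longer shares the prefix, which is the "walr"/"walt" break above. `PrefixIndex` and its parameters are hypothetical: the real code talks to Redis (ZRANK/ZRANGE), and unlike ZRANK this version also works when the typed prefix is not itself a member. The arrays on the preceding slides appear to be consecutive batches of such a range read.

```ruby
# Sketch of antirez-style prefix autocomplete. A sorted Array stands in for
# a Redis sorted set whose members all have score 0 (Redis then orders them
# byte-wise, as Ruby's String#<=> does).
class PrefixIndex
  def initialize(words)
    members = []
    words.each do |word|
      # Every prefix of the word, plus the full word marked with "*"
      (1..word.length).each { |n| members << word[0, n] }
      members << "#{word}*"
    end
    @members = members.uniq.sort
  end

  # Return up to `count` complete words starting with `prefix`, reading the
  # index in batches of `rangelen` members (like repeated ZRANGE calls).
  def complete(prefix, count: 10, rangelen: 50)
    results = []
    # Position of the first member >= prefix (Redis would use ZRANK)
    start = @members.bsearch_index { |member| member >= prefix }
    while start && results.length < count
      batch = @members[start, rangelen] || []
      break if batch.empty?
      start += rangelen
      batch.each do |entry|
        # Sorted order means the first non-matching entry ends the search,
        # e.g. "walt" right after the "walr..." entries
        return results unless entry.start_with?(prefix)
        results << entry.chomp("*") if entry.end_with?("*")
        return results if results.length == count
      end
    end
    results
  end
end
```

Because "egg " sorts before "egg*" (a space is 0x20, `*` is 0x2A), multi-word entries such as "egg dishes" come back before "egg" itself.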
18
shutl.com & graphs
19
Isomorphism?
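To make the non-isomorphism concrete, here is a tiny hypothetical word/sense graph built from plain Ruby hashes rather than a graph database. すごい points at two sense nodes, each with its own English glosses (taken from the speaker notes), so a reverse lookup from "dreadful" lands on the negative sense instead of on a flat list of translations. `SenseGraph` and the `:positive`/`:negative` sense labels are illustrative assumptions.

```ruby
# Hypothetical word/sense graph using plain Hashes (no graph database).
# A headword links to sense nodes, and each sense links to English glosses,
# so one word can carry opposite meanings without a one-to-one mapping.
class SenseGraph
  def initialize
    @senses = Hash.new { |hash, word| hash[word] = {} }      # word => { sense => glosses }
    @by_gloss = Hash.new { |hash, gloss| hash[gloss] = [] }  # gloss => [[word, sense], ...]
  end

  # Add one sense of a word together with its glosses; returns self to chain
  def link(word, sense, glosses)
    @senses[word][sense] = glosses
    glosses.each { |gloss| @by_gloss[gloss] << [word, sense] }
    self
  end

  def senses_of(word)
    @senses.fetch(word, {})
  end

  def words_for(gloss)
    @by_gloss.fetch(gloss, [])
  end
end

graph = SenseGraph.new
graph.link("すごい", :positive, %w[great wonderful fantastic])
     .link("すごい", :negative, %w[terrible dreadful])
```

A graph database would let these sense nodes also link to each other, or to entries in other dictionaries, without changing the shape of the lookup.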
20
N-grams
安心 リフォーム へ の 近道 [TAB]29
(Anshin reform he no chikamichi, roughly "the shortcut to worry-free renovation")
安心 + リフォーム + へ + の + 近道
安心 [TAB]41,322,178
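A minimal Ruby parser for lines in the layout shown above: space-separated tokens, a tab, then an occurrence count. The dataset's exact file format is an assumption based only on these sample lines, and `NGram` and `parse_ngram_line` are hypothetical names.

```ruby
# Parse one line of n-gram data in the layout shown on the slide: the
# tokens separated by spaces, a tab, then how many times they occurred.
# Thousands separators are tolerated because the slide prints them.
NGram = Struct.new(:tokens, :count) do
  # n for an n-gram, e.g. 5 for the 安心 リフォーム へ の 近道 sample
  def order
    tokens.length
  end
end

def parse_ngram_line(line)
  phrase, count = line.chomp.split("\t", 2)
  raise ArgumentError, "expected a tab-separated count: #{line.inspect}" if count.nil?
  NGram.new(phrase.split(" "), Integer(count.delete(",")))
end

sample = parse_ngram_line("安心 リフォーム へ の 近道\t29")
puts "#{sample.order}-gram seen #{sample.count} times" # prints "5-gram seen 29 times"
```

Counts like these could then be attached to dictionary entries as a relevance signal when ordering search results.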
21
Present / State of Play
Data import to redis
Indexed word lookup
Autocomplete
Begun work on text glossing
22
Noticeably Missing
Not yet released to production
No test/staging server
However, it should be easy enough to run locally
23
Future
WordNet plus a graph DB => mapping between languages
Analysis of kanji
User experience/Design/Polish
N-grams
Other ideas/collaboration?
24
https://github.com/markburns/wwwjdic
http://www.slideshare.net/_mark_burns/slides-24568551
about.me/mark.burns
Questions?

Introduction to wwwjdic project

Editor's Notes

  1. My name is Mark Burns. I'm a Ruby developer, I speak Japanese, and I'm on holiday from England.
  2. Today I'm going to talk about Jim Breen's Japanese dictionary, wwwjdic, and in particular an open source rewrite of this online dictionary. As you may have guessed, it was originally written, and is still mostly maintained, by Jim Breen, a retired professor (and current PhD student) at Monash University in Melbourne, Australia. It's freely available. I'm not 100% sure about the license, and I'm no internet/international lawyer, but it's a flexible license that allows both free and commercial use, on a 'please do the right thing and donate some money if it benefits you' kind of basis.
  3. The start of the rewrite is available here: [URL]. I'll also show the SlideShare URL at the end of the talk, so you can make a note of it and see all the various links. In the past I've spoken to Jim about improving the web interface of the dictionary. I feel it could be better presented and more user-friendly and intuitive.
  4. For example, a typical lookup involves this kind of interaction: visit wwwjdic.com; get redirected to this long URL with a particular query parameter for the word-search page; then fill in a form, which does a POST request to a URL with a specific query-string parameter and a specifically encoded body. The results currently come back as HTML that looks like this:
  5. So it's great if you like information and know where to look. You have links to everything you might need to do, and more. And it's this 'and more' that I think is the issue with a lot of information presentation. To be honest, it's not great for beginners, because there's no thought given to the hierarchy of importance of the information (which I'll come back to). Now, there's nothing wrong with this at all; it's just that it particularly suits its specific audience, by which I mean technically minded learners of Japanese. I can only guess, but I also imagine it is more commonly known among native English speakers than among native Japanese speakers.
  6. I thought it would be nice to make it more accessible in general, so my aims for this project are:
     * Provide a JSON API
     * A cleaner UI/UX
     * Autocomplete and other nice UI touches
     * Maintainability
  7. Propose an API where you can GET a simply defined, easy-to-remember URL: GET http://wwwjdic.com/egg.json
  8. And some nicer design for the HTML output. Now, I'm not a front-end designer by any means, but I can appreciate the philosophy of clean design.
  9. A first attempt used the Rails flavour of the ActiveRecord pattern against an SQL backend (easy to get up and running, but it squeezes the concepts of domain model and persistence together). But a dictionary is much more read-heavy than write-heavy, and the model of languages doesn't fit as well in a relational database. The existing data is a few flat text files, so I wanted a decent compromise for maintainability, and it would be nice not to completely throw away the performance of the existing system's custom C code reading from flat text files.
  10. Autocomplete was done with a trie index. The code and concept were pretty much taken from a blog post by antirez (the author of Redis): http://oldblog.antirez.com/post/autocomplete-with-redis.html. It scales comfortably to our data, which has on the order of 150,000 entries. Time: O(log(N)). Space: N*(Ma+1), where Ma is the average length of a word (5.6), which comes to roughly 51MB.
  11. OK, some details: not too specific, but hopefully detailed enough to keep everyone happy. This is the result of a lookup on the index generated for autocompletion. For example, the user searched for 'egg', and the list shows all the following matches in the autocomplete list.
  12. Here's the lookup.
  13. After entering 'eg', this is the value of `matches`. We iterate over each match: if the entry no longer matches the prefix, we break out; otherwise we append it to our list of matches.
  14. Here is an example where the user has entered "walr": the break clause is hit, because the value "walt" does not match "walr".
  15. In my work for shutl, a UK startup aiming to solve the online delivery problem, we use graph databases to help us match carrier/vehicle availability and pricing with customer requirements and retail store opening hours. I think it could be interesting to start structuring the dictionary data in a graph format. At the very least, words can be linked to the entries listed in their definitions, but a more semantically rich level of relationships could be represented too.
  16. I think that mapping words to a graph is a more natural way to express the relationship between two languages. Firstly, you don't always have isomorphic (one-to-one) relationships between words in the two languages. すごい can mean either great or terrible in English: it can mean something like wonderful or fantastic, as well as dreadful. I often struggle with words that are their own antonyms, and this was particularly relevant to me on the day of the large Tōhoku earthquake, when I was on a shinkansen heading into Tokyo. After being on the train for six hours, I needed to get a beer and find some people to chat to, to find out what had happened. I'd understood that there had been an earthquake, but it was my first experience of one and I hadn't yet grasped its magnitude, in both the literal and metaphorical senses of the word. So I found a guy who wanted to practise his English, and he explained to me, "This is a great day for Japan. Very great." Understanding this as something along the lines of wonderful or fantastic, I had to ask him, "Why? Is it a national holiday? Maybe the emperor's birthday?" Of course, when I translated his sentence into Japanese in my head, choosing すごい for great, it occurred to me that he must have meant the terrible/dreadful sense of the word. So clearly there is a need for a richer, more expressive data model that can capture these nuances and senses, rather than just a one-to-one lookup service.
  17. Through his relationship with Monash University, Jim has access to Google's dataset of Japanese n-grams. An example is 安心リフォームへの近道 ("the shortcut to worry-free renovation"), tokenised as 安心 + リフォーム + へ + の + 近道. The 5-gram sample line is 安心 リフォーム へ の 近道 [TAB]29, meaning this sequence of words occurred 29 times during the data collection. Other sample lines: 安心 [TAB]41322178 and 安 心 [TAB]3274. By utilising this data we can make search results more relevant. One of the problems with the existing flat-file structure is that there is no metadata to help understand how recent or relevant a particular result is: some of the terms may be legal or scientific terms, or pre-1945 usage. The data can also be useful for spotting common collocations.