Successfully reported this slideshow.
Creating Zotero Translators Using Framework               Sebastian Karcher         Boston College, November 2012
Why Learn About Translators?      One-click import from the web is perhaps the key features      that distinguishes Zotero...
Some Notes on Zotero Translators      Each Zotero translator is an individual file, written in      javascript      There a...
(I know last week was Halloween but:) This isn’t going tobe scary!      You cannot break Zotero by fiddling with translator...
Xpaths are “directions” on a website      xpaths are basically “directions” used to point to a part of a      webpage     ...
The most basic Xpath      Give directions: at every corner/node, tell Zotero where to go:      Let’s say we want to go go ...
Making Xpaths more precise      But we’re still “lost” - which of the two “div” streets do we      go down?      Option 1:...
Making Xpaths more efficient      In an actual webpage, an xpath can be very long, so we’d like      to make them shorter. w...
A sample Framework Translator - Single item   /** Articles */   FW.Scraper({   itemType : ’blogPost’,   detect : FW.Xpath(...
A sample Framework Translator - Multiples   /** Search results */   FW.MultiScraper({   itemType : "multiple",   detect : ...
Our Tools      Scaffold - a Firefox extension to write and test the translator      Firefox “Inspect Element” - to help us ...
Upcoming SlideShare
Loading in …5
×

Zotero Framework Translators

483 views

Published on

Presentation given as part of the Zotero Training Workshops, Fall 2012. Original authored in Pandoc markdown and available on github: https://github.com/adam3smith/zotero-workshops

Published in: Education
  • Be the first to comment

Zotero Framework Translators

  1. 1. Creating Zotero Translators Using Framework Sebastian Karcher Boston College, November 2012
  2. 2. Why Learn About Translators? One-click import from the web is perhaps the key features that distinguishes Zotero In this session we will write a “screen scraper” type translator for Zotero Best Case: This will allow you to write translators for sites you or your “clients” need Minimal Case: This will give you an undertstanding on how translators work and what may be possible, even if you’re not going to do it yourself
  3. 3. Some Notes on Zotero Translators Each Zotero translator is an individual file, written in javascript There are four types of translators: Web, Import, Search, Export Some web translators, like those for many libraries, call on an import translators (e.g. MARC) - we won’t learn about those. Other web translators “scrape” data from the page - that is what we will do now
  4. 4. (I know last week was Halloween but:) This isn’t going tobe scary! You cannot break Zotero by fiddling with translators - you can always “reset” from the advanced panel of the preferences About 2 years ago, Eric Hetzner of the UC libraries developed a “framework” for wrting translators –> now you don’t need any javascript. Just Xpaths and regular expressions. And those are easy!
  5. 5. Xpaths are “directions” on a website xpaths are basically “directions” used to point to a part of a webpage A webpage is built up from a number of nested nodes This is what the most simple webpage looks like <html> <head> <title>A Basic Webpage</title> </head> <body> <div id="title">The Title of the webpage</div> <div id="content" class="text">The Content of the w </body> </html>
  6. 6. The most basic Xpath Give directions: at every corner/node, tell Zotero where to go: Let’s say we want to go go to “The Content of the webpage” “Take the HTLM road, take a left at”body“, then take the”div” street, or in HTML: /html/body/div
  7. 7. Making Xpaths more precise But we’re still “lost” - which of the two “div” streets do we go down? Option 1: Take the second <div> /html/body/div[2] Option 2: Take the <div> that has “content” as an id /html/body/div[@id="content"]
  8. 8. Making Xpaths more efficient In an actual webpage, an xpath can be very long, so we’d like to make them shorter. we can use // to start anywhere in the html tree, e.g “the <div> with”content” as an “id” anywhere on the site: //div[@id="content"] Sometimes we don’t want the precise content of an attribute like id - in those case we can use contains() as in //div[contains(@id, "cont")] We can combine conditions with “and” or “or” (in lowercase!) //div[@id="content" and @class="text"]
  9. 9. A sample Framework Translator - Single item /** Articles */ FW.Scraper({ itemType : ’blogPost’, detect : FW.Xpath(’//h1[@class="article-title"]’), title : FW.Xpath(’//h1[@class="article-title"]’).text().tri attachments : { url : FW.Url(),title : "voxEU snapshot",type : "text/html }, creators : FW.Xpath(’//div[@class="author"] //span[@class="field-content"]/a’).text().cleanAuthor("auth abstractNote : FW.Xpath(’//div[@class="article-teaser"]’).t date : FW.Xpath(’//h1[@class="article-title"]/following-sib publicationTitle : "VoxEU.org" });
  10. 10. A sample Framework Translator - Multiples /** Search results */ FW.MultiScraper({ itemType : "multiple", detect : FW.Xpath(’//ul[contains(@class, "search-results")] choices : { titles : FW.Xpath(’//ul[contains(@class, "search-results")] /li/h2/a’).text(), urls : FW.Xpath(’//ul[contains(@class, "search-results")] /li/h2/a’).key(’href’).text() } });
  11. 11. Our Tools Scaffold - a Firefox extension to write and test the translator Firefox “Inspect Element” - to help us understand the structure of a webpage (there are alternatives like “Firebug”)

×