MediaX (Jan 2013) -- PKP XML Parsing

Note: demo at http://142.58.129.113/dev/ during slide 6 (only guaranteed to be functional on Jan 8, 2013).
Slide note: (5-minute demo happens here)
Transcript of “MediaX (Jan 2013) -- PKP XML Parsing”

    1. Left to Their Own Devices: Automating XML Parsing and Rendering for Scholarly Publishing
       Alex Garnett & John Willinsky, Public Knowledge Project
    2. What do we want? XML Publishing!
       • When do we want it? 2004 would’ve been nice…
       • We’ve known the value of properly marked-up documents for a few decades now.
         – Unfortunately, this entails hours of markup work.
       • Open-source publishers on limited budgets can’t afford the outsourcing or the grad students that normally make this possible.
    3. The Public Knowledge Project
       • Developers of Open Journal Systems (OJS) & Open Monograph Press (OMP)
         – Open-source software to support open access publishing.
         – http://pkp.sfu.ca
       • Our userbase happens to include many such small publishers, who publish almost exclusively in PDF, given its ease.
    4. Nice things that PDF doesn’t have
       • Well-structured text mining & indexing
       • Rendering in different formats (e.g. mobile)
       • Embedded dynamic content
       • Citation parsing and lookup
       • Reliable metadata
       • So why are we still using it, again?
    5. XML Publishing Workflows
       • Are complex and underdocumented, and require lots of manual labour:
         – No author will ever write in XML.
         – Only a small fraction will use Markdown, LaTeX, or some other text format that’s easy to transform.
         – Most automated parsing tools are in deplorable condition anyhow (rant rant rant).
       • Still, there are many very good piecemeal tools available at different stages of these workflows. We put some of them together.
    6. Toolchain
       • External services:
         – LibreOffice: document conversion
         – pdfx: fuzzy parsing
         – ParsCit: fuzzy citation parsing
         – citeproc/CSL: citation transformation
    7. Future Work
       • After incorporating upstream changes from pdfx (fixing punctuation & non-English languages), we’re aiming to have an OJS plugin by March.
       • OMP will follow soon after.
       • By the end of our initial funding period in June, we’ll have a source release (without pdfx) and plan to be supporting a set of OJS/OMP users.
    8. Future Work not done by us
       • Collaborators at Heidelberg University are working on a WYSIWYG in-browser XML editor for manually revising article formatting.
       • The University of Michigan’s mPach system will add ePub generation and HathiTrust ingest.
       • CrossRef will be contributing functionality to look up, verify, and link parsed citations.
    9. Thanks
       • Damion Dooley, our primary developer
       • Steve Pettifer and the University of Manchester, for allowing us to use pdfx
       • Juan Alperin and the rest of the PKP team, for their support and earlier work
       • Alf Eaton from the NLM, for stylesheets
       • MediaX, for funding this project
    10. Questions?
       • If you want to use our service for document preparation right now, contact me (Alex) at axfelix@gmail.com.
       • We’ll have a stable version available by the end of January (probably free with registration).
       • OJS/OMP integration and standalone release (without pdfx) coming soon!
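Slide 4’s points about text mining and reliable metadata are easiest to see on an actual XML file. The sketch below assumes an NLM/JATS-style article (the deck credits NLM stylesheets but does not name the output schema), trimmed to a handful of elements, and uses Python’s standard `xml.etree.ElementTree`. The sample document, the author names, and the `extract_metadata` helper are all hypothetical. Getting the same fields out of a PDF would need the kind of fuzzy parsing that pdfx and ParsCit exist to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style article (assumed schema, sample data only).
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>Example reference one.</mixed-citation></ref>
      <ref id="r2"><mixed-citation>Example reference two.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def extract_metadata(xml_text: str) -> dict:
    """Pull the title, author names, and reference count out of a JATS-like document."""
    root = ET.fromstring(xml_text)
    # Paths are relative to the root <article> element.
    title = root.findtext("front/article-meta/title-group/article-title", default="")
    authors = [
        f"{name.findtext('given-names', '')} {name.findtext('surname', '')}".strip()
        for name in root.findall(".//contrib[@contrib-type='author']/name")
    ]
    refs = root.findall("back/ref-list/ref")
    return {"title": title, "authors": authors, "reference_count": len(refs)}


if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

Because every field sits in a named element, indexing or metadata export becomes a path lookup rather than a layout heuristic.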
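The toolchain on slide 6 is a chain of conversions, each feeding the next. Here is a minimal Python sketch of that orchestration, under explicit assumptions. The stage order (Word file to PDF via LibreOffice, PDF to XML via pdfx, reference strings to tagged citations via ParsCit, tagged citations to a house style via citeproc/CSL) is inferred from the slide and is not documented in it. Only the LibreOffice command line (`soffice --headless --convert-to`) is a real interface. The other stages are hypothetical placeholders, since the deck does not say how pdfx or ParsCit are invoked.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


def libreoffice_convert_cmd(src: Path, outdir: Path, target: str = "pdf") -> List[str]:
    """Build (but do not run) a headless LibreOffice conversion command.

    Pass the result to subprocess.run(...) to actually convert the file.
    """
    return ["soffice", "--headless", "--convert-to", target, "--outdir", str(outdir), str(src)]


@dataclass
class Document:
    name: str
    # Records which stages have touched this document, in order.
    log: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def stage(label: str) -> Stage:
    """Hypothetical placeholder stage; a real one would call the external tool."""
    def run(doc: Document) -> Document:
        doc.log.append(label)
        return doc
    return run


# Assumed ordering of the slide-6 services (hypothetical labels).
PIPELINE: List[Stage] = [
    stage("libreoffice: docx -> pdf"),
    stage("pdfx: pdf -> structured xml"),
    stage("parscit: reference strings -> tagged citations"),
    stage("citeproc/csl: tagged citations -> house style"),
]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Feed the document through each stage in order."""
    for step in stages:
        doc = step(doc)
    return doc


if __name__ == "__main__":
    print(libreoffice_convert_cmd(Path("paper.docx"), Path("out")))
    print(run_pipeline(Document("paper"), PIPELINE).log)
```

Keeping each stage a plain document-to-document function means one tool (say, a different citation parser) can be swapped out without touching the others.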