Our work on the EC-TEL paper data extraction.

My slides on the work we did extracting data from the PDF files of the proceedings of the last four years of EC-TEL conferences.

Published in: Technology, Business

    1. How we did it... ParsCit
    2. How we did it... ParsCit
    3. How we did it... ParsCit
    4. How we did it... ParsCit
    5. How we did it... ParsCit
    6. How we did it... ParsCit REST API
    7. Lessons learned
         • data gathering from PDF is only OK for some data
         • a lot of cleanup work, plus complexity from distributed data cleanup
         • future: more structured data as a starting point
    8. What we want...
         • clean citation data
         • geographical data: author - affiliation links
         • structured data
         • ...
    9. What might be helpful... [diagram labels: PDF, Author, Title]
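
To make "structured data" and "cleanup work" a little more concrete, here is a minimal Python sketch of what a structured paper record with author - affiliation links could look like, together with a simple cleanup pass. It is not the code behind these slides: the names Author, Paper, normalize and clean_paper are hypothetical, and the cleanup rules (collapsing whitespace, dropping empty and case-insensitively duplicated citation strings) are assumptions about what such a pass might do.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    # Hypothetical record: one author linked to one affiliation string.
    name: str
    affiliation: str = ""  # left empty when extraction found no affiliation


@dataclass
class Paper:
    # Hypothetical record for one paper taken from the proceedings.
    title: str
    year: int
    authors: List[Author] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including line breaks) and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_paper(paper: Paper) -> Paper:
    """Return a cleaned copy: normalized strings, no empty or repeated citations."""
    seen = set()
    citations = []
    for raw in paper.citations:
        citation = normalize(raw)
        key = citation.casefold()  # assumption: case-insensitive duplicate check
        if citation and key not in seen:
            seen.add(key)
            citations.append(citation)
    authors = [
        Author(normalize(a.name), normalize(a.affiliation)) for a in paper.authors
    ]
    return Paper(normalize(paper.title), paper.year, authors, citations)
```

Keeping each author next to an affiliation in the same record is what would make the author - affiliation links above directly available, instead of having to recover them from the PDF text afterwards.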
