Our work on the EC-TEL paper data extraction.

My slides on our work extracting data from the PDF files of the proceedings of the last four years of EC-TEL conferences.

Transcript

  • 1. How we did it... ParsCit
  • 2. How we did it... ParsCit
  • 3. How we did it... ParsCit
  • 4. How we did it... ParsCit
  • 5. How we did it... ParsCit
  • 6. How we did it... ParsCit REST API
  • 7. Lessons learned • gathering data from PDFs only works well for some kinds of data • a lot of cleanup work, plus added complexity when the data cleanup is distributed • future: start from more structured data.
  • 8. What we want... • clean citation data • geographical data: author - affiliation links • structured data • ...
  • 9. What might be helpful... PDF, Author, Title
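
To make the "structured data" and "author - affiliation links" wishes from slide 8 more concrete, here is a minimal sketch of what such a record could look like, together with the kind of cleanup step slide 7 alludes to. It is not the code used in this work: the class names, the fields, and both helper functions are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Author:
    # Hypothetical record: one author with the affiliation printed on the paper.
    name: str
    affiliation: str


@dataclass
class Paper:
    # Hypothetical record for one proceedings paper.
    title: str
    year: int
    authors: List[Author] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)


def clean_extracted_text(raw: str) -> str:
    """Undo two common PDF extraction artifacts.

    1. Words hyphenated across a line break are rejoined.
       Limitation: a genuine compound split at the hyphen
       (e.g. "well-" / "known") is also merged into one word.
    2. Runs of whitespace, including newlines, become one space.
    """
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", joined).strip()


def affiliation_links(papers: List[Paper]) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (author name, affiliation) pairs."""
    return sorted({(a.name, a.affiliation) for p in papers for a in p.authors})
```

With records like these, the author - affiliation links become a simple query over the papers instead of something re-parsed from each PDF. The author names and affiliations would still need normalising before the de-duplication is reliable, since the same person or institution is often spelled differently across papers.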