EDF2013: Selected Talk: Allan Hanbury: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data

Selected Talk by Allan Hanbury, at the European Data Forum 2013, 10 April 2013 in Dublin, Ireland: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data


  1. Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data
     Allan Hanbury, Vienna University of Technology
     The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318068 (VISCERAL).
  2. 2. Evaluation Evaluation campaigns / Challenges / Benchmarks / Competitions / ... Makes economic sense  “for every $1 that NIST and its partners invested in TREC, at least $3.35 to $5.07 in benefits accrued to IR researchers.” Has scientific impact
  3. Evaluation Campaigns
     (diagram linking Organiser, Participants, Tasks, Data, and Ground truth; image credit: Kyle McDonald, http://www.flickr.com/photos/kylemcdonald/6187343093/)
  4. Evaluation Campaigns
     (same diagram as slide 3)
  5. With Big Data?
     (diagram linking Organiser, Participants, Tasks, Data, and Ground truth; image credit: Kyle McDonald, http://www.flickr.com/photos/kylemcdonald/6187343093/)
  6. Benchmarking Algorithms on Big Data
     - Distributing terabytes is hard: sending hard disks or downloading is not feasible, so bringing the algorithms to the data is necessary.
     - Motivating participants: tasks with general interest and few infrastructure barriers (how to store or treat terabytes, ...); allow sharing infrastructure.
     - Manual ground truthing does not scale. Use semi-automation (e.g. silver corpus), coercion (e.g. crowdsourcing), ...
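The "silver corpus" idea mentioned on this slide is to fuse the outputs of several automatic algorithms into a consensus annotation, so that experts only need to review the positions where the algorithms disagree. A minimal sketch of label fusion by majority vote (the function name, threshold, and toy data are illustrative assumptions, not the VISCERAL implementation):

```python
from collections import Counter

def silver_corpus_labels(annotations, min_agreement=0.5):
    """Fuse several automatic annotations into a 'silver' consensus.

    annotations: list of equal-length label sequences, one per algorithm.
    A position keeps the majority label only if that label's share of the
    votes exceeds min_agreement; otherwise it stays None, flagging the
    position for manual review.
    """
    n = len(annotations)
    fused = []
    for labels in zip(*annotations):
        label, count = Counter(labels).most_common(1)[0]
        fused.append(label if count / n > min_agreement else None)
    return fused

# Three hypothetical algorithm outputs over five voxels:
runs = [
    [1, 1, 2, 0, 2],
    [1, 2, 2, 0, 1],
    [1, 1, 2, 1, 0],
]
print(silver_corpus_labels(runs))  # [1, 1, 2, 0, None]
```

Only the last voxel, where all three algorithms disagree, would be passed to a human annotator.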
  7. Evaluation on the Cloud (http://visceral.eu)
     Bring the algorithms to the data, not the data to the algorithms:
     - Put the data on the cloud
     - Participants program in computing instances on the cloud
     First benchmark: structure recognition in medical images.
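For a benchmark on structure recognition in medical images, the standard way to score a submitted segmentation against the ground truth is an overlap measure such as the Dice coefficient. A minimal sketch over flat binary masks (the slide does not specify the metric, so treat this as an illustrative assumption):

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks (sequences of 0/1).

    Returns 2|A∩B| / (|A| + |B|); 1.0 means perfect overlap,
    0.0 means no overlap. Two empty masks count as a perfect match.
    """
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * inter / total if total else 1.0

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```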
  8. Training Phase
     (diagram: Participants and the Organiser interact with the cloud through a Registration System and an Analysis System; the cloud holds the Training Data, Test Data, and Participant Instances)
  9. Evaluation Phase
     (same diagram as slide 8: Participants and the Organiser interact with the cloud through a Registration System and an Analysis System; the cloud holds the Training Data, Test Data, and Participant Instances)
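In the evaluation phase, the organiser runs each participant instance on test data the participants never see and scores the outputs. A toy sketch of such a driver (the function names, the simple numeric "predictions", and the metric are all illustrative assumptions, not the actual VISCERAL analysis system):

```python
from collections import namedtuple

TestCase = namedtuple("TestCase", ["data", "truth"])

def evaluate_participants(instances, test_cases, metric):
    """Run each participant instance on the test set and average the metric.

    instances: {participant_name: callable(data) -> prediction}
    test_cases: list of TestCase(data, truth); only the organiser's
    driver ever touches `truth`, so participants cannot overfit to it.
    """
    scores = {}
    for name, run in instances.items():
        per_case = [metric(run(c.data), c.truth) for c in test_cases]
        scores[name] = sum(per_case) / len(per_case)
    return scores

# Toy example: "predictions" are numbers, metric is 1 - relative error.
cases = [TestCase(data=2, truth=4), TestCase(data=3, truth=6)]
instances = {"doubler": lambda x: 2 * x, "identity": lambda x: x}
metric = lambda pred, truth: 1 - abs(pred - truth) / truth
print(evaluate_participants(instances, cases, metric))
# {'doubler': 1.0, 'identity': 0.5}
```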
  10. Annotators (Radiologists)
      (diagram: slide 8's cloud setup extended with locally installed Annotation Clients used by radiologists, coordinated by an Annotation Management System)
  11. Future Development
      - Dealing with private data: Does it make sense to evaluate on data that the participant cannot see? Does it make sense to evaluate only on extracted features?
      - Moving toward eScience: data identifiers; algorithm identifiers?
      - Continuous evaluation
      - Modular construction of the algorithms
  12. Challenges
      - Sharing components
      - Who should provide the cloud service? Who pays for using it?
      - Transferring components to industry
