EDF2013: Selected Talk: Allan Hanbury: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data

Selected talk by Allan Hanbury at the European Data Forum 2013, 10 April 2013, Dublin, Ireland: Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data

Algorithm any good? A Cloud-based Infrastructure for Evaluation on Big Data
Allan Hanbury
Vienna University of Technology

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318068 (VISCERAL).

Evaluation

- Evaluation campaigns / challenges / benchmarks / competitions / ...
- Makes economic sense
  - “for every $1 that NIST and its partners invested in TREC, at least $3.35 to $5.07 in benefits accrued to IR researchers.”
- Has scientific impact
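The quoted ratio converts directly into a benefit range for any investment. Below is a minimal worked example in Python. The $1,000,000 budget is an arbitrary figure chosen for illustration and does not come from the talk.

```python
# Benefit range implied by the quoted TREC figure: each $1 invested
# returned at least $3.35 to $5.07 in benefits to IR researchers.
LOW_RATIO = 3.35
HIGH_RATIO = 5.07


def benefit_range(investment: float) -> tuple[float, float]:
    """Return the (low, high) benefit implied by the quoted ratios."""
    return investment * LOW_RATIO, investment * HIGH_RATIO


low, high = benefit_range(1_000_000)  # hypothetical budget
print(f"${low:,.0f} to ${high:,.0f}")
```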

Evaluation Campaigns
[Diagram: Organiser, Tasks, Data, Ground truth, Participants. Image: Kyle Mcdonald, http://www.flickr.com/photos/kylemcdonald/6187343093/]
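In the campaign model on this slide, the organiser holds the ground truth and scores what participants submit. The following is a minimal sketch under stated assumptions:

- Segmentation masks are modelled as sets of voxel coordinates.
- The Dice coefficient, a common overlap measure for segmentation, is an assumed choice of metric.
- The names `dice`, `score_submissions`, `team_a` and `team_b` are hypothetical.

```python
"""Minimal sketch: an organiser scores participant segmentations against
ground truth that participants never receive (assumed Dice metric)."""

Voxel = tuple[int, int, int]


def dice(predicted: set[Voxel], truth: set[Voxel]) -> float:
    """Dice overlap 2|A and B| / (|A| + |B|); defined as 1.0 when both are empty."""
    if not predicted and not truth:
        return 1.0
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))


def score_submissions(submissions: dict[str, set[Voxel]],
                      ground_truth: set[Voxel]) -> dict[str, float]:
    """Score every participant's submission against the withheld ground truth."""
    return {name: dice(mask, ground_truth) for name, mask in submissions.items()}


truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
runs = {
    "team_a": {(0, 0, 0), (0, 0, 1), (0, 1, 0)},  # 3 of 4 voxels, no false positives
    "team_b": {(0, 0, 0), (1, 1, 1)},             # 1 hit, 1 false positive
}
print(score_submissions(runs, truth))
```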

With Big Data?
[Diagram: the same elements (Organiser, Tasks, Data, Ground truth, Participants) revisited for Big Data. Image: Kyle Mcdonald, http://www.flickr.com/photos/kylemcdonald/6187343093/]

Benchmarking Algorithms on Big Data

- Distributing terabytes is hard
  - Neither sending hard disks nor downloading is feasible
  - Bringing algorithms to the data is necessary
- Motivating participants
  - Tasks of general interest with few infrastructure barriers (how to store or process terabytes ...)
  - Allow infrastructure to be shared
- Manual ground truthing does not scale. Use:
  - Semi-automation (e.g. silver corpus)
  - Coercion (e.g. crowdsourcing)
  - …
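A silver corpus replaces manual annotation with annotations fused automatically, for example from the outputs of several participating algorithms. This is a hedged sketch with two assumptions. First, per-voxel majority voting is the fusion rule; the slide does not name one. Second, masks are stored as sets of voxel coordinates. The function name `majority_vote` is hypothetical.

```python
"""Sketch of a silver corpus: fuse several automatic segmentations of the
same image by per-voxel majority voting (an assumed fusion rule)."""
from collections import Counter

Voxel = tuple[int, int, int]


def majority_vote(segmentations: list[set[Voxel]]) -> set[Voxel]:
    """Keep a voxel if strictly more than half of the segmentations contain it."""
    if not segmentations:
        return set()
    # Each segmentation is a set, so it contributes at most one vote per voxel.
    votes = Counter(v for seg in segmentations for v in seg)
    needed = len(segmentations) // 2 + 1
    return {voxel for voxel, count in votes.items() if count >= needed}


s1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
s2 = {(0, 0, 0), (0, 0, 1)}
s3 = {(0, 0, 0), (1, 1, 1)}
silver = majority_vote([s1, s2, s3])
print(sorted(silver))
```

Majority voting is the simplest fusion rule. Weighted schemes, which give more trust to better-performing algorithms, are a common refinement. In either case a silver corpus is only an approximation of expert ground truth.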

Evaluation on the Cloud

(http://visceral.eu)

- Bring the algorithms to the data, not the data to the algorithms
  - Put the data on the cloud
  - Participants develop their programs in computing instances on the cloud
- First benchmark on structure recognition in medical images
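A back-of-the-envelope calculation shows why sending the data to the algorithms breaks down at this scale. The inputs are illustrative assumptions: a 10 TB dataset, and link speeds of 100 Mbit/s and 1 Gbit/s. The calculation ignores protocol overhead.

```python
"""Rough transfer time for shipping a dataset over a network link.
Dataset size and link speeds are illustrative assumptions."""


def transfer_days(size_terabytes: float, link_megabits_per_s: float) -> float:
    """Days needed to move size_terabytes (decimal TB) at a sustained link rate."""
    bits = size_terabytes * 1e12 * 8
    seconds = bits / (link_megabits_per_s * 1e6)
    return seconds / 86_400


for rate in (100, 1_000):
    print(f"10 TB at {rate} Mbit/s: {transfer_days(10, rate):.1f} days")
```

At 100 Mbit/s this comes to roughly nine days per participant for each copy of the data. Every change to the dataset would require another transfer. Shipping the algorithm to the data avoids that cost.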

Training Phase

[Diagram: a Cloud holding Training Data, Test Data and Participant Instances, together with a Registration System and an Analysis System; Participants and Organiser shown as actors.]

Evaluation Phase

[Diagram: the same components as in the training phase (Cloud with Training Data, Test Data, Participant Instances, Registration System, Analysis System; Participants; Organiser).]
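One reading of the two phase diagrams is an access policy. During training, participants work in their own instances with the training data. During evaluation, the organiser runs the participant instances on the test data, which participants never see. The sketch below encodes that reading as a lookup table. The resource names, roles and rules are assumptions, not a documented VISCERAL interface.

```python
"""Hypothetical access policy for the two benchmark phases."""
from enum import Enum, auto


class Phase(Enum):
    TRAINING = auto()
    EVALUATION = auto()


class Role(Enum):
    PARTICIPANT = auto()
    ORGANISER = auto()


# (phase, role) -> resources that role may use in that phase (assumed rules)
ACCESS: dict[tuple[Phase, Role], frozenset[str]] = {
    (Phase.TRAINING, Role.PARTICIPANT): frozenset({"participant_instance", "training_data"}),
    (Phase.TRAINING, Role.ORGANISER): frozenset({"training_data", "test_data"}),
    # During evaluation the participant hands over the instance; the organiser runs it.
    (Phase.EVALUATION, Role.PARTICIPANT): frozenset(),
    (Phase.EVALUATION, Role.ORGANISER): frozenset(
        {"participant_instance", "test_data", "analysis_system"}),
}


def may_access(phase: Phase, role: Role, resource: str) -> bool:
    """Return True if the role may use the resource in the given phase."""
    return resource in ACCESS.get((phase, role), frozenset())


print(may_access(Phase.TRAINING, Role.PARTICIPANT, "test_data"))
```

A table like this keeps the policy in one place, so changing what participants may see in a phase means changing data rather than code.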

[Diagram: Annotators (radiologists) with locally installed annotation clients and an Annotation Management System, added to the cloud setup (Training Data, Test Data, Participant Instances, Registration System, Analysis System; Participants; Organiser).]
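This diagram adds radiologists who annotate through locally installed clients, coordinated by an annotation management system. The talk does not say how work is distributed. In this purely hypothetical sketch, the manager hands out volumes round-robin. Every `double_every`-th volume also goes to a second annotator so that inter-annotator agreement can be checked. The function name, parameter and volume/annotator labels are all illustrative.

```python
"""Hypothetical work distribution for an annotation management system."""


def assign_volumes(volumes: list[str], annotators: list[str],
                   double_every: int = 5) -> dict[str, list[str]]:
    """Round-robin assignment; every `double_every`-th volume is also given
    to the next annotator so agreement between annotators can be measured."""
    if len(annotators) < 2:
        raise ValueError("need at least two annotators for double annotation")
    if double_every < 1:
        raise ValueError("double_every must be at least 1")
    queues: dict[str, list[str]] = {a: [] for a in annotators}
    for i, volume in enumerate(volumes):
        first = annotators[i % len(annotators)]
        queues[first].append(volume)
        if (i + 1) % double_every == 0:
            # The next annotator in the rotation is always different from `first`.
            second = annotators[(i + 1) % len(annotators)]
            queues[second].append(volume)
    return queues


queues = assign_volumes(["v1", "v2", "v3", "v4", "v5", "v6"],
                        ["r1", "r2", "r3"], double_every=3)
print(queues)
```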

Future Development

- Dealing with private data
  - Does it make sense to evaluate on data that the participant cannot see?
  - Does it make sense to evaluate only on extracted features?
- Moving toward eScience
  - Data identifiers
  - Algorithm identifiers?
  - Continuous evaluation
  - Modular construction of the algorithms
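The slide lists data identifiers and asks whether algorithms should have identifiers too. One possible approach, assumed here rather than taken from the talk, is a content-derived identifier. Hashing the exact bytes of a dataset or an algorithm artefact ties a benchmark result to precisely what was evaluated. Below is a minimal sketch using Python's standard `hashlib`. The `kind:sha256:` format and the `content_id` name are hypothetical.

```python
"""Hypothetical content-derived identifiers for data and algorithms."""
import hashlib


def content_id(data: bytes, kind: str) -> str:
    """Return an identifier such as 'data:sha256:<hex>' for the given bytes."""
    return f"{kind}:sha256:{hashlib.sha256(data).hexdigest()}"


volume_id = content_id(b"example image volume bytes", "data")
algorithm_id = content_id(b"example algorithm container bytes", "algorithm")
print(volume_id)
print(algorithm_id)
```

The identifier changes whenever the content changes. A continuous-evaluation service could therefore key each stored result by a (data identifier, algorithm identifier) pair.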

Challenges

- Sharing components
- Who should provide the cloud service?
- Who pays for using it?
- Transferring components to industry
