Coping with the Long Tail of Data Variety (EDF 2014)


The talk discusses current challenges, approaches and future directions for coping with data variety. The discussion is grounded in exemplar use cases from leaders in industry and in large-scale scientific projects, including IBM Watson, CrowdFlower, BBC, Press Association, Protein Data Bank and ChemSpider, among others. The use cases were collected in interviews with Big Data industry and academic experts in the context of the BIG Project. They provide a glimpse of the state-of-the-art techniques currently used to cope with data variety, and of the future directions and emerging trends in this field.

Published in: Science

  1.
     • Data curation is enabling more complete and high-quality data-driven models for knowledge organisations.
     • eScience projects are the key innovators, while Biomedical and Media companies are the early adopters.
     • Pre-competitive economic models can support the creation of curation infrastructures.
     • Curation at scale requires blending of automated curation platforms with large numbers of data curators.
     • Improvement of human-data interaction is needed.
     • Standards and models needed to reduce data curation effort.
     • Interviews with domain experts, sector case studies and literature analysis.
     • Focus on [...], [...] and [...].
     • Five main categories of analysis: [...]
     • Figure: The long tail of data variety and data curation scalability.
     • Provide a [...] for the future of data curation.
     • Distributed data generation.
     • Data quality issues.
     • Increasing data variety and volume.
     • Data curation activities as a fundamental process for coping with the [...].

     Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 318062).