I’m Cynthia Parr, chief scientist of the Encyclopedia of Life. Today I’ll be talking about where we are after our first five years, and where we are going.

Abstract: Hot on the heels of today's genomics revolution is a coming phenomics revolution. Unfortunately, like most biodiversity information, phenotype data are currently scattered, disorganized, and inaccessible. To meet rising demand for access to species "traits" (the attributes or phenotypes of organisms), EOL is undergoing a major transformation. Building on its strengths as a digital resource of species information, EOL is working to add support for trait data that users can search, download, and crunch, analogous to what is possible with GenBank for DNA sequences. This innovative work supports the Museum's efforts to accelerate the rate of biodiversity discovery in the 21st century.
We have over a million pages with content, some of it even in other languages such as Arabic. These are some pretty big numbers, considering that we started from nothing five years ago. Here are some more big numbers.
We are in the midst of a genomics revolution. The cost to generate a full genome sequence is dropping more or less daily. What is all this genetic information DOING? How does it relate to what we can see and measure about organisms: their phenotypes, or their traits? How does DNA interact with the environment to produce both normal and abnormal development? How did it evolve? How fast do DNA changes make a difference in the lives of organisms?
Everyone wants to know the attributes of organisms. Look at the guy in yellow. He’s asking, “What IS that thing?” People exploring the world find something and want to be able to search on characteristics they can see. Teachers want their students to become better at analyzing data, and what better way than working with real numerical information about the size of organisms, their behavior, or their sensitivity to temperature and what might happen in the face of climate change? So while scientists were asking us to provide data they could analyze, we heard the same thing from our educators.
Here’s a problem: there are many different ways to describe something, and many formats, as shown here for the great frigatebird. They range from the picture that is worth a thousand words but can only be image-searched by Google, to handwritten notes (in this example from the Smithsonian’s Field Book Project), to a nice bit of readable text from EOL that doesn’t easily become a point on a chart, to these measurements, which are fine but are specific to some definition of length, mass, or wingspan.
In the next year and a half we are tackling this with funding from the Sloan Foundation. We are starting with marine data. In the most simplistic view, we’ll be storing triples (a scientific name, the name of an attribute, and a value), each of which can be linked so that the meaning, or semantics, of each part is clearly defined. Of course, we’ll also make sure each triple links back to a dataset and all the appropriate credits. This data will be organized on a data tab, perhaps sorted into the 35 or so “topics” that we currently have text chapters for, and we will also provide powerful downloading and searching capabilities. Finally, we’ll be setting up ways for other applications to grab the data and do interesting things with it. The approach builds on our innovations for EOL and brings a proven technology, the “semantic web,” to our domain.
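To make the triple idea concrete, here is a minimal sketch in Python of how a trait statement might carry its meaning and its provenance together. It assumes a hypothetical `Triple` record, a hypothetical `search` helper, and a made-up `http://example.org/terms/` namespace standing in for attribute definitions; none of these are EOL’s actual schema or API, and the records below are placeholders rather than real measurements.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical namespace: each attribute name resolves to a URI that defines its meaning.
TERMS = "http://example.org/terms/"


@dataclass(frozen=True)
class Triple:
    """<scientific name> <attribute> <value>, plus a link back to its source dataset."""
    subject: str                 # scientific name of the organism
    predicate: str               # URI defining the attribute (its semantics)
    value: Union[str, float]     # a measurement, or another scientific name
    unit: Optional[str] = None   # unit for numeric values, if any
    dataset: str = ""            # hypothetical dataset identifier, used for credit


def search(triples: List[Triple], predicate: Optional[str] = None,
           subject: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the given predicate and/or subject."""
    return [
        t for t in triples
        if (predicate is None or t.predicate == predicate)
        and (subject is None or t.subject == subject)
    ]


# Illustrative records only: the value, species, and dataset IDs are placeholders.
triples = [
    Triple("Fregata minor", TERMS + "wingspan", 2.0, unit="m", dataset="dataset-A"),
    Triple("Species A", TERMS + "preysOn", "Species B", dataset="dataset-B"),
]

for t in search(triples, predicate=TERMS + "preysOn"):
    print(t.subject, "preys on", t.value, "| source:", t.dataset)
```

Keeping the dataset identifier on every triple, rather than in a separate table, is one simple way to guarantee that credit travels with each statement when data are downloaded or searched.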
Scaling up means opening up to nearly any attribute for any organism. Key providers could include not only our current providers, some of whom have this data, but also new sources such as Wikidata, a new initiative of the Wikimedia Foundation, and Freebase, owned by Google; model organism databases such as those for mouse or zebrafish, used in biomedical research; or even long-term conservation-related datasets generated by projects like the Smithsonian’s SI-GEO and Marine GEO initiatives. TraitBank should promote new ways of visualizing the data (for example, how do these fruits stack up against each other?), and it should engage crowds both to find attributes buried in our text and images and to find patterns in the data we have. We have 63,000 registered members; let’s give them interesting things to do and find. Nothing like this has ever been attempted at this scale in biology: integrating databases, involving crowds, and handling some very, very complex data. How complex?
Phenoscape is a database of anatomical traits in fishes. From just 57 publications, they have more than 500K descriptions for 2,500 kinds of organisms. ZFIN is a model organism database for zebrafish, a common model organism for developmental biologists. For this one very well-studied SPECIES alone, they have captured nearly 40,000 traits.
We already have a professional working infrastructure as well as more than 200 partners. Everything is either public domain or CC-licensed, so that it can very easily be reused. We already harvest and sort text and multimedia by topic and by species and put it on our pages. Curation and user-added content from the crowds are added to the mix. This is fed back to providers, giving them traffic, quality control on their own content, and new content for them to use. And we are already seeing spinoff products like a field guide tool, various games using the content, and a mobile exhibit application for the Birds of DC exhibit. All of these processes will be essentially the same for trait data. Imagine the innovations that can come when developers have access to numerical, analyzable data about traits.
Strong libraries. Active researchers publishing in modern journals. Efforts like the Global Genome Initiative. But mainly because our scientists and collections represent more than a hundred and fifty years of deep experience describing biological diversity all over the planet, AND we have a deep commitment to both the increase AND the diffusion of that knowledge.
EOL was always a big dream, and we succeeded because we started simple: text and multimedia on a page. TraitBank is not only innovative on its own, but it will FUEL innovation through its ability to support big scientific questions that are hard to answer otherwise. What will be the impact of climate change on our biosphere? What can we learn about genes and development from this huge evolutionary experiment of 1.9 million evolving species? Every dollar invested in TraitBank has a multiplicative effect. The transistor led to the microchip, which led to the computer, which led to software, which led to the internet, which led to Amazon, which led to cloud services. TraitBank is a step in a similar chain of innovations, leading from EOL, and leading to someplace wonderful.
The Road to TraitBank: What's Next for the Encyclopedia of Life
eol.org
@eol
EOL Today
Key Milestones in 2013
1.1 million species pages
242 content providers
3 million unique visitors from 223 countries & territories
GenBank
60 million DNA sequence records
900,000 species
4,000 genomes
How are these related to traits?
First step: add traits to EOL
Funded: marine focus
<scientific name> <name of attribute> <value>
<scientific name> <preysOn> <scientific name>
Harvest and display on data tab
Downloads, fancy searching
Machine access
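One way the "machine access" item could work is by publishing each statement in a standard semantic-web exchange format. The sketch below writes the two triple patterns on the slide as W3C N-Triples lines. It is an illustration under stated assumptions: the `http://example.org/` base IRI, the `taxon/` and `terms/` paths, the `ntriple` helper, and the choice of `xsd:double` for numbers are all hypothetical, not TraitBank’s actual identifiers or output format.

```python
from urllib.parse import quote

# Hypothetical base IRI; real TraitBank identifiers may differ.
BASE = "http://example.org/"
XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"


def taxon_iri(scientific_name: str) -> str:
    """Turn a scientific name into an IRI (spaces become underscores)."""
    return f"<{BASE}taxon/{quote(scientific_name.replace(' ', '_'))}>"


def term_iri(attribute: str) -> str:
    """IRI for an attribute name such as 'preysOn'."""
    return f"<{BASE}terms/{quote(attribute)}>"


def ntriple(subject_name: str, attribute: str, value, value_is_taxon: bool = False) -> str:
    """Serialize one <scientific name> <attribute> <value> statement as an N-Triples line."""
    if value_is_taxon:
        # Pattern: <scientific name> <preysOn> <scientific name>
        obj = taxon_iri(value)
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        # Pattern: <scientific name> <name of attribute> <numeric value>
        obj = f'"{value!r}"^^{XSD_DOUBLE}'
    else:
        # Plain string literal; escape backslashes first, then quotes and line breaks.
        text = str(value)
        for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
            text = text.replace(raw, escaped)
        obj = f'"{text}"'
    return f"{taxon_iri(subject_name)} {term_iri(attribute)} {obj} ."


print(ntriple("Species A", "preysOn", "Species B", value_is_taxon=True))
print(ntriple("Fregata minor", "wingspan", 2.0))  # placeholder value, not a real measurement
```

N-Triples is chosen here only because it is line-oriented and easy to stream in bulk downloads; other RDF serializations would serve equally well.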
Next step: Create TraitBank
• Scale up
• Promote best practices
• Promote new tools & analysis
• Enable crowd-sourcing
Quick math
In Phenoscape: 57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms ≈ 224 traits/organism
In ZFIN: 38,189 trait descriptions for 4,727 genes for zebrafish
1.9 million species on the planet = LOTS OF TRAITS
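The slide’s arithmetic can be checked in a few lines of Python. The Phenoscape average works out to roughly 223.6 trait descriptions per organism. The last step is only a naive, hypothetical extrapolation that assumes every species is described as densely as the Phenoscape fishes, which the talk does not claim; it simply gives a sense of scale for "LOTS OF TRAITS."

```python
# Counts as given on the "Quick math" slide.
PHENOSCAPE_DESCRIPTIONS = 565_158
PHENOSCAPE_ORGANISMS = 2_527
ZFIN_DESCRIPTIONS = 38_189
ZFIN_GENES = 4_727
SPECIES_ON_PLANET = 1_900_000

traits_per_organism = PHENOSCAPE_DESCRIPTIONS / PHENOSCAPE_ORGANISMS
traits_per_gene = ZFIN_DESCRIPTIONS / ZFIN_GENES

# Naive, hypothetical extrapolation: assumes every species is described as densely
# as the Phenoscape fishes. Real coverage would differ widely between groups.
naive_total = traits_per_organism * SPECIES_ON_PLANET

print(f"Phenoscape: {traits_per_organism:.1f} trait descriptions per organism")
print(f"ZFIN: {traits_per_gene:.1f} trait descriptions per zebrafish gene")
print(f"Naive total: about {naive_total / 1e6:.0f} million trait descriptions")
```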
Why EOL + TraitBank
EOL
Crowds
Harvest
Third-party applications
Final words
Dream big
Start simple
Fuel ongoing innovation
Thanks
Funding & other contributions: Sloan Foundation, Smithsonian Institution, David Rubenstein, Marine Biological Laboratory, Harvard University, our content partners, thousands of individual contributors, and hundreds of volunteer curators
Images: Jenny from Taipei, University of Birmingham, JumpStart Youth, The Field Book Project, Animal Diversity Web, Kevin Rolle, Dominik Hofer
Cynthia Parr, Chief Scientist
@eol
@cydparr
firstname.lastname@example.org