3. An extraordinary company that captures, organizes
and adds value to the rich information available in
agricultural and biodiversity sciences, in order to
make it universally accessible, useful and meaningful.
http://www.agroknow.gr
4. Our way of doing things
We put our people at the
center
We have a culture of
shared, co-defined values
We are based on trust
and transparency
We see beyond profit by
serving our users and
customers so that they
create societal impact
5. We develop and put in real
practice solutions that transform
data into meaningful knowledge
and services
We help people
solve problems
informed by data
6. data aggregation & sharing solutions
• Aggregate data from diverse sources
• Works with different types of data
• Prepare data for meaningful services
[Diagram: Data Platform]
• input: unorganized content in local and remote sites
• platform stages: Ingestion, Translation, Publication, Enrichment (Harvesting, Blossom, Cultivation)
• output: organized and structured content in local and remote DBs (educational, bibliographic, other)
• services: Widgets, Authoring services, Data Discovery Services, Analytics services
7. working with high-profile partners & clients
• Food and Agriculture Organization (FAO) of the
United Nations
• World Bank Group
• UK’s Dept for International Development (DFID)
• Michigan State University (MSU)
• Wageningen University & Research (WUR)
• French Institute of Agricultural Research (INRA)
• Creative Commons
8. large scale data-related projects
• agINFRA: a data infrastructure to support agricultural scientific
communities (2011 - now)
– EU, $5.2M, 12 partners (incl. FAO); tech coordinator, evaluation, sustainability
– in G8 Open Data in Agriculture Action Plan for Europe
• SemaGrow: Data intensive techniques to boost the real-time
performance of global agricultural data infrastructures (2012 - now)
– EU, $3.1M, 8 partners (incl. FAO, WUR); tech coordinator, evaluation,
sustainability
– in G8 Open Data in Agriculture Action Plan for Europe
• Organic.Lingua: Demonstrating the potential of a multilingual Web
Portal for Sustainable Agricultural & Environmental Education (2011 - 2014)
– EU, $2.4M, 11 partners (incl. INRA); tech+data coordinator, evaluation
9. data interoperability work
• Agricultural Interoperability Interest Group
(IG) at Research Data Alliance (RDA)
• Database Subgroup, Knowledge & Learning
Systems Group, Global Food Safety
Partnership (GFSP)
11. “Knowledge is the
engine of our economy.
And data is its fuel”
Neelie Kroes, Vice President of
the European Commission
http://ec.europa.eu/digital-agenda/en/news/economic-and-social-benefits-big-data
12. “By improving our ability to
extract knowledge and insights
from large and complex
collections of digital data, the
initiative promises to help solve
some of the Nation’s most
pressing challenges.”
Big Data Research & Development
Initiative
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
13. policy
• USA’s National Research Council on Ensuring
the Integrity, Accessibility, and Stewardship
of Research Data in the Digital Age
–“researchers to make all research data,
methods, and other information underlying
results publicly accessible in a timely manner”
–“the stewardship of research data is a critical
long-term task for the research enterprise and
its stakeholders”
http://www.nap.edu/catalog.php?record_id=12615
14. internationally
• joint USA, EU, Australia, Research Data
Alliance (RDA) vision
–“researchers and innovators openly sharing
data across technologies, disciplines, and
countries to address the grand challenges of
society”
https://rd-alliance.org/about.html
15. CIARD’s manifesto
• “towards a Knowledge Commons on
Agricultural Research for Development”
• “agricultural knowledge is freely accessible
and contributes to reducing hunger and
poverty”
• “open knowledge makes it easier to provide
better solutions”
http://www.ciard.net/about/manifesto
16. GODAN’s statement of purpose
• “support global efforts to make agricultural and
nutritionally relevant data available, accessible, and
usable for unrestricted use worldwide”
• “advocate for the release and re-usability of data in
support of Innovation and Economic Growth,
Improved Service Delivery and Effective Governance,
and Improved Environmental and Social Outcomes”
http://godan.info/statement.html
17. IFPRI & open access
• “…research is an international public good, that
should be freely disseminated to the extent
possible…”
• “IFPRI is committed to the principle of free
access to the knowledge it generates”
18. CGIAR & open access
• “CGIAR regards the results of its research and
development activities as international public
goods and is committed to their widespread
dissemination and use to achieve the maximum
impact to advantage the poor…”
20. agricultural bibliography
• bibliography on agricultural sciences
• several efforts in putting together
(aggregating/indexing) metadata records on
agricultural publications & grey literature
• FAO’s AGRIS service: a prominent example
– quite advanced data ingestion workflow &
infrastructure
– semantic backbone with AGROVOC as LOD & triple
store with all aggregated records
– more than 7.5 million publications indexed & made
discoverable
21. elaborated, automated workflow
[Workflow diagram; labels recovered from the slide]
• components: Metadata harvester, Filtering component, Identification and de-duplication component, Transformation component (to AKIF), Link checking component, PostProcessing/Enrichment component, Indexing mechanism, API
• stores: File system (DC, IEEE LOM, MODS XML), MySQL (duplicates), metadata in JSON (internal format), File system (XMLs)
• other labels: Get unique ID; Records with broken links
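A minimal sketch of how the stages named in this workflow could chain together, assuming simplified in-memory records. The field names (`title`, `url`), the fingerprint rule and the output JSON layout are hypothetical; they do not reproduce the actual AKIF format or the production components. Link checking is reduced to a syntactic URL test so the example runs offline.

```python
import hashlib
import json
from urllib.parse import urlparse


def fingerprint(record):
    """Hypothetical unique ID: hash of the normalised title and URL."""
    title = record.get("title", "").strip().lower()
    url = record.get("url", "").strip().lower()
    return hashlib.sha1((title + "|" + url).encode("utf-8")).hexdigest()


def deduplicate(records):
    """Split records into unique ones and duplicates (first occurrence wins)."""
    seen, unique, duplicates = set(), [], []
    for record in records:
        uid = fingerprint(record)
        if uid in seen:
            duplicates.append(record)
        else:
            seen.add(uid)
            unique.append({**record, "uid": uid})
    return unique, duplicates


def has_wellformed_link(record):
    """Offline stand-in for link checking: only tests the URL syntax."""
    parsed = urlparse(record.get("url", ""))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def to_internal_json(record):
    """Serialise a record to an illustrative JSON layout (not real AKIF)."""
    document = {"id": record["uid"], "title": record["title"], "location": record["url"]}
    return json.dumps(document, sort_keys=True)


def run_pipeline(harvested):
    """Filter, de-duplicate, link-check and transform harvested records."""
    filtered = [r for r in harvested if r.get("title")]  # drop records without a title
    unique, duplicates = deduplicate(filtered)
    valid = [r for r in unique if has_wellformed_link(r)]
    broken = [r for r in unique if not has_wellformed_link(r)]
    return [to_internal_json(r) for r in valid], duplicates, broken


# Made-up records, for illustration only.
harvested = [
    {"title": "Soil health", "url": "http://example.org/1"},
    {"title": "soil health ", "url": "http://example.org/1"},
    {"title": "Crop rotation", "url": "not-a-link"},
    {"title": "", "url": "http://example.org/3"},
]
documents, duplicates, broken = run_pipeline(harvested)
print(len(documents), len(duplicates), len(broken))
```

Duplicates and records with broken links are returned separately rather than discarded, mirroring the separate stores shown on the slide.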
24. similar/relevant efforts
• PubAg: forthcoming service by National
Agricultural Library (NAL) for discovering USDA
publications – and beyond
• LGU community of ag knowledge: forthcoming
service federating institutional repositories of
Land Grant Universities
• CGIAR open: (to be) federating & providing access
to all CG center repositories
• …and more to come
25. but we are not there yet
a) each initiative replicating technical & data
processing effort (harvesting, transforming,
indexing…)
b) coverage is not complete – transferring the
discovery problem to the level of aggregators
c) still not focusing on the needs of each specific
subject, group, region, project, …
d) agriculture is multi-disciplinary: relevant
publications may be found in other domains
(health, economics, environment, … )
27. CSPI
• the organized voice of the American public on
nutrition, food safety, health and other issues
– “improve food safety laws and reduce the incidence of
foodborne illness”
• has tracked foodborne illness outbreaks since 1997
– events where two or more people become ill from
eating the same food
– outbreaks where both the food and pathogen can be
identified
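The two criteria above can be read as a simple inclusion rule. The sketch below is one possible reading, not CSPI's actual procedure; the `OutbreakReport` type, its field names and the sample reports are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutbreakReport:
    """Hypothetical minimal record of a reported event."""
    illnesses: int
    food: Optional[str]
    pathogen: Optional[str]


def is_tracked_outbreak(report):
    """Two or more illnesses, with both food and pathogen identified."""
    return report.illnesses >= 2 and bool(report.food) and bool(report.pathogen)


# Made-up reports, for illustration only.
reports = [
    OutbreakReport(illnesses=12, food="spinach", pathogen="E. coli O157:H7"),
    OutbreakReport(illnesses=1, food="eggs", pathogen="Salmonella"),
    OutbreakReport(illnesses=30, food=None, pathogen="Norovirus"),
]
tracked = [r for r in reports if is_tracked_outbreak(r)]
print(len(tracked))
```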
28. US Outbreak Alert Database (until 2011)
http://cspinet.org/foodsafety/outbreak/pathogen.php
29. US Outbreak Report (after 2011)
http://cspinet.org/foodsafety/outbreak_report.html
31. data sources of interest
• CDC - Foodborne Outbreak Online Database (FOOD)
– http://wwwn.cdc.gov/foodborneoutbreaks/
• ProMED mail
– http://www.promedmail.org
• Kansas FS-net
– blogging at http://barfblog.com
– posting news at http://bites.ksu.edu
– archive at http://www.safefoodhandler.com/fsnet.htm
• Project TYCHO
– https://www.tycho.pitt.edu
32. some of the challenges
a) time-consuming & laborious primary data
identification and documentation (by hand)
b) incomplete coverage: problematic data
collection and sharing
c) multiple & outdated databases for
secondary/processed data storage and
curation
d) time-consuming & expensive processed
data visualization & publication
33. improving curation of data
• focus on making data documentation,
storage, management easier
a) migrate the multiple existing databases into a
single data repository
b) improve data organization & classification
schemes (e.g. by pathogen, food, geographical
location, time reported, etc.)
c) improve data curation & filtering workflows
(document & store data once, feed multiple
sites/access points; US vs. international sites)
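To make points b) and c) concrete, here is a small sketch of a single repository that is documented once and then indexed by several classification facets and filtered per site. The record layout, facet names and sample entries are assumptions for illustration, not the actual CSPI schema or data.

```python
from collections import defaultdict

# Made-up entries standing in for the single data repository.
REPOSITORY = [
    {"id": "r1", "pathogen": "Salmonella", "food": "eggs", "country": "US", "year": 2009},
    {"id": "r2", "pathogen": "Salmonella", "food": "poultry", "country": "CA", "year": 2010},
    {"id": "r3", "pathogen": "Listeria", "food": "cheese", "country": "US", "year": 2010},
]


def build_index(records, facet):
    """Group record IDs by one classification facet (pathogen, food, ...)."""
    index = defaultdict(list)
    for record in records:
        index[record[facet]].append(record["id"])
    return dict(index)


def site_view(records, site):
    """Feed several access points from the same records (hypothetical site names)."""
    if site == "us":
        return [r for r in records if r["country"] == "US"]
    return list(records)  # the international site sees everything


by_pathogen = build_index(REPOSITORY, "pathogen")
by_year = build_index(REPOSITORY, "year")
us_ids = [r["id"] for r in site_view(REPOSITORY, "us")]
print(by_pathogen, by_year, us_ids)
```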
37. improving discovery & processing
• focus on foodborne illness outbreak reports &
product recalls
a) automate the report-processing workflow as much
as possible (feeding directly into CSPI
data repository)
b) extend coverage of data types (include food
product recalls)
c) extend coverage of data sources (include more
sites with outbreak reports & product recalls)
42. improving visualization & publication
• focus on making processed & validated data
accessible immediately online
a) automate, as much as possible, the workflows for
generating filtered reports (feed diagrams & tables for
CSPI publications, present directly online through CSPI
& SFI web sites)
b) offer opportunities for public to interact with data
online (play with parameters and generate new data
reports & visualizations)
c) share data openly for research, education and
awareness through CSPI & SFI web sites
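A minimal sketch of a parameterised report generator of the kind described in a) and b): the user picks a facet and a year range, and the same function produces the table that could feed a publication or a web page. The record fields, function names and sample entries are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up entries, for illustration only.
RECORDS = [
    {"pathogen": "Salmonella", "year": 2009},
    {"pathogen": "Salmonella", "year": 2010},
    {"pathogen": "Listeria", "year": 2010},
    {"pathogen": "Norovirus", "year": 2011},
]


def outbreaks_by(records, facet, year_from=None, year_to=None):
    """Count outbreaks per facet value within an optional year range."""
    counts = Counter()
    for record in records:
        if year_from is not None and record["year"] < year_from:
            continue
        if year_to is not None and record["year"] > year_to:
            continue
        counts[record[facet]] += 1
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_csv(rows, facet):
    """Render the report rows as CSV text for publication or download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([facet, "outbreaks"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = outbreaks_by(RECORDS, "pathogen", year_to=2010)
report = to_csv(rows, "pathogen")
print(report)
```

Changing the facet or the year bounds yields a new report without touching the stored data, which is the interaction point b) has in mind.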
48. let’s imagine that
• we have a very big, open, scalable platform
that…
– …will catalog all relevant information entities
– …will make all information machine readable and discoverable
– …will allow information providers to express how, with whom, under
which license and for which purposes they share this info
– …will help people utilize the collective power of information to
solve more societal challenges, better
– …will make funding & resource use transparent for donors and
the public
– …will coordinate, consolidate and harmonize data & technology
sharing among agri-food sectors and user communities
57. scale up, per federated info type
• Meta-registry platform federating all existing registries & making information
discoverable
• Federated data registry: registries of data sources
• Federated org registry: registries of organisations’ catalogs
• Federated solution registry: registries of software apps/components
• Federated information providers
• …etc
58. evolving technology further
[Architecture diagram, two panels: case of agricultural infrastructures]
• NOW (2012): OAI-PMH Service Providers #1 … #n (Schema #1 … #n); HARVESTER; Aggregated XML Repository (AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...); INDEXER; Web Portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...
• 2015 (AgINFRA): SPARQL endpoints (Data Source #1 … #n); RDF Triple Store with Common Schema; INDEXER; Web Portals; SPARQL endpoint
• How many? Is it feasible? Big Data Problem!
http://semagrow.eu
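As a rough illustration of the shift this slide describes, the sketch below builds a standard OAI-PMH `ListRecords` request URL and turns a Dublin Core record from a response into (subject, predicate, object) triples of the kind an RDF store would hold. The base URL and the embedded sample response are made up, and the mapping is an assumption for illustration; it is not the agINFRA or SemaGrow implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Soil health</dc:title>
          <dc:subject>soil</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the harvesting request for one OAI-PMH service provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def records_to_triples(xml_text):
    """Map each Dublin Core element of each record to one triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.findall(".//oai:record", NS):
        subject = record.findtext("oai:header/oai:identifier", namespaces=NS)
        for element in record.findall(".//oai_dc:dc/*", NS):
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = element.tag[1:].replace("}", "")
            triples.append((subject, predicate, element.text))
    return triples


url = list_records_url("http://example.org/oai")  # hypothetical endpoint
triples = records_to_triples(SAMPLE_RESPONSE)
print(url)
print(triples)
```

Once records from every provider are expressed against a common schema like this, they can be queried together rather than re-harvested and re-indexed per portal, which is where the "how many?" scaling question on the slide comes from.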