Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Amrapali Zaveri, Jens Lehmann, Katja Nowick
● Why study Evolution of Cognition?
● Research Questions
● Our Approach
○ Querying and Preliminary Results
● Conclusions & Future Work
Why study Evolution of Cognition?
● Cognition refers to a group of mental processes
that includes memory, attention, language
(production and understanding), reasoning,
learning, problem solving and decision making.
● Some aspects of cognition are human-specific.
It has been argued that human-specific
evolutionary innovations have made us smarter
on the one hand, but more vulnerable to
cognitive disorders on the other,
e.g. autism and Alzheimer's disease.
Why study Evolution of Cognition?
● Mental processes involved in cognition are not
controlled by a few individual genes but rather
by the function and interplay of several
hundreds, if not even thousands, of genes.
● The relevant information is scattered across disparate
databases or separate tables in publications.
● Querying across these databases is
○ time-consuming: the data come in different formats
○ highly inefficient: the work must be repeated whenever
any one of the datasets is updated or changed.
Research Questions
• Which genes have been found to be positively
selected in humans and have also been implicated
in cognitive diseases?
• Which genes have been associated with human
cognitive processes and show evidence of
evolutionary changes within primates?
• Which genes have been associated with cognitive
decline during ageing in humans? Do they show
differential expression patterns when compared with
other primates during development and ageing?
• Do genes involved in cognition and behaviour show
high diversity within humans and higher divergence
between humans and chimpanzees?
Image source: http://www.scientificamerican.com/article.cfm?id=what-makes-us-human
Our Approach
• Use the Linking Open Data (LOD) principles
• Identify and acquire the relevant data
• Convert the data to a single human- and
machine-readable format – RDF (Resource
Description Framework)
• Integrate and interlink the datasets
• Query the integrated datasets
● 11 datasets
● Genes –
● CSV, TSV
Transformed to RDF using:
● Each gene was given a unique identifier
based on its gene symbol, which was used to create
a URI (Uniform Resource Identifier)
○ each gene thus becomes a single, globally reusable resource.
● Common element across datasets: the gene symbol
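The URI-minting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the conversion tool used for the talk: the `example.org` namespaces, the `symbol` column name and the helper names `gene_uri`, `literal` and `rows_to_ntriples` are all hypothetical, and the sample row is a placeholder rather than study data. The point it shows is that deriving the URI from the gene symbol makes rows from different source tables describe the same resource.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces: the slides do not state the real ones.
GENE_BASE = "http://example.org/cog/gene/"
COG = "http://example.org/cog/"
GO = "http://example.org/go/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def gene_uri(symbol):
    """Mint one URI per gene from its symbol (trimmed and percent-encoded)."""
    return GENE_BASE + quote(symbol.strip(), safe="")


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_ntriples(text, delimiter, value_column, predicate):
    """Emit N-Triples lines for every row of a CSV/TSV table with a 'symbol' column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        subject = gene_uri(row["symbol"])
        lines.append(f"<{subject}> <{RDF_TYPE}> <{COG}gene> .")
        lines.append(f"<{subject}> <{GO}symbol> {literal(row['symbol'].strip())} .")
        lines.append(f"<{subject}> <{predicate}> {literal(row[value_column].strip())} .")
    return lines


# Placeholder table, not data from the study.
tsv = "symbol\tdnds\nGENE1\t0.5\n"
for line in rows_to_ntriples(tsv, "\t", "dnds", COG + "dnDs"):
    print(line)
```

Because the subject URI depends only on the gene symbol, a second table that mentions the same symbol produces triples about the same subject, which is what makes the gene symbol usable as the common element.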
• Integrated datasets available at
http://k41.bioinf.uni-leipzig.de:8890/sparql with the graph
• Research Question:
○ “Which genes are involved in determining cognition and have
changed during primate evolution?”
▪ Transcription Factors associated with Intelligence Disorder
○ Human Positive Selection Candidates
▪ dN/dS ratio: number of mutations that change the amino acid
sequence vs. number of mutations that do not
▪ The higher this ratio, the faster the protein is evolving.
▪ dN/dS ratio > 1: the gene is taken to evolve under positive selection
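As a small worked illustration of the dN/dS reading, the following Python sketch forms the ratio and applies the conventional thresholds. Only the "> 1 means positive selection" rule comes from the slide. The readings for values below and equal to 1 are the standard textbook interpretation, and the function names are hypothetical. The value 1.33 is the one reported in the preliminary results.

```python
def dn_ds_ratio(dn, ds):
    """Ratio of nonsynonymous (dN) to synonymous (dS) change; dS must be positive."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(ratio):
    """Conventional reading of dN/dS: >1 positive, <1 purifying, =1 neutral."""
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"


print(selection_regime(1.33))  # prints "positive selection"
```

In practice a ratio only slightly above 1 would normally be backed by a statistical test before a gene is called positively selected; the sketch covers the threshold rule only.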
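Querying and Preliminary Results
# The rdf:, cog: and go: prefixes must be declared before the query is run.
SELECT ?symbol1 ?dnbydns
WHERE {
  # genes that have a dN/dS value
  ?gene1 rdf:type cog:gene .
  ?gene1 go:symbol ?symbol1 .
  ?gene1 cog:dnDs ?dnbydns .
  # genes that have an nsid value
  ?gene2 rdf:type cog:gene .
  ?gene2 go:symbol ?symbol2 .
  ?gene2 cog:nsid ?ns .
  # join the two datasets on the gene symbol
  FILTER (?symbol1 = ?symbol2)
}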
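A query like the one above could be sent to the endpoint over the standard SPARQL Protocol. The Python sketch below uses only the standard library and makes several assumptions: the endpoint URL is the one given on the slide and may no longer be online, the query string passed in must carry its own PREFIX declarations (the slides do not list them), no graph name is set because the slide does not give one, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide; it may no longer be reachable.
ENDPOINT = "http://k41.bioinf.uni-leipzig.de:8890/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    rows = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the parsed rows (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(query_text)` would return rows such as `{"symbol1": ..., "dnbydns": ...}`, one per gene found in both datasets. Keeping `build_request` and `parse_bindings` separate from the network call lets both be checked offline.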
dN/dS = 1.33
• FMR2 has changed significantly
more during primate
evolution and might be under
positive selection in humans
• Patients with mutations in
FMR2 have been reported
to show mental retardation and
autistic behavior*.
* M. Bensaid, M. Melko, E.G. Bechara, L. Davidovic, A. Berretta, M.V. Catania, J. Gecz, E. Lalli and
B. Bardoni. FRAXE-associated mental retardation protein (FMR2) is an RNA-binding protein with high affinity for
G-quartet RNA forming structure. Nucleic Acids Research, 2009.
Conclusions & Future Work
● Preliminary work and ideas on publishing
biomedical data as Linked Data, demonstrating its
use in analyzing the evolution of cognition.
● Perform complex queries
● Answer more research questions
● Add more datasets
● Interlink with external datasets
● Create user interface