HFE & BCR-ABL
In Search of Links
© 2014, TopicQuests Foundation
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Jack Park
BigData Science Meetup
Fremont, CA: 17 May, 2014
Shyam Sarkar, Organizer
Target Benefits
• SolrSherlock will support:
– Hypothesis formation
– Research/Experiment planning
– Deep Question Answering
• Personal medical issues
• …
“Therefore psychologically we must keep all the theories in our heads, and
every theoretical physicist who is any good knows six or seven different
theoretical representations for exactly the same physics.”
―Richard Feynman
“Why, sometimes I've believed as many as six impossible things before
breakfast.”
―The Queen: Through The Looking Glass
What We Have Read: HFE
• Human hemochromatosis protein, also known
as the HFE protein, is a protein which in
humans is encoded by the HFE gene. The HFE
gene is located on the short arm of chromosome 6
at location 6p22.2*
– Some mutations which are associated with
Hereditary Hemochromatosis (a genetic
disease)**:
• C282Y
• H63D
*http://en.wikipedia.org/wiki/HFE_%28gene%29
**http://www.genome.gov/10001214
What We Have Read: BCR-ABL aka:
Philadelphia Chromosome
• Philadelphia chromosome or Philadelphia
translocation is a specific chromosomal
abnormality that is associated with chronic
myelogenous leukemia (CML). It is the result
of a reciprocal translocation between
chromosomes 9 and 22, and is specifically
designated t(9;22)(q34;q11)*
*http://en.wikipedia.org/wiki/Philadelphia_chromosome
Are HFE and BCR-ABL Linked?
• One document instance which suggests they
are linked:
– “We found that HFE C282Y might be associated
with a protective role against CMPD. Because
chronic iron deficiency or latent anemia may
trigger disease susceptibility for CMPD, HFE C282Y
positivity may be a genetic factor influencing this
effect.”*
• Note: this response is simply evidence of a link, a
signal; it leaves open many questions
CMPD: Chronic Myeloproliferative Disease
* http://www.ncbi.nlm.nih.gov/pubmed/19258483
Where do we go from here?
• We have read about some actors
• We seek evidence for relationships between
those actors
• We have one small piece of evidence
• We turn to Literature-based Discovery (LBD)
– Read and process many papers
– Assemble an evidence field
– Determine answers and confidence levels
Sensemaking In Biological Research
http://www.biomedcentral.com/content/pdf/1471-2105-15-117.pdf Figure 1
© 2014 Mirel and Görg; licensee BioMed Central Ltd (cc by)
Literature-based Discovery
• Swanson’s ABC Model
• Two Varieties of LBD
– Closed Discovery
– Open Discovery
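A minimal sketch of the two varieties, assuming the usual reading of Swanson's ABC model: A and C are not linked directly in the literature, but both are linked to some intermediate B. The function names and the placeholder link data (A, B1, B2, C1, C2) are hypothetical, and links are treated as undirected for simplicity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # one co-occurrence or assertion link found in the literature


def _neighbors(links: Iterable[Link]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from the links."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for left, right in links:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def closed_discovery(links: Iterable[Link], a: str, c: str) -> Set[str]:
    """Closed discovery: A and C are both given; find the B terms linking them."""
    graph = _neighbors(links)
    return (graph[a] & graph[c]) - {a, c}


def open_discovery(links: Iterable[Link], a: str) -> Dict[str, Set[str]]:
    """Open discovery: only A is given; find C terms reachable through some B
    that are not already linked to A directly."""
    graph = _neighbors(links)
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for b in graph[a]:
        for c in graph[b]:
            if c != a and c not in graph[a]:
                candidates[c].add(b)
    return dict(candidates)


# Placeholder data only; these are not findings from any paper.
links = [("A", "B1"), ("A", "B2"), ("B1", "C1"), ("B2", "C1"), ("B2", "C2")]
print(closed_discovery(links, "A", "C1"))
print(open_discovery(links, "A"))
```

In the HFE and BCR-ABL question, both actors are already named, so the search is of the closed kind.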
SolrSherlock Block Level
• Models
– Process Models
– Conceptual Graphs
– OpenBEL
• Identity
– Topic Map
• Topics
• Relations
• Associations
– Bayes
– DeepLearning
– HyperMembrane
• Interface
[Block diagram layers: Interface, Associations, Identity, Models, Data]
SolrSherlock’s HyperMembrane
• SolrSherlock Big Picture
– Documents to harvest
– Sentences to parse
• WordGrams from the sentences
– Lenses to interpret the sentences
» NTuples from the WordGrams
– Lenses to interpret whole documents
• HyperMembrane as a fabric woven from the
NTuples
– Organizes statements read from literature into a kind
of associative fabric, linked into a topic map
HyperMembrane Inspiration
http://xanadu.com/zigzag/ZZdnld/zzRefDef/
https://www.flickr.com/photos/portier/2927798222/sizes/s/
HyperMembrane Internal Structure
[Diagram labels: Graph Agent, Structure Agent, Sentence Agent, Document Agent, Query Agent, Information Fabric]
Sentence Parse
• Salient WordGrams in that sentence:
– C282Y
– might be associated with a
– protective role against
• Transforms to: protect against
– CMPD
We found that HFE C282Y might be associated with a protective role against CMPD
+-----------------MVp-----------------------------------+
| +---------Js------------+ |
+---Cet------+ | | +-------Ds---------+ |
+-Sp-+--TH--+ +--G-+--Ss--+--Ix---+---Pv-----+---MVp--+ | +----A---+ +--Js--+
| | | | | | | | | | | | | |
we found.p that.c HFE C282Y might.v be.v associated.v with a protective.a role.n against CMPD
Parse produced by a Java implementation of Link Grammar Parser
WordGram instances created while processing the sentence
WordGram Example
• Sentence:
– CO2 causes climate change
• WordGrams
– Terminals
• CO2
• causes
• climate
• change
– Pairs
• CO2 causes
• causes climate
• climate change
– Triples
• CO2 causes climate
• causes climate change
– Quads
• CO2 causes climate change
• Parsed Result—representation of the sentence:
– CO2 (terminal, noun)
– cause (terminal, verb, transformed causes → cause)
– climate change (pair, noun phrase)
• Resulting NTuple
– {CO2, cause, climate change}
• Where the names are replaced with topic locators from the topic map
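The enumeration above can be sketched as a sliding window over the words of the sentence. This is an illustration only, not SolrSherlock's code; the function name `word_grams` and the whitespace tokenization are assumptions.

```python
from typing import Dict, List


def word_grams(sentence: str, max_n: int = 4) -> Dict[int, List[str]]:
    """Return every contiguous run of 1..max_n words, keyed by run length.

    Key 1 holds the terminals, 2 the pairs, 3 the triples, 4 the quads.
    """
    words = sentence.split()  # assumption: whitespace tokenization is enough here
    grams: Dict[int, List[str]] = {}
    for n in range(1, min(max_n, len(words)) + 1):
        grams[n] = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


grams = word_grams("CO2 causes climate change")
for n, items in grams.items():
    print(n, items)
```

Only some of these WordGrams survive into the parsed result: here the terminals "CO2" and "causes" and the pair "climate change".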
These WordGram instances represent the sentence; they are wired into the fabric.
This NTuple participates in high-level structure formation and in question answering.
Lenses
• Simple Interpreters
– Based on Canonical Predicates
– Build structures from parsed sentences and
WordGrams
– Examples from biology
• Cause
• Bind
• Augment
• Prevent
• Increase
• Decrease
• Believe
Multiple Lenses
• Consider this sentence:
– We believe that A causes B
– Two Lenses in play
• Believe
• Cause
– Result is a nested NTuple
• {We, believe, {A, cause, B}}
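One way to picture the nested result is a small recursive data type whose subject or object may itself be an NTuple. The class and field names below are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NTuple:
    """A subject-predicate-object statement; either end may be another NTuple."""
    subject: Union[str, "NTuple"]
    predicate: str
    obj: Union[str, "NTuple"]

    def render(self) -> str:
        def show(part: Union[str, "NTuple"]) -> str:
            return part.render() if isinstance(part, NTuple) else part
        return "{" + ", ".join([show(self.subject), self.predicate, show(self.obj)]) + "}"


inner = NTuple("A", "cause", "B")        # built by the Cause lens
outer = NTuple("We", "believe", inner)   # built by the Believe lens around it
print(outer.render())  # {We, believe, {A, cause, B}}
```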
Canonical Predicate
• Results from transformations on predicates
– E.g.
• A causes B, A can cause B, A will cause B → A cause B
• A is caused by B → B cause A
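The transformations above can be sketched with two patterns: active forms keep the argument order, and the passive form swaps it. The regular expressions and the function name are assumptions for illustration; they cover only the surface forms listed on this slide and single-word arguments.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern table covering only the forms shown on the slide.
_PASSIVE = re.compile(r"^(?P<x>\S+) is caused by (?P<y>\S+)$")
_ACTIVE = re.compile(r"^(?P<x>\S+) (?:can cause|will cause|causes) (?P<y>\S+)$")


def canonicalize(statement: str) -> Optional[Tuple[str, str, str]]:
    """Map a surface statement to (subject, canonical predicate, object)."""
    match = _PASSIVE.match(statement)
    if match:
        # Passive voice: the grammatical subject is the thing being caused.
        return (match.group("y"), "cause", match.group("x"))
    match = _ACTIVE.match(statement)
    if match:
        return (match.group("x"), "cause", match.group("y"))
    return None  # no canonical predicate recognized


for text in ["A causes B", "A can cause B", "A will cause B", "A is caused by B"]:
    print(text, "->", canonicalize(text))
```

Note that collapsing "can cause" and "will cause" into "cause" discards modality; a fuller treatment might keep it as an attribute of the statement.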
Actors: Named Entities
• For any given named entity, there will be one and
only one WordGram
– Issue of Ambiguity
• Same name string can serve different topics in the topic map
– Topic map maintains identity for disambiguation
• Thus, a single WordGram might be associated with more
than one individual actor
• This means:
– Fibers (threads) flowing through the fabric must be
maintained in bundles according to their context
(topic)
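A minimal sketch of the bookkeeping this implies: one WordGram per name string, a set of topic locators for each WordGram, and one bundle of fibers per (WordGram, topic) pair. The class, its method names, and the assertion identifiers are hypothetical; the locators JP101 and JP102 are borrowed from the Fabric Example slide, and which locator goes with which assertion is assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ActorIndex:
    """One WordGram per name string; fibers are bundled by topic."""

    def __init__(self) -> None:
        self._wordgrams: Dict[str, int] = {}                  # name -> WordGram id
        self._topics: Dict[int, Set[str]] = defaultdict(set)  # WordGram id -> topic locators
        self._bundles: Dict[Tuple[int, str], List[str]] = defaultdict(list)

    def wordgram_for(self, name: str) -> int:
        """Return the single WordGram id for this name, creating it if needed."""
        if name not in self._wordgrams:
            self._wordgrams[name] = len(self._wordgrams)
        return self._wordgrams[name]

    def add_fiber(self, name: str, topic_locator: str, assertion_id: str) -> int:
        """Record that an assertion mentions this name in the given topic context."""
        wordgram = self.wordgram_for(name)
        self._topics[wordgram].add(topic_locator)
        self._bundles[(wordgram, topic_locator)].append(assertion_id)
        return wordgram

    def topics(self, name: str) -> Set[str]:
        return set(self._topics[self.wordgram_for(name)])

    def bundle(self, name: str, topic_locator: str) -> List[str]:
        return list(self._bundles[(self.wordgram_for(name), topic_locator)])


index = ActorIndex()
first = index.add_fiber("Jack Park", "JP101", "assertion-1")   # hypothetical ids
second = index.add_fiber("Jack Park", "JP102", "assertion-2")
print(first == second)              # same WordGram for the same name string
print(sorted(index.topics("Jack Park")))
print(index.bundle("Jack Park", "JP101"))
```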
Lens Selection and Action
• The Lens:
– ProtectAgainst
• Selected by the WordGram for “protect against”
– Is a transformation of the WordGram for “protective role
against”
• Lens Action:
– Create an NTuple
• {C282Y, protect against, CMPD}
• We will call that NTuple an Assertion
We found that HFE C282Y might be associated with a protective role against CMPD
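Lens selection can be sketched as two lookups: the surface WordGram is first transformed to its canonical form, and the canonical form then selects a lens that builds the Assertion. The table and function names are hypothetical; lenses are described elsewhere in this deck as hardwired, so a fixed dictionary is used here.

```python
from typing import Callable, Dict, Optional, Tuple

Assertion = Tuple[str, str, str]

# Surface WordGram -> canonical predicate WordGram (only the case from this slide).
TRANSFORMS: Dict[str, str] = {"protective role against": "protect against"}


def protect_against(subject: str, obj: str) -> Assertion:
    """The ProtectAgainst lens: build the Assertion NTuple."""
    return (subject, "protect against", obj)


# Canonical predicate WordGram -> lens.
LENSES: Dict[str, Callable[[str, str], Assertion]] = {
    "protect against": protect_against,
}


def apply_lens(subject: str, predicate_phrase: str, obj: str) -> Optional[Assertion]:
    """Canonicalize the predicate phrase, select a lens, and let it act."""
    canonical = TRANSFORMS.get(predicate_phrase, predicate_phrase)
    lens = LENSES.get(canonical)
    return lens(subject, obj) if lens else None


assertion = apply_lens("C282Y", "protective role against", "CMPD")
print(assertion)  # ('C282Y', 'protect against', 'CMPD')
```

This sketch drops the hedge "might be associated with"; how that uncertainty should be carried into the Assertion is left open.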
Weaving an Information Fabric
• Background:
– One and only one
WordGram for each
Actor (named entity)
– One and only one
WordGram for each
canonical Predicate
– One and only one
NTuple for each
Assertion
• WordGrams which form
an NTuple are strung
together as beads on a
string in the fabric.
– Thus, it is the detection
of NTuple structures
(Assertions) which forms
the HyperMembrane’s
fabric.
Note: it is next to impossible to diagram the fabric, but it
will likely look like a very tangled, knotted structure.
https://www.flickr.com/photos/fermicat/273539481/in/set-72157601620157588/
Fabric Example
• Two NTuples
– {Jack Park, AuthoredBook, The Wind Power Book}
– {Jack Park, AuthoredBook, Ohio State University
Football Vault}
[Diagram labels: JP101, JP102, Jack Park, AuthoredBook, Book101, Book102, Wind Power Book, OSU Football…]
Topic Map organizes fiber bundles
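A rough sketch of how the two NTuples above could be woven, assuming the "one and only one" rules are implemented by interning: each distinct string gets a single WordGram id, each distinct sequence of WordGrams gets a single NTuple id, and every WordGram keeps the list of NTuples strung through it. The class and method names are hypothetical, and topic-level disambiguation of "Jack Park" is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Fabric:
    """Interns WordGrams and NTuples and records which NTuples pass through each WordGram."""

    def __init__(self) -> None:
        self._wordgram_ids: Dict[str, int] = {}
        self._ntuple_ids: Dict[Tuple[int, ...], int] = {}
        self._threads: Dict[int, List[int]] = defaultdict(list)

    def wordgram(self, text: str) -> int:
        """One and only one WordGram id per string."""
        if text not in self._wordgram_ids:
            self._wordgram_ids[text] = len(self._wordgram_ids)
        return self._wordgram_ids[text]

    def weave(self, *parts: str) -> int:
        """One and only one NTuple id per sequence of WordGrams (beads on a string)."""
        beads = tuple(self.wordgram(part) for part in parts)
        if beads in self._ntuple_ids:
            return self._ntuple_ids[beads]
        ntuple_id = len(self._ntuple_ids)
        self._ntuple_ids[beads] = ntuple_id
        for bead in dict.fromkeys(beads):  # skip repeats of the same WordGram
            self._threads[bead].append(ntuple_id)
        return ntuple_id

    def ntuples_through(self, text: str) -> List[int]:
        """All NTuples strung through the WordGram for this string."""
        return list(self._threads[self.wordgram(text)])


fabric = Fabric()
wind = fabric.weave("Jack Park", "AuthoredBook", "The Wind Power Book")
vault = fabric.weave("Jack Park", "AuthoredBook", "Ohio State University Football Vault")
again = fabric.weave("Jack Park", "AuthoredBook", "The Wind Power Book")
print(wind, vault, again)
print(fabric.ntuples_through("Jack Park"))
print(fabric.ntuples_through("The Wind Power Book"))
```

Both NTuples run through the shared "Jack Park" and "AuthoredBook" WordGrams, which is where the topic map's fiber bundles are needed to keep distinct actors apart.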
Looking Forward
• Lenses, today, are hardwired
– Opportunity for adaptive learning of new lenses
• Fabric, today, is simple
– Opportunity to use cardinalities, frequency counts
in the fabric for:
• Probabilistic modeling
• Topological studies
• Opportunity for a Domain-Specific Language
(DSL) to emerge
Completed Representation
[Diagram labels: "antioxidants kill free radicals"; Contraindicates; "macrophages use free radicals to kill bacteria"; Bacterial Infection; Antioxidants; Because; Appropriate For; Compromised Host]
Let us co-create Cognitive Agents for Discovery
jackpark@topicquests.org
Thanks to Mei Lin Fung, David Alexander Price, and Patrick Durusau for
valuable comments
SolrSherlock at:
http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock

SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery
