3. Vision
Making it easy and fast
to find relevant knowledge
and discover new patterns
Automated. Because scientific language is constantly growing, evolving, and accelerating.
Omniscient. Because important findings may not be apparent. Even to the author.
Unbiased. Because existing solutions rank by popularity and cause filter bubbles.
5. A request to find content on "inflammation":
"pathogens" OR "damaged cells" OR "irritants" OR "necrotic cells" OR
"inflammatory" OR "inflammation" OR "hay fever" OR "periodontitis"
OR "atherosclerosis" OR "rheumatoid arthritis" OR "gallbladder
carcinoma" OR "leukocytes" OR "granulocytes" OR "urethritis" OR "type
III hypersensitivity" OR "ischaemia" OR "parasitosis" OR "eosinophilia"
OR "Appendicitis" OR "Bursitis" OR "Colitis" OR "Cystitis" OR
"Dermatitis" OR "Phlebitis" OR "Rhinitis" OR "Tendonitis" OR "Tonsillitis"
OR "Vasculitis"
Searching is messy!
6. Key Challenges
Our Knowledge does Not Compute
▪ The world moves too fast for data curators and ontology writers
▪ Most Scientific Disciplines have no ontologies (or even controlled vocabularies)
▪ Dictionaries and Reference Works are too small and often out-of-date
▪ New discoveries have no official names
People are too creative
▪ There is a lot of variation in language
▪ Researchers often add descriptive detail that obscures facts
▪ There is no “right way” to describe most things
Some things seem obvious …but mostly to the author
▪ The right Level-of-Detail depends both on the context and the reader
▪ The most obvious facts are often omitted because they are implicitly included
▪ Editors think in themes and topics, researchers in methods, properties, and facts
7. Core Technology
Finds key phrases in any text
and uses Machine Learning
to identify novel ideas
Open Languages, libraries, and frameworks
Apache UIMA, Apache Ruta, Stanford NLP tools, DKPro, Hadoop, Spark, TensorFlow,
Mahout, Vowpal Wabbit, Gensim, LevelDB, Elasticsearch, Docker, CloudSigma, AWS
8. Full Text Search
Pseudohyponatremia: Does It Matter in Current Clinical Practice?
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3894530/
doi: 10.5049/EBP.2006.4.2.77
Serum consists of water (93% of serum volume) and nonaqueous components, mainly lipids and proteins (7% of serum
volume). Sodium is restricted to serum water. In states of hyperproteinemia or hyperlipidemia, there is an increased
mass of the nonaqueous components of serum and a concomitant decrease in the proportion of serum composed of
water. Thus, pseudohyponatremia results because the flame photometry method measures sodium concentration in
whole plasma. A sodium-selective electrode gives the true, physiologically pertinent sodium concentration because it
measures sodium activity in serum water. Whereas the serum sample is diluted in indirect potentiometry, the sample is
not diluted in direct potentiometry. Because only direct reading gives an accurate concentration, we suspect that
indirect potentiometry which many hospital laboratories are now using may mislead us to confusion in interpreting the
serum sodium data. However, it seems that indirect potentiometry very rarely gives us discernibly low serum sodium
levels in cases with hyperproteinemia and hyperlipidemia. As long as small margins of errors are kept in mind of
clinicians when serum sodium is measured from the patients with hyperproteinemia or hyperlipidemia, the present
methods for measuring sodium concentration in serum by indirect sodium-selective electrode potentiometry could be
maintained in the clinical practice.
9. Using Dictionaries and Ontologies
Key: Chemical Technique Anatomy Disease Species
10. UNSILO Concept Extraction
Key: Chemical Technique Anatomy Disease Species
11. UNSILO Semantic Mapping
Key: Action/Relation Chemical Technique Anatomy Disease Species
12. ■ Natural Language Processing
Sentences are annotated with part-of-speech tags (noun, verb, adjective) and a dependency tree
methods for measuring sodium concentration in serum by indirect sodium-selective electrode potentiometry
[··thing··] [··action··] [···········thing··········] [·thing·] [····························· thing ······························]
■ Extract all “things”
Method
Sodium concentration
Serum
Indirect Sodium-Selective Electrode Potentiometry
Phrase Extraction
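The "extract all things" step can be sketched as a small noun-phrase chunker. This is a minimal illustration, not UNSILO's implementation: the tags are hand-assigned for the example phrase (a real pipeline would take them from a part-of-speech tagger), and the rule "a run of adjectives and nouns that ends in a noun" is an assumption chosen for simplicity.

```python
# Hand-assigned coarse tags for the example phrase (an assumption; a real
# pipeline would obtain these from a part-of-speech tagger).
TAGGED = [
    ("methods", "NOUN"), ("for", "ADP"), ("measuring", "VERB"),
    ("sodium", "NOUN"), ("concentration", "NOUN"), ("in", "ADP"),
    ("serum", "NOUN"), ("by", "ADP"), ("indirect", "ADJ"),
    ("sodium-selective", "ADJ"), ("electrode", "NOUN"),
    ("potentiometry", "NOUN"),
]


def extract_things(tagged):
    """Collect maximal runs of adjectives and nouns that end in a noun."""
    phrases, run = [], []
    # The sentinel token flushes the final run.
    for word, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


for phrase in extract_things(TAGGED):
    print(phrase)
```

On the example phrase this yields the same four "things" listed above, in lower case and without lemmatizing "methods" to "method"; that step belongs to normalization.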
13. ■ Reduce Morphological and Syntactic variation (Grammar, form)
■ Normalize adjectival modifiers, compound paraphrases, and expand coordinations
Concentration of Sodium >> Sodium Concentration
The Electrode Potentiometry was indirect >> Indirect Electrode Potentiometry
Methodology >> Method
■ Reduce Lexical and Semantic variation (Synonyms, hypernyms, ontologies)
■ Normalize semantic Level-of-Detail using ontologies and vector models
Serum Sample >> Blood Sample
Sodium Concentration >> Natrium Concentration
Indirect Electrode Potentiometry >> Electroanalysis
■ Remove rare super-grams and hyponyms (C-value filtering, distribution metrics)
■ E.g. “Clinically validated indirect sodium-selective potentiometry”
■ Snap to common fragments and forms (actual usage and Ontologies)
■ Indirect Sodium Selective Potentiometry is-a-kind-of
Indirect Potentiometry is-a-kind-of
Electroanalysis
Boundary detection and Normalization
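A few of the normalization steps above can be sketched with plain rules. This is a hedged toy example: the lemma table and the is-a-kind-of table are invented for illustration from the phrases on this slide, and the single "X of Y" rewrite stands in for a much richer set of syntactic rules.

```python
import re

# Toy resources, invented for illustration only.
LEMMAS = {"methodology": "method", "methods": "method"}
IS_A = {
    "indirect sodium selective potentiometry": "indirect potentiometry",
    "indirect potentiometry": "electroanalysis",
}


def normalize(phrase):
    """Lower-case, rewrite 'X of Y' as 'Y X', and lemmatize known words."""
    text = phrase.lower().strip()
    match = re.fullmatch(r"(.+?) of (.+)", text)
    if match:
        text = f"{match.group(2)} {match.group(1)}"
    return " ".join(LEMMAS.get(word, word) for word in text.split())


def ancestors(concept):
    """Follow is-a-kind-of links upward until no parent is known."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


print(normalize("Concentration of Sodium"))
print(normalize("Methodology"))
print(" > ".join(ancestors("indirect sodium selective potentiometry")))
```

Walking the chain upward is one way to "snap" a rare, over-specific phrase to a more common fragment; which level to stop at would depend on usage statistics that this sketch does not model.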
14. ● We build high-dimensional vector-space representations of all concepts from the textual context
Word Embeddings and Word2Vec
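The idea of representing a term by its textual context can be illustrated with simple co-occurrence counts. Note that this is a count-based stand-in, not Word2Vec: it skips the learned, dense embeddings entirely. The three sentences and the window size are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for every token, the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented mini-corpus for illustration.
SENTENCES = [
    "serum contains sodium and water",
    "plasma contains sodium and water",
    "electrode measures sodium activity",
]
vecs = context_vectors(SENTENCES)
print(cosine(vecs["serum"], vecs["plasma"]))
print(cosine(vecs["serum"], vecs["electrode"]))
```

Because "serum" and "plasma" appear in identical contexts here, they score higher against each other than "serum" does against "electrode", which shares only part of its context.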
15. Vasodilatation (finding)
Peripheral vasodilation (finding)
Vasodilator (substance)
Poisoning by vasodilator (disorder)
Vasodilating agent (product)
Intra-cavernosal vasodilator (product)
Intra-arterial vasodilator (product)
Coronary vasodilator (product)
Alpha blocking vasodilator (product)
Nitrate-based vasodilating agent (product)
Human B-type natriuretic peptide (product)
Endothelin receptor antagonist (product)
Pentaerythritol tetranitrate (product)
Nitroglycerin (product)
Isosorbide mononitrate (product)
Isosorbide dinitrate (product)
Measurement of blood pressure (procedure)
Self-measurement devices (product)
Systolic arterial pressure (observable entity)
Non-invasive arterial pressure (observable entity)
Blood pressure finding (finding)
Blood pressure cuff, device (physical object)
Blood pressure cuff inflator (physical object)
Lying blood pressure (observable entity)
Abnormal blood pressure (finding)
Lower tourniquet cuff inflation (procedure)
Cuff inflated (attribute)
principle.n.01
generalization
basic truth
assumption
law
receptor.n.01
Plasma membrane molecule
G protein-coupled receptor
ligand-gated ion channel
P2X receptor
P2Y receptor
● We build high-dimensional vector-space representations of all concepts from the textual context
● We apply ontologies and dictionaries to improve occurrence counts of rare, complex, or novel concepts
● We use these normalized concepts to improve recall and precision for rare, complex, or novel concepts
● We use this high-dimensional vector model to build real-time semantic indexes with unprecedented precision
Ontology Augmented Vector-space
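How a dictionary can improve occurrence counts for rare variants can be sketched as pooling: each surface form is mapped to a preferred label before counting. The table below is hypothetical; its entries echo terms from the ontology excerpts above, but the mapping itself is invented for illustration and is not taken from SNOMED or WordNet.

```python
from collections import Counter

# Hypothetical synonym table mapping surface forms to a preferred label.
PREFERRED = {
    "vasodilatation": "vasodilation",
    "peripheral vasodilation": "vasodilation",
    "vasodilating agent": "vasodilator",
}


def pooled_counts(mentions):
    """Count mentions after mapping each one to its preferred label."""
    return Counter(PREFERRED.get(m.lower(), m.lower()) for m in mentions)


mentions = [
    "Vasodilatation", "vasodilation", "Peripheral vasodilation",
    "Vasodilating agent", "vasodilator",
]
for label, count in pooled_counts(mentions).items():
    print(label, count)
```

Five mentions that would each count once on their own collapse into two concepts with enough evidence to estimate a context vector from.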
17. Human-readable Fingerprints
We have built a Corpus-based Recommender
that uses our novel and flexible approach to
document fingerprinting and similarity
▪ Traditional Document Similarity
▪ Document vectors based on TF-IDF and Naïve BOW
▪ Slow-moving ontologies (SNOMED, DOID, DrON)
▪ Simple concepts (“insulin” and “obesity”)
▪ Limited recognition (only lemmatization/stemming)
▪ UNSILO
▪ Dynamic corpus-driven concept similarity
▪ Captures novel significant phrases (“insulin insensitivity”)
▪ Links concepts across terminology variations (“reduced hormone response”)
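The contrast between word-level and concept-level similarity can be illustrated with a toy comparison. Everything here is an assumption for the sake of the sketch: the concept table is hypothetical, phrases are matched by plain substring search, and Jaccard overlap stands in for whatever similarity measure a production recommender would use.

```python
# Hypothetical concept table: surface phrase -> shared concept label.
CONCEPTS = {
    "insulin insensitivity": "insulin resistance",
    "reduced hormone response": "insulin resistance",
    "obesity": "obesity",
    "overweight": "obesity",
}


def word_set(text):
    """Naive bag-of-words view: the set of lower-cased tokens."""
    return set(text.lower().split())


def fingerprint(text):
    """Return the set of concept labels whose phrases occur in the text."""
    lowered = text.lower()
    return {label for phrase, label in CONCEPTS.items() if phrase in lowered}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = "insulin insensitivity in obesity"
doc_b = "reduced hormone response among overweight patients"

print("word overlap:   ", jaccard(word_set(doc_a), word_set(doc_b)))
print("concept overlap:", jaccard(fingerprint(doc_a), fingerprint(doc_b)))
```

The two example texts share no words, so the bag-of-words overlap is zero, while their concept fingerprints coincide. The fingerprint is also human-readable: it is simply a list of concept labels.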
19. Springer.com
“Using UNSILO’s fully automated content
enrichment technology, we can identify the
most descriptive concepts and phrases
within any document in our content
portfolio, and provide more valuable reading
suggestions, even across domains with a
highly variable terminology.”
Jan-Erik de Boer
Chief Information Officer
Springer Nature
“Our goal with this new feature is to make it
easy for our users to drill down on what they
find important in an article, and use that
insight as a departure point for their
discovery process.”
Stephen Cornelius
Product Owner
IT Platform Development
Springer Nature
UNSILO technology vendor for Springer Nature
9M scientific articles and book chapters
22M monthly users
Significant increase in traffic and user engagement
Displaced leading competitor
24. ■ Normalize Actions and Relationships
Sample linguistic variations of common relationships from re-statements of known facts,
then apply what we learn to less well understood domains:
■ Serum consists of water
■ Serum amounts to 93% water
■ Serum contains water
■ Serum is composed of water
■ Serum is mostly water
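Normalizing such restatements to a single relation can be sketched with one pattern. This is a minimal illustration under stated assumptions: the verb phrases are copied from the five examples above, the relation label "has_component" is a name invented for this sketch, and only single-word subjects and objects are handled.

```python
import re

# Verb phrases taken from the restatements above; "has_component" is a
# label invented for this sketch.
PATTERN = re.compile(
    r"^(?P<whole>\w+) (?:consists of|contains|is composed of|is mostly|"
    r"amounts to) (?:\d+% )?(?P<part>\w+)$",
    re.IGNORECASE,
)


def normalize_relation(sentence):
    """Map a restatement to a (subject, relation, object) triple, or None."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if not match:
        return None
    return (match["whole"].lower(), "has_component", match["part"].lower())


RESTATEMENTS = [
    "Serum consists of water",
    "Serum amounts to 93% water",
    "Serum contains water",
    "Serum is composed of water",
    "Serum is mostly water",
]
for sentence in RESTATEMENTS:
    print(sentence, "->", normalize_relation(sentence))
```

All five variants map to the same triple. In practice the set of verb phrases would be learned from many restatements of known facts rather than listed by hand.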
■ Providing hooks into Unstructured Text
Improve training and prediction capabilities of larger AI initiatives by improving access to
consumer feedback, corporate data lakes, or conversations within large communities of practice.
■ Reasoning at Scale
Answer questions, uncover hidden causal chains, invalidate futile research projects
■ Augment researchers’ cognitive abilities
■ Accelerate the pace of Research
■ Improve the return on R&D investments
■ Helping 10M Researchers across the globe “Finding the Cure for Cancer”
Future Directions
■ Thin film Coated Gold Nano Particles
■ Coating of Iron nano-particles with thin Gold film
■ Fe Nanoparticles thin-film Gold coat
■ Evaporation-coating of nanoparticles with gold
■ Gold-coated magnetic nanoparticles