SlideShare a Scribd company logo
GraphAware®
SIGNALS FROM OUTER
SPACE
Vlasta Kůs, Data Scientist @ GraphAware
graphaware.com
@graph_aware, @VlastaKus
How NASA Benefits from Graph-Powered NLP
‣ Database of learned knowledge across NASA’s programs & projects
‣ Unstructured text with basic metadata
‣ Collected since late 1950s (100s of millions of documents)
‣ Public dataset: ~1600 documents
NASA’s Lessons Learned
GraphAware®
"1406",420,"Roberts, J “,
"VO'75 Pressure Regulator Leakage and Work-Around
Procedures (~1976)”,
"The pressure regulator in the Viking Orbiter Propulsion
Subsystem started leaking following a pyro firing that
occurred prior to the near-Mars TCM. Likely causes were
corrosion or residue from propellant migration or pyro
valve blowby, or particulate contamination. Recommendations
included using separate regulators for the fuel and
oxidizer sides, incorporating a bellows in the pyro valve
to eliminate blowby, and adding a isolation valve between
the regulator and propellant tank.“,
" The micro-scale effects of long-term propellant exposure
should be investigated in order to better critique
regulator design. “,
"JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https://
nen.nasa.gov/web/11/viewall/-/viewall/420"
NASA’s Lessons Learned Database
GraphAware®
“673",1326,"Relvini, Kristine “,
"Lessons Learned Not Being Inputted Into Lessons
Learned Information System (LLIS) Database”,
“",
"If you don't document the lessons learned, you loose
knowledgeable, shared information and tracking capacity
across programs.“,
"KSC",2002-10-11,"Aeronautics Research, Science,
Exploration Systems, Space Operations, ",FALSE,"",
702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/
viewall/1326"
NASA’s Lessons Learned
GraphAware®
Graph database = isolated data silos -> connected knowledge
‣ Efficient search
‣ Relationships among various areas
Apollo, Space Shuttle, Orion, …
‣ Pattern recognition (clusters, communities, correlations, …)
Example: correlation between corrosion of valves & topics involving batteries
‣ Useful for planning future projects and preventing/solving issues
NASA’s Lessons Learned
GraphAware®
What is a Graph?
GraphAware®
G = (V, E)
WHY NEO4J?
GraphAware®
It is a proper graph database
It is a proper database
Graph-Based Architecture: Knowledge Graph
GraphAware®
EXAMPLE
GraphAware®
‣ NLP = machine learning tools allowing computers to process - and
perhaps understand - human languages
‣ Basic steps
Sentence segmentation
Tokenisation
Lemmatisation
Part of Speech (POS) tagging
Parsing
Named Entities Recognition (NER)
Sentiment analysis
…
Natural Language Processing
GraphAware®
Currently supported toolkits for human language processing
‣ Stanford CoreNLP
‣ developed at Stanford University
‣ fast, robust, production ready
‣ many pre-built models
‣ license: GPL v3+
‣ Apache OpenNLP
‣ developed by volunteers
‣ many pre-built models
‣ license: Apache License v2.0
NLP: Text Processors
GraphAware®
‣ Named Entity Recognition (NER) = classification of words into predefined
classes
‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country …
‣ Stanford NLP default entities: Person, Location, Date, Organisation,
Number, Money, Percentage
‣ Custom NE classes -> training on large tokenised & labeled corpus
‣ Wikipedia, Wikidata - rich sources of multilingual training data that can
be extracted automatically
Named Entity Recognition
GraphAware®
Custom Named Entities based on Wikipedia
GraphAware®
NASA use case: identify names of space missions
Training - crawling Wikipedia & identifying relevant information
EXAMPLE
GraphAware®
Universal Dependencies: cross-linguistically consistent grammatical relations
among words in a sentence
Examples:
‣ amod (adjectival modifier)
Matt likes red wine.
‣ appos (appositional modifier)
Mars Global Surveyor (MGS) was an American robotic spacecraft …
‣ conj (conjunct)
It failed to respond to messages and commands.
‣ …
Universal Dependencies
GraphAware®
‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence
Source: http://nlp.stanford.edu:8080/corenlp/process
Either find an efficient representation in some traditional database, or …
Graph-Powered NLP
GraphAware®
Graph-Powered NLP
GraphAware®
NLP and property graphs: natural fit
… use a property graph!
EXAMPLE
GraphAware®
Unsupervised techniques tend to be underestimated, while …
‣ No need for time & money to get massive labeled training datasets
‣ Often faster to train & faster to predict
‣ Unsupervised deep learning
Unsupervised ML Algorithms
GraphAware®
PageRank
GraphAware®
PageRank = a measure of importance of a web page based on the quality
of links from other pages
The formula reflects a model of a random surfer.
Source: https://en.wikipedia.org/wiki/PageRank
Keyword Extraction: TextRank
GraphAware®
Keywords = words/phrases that capture the semantic essence of a text
Graph-Based Unsupervised Algorithm:
‣ Construct a graph of word co-occurrences
‣ Asses the importance of words by PageRank algorithm
‣ Use top 1/3 of words as keyword candidates
‣ Use universal dependencies to construct key phrases
GraphAware®
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona,
Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252.
Keyword Extraction: TextRank
Despite its simplicity, TextRank provides
state of the art results on wide range of
unstructured texts.
Leveraging universal dependencies allowed
us to surpass precision & recall of the
original TextRank paper.
NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
Automatic text summarisation
‣ Abstractive
‣ Extractive
TextRank can be adapted for efficient
sentence ranking for extractive summarisation.
Summarisation: TextRank
GraphAware®
EXAMPLE
GraphAware®
ConceptNet 5 = semantic network for understanding the meaning of words
‣ Relational knowledge from MIT’s Open Mind Common Sense project
‣ DBPedia (information from Wikipedia info-boxes)
‣ Wiktionary (free multilingual dictionary)
‣ …
Knowledge Enrichment: ConceptNet 5
GraphAware®
Microsoft Concept Graph = semantic network introducing knowledge
about concepts
‣ harnessed from billions of web pages and years’ worth of search logs
Expand the knowledge from external or other internal sources.
Knowlege Enrichment
GraphAware®
‣ Latent Dirichlet Allocation (LDA) - generative statistical model that
describes documents as a probabilistic mixture of a small number of topics
‣ Each topic described by a list of most relevant words
‣ Sample of topics from the NASA dataset
[“design”, "failure", "test", "result", "flight", "hardware", "mission", “testing”, “system”,
“due”]
[“pressure", "system", "cause", "valve", "propellant", "leak", "operation", “shuttle”,
“space”, “gas”]
[“space”, "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "progr
am"]
Topic Extraction: Latent Dirichlet Allocation
GraphAware®
EXAMPLE
GraphAware®
‣ Word embeddings = representation of words as multi-dimensional
semantic vectors which encode linguistic patterns
‣ Use cases: word sense disambiguation, new distance functions between
documents, starting point for further ML (e.g. NN classification)
‣ Word2vec = shallow two-layer neural network model for producing word
embeddings
‣ ConceptNet Numberbatch - consists of state-of-the-art word embeddings
Word Embeddings
GraphAware®
Word Embeddings: word2vec
GraphAware®
Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
Word Embeddings: word2vec
GraphAware®
Kusner et al.: http://mkusner.github.io/publications/WMD.pdf
Document distance: min. cumulative distance that all words need to travel
Semantic patterns representable as linear translations:
distance(Oslo -> Norway) similar to distance(Berlin -> Germany)
vec(Germany) - vec(Berlin) + vec(Oslo) = vec(Norway)
Document Embeddings
GraphAware®
Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2
Paragraph Vector (doc2vec): extension of word2vec
The additional paragraph node represents context (topic) of the current document.
Paragraph vectors have the same behaviour towards linear vector translations as
word vectors.
Document Embeddings
GraphAware®
doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
Document Embeddings
GraphAware®
doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
Some of the neural networks applicable to text processing
‣ Shallow networks (word & document embeddings)
‣ Deep Auto-Encoders
‣ Convolutional Neural Networks
‣ Recurrent Neural Networks (LSTMs)
Deep Learning for Text Processing
GraphAware®
Self-supervised Auto-Encoders: useful for vector embeddings (images, texts)
DeepLearning4J - Java-based deep learning library
Example of auto-encoder (e.g. stacked RBMs) …
Deep Learning: Auto-encoders
GraphAware®
Works well for images, but problematic for texts (sparsity).
Convolutional Neural Networks
GraphAware®
Y. Zhang, B. Wallace: arXiv:1510.03820
Classification of documents based on word embeddings and CNN
Deep Learning: Summarisation
GraphAware®
S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
Deep Learning: Summarisation
GraphAware®
Extractive summarisation (sentence ranking) notably outperforms abstractive.
S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
Knowledge Graphs are a powerful problem-solving tool
‣ Augmented search
‣ Actionable knowledge
‣ Machine Learning
‣ Chatbots and Question answering systems
‣ Foundational to AI
Conclusion
GraphAware®
www.graphaware.com @graph_aware

More Related Content

What's hot

Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
Unparalleled Graph Database Scalability Delivered by Neo4j 4.0Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
GraphAware
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
Doug Needham
 
Graph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with GraphgenGraph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with Graphgen
GraphAware
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
Spark Summit
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphXAndy Petrella
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
Krishna Sankar
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
Michal Bachman
 
Graph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4jGraph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4j
Neo4j
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Mo Patel
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Sigmoid
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
Ankur Dave
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
Doug Needham
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
Spark Summit
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Databricks
 
Interpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsInterpreting Relational Schema to Graphs
Interpreting Relational Schema to Graphs
Neo4j
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
Kenny Bastani
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
Databricks
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Databricks
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
Data Science Thailand
 

What's hot (20)

Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
Unparalleled Graph Database Scalability Delivered by Neo4j 4.0Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
Unparalleled Graph Database Scalability Delivered by Neo4j 4.0
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
Graph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with GraphgenGraph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with Graphgen
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
 
Graph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4jGraph Analytics: Graph Algorithms Inside Neo4j
Graph Analytics: Graph Algorithms Inside Neo4j
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
 
Interpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsInterpreting Relational Schema to Graphs
Interpreting Relational Schema to Graphs
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 

Similar to Signals from outer space

The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
University of California, San Diego
 
Maria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data streamMaria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data stream
PyData
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
Alejandro Llaves
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
Alejandro Llaves
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 
Project Presentation
Project PresentationProject Presentation
Project Presentationbutest
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
Olabode Ajayi
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
Pinar Alper
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Herman Wu
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript Programs
IRJET Journal
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
François Belleau
 
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
anpawlik
 
Scientific
Scientific Scientific
Scientific
marpierc
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
Jean-Paul Calbimonte
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
Jean-Paul Calbimonte
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
Anubhav Jain
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
Jean-Paul Calbimonte
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
andrea huang
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Paco Nathan
 

Similar to Signals from outer space (20)

The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
 
Maria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data streamMaria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data stream
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript Programs
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
 
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
 
Scientific
Scientific Scientific
Scientific
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
Text mining and Visualizations
Text mining  and VisualizationsText mining  and Visualizations
Text mining and Visualizations
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
 

More from GraphAware

Challenges in knowledge graph visualization
Challenges in knowledge graph visualizationChallenges in knowledge graph visualization
Challenges in knowledge graph visualization
GraphAware
 
Social media monitoring with ML-powered Knowledge Graph
Social media monitoring with ML-powered Knowledge GraphSocial media monitoring with ML-powered Knowledge Graph
Social media monitoring with ML-powered Knowledge Graph
GraphAware
 
To be or not to be.
To be or not to be. To be or not to be.
To be or not to be.
GraphAware
 
It Depends (and why it's the most frequent answer to modelling questions)
It Depends (and why it's the most frequent answer to modelling questions)It Depends (and why it's the most frequent answer to modelling questions)
It Depends (and why it's the most frequent answer to modelling questions)
GraphAware
 
How Boston Scientific Improves Manufacturing Quality Using Graph Analytics
How Boston Scientific Improves Manufacturing Quality Using Graph AnalyticsHow Boston Scientific Improves Manufacturing Quality Using Graph Analytics
How Boston Scientific Improves Manufacturing Quality Using Graph Analytics
GraphAware
 
When privacy matters! Chatbots in data-sensitive businesses
When privacy matters! Chatbots in data-sensitive businessesWhen privacy matters! Chatbots in data-sensitive businesses
When privacy matters! Chatbots in data-sensitive businesses
GraphAware
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
GraphAware
 
(Big) Data Science
 (Big) Data Science (Big) Data Science
(Big) Data Science
GraphAware
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)
GraphAware
 
Intro to Neo4j (CZ)
Intro to Neo4j (CZ)Intro to Neo4j (CZ)
Intro to Neo4j (CZ)
GraphAware
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
GraphAware
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
GraphAware
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
GraphAware
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
GraphAware
 
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe WillemsenKnowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
GraphAware
 
The power of polyglot searching
The power of polyglot searchingThe power of polyglot searching
The power of polyglot searching
GraphAware
 
Neo4j-Databridge
Neo4j-DatabridgeNeo4j-Databridge
Neo4j-Databridge
GraphAware
 
Spring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise AppsSpring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise Apps
GraphAware
 
Voice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
Voice-driven Knowledge Graph Journey with Neo4j and Amazon AlexaVoice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
Voice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
GraphAware
 
Relevant Search Leveraging Knowledge Graphs with Neo4j
Relevant Search Leveraging Knowledge Graphs with Neo4jRelevant Search Leveraging Knowledge Graphs with Neo4j
Relevant Search Leveraging Knowledge Graphs with Neo4j
GraphAware
 

More from GraphAware (20)

Challenges in knowledge graph visualization
Challenges in knowledge graph visualizationChallenges in knowledge graph visualization
Challenges in knowledge graph visualization
 
Social media monitoring with ML-powered Knowledge Graph
Social media monitoring with ML-powered Knowledge GraphSocial media monitoring with ML-powered Knowledge Graph
Social media monitoring with ML-powered Knowledge Graph
 
To be or not to be.
To be or not to be. To be or not to be.
To be or not to be.
 
It Depends (and why it's the most frequent answer to modelling questions)
It Depends (and why it's the most frequent answer to modelling questions)It Depends (and why it's the most frequent answer to modelling questions)
It Depends (and why it's the most frequent answer to modelling questions)
 
How Boston Scientific Improves Manufacturing Quality Using Graph Analytics
How Boston Scientific Improves Manufacturing Quality Using Graph AnalyticsHow Boston Scientific Improves Manufacturing Quality Using Graph Analytics
How Boston Scientific Improves Manufacturing Quality Using Graph Analytics
 
When privacy matters! Chatbots in data-sensitive businesses
When privacy matters! Chatbots in data-sensitive businessesWhen privacy matters! Chatbots in data-sensitive businesses
When privacy matters! Chatbots in data-sensitive businesses
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
 
(Big) Data Science
 (Big) Data Science (Big) Data Science
(Big) Data Science
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)
 
Intro to Neo4j (CZ)
Intro to Neo4j (CZ)Intro to Neo4j (CZ)
Intro to Neo4j (CZ)
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
 
Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
 
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe WillemsenKnowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
Knowledge Graphs and Chatbots with Neo4j and IBM Watson - Christophe Willemsen
 
The power of polyglot searching
The power of polyglot searchingThe power of polyglot searching
The power of polyglot searching
 
Neo4j-Databridge
Neo4j-DatabridgeNeo4j-Databridge
Neo4j-Databridge
 
Spring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise AppsSpring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise Apps
 
Voice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
Voice-driven Knowledge Graph Journey with Neo4j and Amazon AlexaVoice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
Voice-driven Knowledge Graph Journey with Neo4j and Amazon Alexa
 
Relevant Search Leveraging Knowledge Graphs with Neo4j
Relevant Search Leveraging Knowledge Graphs with Neo4jRelevant Search Leveraging Knowledge Graphs with Neo4j
Relevant Search Leveraging Knowledge Graphs with Neo4j
 

Recently uploaded

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Signals from outer space

  • 1. GraphAware® SIGNALS FROM OUTER SPACE Vlasta Kůs, Data Scientist @ GraphAware graphaware.com @graph_aware, @VlastaKus How NASA Benefits from Graph-Powered NLP
  • 2. ‣ Database of learned knowledge across NASA’s programs & projects ‣ Unstructured text with basic metadata ‣ Collected since late 1950s (100s of millions of documents) ‣ Public dataset: ~1600 documents NASA’s Lessons Learned GraphAware®
  • 3. "1406",420,"Roberts, J “, "VO'75 Pressure Regulator Leakage and Work-Around Procedures (~1976)”, "The pressure regulator in the Viking Orbiter Propulsion Subsystem started leaking following a pyro firing that occurred prior to the near-Mars TCM. Likely causes were corrosion or residue from propellant migration or pyro valve blowby, or particulate contamination. Recommendations included using separate regulators for the fuel and oxidizer sides, incorporating a bellows in the pyro valve to eliminate blowby, and adding a isolation valve between the regulator and propellant tank.“, " The micro-scale effects of long-term propellant exposure should be investigated in order to better critique regulator design. “, "JPL",1996-07-08,"",TRUE,"",1460,7,NA,"https:// nen.nasa.gov/web/11/viewall/-/viewall/420" NASA’s Lessons Learned Database GraphAware®
  • 4. “673",1326,"Relvini, Kristine “, "Lessons Learned Not Being Inputted Into Lessons Learned Information System (LLIS) Database”, “", "If you don't document the lessons learned, you loose knowledgeable, shared information and tracking capacity across programs.“, "KSC",2002-10-11,"Aeronautics Research, Science, Exploration Systems, Space Operations, ",FALSE,"", 702,6,NA,"https://nen.nasa.gov/web/11/viewall/-/ viewall/1326" NASA’s Lessons Learned GraphAware®
  • 5. Graph database = isolated data silos -> connected knowledge ‣ Efficient search ‣ Relationships among various areas Apollo, Space Shuttle, Orion, … ‣ Pattern recognition (clusters, communities, correlations, …) Example: correlation between corrosion of valves & topics involving batteries ‣ Useful for planning future projects and preventing/solving issues NASA’s Lessons Learned GraphAware®
  • 6. What is a Graph? GraphAware® G = (V, E)
  • 7. WHY NEO4J? GraphAware® It is a proper graph database It is a proper database
  • 10. ‣ NLP = machine learning tools allowing computers to process - and perhaps understand - human languages ‣ Basic steps Sentence segmentation Tokenisation Lemmatisation Part of Speech (POS) tagging Parsing Named Entities Recognition (NER) Sentiment analysis … Natural Language Processing GraphAware®
  • 11. Currently supported toolkits for human language processing ‣ Stanford CoreNLP ‣ developed at Stanford University ‣ fast, robust, production ready ‣ many pre-built models ‣ license: GPL v3+ ‣ Apache OpenNLP ‣ developed by volunteers ‣ many pre-built models ‣ license: Apache License v2.0 NLP: Text Processors GraphAware®
  • 12. ‣ Named Entity Recognition (NER) = classification of words into predefined classes ‣ Examples: Dr. Who -> Person, May 2018 -> Date, EU -> Country … ‣ Stanford NLP default entities: Person, Location, Date, Organisation, Number, Money, Percentage ‣ Custom NE classes -> training on large tokenised & labeled corpus ‣ Wikipedia, Wikidata - rich sources of multilingual training data that can be extracted automatically Named Entity Recognition GraphAware®
  • 13. Custom Named Entities based on Wikipedia GraphAware® NASA use case: identify names of space missions Training - crawling Wikipedia & identifying relevant information
  • 15. Universal Dependencies: cross-linguistically consistent grammatical relations among words in a sentence Examples: ‣ amod (adjectival modifier) Matt likes red wine. ‣ appos (appositional modifier) Mars Global Surveyor (MGS) was an American robotic spacecraft … ‣ conj (conjunct) It failed to respond to messages and commands. ‣ … Universal Dependencies GraphAware®
  • 16. ‣ Stanford CoreNLP: Dependency & Part of Speech analysis of a single sentence Source: http://nlp.stanford.edu:8080/corenlp/process Either find an efficient representation in some traditional database, or … Graph-Powered NLP GraphAware®
  • 17. Graph-Powered NLP GraphAware® NLP and property graphs: natural fit … use a property graph!
  • 19. Unsupervised techniques tend to be underestimated, while … ‣ No need for time & money to get massive labeled training datasets ‣ Often faster to train & faster to predict ‣ Unsupervised deep learning Unsupervised ML Algorithms GraphAware®
  • 20. PageRank GraphAware® PageRank = a measure of importance of a web page based on the quality of links from other pages The formula reflects a model of a random surfer. Source: https://en.wikipedia.org/wiki/PageRank
  • 21. Keyword Extraction: TextRank GraphAware® Keywords = words/phrases that capture the semantic essence of a text Graph-Based Unsupervised Algorithm: ‣ Construct a graph of word co-occurrences ‣ Asses the importance of words by PageRank algorithm ‣ Use top 1/3 of words as keyword candidates ‣ Use universal dependencies to construct key phrases
  • 22. GraphAware® Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. Proceedings of EMNLP 2004, pages 404–411, Barcelona, Spain. Association for Computational Linguistics. http://www.aclweb.org/anthology/W04-3252. Keyword Extraction: TextRank Despite its simplicity, TextRank provides state of the art results on wide range of unstructured texts. Leveraging universal dependencies allowed us to surpass precision & recall of the original TextRank paper. NASA examples: “space shuttle”, “flight hardware”, “launch vehicle”, …
  • 23. Automatic text summarisation ‣ Abstractive ‣ Extractive TextRank can be adapted for efficient sentence ranking for extractive summarisation. Summarisation: TextRank GraphAware®
  • 25. ConceptNet 5 = semantic network for understanding the meaning of words ‣ Relational knowledge from MIT’s Open Mind Common Sense project ‣ DBPedia (information from Wikipedia info-boxes) ‣ Wiktionary (free multilingual dictionary) ‣ … Knowledge Enrichment: ConceptNet 5 GraphAware® Microsoft Concept Graph = semantic network introducing knowledge about concepts ‣ harnessed from billions of web pages and years’ worth of search logs
  • 26. Expand the knowledge from external or other internal sources. Knowlege Enrichment GraphAware®
  • 27. ‣ Latent Dirichlet Allocation (LDA) - generative statistical model that describes documents as a probabilistic mixture of a small number of topics ‣ Each topic described by a list of most relevant words ‣ Sample of topics from the NASA dataset [“design”, "failure", "test", "result", "flight", "hardware", "mission", “testing”, “system”, “due”] [“pressure", "system", "cause", "valve", "propellant", "leak", "operation", “shuttle”, “space”, “gas”] [“space”, "shuttle", "NASA", "operation", "safety", "iss", "crew", "ISS", "astronaut", "progr am"] Topic Extraction: Latent Dirichlet Allocation GraphAware®
  • 29. ‣ Word embeddings = representation of words as multi-dimensional semantic vectors which encode linguistic patterns ‣ Use cases: word sense disambiguation, new distance functions between documents, starting point for further ML (e.g. NN classification) ‣ Word2vec = shallow two-layer neural network model for producing word embeddings ‣ ConceptNet Numberbatch - consists of state-of-the-art word embeddings Word Embeddings GraphAware®
  • 30. Word Embeddings: word2vec GraphAware® Tomas Mikolov et al.: https://arxiv.org/abs/1301.3781
  • 31. Word Embeddings: word2vec GraphAware® Kusner et al.: http://mkusner.github.io/publications/WMD.pdf Document distance: min. cumulative distance that all words need to travel Semantic patterns representable as linear translations: distance(Oslo -> Norway) similar to distance(Berlin -> Germany) vec(Germany) - vec(Berlin) + vec(Oslo) = vec(Norway)
  • 32. Document Embeddings GraphAware® Q. Le, T. Mikolov: Distributed representations of sentences and documents, arXiv:1405.4053v2 Paragraph Vector (doc2vec): extension of word2vec The additional paragraph node represents context (topic) of the current document. Paragraph vectors have the same behaviour towards linear vector translations as word vectors.
  • 33. Document Embeddings GraphAware® doc2vec vectors of dimension 300, NASA sentences -> dimensionality reduction (PCA + t-SNE)
  • 34. Document Embeddings GraphAware® doc2vec vectors of dimension 2000, 30k Wikipedia pages -> dimensionality reduction (PCA + t-SNE)
  • 35. Some of the neural networks applicable to text processing ‣ Shallow networks (word & document embeddings) ‣ Deep Auto-Encoders ‣ Convolutional Neural Networks ‣ Recurrent Neural Networks (LSTMs) Deep Learning for Text Processing GraphAware®
  • 36. Self-supervised Auto-Encoders: useful for vector embeddings (images, texts) DeepLearning4J - Java-based deep learning library Example of auto-encoder (e.g. stacked RBMs) … Deep Learning: Auto-encoders GraphAware® Works well for images, but problematic for texts (sparsity).
  • 37. Convolutional Neural Networks GraphAware® Y. Zhang, B. Wallace: arXiv:1510.03820 Classification of documents based on word embeddings and CNN
  • 38. Deep Learning: Summarisation GraphAware® S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 39. Deep Learning: Summarisation GraphAware® Extractive summarisation (sentence ranking) notably outperforms abstractive. S. Narayan et al.: Ranking Sentences for Extractive Summarisation with Reinforcement learning, arXiv:1802.08636
  • 40. Knowledge Graphs are a powerful problem-solving tool ‣ Augmented search ‣ Actionable knowledge ‣ Machine Learning ‣ Chatbots and Question answering systems ‣ Foundational to AI Conclusion GraphAware®