Companies and organisations of any size, public or private, hold large amounts of data in isolated silos. These silos are created independently to serve the specific needs of each organisational unit, and they mainly contain textual data in multiple formats. To unleash the power of the relevant information in such sources, it is necessary to collect and organise them in a homogeneous data structure that is easy to access and extend.
The presentation starts by identifying the business needs, then takes the audience through the journey of (i) creating a knowledge graph that represents a single, highly connected source of truth for the entire organisation, (ii) enriching it using multiple external sources of knowledge and machine learning algorithms, and (iii) evolving it according to the changing needs of the company.
Furthermore, this session highlights the role of graphs as a new "access pattern" for textual data, compared with the more classical inverted index approach. It concludes by presenting a complete end-to-end infrastructure for an unstructured data processing workflow, in which Neo4j is the core of a complex ecosystem integrated with other tools such as Elasticsearch, Apache Kafka, Stanford NLP, OpenNLP, Apache Spark, and TensorFlow.
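As a rough illustration of the two access patterns contrasted above, the following minimal sketch uses plain Python dictionaries only. It is not the architecture presented in the talk and it does not use Neo4j or Elasticsearch; the toy corpus and the function names (build_inverted_index, build_graph, related_docs) are assumptions made up for this example. An inverted index answers "which documents contain this term?", while a graph can also be traversed to answer "which documents are connected to this one?".

```python
from collections import defaultdict

# Toy corpus, hypothetical and for illustration only.
docs = {
    "d1": ["neo4j", "graph", "database"],
    "d2": ["kafka", "stream", "neo4j"],
    "d3": ["kafka", "spark", "pipeline"],
}


def build_inverted_index(docs):
    """Inverted index: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def build_graph(docs):
    """Undirected bipartite graph linking document nodes to term nodes."""
    adj = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            adj[("doc", doc_id)].add(("term", term))
            adj[("term", term)].add(("doc", doc_id))
    return adj


def related_docs(adj, doc_id):
    """Graph access: documents two hops away (doc -> term -> doc)."""
    found = set()
    for term_node in adj[("doc", doc_id)]:
        for _kind, other in adj[term_node]:
            if other != doc_id:
                found.add(other)
    return found


index = build_inverted_index(docs)
graph = build_graph(docs)

# Index access: lookup by term.
print(sorted(index["neo4j"]))              # ['d1', 'd2']
# Graph access: traversal starting from a document.
print(sorted(related_docs(graph, "d2")))   # ['d1', 'd3']
```

In this sketch the same data supports both questions, but only the graph form makes the traversal a direct operation; a real knowledge graph would add typed nodes and relationships beyond simple term co-occurrence.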
Talk at GraphTour Washington D.C., April 14, 2018