Neo4j and Bioinformatics




www.ohnosequences.com                   www.bio4j.com
But who’s this guy talking here?
     I am currently working as a Bioinformatics consultant/developer/researcher at
     Oh no sequences!


     Oh no what !?
     We are the R&D group at Era7 Bioinformatics.
     We like bioinformatics, cloud computing, NGS, category theory, bacterial
     genomics…
     well, lots of things.


     What about Era7 Bioinformatics?
     Era7 Bioinformatics is a Bioinformatics company specialized in sequence analysis,
     knowledge management and sequencing data interpretation.
     Our area of expertise revolves around biological sequence analysis, particularly
     Next Generation Sequencing data management and analysis.




In Bioinformatics we have highly interconnected overlapping knowledge spread
    throughout different DBs




However all this data is in most cases modeled in relational databases.
        Sometimes even just as plain CSV files

               As the amount and diversity of data grows, domain models
               become crazily complicated!




With a relational paradigm, the double implication

                              Entity ⇔ Table

         does not go both ways.


              You get ‘auxiliary’ tables that have no relationship with the small
              piece of reality you are modeling.


              You need ‘artificial’ IDs only for connecting entities, (and these are mixed
              with IDs that somehow live in reality)


              Entity-relationship models are cool but in the end you always have to
              deal with ‘raw’ tables plus SQL.


              Integrating/incorporating new knowledge into already existing
              databases is hard and sometimes even not possible without changing
              the domain model




Life in general and biology in particular are probably not 100% like a graph…




                                but one thing’s sure, they are not a set of tables!



NoSQL data models




Neo4j is a high-performance, NoSQL graph database with all
           the features of a mature and robust database.


           The programmer works with an object-oriented, flexible
           network structure rather than with strict and static tables


           All the benefits of a fully transactional, enterprise-strength
           database.


           For many applications, Neo4j offers performance
           improvements on the order of 1000x or more compared to
           relational DBs.




What’s Bio4j?


     Bio4j is a graph-based bioinformatics DB that includes most of the data
     available in:

        Uniprot KB (SwissProt + Trembl)   NCBI Taxonomy

        Gene Ontology (GO)                RefSeq

        UniRef (50,90,100)                Enzyme DB




What’s Bio4j?

     It provides a completely new and powerful framework
     for protein related information querying and
     management.


     Since it relies on a high-performance graph engine, data
     is stored in a way that semantically represents its own
     structure




What’s Bio4j?

     Bio4j uses Neo4j technology, a "high-performance graph
     engine with all the features of a mature and robust
     database".

     Because it is built on Neo4j and provides its own API,
     Bio4j is also very scalable, allowing anyone to easily
     incorporate their own data and make the most
     of it.



What’s Bio4j?


                        Everything in Bio4j is open source !



       released under AGPLv3




Bio4j in numbers


     The current version (0.7) includes:



             Relationships: 530,642,683

             Nodes: 76,071,411

             Relationship types: 139

             Node types: 38




Let’s dig a bit into Bio4j’s structure…


               Data sources and their relationships:




Bio4j domain model




The Graph DB model: representation


          Core abstractions:

             Nodes

             Relationships between nodes

             Properties on both




How are things modeled?




                            Couldn’t be simpler!




                 Entities           Associations / Relationships




                  Nodes                        Edges




Some examples of nodes would be:


                                      GO term
                  Protein
                                                         Genome Element




     and relationships:




                            Protein   PROTEIN_GO_ANNOTATION


                                                      GO term
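
The node and relationship examples above can be sketched as a tiny in-memory property graph. This is only an illustration of the three core abstractions (nodes, relationships, properties on both); it is not the Neo4j or Bio4j API, and the class names `TinyGraph`, `Node` and `Edge`, as well as the GO id used in `main`, are placeholders invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal in-memory property graph; illustrative only, not the Neo4j or Bio4j API. */
final class TinyGraph {

    /** A node: a type name plus free-form properties. */
    static final class Node {
        final String type;
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();
        final List<Edge> incoming = new ArrayList<>();

        Node(String type) {
            this.type = type;
        }
    }

    /** A directed, typed relationship that can carry its own properties. */
    static final class Edge {
        final String type;
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Edge(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Connects two nodes and registers the edge on both endpoints. */
    static Edge connect(Node start, String type, Node end) {
        Edge edge = new Edge(type, start, end);
        start.outgoing.add(edge);
        end.incoming.add(edge);
        return edge;
    }

    /** Follows outgoing edges of the given type and returns the nodes they reach. */
    static List<Node> neighbours(Node start, String edgeType) {
        List<Node> result = new ArrayList<>();
        for (Edge edge : start.outgoing) {
            if (edge.type.equals(edgeType)) {
                result.add(edge.end);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node protein = new Node("Protein");
        protein.properties.put("accession", "P12345");

        Node goTerm = new Node("GoTerm");
        goTerm.properties.put("id", "GO:0000000"); // placeholder id, not a real annotation

        connect(protein, "PROTEIN_GO_ANNOTATION", goTerm);

        for (Node node : neighbours(protein, "PROTEIN_GO_ANNOTATION")) {
            System.out.println(node.type + " " + node.properties.get("id"));
        }
    }
}
```

Keeping both `outgoing` and `incoming` lists on each node is what makes "check incoming/outgoing relationships of a node" a cheap local operation instead of a join.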




We have developed a tool intended to serve both as a reference manual and as a
    first point of contact with the Bio4j domain model: Bio4jExplorer



     Bio4jExplorer allows you to:

     • Navigate through all nodes and relationships


     • Access the javadocs of any node or relationship


     • Graphically explore the neighborhood of a node/relationship


     • Look up the indexes that may serve as an entry point for a node


     • Check incoming/outgoing relationships of a specific node


     • Check start/end nodes of a specific relationship




Entry points and indexing

         There are two kinds of entry points for the graph:



               Auxiliary relationships going from the reference node, e.g.

                 - CELLULAR_COMPONENT: leads to the root of GO cellular component
                 sub-ontology

                 - MAIN_DATASET: leads to both main datasets: Swiss-Prot and Trembl


               Node indexing

               There are two types of node indexes:

                 - Exact: Only exact values are considered hits

                 - Fulltext: Regular expressions can be used
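
The difference between the two index types can be sketched with a small in-memory index. This is an assumption-laden illustration: Neo4j's real indexes are backed by Lucene and do not scan every key, and the class `NodeIndexSketch` and its methods are invented for the example. The exact lookup only hits on an identical value, while the pattern lookup accepts a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Toy node index mapping an indexed value to node ids; not the Neo4j index API. */
final class NodeIndexSketch {

    // Indexed value -> ids of the nodes carrying that value (sorted by value).
    private final Map<String, List<Long>> entries = new TreeMap<>();

    void add(String value, long nodeId) {
        entries.computeIfAbsent(value, key -> new ArrayList<>()).add(nodeId);
    }

    /** Exact lookup: only an identical value counts as a hit. */
    List<Long> exact(String value) {
        return entries.getOrDefault(value, List.of());
    }

    /** Pattern lookup: every entry whose whole value matches the regular expression. */
    List<Long> matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : entries.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NodeIndexSketch datasetNames = new NodeIndexSketch();
        datasetNames.add("Swiss-Prot", 1L); // node ids are made up
        datasetNames.add("Trembl", 2L);

        System.out.println(datasetNames.exact("Swiss-Prot"));
        System.out.println(datasetNames.exact("Swiss"));
        System.out.println(datasetNames.matching("Swiss.*"));
    }
}
```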




Retrieving protein info (Bio4jModel Java API)

     // Create the manager and the node retriever
     Bio4jManager manager = new Bio4jManager("/mybio4jdb");
     NodeRetriever nR = new NodeRetriever(manager);

     ProteinNode protein = nR.getProteinNodeByAccession("P12345");


     Getting more related info...

     List<InterproNode> interpros = protein.getInterpro();
     OrganismNode organism = protein.getOrganism();
     List<GoTermNode> goAnnotations = protein.getGOAnnotations();

     List<ArticleNode> articles = protein.getArticleCitations();

     for (ArticleNode article : articles) {
         System.out.println(article.getPubmedId());
     }

     //Don’t forget to close the manager
     manager.shutDown();




Querying Bio4j with Cypher


     Getting a keyword by its ID

     START k=node:keyword_id_index(keyword_id_index = "KW-0181")
     return k.name, k.id


     Finding circuits/simple cycles of length 3 where at least one protein is from the
     Swiss-Prot dataset:

     START d=node:dataset_name_index(dataset_name_index = "Swiss-Prot")
     MATCH d <-[r:PROTEIN_DATASET]- p,
           circuit = (p) -[:PROTEIN_PROTEIN_INTERACTION]-> (p2)
                         -[:PROTEIN_PROTEIN_INTERACTION]-> (p3)
                         -[:PROTEIN_PROTEIN_INTERACTION]-> (p)
     RETURN p.accession, p2.accession, p3.accession
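
What the cycle query above asks for can be sketched in plain Java over an adjacency map. This is a hedged illustration rather than how Neo4j evaluates the pattern: the class `ThreeCycles`, the map-based graph and the protein names are assumptions, and the sketch additionally requires the three proteins to be distinct so that only simple cycles are reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Finds directed cycles p -> p2 -> p3 -> p that start at one of the seed proteins. */
final class ThreeCycles {

    /**
     * @param interactions directed protein-protein interactions (source -> targets)
     * @param seeds        proteins allowed in the first position, e.g. those of one dataset
     */
    static List<List<String>> find(Map<String, Set<String>> interactions, Set<String> seeds) {
        List<List<String>> cycles = new ArrayList<>();
        for (String p : seeds) {
            for (String p2 : interactions.getOrDefault(p, Set.of())) {
                if (p2.equals(p)) {
                    continue; // skip self-loops: the three proteins must differ
                }
                for (String p3 : interactions.getOrDefault(p2, Set.of())) {
                    if (p3.equals(p) || p3.equals(p2)) {
                        continue;
                    }
                    // The cycle closes only if the third protein points back to the first.
                    if (interactions.getOrDefault(p3, Set.of()).contains(p)) {
                        cycles.add(List.of(p, p2, p3));
                    }
                }
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        // Made-up interaction data: A -> B -> C -> A forms a cycle, A -> D is a dead end.
        Map<String, Set<String>> interactions = Map.of(
                "A", Set.of("B", "D"),
                "B", Set.of("C"),
                "C", Set.of("A"));
        System.out.println(find(interactions, Set.of("A")));
    }
}
```

As in the query, the same cycle is reported once per seed protein it contains, so seeding with both A and B yields the rotations (A, B, C) and (B, C, A).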


              Check this blog post for more info and our Bio4j Cypher cheat sheet




A graph traversal language


     Get protein by its accession number and return its full name

     gremlin> g.idx('protein_accession_index')[['protein_accession_index':'P12345']].full_name
     ==> Aspartate aminotransferase, mitochondrial


     Get proteins (accessions) associated with an interpro motif (limited to 4 results)
     gremlin> g.idx('interpro_id_index')[['interpro_id_index':'IPR023306']].inE('PROTEIN_INTERPRO').outV.accession[0..3]
     ==> E2GK26
     ==> G3PMS4
     ==> G3Q865
     ==> G3PIL8


            Check our Bio4j Gremlin cheat sheet




REST Server


     You can also query/navigate through Bio4j with the Neo4j REST API !

     The default representation is JSON, both for responses and for data sent with
     POST/PUT requests


     Get protein by its accession number: (Q9UR66)

     http://server_url:7474/db/data/index/node/protein_accession_index/protein_accession_index/Q9UR66


     Get outgoing relationships for protein Q9UR66

     http://server_url:7474/db/data/node/Q9UR66_node_id/relationships/out
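
The two URLs above follow a fixed pattern, which can be captured in a small helper. This is a minimal sketch: the class `Bio4jRestUrls` and its method names are hypothetical, `http://localhost:7474` is an assumed server address, and the node id `42` stands in for whatever numeric id the server reports for the protein. The paths themselves are the ones shown on the slide.

```java
import java.net.URI;

/** Builds the REST URLs shown above; the helper names are hypothetical. */
final class Bio4jRestUrls {

    /** Index lookup of a protein node by its accession. */
    static URI proteinByAccession(String serverUrl, String accession) {
        return URI.create(serverUrl
                + "/db/data/index/node/protein_accession_index/protein_accession_index/"
                + accession);
    }

    /** Outgoing relationships of the node with the given numeric id. */
    static URI outgoingRelationships(String serverUrl, long nodeId) {
        return URI.create(serverUrl + "/db/data/node/" + nodeId + "/relationships/out");
    }

    public static void main(String[] args) {
        String server = "http://localhost:7474"; // assumed address of a Bio4j REST server
        System.out.println(proteinByAccession(server, "Q9UR66"));
        System.out.println(outgoingRelationships(server, 42L)); // 42 is a placeholder node id
    }
}
```

The resulting `URI` values can be passed to any HTTP client; the server would answer with JSON, as noted above.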




Visualizations (1)  REST Server Data Browser


      Navigate through Bio4j data in real time !




Visualizations (2)  Bio4j GO Tools




Visualizations (3)  Bio4j + Gephi

      Get really cool graph visualizations using Bio4j and the Gephi visualization and
      exploration platform




Bio4j + Cloud

     We use AWS (Amazon Web Services) everywhere we can around Bio4j, giving
     us the following benefits:


          Interoperability and data distribution

           Releases are available as public EBS snapshots, so AWS users can create
           volumes with a 100% ready Bio4j DB and attach them to their instances
           in just a few seconds.

           CloudFormation templates:

             - Basic Bio4j DB Instance

             - Bio4j REST Server Instance


           Backup and Storage using S3 (Simple Storage Service)

           We use S3 both for backup (indirectly through the EBS snapshots) and
           storage (directly storing RefSeq sequences as independent S3 files)



Why would I use Bio4j ?


    Massive access to protein/genome/taxonomy… related information


    Integration of your own DBs/resources around common information


    Development of services tailored to your needs built around Bio4j


    Network analysis


    Visualizations


    Besides many others I cannot think of myself…
    If you have something in mind for which Bio4j might be useful, please let us know so we
    can all see how it could help you meet your needs! ;)




Community

     Bio4j has a fast growing internet presence:



            - Twitter: check @bio4j for updates

            - Blog: go to http://blog.bio4j.com

            - Mailing list: ask any question you may have on our list.

            - LinkedIn: check the Bio4j group

            - Github issues: don’t be shy! open a new issue if you think
                             something’s going wrong.




OK, but why start all this?
   Were you so bored…?!

    It all started somehow around our need for massive access to protein GO
    (Gene Ontology) annotations.

     At that point I had to develop my own MySQL DB based on the official
     GO SQL database, and problems started from the beginning:


          I went crazy ‘deciphering’ how to extract Uniprot protein annotations
          from the official GO table schema

          Uniprot and GO official protein annotations were not always consistent


          Populating my own DB took a really long time due to all the joins and
          subqueries needed to get and store the protein annotations.

          Soon enough we also needed massive access to basic
          protein information.




These processes had to be automated for BG7, our bacterial genome annotation
  system (designed specifically for NGS data)



              The Uniprot web services available were too limited:

                - Slow

                - Limited number of queries

                - Too little information available




                  So I downloaded the whole Uniprot DB in XML format
                  (Swiss-Prot + Trembl)

                  and started to have some fun with it !




BG7 algorithm


   1   • Selection of the specific reference protein set

   2   • Prediction of possible genes by BLAST similarity

   3   • Gene definition: merging compatible similarity regions, detecting start and stop

   4   • Solving overlapped predicted genes

   5   • RNA prediction by BLAST similarity

   6   • Final annotation and complete deliverables. Quality control.
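
The "merging compatible similarity regions" part of step 3 can be illustrated with a generic interval merge. This is a sketch under a strong simplifying assumption: two regions are treated as compatible whenever their coordinates overlap, whereas BG7's actual compatibility criteria are not described in these slides. The class `RegionMerging` and the coordinates are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Merges overlapping similarity regions; a generic interval merge, not BG7's actual rules. */
final class RegionMerging {

    /** A region on a sequence with inclusive coordinates, start <= end. */
    static final class Region {
        final int start;
        final int end;

        Region(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + ".." + end;
        }
    }

    /** Sorts regions by start and fuses every run of overlapping regions into one. */
    static List<Region> merge(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt((Region r) -> r.start));

        List<Region> merged = new ArrayList<>();
        for (Region region : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && region.start <= merged.get(lastIndex).end) {
                // Overlaps the previous merged region: extend it instead of adding a new one.
                Region last = merged.remove(lastIndex);
                merged.add(new Region(last.start, Math.max(last.end, region.end)));
            } else {
                merged.add(region);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Region> hits = List.of(
                new Region(200, 400), new Region(900, 1000), new Region(100, 250));
        System.out.println(merge(hits)); // prints [100..400, 900..1000]
    }
}
```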




www.era7bioinformatics.com
We got used to having massive direct access to all this protein related
      information…


           So why not add other resources that we needed quite often in most
           projects and that were becoming a sort of bottleneck
           compared to all those already included in Bio4j?

       Then we incorporated:
            -   Isoform sequences

            -   Protein interactions and features

            -   Uniref 50, 90, and 100

            -   RefSeq

            -   NCBI Taxonomy

            -   Enzyme Expasy DB




Bio4j + MG7 + 48 Blast XML files (~1GB each)


     Some numbers:

                •   157 639 502 nodes

                •   742 615 705 relationships

                •   632 832 045 properties

                •   148 relationship types

                •   44 node types


             And it works just fine!


MG7 domain model




What’s MG7?

     MG7 lets you choose the parameters that set the
     thresholds for filtering the BLAST hits:

     i.    E-value
     ii.   Identity and query coverage


     It allows exporting the results of the analysis to different data formats like:
     • XML
     • CSV
     • GEXF (Graph Exchange XML Format)

     It also provides heat maps and graph visualizations, together with a
     user-friendly interface, MG7Viewer, that gives access to the alignment
     responsible for each functional or taxonomic read assignment and displays
     the frequencies in the taxonomic tree.




Heat-map Viz




Graph Viz




MG7 Viewer




Mining Bio4j data

      Finding topological patterns in Protein-Protein
                  Interaction networks




Finding the lowest common ancestor of a set of NCBI
                taxonomy nodes with Bio4j
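
One common way to compute a lowest common ancestor in a tree is to compare root-to-node lineages and keep their shared prefix. The sketch below does this over a plain child-to-parent map; it is not the Bio4j API, the class `TaxonomyLca` is invented for the example, and the toy taxonomy omits intermediate ranks on purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Lowest common ancestor over a child -> parent map; assumes the map describes a tree. */
final class TaxonomyLca {

    /** Lineage from the root down to the node; the root is the only node without a parent. */
    static List<String> lineage(Map<String, String> parentOf, String node) {
        List<String> path = new ArrayList<>();
        for (String current = node; current != null; current = parentOf.get(current)) {
            path.add(current);
        }
        Collections.reverse(path);
        return path;
    }

    /** Shrinks the first lineage to the prefix shared with every other lineage. */
    static String lowestCommonAncestor(Map<String, String> parentOf, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        List<String> common = lineage(parentOf, nodes.get(0));
        for (String node : nodes.subList(1, nodes.size())) {
            List<String> other = lineage(parentOf, node);
            int shared = 0;
            while (shared < common.size() && shared < other.size()
                    && common.get(shared).equals(other.get(shared))) {
                shared++;
            }
            common = common.subList(0, shared);
        }
        if (common.isEmpty()) {
            throw new IllegalArgumentException("nodes do not share a root");
        }
        return common.get(common.size() - 1);
    }

    public static void main(String[] args) {
        // Toy taxonomy with intermediate ranks omitted.
        Map<String, String> parentOf = Map.of(
                "Bacteria", "root",
                "Proteobacteria", "Bacteria",
                "Firmicutes", "Bacteria",
                "Escherichia", "Proteobacteria",
                "Salmonella", "Proteobacteria",
                "Bacillus", "Firmicutes");

        System.out.println(lowestCommonAncestor(parentOf, List.of("Escherichia", "Salmonella")));
        System.out.println(lowestCommonAncestor(parentOf, List.of("Escherichia", "Salmonella", "Bacillus")));
    }
}
```

In a graph database the lineage of a node would be obtained by following parent relationships upwards rather than by map lookups, but the prefix comparison stays the same.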




Future directions (1)


    Gene flux tool

    New tool for bacterial comparative genomics: massive tracing of vertical and
    horizontal gene flux between genome elements, based on the analysis of the
    similarity between their proteins. It would analyze similarity relationships, which
    could be fixed at a 90% or 100% similarity threshold.



    Pathways tool
    Data from MetaCyc is going to be included in Bio4j. This data would make it possible
    to dissect the metabolic pathways in which a genome element, organism or community
    (metagenomic samples) is involved. Gephi could be used to represent the
    metabolic pathways for each of them.




Future directions (2)


    Detector of common annotations in gene clusters

    Many biological problems are related to the search for common annotations in a set of genes.
    Some examples:

       - a set of overexpressed genes
       - a set of proteins with local structural similarities (WIP)
       - a set of genes bearing SNPs in cancer samples
       - a set of exclusive genes in a pathogenic bacterial strain

    The detection of common annotations can help in the inference of important functional
    connections.
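
A first approximation of such a detector is to count how often each annotation occurs in the gene set and to intersect the annotation sets. The sketch below shows both operations; it is not a planned Bio4j tool, the class `CommonAnnotations` is hypothetical, and the gene and annotation names are placeholders.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Finds annotations shared within a gene set; gene and annotation names are placeholders. */
final class CommonAnnotations {

    /** Annotations carried by every gene in the set (empty if the set is empty). */
    static Set<String> sharedByAll(Map<String, Set<String>> annotationsByGene) {
        Set<String> shared = null;
        for (Set<String> annotations : annotationsByGene.values()) {
            if (shared == null) {
                shared = new TreeSet<>(annotations);
            } else {
                shared.retainAll(annotations);
            }
        }
        return shared == null ? new TreeSet<>() : shared;
    }

    /** Number of genes carrying each annotation, sorted by annotation name. */
    static Map<String, Integer> geneCounts(Map<String, Set<String>> annotationsByGene) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> annotations : annotationsByGene.values()) {
            for (String annotation : annotations) {
                counts.merge(annotation, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> annotationsByGene = Map.of(
                "geneA", Set.of("annotation-1", "annotation-2"),
                "geneB", Set.of("annotation-2", "annotation-3"),
                "geneC", Set.of("annotation-2"));

        System.out.println(sharedByAll(annotationsByGene));
        System.out.println(geneCounts(annotationsByGene));
    }
}
```

Counts alone do not show that an annotation is over-represented; a real detector would presumably compare them against a background frequency, which is outside the scope of this sketch.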




That’s it !


                        Thanks for
                        your time ;)





More Related Content

What's hot

Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 
Graph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdfGraph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdfNeo4j
 
GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)Neo4j
 
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxEncrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxNeo4j
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Redis: Swiss Army Knife @HackerRank: Kamal Joshi
Redis: Swiss Army Knife @HackerRank: Kamal JoshiRedis: Swiss Army Knife @HackerRank: Kamal Joshi
Redis: Swiss Army Knife @HackerRank: Kamal JoshiRedis Labs
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021StreamNative
 
MongoDB: システム可用性を拡張するインデクス戦略
MongoDB: システム可用性を拡張するインデクス戦略MongoDB: システム可用性を拡張するインデクス戦略
MongoDB: システム可用性を拡張するインデクス戦略ippei_suzuki
 
Introdução à Neo4j
Introdução à Neo4j Introdução à Neo4j
Introdução à Neo4j Neo4j
 
Full Stack Graph in the Cloud
Full Stack Graph in the CloudFull Stack Graph in the Cloud
Full Stack Graph in the CloudNeo4j
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
 
Migrating from SQL Server Profiler to xEvent Profiler
Migrating from SQL Server Profiler to xEvent ProfilerMigrating from SQL Server Profiler to xEvent Profiler
Migrating from SQL Server Profiler to xEvent ProfilerOshitari_kochi
 
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓Insight Technology, Inc.
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Mineaki Motohashi
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMongoDB
 
Project Loom - 限定継続と軽量スレッド -
Project Loom - 限定継続と軽量スレッド - Project Loom - 限定継続と軽量スレッド -
Project Loom - 限定継続と軽量スレッド - Yuichi Sakuraba
 

What's hot (20)

Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Graph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdfGraph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdf
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache Atlas
 
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)
A Knowledge Graph for Reaction & Synthesis Prediction (AstraZeneca)
 
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptxEncrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
Encrypting and Protecting Your Data in Neo4j(Jeff_Tallman).pptx
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Redis: Swiss Army Knife @HackerRank: Kamal Joshi
Redis: Swiss Army Knife @HackerRank: Kamal JoshiRedis: Swiss Army Knife @HackerRank: Kamal Joshi
Redis: Swiss Army Knife @HackerRank: Kamal Joshi
 
20180216 sapporo techbar_db_migration
20180216 sapporo techbar_db_migration20180216 sapporo techbar_db_migration
20180216 sapporo techbar_db_migration
 
with NATS with Kubernetesの世界へ
with NATS with Kubernetesの世界へwith NATS with Kubernetesの世界へ
with NATS with Kubernetesの世界へ
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
MongoDB: システム可用性を拡張するインデクス戦略
MongoDB: システム可用性を拡張するインデクス戦略MongoDB: システム可用性を拡張するインデクス戦略
MongoDB: システム可用性を拡張するインデクス戦略
 
Introdução à Neo4j
Introdução à Neo4j Introdução à Neo4j
Introdução à Neo4j
 
Full Stack Graph in the Cloud
Full Stack Graph in the CloudFull Stack Graph in the Cloud
Full Stack Graph in the Cloud
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Migrating from SQL Server Profiler to xEvent Profiler
Migrating from SQL Server Profiler to xEvent ProfilerMigrating from SQL Server Profiler to xEvent Profiler
Migrating from SQL Server Profiler to xEvent Profiler
 
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓
[db tech showcase Tokyo 2015] E27: Neo4jグラフデータベース by クリエーションライン株式会社 李昌桓
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Project Loom - 限定継続と軽量スレッド -
Project Loom - 限定継続と軽量スレッド - Project Loom - 限定継続と軽量スレッド -
Project Loom - 限定継続と軽量スレッド -
 

Viewers also liked

The power of graphs to analyze biological data
The power of graphs to analyze biological dataThe power of graphs to analyze biological data
The power of graphs to analyze biological datadatablend
 
Graph DB + Bioinformatics: Bio4j, recent applications and future directions
Graph DB + Bioinformatics:  Bio4j, recent applications and future directions Graph DB + Bioinformatics:  Bio4j, recent applications and future directions
Graph DB + Bioinformatics: Bio4j, recent applications and future directions Pablo Pareja Tobes
 
FluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphsFluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphsdatablend
 
Managing Genetic Ancestry at Scale with Neo4j and Kafka - StampedeCon 2015
Managing Genetic Ancestry at Scale with Neo4j and Kafka - StampedeCon 2015Managing Genetic Ancestry at Scale with Neo4j and Kafka - StampedeCon 2015
Managing Genetic Ancestry at Scale with Neo4j and Kafka - StampedeCon 2015StampedeCon
 
Building a repository of biomedical ontologies with Neo4j
Building a repository of biomedical ontologies with Neo4jBuilding a repository of biomedical ontologies with Neo4j
Building a repository of biomedical ontologies with Neo4jSimon Jupp
 
Arakawa_Glanguage_BOSC2009
Arakawa_Glanguage_BOSC2009Arakawa_Glanguage_BOSC2009
Arakawa_Glanguage_BOSC2009bosc
 
Bio4j: A pioneer graph based database for the integration of biological Big Data
Bio4j: A pioneer graph based database for the integration of biological Big DataBio4j: A pioneer graph based database for the integration of biological Big Data
Bio4j: A pioneer graph based database for the integration of biological Big DataPablo Pareja Tobes
 
Graph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBGraph databases in computational bioloby: case of neo4j and TitanDB
Graph databases in computational bioloby: case of neo4j and TitanDBAndrei KUCHARAVY
 
GraphTalks - Semantisches Produktdatenmanagement, Dr. Andreas Weber
GraphTalks - Semantisches Produktdatenmanagement, Dr. Andreas WeberGraphTalks - Semantisches Produktdatenmanagement, Dr. Andreas Weber
GraphTalks - Semantisches Produktdatenmanagement, Dr. Andreas WeberNeo4j
 
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...
Jonathan Eisen: Phylogenetic approaches to the analysis of genomes and metage...Jonathan Eisen
 
Procter Vamsas Bosc2009
Procter Vamsas Bosc2009Procter Vamsas Bosc2009
Procter Vamsas Bosc2009bosc
 
Jonathan Eisen talk for #SCS2012 at #ISMB "Networks in genomics and bioinfor...
Jonathan Eisen talk for #SCS2012 at #ISMB  "Networks in genomics and bioinfor...Jonathan Eisen talk for #SCS2012 at #ISMB  "Networks in genomics and bioinfor...
Jonathan Eisen talk for #SCS2012 at #ISMB "Networks in genomics and bioinfor...Jonathan Eisen
 
OBF Address at BOSC 2012
OBF Address at BOSC 2012OBF Address at BOSC 2012
OBF Address at BOSC 2012Hilmar Lapp
 
Chamberlain PhD Thesis
Chamberlain PhD ThesisChamberlain PhD Thesis
Chamberlain PhD Thesisschamber
 
VIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic VariationVIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic VariationJan Aerts
 
Bio::Phylo - phyloinformatic analysis using perl
Bio::Phylo - phyloinformatic analysis using perlBio::Phylo - phyloinformatic analysis using perl
Bio::Phylo - phyloinformatic analysis using perlRutger Vos
 
The role of cost in yeast gene expression
The role of cost in yeast gene expressionThe role of cost in yeast gene expression
The role of cost in yeast gene expressionMichael Barton
 
Tetrahymena genome project 2003 presentation by Jonathan Eisen
Tetrahymena genome project 2003 presentation by Jonathan EisenTetrahymena genome project 2003 presentation by Jonathan Eisen
Tetrahymena genome project 2003 presentation by Jonathan EisenJonathan Eisen
 

Viewers also liked (20)

The power of graphs to analyze biological data
The power of graphs to analyze biological dataThe power of graphs to analyze biological data
The power of graphs to analyze biological data
 
Graph DB + Bioinformatics: Bio4j, recent applications and future directions
Neo4j and bioinformatics

  • 2. But who’s this guy talking here? I am currently working as a bioinformatics consultant/developer/researcher at Oh no sequences! Oh no what!? We are the R&D group at Era7 Bioinformatics. We like bioinformatics, cloud computing, NGS, category theory, bacterial genomics… well, lots of things. What about Era7 Bioinformatics? Era7 Bioinformatics is a bioinformatics company specialized in sequence analysis, knowledge management and sequencing data interpretation. Our area of expertise revolves around biological sequence analysis, particularly Next Generation Sequencing data management and analysis. www.ohnosequences.com www.bio4j.com
  • 3. In bioinformatics we have highly interconnected, overlapping knowledge spread across different DBs.
  • 4. However, in most cases all this data is modeled in relational databases, sometimes even just as plain CSV files. As the amount and diversity of data grow, domain models become crazily complicated!
  • 5. With a relational paradigm, the double implication Entity ⇔ Table does not go both ways.
        - You get ‘auxiliary’ tables that have no relationship with the small piece of reality you are modeling.
        - You need ‘artificial’ IDs only for connecting entities (and these are mixed with IDs that somehow live in reality).
        - Entity-relationship models are cool, but in the end you always have to deal with ‘raw’ tables plus SQL.
        - Integrating/incorporating new knowledge into already existing databases is hard, and sometimes not even possible without changing the domain model.
  • 6. Life in general and biology in particular are probably not 100% like a graph… but one thing’s sure, they are not a set of tables!
  • 8. Neo4j is a high-performance NoSQL graph database with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables, while keeping all the benefits of a fully transactional, enterprise-strength database. For many applications, Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs.
  • 9. What’s Bio4j? Bio4j is a bioinformatics graph-based DB including most data available in:
        - UniProt KB (Swiss-Prot + TrEMBL)
        - NCBI Taxonomy
        - Gene Ontology (GO)
        - RefSeq
        - UniRef (50, 90, 100)
        - Enzyme DB
  • 10. What’s Bio4j? It provides a completely new and powerful framework for querying and managing protein-related information. Since it relies on a high-performance graph engine, data is stored in a way that semantically represents its own structure.
  • 11. What’s Bio4j? Bio4j uses Neo4j technology, a "high-performance graph engine with all the features of a mature and robust database". Thanks both to being based on Neo4j and to the API provided, Bio4j is also very scalable, allowing anyone to easily incorporate their own data and make the most of it.
  • 12. What’s Bio4j? Everything in Bio4j is open source, released under AGPLv3!
  • 13. Bio4j in numbers. The current version (0.7) includes:
        - Relationships: 530.642.683
        - Nodes: 76.071.411
        - Relationship types: 139
        - Node types: 38
  • 14. Let’s dig a bit into the structure of Bio4j… Data sources and their relationships.
  • 16. The Graph DB model: representation. Core abstractions:
        - Nodes
        - Relationships between nodes
        - Properties on both
  • 17. How are things modeled? Couldn’t be simpler!
        - Entities → Nodes
        - Associations / Relationships → Edges
  • 18. Some examples of nodes would be: GO term, Protein, Genome Element. And of relationships: Protein -[PROTEIN_GO_ANNOTATION]-> GO term.
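The model on the last three slides (nodes, typed relationships, properties on both) can be sketched with a toy in-memory structure. This is only an illustration and not the Neo4j or Bio4j API: the class and method names (`ToyPropertyGraph`, `connect`, `neighbours`) and the GO identifier are made up, while the node types, the accession and the relationship type come from the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory property graph; hypothetical, not the Neo4j or Bio4j API. */
final class ToyPropertyGraph {

    /** A node has a type (e.g. "Protein") and free-form properties. */
    static final class Node {
        final String type;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(String type) {
            this.type = type;
        }
    }

    /** A relationship is directed, typed, and can carry properties too. */
    static final class Relationship {
        final String type;
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Creates a relationship and registers it on its start node. */
    static Relationship connect(Node start, String type, Node end) {
        Relationship rel = new Relationship(type, start, end);
        start.outgoing.add(rel);
        return rel;
    }

    /** End nodes reached from 'start' through outgoing relationships of one type. */
    static List<Node> neighbours(Node start, String relType) {
        List<Node> result = new ArrayList<>();
        for (Relationship rel : start.outgoing) {
            if (rel.type.equals(relType)) {
                result.add(rel.end);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node protein = new Node("Protein");
        protein.properties.put("accession", "P12345");
        Node goTerm = new Node("GO term");
        goTerm.properties.put("id", "GO:0000000"); // placeholder id, not a real annotation
        connect(protein, "PROTEIN_GO_ANNOTATION", goTerm);
        for (Node n : neighbours(protein, "PROTEIN_GO_ANNOTATION")) {
            System.out.println(n.type + " " + n.properties.get("id"));
        }
    }
}
```

Note that nothing here needs a join table or an artificial ID: the annotation is simply an edge between the two entities.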
  • 19. We have developed a tool intended to serve both as a reference manual and as a first point of contact with the Bio4j domain model: Bio4jExplorer. Bio4jExplorer allows you to:
        • Navigate through all nodes and relationships
        • Access the javadocs of any node or relationship
        • Graphically explore the neighborhood of a node/relationship
        • Look up the indexes that may serve as an entry point for a node
        • Check incoming/outgoing relationships of a specific node
        • Check start/end nodes of a specific relationship
  • 20. Entry points and indexing. There are two kinds of entry points for the graph:
        Auxiliary relationships going from the reference node, e.g.
        - CELLULAR_COMPONENT: leads to the root of the GO cellular component sub-ontology
        - MAIN_DATASET: leads to both main datasets, Swiss-Prot and TrEMBL
        Node indexing. There are two types of node indexes:
        - Exact: only exact values are considered hits
        - Fulltext: regular expressions can be used
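The difference between the two index types can be illustrated with a small stand-in. This is a sketch under loud assumptions: `ToyNodeIndex` is hypothetical, node ids are invented, accession "P12346" is made up for the example, and pattern lookup is done by a linear scan, which is not how a real index engine works.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy stand-in for a node index; hypothetical, not the Neo4j index API. */
final class ToyNodeIndex {
    // indexed property value -> node id (one node per value, for simplicity)
    private final Map<String, Long> nodeIdByValue = new LinkedHashMap<>();

    void add(String value, long nodeId) {
        nodeIdByValue.put(value, nodeId);
    }

    /** Exact lookup: only an identical value is a hit; returns null otherwise. */
    Long exact(String value) {
        return nodeIdByValue.get(value);
    }

    /** Pattern lookup: every indexed value fully matching the regex is a hit. */
    List<Long> matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<String, Long> entry : nodeIdByValue.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.add(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyNodeIndex index = new ToyNodeIndex();
        index.add("P12345", 1L); // node ids are invented
        index.add("P12346", 2L); // made-up accession for the example
        index.add("Q9UR66", 3L);
        System.out.println(index.exact("P12345"));
        System.out.println(index.matching("P1234."));
    }
}
```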
  • 21. Retrieving protein info (Bio4jModel Java API)
        //--creating manager and node retriever----
        Bio4jManager manager = new Bio4jManager("/mybio4jdb");
        NodeRetriever nR = new NodeRetriever(manager);
        ProteinNode protein = nR.getProteinNodeByAccession("P12345");
        // Getting more related info...
        List<InterproNode> interpros = protein.getInterpro();
        OrganismNode organism = protein.getOrganism();
        List<GoTermNode> goAnnotations = protein.getGOAnnotations();
        List<ArticleNode> articles = protein.getArticleCitations();
        for (ArticleNode article : articles) {
            System.out.println(article.getPubmedId());
        }
        // Don't forget to close the manager
        manager.shutDown();
  • 22. Querying Bio4j with Cypher
        Getting a keyword by its ID:
            START k=node:keyword_id_index(keyword_id_index = "KW-0181")
            return k.name, k.id
        Finding circuits/simple cycles of length 3 where at least one protein is from the Swiss-Prot dataset:
            START d=node:dataset_name_index(dataset_name_index = "Swiss-Prot")
            MATCH d <-[r:PROTEIN_DATASET]- p,
                  circuit = (p) -[:PROTEIN_PROTEIN_INTERACTION]-> (p2) -[:PROTEIN_PROTEIN_INTERACTION]-> (p3) -[:PROTEIN_PROTEIN_INTERACTION]-> (p)
            return p.accession, p2.accession, p3.accession
        Check this blog post for more info and our Bio4j Cypher cheat sheet.
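For readers who have not used Cypher, what the circuit query computes can be spelled out in plain Java. This is a sketch, not how Neo4j evaluates the pattern: the graph is an ordinary adjacency map, the class `ThreeCycles` and the node labels are hypothetical, and the sketch requires the three nodes to be distinct. The `starts` set plays the role of the proteins linked to the Swiss-Prot dataset node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy search for directed cycles of length 3; hypothetical, not Neo4j code. */
final class ThreeCycles {

    /** Adds a directed edge from -> to (think PROTEIN_PROTEIN_INTERACTION). */
    static void addEdge(Map<String, Set<String>> graph, String from, String to) {
        graph.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /**
     * Lists every directed path p -> p2 -> p3 -> p over three distinct nodes
     * where p belongs to 'starts'.
     */
    static List<List<String>> cyclesFrom(Map<String, Set<String>> graph, Set<String> starts) {
        Set<String> none = Collections.emptySet();
        List<List<String>> cycles = new ArrayList<>();
        for (String p : starts) {
            for (String p2 : graph.getOrDefault(p, none)) {
                if (p2.equals(p)) {
                    continue; // skip self-loops
                }
                for (String p3 : graph.getOrDefault(p2, none)) {
                    if (p3.equals(p) || p3.equals(p2)) {
                        continue; // the three nodes must be distinct
                    }
                    if (graph.getOrDefault(p3, none).contains(p)) {
                        cycles.add(Arrays.asList(p, p2, p3));
                    }
                }
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        addEdge(graph, "A", "B");
        addEdge(graph, "B", "C");
        addEdge(graph, "C", "A");
        addEdge(graph, "A", "D"); // dead end, not part of any cycle
        System.out.println(cyclesFrom(graph, Collections.singleton("A")));
    }
}
```

The nested loops make the cost of the pattern visible; the declarative query states the same shape in three lines.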
  • 23. A graph traversal language
        Get a protein by its accession number and return its full name:
            gremlin> g.idx('protein_accession_index')[['protein_accession_index':'P12345']].full_name
            ==> Aspartate aminotransferase, mitochondrial
        Get proteins (accessions) associated with an InterPro motif (limited to 4 results):
            gremlin> g.idx('interpro_id_index')[['interpro_id_index':'IPR023306']].inE('PROTEIN_INTERPRO').outV.accession[0..3]
            ==> E2GK26
            ==> G3PMS4
            ==> G3Q865
            ==> G3PIL8
        Check our Bio4j Gremlin cheat sheet.
  • 24. REST Server
        You can also query/navigate through Bio4j with the Neo4j REST API!
        The default representation is JSON, both for responses and for data sent with POST/PUT requests.
        Get a protein by its accession number (Q9UR66):
            http://server_url:7474/db/data/index/node/protein_accession_index/protein_accession_index/Q9UR66
        Get outgoing relationships for protein Q9UR66:
            http://server_url:7474/db/data/node/Q9UR66_node_id/relationships/out
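The two URL shapes on this slide can be assembled with a couple of helpers. This is a minimal sketch: the path segments are copied from the slide, while the class name `Bio4jRestUris`, the method names, the host `localhost` and the node id `42` are assumptions made for illustration. No request is sent.

```java
import java.net.URI;

/** Builds the two REST URLs shown on the slide; names here are hypothetical. */
final class Bio4jRestUris {

    /** Index lookup of a protein node by accession, e.g. "Q9UR66". */
    static URI proteinByAccession(String serverUrl, String accession) {
        return URI.create(serverUrl
                + "/db/data/index/node/protein_accession_index/protein_accession_index/"
                + accession);
    }

    /** Outgoing relationships of a node, given its numeric node id. */
    static URI outgoingRelationships(String serverUrl, long nodeId) {
        return URI.create(serverUrl + "/db/data/node/" + nodeId + "/relationships/out");
    }

    public static void main(String[] args) {
        String server = "http://localhost:7474"; // assumed host, replace with your server
        System.out.println(proteinByAccession(server, "Q9UR66"));
        System.out.println(outgoingRelationships(server, 42L)); // 42 is a made-up node id
    }
}
```

In practice the node id for the second call would be read from the JSON returned by the first one.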
  • 25. Visualizations (1) → REST Server Data Browser. Navigate through Bio4j data in real time!
  • 26. Visualizations (2) → Bio4j GO Tools
  • 27. Visualizations (3) → Bio4j + Gephi. Get really cool graph visualizations using Bio4j and the Gephi visualization and exploration platform.
  • 28. Bio4j + Cloud. We use AWS (Amazon Web Services) everywhere we can around Bio4j, which gives us the following benefits:
        Interoperability and data distribution
        - Releases are available as public EBS snapshots, so AWS users can create Bio4j DB volumes that are 100% ready and attach them to their instances in just a few seconds.
        - CloudFormation templates: Basic Bio4j DB Instance, Bio4j REST Server Instance
        Backup and storage using S3 (Simple Storage Service)
        - We use S3 both for backup (indirectly, through the EBS snapshots) and for storage (directly, storing RefSeq sequences as independent S3 files).
  • 29. Why would I use Bio4j?
        - Massive access to protein/genome/taxonomy… related information
        - Integration of your own DBs/resources around common information
        - Development of services tailored to your needs built around Bio4j
        - Network analysis
        - Visualizations
        - Besides many others I cannot think of myself…
        If you have something in mind for which Bio4j might be useful, please let us know so we can all see how it could help you meet your needs! ;)
  • 30. Community. Bio4j has a fast-growing internet presence:
        - Twitter: check @bio4j for updates
        - Blog: go to http://blog.bio4j.com
        - Mailing list: ask any question you may have in our list
        - LinkedIn: check the Bio4j group
        - GitHub issues: don’t be shy! Open a new issue if you think something’s going wrong.
  • 31. OK, but why start all this? Were you so bored…?! It all started somehow around our need for massive access to protein GO (Gene Ontology) annotations. At that point I had to develop my own MySQL DB based on the official GO SQL database, and problems started from the beginning:
        - I went crazy ‘deciphering’ how to extract UniProt protein annotations from the official GO table schema.
        - UniProt and GO official protein annotations were not always consistent.
        - Populating my own DB took really long due to all the joins and subqueries needed to get and store the protein annotations.
        Soon enough we also needed massive access to basic protein information.
  • 32. These processes had to be automated for BG7, our bacterial genome annotation system (specifically designed for NGS data). The UniProt web services available were too limited:
        - Slow
        - Limited number of queries
        - Too little information available
        So I downloaded the whole UniProt DB in XML format (Swiss-Prot + TrEMBL) and started to have some fun with it!
  • 33. BG7 algorithm
        1. Selection of the specific reference protein set
        2. Prediction of possible genes by BLAST similarity
        3. Gene definition: merging compatible similarity regions, detecting start and stop
        4. Solving overlapped predicted genes
        5. RNA prediction by BLAST similarity
        6. Final annotation and complete deliverables. Quality control.
        www.era7bioinformatics.com
  • 34. We got used to having massive direct access to all this protein-related information… So why not add other resources that we needed quite often in most projects, and which were now becoming a sort of bottleneck compared to those already included in Bio4j? Then we incorporated:
        - Isoform sequences
        - Protein interactions and features
        - UniRef 50, 90, and 100
        - RefSeq
        - NCBI Taxonomy
        - Enzyme Expasy DB
  • 35. Bio4j + MG7 + 48 BLAST XML files (~1GB each). Some numbers:
        • 157 639 502 nodes
        • 742 615 705 relationships
        • 632 832 045 properties
        • 148 relationship types
        • 44 node types
        And it works just fine!
  • 37. What’s MG7? MG7 lets you choose different parameters to set the thresholds for filtering the BLAST hits:
        i. E-value
        ii. Identity and query coverage
        It allows exporting the results of the analysis to different data formats, such as:
        • XML
        • CSV
        • GEXF (Graph Exchange XML Format)
        It also provides heat maps and graph visualizations, along with a user-friendly interface (MG7Viewer) that gives access to the alignment responsible for each functional or taxonomic read assignment and displays the frequencies in the taxonomic tree.
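Threshold filtering of BLAST hits, as described on this slide, might look like the sketch below. It is not MG7 code: the `Hit` fields, the class `BlastHitFilter`, the subject ids and the threshold values are assumptions, and the sketch applies all three thresholds together, whereas the slide lists E-value and identity/query coverage as separate options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy threshold filter for BLAST hits; hypothetical, not MG7 code. */
final class BlastHitFilter {

    /** Minimal view of a BLAST hit; field names are assumptions. */
    static final class Hit {
        final String subjectId;
        final double eValue;
        final double identityPercent;
        final double queryCoveragePercent;

        Hit(String subjectId, double eValue, double identityPercent, double queryCoveragePercent) {
            this.subjectId = subjectId;
            this.eValue = eValue;
            this.identityPercent = identityPercent;
            this.queryCoveragePercent = queryCoveragePercent;
        }
    }

    /** Keeps hits with a low enough E-value and high enough identity and coverage. */
    static List<Hit> filter(List<Hit> hits, double maxEValue,
                            double minIdentityPercent, double minQueryCoveragePercent) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.eValue <= maxEValue
                    && hit.identityPercent >= minIdentityPercent
                    && hit.queryCoveragePercent >= minQueryCoveragePercent) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("subjectA", 1e-30, 98.0, 95.0),  // passes every threshold
                new Hit("subjectB", 1e-3, 99.0, 99.0),   // E-value too high
                new Hit("subjectC", 1e-40, 70.0, 90.0)); // identity too low
        for (Hit hit : filter(hits, 1e-5, 90.0, 80.0)) {
            System.out.println(hit.subjectId);
        }
    }
}
```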
  • 41. Mining Bio4j data: finding topological patterns in protein-protein interaction networks
  • 42. Finding the lowest common ancestor of a set of NCBI taxonomy nodes with Bio4j
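The lowest common ancestor computation can be sketched on a plain child-to-parent map. This is an illustration only, not the Bio4j implementation: the class `TaxonomyLca`, the map representation and the node labels are assumptions, and the map is assumed to describe a tree (no cycles) whose root has no parent entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy lowest-common-ancestor search on a child -> parent map; hypothetical. */
final class TaxonomyLca {

    /** The node itself followed by its ancestors, ending at the root. */
    static List<String> pathToRoot(Map<String, String> parentOf, String node) {
        List<String> path = new ArrayList<>();
        for (String current = node; current != null; current = parentOf.get(current)) {
            path.add(current);
        }
        return path;
    }

    /** Deepest node that is an ancestor of (or equal to) every given node, or null. */
    static String lowestCommonAncestor(Map<String, String> parentOf, Collection<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        Iterator<String> it = nodes.iterator();
        // Candidates are ordered from the first node up to the root.
        List<String> candidates = pathToRoot(parentOf, it.next());
        while (it.hasNext()) {
            // Drop candidates that are not on the next node's path; order is preserved.
            candidates.retainAll(new HashSet<>(pathToRoot(parentOf, it.next())));
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("A", "root");
        parentOf.put("B", "root");
        parentOf.put("A1", "A");
        parentOf.put("A2", "A");
        parentOf.put("A1x", "A1");
        System.out.println(lowestCommonAncestor(parentOf, Arrays.asList("A1x", "A2")));
        System.out.println(lowestCommonAncestor(parentOf, Arrays.asList("A1x", "B")));
    }
}
```

In a graph database the same walk follows parent relationships between taxonomy nodes instead of map lookups.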
  • 43. Future directions (1)
        Gene flux tool: a new tool for bacterial comparative genomics, for massive tracing of vertical and horizontal gene flux between genome elements based on the analysis of the similarity between their proteins. It would analyze similarity relationships that could be fixed to a 90% or 100% similarity threshold.
        Pathways tool: data from MetaCyc is going to be included in Bio4j. This data would make it possible to dissect the metabolic pathways in which a genome element, organism or community (metagenomic samples) is involved. Gephi could be used to represent the metabolic pathways for each of them.
  • 44. Future directions (2)
        Detector of common annotations in gene clusters: many biological problems are related to the search for common annotations in a set of genes. Some examples:
        - a set of overexpressed genes
        - a set of proteins with local structural similarities (WIP)
        - a set of genes bearing SNPs in cancer samples
        - a set of exclusive genes in a pathogenic bacterial strain
        The detection of common annotations can help in the inference of important functional connections.
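A first approximation of such a detector is a simple count of how many genes in the set carry each annotation. The sketch below is hypothetical throughout: the class `CommonAnnotations`, the gene names and the annotation labels are invented, and a real tool would also need a statistical test against the background frequency of each annotation, which is not attempted here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy detector of annotations shared across a gene set; hypothetical. */
final class CommonAnnotations {

    /** Number of genes in the set that carry each annotation. */
    static Map<String, Integer> countAnnotations(Map<String, Set<String>> annotationsByGene,
                                                 Collection<String> genes) {
        Map<String, Integer> counts = new TreeMap<>();
        // De-duplicate the gene set so that no gene is counted twice.
        for (String gene : new LinkedHashSet<>(genes)) {
            Set<String> annotations = annotationsByGene.get(gene);
            if (annotations == null) {
                continue; // gene without annotations
            }
            for (String annotation : annotations) {
                counts.merge(annotation, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Annotations carried by at least 'minGenes' genes of the set. */
    static Set<String> sharedByAtLeast(Map<String, Set<String>> annotationsByGene,
                                       Collection<String> genes, int minGenes) {
        Set<String> shared = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : countAnnotations(annotationsByGene, genes).entrySet()) {
            if (entry.getValue() >= minGenes) {
                shared.add(entry.getKey());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> annotationsByGene = new HashMap<>();
        annotationsByGene.put("geneA", new HashSet<>(Arrays.asList("term1", "term2")));
        annotationsByGene.put("geneB", new HashSet<>(Arrays.asList("term2", "term3")));
        annotationsByGene.put("geneC", new HashSet<>(Arrays.asList("term2")));
        System.out.println(sharedByAtLeast(annotationsByGene, Arrays.asList("geneA", "geneB", "geneC"), 3));
    }
}
```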
  • 45. That’s it! Thanks for your time ;)