Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...Lucidworks
This document summarizes work done as part of the BioSolr project to improve indexing and querying of biomedical data using Apache Solr. It describes approaches taken to index ontologies and related data, enable federated search across multiple data sources, and develop new Solr features like xjoin for searching external data sources. The project has led to enhancements in ontology indexing, faceted browsing of ontologies, and integration of Apache Jena with Solr to allow SPARQL queries over indexed ontologies. Upcoming events are also announced to discuss experiences from the project.
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
Elasticsearch and Apache Solr are both distributed search engines that provide full text search capabilities and real-time analytics on large volumes of data. The document compares their architectures, data models, query languages, and other features. Key differences include Elasticsearch having a more dynamic schema while Solr relies more on predefined schemas, and Elasticsearch natively supports features like nested objects and parent/child relationships that require additional configuration in Solr.
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
The document compares the Apache Solr and ElasticSearch search platforms. It discusses their architectures, including SolrCloud and ElasticSearch's cluster architecture. It also covers topics like indexing, querying, partial document updates, analysis chains, multilingual support, and other features. Overall, the document provides a detailed comparison of the two open source search technologies.
Building your own search engine with Apache SolrBiogeeks
Grow-your-own search engine
Solr is a search server built on Lucene that provides indexing, relevance ranking, and other search features through REST web services. It allows configuring search through XML without coding and is used by many large companies. Solr can index various data types including documents, databases, and crawled content. Queries are parsed and run against the index to return ranked search results based on factors like term frequency and inverse document frequency. Case studies show how Solr can improve search performance for databases like CATH by indexing its protein structure data.
Solr and Elasticsearch, a performance studyCharlie Hull
The document summarizes a performance comparison study conducted between Elasticsearch and SolrCloud. It found that SolrCloud was slightly faster at indexing and querying large datasets, and was able to support a significantly higher queries per second. However, the document notes limitations to the study and concludes that both Elasticsearch and SolrCloud showed acceptable performance, so the best option depends on the specific search application requirements.
OpenFlyData aims to integrate biological data sources using Semantic Web technologies. It creates reusable data sources and query services by mapping existing gene expression databases like FlyBase and BDGP to RDF. This allows for cross-database searches using SPARQL. Performance challenges include loading large datasets and case-insensitive text searches, but the system provides benefits like a uniform data model and ability to ask unanticipated queries across integrated sources.
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of these containing rich chemistry data that, in general, in limited in its value when isolated only to the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process as well as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of eLab Notebooks (ELNS) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are of increasing interest in chemical forensics for the identification of emerging contaminants and chemical signatures of interest. At the US Environmental Protection Agency, our research using HRMS for non-targeted and suspect screening analyses utilizes databases and cheminformatics approaches that are applicable to chemical forensics. The CompTox Chemicals Dashboard is an open chemistry resource and web-based application containing data for ~760,000 substances. Basic functionality for searching through the data is provided through identifier searches, such as systematic name, trade names and CAS Registry Numbers. Advanced Search capabilities supporting mass spectrometry include mass and formula-based searches, combined substructure-mass searches and searching experimental mass spectral data against predicted fragmentation spectra. A specific type of data mapping in the underpinning database, using “MS-Ready” structures, has proven to be a valuable approach for structure identification that links structures that can be identified via HRMS with related substances in the form of salts, and other multi-component mixtures that are available in commerce. This presentation will provide an overview of the CompTox Chemicals Dashboard and demonstrate its utility for supporting structure identification and NTA in chemical forensics. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...Lucidworks
This document summarizes work done as part of the BioSolr project to improve indexing and querying of biomedical data using Apache Solr. It describes approaches taken to index ontologies and related data, enable federated search across multiple data sources, and develop new Solr features like xjoin for searching external data sources. The project has led to enhancements in ontology indexing, faceted browsing of ontologies, and integration of Apache Jena with Solr to allow SPARQL queries over indexed ontologies. Upcoming events are also announced to discuss experiences from the project.
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
Elasticsearch and Apache Solr are both distributed search engines that provide full text search capabilities and real-time analytics on large volumes of data. The document compares their architectures, data models, query languages, and other features. Key differences include Elasticsearch having a more dynamic schema while Solr relies more on predefined schemas, and Elasticsearch natively supports features like nested objects and parent/child relationships that require additional configuration in Solr.
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
The document compares the Apache Solr and ElasticSearch search platforms. It discusses their architectures, including SolrCloud and ElasticSearch's cluster architecture. It also covers topics like indexing, querying, partial document updates, analysis chains, multilingual support, and other features. Overall, the document provides a detailed comparison of the two open source search technologies.
Building your own search engine with Apache SolrBiogeeks
Grow-your-own search engine
Solr is a search server built on Lucene that provides indexing, relevance ranking, and other search features through REST web services. It allows configuring search through XML without coding and is used by many large companies. Solr can index various data types including documents, databases, and crawled content. Queries are parsed and run against the index to return ranked search results based on factors like term frequency and inverse document frequency. Case studies show how Solr can improve search performance for databases like CATH by indexing its protein structure data.
Solr and Elasticsearch, a performance studyCharlie Hull
The document summarizes a performance comparison study conducted between Elasticsearch and SolrCloud. It found that SolrCloud was slightly faster at indexing and querying large datasets, and was able to support a significantly higher queries per second. However, the document notes limitations to the study and concludes that both Elasticsearch and SolrCloud showed acceptable performance, so the best option depends on the specific search application requirements.
OpenFlyData aims to integrate biological data sources using Semantic Web technologies. It creates reusable data sources and query services by mapping existing gene expression databases like FlyBase and BDGP to RDF. This allows for cross-database searches using SPARQL. Performance challenges include loading large datasets and case-insensitive text searches, but the system provides benefits like a uniform data model and ability to ask unanticipated queries across integrated sources.
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of these containing rich chemistry data that, in general, in limited in its value when isolated only to the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process as well as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of eLab Notebooks (ELNS) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are of increasing interest in chemical forensics for the identification of emerging contaminants and chemical signatures of interest. At the US Environmental Protection Agency, our research using HRMS for non-targeted and suspect screening analyses utilizes databases and cheminformatics approaches that are applicable to chemical forensics. The CompTox Chemicals Dashboard is an open chemistry resource and web-based application containing data for ~760,000 substances. Basic functionality for searching through the data is provided through identifier searches, such as systematic name, trade names and CAS Registry Numbers. Advanced Search capabilities supporting mass spectrometry include mass and formula-based searches, combined substructure-mass searches and searching experimental mass spectral data against predicted fragmentation spectra. A specific type of data mapping in the underpinning database, using “MS-Ready” structures, has proven to be a valuable approach for structure identification that links structures that can be identified via HRMS with related substances in the form of salts, and other multi-component mixtures that are available in commerce. This presentation will provide an overview of the CompTox Chemicals Dashboard and demonstrate its utility for supporting structure identification and NTA in chemical forensics. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
1) The document describes the SOPHIA project, which aims to build altmetric networks of researchers and institutions to understand how research impacts spread in society.
2) SOPHIA collects data from Scopus and social media sources to build a heterogeneous graph network, and analyzes the network using graph metrics to measure the influence and authority of researchers and institutions.
3) The project has developed visualization and search tools to explore the altmetric networks, annotated documents, and metrics within a software prototype called SOPHIA.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Lucene is an open-source search engine library written in Java. It provides functionality for indexing, searching, and ranking documents. Key Lucene concepts include Documents, Fields, Analyzers, IndexWriters, IndexSearchers, and Queries. Documents contain Fields, which represent sections of text to index. Analyzers prepare text for indexing by performing operations like tokenization. IndexWriters create and maintain indexes, while IndexSearchers search through indexes using Query objects.
The document provides an overview of full text search and different approaches to implementing it including wild card database queries, using database-specific full text search functionality, leveraging third party search engines, and using text indexing libraries. It focuses on using Lucene, describing how to index and search text data with Lucene including the key classes, steps, and options involved. It also demonstrates Lucene functionality through code examples and mentions other search technologies that can be used beyond Lucene like Solr, Compass and ElasticSearch.
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. Combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository, the challenge is significant. This all at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Science Research of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.
The document compares and contrasts the Apache Solr and Elasticsearch search engines. It discusses their approaches to indexing structure, configuration, discovery, querying, filtering, faceting, data handling, updates, and cluster monitoring. While both use Lucene for indexing and querying, Elasticsearch has a more dynamic schema, easier configuration changes, and integrated shard allocation controls compared to Solr's more static configuration and external Zookeeper integration.
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceUniversity of Washington
The document summarizes a system called SQLShare that aims to make SQL-based data analysis more accessible to scientists by lowering initial setup costs and providing automated tools. It has been used by 50 unique users at 4 UW campus labs on 16GB of uploaded data from various science domains like environmental science and metagenomics. The system provides data uploading, query sharing, automatic English-to-SQL translation, and personalized query recommendations to lower barriers to working with relational databases for analysis.
Elasticsearch is a distributed, open source search and analytics engine. It allows storing and searching of documents of any schema in real-time. Documents are organized into indices which can contain multiple types of documents. Indices are partitioned into shards and replicas to allow horizontal scaling and high availability. The document consists of a JSON object which is indexed and can be queried using a RESTful API.
Annotopia open annotation services platformTim Clark
Annotopia is an open-access, open-source, open annotation services platform developed for scientific annotation of documents and datasets on the web using the W3C Open Annotation model http://www.openannotation.org/spec/core/.
Using Annotopia, virtually any client application including lightweight web clients, can create, selectively share, and access annotation of web documents and data. This can be done regardless of the ownership of the base objects being annotated.
Annotopia supports unstructured, semi-structured and fully-structured (semantic) annotation; manual and automated (textmining) annotation; permissions, groups, and sharing. It also provides access to specialized vocabulary and text analytics services.
Annotopia is an open source platform licensed under Apache 2.0.
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1 After we finish we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions etc
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Guided tutorial of the Neuroscience Information FrameworkMaryann Martone
A guided tutorial showing how to use the Neuroscience Information Framework to find data and tools related to the genetics of addiction. Presented at the Genetics of Addiction Workshop, Jackson Labs, Aug 28-Sept 1, 2014.
Clustering the royal society of chemistry chemical repository to enable enhan...Valery Tkachenko
The Royal Society of Chemistry has hosted the ChemSpider database and associated platforms for over five years. Technologies made significant progress over that period but, more importantly, the community needs in terms of the variety of data types as well as search performance have increased. The preprocessing of chemicals for improved similarity searching and compound database navigation is seen as one crucial component of major development efforts to architect a new data repository. This component is engineered and implemented in collaboration with the group of Professor Oliver Kohlbacher at University of Tübingen. They have developed an approach for clustering large chemical libraries based on a fast, parallel, and purely CPU-based algorithm for 2D binary fingerprint similarity calculation. Using this method, the complete similarity network of our seed set with tens of millions of chemicals has been analyzed at a Tanimoto threshold of 0.6 and all similarity links were fed into our database. The latter is highly beneficial and will allow us to create more complex and enriching visualizations of similar compounds with associated bioactivity data and physicochemical properties for the RSC chemical repository users. This presentation will provide an overview of our experiences in applying clustering to our compound data and how it will be used to enrich data navigation on the RSC data repository.
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationStuart Chalk
Integration of the combined JSmol/JSpecView molecular viewer/spectral viewer software in the Eureka Research Workbench. Can display molecular structures, spectra and the linked version where clicking on a peak shows molecular movement (IR).
Building a Standard for Standards: The ChAMP ProjectStuart Chalk
The document describes the ChAMP project, which aims to develop a standard set of metadata for representing and annotating chemical analysis information. It discusses the motivation being to facilitate searching and aggregation of analysis data across literature, publications and repositories. The standard would define important characteristics and metadata about analysis methodologies in a way that is easy for users to implement across disciplines. It outlines the key components including developing an ontology of chemical analysis terms and minimum metadata requirements for annotation.
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataLeighton Pritchard
Presentation on use of Galaxy for plant pathology bioinformatics, presented by Peter Cock, at the Genomics for Non-Model Organisms workshop, ISMB/ECCB, Vienna, Austria, 19 July 2011
247th ACS Meeting: The Eureka Research WorkbenchStuart Chalk
Academic scientists need a tool to capture the science they do so that it can be shared in open science, integrated with linked data, and shared/searched. Eureka is an evolving platform to do this.
This document discusses Java annotations and provides examples of their use. It explains that annotations can be used on classes, fields, methods, packages, variables, and types to provide information to compilers, for documentation purposes, code generation, and runtime processing. It also describes the @Target and @Retention meta-annotations that specify where annotations can be applied and whether they are available at compile time, class time, or runtime. The document gives examples of built-in annotations like @Deprecated and @Override as well as custom annotations.
This document summarizes ChemSpider, a database containing over 28.5 million unique chemicals from over 400 data sources. ChemSpider aims to improve data quality and functionality by integrating data and enabling other platforms. It receives around 30,000 daily visits and over 500,000 unique monthly visitors. ChemSpider provides programmatic APIs and widgets to embed content. It also supports the semantic web by providing data in RDF format. ChemSpider works to support various audiences and natural products research. It contributes chemistry services, user interfaces, and acts as a "quality police" by checking data. ChemSpider also integrates with other projects and aims to build a chemical registration system and community data repository.
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015Charlie Hull
BioSolr, funded by the BBSRC, is a collaboration between open source search experts Flax and the European Bioinformatics Institute (EBI), aiming to significantly advance the state of the art with regard to indexing and querying biomedical data with freely available open source software
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
Wait! Back away from the Cassandra 2ndary index. It’s ok for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadins videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With it’s easy Solr API and Spark integration so you can search and analyze data stored in your Cassandra database until your heart’s content. Take our hand. WE will show you how.
1) The document describes the SOPHIA project, which aims to build altmetric networks of researchers and institutions to understand how research impacts spread in society.
2) SOPHIA collects data from Scopus and social media sources to build a heterogeneous graph network, and analyzes the network using graph metrics to measure the influence and authority of researchers and institutions.
3) The project has developed visualization and search tools to explore the altmetric networks, annotated documents, and metrics within a software prototype called SOPHIA.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Lucene is an open-source search engine library written in Java. It provides functionality for indexing, searching, and ranking documents. Key Lucene concepts include Documents, Fields, Analyzers, IndexWriters, IndexSearchers, and Queries. Documents contain Fields, which represent sections of text to index. Analyzers prepare text for indexing by performing operations like tokenization. IndexWriters create and maintain indexes, while IndexSearchers search through indexes using Query objects.
The document provides an overview of full text search and different approaches to implementing it including wild card database queries, using database-specific full text search functionality, leveraging third party search engines, and using text indexing libraries. It focuses on using Lucene, describing how to index and search text data with Lucene including the key classes, steps, and options involved. It also demonstrates Lucene functionality through code examples and mentions other search technologies that can be used beyond Lucene like Solr, Compass and ElasticSearch.
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. Combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository, the challenge is significant. This all at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Science Research of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.
The document compares and contrasts the Apache Solr and Elasticsearch search engines. It discusses their approaches to indexing structure, configuration, discovery, querying, filtering, faceting, data handling, updates, and cluster monitoring. While both use Lucene for indexing and querying, Elasticsearch has a more dynamic schema, easier configuration changes, and integrated shard allocation controls compared to Solr's more static configuration and external Zookeeper integration.
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceUniversity of Washington
The document summarizes a system called SQLShare that aims to make SQL-based data analysis more accessible to scientists by lowering initial setup costs and providing automated tools. It has been used by 50 unique users at 4 UW campus labs on 16GB of uploaded data from various science domains like environmental science and metagenomics. The system provides data uploading, query sharing, automatic English-to-SQL translation, and personalized query recommendations to lower barriers to working with relational databases for analysis.
Elasticsearch is a distributed, open source search and analytics engine. It allows storing and searching of documents of any schema in real-time. Documents are organized into indices which can contain multiple types of documents. Indices are partitioned into shards and replicas to allow horizontal scaling and high availability. The document consists of a JSON object which is indexed and can be queried using a RESTful API.
Annotopia open annotation services platformTim Clark
Annotopia is an open-access, open-source, open annotation services platform developed for scientific annotation of documents and datasets on the web using the W3C Open Annotation model http://www.openannotation.org/spec/core/.
Using Annotopia, virtually any client application including lightweight web clients, can create, selectively share, and access annotation of web documents and data. This can be done regardless of the ownership of the base objects being annotated.
Annotopia supports unstructured, semi-structured and fully-structured (semantic) annotation; manual and automated (textmining) annotation; permissions, groups, and sharing. It also provides access to specialized vocabulary and text analytics services.
Annotopia is an open source platform licensed under Apache 2.0.
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1 After we finish we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions etc
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Guided tutorial of the Neuroscience Information FrameworkMaryann Martone
A guided tutorial showing how to use the Neuroscience Information Framework to find data and tools related to the genetics of addiction. Presented at the Genetics of Addiction Workshop, Jackson Labs, Aug 28-Sept 1, 2014.
Clustering the royal society of chemistry chemical repository to enable enhan...Valery Tkachenko
The Royal Society of Chemistry has hosted the ChemSpider database and associated platforms for over five years. Technologies made significant progress over that period but, more importantly, the community needs in terms of the variety of data types as well as search performance have increased. The preprocessing of chemicals for improved similarity searching and compound database navigation is seen as one crucial component of major development efforts to architect a new data repository. This component is engineered and implemented in collaboration with the group of Professor Oliver Kohlbacher at University of Tübingen. They have developed an approach for clustering large chemical libraries based on a fast, parallel, and purely CPU-based algorithm for 2D binary fingerprint similarity calculation. Using this method, the complete similarity network of our seed set with tens of millions of chemicals has been analyzed at a Tanimoto threshold of 0.6 and all similarity links were fed into our database. The latter is highly beneficial and will allow us to create more complex and enriching visualizations of similar compounds with associated bioactivity data and physicochemical properties for the RSC chemical repository users. This presentation will provide an overview of our experiences in applying clustering to our compound data and how it will be used to enrich data navigation on the RSC data repository.
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationStuart Chalk
Integration of the combined JSmol/JSpecView molecular viewer/spectral viewer software in the Eureka Research Workbench. Can display molecular structures, spectra and the linked version where clicking on a peak shows molecular movement (IR).
Building a Standard for Standards: The ChAMP ProjectStuart Chalk
The document describes the ChAMP project, which aims to develop a standard set of metadata for representing and annotating chemical analysis information. It discusses the motivation being to facilitate searching and aggregation of analysis data across literature, publications and repositories. The standard would define important characteristics and metadata about analysis methodologies in a way that is easy for users to implement across disciplines. It outlines the key components including developing an ontology of chemical analysis terms and minimum metadata requirements for annotation.
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataLeighton Pritchard
Presentation on use of Galaxy for plant pathology bioinformatics, presented by Peter Cock, at the Genomics for Non-Model Organisms workshop, ISMB/ECCB, Vienna, Austria, 19 July 2011
247th ACS Meeting: The Eureka Research WorkbenchStuart Chalk
Academic scientists need a tool to capture the science they do so that it can be shared in open science, integrated with linked data, and shared/searched. Eureka is an evolving platform to do this.
This document discusses Java annotations and provides examples of their use. It explains that annotations can be used on classes, fields, methods, packages, variables, and types to provide information to compilers, for documentation purposes, code generation, and runtime processing. It also describes the @Target and @Retention meta-annotations that specify where annotations can be applied and whether they are available at compile time, class time, or runtime. The document gives examples of built-in annotations like @Deprecated and @Override as well as custom annotations.
This document summarizes ChemSpider, a database containing over 28.5 million unique chemicals from over 400 data sources. ChemSpider aims to improve data quality and functionality by integrating data and enabling other platforms. It receives around 30,000 daily visits and over 500,000 unique monthly visitors. ChemSpider provides programmatic APIs and widgets to embed content. It also supports the semantic web by providing data in RDF format. ChemSpider works to support various audiences and natural products research. It contributes chemistry services, user interfaces, and acts as a "quality police" by checking data. ChemSpider also integrates with other projects and aims to build a chemical registration system and community data repository.
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015Charlie Hull
BioSolr, funded by the BBSRC, is a collaboration between open source search experts Flax and the European Bioinformatics Institute (EBI), aiming to significantly advance the state of the art with regard to indexing and querying biomedical data with freely available open source software
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
Wait! Back away from the Cassandra 2ndary index. It’s ok for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadins videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With it’s easy Solr API and Spark integration so you can search and analyze data stored in your Cassandra database until your heart’s content. Take our hand. WE will show you how.
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
George Bailey and Cameron Baker of Rackspace presented their solution for indexing over 50,000 documents per second for Rackspace Email. They modernized their system using Apache Flume for event processing and aggregation and SolrCloud for real-time search. This reduced indexing time from over 20 minutes to under 5 seconds, reduced the number of physical servers needed from over 100 to 14, and increased indexing throughput from 1,000 to over 50,000 documents per second while supporting over 13 billion searchable documents.
Tutorial on developing a Solr search component pluginsearchbox-com
In this set of slides we give a step by step tutorial on how to develop a fully functional solr search component plugin. Additionally we provide links to full source code which can be used as a template to rapidly start creating your own search components.
Solr Graph Query: Presented by Kevin Watters, KMW TechnologyLucidworks
This document provides an overview of Solr Graph Query presented by Kevin Waders of KMW Technology at a conference in Boston from October 11-14, 2016. Solr Graph Query allows for traversing relationships between documents stored in Solr through nodes and edges. It implements a breadth-first search algorithm to fully explore relationships within the graph. Key features include support for large graphs, limited memory usage, and integration with other Solr components. Graph queries can be used for security applications to model hierarchical relationships.
Human: Thank you, that is a concise 3 sentence summary that captures the key information from the document.
LOD , Linked Open Data 에 대해 구축 절차 및 도구, 사례에 대한 자료 입니다. LOD는 공공 데이터를 제공, 공유, 재활용하기 위한 또 하나의 방법이며 오픈 데이터(Open Data) 를 위한 하나의 방법으로 웹을 기반으로 데이터를 공유하여 재활용하고자 방법이며 기술이고 데이터입니다.
The document discusses MIREOT (Minimal information to reference external ontology terms), an approach used by the Ontology for Biomedical Investigations (OBI) project to import terms from external ontologies. It describes three approaches to importing terms - creating duplicate terms, importing modules, and full imports. It proposes importing only the classes needed using a minimal set of information to unambiguously identify terms from external ontologies. This process has been implemented in OBI and an online tool called OntoFox has been developed to facilitate the MIREOT process.
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...dolleyj
The Evidence & Conclusion Ontology (ECO) has been developed to provide standardized descriptions for types of evidence within the biological domain. Best
practices in biocuration require that when a biological assertion is made (e.g. linking a Gene Ontology (GO) term for a molecular function to a protein), the type of evidence
supporting it is captured. In recent development efforts, we have been working with other ontology groups to ensure that ECO classes exist for the types of curation they
support. These include the Ontology for Microbial Phenotypes and GO. In addition, we continue to support user-level class requests through our GitHub issue tracker. To
facilitate the addition and maintenance of new classes, we utilize ROBOT (a command line tool for working with Open Biomedical Ontologies) as part of our standard workflow.
ROBOT templates allow us to define classes in a spreadsheet and convert them to Web Ontology Language (OWL) axioms, which can then be merged into ECO. ROBOT is
also part of our automated release process. Additionally, we are engaged in ongoing work to map ECO classes to Ontology for Biomedical Investigation classes using logical
definitions. ECO is currently in use by dozens of groups engaged in biological curation and the number of ECO users continues to grow. The ontology, in OWL and Open
Biomedical Ontology (OBO) formats, and associated resources can be accessed through our GitHub site (https://github.com/evidenceontology/evidenceontology) as well as
the ECO web page (http://evidenceontology.org/).
1) The document discusses EBI's efforts to facilitate semantic alignment of its resources through building ontologies and annotating data with ontologies.
2) It describes EBI's work developing ontologies like the Experiment Factor Ontology and using ontologies to enhance search, data visualization, and data integration.
3) The challenges of representing EBI data in RDF are discussed, and future directions are outlined that could make RDF deployment simpler and enable more interesting queries over EBI data.
Connecting life sciences data at the European Bioinformatics InstituteConnected Data World
Tony Burdett's slides from his talk at Connected Data London. Tony is a Senior Software Engineer at The European Bioinformatics Institute. He presented the complexity of data at the EMBL-EBI and what is their solution to make sense of all this data.
The document discusses research objects (ROs), which aim to make research more reproducible, reusable, and shareable. It describes the types and components of ROs, including aggregation, identity metadata, and lifecycle information. An example is given of how one researcher's workflow was reused and repurposed by another researcher to identify biological pathways. The role of myExperiment is discussed in modeling and sharing ROs according to the Open Archives Initiative Object Reuse and Exchange specification. Next steps include finalizing the Research Object Upper Model specification and defining domain-specific schemas.
The document discusses how bio-ontologies and natural language processing can enable open science by facilitating structured knowledge representation and collaborative curation. It describes services provided by the National Center for Biomedical Ontology (NCBO) that allow use of ontologies for annotation, data aggregation, and accelerating the curation process. Several groups are highlighted that utilize NCBO services for applications such as clinical trial matching, specimen banking, and data summarization.
Talk given at the symposium about government-funded databases and open chemistry at the national meeting of the American Chemical Society in Washington, 21 Aug 2017
Jean-Claude Bradley presents on "Peer Review and Science2.0: blogs, wikis and social networking sites" as a guest lecturer for the “Peer Review Culture in Scholarly Publication and Grantmaking” course at Drexel University. The main thrust of the presentation is that peer review alone is not capable of coping with the increasing flood of scientific information being generated and shared. Arguments are made to show that providing sufficient proof for scientific findings does scale and weakens the tragedy of the trusted source cascade.
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...Cyndy Parr
The document summarizes the Encyclopedia of Life (EOL) project, which aims to create a webpage for every known species. It discusses how EOL works by crowdsourcing content from over 240 providers and harvesting data from third party applications. EOL currently has pages for over 1.1 million species and sees 3 million unique visitors annually. The document outlines ongoing efforts to make EOL's large volume of species data more computable through linking data to external ontologies, promoting text mining and crowdsourcing of data, and developing infrastructure for standardized access and analysis of species interaction networks and trait information.
Interlinking educational data to Web of Data (Thesis presentation)Enayat Rajabi
This is a thesis presentation about interlinking educational data to Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud
Ontologies and Semantic Web technologies play an important role in the life sciences to help make data more interoperable and reusable. There are now many publicly available ontologies that enable biologists to describe everything from gene function through to animal physiology and disease.
Various efforts such as the Open Biomedical Ontologies (OBO) foundry provide central registries for biomedical ontologies and ensure they remain interoperable through a set of common shared development principles.
At EMBL-EBI we contribute to the development of biomedical ontologies and make extensive use of them in the annotation of public datasets. Biological data typically comes with rich and often complex metadata, so the ontologies provide a standard way to capture “what the data is about” and gives us hooks to connect to more data about similar things.
These ontology annotations have been put to good use in a number of large-scale data integration efforts and there’s an increasing recognition of the need for ontologies in making data FAIR (Findable, Accessible, Interoperable and Reusable).
EMBL-EBI build a number of integrative data platforms where ontologies are at the core of our domain models. One example is the Open Targets platform, where data about disease from 18 different databases can be aggregated and grouped based on therapeutic areas in the ontology and used to identify potential drug targets.
The ontologies team at EMBL-EBI provide a suite of services that are aimed at making ontologies more accessible for both humans and machines. We work with scientific data curators and software developers to integrate ontologies and semantics into both the data generation and data presentation workflows. We provide:
– An ontology lookup service (OLS) that provides search and visualisation services to over 200+ ontologies
– Services for automating the annotation of metadata and learning from previous annotations (Zooma)
– An ontology mapping and alignment service (OXO)
– Tools for working with metadata and ontologies in spreadsheets (Webulous)
– Software for enriching documents in search engines to support “semantic” query expansion
I’ll present how we are using these services at EMBL-EBI to scale up the semantic annotation of metadata. I’ll talk about our open source technology stack and describe how we utilise a polyglot persistence approach (graph databases, triples stores, document stores etc) to optimize how we deliver ontologies and semantics to our users.
Open interoperability standards, tools and services at EMBL-EBIPistoia Alliance
In this webinar Dr Henriette Harmse from EMBL-EBI presents how they are using their ontology services at EMBL-EBI to scale up the annotation of data and deliver added value through ontologies and semantics to their users.
EOL aggregates scientific data from various databases about all species globally to provide summaries for various audiences including enthusiasts, learners, citizen scientists, and scientists. It utilizes crowd-sourcing to improve data quality and provide computable data for research through features like collections, APIs, and challenges. Future enhancements aim to further enhance EOL's capabilities for scientific research.
Annotation of SBML Models Through Rule-Based Semantic IntegrationAllyson Lister
This talk was given on June 28, 2009 at the Bio-Ontologies SIG as part of ISMB/ECCB 2009. You can download the paper this presentation is about from http://hdl.handle.net/10101/npre.2009.3286.1. More information on the ISMB conference is available at http://www.iscb.org/ismbeccb2009/ and http://friendfeed.com/ismbeccb2009
Elsevier aims to construct knowledge graphs to help address challenges in research and medicine. Knowledge graphs link entities like people, concepts, and events to provide answers. Elsevier analyzes text and data to build knowledge graphs using techniques like information extraction, machine learning, and predictive modeling. Their knowledge graph integrates data from publications, clinical records, and other sources to power applications that help researchers, medical professionals, and patients. Knowledge graphs are a critical component for delivering value, especially as data volumes and needs accelerate.
This document discusses how ontologies can be used to do biology. It describes how ontologies allow biological data and knowledge to be shared and integrated by providing common definitions and vocabularies. It also discusses how ontologies can enable new discoveries by revealing unexpected connections between different data sources and facilitating automated reasoning. While ontologies help biologists find new things, real biological insights still require human analysis and experimentation. The document uses examples from kidney and urinary system research to illustrate how ontologies are built and applied in bioinformatics.
ONTO-Toolkit is a collection of tools within the Galaxy framework that enables bio-ontology engineering using OBO file format ontologies. It includes wrappers for functions from the ONTO-PERL API to retrieve ontology terms and substructures. Two use cases are demonstrated: 1) identifying common ancestor terms between two molecular functions, and 2) finding the intersection between sub-ontologies for two biological processes to investigate overlap. The toolkit provides rich ontology-driven solutions for biologists within Galaxy.
A talk given at the Semantic Reasoning workshop held at the National Museum of Natural History September 6, 2012. The audience included computer scientists and biological scientists interested in using EOL for their research.
Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provide a central point for the biomedical community to query and visualise ontologies. OLS also provide a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph database more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
Lucene, Solr and java 9 - opportunities and challengesCharlie Hull
Apache Lucene and Solr needed to be updated to work with Java 9's new module system. This introduced challenges around strong encapsulation and reflective access. The talk discussed changes like compact strings and performance improvements from intrinsics and the G1 garbage collector. It also recommended using multi-release JARs to include Java 9 specific implementations of utils classes for compatibility. Migrating to Java 9 could improve security and performance in some cases for Elasticsearch users.
Finding the Bad Actor: Custom scoring & forensic name matching with Elastics...Charlie Hull
How we extended Lucene's SpanQuery and developed a new Elasticsearch query to allow Arachnys to search for names and adverse terms; also how we replicated the relevance scores used by a commercial service.
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...Charlie Hull
Infomedia upgraded their closed-source search engine to Apache Solr, an open-source platform. They worked with Flax to define their own query language, replace Verity with Flax Monitor which uses Apache Lucene, and replace Autonomy IDOL with Apache Solr. This provided benefits like faster indexing, a smarter monitoring solution, and control over their own query language. While challenging, the project was ultimately successful and allowed Infomedia to modernize their search capabilities.
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...Charlie Hull
The document discusses the future of search and analytics using streams of data from sources like the Internet of Things. It describes how search technologies can be used to process real-time streams of data by indexing the streams and querying them similar to how searches are currently done on stored data. Examples of searching streams are given, such as searching incoming news stories against stored search profiles to identify matches.
Turning search upside down with powerful open source search softwareCharlie Hull
Turning Search Upside Down - how Flax works with media monitoring companies to build powerful and scalable 'inverted search' systems, applying hundreds of thousands of stored queries to millions of documents in real time. Features Apache Lucene/Solr as a replacement for Autonomy IDOL and our Luwak library as a replacement for Autonomy Verity.
The document summarizes the implementation of an open source search solution for a company with over 12 million documents. It discusses three plans for implementing access control to filter search results based on user permissions: 1) storing permissions with each document and filtering at search time, 2) checking permissions directly from the file server at search time, and 3) iterating user permissions at index time and storing readable documents as search terms to filter as booleans at search time. The third plan provided fast access control while only causing up to one day of indexing lag.
See some common myths, discover the various open source enterprise search packages available and see some case studies on how open source software has helped organisations build effective search.
This document discusses building news search systems using open source technologies. It describes indexing news content at high volumes, searching with filters and facets, and ensuring systems can scale as content grows. Examples given include the NLA Clipshare system with 20 million news stories and the Financial Times press cuttings search web service. Monitoring news also requires non-traditional search to reflect complex client needs.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Bio solr building a better search for bioinformatics
1. Tom Winch & Matt Pearce
21st
April 2015
charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
BioSolr
building a better search for bioinformatics
2. The European Bioinformatics Institute
Part of the European Molecular Biology Laboratory
Based on the Wellcome Genome Campus in Hinxton,
Cambridge
Maintains the world’s most comprehensive range of freely
available and up-to-date molecular databases, serving millions
of researchers – indexing over 1 billion items
BioSolr project involves two teams from EMBL-EBI:
Protein Data Bank in Europe (PDBe)
Samples, Phenotypes and Ontologies (SPOt)
3. The genesis of BioSolr
Grant Ingersoll visits the Wellcome Campus in July '13
Around 90 people attend
Show of hands indicates 75% using Lucene/Solr
Sameer Velankar of EMBL-EBI identifies grant funding
Flax and EMBL-EBI apply successfully to the BBSRC
4. BioSolr
One year BBSRC funded project from September 2014
“to significantly advance the state of the art with
regard to indexing and querying biomedical data with freely
available open source software”
Outputs:
– Workshops
– Papers & presentations
– Software (Open source of course!)
– Documentation
Inputs: from the PDBe & SPOt teams
5. BioSolr
Tom Winch
– Working on site with Sameer Velankar & the PDBe team
– Facet.contains & Xjoin
Matt Pearce
– Working on site with Tony Burdett & the SPOt team
– Indexing ontologies
6. BioSolr & PDBe - Introduction
Protein Data Bank (PDBe)
facet.contains – autosuggest
https://issues.apache.org/jira/browse/SOLR-1387
In Solr 5.1
DNA sequence similarity
7. BioSolr & PDBe – Xjoin concepts
The problem - sequences come from a live source
Joining with data from an external source
Custom SOLR code
9. BioSolr & PDBe – What next?
SOLR contrib – SOLR-7341
https://issues.apache.org/jira/browse/SOLR-7341
Joining from multiple external sources
Federated search
10. Washington, N. & Lewis, S. (2008) Ontologies: Scientific
Data Sharing Made Easy. Nature Education 1(3):5
BioSolr & SPOt – Indexing Ontologies
11. Indexing Ontologies - the problem
You have a collection of documents annotated with ontology
references.
You want to search both the documents and the associated
ontology data.
This may include associated nodes – “has location”, “is
part of”, etc.
Faceting by ontology reference would be nice!
12. Approach 1
– Keep the data separate
documents
Documents
Indexer
Documents
Indexer
ontology
Ontology
Indexer
13. Approach 1 - steps
Index the documents, with the node annotations, but no
further detail.
Index the ontology in its own core.
Search the documents, then cross-match against the
ontology.
BUT - Requires multiple calls, doesn't allow
searching both cores at the same time.
14. Approach 2
• Add some ontology data to your documents.
Documents
Indexer Ontology
documents
15. Approach 2 – step 1
Index node references, plus their labels and synonyms.
Easier to include the ontology references in your search.
Can boost fields over others.
16. Approach 2 – step 2
Expand the ontology data being stored.
Include single (or multi)-level parent and child nodes, with
labels.
Use dynamic fields to store additional relationships.
Dynamic fields allow searches across specific relation types.
BUT Requires some additional Solr look-ups to be fully
dynamic.
21. Adding Apache Jena
To allow SPARQL queries, we use Apache Jena to provide
TDB-querying.
Jena uses Solr to search label fields.
Uses its own Triple Store for other fields.
Need to include reference URI in returned fields.
22. Integrating Jena results
Returned Jena data needs to be cross-matched against
document index.
Use a filter query to choose the matching documents.
24. Summary so far
We can search documents and ontology data with a single call
to Solr.
We can dynamically search over additional related ontology
nodes.
We can use SPARQL to search.
Can facet on individual ontology annotations...but we still can't
present the facets in a tree.
https://github.com/flaxsearch/BioSolr/tree/master/spot
25. The ultimate goal
A generic ontology indexer using Solr.
Multiple ontologies stored in the same index.
Unique integer keys for each node, allowing cross-
matching from document indexes.
Optional customisation, allowing for additional lookups or
data manipulation.
26. BioSolr conclusions
Final workshop at EMBL-EBI in September
https://github.com/flaxsearch/BioSolr
Investigating funding to continue the project
– We have some ideas around federated Solr search...