Some considerations on using the two systems to manage molecular biology knowledge networks. This comes from: https://github.com/marco-brandizi/odx_neo4j_converter_test
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
1. ONDEX & Graph DBs
Marco Brandizi, 16/10/2017
2. Goals
• Evaluate graph databases (GDBs), frameworks, etc. in relation to ONDEX needs
• Assess GDBs as kNetMiner/ONDEX backends
• Evaluate a new architecture where raw data access is entirely based on a GDB
• Evaluate a new data exchange format, possibly integrated with one of the GDBs
  • and hence, evaluate the data models too
• Assess data query/manipulation languages (expressivity, ease of use, speed)
• Assess that performance fits ONDEX needs
3. Test Data
• Trait Ontology (TO): 1,500 nodes, is-a and part-of relations (i.e., mostly a tree)
• Gene Ontology (GO): tree with 46k nodes
• AraCyc/BioPAX: heterogeneous net, 23k nodes, 40k relations
• Ara-kNet: heterogeneous net, 350k nodes, 1.15M relations
7. RDF / Linked Data Essentials
• Simple, fine-grained data model: property/value pairs & typed links
• Designed for data integration: universal identifiers, W3C standards
• Strong (even too much) emphasis on knowledge modelling via schemas/ontologies
• Designed for the Web: resolvable URIs, Web APIs
8. RDF / Linked Data Essentials
Integration as a first-class citizen; strong emphasis on knowledge modelling, schemas, ontologies.
10. Example Queries
Count concepts (classes) in the Trait Ontology:

select (count(distinct ?c) as ?n) where {
  ?c a odxcc:TO_TERM.
}

Parts of membrane (transitively):

select distinct ?csup ?supName ?c ?name
where {
  ?csup odx:conceptName ?supName.
  filter ( ?supName = "cellular membrane" )
  ?c odxrt:part_of* ?csup.
  ?c odx:conceptName ?name.
}
limit 1000

Proteins related to pathways:

select distinct ?prot ?pway {
  ?prot odxrt:pd_by|odxrt:cs_by ?react;
        a odxcc:Protein.
  ?react a odxcc:Reaction.
  ?react odxrt:part_of ?pway.
  ?pway a odxcc:Path.
}
limit 1000

Notes: the triple patterns appear in an optimised order; '|' expresses alternatives in property paths.
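As a minimal sketch (not in the original slides) of how such queries can be run programmatically, here is the last one executed with the SPARQLWrapper Python library, assuming a hypothetical local Fuseki dataset at http://localhost:3030/ondex/sparql; the endpoint URL is a placeholder and the prefixes are the ones declared later in this deck:

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Fuseki dataset URL; adjust to the actual endpoint.
endpoint = SPARQLWrapper("http://localhost:3030/ondex/sparql")
endpoint.setQuery("""
prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
prefix odxrt: <http://www.ondex.org/ex/relationType/>

select distinct ?prot ?pway {
  ?prot odxrt:pd_by|odxrt:cs_by ?react;
        a odxcc:Protein.
  ?react a odxcc:Reaction.
  ?react odxrt:part_of ?pway.
  ?pway a odxcc:Path.
}
limit 1000
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
# Each binding row maps variable names to value dictionaries.
for row in results["results"]["bindings"]:
    print(row["prot"]["value"], row["pway"]["value"])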
11. Example Queries
Proteins related to pathways, in full (both branches):

prefix odx: <http://ondex.sourceforge.net/ondex-core#>
prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
prefix odxc: <http://www.ondex.org/ex/concept/>
prefix odxrt: <http://www.ondex.org/ex/relationType/>
prefix odxr: <http://www.ondex.org/ex/relation/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?prot ?pway
where {
  {
    # Branch 1
    ?prot odxrt:pd_by|odxrt:cs_by ?react.
    ?prot a odxcc:Protein.
    ?react a odxcc:Reaction.
    ?react odxrt:part_of ?pway.
    ?pway a odxcc:Path.
  }
  union {
    # Branch 2
    ?prot ^odxrt:ac_by|odxrt:is_a ?enz.
    ?prot a odxcc:Protein.
    ?enz a odxcc:Enzyme.
    {
      # Branch 2.1
      ?enz odxrt:ac_by|odxrt:in_by ?comp.
      ?comp a odxcc:Compound.
      ?comp odxrt:cs_by|odxrt:pd_by ?trns.
      ?trns a odxcc:Transport.
    }
    union {
      # Branch 2.2
      ?enz ^odxrt:ca_by ?trns.
      ?trns a odxcc:Transport.
    }
    ?trns odxrt:part_of ?pway.
    ?pway a odxcc:Path.
  }
} limit 1000
12. RDF Performance
Simple, common queries (Fuseki) [timing chart not transcribed]
13. RDF Performance
Queries over ONDEX paths (Fuseki) [timing chart not transcribed]
14. RDF Performance
Queries over ONDEX paths (Virtuoso) [timing chart not transcribed]
16. Neo4j Essentials
• Designed to back applications (i.e., as an app backend)
  • much less about standards or Web-based sharing
• Very little to manage schemas (more later)
• No native data format (except Cypher; support for GraphML, RDF)
• Initially based on an API only, now Cypher is available
  • compact, easy, no URIs (they can still be used, as plain strings)
• Very performant
• Hasn't much for clustering/federation, but Cypher can be used in TinkerPop
• More commercial (not necessarily a good thing)
• Cool management interface
• Probably easier to use for the average Java developer
Image credits: https://goo.gl/YLhCXG
17. Neo4j Data Model
• Both nodes and relations can have attributes
• Nodes & relations have labels (i.e., string-based types)
• Cool management interface (a SPARQL version might be a student project)
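A minimal sketch of this model through the official Neo4j Python driver; the bolt URL and credentials are placeholders, and the node/relation names reuse examples from the Cypher slides below:

from neo4j import GraphDatabase

# Placeholder connection details; adjust to your Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Labels (:Protein, :Path) act as string-based types; both the
    # nodes and the relationship itself carry properties.
    session.run("""
        CREATE (p:Protein { name: 'P53' })
        CREATE (w:Path { title: 'apoptosis' })
        CREATE (p) - [:participates_in { pvalue: 0.01 }] -> (w)
    """)
driver.close()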
18. Cypher Query/DM Language
Proteins -> Reactions -> Pathways:

// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path { title: 'apoptosis' })
// further conditions, but often not performant
WHERE prot.name =~ '(?i)^DNA.+'
// usual projection and post-selection operators
RETURN prot.name, pway
// relations can have properties
ORDER BY csby.pvalue
LIMIT 1000

Single-path (or same-direction branching) queries are easy to write:

MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000

// very compact forms available, depending on the data
MATCH (prot:Protein) -- (pway:Path) RETURN pway
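A minimal sketch of executing the single-path query above from Python, again via the official Neo4j driver (connection details are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot) AS protId, ID(pway) AS pwayId LIMIT 1000
"""

with driver.session() as session:
    # The result is an iterable of records, keyed by the RETURN aliases.
    for record in session.run(query):
        print(record["protId"], record["pwayId"])
driver.close()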
19. Cypher Query/DM Language
DML features:

MATCH (prot:Protein { name: 'P53' }), (pway:Path { title: 'apoptosis' })
CREATE (prot) - [:participates_in] -> (pway)

DML features, embeddable in Java/Python/etc.:

UNWIND $rows AS row // $rows set by the invoker, programmatically
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
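The UNWIND pattern above is what makes bulk loading efficient: the client sends one query plus a $rows parameter holding many rows, instead of one round trip per relation. A minimal, hypothetical sketch of the invoker side in Python (connection details, ids and attributes are made up for illustration):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders

bulk_query = """
UNWIND $rows AS row
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
"""

rows = [  # example payload only
    { "protId": "250169", "pathId": "2501", "relationAttributes": { "evidence": "IMPD" } },
]

with driver.session() as session:
    # One statement, many rows: the whole batch travels as one parameter.
    session.run(bulk_query, rows=rows)
driver.close()

Because the query string stays constant, it can be planned once and reused; this is exactly what breaks when labels must vary (see slide 25).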
20. Cypher/Neo4j Performance
Simple, common queries [timing chart not transcribed]
21. Cypher/Neo4j Performance
Path queries [timing chart not transcribed]
22. Sounds Good, But…

select distinct ?prot ?pway
where {
  {
    # Branch 1
    …
  }
  union {
    # Branch 2
    …
    {
      # Branch 2.1
    }
    union {
      # Branch 2.2
    }
    …
  }
}

• In Cypher?!
• I couldn't find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
• Partially possible in a straightforward way, but redundantly, e.g., Branch 2:

MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
UNION
MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
23. Addendum
The same SPARQL union skeleton as on the previous slide: in Cypher?!
Unions + branches are partially possible by means of paths in WHERE:

// Branch 2
MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
    OR (tns) - [:ca_by] -> (enz) )                                    // Branch 2.2 (pt 1)
  AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) )     // Branch 2.2 (pt 2)
RETURN prot, path LIMIT 30
UNION
// Branch 1
MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
RETURN prot, path LIMIT 30

• However,
  • it takes 41,249 ms to execute against the wheat net
  • it generates cartesian products and can easily explode
24. Sounds Good, But…
• What about schemas/metadata/ontologies?
• Nodes and relations can only have multiple labels attached, which are just strings. Rich schema operations are not so easy, e.g.:
  • select any kind of protein, including enzymes, cytokines
  • select any type of 'interacts with', including 'catalysed by', 'consumed by', 'produced by' (might require 'inverse of')
• Basically, it has a relational-oriented view of schemas
25. Sounds Good, But…
• Basically, it's relational-oriented about schemas
• We might still be OK with metadata modelled as graphs; however:

MATCH (molecule:Molecule),
      (molType:Class) - [:is_a*] -> (:Class { name: 'Protein' })
WHERE molType.label IN labels(molecule)

  • it's expensive to compute (doesn't exploit indexes)

MATCH (molecule:Molecule:$additionalLabel) CREATE …

  • parameterising on labels is not possible: it requires a non-parametric Cypher string, so UNWIND-based bulk loading is impossible => bad performance
• A programmatic approach is possible, but with a lot of problems, e.g. Lucene version mismatches (one reason being that ONDEX would require a review and a proper plug-in architecture)
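To make the label-parameterisation problem concrete: since a label cannot be a bound parameter, the usual workaround is to splice it into the query text, producing a different (non-cacheable, non-batchable) query string per label. A hypothetical sketch, with placeholder connection details:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders

# The label must be trusted/escaped by hand: it is NOT a bound parameter.
additional_label = "Enzyme"  # hypothetical
query = f"MATCH (molecule:Molecule:`{additional_label}`) RETURN count(molecule) AS n"

with driver.session() as session:
    print(session.run(query).single()["n"])
driver.close()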
26. Flat, RDF-like Model
Code for both converters: https://github.com/marco-brandizi/odx_neo4j_converter_test
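As a rough sketch of the flat model (inferred from the query shape on the next slide, not the converter's actual output): each ONDEX relation becomes a :Relation node wired to its end concepts via :from/:to edges.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders

with driver.session() as session:
    # One ONDEX relation, Protein -cs_by-> Reaction, reified as a node,
    # so it can carry an arbitrary type name and attributes.
    session.run("""
        CREATE (prot:Concept { id: '250169', ccName: 'Protein' })
        CREATE (react:Concept { ccName: 'Reaction' })
        CREATE (csby:Relation { name: 'cs_by' })
        CREATE (csby) - [:from] -> (prot)
        CREATE (csby) - [:to] -> (react)
    """)
driver.close()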
27. Flat Model Impact on Cypher
Structured model:

MATCH (prot:Protein { id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
RETURN * LIMIT 100

Flat model:

MATCH (prot:Concept { id: '250169', ccName: 'Protein' })
      <- [:from] - (csby:Relation { name: 'cs_by' })
      - [:to] -> (react:Concept { ccName: 'Reaction' })
      <- [:from] - (partof:Relation { name: 'part_of' }) - [:to]
      -> (pway:Concept { ccName: 'Path' })
RETURN * LIMIT 100

Rich schema-based queries:

MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
      (cc) <- [:specializationOf*] - (:ConceptClass { name: 'Protein' })
28. Flat Model Performance
Simple, common queries [timing chart not transcribed]
29. Flat Model Performance
Typical ONDEX graph queries [timing chart not transcribed]
30. Impact on Cypher
Rich schema-based queries. From:

MATCH (molecule:Molecule), (molType:Class) - [:is_a*] -> (:Class { name: 'Protein' })
WHERE molType.label IN labels(molecule)

to:

MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
      (cc) <- [:specializationOf*] - (:ConceptClass { name: 'Protein' })

Now it's efficient enough (especially with length restrictions). However…
31. Impact on Cypher
The schema-based query above is now efficient enough, but consider going from:

MATCH (react:Reaction) - [:part_of] -> (pway:Path)

to the flat model's:

MATCH (react:Concept { ccName: 'Reaction' })
      <- [:from] - (partof:Relation { name: 'part_of' })
      - [:to] -> (pway:Concept { ccName: 'Path' })

What if we want a variable-length part_of? That is not currently possible in Cypher (nor in SPARQL), though maybe in the future (https://github.com/neo4j/neo4j/issues/88).
=> Having both models, redundantly, would probably be worthwhile
=> which makes it not so different from RDF
32. Other Issues
• Data exchange format?
  • None, except Cypher itself
  • DML not so performant
  • In particular, no standard data exchange format
  • Could be combined with RDF
• Is Neo4j open source?
  • Produced by a company; only the Community Edition is OSS
  • openCypher is available
• Cypher backed by Gremlin/TinkerPop
  • Apache project, more reliable OSS-wise
  • Performance comparable with Neo4j (https://goo.gl/NK1tn2)
  • More choice of implementations
  • Alternative QL, but more complicated IMHO (Cypher supported)
Image credits: https://goo.gl/ysBFF2
33. Conclusions
Neo4j/graph DBs vs. Virtuoso/triple stores:
• Data exchange format: Neo4j -; triple stores +
• Data model: Neo4j + (relations with properties), - (metadata management); triple stores - (relations cannot have properties, requiring reification), + (metadata as a first-class citizen)
• Performance: Neo4j +; triple stores - (comparable)
• Query language: Neo4j + easier? (e.g., compact, omissions), - expressivity for some patterns (unions, DML); triple stores - harder? (URIs, namespaces, verbosity), + more expressive
• Standardisation, openness: Neo4j -; triple stores +
• Scalability, big data: Neo4j - (TinkerPop probably has better LB/cluster solutions); triple stores: over TinkerPop (via a SAIL implementation)
37. Why?
• Graph + APIs
  • Clearer architecture, open to more applications, not only kNetMiner
  • QL makes it easier to develop further components/analyses/applications
• Standard data model and format
  • Don't reinvent the wheel
  • Data sharing
  • Data and app integration