The document describes a framework for biological relation extraction using biomedical ontologies and text mining. It introduces biomedical text mining and outlines the problem, motivation, and challenges. It then presents the overall system components and architecture, including searching and browsing, Swanson's algorithm, protein-protein interactions, and gene clustering applications. Framework concept issues, design issues, a sequence diagram, and the database are also covered at a high level.
3. Agenda
Introduction to Biomedical Text Mining
System Overview
  Problem Description
  Motivation
  Challenges
System Framework
Application upon System Framework
  Swanson’s Algorithm
  Protein to Protein Interactions (PPI)
  Gene Clustering based on Text Mining
Extended Work
Conclusion and Future Work
5. Introduction to Biomedical Text Mining
Text Mining = Processing unstructured (textual) information to extract meaningful data and make the information contained in the text accessible to various data mining (statistical and machine learning) algorithms.
Biomedical Text Mining = Text mining applied to biomedical documents.
7. System Overview
Problem Description
A huge amount of information is stored in millions of documents
This information can be used effectively to solve many problems:
Knowledge retrieval without much effort
Discovering relationships between different entities
Assessing relationship strength between different entities
Grouping entities into different clusters
8. System Overview
Motivation:
Build semantic structure of documents which
facilitates navigation through thousands of
documents.
Extract relationships between biomedical terms using
text mining techniques with aid of biomedical
ontologies.
Using text mining to group genes into different clusters.
9. System Overview
Challenges:
Concept Recognition
Build semantic structure of annotated documents using
ontologies
Relationship Recognition
Similarity (distance) between different entities.
10. Overall System Components
Framework
Searching and Browsing
Swanson’s Algorithm
PPI
Gene Clustering
15. System Framework
Objective:
Use ontologies to markup biomedical text documents.
Based on established semantic links between documents
and ontology concepts, build a semantic
representation of the information.
Provide services to other applications and users.
18. Framework Concept Issues
[Diagram; recoverable labels: User, Expanded Query, Query Expansion, Query Fetching, Documents, Search PubMed, Gene Documents, Ontology, Extract GO terms, Annotate PubMed documents, Structure Representation of documents, Annotated Documents]
19. System Framework
PubMed:
Largest document source in the biomedical field
Contains over 18 million documents
Maintained by the United States National Library
of Medicine (NLM)
Indexes all documents by MeSH terms to facilitate
searching and retrieval
20. System Framework
Gene Ontology:
The Gene Ontology project is a major
bioinformatics initiative with the aim of
standardizing the representation of gene and gene
product attributes across species and databases
Includes a controlled vocabulary of terms for
describing gene product characteristics.
Consists of three main categories
Cellular component
Biological process
Molecular function
21. System Framework
MeSH database:
Comprehensive controlled vocabulary for the purpose of indexing journal articles and
books in the life sciences; it can also serve as a thesaurus that facilitates searching
[Wikipedia]
MeSH main headings:
Anatomy
Organisms
Diseases
Chemicals and Drugs
Analytical, Diagnostic and Therapeutic Techniques and Equipment
Psychiatry and Psychology
Phenomena and Processes
Disciplines and Occupations
Anthropology, Education, Sociology and Social Phenomena
Technology, Industry, Agriculture
Humanities
Information Science
Named Groups
Health Care
Publication Characteristics
Geographical locations
22. System Framework
Query Expansion (QE) is the process of reformulating
a seed query to improve retrieval performance in
information retrieval operations [Wikipedia]
How ?
Example
23. Query Expansion Example
[Figure; terms shown in the example: Pigmentation, Ocellus pigmentation, Pigment metabolic process, Pigment accumulation, Cellular pigmentation]
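A minimal sketch of query expansion, assuming the related terms for a seed query are already available from an ontology lookup. The `RELATED_TERMS` table and the `expand_query` helper are hypothetical names used only for illustration; the terms are the ones shown on the example slide, and the OR-joined output format is an assumption, not necessarily what the system sends to PubMed.

```python
# Hypothetical lookup table: seed term -> related ontology terms.
# The terms come from the example slide; how they relate to each other
# in the ontology is not modelled here.
RELATED_TERMS = {
    "pigmentation": [
        "ocellus pigmentation",
        "pigment metabolic process",
        "pigment accumulation",
        "cellular pigmentation",
    ],
}


def expand_query(seed, related_terms):
    """Return the seed query OR-ed with its related terms (if any)."""
    terms = [seed] + [t for t in related_terms.get(seed.lower(), []) if t != seed]
    # Quote each term so multi-word terms are searched as phrases.
    return " OR ".join(f'"{t}"' for t in terms)


expanded = expand_query("pigmentation", RELATED_TERMS)
print(expanded)
```

A seed term with no entry in the table is returned unchanged, so the expansion never narrows the original query.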
24. System Framework
Document Annotation
Annotate documents with Gene Ontology terms, genes
and proteins.
Represent each document by a set of terms. (How?)
25. GO extractor
● GO’s vocabulary consists of 7,841 words. The majority of the GO words
occur only once in the whole ontology, and more than 90% occur no more
than 10 times. On the other hand, 51 of the GO words occur at least
100 times in the ontology.
● Words with a very high frequency do not give much information, as they are
part of many labels in the ontology. Extracting a word with a low
frequency gives a much better hint about the mentioned concept (Zipf's law).
● By the nature of GO terms, the words at the end are very general,
e.g. “activity”, “transport”.
● In addition, many GO terms are substrings of their descendant GO terms.
● The algorithm is taken from GoPubMed (2008), “GoPubMed: Ontology-based
literature search for the life sciences”.
26. GO extractor algorithm
[Flowchart; recoverable labels: Get last word; Compare with root; Set main root as a root; Do BFS and take each one as a root; The same word occurred at any sibling? (Y/N); Reaches leaf? (Y/N); Get next word; Get next word and do BFS and consider each one as a root]
27. GO Extractor
Example:
Abstract: “... and it’s affected by the Kinase activity”.
● Start from the last word of the paragraph, “activity”.
● Starting from the root of the GO tree, search for GO terms ending with
“activity”.
● When we reach one, fetch the next word and continue from the new root.
● Now we look in the subtree for a GO term that ends with “Kinase activity”.
● If the search reaches a leaf, we have found a GO term. Restart
from the root with the next word.
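The backward matching described above can be sketched as follows. This is a simplified illustration, not the GoPubMed algorithm itself: it indexes a small, made-up list of terms by their words in reverse order (a plain word trie rather than the GO tree), matches exact words only, and ignores gaps and word information content. The names `build_suffix_trie` and `extract_terms` are hypothetical.

```python
def build_suffix_trie(terms):
    """Index terms by their words in reverse order (last word first)."""
    root = {}
    for term in terms:
        node = root
        for word in reversed(term.lower().split()):
            node = node.setdefault(word, {})
        node["$term"] = term  # marks the end of a complete term
    return root


def extract_terms(text, trie):
    """Scan the text from its last word backwards, collecting matched terms."""
    words = [w.strip(".,;:()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    found = []
    i = len(words) - 1
    while i >= 0:
        node, j, best, best_j = trie, i, None, i
        # Walk backwards through the text while the trie still matches,
        # remembering the longest complete term seen so far.
        while j >= 0 and words[j] in node:
            node = node[words[j]]
            j -= 1
            if "$term" in node:
                best, best_j = node["$term"], j
        if best is not None:
            found.append(best)
            i = best_j  # restart just before the matched term
        else:
            i -= 1
    return found


# Illustrative term list (assumed subset, not the full ontology).
trie = build_suffix_trie(["kinase activity", "protein kinase activity", "transport"])
matches = extract_terms("... and it's affected by the Kinase activity.", trie)
print(matches)
```

Because the scan prefers the longest match, "protein kinase activity" wins over "kinase activity" when both end at the same word.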
29. Framework Design Issues
Top Level Architecture of the System can be divided into:-
Data Handling Components
Information Handling Components
Information Extraction
Information Representation
Information Retrieval
31. System Framework
Framework main components:
Document Sources
Extractor
Document Annotators
Ontology Manager
System Engine
Database Manager
Cache Manager
Document
32. System Framework
Document Sources
Fetches single documents or collections of documents from
remote stores.
Extractor
Implements Information Extraction algorithms to extract
ontology terms from the documents
Document Annotators
Establish semantic links between documents and ontology
concepts.
For example, linking documents with their GO terms, MeSH
terms, etc.
33. System Framework
Ontology Manager
Provides an interface to the ontologies
Composed of sub-managers to merge ontologies such as
the Gene Ontology
System Engine
Main component of the system.
Responsible for maintaining all the operations and
communications between various components of the
system
34. System Framework
Database Manager
Implemented as a pool object (connection pool)
Handles and maintains queries to the database, such as
inserting, updating and deleting documents
Cache Manager
Implemented as client side of MemCached (open source
caching project).
Handles operations to the system cache
43. Our system vs. Textpresso, XplorMed, Vivismo
Feature | Our system | Textpresso | XplorMed | Vivismo
Ontology used | Full Gene Ontology | Only 30 categories of the ontology | Top hierarchy of MeSH | Derives an ontology from the search results
Output | Uses the deep ontology to navigate through a large result set in a non-sequential order | Returns a list of relevant abstracts | For each MeSH category, there is an associated list | Returns a list of relevant abstracts
44. IBN-SINA vs. Others
Feature | IBN-SINA | Textpresso | XplorMed | Vivismo
Works on | All the PubMed abstracts | Designed for full papers, which are not available most of the time | All the PubMed abstracts | All the PubMed abstracts
Term extraction | Allows gaps within matches and considers the information content of the words, which leads to more refined term extraction | Tries to find the category terms directly in the text, only allowing for some variations in lower/uppercase letters and plural forms | Extracts terms based on term frequency in the collected documents | Extracts terms based on term frequency in the collected documents
50. Swanson Algorithm (1986)
Swanson’s method is a way of finding indirect relations between
objects.
[Diagram: objects A and B, each with its related terms (A1, A2 and B1, B2)]
1986: “Undiscovered public knowledge”
51. Cosine Similarity
Cosine similarity is a measure of similarity between two vectors of n
dimensions by finding the cosine of the angle between them, often used to
compare documents in text mining [Wikipedia].
Terms related to the first term (A's related terms):
A B C D E F G H
Terms related to the second term (B's related terms):
A X Y B Z D E F
Vector of the first term over all terms:
A B C D E F G H X Y Z
1 1 1 1 1 1 1 1 0 0 0
Vector of the second term over all terms:
A B C D E F G H X Y Z
1 1 0 1 1 1 0 0 1 1 1
52. Cosine Similarity (Cont.)
Finally, applying the cosine similarity function:
A B C D E F G H X Y Z
1 1 1 1 1 1 1 1 0 0 0
A B C D E F G H X Y Z
1 1 0 1 1 1 0 0 1 1 1
Similarity = (1+1+0+1+1+1+0+0+0+0+0)/ (√8*√8) = 5/8 = 0.625
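The calculation above can be reproduced with a short sketch. It assumes binary term vectors built over the union of both related-term sets, exactly as on the slide; the function name `cosine_similarity` and the variable names are illustrative.

```python
import math


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity of two term sets encoded as binary vectors."""
    vocabulary = sorted(set(terms_a) | set(terms_b))
    vec_a = [1 if term in terms_a else 0 for term in vocabulary]
    vec_b = [1 if term in terms_b else 0 for term in vocabulary]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term set is treated as unrelated
    return dot / (norm_a * norm_b)


# The two related-term sets from the slide.
related_to_a = {"A", "B", "C", "D", "E", "F", "G", "H"}
related_to_b = {"A", "X", "Y", "B", "Z", "D", "E", "F"}

similarity = cosine_similarity(related_to_a, related_to_b)
print(round(similarity, 3))  # 0.625
```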
53. Swanson example
Relation between P53 and P51
1986: “Fish oil, Raynaud’s syndrome, and
undiscovered public knowledge”
56. PPI Agenda
Problem Description
Motivation
PPI System Overview
PPI System Main Components
Dependency Parse Tree
Similarity Metrics
K-Nearest Neighbor Classifier
Evaluation of PPI
Evaluation Metrics
Results and Comparison
58. Problem Description
Due to the ever-growing number of publications about
protein-protein interactions, information extraction from
text is increasingly recognized as one of the crucial
technologies in bioinformatics.
Reference:
Gunes Erkan, Arzucan Ozgur, Dragomir R. Radev. Semi-Supervised Classification
for Extracting Protein Interaction Sentences using Dependency Parsing.
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning, pp. 228–237,
Prague, June 2007.
60. Motivation
The interactions between proteins are important for
most, if not all, biological functions.
The function of a protein can be characterized more
precisely through knowledge of PPI.
Information about these interactions improves our
understanding of diseases and can provide the basis
for new therapeutic approaches.
Validate experimental results and test benches.
62. System Overview
We worked at the sentence level. (Why?)
It increases the semantic information understood from the sentence.
Analyzing the structure of the sentence increases the knowledge
obtained from it.
Specific relations between proteins can be deduced from
it.
64. System Overview
Our approach depends on:
The shortest path between the entities in dependency
tree of a sentence usually captures the necessary
information to identify their relationship.
67. Dependency Parse Tree
• Unlike a syntactic parse, it captures the semantic
predicate-argument relationships among the words of a sentence.
We use the Stanford Parser API for the natural language
processing task.
The shortest path is found using Breadth First Search
(BFS): since each edge has equal weight, the
shortest path is the first one discovered.
68. Dependency Parse Tree (Example)
The dependency tree of the sentence “The results demonstrated
that KaiC interacts rhythmically with KaiA, KaiB, and SasA.”
69. Example (Cont.)
• Then, we select the shortest paths between the
protein pairs:
• KaiC - nsubj - interacts - prep_with - SasA
• KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiA
• KaiC - nsubj - interacts - prep_with - SasA - conj_and - KaiB
• SasA - conj_and - KaiA
• SasA - conj_and - KaiB
• KaiA - conj_and - SasA - conj_and - KaiB
70. Example (Cont.)
• Then, we rename the proteins in the pair as PROTX1
and PROTX2, and all the other proteins in the sentence
as PROTX0:
• PROTX1 - nsubj - interacts - prep_with - PROTX2
• PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2
• PROTX1 - nsubj - interacts - prep_with - PROTX0 - conj_and - PROTX2
• PROTX1 - conj_and - PROTX2
• PROTX1 - conj_and - PROTX2
• PROTX1 - conj_and - PROTX0 - conj_and - PROTX2
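The path extraction and renaming steps can be sketched with a plain BFS over the dependency graph. The edge list below is an assumption reconstructed from the paths listed in the example (it is not parser output), and `build_graph`, `shortest_path` and `anonymize` are illustrative names. In the real system the edges would come from the Stanford Parser.

```python
from collections import deque
from itertools import combinations

# (head, dependent, relation) edges, assumed from the example paths.
EDGES = [
    ("interacts", "KaiC", "nsubj"),
    ("interacts", "SasA", "prep_with"),
    ("SasA", "KaiA", "conj_and"),
    ("SasA", "KaiB", "conj_and"),
]
PROTEINS = ["KaiC", "SasA", "KaiA", "KaiB"]


def build_graph(edges):
    """Undirected adjacency list: node -> [(neighbour, relation), ...]."""
    graph = {}
    for head, dependent, relation in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, []).append((head, relation))
    return graph


def shortest_path(graph, start, goal):
    """BFS; returns [node, relation, node, ...] or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour, relation in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [relation, neighbour])
    return None


def anonymize(path, first, second, proteins):
    """Rename the pair to PROTX1/PROTX2 and other proteins to PROTX0."""
    renamed = []
    for token in path:
        if token == first:
            renamed.append("PROTX1")
        elif token == second:
            renamed.append("PROTX2")
        elif token in proteins:
            renamed.append("PROTX0")
        else:
            renamed.append(token)
    return renamed


graph = build_graph(EDGES)
for first, second in combinations(PROTEINS, 2):
    path = shortest_path(graph, first, second)
    print(" - ".join(anonymize(path, first, second, PROTEINS)))
```

Since every edge has the same weight, the first path BFS finds to the goal is a shortest one, which is why no weighted search is needed here.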
73. Similarity Metrics
The main idea of using similarity metrics is to
find a function that maps input patterns into a
target space such that a simple distance in the
target space approximates the “semantic”
distance in the input space.
74. Similarity Metrics
We implemented Levenshtein distance (Edit
Distance):
the minimum number of insertions, substitutions and deletions
needed to transform one string into another.
We also used an open source library called
“SimMetrics” – Java library of 23 string similarity
metrics.
• Developed at the University of Sheffield (Chapman,
2004)
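A minimal sketch of the Levenshtein distance using the standard dynamic-programming recurrence. It works on any sequence, so it can compare two strings character by character or two dependency paths token by token; whether the system compared paths at character or token level is not stated, so the token-level usage below is only an assumption.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target` (strings or token lists)."""
    previous = list(range(len(target) + 1))
    for i, source_item in enumerate(source, start=1):
        current = [i]
        for j, target_item in enumerate(target, start=1):
            cost = 0 if source_item == target_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Token-level comparison of two anonymized dependency paths.
path_a = "PROTX1 nsubj interacts prep_with PROTX2".split()
path_b = "PROTX1 nsubj interacts prep_with PROTX0 conj_and PROTX2".split()
path_distance = levenshtein(path_a, path_b)
print(path_distance)  # 2
```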
75. Similarity Metrics
• We used only 10 string similarities from SimMetrics.
• Cosine Similarity
• Block Distance
• Dice Similarity
• Euclidean Distance
• Jaccard Similarity
• Jaro Similarity
• Jaro Winkler Similarity
• Matching Coefficient
• Monge Elkan Similarity
78. K-Nearest Neighbor Classifier
• k-nearest neighbor: assign a label according to the
majority label of the k nearest-neighbor training
patterns.
79. KNN Example
• If k = 3, it is classified as
a triangle
• k = 5, it is classified as a
square
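The effect of k can be illustrated with a small sketch. The one-dimensional training points below are made up so that the majority flips between k = 3 and k = 5, mirroring the triangle/square example; `knn_classify` takes the distance function as a parameter, so a string similarity metric could be plugged in instead of the absolute difference used here.

```python
from collections import Counter


def knn_classify(query, training, k, distance):
    """Label `query` by majority vote among its k nearest training items."""
    neighbours = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def absolute_difference(a, b):
    return abs(a - b)


# Hypothetical training data: (feature value, label).
TRAINING = [
    (1.0, "triangle"),
    (1.2, "triangle"),
    (1.5, "square"),
    (2.0, "square"),
    (2.5, "square"),
]

label_k3 = knn_classify(0.0, TRAINING, 3, absolute_difference)
label_k5 = knn_classify(0.0, TRAINING, 5, absolute_difference)
print(label_k3, label_k5)  # triangle square
```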
80. KNN Strengths and Weaknesses
• Strengths:
• Simple to implement and use
• Comprehensible – easy to explain prediction
• Robust to noisy data by averaging k-nearest neighbors
81. KNN Strengths and Weaknesses
• Weaknesses:
• Need a lot of space to store all examples.
• Takes more time to classify a new example than with a
model (need to calculate and compare distance from new
example to all other examples).
84. Evaluation of PPI
• We used five different datasets:
• BioInfer dataset.
• AIMed dataset.
• LLL dataset.
• IEPA dataset.
• HPRD50 dataset.
• We used a KNN classifier, varying K and the similarity
metric as parameters.
93. Results and Comparison
Dataset Min. Result Max. Result
BioInfer 32 56.9
AIMed 5 48.9
LLL 48.8 73
IEPA 36.6 72
HPRD50 12.9 63.49
94. Our PPI System vs. Graph Kernel Approach
Dataset | Our System (%) | Graph Kernel Approach (%)
BioInfer | 56.9 | 52.9
AIMed | 48.9 | 56.4
LLL | 73 | 76.8
IEPA | 72 | 75.1
HPRD50 | 67 | 63.4
110. Swanson Algorithm
Search PubMed for gene A and extract set A (the
most related keywords: MeSH or GO terms).
Search PubMed for gene B and extract set B (the most
related keywords: MeSH or GO terms).
Based on the intersection between set A and set B, we
apply the cosine similarity.
111. Document Occurrences
Search PubMed for gene A and extract set A
(document IDs of gene A).
Search PubMed for gene B and extract set B
(document IDs of gene B).
Based on the intersection between set A and set B, we
apply the Jaccard Similarity Coefficient.
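A minimal sketch of the document-occurrence measure: the Jaccard coefficient is the size of the intersection of the two document-ID sets divided by the size of their union. The document IDs below are made up for illustration and are not real PubMed identifiers.

```python
def jaccard_similarity(set_a, set_b):
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical document IDs returned for each gene.
docs_gene_a = {"101", "102", "103", "104"}
docs_gene_b = {"103", "104", "105"}

score = jaccard_similarity(docs_gene_a, docs_gene_b)
print(score)  # 2 shared documents out of 5 distinct ones
```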
113. Extended Work: PPI System with SVM Classifier (1)
Equation:
$u = w \cdot x - b$
Objective:
$\min \frac{1}{2} \|w\|^2$
subject to
$y_i (w \cdot x_i - b) \geq 1, \quad \forall i$
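114. Extended Work: PPI System with SVM Classifier (2)
$\min_{\alpha} \Psi(\alpha) = \min_{\alpha} \frac{1}{2} \sum_i \sum_j y_i y_j (x_i \cdot x_j) \alpha_i \alpha_j - \sum_i \alpha_i$
$\alpha$ is the vector of Lagrange multipliers; once $\alpha$ is known, we can obtain $(w, b)$:
$w = \sum_i y_i \alpha_i x_i, \quad b = w \cdot x_k - y_k$ for some $\alpha_k > 0$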
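As a small worked check of the last two formulas, the sketch below recovers $(w, b)$ from given multipliers for a toy one-dimensional problem. The two training points and the multiplier values (0.5 each, worked out by hand for this toy case) are assumptions for illustration; solving for $\alpha$ itself is not shown, and `recover_hyperplane` is a hypothetical helper name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_hyperplane(points, labels, alphas):
    """Compute w = sum_i y_i * alpha_i * x_i and b = w . x_k - y_k
    for the first support vector k (alpha_k > 0)."""
    dim = len(points[0])
    w = [
        sum(y * a * x[d] for x, y, a in zip(points, labels, alphas))
        for d in range(dim)
    ]
    k = next(i for i, a in enumerate(alphas) if a > 0)
    b = dot(w, points[k]) - labels[k]
    return w, b


# Toy problem: one positive point at x = 2, one negative point at x = 0.
points = [[2.0], [0.0]]
labels = [1, -1]
alphas = [0.5, 0.5]  # assumed multipliers for this toy case

w, b = recover_hyperplane(points, labels, alphas)


def decision(x):
    """u = w . x - b, as in the SVM equation."""
    return dot(w, x) - b


print(w, b, decision([2.0]), decision([0.0]))
```

Both points land exactly on the margin (u = +1 and u = -1), which is what the constraint $y_i (w \cdot x_i - b) \geq 1$ requires of support vectors.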
116. Conclusion
Problem 1: Algorithms for concept recognition in
document abstracts and titles
We introduced an algorithm to annotate the Gene Ontology
terms in the documents.
Problem 2: Use the annotated documents to build a
structured representation of documents
We showed how the framework uses the Gene Ontology to build a
semantic representation of the obtained documents.
Problem 3: Design a system for ontology-based search
engines for biological researchers
We introduced the design of the framework and showed how it is flexible
for future modifications and scalable with respect to the number
of documents and the number of users.
117. Conclusion
Problem 4: Using Swanson’s algorithm to assess the similarity between
different biological terms
We showed how Swanson’s algorithm can be used to estimate the
similarity between two instances (P53 and P21).
Problem 5: Supervised machine learning algorithms for prediction of
protein-protein interactions
We showed how we used supervised machine learning algorithms such
as KNN, together with a new technique to estimate the distance between sentences, in
order to predict the possible interactions between proteins mentioned in
the documents.
Problem 6: Unsupervised machine learning algorithms to identify
different clusters of genes
We showed how we used unsupervised machine learning algorithms
such as DBSCAN, with similarity based on Swanson’s algorithm and
cosine similarity, in order to group genes mentioned in the documents into
different clusters.
118. Future work
There are active research areas and open problems
in biomedical text mining
The content Provider for Documents
Google Scholar
Using Semantic web 3.0 ( Online Journals )
The Ontology Generation
Ability to Edit the Ontologies and Adding knowledge
Other Ontologies
Using Wikipedia as an Ontology
119. Future work
Some features that may be added to the
system:
Biomedical ontology-based search engine
Provide a document summary for each group of documents.
Allow the user to save and print the results obtained by the system.
Protein-Protein Interaction (PPI)
Use more sophisticated classifiers and machine learning techniques
such as AdaBoost to enhance the classification process.
Use background knowledge of verbs, as many verbs share the
same meaning.
This will help the system produce more accurate results, as we can
introduce a fuzzy distance for the differences between the meanings
of verbs. It will also make it possible to discover the type of
relation between the terms, giving more semantic relation
identification.
120. Future work
• Some features that may be added to
the system:
Gene Clustering
Use more sophisticated clustering algorithms that were originally
designed for gene clustering.
More Applications:
Based on the services provided by the ontology-based
engine, we can build applications such as
extracting the relations between drugs and diseases, or
grouping diseases into clusters, which helps
to identify the characteristics of a newly discovered disease,
and other applications that rely on text mining of
biomedical documents.