The document discusses approaches to integrating internal and external data across pharmaceutical research. It describes utilizing a data warehousing strategy through a Research Information Factory (RIF) to create a single global repository for research data. However, integrating external data from various sources poses additional challenges. Tools like PharmaMatrix provide a pre-indexed mine of scientific literature linking drug targets to indications, but result sets can be large. The document suggests that Web 2.0 technologies like wikis, blogs and tagging could help turn integrated information into knowledge by enabling collaboration and sharing. Industry-wide data standards and common ontologies would also help facilitate external data integration.
Strategies for the integration of information (IPI_ConfEX)
1. Approaches to Information Integration. Ben Gardner, Therapeutic Area Scientific Information Services. IPI-ConFex (March 2007)
2.
3.
4.
5.
6.
7.
8.
9.
10.
11. How many nets can we cast…? ~6000 diseases recognized by medicine; ~25K genes in the human genome. [Diagram: Targets (Target 1, Target 2, Target 3, … Target N) linked to Diseases / Indications (Indication 1, Indication 2, Indication 3, … Indication N)]
12. How many nets can we cast…? Informatics allows us to systematically mine for evidence against all hypotheses: all drug targets x all diseases. [Diagram: Targets (Target 1 … Target N) linked to Diseases / Indications (Indication 1 … Indication N)]
13. How many nets can we cast…? All diseases for a target. [Diagram: Targets (Target 1 … Target N) linked to Diseases / Indications (Indication 1 … Indication N)]
14. How many nets can we cast…? All targets for a disease. [Diagram: Targets (Target 1 … Target N) linked to Diseases / Indications (Indication 1 … Indication N)]
15. What is the Matrix? 12.8 million Medline abstracts; 4 billion words; 4 million intersects (20 synonyms per entity); 2000 curated diseases; 2000 curated targets. [Diagram: Targets x Diseases matrix of Evidence]
16. “Show me all the diseases associated with PDE5 from scientific literature”
PDE5 has 40 related terms: (PDE5 or phosphodiesterase 5 or phosphodiesterase V or phosphodiesterase (PDE) 5 or phosphodiesterase (PDE) V or pde V or PDE-5 or PDE V or PDE 5 or phosphodiesterase-5 or phosphodiesterase 5A or HSPDE5A or PDE5A or phosphodiesterase-5A or PDE(5) or PDE(5A) or phosphodiesterase (PDE) 5A or PDE 5A or PDE-5A or UK-092,480 or viagra or sildenafil or IBMX or 3-isobutyl-1-methylxanthine or zaprinast or tadalafil or vardenafil or SKF-96231 or YC-1 or DMPPO or UK-83405 or Sch-51866 or UK-343664 or WIN-65579 or GF-248 or T-1032 or SR-265579 or KF-31327 or OPC-35564)
X
There are ~6000 curated diseases. Here’s just one: (ASTHMA or asthmatic or Acute severe asthma or Asthmaticus or Excercise induced Asthma or mild intermittent asthma or mild persistant Asthma or moderate persisitant Asthma or Severe persistant Asthma or chronic persistant Asthma or extrinsic Asthma or intrinsic Asthma or aspirin-sensitive asthmatics or Aspirin induced Asthma or occupational asthma or Atopy or allergic asthma)
X
There are about 17 million articles in Medline: 17 million abstracts from 32000 separate journals, 4 billion words.
Systematic Searching for Evidence
PharmaMatrix - US Patent No. US2005060305, 17th March 2003, Hopkins et al.
17. And here’s a matrix! All Targets for Asthma. Simple abstract co-occurrence. Big numbers! Filter for precedence. Reduce the workload.
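The "simple abstract co-occurrence" idea from slides 15 to 17 can be illustrated with a minimal sketch. It is not the PharmaMatrix implementation, which the slides do not describe. It only assumes that each target and each disease is a set of synonyms, and that an abstract counts as evidence for a target/disease pair when it mentions at least one synonym of each. The function names, the toy synonym lists, the sample abstracts and the `min_count` threshold are all hypothetical; in particular, the threshold is a stand-in for workload reduction and is not the deck's "filter for precedence".

```python
import re
from collections import defaultdict


def compile_terms(synonyms):
    """Build one case-insensitive regex matching any synonym as a whole term."""
    # Longest synonyms first so "asthmatic" is tried before "asthma".
    ordered = sorted(synonyms, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"(?<!\w)(?:" + pattern + r")(?!\w)", re.IGNORECASE)


def cooccurrence_matrix(abstracts, targets, diseases):
    """Count abstracts mentioning both a target synonym and a disease synonym."""
    target_res = {name: compile_terms(syns) for name, syns in targets.items()}
    disease_res = {name: compile_terms(syns) for name, syns in diseases.items()}
    counts = defaultdict(int)
    for text in abstracts:
        hit_targets = [t for t, rx in target_res.items() if rx.search(text)]
        hit_diseases = [d for d, rx in disease_res.items() if rx.search(text)]
        # Each abstract contributes at most 1 to any (target, disease) cell.
        for t in hit_targets:
            for d in hit_diseases:
                counts[(t, d)] += 1
    return dict(counts)


def filter_by_min_count(counts, min_count):
    """Hypothetical workload reduction: keep cells with enough abstracts."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data for illustration only (heavily shortened synonym lists).
targets = {
    "PDE5": ["PDE5", "phosphodiesterase 5", "sildenafil"],
    "PDE4": ["PDE4", "rolipram"],
}
diseases = {
    "Asthma": ["asthma", "asthmatic"],
    "Hypertension": ["hypertension"],
}
abstracts = [
    "Sildenafil reduced airway inflammation in a model of allergic asthma.",
    "PDE5 inhibition in asthmatic patients: a pilot study.",
    "Rolipram and asthma: PDE4 as a target.",
    "Phosphodiesterase 5 inhibitors lower pulmonary hypertension.",
    "An unrelated abstract about crystallography.",
]

counts = cooccurrence_matrix(abstracts, targets, diseases)
shortlist = filter_by_min_count(counts, min_count=2)
print(counts)
print(shortlist)
```

Reading one column of `counts` gives "all targets for a disease" and reading one row gives "all diseases for a target", as on slides 13 and 14. At the scale quoted on slide 15, a real system would presumably rely on a pre-built index rather than scanning every abstract with regular expressions.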