The Longevity Genie is an open-source toolbox and a chatbot that aims to enhance the capacity of large language models (LLMs) to address inquiries on personal health, genetics, and longevity research.
Longevity Genie, 01.10.2023
1. Kulaga Anton
Karmazin Alex
Koval Maria
Nikhil Yadala
for ARDD-2023
Emerging Science &
Technologies
Workshop
Large Language Models for Longevity Longevity Genie
4. Systems Biology Of Aging Group
● Aging research
● Bioinformatic pipelines
● RNA-Seq
● De novo assembly
● LightGBM-SHAP ML
● Cross-species
transcriptomic analysis
CellFabrik
● Autoimmune and cultured
meat research
● Genetic constructs design
● CRISPR design
● Adaptive immune repertoire
sequencing analysis (AIRR-Seq)
● Protein docking
● Basic molecular dynamics
Nonprofit Activities
● Started as a Longevity/H+
volunteer and then
switched fields
● Bioinformatic/Longevity/H+
meetups, communities
● Just-DNA-Seq project
● Longevity genie
project
A few words about myself …
5. Kulaga, Anton Y., et al. "Machine learning analysis of longevity-associated gene expression landscapes in
mammals." International journal of molecular sciences 22.3 (2021): 1073.
Some of my previous applied ML work, citations recommended =)
7. LLMs do not know specialized longevity-
related datasets
If you ask about APOE or a few
other variants popular in the
press, you will get results.
However, for most variants
you get nothing or
totally irrelevant general
information.
The main reason for this is the lack
of scientific literature in the
training data.
8. LLMs often hallucinate, especially
about sources of knowledge
What is wrong with links
provided by ChatGPT?
TRUE
TRUE
FAKE
9. LLMs are bad at logic and math
unless assisted with other tools
https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab795e2523
11. Problem Statement
What are the current limitations of using ChatGPT and other LLMs for
longevity-related queries?
● LLMs do not know specialized longevity-related datasets
● LLMs often hallucinate, especially about sources of knowledge
● LLMs have no easy access to your health-related
data and cannot take it into account
● LLMs are bad at logic and math unless assisted by additional tools
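The last limitation is usually addressed by letting the model hand arithmetic to a deterministic tool instead of computing it token by token. Below is a minimal sketch of such a tool, assuming the LLM emits a plain arithmetic expression as a string. The function name `calculate` is hypothetical and is not part of Longevity Genie.

```python
import ast
import operator

# Whitelisted operators: anything else is rejected instead of evaluated.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def _eval(node):
        # Numeric literals only (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

An agent framework would register a function like this as a tool and pass the model's expression to it, so the number in the final answer comes from the interpreter rather than from the model.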
14. Let’s take my personal longevity genomics results
I am proud of my Foxo gene variants, let’s ask ChatGPT
15. Let’s ask ChatGPT!
The specific polymorphisms you
mentioned (rs4946936, rs2802290,
rs9400239, rs7762395, rs13217795)
in the FOXO genes may have
potential associations with aging
and longevity, given the critical
role of the FOXO genes in aging-
related processes.
Conclusions:
useless
general info about gene
vague
specific, on topic, useful
Q: There are rs4946936,
rs2802290, rs9400239,
rs7762395, rs13217795 in FOXO
gene, explain their connection
with aging and longevity
16. Let’s ask LongevityGPT http://asklongevitygpt.com/
Developed by our collaborator
Nikhil Yadala (Healome)
17. Longevity GPT
Pros:
● provides sources
● searching in
SemanticScholar
Cons:
● too restrictive if
source not found
but model knows
● is limited to top
100 papers
● the code is not yet
open-sourced
(technical issues)
21. Our results from Longevity Genie
Q: There are rs4946936, rs2802290, rs9400239, rs7762395,
rs13217795 in FOXO gene, explain their connection with aging
and longevity
Polymorphisms in the FOXO gene, including rs2802290, rs9400239, rs7762395, and
rs13217795, have been associated with human longevity. Specifically, rs2802292
(proxied by rs2802288) showed significant allelic association with extreme longevity in a
study of Southern Italian centenarians. Furthermore, a connection between insulin,
FOXO, oxidative stress, and human longevity has been hypothesized, as oxidative stress
is a proposed mechanism of aging. Haplotype blocks, including rs7762395 and
rs13217795, have also shown nominally significant differences between the oldest old
and control groups, providing further evidence for the association between FOXO gene
polymorphisms and longevity.
Sources: http://doi.org/10.1089/rej.2008.0827 , http://doi.org/10.1073/pnas.0801030105 ,
http://doi.org/10.1186/1471-2350-8-S1-S13 , http://doi.org/10.1111/j.1474-9726.2010.00627.x
23. LLM:
● split words into
tokens
● embed them in a
vector space
● process the results
and generate new
tokens
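The three steps above can be sketched with a toy example. Everything here is illustrative: the six-word vocabulary, the 2-dimensional embedding values, and the averaging step that stands in for the transformer layers are assumptions made for the sketch, not how any production LLM is configured.

```python
import math

# Toy vocabulary; real LLMs use learned subword tokenizers with far more entries.
VOCAB = ["foxo3", "is", "linked", "to", "longevity", "<unk>"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Hypothetical 2-dimensional embeddings, one row per vocabulary entry.
EMBEDDINGS = [
    [0.9, 0.1],
    [0.1, 0.2],
    [0.4, 0.5],
    [0.2, 0.1],
    [0.8, 0.3],
    [0.0, 0.0],
]


def tokenize(text):
    """Step 1: split text into tokens and map each to an integer id."""
    return [TOKEN_ID.get(w, TOKEN_ID["<unk>"]) for w in text.lower().split()]


def embed(token_ids):
    """Step 2: look up a vector for every token id."""
    return [EMBEDDINGS[i] for i in token_ids]


def next_token(vectors):
    """Step 3 (stand-in): score every vocabulary entry and pick the likeliest."""
    dim = len(vectors[0])
    # Average the context vectors; a real model applies attention layers here.
    context = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    logits = [sum(c * e for c, e in zip(context, emb)) for emb in EMBEDDINGS]
    # Softmax turns the scores into a probability distribution.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [x / total for x in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return VOCAB[best], probs


token, probs = next_token(embed(tokenize("FOXO3 is linked to")))
```

Generation repeats step 3 in a loop, appending each new token to the context before predicting the next one.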
24. Ada-002, OpenAI's embedding
model, ranks only 13th
on general embedding
tasks, where bge-large-en
takes first place
It is also absent from
the BLURB leaderboard, which focuses
on biomedical retrieval
tasks, where BioLinkBERT
leads
https://huggingface.co/spaces/mteb/leaderboard
25. Preparing for RAG
What we have done:
● we took the S2ORC Semantic Scholar corpus,
which contains 12 million papers
● we selected 80K longevity-related papers,
preprocessed them, and indexed them with
BioLinkBERT (best in the biomedical domain) and
BGE-large-en (best overall), published on
Hugging Face
● the Qdrant vector store was used for storage and
similarity search
● Condenser was used to preprocess retrieved
https://huggingface.co/longevity-genie
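The retrieval step described on this slide can be sketched in a few lines. This is a simplified stand-in, not the project's code: a bag-of-words `embed` function replaces BioLinkBERT / BGE-large-en, the in-memory `TinyIndex` class replaces Qdrant, and the three documents are placeholder snippets rather than real S2ORC records. All names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts instead of a neural encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """In-memory stand-in for a vector store."""

    def __init__(self):
        self.items = []

    def add(self, doc_id, text):
        self.items.append((doc_id, text, embed(text)))

    def search(self, query, top_k=2):
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


def build_prompt(question, hits):
    """Put retrieved passages in front of the question so the LLM can cite them."""
    context = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {question}"
    )


# Placeholder documents for illustration only.
index = TinyIndex()
index.add("paper-1", "foxo3 variants associated with human longevity")
index.add("paper-2", "rapamycin inhibits mtor signaling")
index.add("paper-3", "rna-seq pipeline for de novo assembly")

hits = index.search("foxo3 longevity variants")
prompt = build_prompt("How is FOXO3 connected to longevity?", hits)
```

Because the prompt carries document ids alongside the passages, the answer can list sources that actually exist in the index, which is the point of retrieval in the hallucination problem described earlier.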
44. Key observations from LongevityGPT
USER QUERIES:
- Inquiries about specific or recent studies
- General longevity queries and causes of aging
- Drug combinations for healthspan extension
- Personal health strategies, including diets and
supplements
- Attempts to execute scientific workflows,
often limited by the lack of bioinformatic tools
for LLMs.
NOTES:
While general health queries came
from longevity enthusiasts, advanced
users looked for LLM-supported
research tools.
For instance, a user tried using an LLM to
identify lifespan interventions via in
vivo transient reprogramming, seeking
overexpressed genes and suggesting
potential human-applicable
nutraceuticals or foods.
45. WHAT CAN BE IMPROVED SHORT TERM? (4-5 months)
UI improvements:
● web
● bots (telegram, whatsapp)
ChatGPT plugin:
● can save a lot of money on ChatGPT API calls and can attract users who select it from
plugin catalogues
● answers questions about ageing
Better answers to questions about:
● Longevity, longevity activism and ageing in general
● Genetic and drug lifespan interventions
● Genes and gene products related to longevity
● Ageing research articles
● Drugs and ageing
● Questions that require biological databases or APIs
Better search (mix of structural sources, text and vector search in literature)
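One common way to mix several search back ends is reciprocal rank fusion (RRF), which merges ranked lists without needing comparable scores. The slide does not say which fusion method is planned, so the sketch below is only one possible option. The function name, the example document ids, and the constant k=60 (a conventional default for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; earlier ranks contribute more."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from three back ends for the same query.
text_hits = ["a", "b", "c"]        # keyword / full-text search
vector_hits = ["b", "d", "a"]      # embedding similarity search
structured_hits = ["b"]            # lookup in a structured database

fused = reciprocal_rank_fusion([text_hits, vector_hits, structured_hits])
```

A document that appears near the top of several lists, like "b" here, rises above documents that only one back end found.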
50. Answer drug-gene interaction questions better:
Possible answer using DGIdb:
● Generic information
about the drug and the
gene
● Guesses on how they
can interact
● No specific information
on the question
Rapamycin's Inhibition of mTOR: Rapamycin is an inhibitor of mTOR. Since
decorin's effect on fibrillin-1 synthesis involves the mTOR pathway, rapamycin
can block the enhanced synthesis of fibrillin-1 mediated by decorin.
IGF-I and Fibrillin-1 Synthesis: IGF-I, which also signals through the IGF-I
receptor/mTOR/p70 S6 kinase pathway, stimulates fibrillin-1 synthesis, further
emphasizing the importance of this pathway in regulating fibrillin-1 synthesis.
Rapamycin's Therapeutic Potential: In a mouse model of renal fibrosis
(unilateral ureteral obstruction), which is characterized by increased fibrillin-1
synthesis, systemic administration of rapamycin significantly reduced the
number of interstitial fibroblasts and the deposition of fibrillin-1.
In summary, decorin plays a crucial role in the synthesis of fibrillin-1 in the
kidney by interacting with and activating the IGF-I receptor/mTOR/p70 S6
kinase signaling pathway. Rapamycin, by inhibiting mTOR, can block the
effects of decorin on fibrillin-1 synthesis, highlighting its potential therapeutic
role in conditions characterized by increased fibrillin-1 synthesis, such as renal
fibrosis.
Source: https://pubmed.ncbi.nlm.nih.gov/17200203/
51. Specific questions for studies related to longevity drugs
Possible answer using DrugAge:
Rosmarinic acid (RA) promoted lifespan
and motoricity in a dose-dependent
manner and reduced fat stores
without threatening fertility in
Caenorhabditis elegans
(https://pubmed.ncbi.nlm.nih.gov/31206890/).
Average lifespan increased by
63.23% at 180 μM
of RA.
53. NOTE:
Until now we have been
talking about in-context
learning, which is
really an advanced form
of prompting.
Much more (but at a
higher cost) can be
achieved if we
train or fine-tune
existing open-source
models
54. Conclusions(?)
● Open source is an efficient way
to move things forward, as
shown by the flourishing of
open-source LLMs
● Large Language Models can be
efficiently tuned for ageing
research needs
● To harness the power of the
open-source LLM ecosystem, we
have to turn it into a
longevity community effort
55. Contribution statement
Anton Kulaga (presentation, code, idea)
Newton Winter (Telegram bot, logo, data cleanup)
Maria Koval (ClinVar code, design)
Vlada Tyshchenko (GPT agents for clinical trials)
Nikhil Yadala (Longevity GPT)
Alex Karmazin (AnAge, LongevityMap, GenAge agents)
Acknowledgements: Stan Skakun, Dmitry Zayka, Vincent Weisser