SlideShare a Scribd company logo
1 of 16
PRIVATE AND CONFIDENTIAL 1
CimateGPT
2
The
People
Highly experienced team of AI scientists
and engineers, providing word-class
expertise to help customers refine their
licensed or in-house AI models
AI
Models
Advanced engines for automatic speech
recognition, machine translation,
natural language understanding /
processing, and text to speech
Transcribed audio hours
250K+
1.7M+
Audio hours for unsupervised training
250K
1.5M
60+ Languages with 100s of dialects
Data
AppTek’s data packs and services are
highly valued for training AI and
machine learning models
Automatic transcription of broadcast, media and entertainment,
microphone and telephony in 60+ languages
Utilizes software to translate text or speech into different
languages, featuring 600+ language pairs
Context from ASR used to discern meaning and execute an intent
from voice commands
Reading out text in human-like, expressive and adapted voices
Scientists
PhDs
Research engineers
100s
9
Peer-reviewed papers
Patents
ASR
MT
NLU/P
TTS
~60
32
~20
AppTek Company Overview
• Developed and fine-tuned a generative LLM model to
improve fluency of answers to climate change questions
• 3 dimensions/perspectives: Natural Science, Economics,
and Sociology
• Baselines are Llama2-7B, Llama2-13B and Llama2-70b
trained on 2T tokens
• Continuous pre-training on 4.2B tokens climate-related
text
• Instruction Fine Tuning augmented with climate-scientist
curated data (10k demonstration pairs)
• Hierarchical retrieval augmentation
• Multilinguality through cascaded system
ClimateGPT
ClimateGPT
MT
x -> en
MT
en -> x
Retrieval
Scientific
Documents
Questio
n
Answer
• Continuous pre-training on 4.2B climate-related text
• Extreme Weather reports (10 years * 1M articles)
• Technical Game-Changing Breakthroughs (153
themes x.1000 articles)
• Selection through Sustainable Development Goals (17
SDGs)
• Climate Change News
• Climate Change reports
• World Bank, OECD, IPCC, UN, EU, TFCD, US,
NASA, ESA, WRI, IREA, WEF, Nature Finance
• Climate Academic Research
ClimateGPT
Training
CPT + IFT
ClimateGPT
Llama2-7B CPT
Climate
document
s
IFT
Climate
instruction
s
• Instruction Fine Tuning augmented with climate-scientist
curated data (10k demonstration pairs)
ClimateGPT
Training
CPT + IFT
ClimateGPT
Llama2-7B CPT
Climate
document
s
IFT
Climate
instruction
s
Domain Name Total Size Training Samples
Climate
Senior Expert Interviews 74 1,332
Grounded Expert Demonstration 403 7,254
Grounded Non-Expert Demonstrations 9,663 146,871
Synthetically Generated Demonstrations 57,609 0
Climate-dimension specific StackExchange 3,282 9,846
General
AppTek General 700 2,100
OASST-1 3,783 11,349
Dolly 15,001 45,003
Llama-2 Safety 939 2,817
FLAN 38,909 30,000
CoT 448,439 15,000
60.8%
39.2%
271,572 demonstration pairs
• 700 documents (IPCC* reports + academic papers
cleaned from tables and references)
• 20k pages
• GPT-3.5 tagged along 3 dimensions (economy, social,
science)
• Vector search (transformer bi-encoder)
• Hierarchical retrieval
• Page level search (top 60)
• Chunks of 115 tokens per page (top 5)
• Citations provided through selected chunk
 5*115 + meta-data == 154 tokens added per
dimension
ClimateGPT
Inference time
RAG
ClimateGPT
MT
x -> en
MT
en -> x
Retrieval
Scientific
Document
s
Questio
n
Answer
* Intergovernmental Panel on Climate Change
Hierarchical
RAG
7
Climate
reports
Climate
economy
Climate
science
Climate
social
Top 60
pages
Top 60
pages
Top 60
pages
Top 5
chunks
Top 5
chunks
Top 5
chunks
20k pages
• Multilinguality through cascaded system
No truly multilingual open source LLM available
Allows to keep compactness and LLM model
precision
Answer quality for low resourced languages
(science)
– MT may not be adapted to culture
ClimateGPT
Inference time
MT
ClimateGPT
MT
x -> en
MT
en -> x
Retrieval
Scientific
Documents
Questio
n
Answer
• Evaluated on
– standard language comprehension tasks
– climate related comprehension tasks
• ClimateGPT-7B models equals performance of
Llama2-70B on climate tasks
– 10 times smaller
– 12 time less energy needed at inference time
• Incremental training at a tiny fraction of the cost needed to
train the base model
• Multilinguality addressed with a cascaded approach
ClimateGPT
Results
9
PRIVATE AND CONFIDENTIAL 10
RAG: vector search quality
PRIVATE AND CONFIDENTIAL 11
The higher the abstraction level
the higher precision is needed
12
K
G (Manual) KG is factual
Contains explicit alternatives /
complementarity /
inconsistencies
Allows reasoning
Does not always have an answer
PROS
Relations are based on hard-
coded ontologies
Intensive manual work for high
quality
To be efficient, KG expansion is
task dependent
Precision impacts flexibility
CONS
KGs: Pros and Cons
13
K
G Based on data
Automatic
Task/Domain independent
PROS
Hallucinates
One answer per perspective
No abstraction: No reasoning
structure
Always has an answer
CONS
LLMs: Pros and Cons
14
KGs to Improve RAG*
Document-based KG generation has good results when intention/goal is known
• Given a question to the LLM (Q-Intention + Q-Entities)
• Given a set of documents used as a priori knowledge indexed on D-Entities
• Select subset of documents based on Q-Entities
• Apply Q-Intention Recognition on the subset of documents
*work in progress
• Extract document snippets with
ranked Q-Intention
• Provide LLM with question + snippets
• Build a (dynamic) KG from subset
• Get KG-facts related by Q-Intention
• Provide LLM with question, KG facts and
document snippet related to KG facts
Reduce snippet vector search to intention recognition
and
keep entities (abstraction and resolved value) as hard as possible
15
PRIVATE AND CONFIDENTIAL
Q-intention
Q-Entity-type-
1
Question
Documents
Q-Entity-type-
2
D1
k-l-Entity-
type-i
Di
m-n-Entity-
type-i
Dn
o-p-Entity-
type-n
… …
Snippet Selection
or
KG resolution
Q-intention
D1
k-l-Entity-
type-1
D2
q-r-Entity-
type-2
PRIVATE AND CONFIDENTIAL
Thank You!
16
LLM

More Related Content

Similar to AppTek-CLimateGPT-EvryWS20240308-v3.pptx

Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudAdianto Wibisono
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & EsriGeospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & EsriSafe Software
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Safe Software
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010ivan provalov
 
Jos van Sas - Testimonial Alcatel-Lucent Bell Labs
Jos van Sas - Testimonial Alcatel-Lucent Bell LabsJos van Sas - Testimonial Alcatel-Lucent Bell Labs
Jos van Sas - Testimonial Alcatel-Lucent Bell Labsimec.archive
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptxKtonNguyn2
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores inside-BigData.com
 
MSR 2009
MSR 2009MSR 2009
MSR 2009swy351
 
A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.Pankaj Chandan Mohapatra
 
PetroSync - Advanced Seismic Data Acquisition and Processing
PetroSync - Advanced Seismic Data Acquisition and ProcessingPetroSync - Advanced Seismic Data Acquisition and Processing
PetroSync - Advanced Seismic Data Acquisition and ProcessingPetroSync
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Joel Saltz
 
CV_Kelvin_2016
CV_Kelvin_2016CV_Kelvin_2016
CV_Kelvin_2016Kelvin Tan
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...In-Memory Computing Summit
 
TFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformTFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformShunya Ueta
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationYoshitaka Ushiku
 

Similar to AppTek-CLimateGPT-EvryWS20240308-v3.pptx (20)

Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the Cloud
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & EsriGeospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & Esri
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Jos van Sas - Testimonial Alcatel-Lucent Bell Labs
Jos van Sas - Testimonial Alcatel-Lucent Bell LabsJos van Sas - Testimonial Alcatel-Lucent Bell Labs
Jos van Sas - Testimonial Alcatel-Lucent Bell Labs
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
 
MSR 2009
MSR 2009MSR 2009
MSR 2009
 
A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.A Machine learning approach to classify a pair of sentence as duplicate or not.
A Machine learning approach to classify a pair of sentence as duplicate or not.
 
PetroSync - Advanced Seismic Data Acquisition and Processing
PetroSync - Advanced Seismic Data Acquisition and ProcessingPetroSync - Advanced Seismic Data Acquisition and Processing
PetroSync - Advanced Seismic Data Acquisition and Processing
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
 
CV_Kelvin_2016
CV_Kelvin_2016CV_Kelvin_2016
CV_Kelvin_2016
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
TFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformTFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platform
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
 

Recently uploaded

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 

Recently uploaded (20)

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 

AppTek-CLimateGPT-EvryWS20240308-v3.pptx

  • 2. 2 The People Highly experienced team of AI scientists and engineers, providing word-class expertise to help customers refine their licensed or in-house AI models AI Models Advanced engines for automatic speech recognition, machine translation, natural language understanding / processing, and text to speech Transcribed audio hours 250K+ 1.7M+ Audio hours for unsupervised training 250K 1.5M 60+ Languages with 100s of dialects Data AppTek’s data packs and services are highly valued for training AI and machine learning models Automatic transcription of broadcast, media and entertainment, microphone and telephony in 60+ languages Utilizes software to translate text or speech into different languages, featuring 600+ language pairs Context from ASR used to discern meaning and execute an intent from voice commands Reading out text in human-like, expressive and adapted voices Scientists PhDs Research engineers 100s 9 Peer-reviewed papers Patents ASR MT NLU/P TTS ~60 32 ~20 AppTek Company Overview
  • 3. • Developed and fine-tuned a generative LLM model to improve fluency of answers to climate change questions • 3 dimensions/perspectives: Natural Science, Economics, and Sociology • Baselines are Llama2-7B, Llama2-13B and Llama2-70b trained on 2T tokens • Continuous pre-training on 4.2B tokens climate-related text • Instruction Fine Tuning augmented with climate-scientist curated data (10k demonstration pairs) • Hierarchical retrieval augmentation • Multilinguality through cascaded system ClimateGPT ClimateGPT MT x -> en MT en -> x Retrieval Scientific Documents Questio n Answer
  • 4. • Continuous pre-training on 4.2B climate-related text • Extreme Weather reports (10 years * 1M articles) • Technical Game-Changing Breakthroughs (153 themes x.1000 articles) • Selection through Sustainable Development Goals (17 SDGs) • Climate Change News • Climate Change reports • World Bank, OECD, IPCC, UN, EU, TFCD, US, NASA, ESA, WRI, IREA, WEF, Nature Finance • Climate Academic Research ClimateGPT Training CPT + IFT ClimateGPT Llama2-7B CPT Climate document s IFT Climate instruction s
  • 5. • Instruction Fine Tuning augmented with climate-scientist curated data (10k demonstration pairs) ClimateGPT Training CPT + IFT ClimateGPT Llama2-7B CPT Climate document s IFT Climate instruction s Domain Name Total Size Training Samples Climate Senior Expert Interviews 74 1,332 Grounded Expert Demonstration 403 7,254 Grounded Non-Expert Demonstrations 9,663 146,871 Synthetically Generated Demonstrations 57,609 0 Climate-dimension specific StackExchange 3,282 9,846 General AppTek General 700 2,100 OASST-1 3,783 11,349 Dolly 15,001 45,003 Llama-2 Safety 939 2,817 FLAN 38,909 30,000 CoT 448,439 15,000 60.8% 39.2% 271,572 demonstration pairs
  • 6. • 700 documents (IPCC* reports + academic papers cleaned from tables and references) • 20k pages • GPT-3.5 tagged along 3 dimensions (economy, social, science) • Vector search (transformer bi-encoder) • Hierarchical retrieval • Page level search (top 60) • Chunks of 115 tokens per page (top 5) • Citations provided through selected chunk  5*115 + meta-data == 154 tokens added per dimension ClimateGPT Inference time RAG ClimateGPT MT x -> en MT en -> x Retrieval Scientific Document s Questio n Answer * Intergovernmental Panel on Climate Change
  • 8. • Multilinguality through cascaded system No truly multilingual open source LLM available Allows to keep compactness and LLM model precision Answer quality for low resourced languages (science) – MT may not be adapted to culture ClimateGPT Inference time MT ClimateGPT MT x -> en MT en -> x Retrieval Scientific Documents Questio n Answer
  • 9. • Evaluated on – standard language comprehension tasks – climate related comprehension tasks • ClimateGPT-7B models equals performance of Llama2-70B on climate tasks – 10 times smaller – 12 time less energy needed at inference time • Incremental training at a tiny fraction of the cost needed to train the base model • Multilinguality addressed with a cascaded approach ClimateGPT Results 9
  • 10. PRIVATE AND CONFIDENTIAL 10 RAG: vector search quality
  • 11. PRIVATE AND CONFIDENTIAL 11 The higher the abstraction level the higher precision is needed
  • 12. 12 K G (Manual) KG is factual Contains explicit alternatives / complementarity / inconsistencies Allows reasoning Does not always have an answer PROS Relations are based on hard- coded ontologies Intensive manual work for high quality To be efficient, KG expansion is task dependent Precision impacts flexibility CONS KGs: Pros and Cons
  • 13. 13 K G Based on data Automatic Task/Domain independent PROS Hallucinates One answer per perspective No abstraction: No reasoning structure Always has an answer CONS LLMs: Pros and Cons
  • 14. 14 KGs to Improve RAG* Document-based KG generation has good results when intention/goal is known • Given a question to the LLM (Q-Intention + Q-Entities) • Given a set of documents used as a priori knowledge indexed on D-Entities • Select subset of documents based on Q-Entities • Apply Q-Intention Recognition on the subset of documents *work in progress • Extract document snippets with ranked Q-Intention • Provide LLM with question + snippets • Build a (dynamic) KG from subset • Get KG-facts related by Q-Intention • Provide LLM with question, KG facts and document snippet related to KG facts Reduce snippet vector search to intention recognition and keep entities (abstraction and resolved value) as hard as possible

Editor's Notes

  1. HuggingFace Transformers (Wolf et al., 2020) and Sentence Transformers9 (Reimers and Gurevych, 2019) was used to deploy the embedding models and to embed text in preprocessing and inference stages. For efficient nearest-neighbor search we use ScaNN10 (Guo et al., 2020).