Latencies within Multimodal GenAI
Varun Singh
Daily.co
11 Dec 2023
Understanding Latency
Toolkits for AI-powered voice and video
AI’s new inflection point, Generative AI
2013: Neural Networks for Natural Language Processing
2016: Memory-based Networks and Transformers
2018: Conversational AIs are everywhere
2023: LLMs and Generative AIs: Image/Video/Chat
ChatGPT and friends
Text → AI Engine/LLM → Text
Each hop runs over WebSocket or HTTP/2 / HTTP/3
Latency for chat, while measurable, is not significant, because it can be hidden with UX tools
Multimodal Bots!
Voice → Speech-to-text → AI Engine → Text-to-speech → Voice
Humans are messy; AI engines need to work with this
Confidential ©2023 Daily
Real-time AI experience workflow
Transports
// Understanding latency
WebSockets and WebTransport
● Can stream packets, but the application must handle audio chunking itself
● May suffer from head-of-line blocking (HOLB)
WebRTC
● Currently the best option
Irrespective of transport, things to consider
● Voice activity detection
● Audio packetization: is 20 ms the best option?
● Redundancy: FEC, NACK, etc.
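To make the packetization and voice activity detection points concrete, here is a minimal sketch in Python. It is not taken from the talk. The 16 kHz sample rate, the 16-bit mono PCM format, the RMS threshold of 500 and all function names are assumptions chosen for illustration. The energy threshold is a deliberately naive stand-in for a real VAD.

```python
import math
import struct

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
FRAME_MS = 20            # the frame duration the slide asks about
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM


def frame_size_bytes(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Bytes in one frame of frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000 * BYTES_PER_SAMPLE


def packetize(pcm, frame_bytes):
    """Split raw PCM into full frames; return (frames, leftover_bytes)."""
    usable = len(pcm) - len(pcm) % frame_bytes
    frames = [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]
    return frames, pcm[usable:]


def rms(frame):
    """Root-mean-square amplitude of a little-endian 16-bit PCM frame."""
    count = len(frame) // BYTES_PER_SAMPLE
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * BYTES_PER_SAMPLE])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame, threshold=500.0):
    """Naive energy-based VAD: a frame counts as speech if it is loud enough."""
    return rms(frame) >= threshold
```

Shorter frames reduce buffering delay at the sender but raise per-packet overhead, which is the trade-off behind the 20 ms question. The leftover bytes returned by `packetize` would be prepended to the next chunk read from the microphone.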
Latencies stack up
// Understanding latency
Speech-to-text
Text fed to LLM, LLM response
Text-to-speech
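As a rough illustration of what "stacking up" means when the three stages run strictly one after another, the sketch below sums per-stage latencies. The `Stage` class, the function name and the millisecond values are all hypothetical placeholders, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: float


def stacked_latency_ms(stages):
    """Total delay when each stage waits for the previous one to finish."""
    return sum(stage.latency_ms for stage in stages)


# Placeholder numbers purely for illustration; not measured values.
pipeline = [
    Stage("speech-to-text", 300.0),
    Stage("LLM response", 700.0),
    Stage("text-to-speech", 250.0),
]

print(f"{stacked_latency_ms(pipeline):.0f} ms before the bot starts speaking")
```

The sum only holds when every stage consumes complete input and produces complete output before handing over.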
Latencies stack up, but do they?
// Understanding latency
Speech-to-text
Text fed to LLM, LLM response
Text-to-speech
We hem and haw
We interrupt, speak over each other
We sometimes take a moment to conjure and organise our thoughts
Latencies stack up, but do they?
// Understanding latency
Speech-to-text
● Emit each word? Wait for a punctuation mark? Wait for the whole talk spurt?
● The choice matters because spoken language is not the same as written language
Text fed to LLM
● Feed fragments or complete talk spurt?
● More context means better LLM response
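One way to picture the "complete talk spurt" option is to group timestamped words from the speech-to-text stage and cut wherever the speaker pauses. This is a sketch under assumptions: the `Word` structure, the 0.7 second pause threshold and the function name are hypothetical, and real recognisers expose word timings in their own formats.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_s: float  # time the word starts, in seconds
    end_s: float    # time the word ends, in seconds


def group_talk_spurts(words, max_gap_s=0.7):
    """Join words into talk spurts, splitting at pauses longer than max_gap_s."""
    spurts = []
    current = []
    for word in words:
        if current and word.start_s - current[-1].end_s > max_gap_s:
            spurts.append(" ".join(w.text for w in current))
            current = []
        current.append(word)
    if current:
        spurts.append(" ".join(w.text for w in current))
    return spurts
```

A larger `max_gap_s` gives the LLM more context per request but delays the moment the request can be sent, which is the tension the bullets above describe.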
Text-to-Speech
● Word by word? Sentence by sentence?
● Again, spoken words carry intonation, which requires context
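For the sentence-level option, a small buffer can sit between a streaming LLM and the text-to-speech stage and release text only at sentence boundaries. The sketch below assumes the LLM output arrives as an iterable of string tokens; the function name and the set of sentence-ending characters are illustrative choices, not part of any particular API.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")  # assumed sentence terminators


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield text one sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever remains when the stream ends mid-sentence.
        yield buffer.strip()
```

Speech synthesis can then start on the first sentence while later tokens are still arriving. This naive splitter also breaks on abbreviations such as "e.g.", so a production version would need a more careful boundary check.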
Real-time AI experience workflow
Download it, give it a try
