This document discusses HTEC's approach to architecting generative AI solutions. It covers topics such as large language models (LLMs), common use cases for AI like classification and generation, and HTEC's framework which includes cognitive design, responsible AI practices, and a six step process from problem assessment to deployment. It also discusses prompts and prompt engineering to influence LLM output, and shows an architecture for a dynamic agent that can authenticate, upload data, perform inference, and notify users.
Greet the audience, greet the oportunity to speak and share on DSC
Provide an short intro if not done before the presentation
Glance over topics that we will cover
- First one is something that will help Non AI folks to get an understanding what the heck is this genAI everyone talks about (if there are any)
- Overview of how we @ HTEC approaches about thinking and solving these problems and what to consider when building
- Go into one of the accelerators we have and an example of the system we have built
This can be an overview for someone who is non AI engineer
@ HTEC we have identified that there is a path engineer can take to get into the field of Generative AI and we will touch on that
- To enable them to catch up
- Learning Plan for engineers
Why is it important to emphasise this path to learning and intro to Generative AI
What are LLMs?
Number of parameters, RLHF
How LLMs can be used/deployed
- on prem, deploying custom LLM on cloud enviorment
- Managed by your company
- ON EDGE
- using 3rd party LLM, e.g. OpenAI via OpenAI API
- Hybrid; combination of first two
What modalities are there that LLMs are going if not to be using?
- Text
- Image
- Audio
- Sensor data (Meta)
Etc.
Naglasiti da postoje posebne implementacije koje to omogućavaju
- možda reći malo o tome šta je tokenizacija
-
What are some of the use cases when LLMs can be used
Focus is only on text right now, but we are seeing that image is being used also but not that often.
Before diving into the concreete examples
- describe to audience how we at HTEC are approaching the problem of identifying
HTEC's
Go trough the steps but from the engineering and problem perspective
Emphasis PREDICTIVE WALK BETWEEN RND AND Productization
This is a fork of existing workflow and augmented by the needs we have seen to emerge in GenAI
Whitepaper was released by our colleagues that targets most of the first 2 steps. We will focus mostly on the middle part right now because it is important to see how much it differs from the traditional AI workflows
- What we are observing as new steps emerging, or being more emphasized
- Cognitive design
- More emphasis on security and monitoring for potential rogue behavior (Agents can require access sensitive data, like query databases, generate queries, etc.)
- Accelerators are something you build to help you jump-start the implementation
- What are Prompts?
- What is their modality? Considering only text now
- What is prompt engineering?
- How to handle creativity of the LLMs?
- From which parts it consists of?
What are tokens?
Postaviti se iz pozicije da smo mi ovo pomagali našim arhitektama...
- treniram trenere
- šta smo mi pomogli i kako našim kolegama
- What and how models can learn and be trained?
- Model training – adapting the weights to accommodate new data
- Model prompting – selecting model input to achieve desired output
- Types of learning
- slightly different than one in DL
- Ethical consideration
- Licensing and knowledge cutoff
- Vector Databasees
- How the semantic search is done
- Token Limitation
- Prompt length is limited
- prompt contains task specification, additional information, and context information
- Token Modalities
- Text modality, a token is approx 3 characteds (3 tokens are used per word, on average)
- Other modalities use different tokenization schemes
- Tendencija da 'e modeli da budu skoro near real time...
- Promena u poslednjih mesec dva koliko je
- GROK je near real time
- Recent update to openAI
Talk about the complexity
- Increasing complexity provides more capabitilities to be achieved
- Stateless API
– Two approaches
- chat completition
- text completition
- Hint kako utiče kontekst i da li veličlina konteksta je bitna
- AutoGPT malo bolje objasniti
- šta su obećali a šta je moguće
Dive into an example how an Dynamic Agent would be implemented
Implementation depends on the use-case
Focus is on reusability and modularity of the system that is being implemented
Building the solution, modularity and configurability is the key.
This is one of the ways how to create an accelearator
Sinhroni i asinhrona komunikacija
Ideja za regione, multiregion za lokalne limite
Crtica ka spolja
C4 L2 nivo dijagrama
- Da postoje kontejneri
Naglasiti kao simplified
Vehicle ka arhitektama da oni budu svesni šta sve postoji ovde o čemu moraju voditi rešenja
Sinhroni i asinhrona komunikacija
Ideja za regione, multiregion za lokalne limite
Crtica ka spolja
C4 L2 nivo dijagrama
- Da postoje kontejneri
Naglasiti kao simplified
Vehicle ka arhitektama da oni budu svesni šta sve postoji ovde o čemu moraju voditi rešenja
Sinhroni i asinhrona komunikacija
Ideja za regione, multiregion za lokalne limite
Crtica ka spolja
C4 L2 nivo dijagrama
- Da postoje kontejneri
Naglasiti kao simplified
Vehicle ka arhitektama da oni budu svesni šta sve postoji ovde o čemu moraju voditi rešenja
Get SAS Token
Sinhroni i asinhrona komunikacija
Ideja za regione, multiregion za lokalne limite
Crtica ka spolja
C4 L2 nivo dijagrama
- Da postoje kontejneri
Naglasiti kao simplified
Vehicle ka arhitektama da oni budu svesni šta sve postoji ovde o čemu moraju voditi rešenja
RPC
- job based sistem za obradu nekih duga;kih zadataka
- sistem koji ne zavisi od request response timeouta
Progress queue
Da bi korisnik znao gde se trenutno nalazi sistem
100 documents to process
Sinhroni i asinhrona komunikacija
Ideja za regione, multiregion za lokalne limite
Crtica ka spolja
C4 L2 nivo dijagrama
- Da postoje kontejneri
Naglasiti kao simplified
Vehicle ka arhitektama da oni budu svesni šta sve postoji ovde o čemu moraju voditi rešenja
Not constrained to function calling for certain tasks
This enables the system to be interchangable