Join Alon Fliess, Azure MVP, and Microsoft RD in an enlightening lecture where C# meets the forefront of AI. Discover how the Semantic Kernel project bridges traditional programming with advanced AI, empowering C# developers to integrate AI functionalities into their software seamlessly.
Experience a paradigm shift in diagnostics through a real-world example: a sophisticated system crafted with C#, Semantic Kernel, and Azure. Witness the synergy of C# and AI in action, optimizing system analysis and problem-solving in complex environments.
Embark on a journey where C# and AI meet.
3. About Me
Alon Fliess:
CTO of ZioNet
More than 30 years of hands-on experience
Microsoft Regional Director & Microsoft Azure MVP
4. About ZioNet
A professional software service provider company
The home for high-potential novice developers and expert leaders
We support our developers’ growth and provide them with professional and personal mentoring
ZioNet management has over 20 years of experience
We strive to fulfill the need by ensuring developers have the best first-job experience!
6. Overview of Different Types of AI
Algorithm-based AI: Uses rule-based decision-making systems
Supervised Learning: Learns from a labeled dataset
Unsupervised Learning: Finds hidden patterns in data
Reinforcement Learning: Improves via reward-based feedback
Hybrid AI: Combines different AI methodologies
Large Language Models: Generate text by learning from a tremendous amount of text
7. What is a Neural Network?
Inspired by the human or animal brain, though far from being the same
8. Large Language Model Overview (GPT, LLaMA, LaMDA, PaLM)
Capture the semantics of a language
Trained on a very large amount of data
All you need is Attention… and position (context)
The model predicts the next word using probability, based on the input text
The next word (token) is predicted from the original input plus all the words that were generated before
9. LLM Processing (GPT)
Tokenization: The text is split into smaller parts, like words or parts of words.
Mapping: Each word or part of a word is given a unique number (ID).
Embedding: The ID for each word is then turned into a list of numbers (a vector) that represent the meaning of the word.
Context Understanding: The position of each word in the text and the influence of nearby words are considered to understand the context.
Passing through Layers: All these vectors are passed through a series of layers in a neural network, which helps the model understand complex relationships between words.
Output: The model gives an output in the form of a prediction for the next word or part of a word, based on the input and learned context.
Training: The model learns from its mistakes and adjusts its predictions based on actual data.
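The tokenize → map → embed steps above can be sketched with a toy example. This is illustrative only: the five-word vocabulary, the IDs, and the tiny two-dimensional embeddings are all made up (real models use subword tokenizers such as BPE and learned embeddings with hundreds or thousands of dimensions).

```python
# Toy version of the tokenize -> map -> embed pipeline. Everything here
# (vocabulary, IDs, embedding values) is invented for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text):
    # Real models use subword tokenizers (e.g. BPE); here we split on spaces.
    return text.lower().split()

def to_ids(tokens):
    # Mapping: each token becomes a unique integer ID.
    return [vocab[t] for t in tokens]

# Embedding: each ID maps to a small vector whose values a real model
# would learn during training.
embeddings = {
    0: [0.1, 0.0], 1: [0.9, 0.3], 2: [0.2, 0.8], 3: [0.0, 0.5], 4: [0.8, 0.4],
}

def embed(ids):
    return [embeddings[i] for i in ids]

ids = to_ids(tokenize("the cat sat"))   # [0, 1, 2]
vectors = embed(ids)                    # one 2-dimensional vector per token
```

The later stages (context understanding, layers, output) then operate on these vectors rather than on the raw text.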
11. What are the Base Models? (OpenAI, Azure)
Some predefined models were trained to do specific tasks
There are variations:
The size (input + output) tokens
The speed
The usage price
The ability to be fine-tuned
12. Can I Have My Own Fine-Tuned Models?
Why Fine-Tune?
Improve the model performance on specific tasks
Adapt the model to new data
Customize the model's behavior
When to Fine-Tune?
When the Pre-Trained Model is not performing well, or the grounding (system) prompt is too large
When you have a specific task or data
How to Fine-Tune?
Use tools like Azure Machine Learning Studio or OpenAI's Python client for fine-tuning
Specify the Base Model and Dataset
Monitor Fine-Tuning Job and retrain
Use Fine-Tuned Model for Predictions
13. Learn how to Generate or Manipulate Text
Classification, such as sentiment
Generation – Create a text or formatted text (JSON, XML)
Conversation – Chat to get information, give commands, or generate
and manipulate the result
Transformation:
Language Translation
Text to emoji
Any format to any format
Summarization – reduce the size of text
Completion – complete a statement
Code - Use Codex to generate, complete, or manipulate source code
15. Introduction to Prompt Engineering
What is the first thing that comes to your mind when I say “<prompt>”?
Prompt Engineering is the art of crafting inputs to get desired outputs from AI models
It's a crucial part of using AI models effectively
The design of the prompt can greatly influence the model's response.
Examples:
If you want a list, start your prompt with a numbered list.
If you want a specific format, provide an example in that format.
It often involves a lot of trial and error
Different prompt strategies may work better for different tasks
Less is more!
16. Prompt Engineering Recommendations
Goal: Define what you want from the model
Instructions: Be clear and explicit
Examples: Use them for specific formats or styles
Iterate: Experiment with different prompts
Guidance: Use system-level and user-level instructions
Settings: Adjust temperature and max tokens as needed
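The recommendations above (goal, instructions, examples, settings) can be sketched as a single chat-style request. This is a hedged illustration: the payload shape follows the common OpenAI/Azure OpenAI chat-completion style, and the model name, system prompt, and examples are placeholders, not values from the talk.

```python
# Sketch: mapping prompt-engineering recommendations onto a chat request.
# The "model" value and all prompt text are placeholder assumptions.
def build_request(system, examples, user_input):
    # Guidance: a system-level instruction states the goal explicitly.
    messages = [{"role": "system", "content": system}]
    # Examples: few-shot pairs show the model the desired format.
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return {
        "model": "gpt-35-turbo",  # placeholder deployment name
        "messages": messages,
        "temperature": 0.2,       # Settings: low = more deterministic
        "max_tokens": 200,        # Settings: cap the response length
    }

req = build_request(
    "You extract action items from meeting notes. Answer only with a numbered list.",
    [("Notes: discussed the demo; Dana will book a room.", "1. Dana: book a room")],
    "Notes: ship v2 on Friday; Alon prepares the slides.")
```

Iterating then means editing the system text, the examples, or the settings and comparing outputs.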
17. Playground – ChatGPT and Azure
To get started, use the ChatGPT or Azure OpenAI playground – Demo
You can generate your boilerplate code for grounding and examples
You can play with fine-tuning parameters
19. aka.ms/semantic-kernel
The Easy Way To Add AI To Your App
Goals-First AI: it all starts with a user’s AI ask (ASK). The Kernel gathers Skills, Memories, Connectors, and APIs; the planner prepares the steps (1, 2, 3, …); the steps pipeline executes them; and when the result is ready it is returned (GET), resulting in new productivity.
20. Semantic Kernel Main Concepts
Prompt Functions:
Define interaction patterns with LLMs
Native Functions:
Enable direct code execution by AI
Plugins:
Custom-built, modular elements enabling specialized LLM task handling
Memory:
Contextual data repository; supports key-value pairs and semantic embeddings
Connectors:
Interface with external data and APIs
Planners:
Orchestrate and manage complex tasks through intelligent LLM planning
Agents:
Autonomous entities executing orchestrated tasks
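The interplay of these concepts can be sketched as a conceptual analogy in plain Python. This is emphatically NOT Semantic Kernel's actual API; all names are made up. The idea: native functions are ordinary callables, a planner chooses an ordered subset of them for a given ask, and the kernel runs the resulting steps pipeline.

```python
# Conceptual analogy only - not Semantic Kernel's real API.
# "Plugins" expose native functions as plain callables.
plugins = {
    "Time.today": lambda _: "2024-01-15",          # hypothetical function
    "Mail.send": lambda text: f"sent: {text}",     # hypothetical function
    "Text.upper": lambda text: text.upper(),       # hypothetical function
}

def plan(ask):
    # A real planner asks the LLM to pick and order the steps; this toy
    # version hard-codes one plan to keep the example self-contained.
    if "shout" in ask and "mail" in ask:
        return ["Text.upper", "Mail.send"]
    return []

def run(ask, data):
    # The "kernel": execute the planned steps as a pipeline, feeding each
    # step's result into the next.
    result = data
    for step in plan(ask):
        result = plugins[step](result)
    return result

print(run("shout this and mail it", "hello"))  # sent: HELLO
```

Memory and connectors would slot in as additional callables the planner can choose from.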
24. Lesson Learned
The first project iteration: an OpenAI ChatGPT plugin
Before Semantic Kernel
Had to bridge the LLM to C# code
Simple APIs, JSON, Reflection, Default Values
Needed to reduce the description load due to token limitations
No Planner
The second project iteration: use multiple LLMs and Semantic Kernel
Work in progress
Requires lots of fine-tuning – debug as you go
Expensive
For agents, use an asynchronous model or batch processing
25. Lessons Learned from Developing Plugin
Plugin API Design
• Be Systematic: Stick to one or a few APIs for consistency
• OpenAPI Description: Provide a comprehensive OpenAPI specification
Plugin Manifest
• Examples: Include examples for each query method
• Instructions: Update examples and instructions if ChatGPT calls with the wrong data schema
Data Handling
• Paging: Implement paging capabilities
• Truncation: Use HTTP code 206 and a special message to indicate truncated results
• Important Data: Always return key data, as in the Windows Troubleshooting plugin (Tenant ID and PC ID)
Performance & Limitations
• Trial and Error: Extensive testing is crucial
• Size Limits: Be aware of total size and per JSON element limits
• Resource Management: Be cautious of overusing ChatGPT 4 resources
Dynamic Content
• For complex plugins, dynamically generate the OpenAPI specification and plugin manifest
26. Lessons Learned from Embedding LLM into Applications
Grounding & Serialization
• Provide accurate grounding and a JSON schema describing the result
• Use Semantic Kernel Prompt and Native Functions – use string and JSON parameters
Model Selection
• Use GPT-3/3.5/4 for chat functionalities
• Use other models for embedding, image creation, and recognition
Performance & Cost
• GPT-4 is more accurate but costly and slower
• Consider fine-tuned models if applicable – high cost for a good result
Message Handling
• Handle truncated messages with a continuation strategy – less important with 120K-token contexts
• Manage message size by counting tokens and removing history
• Use summary messages to replace original history if needed
• Use Memory (Embedding/Vectorization)
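The "count tokens and remove history" strategy can be sketched as follows. This is a hypothetical helper, not code from the talk; it uses the rough rule of thumb (mentioned later in the deck) that one token is about 4 English characters, whereas real code should use the model's tokenizer for exact counts.

```python
# Sketch of trimming chat history to fit a token budget.
# Assumption: ~4 characters per token; real code should use the model's
# actual tokenizer (e.g. tiktoken for OpenAI models) instead.
def estimate_tokens(text):
    return max(1, len(text) // 4)

def trim_history(messages, budget):
    """Keep the system message plus the most recent turns within `budget`."""
    system, rest = messages[0], messages[1:]
    kept, used = [], estimate_tokens(system["content"])
    for msg in reversed(rest):                  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                               # older turns get dropped
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

Dropped turns can be replaced by a summary message or recalled later via memory/embeddings, as the bullets above suggest.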
27. Overcome the Model Limitation
Token Limitation
Use Summarization
Use Memory (store important facts)
Use Retrieval-Augmented Generation (RAG) + vector and other search
Overcome the lack of current information
Use Bing or other search Plugins
Overcome cost and availability
Use GPT-3.5 Turbo
Use Open-Source Models
Use GPT-4 to prompt-engineer cheaper models (GPT-3.5)
28. The Windows Event Log Plugin
https://github.com/alonf/WindowsEventLogChatGPTPlugIn.git
Retrieve specific events from the Windows Event Log using XPath queries.
Supports all major log names: Application, Security, Setup, System, and ForwardedEvents.
Use it to solve problems and get information about your Windows system status.
The plugin supports paging. It estimates the number of tokens and limits the result.
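The paging-with-token-estimate behaviour can be sketched like this. It is a hypothetical helper for illustration, not the plugin's actual C# code: results are cut off once the estimated token count would exceed a limit, and the response tells the caller where to resume (the real plugin signals truncation with HTTP 206, as described later).

```python
# Sketch: limit a result set by an estimated token budget and support paging.
# Assumption: ~4 characters per token, as a rough estimate.
def page_events(events, max_tokens, chars_per_token=4):
    used, out = 0, []
    for ev in events:
        cost = max(1, len(ev) // chars_per_token)
        if used + cost > max_tokens:
            # Truncated: the caller should request the next page from
            # next_index (the real plugin pairs this with HTTP 206).
            return {"events": out, "truncated": True, "next_index": len(out)}
        out.append(ev)
        used += cost
    return {"events": out, "truncated": False}
```

Keeping the truncation signal explicit lets the LLM decide whether to ask for the next page.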
30. Developing ChatGPT Plugin Using C#
Use ASP.NET Minimal or Controller-based APIs
Use YamlDotNet to convert the JSON OpenAPI specification to YAML
The OpenAPI specification, the plugin manifest, and the icon file can come from the file system or an HTTP query
app.UseStaticFiles(new StaticFileOptions
{
FileProvider = new PhysicalFileProvider(
Path.Combine(app.Environment.WebRootPath, "OpenAPI")),
RequestPath = "/OpenAPI"
});
31. Developing ChatGPT Plugin Using C#
Route the HTTP request to a function
For a simple plugin, it is just a GET request
For a complex plugin, use POST with a body and have your own route
In the Windows Troubleshooting plugin, I use a map of providers and make the call to the specific function using reflection
I use validation to make sure the data size and type are correct
I provide extensive error messages
Use a correlation ID for local development (Tenant ID, PC ID)
Use OAuth for released plugins
32. Controlling the Output layer
Max response: defines the token limit for the model's response
One token is roughly equivalent to 4 English characters
Temperature: Controls the randomness of the model's responses.
Lower values result in more deterministic responses; higher values lead to creativity
Top P: Another parameter to control randomness
Lower values make the model choose more likely tokens
Stop sequence: Specifies a sequence at which the model should stop generating a response
Frequency penalty: Reduces the likelihood of the model repeating the same text by penalizing tokens that have appeared frequently
Presence penalty: Encourages the model to introduce new topics in a response by penalizing any token that has appeared in the text so far.
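Temperature and Top P can be made concrete with a small numeric sketch. The "logits" (raw next-token scores) below are made-up values; the math, however, is the standard softmax-with-temperature and nucleus (top-p) cutoff.

```python
import math

# Illustration of temperature and top-p on made-up next-token scores.
def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_cutoff(probs, p):
    # Keep the most likely tokens whose cumulative probability reaches p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)  # low temperature: more deterministic
flat = softmax(logits, temperature=2.0)   # high temperature: more random
nucleus = top_p_cutoff(softmax(logits), 0.8)
```

With temperature 0.5 the top token dominates; with temperature 2.0 the distribution flattens; top-p 0.8 keeps only the smallest set of tokens covering 80% of the probability mass.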
33. Cost, Privacy & Security
ChatGPT and Azure OpenAI services can be used without donating your data
For the public ChatGPT, you can ask to opt out
For the API, you can ask to opt in
ChatGPT 4 is a very high-cost model (becoming cheaper)
You can use ChatGPT 4 to create prompts and examples for ChatGPT 3.5
You can host your model on-premises; however:
The open-source models do not match the latest ChatGPT 3.5 and 4
You may train your own model – costly
34. Summary
AI Types: Explored diverse AI forms
Neural Networks & LLMs: Discussed their functionality
Base Models & Fine-Tuning: Highlighted fine-tuning's importance
Prompt Engineering: Introduced crafting effective prompts
Semantic Kernel: Your LLM Swiss army tool
Application Transformation: Extending applications with AI
Autonomous Agents: The future of AI systems
LLM: The Software System on a Chip
Algorithm-based AI:
These are rule-based systems that follow predefined logic to make decisions. Examples include traditional algorithms like Minimax (used in game theory and decision making), A* (used in pathfinding and graph traversal), and others.
Machine Learning (ML) based AI:
This type of AI learns patterns from data and makes predictions or decisions without being explicitly programmed to do so. Machine Learning can be further divided into subtypes:
Supervised learning: The model is trained on labeled data (data with known outputs).
Unsupervised learning: The model is trained on unlabeled data and identifies patterns within the data.
Reinforcement learning: The model learns based on rewards and penalties and iteratively improves its performance.
Deep Learning based AI:
A subset of ML, deep learning utilizes artificial neural networks with multiple layers (hence the term 'deep') to model and understand complex patterns. This is the foundation for models like Convolutional Neural Networks (CNNs) used in image recognition, or Recurrent Neural Networks (RNNs) and their derivatives like LSTMs and GRUs used in sequence prediction tasks.
Hybrid AI:
Hybrid AI systems use a combination of the above methods to solve complex problems. They may combine rule-based algorithms with machine learning models to leverage the strengths of both.
Large Language Models (LLM):
These are a type of AI model based on deep learning, specifically using architectures like Transformers. They are trained on a large corpus of text and are capable of generating human-like text, making them useful in a variety of applications, which we'll be delving into in this presentation.
Prompt: Show an example of multilayer neural networks with the weights and functions.
The human brain contains around 86 billion neurons, each connected to thousands of others, for a total of about 100 trillion connections
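In the spirit of the prompt above, here is a minimal multilayer network with explicit weights and an activation function. All weight and bias values are invented for illustration; real networks learn them during training.

```python
import math

# A tiny two-layer feed-forward network with made-up weights.
def sigmoid(x):
    # Activation function: squashes any value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each output neuron: activation(sum of input*weight + bias).
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 0.5]                                              # input vector
hidden = layer(x, [[0.4, -0.6], [0.3, 0.8]], [0.1, -0.2])   # hidden layer, 2 neurons
output = layer(hidden, [[1.2, -0.7]], [0.05])               # output layer, 1 neuron
```

Training would adjust the weight matrices and biases to reduce the error of `output` against known answers.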
Tokenization: The text input is broken down into tokens. These tokens can represent words, parts of words, or even individual characters, depending on the language model.
Integer IDs: Each token is mapped to an integer ID according to a predefined vocabulary.
Embedding: Each integer ID is then mapped to a high-dimensional vector. These vectors, or embeddings, represent the meanings of the tokens and are learned during the training phase.
Positional Encoding: The model takes into account the position of each token in the sequence, adding this information to the embeddings. This allows the model to understand the order of the words in a sentence, which is crucial for understanding language.
Transformer Layers: The sequence of updated embeddings is passed through a series of transformer layers. These layers can "attend" to different parts of the input sequence when making predictions for each token. The transformer layers essentially help the model to understand the context of each word by taking into account the other words in the sentence.
Output: The final transformer layer outputs a new sequence of vectors, one for each input token. Each vector is a set of scores, one for each word in the model's vocabulary. The model's prediction for each token is the word with the highest score.
Training and Adjustment: The whole process is governed by a large number of parameters (weights and biases in the embeddings and transformer layers). These parameters are adjusted during the training phase to minimize the difference between the model's predictions and the actual words in the training data. The model learns to predict each word based on the context provided by the other words in the sentence.
The output:
During the initial pass, all input tokens influence all other input tokens in forming their respective context-aware hidden states.
During the text generation phase, each newly generated token is influenced by all previous tokens (including both original input tokens and already generated tokens), but it does not influence the hidden states of the tokens that came before it.
Prompt: I am an LLM beginner who needs some examples. Please show, using a basic example, the different parts of the Large Language Model execution stages. Use vectors and matrices. Use a small, few-token language.
Prompt: Show an example of two words whose embedding vectors are close
Prompt: If I have the sentence “the King loves the Queen”, what does the attention algorithm do? Can you show it using vector numbers?
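A toy answer to the two prompts above can be worked out by hand. The vectors below are made up (two-dimensional, with "king" and "queen" deliberately close), and the real mechanism uses separate learned query/key/value projections; here we use the raw vectors directly to keep the scaled dot-product attention math visible.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    # Scaled dot-product attention: score each key against the query,
    # softmax the scores, then take the weighted average of the values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Made-up embeddings; "king" and "queen" are close, so "loves" attends
# to both of them more than to "the".
vecs = {"the": [0.1, 0.1], "king": [0.9, 0.2],
        "loves": [0.4, 0.9], "queen": [0.85, 0.25]}
keys = values = [vecs[w] for w in ["the", "king", "loves", "the", "queen"]]
out = attention(vecs["loves"], keys, values)  # context-aware vector for "loves"
```

The output for "loves" is a blend of all the word vectors, weighted by how strongly "loves" attends to each, which is exactly the "context understanding" stage of the earlier pipeline.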
Define Your Goal: Understand what you want the model to generate. This could be a specific type of text, a certain format, or a particular style.
Craft Clear Instructions: Start your prompt with explicit instructions. The more specific you are, the better the model can generate the desired output.
Provide Examples: If you want a specific format or style, provide an example in your prompt. The model will try to follow the pattern you set.
Experiment and Iterate: Don't be afraid to tweak your prompts and try different approaches. Prompt engineering often involves a lot of trial and error.
Use System-Level and User-Level Instructions: These can guide the model's behavior throughout the conversation or for a specific user turn.
Adjust Temperature and Max Tokens: These settings can influence the output. Higher temperature values make output more random, while lower values make it more deterministic. Max tokens limit the length of the model's response.
Semantic Kernel has more GitHub stars on a daily basis, more developers joining our open source Discord community, and regular blog posts are trending as well. You’ve joined Semantic Kernel at the right time!
It always starts with an Ask. A user has a goal they want to achieve. We have seen how the Kernel orchestrates the ask to the planner. The planner finds the right AI skills that can be used to solve that need. Some skills are enhanced with memories and with live data connections. The steps to complete the user's ask are executed as part of the plan, and the results are returned to the user, resulting in productivity gains and, ideally, the goal reached.
Prompt: What are the main APIs of Dapr Workflow Building Block
Prompt: https://github.com/dapr/python-sdk/blob/master/ext/dapr-ext-workflow/dapr/ext/workflow/dapr_workflow_client.py
Prompt: https://github.com/alonf/Apple_IIe_Snake/blob/master/snake_asm.txt
Prompt: Please describe the code
Prompt: Please translate the code to C
Prompt: Continue