How to build an in-house ChatGPT
Researched and
presented by
Agenda
1. Executive Summary
2. OpenAssistant
3. Haystack.deepset & Coati
4. Contact information for demo
Copyrighted by
Executive Summary
Citynow Asia Inc, a Japan-based IT solutions company, provides
comprehensive solutions for implementing in-house conversational AI,
powered by the latest generative AI models and domain knowledge from
B2B customers.
Our conversational AI platform supports question answering and
generates relevant content in Japanese from query information in real
time, all while keeping training and testing data safely in-house.
OpenAssistant Approach
This pipeline includes four components:
1. User Interface: Where the user interacts with the system by entering
prompts and receiving responses.
2. Standardized Database: Storage for the available data and knowledge base.
Data formats should be standardized beforehand.
3. Internal Search Engine: Retrieves data relevant to the user’s prompt
from the Standardized Database.
4. Conversational AI Module: Responds to the user’s prompt using the data
retrieved by the Internal Search Engine.
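The four components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the search engine uses naive keyword overlap as a stand-in for a real retriever, and the AI module fills a template instead of calling a generative model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardizedDatabase:
    """Component 2: holds records already normalized to plain text."""
    records: List[str] = field(default_factory=list)


class InternalSearchEngine:
    """Component 3: returns records relevant to a prompt (naive keyword overlap)."""

    def __init__(self, database: StandardizedDatabase) -> None:
        self.database = database

    def retrieve(self, prompt: str, top_k: int = 1) -> List[str]:
        words = set(prompt.lower().split())
        scored = [(len(words & set(r.lower().split())), r)
                  for r in self.database.records]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Keep only records that share at least one word with the prompt.
        return [record for score, record in ranked[:top_k] if score > 0]


class ConversationalAIModule:
    """Component 4: placeholder for the generative model; it only fills a template."""

    def respond(self, prompt: str, context: List[str]) -> str:
        if not context:
            return "No relevant information was found."
        return f"Based on our records: {context[0]}"


def handle_prompt(prompt: str, search: InternalSearchEngine,
                  ai: ConversationalAIModule) -> str:
    """Component 1 (the user interface) would call this for each prompt."""
    return ai.respond(prompt, search.retrieve(prompt))


db = StandardizedDatabase([
    "Office hours are 9:00 to 18:00 on weekdays.",
    "The support hotline is open on weekdays only.",
])
search = InternalSearchEngine(db)
answer = handle_prompt("What are the office hours?", search, ConversationalAIModule())
print(answer)
```

Keeping the search engine and the AI module behind separate interfaces means either one can be replaced (for example, by a vector search or a fine-tuned model) without touching the other.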
Key benefits of the OpenAssistant Approach
● Built-in models: hosted in-house, not stored in third-party systems.
● Database security: data is stored securely on the customer's infrastructure.
● User-friendliness: customers can use the conversational AI without technical
domain knowledge.
● Model fine-tuning: a platform for data validation and updating.
Haystack.deepset & Coati Approach
Regarding the ColossalAI and DeepSpeed approach: because we have less hardware power than OpenAI (the
company behind ChatGPT), we will use a hardware accelerator/optimizer to best combine the power of
our own DGX system, the training steps (provided by OpenAI), and the dataset (provided by the customer).
1. For hardware support, the candidate tools are ColossalAI and DeepSpeed.
2. For system architecture, we will build our system on two main components:
+ Information Retrieval System: matches a question to the closest information
stored in the database, and allows the database (knowledge space) to be updated regularly without retraining
the chatbot.
+ Chatbot System: makes the answer read more naturally and humanlike.
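As an illustration of the hardware-support item, DeepSpeed is driven by a JSON configuration file. The sketch below builds such a configuration in Python. The keys are standard DeepSpeed options (ZeRO stage 2 partitions optimizer state and gradients across GPUs, and the optimizer can be offloaded to CPU memory), but every value here is an assumption chosen for illustration, not a setting taken from the slides. ColossalAI has its own configuration mechanism, which is not shown.

```python
import json

# Illustrative values only; tune them for the actual model and DGX hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},          # mixed precision to reduce memory use
    "zero_optimization": {
        "stage": 2,                     # partition optimizer state and gradients
        "offload_optimizer": {"device": "cpu"},
    },
}

# DeepSpeed expects this content as a JSON file or as a dict passed at start-up.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```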
The information retrieval system
(1) The user sends a question to the system.
(2) Python scripts embed the user's question as a vector
and compare it to all documents in the system to find the
best match (evaluated by cosine similarity score).
(3) The system returns the best-matching document and the
paragraph that best answers the user’s question.
(4) Python scripts convert all the information (user
question, answer) into a format that can be fed to
the chatbot.
(5) The chatbot converts the information above into a
natural response to the user.
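Steps (2) to (4) can be sketched as follows. This is a minimal sketch, not the presenters' scripts: the embedding is a simple word-count vector standing in for a neural model, paragraphs are assumed to be separated by blank lines, and the prompt template in `build_prompt` is hypothetical. Step (5), the chatbot's generation, is not shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts. A real system would use a neural model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_match(question, candidates):
    """Step (2): return the candidate with the highest cosine score."""
    q = embed(question)
    return max(candidates, key=lambda c: cosine(q, embed(c)))


def answer_context(question, documents):
    """Steps (2)-(3): best document, then the best paragraph inside it."""
    doc = best_match(question, documents)
    paragraph = best_match(question, doc.split("\n\n"))
    return doc, paragraph


def build_prompt(question, paragraph):
    """Step (4): hypothetical prompt template for the chatbot."""
    return f"Context: {paragraph}\nQuestion: {question}\nAnswer:"


documents = [
    "Vacation policy.\n\nEmployees receive 20 days of paid leave per year.",
    "Expense policy.\n\nTravel expenses are reimbursed within 30 days.",
]
question = "How many days of paid leave do employees receive?"
_, paragraph = answer_context(question, documents)
prompt = build_prompt(question, paragraph)
print(prompt)
```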
The information retrieval system is a tool that finds relevant documents based on a user's query. In
this system, a vector database stores the client's documents.
Each document is converted into a list of numbers called a vector using the "sentence-transformers"
library. This library uses a pre-trained model called "paraphrase-multilingual-mpnet-base-v2" to
convert the text into vectors.
With this approach, the chatbot is not limited by a knowledge cutoff the way GPT is (its built-in
knowledge is fixed when it is trained, and it can only take in a limited amount of information at a time).
Additionally, new documents can easily be added to the database without retraining the chatbot*.
*For more detail, please visit this link: https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2
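The point about adding documents without retraining can be illustrated with a minimal in-memory vector store. The `VectorStore` class, the `VOCAB` list, and the `toy_encode` function are hypothetical stand-ins written for this sketch; they are not Haystack or sentence-transformers APIs. In a real deployment, the `encode` argument would presumably be the `encode` method of a `SentenceTransformer` loaded with the model named above.

```python
import math

# Hypothetical toy vocabulary; a real encoder learns its own representation.
VOCAB = ["office", "tokyo", "lunch", "cafeteria"]


def toy_encode(text):
    """Stand-in encoder: counts of a few fixed words, one dimension per word."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    """Minimal in-memory vector database (illustrative only)."""

    def __init__(self, encode):
        self.encode = encode      # function: text -> list of floats
        self.texts = []
        self.vectors = []

    def add(self, text):
        # Adding knowledge only appends a vector; no model is retrained.
        self.texts.append(text)
        self.vectors.append(self.encode(text))

    def search(self, query):
        q = self.encode(query)
        scores = [cosine(q, v) for v in self.vectors]
        return self.texts[scores.index(max(scores))]


store = VectorStore(toy_encode)
store.add("Our office is located in Tokyo.")
first = store.search("Where is the office?")

# New knowledge becomes searchable immediately after it is added.
store.add("The cafeteria serves lunch from noon.")
second = store.search("When is lunch served in the cafeteria?")
print(first)
print(second)
```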
Both the chatbot and the information retrieval system can work with Japanese-language data if they are
trained on Japanese-language data. This means that both will be able to understand and respond to
queries in Japanese.
The pre-trained model used in the information retrieval system also supports Japanese, so it
may not require retraining to work with Japanese data. However, further experimentation is required
to confirm this. If the system needs to support other languages, consult the list of
supported languages in the documentation of the pre-trained models.
For more detail, please visit this link: https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models
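One way to run the experiment mentioned above is to measure how often the retriever ranks the expected document first for a set of labelled Japanese queries. The sketch below assumes a tiny made-up test set and uses character-bigram counts as a stand-in encoder (chosen because it needs no word segmentation). To evaluate the actual pre-trained model, its encoding function and a matching similarity function would be substituted.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Stand-in encoder: character bigram counts (no word segmentation needed)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top1_accuracy(encode, documents, labelled_queries):
    """Fraction of queries whose highest-scoring document is the expected one."""
    doc_vectors = [encode(d) for d in documents]
    hits = 0
    for query, expected_index in labelled_queries:
        q = encode(query)
        scores = [cosine(q, v) for v in doc_vectors]
        hits += scores.index(max(scores)) == expected_index
    return hits / len(labelled_queries)


# Made-up evaluation data: (query, index of the document that should rank first).
documents = [
    "営業時間は平日の9時から18時までです。",   # office hours
    "有給休暇は年間20日です。",               # paid leave
]
labelled_queries = [
    ("営業時間を教えてください", 0),
    ("有給休暇は何日ですか", 1),
]
accuracy = top1_accuracy(char_bigrams, documents, labelled_queries)
print(accuracy)
```

A larger labelled set drawn from the customer's own documents would give a more meaningful figure than this two-query example.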
Contact us today for further information
Citynow Asia Inc
Website: https://citynow.jp/
Hotline: +81 3 4405 3731
Email: corp@citynow.jp
Facebook: https://www.facebook.com/citynow.asia
THANK YOU
