Presentation from TITAN's participation in a special session on AI Foundation Models and Large Language Models organised by BDVA (Big Data Value Association).
The primary objectives of this workshop, as outlined by BDVA, were to address the growing influence of Large Language Models (LLMs) and Generative AI on the market, policy, and society. Recognising the importance of these topics for its members, BDVA aimed to leverage the collective resources and knowledge of the community to explore technical challenges, privacy concerns, ethical considerations, security implications, and the broader impact of these technologies on companies, sectors, individuals, and society. The workshop additionally focused on "Data-centric AI", a topic of great interest within the community; by bringing together experts and stakeholders, BDVA intended to generate further interest and resources around it. Following the workshop, the BDVA Board of Directors planned to evaluate how to sustain these discussions within the Association going forward.
During the session, Ramfos presented an in-depth overview of the TITAN solution—an AI-powered intelligent coach designed to combat disinformation. This solution assists users in conducting investigations by providing support for critical thinking through chat interactions and microlessons. It empowers users to make their own informed decisions about the validity of content.
1. An AI-based Citizen Coaching Ecosystem against Disinformation
BDVA workshop on Foundation AI/LLMs and Generative AI
Brussels, 29/6/2023
Antonis Ramfos
2. TITAN:
An AI-based Citizen Coaching Ecosystem against Disinformation
• Disinformation campaigns aim at disabling citizens’ critical thinking capacity.
• TITAN implements an intelligent ‘chatbot’ capable of guiding the citizen to logical conclusions about the factual correctness or reliability of a statement. Citizens will be assisted in:
• interpreting and critically assessing the reasoning and arguments involved in a statement on their own
• conducting their own investigations, either individually or in collaboration with other citizens, into whether factual statements are true or reliable
• enhancing their critical thinking skills for better detection, at scale, of disinformation they may encounter in the future
3. TITAN:
An AI-based Citizen Coaching Ecosystem against Disinformation
TITAN will use a Generative-AI approach to generate Socratic inquiring dialogues about a statement that:
• take into consideration the assessment of the citizen’s critical thinking capacity
• incorporate the use of fact-checking/verification processes and tools, based on the analysis of the statement at hand and the citizen’s skills
• provide a personalised, experiential learning approach to media-literacy micro-lessons according to the citizen’s critical thinking capacity
4. Choice of LLM: Vicuna-13B
• open-source chatbot
• trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
• Preliminary evaluation using GPT-4 as a judge shows that Vicuna-13B:
• achieves more than 90%* of the quality of OpenAI ChatGPT and Google Bard
• outperforms other models like LLaMA and Stanford Alpaca in more than 90%* of cases.
• The cost of training is around $300.
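To ground the choice of Vicuna-13B, the sketch below shows the general shape of a Vicuna-style conversation prompt (a system preamble followed by USER/ASSISTANT turns). The exact system message and separators vary between Vicuna versions, so this is an illustrative assumption, not TITAN's production prompt.

```python
# Illustrative sketch (not TITAN's actual prompt): building a Vicuna-style
# single-string prompt from a list of conversation turns.

SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.")

def format_vicuna_prompt(turns):
    """turns: list of (role, text) pairs, role in {"USER", "ASSISTANT"}."""
    parts = [SYSTEM]
    for role, text in turns:
        parts.append(f"{role}: {text}")
    parts.append("ASSISTANT:")  # trailing cue so the model produces the reply
    return " ".join(parts)

prompt = format_vicuna_prompt([("USER", "Is this headline reliable?")])
```

In practice the formatted string would be passed to the model's tokenizer and generation loop; only the prompt-assembly step is shown here.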
5. Fine-tuning the LLM: Data Acquisition
• What data is needed to solve the underlying problem of guiding the citizen to logical conclusions about the factual correctness or reliability of a statement?
• Data for the generation of Socratic inquiring dialogues about a statement
• Data for the assessment of critical thinking capability of the citizen/user
• Data for the productive incorporation of fact-checking tools in the Socratic dialogues
• Data for the effective provision of micro-lessons during user interaction
• Is the data available? -> NO
• Is the data readily collectable? -> NO
• Is the data available for purchase? -> NO
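Since the data must be produced rather than acquired, each hand-crafted Socratic dialogue would eventually be serialised as a fine-tuning record. The sketch below uses the ShareGPT-style conversation schema ("conversations" with "from"/"value" fields) that Vicuna was fine-tuned on; the record id and dialogue text are hypothetical.

```python
import json

# Hypothetical example of one hand-crafted Socratic dialogue serialised as a
# ShareGPT-style record, one JSON object per line (JSONL) in a training file.
record = {
    "id": "titan-dialogue-0001",  # illustrative identifier
    "conversations": [
        {"from": "human", "value": "I read that X causes Y. Is that true?"},
        {"from": "gpt",
         "value": "What is the source of that claim, and has it been "
                  "independently verified?"},
    ],
}

line = json.dumps(record)    # one training example per JSONL line
parsed = json.loads(line)    # round-trips back to the same structure
```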
6. Fine-tuning the LLM: Modeling
Instead of trying to model and engineer a full-fledged LLM-based solution upfront, first focus on learning as much as possible about the problem-solution space:
• Define specific user inquiries – statements to be verified
• Develop inquiring dialogues by hand (Socratic dialogues incorporating critical-thinking assessment, fact-checking tools, and micro-lessons)
• Use co-creation to complete and expand the above dialogues
• Build a rule-based system that implements the above inquiring dialogues
• Use co-creation to collect validated learning about the inquiring dialogues with the least effort
• Produce reliable data for fine-tuning the LLM
• Perform fine-tuning on the selected LLM model.
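The rule-based stage described above can be pictured as a small state machine that walks the citizen through fixed Socratic questions about a statement. The states, questions, and ordering below are purely illustrative assumptions, not TITAN's actual rules.

```python
# Hypothetical sketch of the rule-based dialogue stage: a fixed sequence of
# Socratic questions applied to a user-submitted statement.

STEPS = [
    ("source", "Who published this statement, and are they identifiable?"),
    ("evidence", "What evidence is offered, and can it be checked?"),
    ("verdict", "Given the answers so far, how reliable does the statement seem?"),
]

class InquiringDialogue:
    def __init__(self, statement):
        self.statement = statement
        self.step = 0        # index into STEPS
        self.answers = {}    # collected citizen answers, keyed by step name

    def next_question(self):
        if self.step >= len(STEPS):
            return None      # dialogue finished
        return STEPS[self.step][1]

    def record_answer(self, text):
        key = STEPS[self.step][0]
        self.answers[key] = text
        self.step += 1

dlg = InquiringDialogue("Drinking seawater cures colds.")
q1 = dlg.next_question()
dlg.record_answer("An anonymous social-media account.")
```

Completed runs of such a dialogue, validated through co-creation, are exactly the kind of transcript that could later be converted into fine-tuning records for the selected LLM.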