A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck was tailored with an audience from the financial industry in mind, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
(This updated version builds on our previous deck: slideshare.net/LoicMerckel/intro-to-llms.)
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in... (David Talby)
An April 2023 presentation to the AMIA working group on natural language processing. The talk focuses on three current trends in NLP and how they apply in healthcare: Large language models, No-code, and Responsible AI.
Leveraging Generative AI & Best Practices (DianaGray10)
In this event we will cover:
- What Generative AI is and how it is being used for the future of work.
- Best practices for developing and deploying generative AI-based models in production.
- The future of Generative AI: how it is expected to evolve in the coming years.
Unlocking the Power of Generative AI: An Executive's Guide (PremNaraindas1)
Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities?
This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.
How can we use generative AI in learning products? A rapid introduction to generative AI. Presented at ED Games Expo 2023 at the U.S. Department of Education, September 22, 2023.
AI and ML Series - Introduction to Generative AI and LLMs - Session 1 (DianaGray10)
Session 1
👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed:
Introduction to Generative AI & harnessing the power of large language models.
What’s generative AI & what’s LLM.
How are we using it in our document understanding & communication mining models?
How to develop a trustworthy and unbiased AI model using LLM & GenAI.
Personal Intelligent Assistant
Speakers:
📌George Roth - AI Evangelist at UiPath
📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP
📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP
This presentation presents an overview of the challenges and opportunities of generative artificial intelligence in Web3. It includes a brief research history of generative AI as well as some of its immediate applications in Web3.
Use Case Patterns for LLM Applications (M Waleed Kadous)
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
A brief introduction to generative models in general is given, followed by a succinct discussion about text generation models and the "Transformer" architecture. Finally, the focus is set on a non-technical discussion about ChatGPT with a selection of recent news articles.
Generative AI: Past, Present, and Future – A Practitioner's Perspective (Huahai Yang)
Generative AI: Past, Present, and Future – A Practitioner's Perspective
As the academic realm grapples with the profound implications of generative AI
and related applications like ChatGPT, I will present a grounded view from my
experience as a practitioner. Starting with the origins of neural networks in
the fields of logic, psychology, and computer science, I trace its history and
align it within the wider context of the pursuit of artificial intelligence.
This perspective will also draw parallels with historical developments in
psychology. Against this backdrop, I chart a proposed trajectory for the future.
Finally, I provide actionable insights for both academics and enterprising
individuals in the field.
Presented at All Things Open RTP Meetup
Presented by Karthik Uppuluri, Fidelity
Title: Generative AI
Abstract: In this session, let us embark on a journey into the fascinating world of generative artificial intelligence. As an emergent and captivating branch of machine learning, generative AI has become instrumental in a myriad of sectors, ranging from the visual arts to software for technological solutions. This session requires no prior expertise in machine learning or AI. It aims to build a robust understanding of the fundamental concepts and principles of generative AI and its diverse applications. Join us as we delve into the mechanics of this transformative technology and unpack its potential.
Presenting the landscape of AI/ML in 2023 by introducing a quick summary of the last 10 years of its progress, current situation, and looking at things happening behind the scene.
The Future of AI is Generative not Discriminative, 5/26/2021 (Steve Omohundro)
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ... (rahul_net)
ChatGPT has taken the world of natural language processing by storm, and as an experienced AI practitioner, enterprise architect, and technologist with over two decades of experience, I'm excited to share my insights on how this innovative powerhouse is designed from an AI components perspective. In this post, I'll provide a fresh take on the key components that make ChatGPT a powerful conversational AI tool, including its use of the Transformer architecture, pre-training on large amounts of text data, and fine-tuning with human feedback. With ChatGPT's massive success, there's no doubt that it's changing the way we think about language and conversation. So, whether you're a seasoned pro or new to the world of AI, my post will provide a valuable perspective on this fascinating technology. Check out my slides to learn more!
Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today’s mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you’ll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies’ role in scaling on-device generative AI
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY (Andre Muscat)
Discuss the impact and opportunity of using Generative AI to support your development and creative teams
* Explore business challenges in content creation
* Cost-per-unit of different types of content
* Use AI to reduce cost-per-unit
* New partnerships being formed that will have a material impact on the way we search and engage with content
Part 4 of a 9 Part Research Series named "What matters in AI" published on www.andremuscat.com
The GPT-3 model architecture is a transformer-based neural network that has been fed 45TB of text data. It is non-deterministic, in the sense that given the same input, multiple runs of the engine can return different responses. It was trained on massive datasets that covered much of the web and contained 500B tokens, and it has a humongous 175 billion parameters, a more than 100x increase over GPT-2, which was considered state-of-the-art technology with 1.5 billion parameters.
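The non-determinism described above comes from sampling: the model outputs scores (logits) over its vocabulary, and the next token is drawn at random from the resulting probability distribution, with a temperature parameter controlling how flat that distribution is. A minimal sketch of that mechanism, with a toy vocabulary and made-up scores (not GPT-3's actual decoding code):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw model scores (logits) into a probability distribution.
    Higher temperature flattens the distribution; lower sharpens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, vocab, temperature=0.8, rng=random):
    """Draw one token from the temperature-scaled distribution.
    Because this is a random draw, repeated runs on the same
    logits can return different tokens: the non-determinism."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Toy example: the model scores three candidate next tokens.
vocab = ["cat", "dog", "bird"]
logits = [2.0, 1.5, 0.1]
print(sample_next_token(logits, vocab))  # may differ across runs
```

Setting the temperature close to zero makes the highest-scoring token dominate, which is why low-temperature decoding looks nearly deterministic.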
This is the slideshow for a presentation I gave as part of my graduate coursework at the Institute for Innovation and Public Purpose at University College London (UCL IIPP). Drawing on the work of IIPP professors including Carlota Perez (techno-economic paradigms), Mariana Mazzucato (“The Entrepreneurial State”), and Tim O’Reilly, I evaluate the innovation trajectory of Deep Neural Networks as a method of machine learning. I trace the history of machine learning to its present-day and conclude that while Deep Neural Networks have not yet reached technological maturity, they are already starting to encounter barriers to exponential growth and innovation. These slides were designed to be read independently from the spoken portion. If you found this useful or interesting, please message me on LinkedIn! - Justin Beirold
November 5, 2023
NHH: FRONT LINES ON ADOPTION OF DIGITAL AND AI-BASED SERVICES
Thanks to Tor Andreassen for the opportunity to discuss AI and IA.
Tor Andreassen: https://www.linkedin.com/in/tor-wallin-andreassen-1aa9031/
Genetic Algorithms and Programming - An Evolutionary Methodology (acijjournal)
Genetic programming (GP) is an automated method for creating a working computer program from a high-level statement of a problem. Genetic programming starts from a high-level statement of "what needs to be done" and automatically creates a computer program to solve the problem. In artificial intelligence, GP is an evolutionary-algorithm-based methodology inspired by biological evolution that finds computer programs which perform a user-defined task. It is a specialization of genetic algorithms (GA) in which each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness measure determined by a program's ability to perform a given computational task. This paper presents an overview of the various principles of genetic programming, including the relative effectiveness of mutation, crossover, breeding of computer programs, and fitness testing. The literature on traditional genetic algorithms contains related studies, but GP saves time by freeing the human from having to design complex algorithms, and it can produce solutions that outperform traditional counterparts in noteworthy ways.
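The evolutionary loop the abstract describes (selection, crossover, and mutation against a fitness measure) can be sketched with a minimal genetic algorithm over bitstrings; full GP would evolve program trees instead, but the loop has the same shape. All names and parameters below are illustrative, not from the paper:

```python
import random

def fitness(individual):
    """Toy fitness: count of 1-bits (standing in for task performance)."""
    return sum(individual)

def crossover(a, b):
    """Single-point crossover: splice two parents into one child."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(individual, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in individual]

def evolve(pop_size=30, genome_len=20, generations=60):
    """Evolve a population toward higher fitness via truncation selection."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the best half (elitism)
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # approaches genome_len as evolution proceeds
```

In GP proper, each individual would be an expression tree rather than a bitstring, with crossover swapping subtrees, but selection and mutation play the same roles.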
History of AI - Presentation by Sanjay Kumar
Join AI Shorts For Such Contents - https://lnkd.in/gpyzTpa2
The exponential growth of ChatGPT didn't happen in a day. The AI Winter, a time when funding went dry and no corporation was ready to do any further development on AI or related work, happened twice.
It started with Alan Turing's 1950 question "Can machines think?" and the 1956 conference at Dartmouth, where John McCarthy coined the term "AI" and set out its goals. Arthur Samuel wrote a program that learned to play checkers and popularized machine learning.
We are progressing at such a speed that governing bodies like OpenAI were created to make sure autonomous systems don't hurt us.
History of Artificial Intelligence (AI) from its birth until today (2023).
Covers all the important events that happened along the way, including the AI Winter periods.
https://bigscience.huggingface.co/
EN: Presentation of the BigScience project: a research initiative launched by HuggingFace and aiming to build a large language model (inspired by OpenAI and GPTx) over multiple languages and a very large processing cluster. The participants plan to investigate the dataset and the model from all angles: bias, social impact, capabilities, limitations, ethics, potential improvements, specific domain performances, carbon impact, general AI/cognitive research landscape.
FR (translated): Presentation of the BigScience project: an open research project launched by HuggingFace whose goal is to build a language model (i.e., somewhat like OpenAI and GPT-3) while exploring the problems tied to the dataset and the model from the angles of cognitive bias, social and environmental impact, ethical limits, possible performance gains, and the general impact of this type of approach when the aim is not merely "to have a bigger model".
Talk at SRI for Post Industrial Forum
June 28, 2023 5pm-8pm
https://post-industrial.institute/forum/
Frode Odegard invitation
https://www.linkedin.com/in/odegard/
Gives a background of Data Science and Artificial Intelligence to better understand the current state of the art (SOTA) for Large Language Models (LLMs) and Generative AI, then starts a discussion on the direction things are going in the future.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Notes on adjusting primitives for graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
2. 1966: ELIZA
Image source: en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png
“While ELIZA was capable of engaging in discourse, it could not converse with true understanding. However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary.”
Source: en.wikipedia.org/wiki/ELIZA (and references therein).
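ELIZA's apparent discourse came from keyword pattern matching and canned reflection templates, with no model of meaning. A minimal sketch in that spirit (these rules are hypothetical illustrations, not Weizenbaum's actual DOCTOR script):

```python
import re

# A few DOCTOR-style rules: match a keyword pattern, then echo
# part of the user's input back inside a canned template.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE),
     "Tell me more about feeling {0}."),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE),
     "Your {0} seems important to you."),
]
FALLBACK = "Please go on."

def eliza_reply(utterance):
    """Return a canned reflection for the first matching rule.
    No parsing, no semantics -- just surface pattern matching."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(eliza_reply("I am sad about my job"))
# "Why do you say you are sad about my job?"
```

The echoed fragment is what made users feel understood, even though the program never represents what the words mean.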
3. 2005: SCIgen - An Automatic CS Paper Generator
nature.com/articles/d41586-021-01436-7
news.mit.edu/2015/how-three-mit-students-fooled-scientific-journals-0414
A project using a rather rudimentary technology that aimed to "maximize amusement, rather than coherence" is still the cause of troubles today...
pdos.csail.mit.edu/archive/scigen
4. 2017: Google Revolutionized Text Generation
■ Vaswani (2017), Attention Is All You Need (doi.org/10.48550/arXiv.1706.03762)
■ openai.com/research/better-language-models
Image generated with DALL.E: “A small robot standing on the
shoulder of a giant robot” (and slightly modified with The Gimp)
OpenAI’s Generative Pre-trained Transformer (DALL.E, 2021; ChatGPT, 2022), as the name suggests, rests on Transformers.
Google introduced the Transformer, which rapidly became the state-of-the-art approach to solving most NLP problems.
5. ● Kiela et al. (2021), Dynabench: Rethinking Benchmarking in NLP: arxiv.org/abs/2104.14337
● Roser (2022), The brief history of artificial intelligence: The world has changed fast – what might be next?: ourworldindata.org/brief-history-of-ai
Transformers
2017
Text and shapes in blue have been added to the original work from Max Roser.
6. What Are Transformers?
Source: Vaswani et al. (2017), Attention Is All You Need
(doi.org/10.48550/arXiv.1706.03762)
Generative (deep learning) models for understanding and generating text,
images and many other types of data.
Transformers analyze chunks of data, called “tokens,” and learn to predict
the next token in a sequence, based on previous and, if available, following
tokens.
The auto-regressive concept means that the output of the model, such as
the prediction of a word in a sentence, is influenced by the previous words it
has generated.
Music—MusicLM (Google) and Jukebox (OpenAI) generate music from text.
Image—Imagen (Google) and DALL.E (OpenAI) generate novel images from text.
Text—OpenAI’s GPT has become widely known, but other players have similar technology
(including Google, Meta, Anthropic and others).
Others—Recommender (movies, books, flight destinations), drug discovery…
Models that learn from a given dataset how to
generate new data instances.
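The auto-regressive loop described above can be sketched with a toy bigram model — a deliberately crude stand-in for a real Transformer, trained on an invented ten-word corpus — where each predicted token is fed back in as context for the next prediction:

```python
from collections import Counter, defaultdict

# Toy corpus: the "model" only learns bigram statistics (real Transformers
# learn far richer patterns, but the auto-regressive loop is the same).
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(token, steps):
    """Auto-regressive generation: each prediction becomes part of the
    context for the next one."""
    out = [token]
    for _ in range(steps):
        if token not in bigrams:
            break
        token = bigrams[token].most_common(1)[0][0]  # greedy decoding
        out.append(token)
    return " ".join(out)

print(generate("the", 4))
```

Real models replace the bigram table with a deep network and greedy decoding with sampling strategies, but the output-feeds-back-into-input structure is identical.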
7. 2022: ChatGPT
“ChatGPT, the popular chatbot
from OpenAI, is estimated to have
reached 100 million monthly
active users in January, just two
months after launch, making it the
fastest-growing consumer
application in history”
statista.com/chart/29174/time-to-one-million-users
Reuters, Feb 1, 2023
https://reut.rs/3yQNlGo
8. The Mushrooming of Transformer-Based LLMs
PaLM (540b), LaMDA
(137b) and others (Bard
relies on LaMDA)
OPT-IML (175b), Galactica
(120b), BlenderBot3
(175b), Llama 2 (70b)
ERNIE 3.0 Titan (260b)
GPT-3 (175b), GPT-3.5 (?b),
GPT-4 (?b)
BLOOM (176b)
PanGu-𝛼 (200b)
Jurassic-1 (178b), Jurassic-2 (?b)
Exaone (300b)
Megatron-Turing NLG (530b)
(All of those models appear to rely on
decoder-only Transformer architectures.)
12. AI Mentions Boost Stock Prices
● AI-mentioning companies:
+4.6% avg. stock price
increase (nearly double of the
non-mentioning).
● In general, 67% of companies
that mentioned AI observed an
increase in their stock prices
→ +8.5% on average.
● Tech companies:
71% → +11.9% on avg.
● Non-tech companies:
65% → +6.8% on avg.
- Mentions of "AI" and related terms (machine learning, automation, robots, etc.).
- S&P 500 companies in 2023.
- 3-day change from the date the earnings call transcript was published. Source: wallstreetzen.com/blog/ai-mention-moves-stock-prices-2023
13. GPUs Demand Skyrockets
Before LLMs, GPUs were primarily needed for training, and
CPUs were used for inference. However, with the emergence
of LLMs, GPUs have become almost essential for both tasks.
Paraphrasing Brannin McBee, co-founder of CoreWeave, in
Bloomberg Podcast*:
While you may train the model using 10,000 GPUs, the real
challenge arises when you need 1 million GPUs to meet the
entire inference demand. This surge in demand is expected
during the initial one to two years after the launch, and it's likely
to keep growing thereafter.
* How to Build the Ultimate GPU Cloud to Power AI | Odd Lots (youtube.com/watch?v=9OOn6u6GIqk&t=1308s)
14. Enhancing Productivity With Generative AI?
nature.com/articles/d41586-023-02270-9
science.org/doi/10.1126/science.adh2586
16. Beware of “Hallucinations,” Which Remain Very Real
“Hallucinations” are “confident
statements that are not true”1
.
For the moment, this
phenomenon inexorably
affects all known LLMs.
1: fr.wikipedia.org/wiki/Hallucination_(intelligence_artificielle)
Yves Montand in “Le Cercle Rouge” during an attack of delirium tremens
This thing probably doesn't exist.
17. Concrete
Hallucinations (GPT-4)
We asked ChatGPT the first part of the third
question of the British Mathematical Olympiad
1977: bmos.ukmt.org.uk/home/bmo-1977.pdf
Is that so? Although not an obvious
hallucination, it may remind us of Fermat’s
lack of space in the margin to give the proof
of his last theorem… Perhaps here there is a
lack of tokens?
Here, a total hallucination: this statement is
evidently false.
Perhaps it meant “the
product of two negative
numbers”
Here, a total hallucination: this statement is
evidently false. (Although in this case the
inequality is indeed clearly true.)
18. The Saga of the Lawyer Who Used ChatGPT
nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html
nytimes.com/2023/05/27/nyregion/avianca-airline-lawsuit-chatgpt.html
nytimes.com/2023/06/22/nyregion/lawyers-chatgpt-schwartz-loduca.html
19. ChatGPT: Achieving Human-Level Performance in
Professional and Academic Benchmarks
● GPT-4's performance in recent tests is
undeniably impressive.
● Study conducted by OpenAI
(openai.com/papers/gpt-4.pdf).
● Most of those tests mainly focus on high
school-level content.
● Many can be prepared for through test-prep
courses and resources.
● By contrast, university exams typically
require a deeper understanding of course
material and critical thinking skills.
● Uniform Bar Exam: Worth noting, but
potential overestimation concerns (see
dx.doi.org/10.2139/ssrn.4441311).
20. Exploring the MIT Mathematics and EECS Curriculum Using
Large Language Models
Published on Jun 15, 2023
Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith
Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando
Solar-Lezama, Iddo Drori
Abstract
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets,
midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and
Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of
large language models to fulfill the graduation requirements for any MIT major in Mathematics
and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT
curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set
excluding questions based on images. We fine-tune an open-source large language model on
this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed
performance breakdown by course, question, and answer type. By embedding questions in a
low-dimensional space, we explore the relationships between questions, topics, and classes and
discover which questions and classes are required for solving other questions and classes
through few-shot learning. Our analysis offers valuable insights into course prerequisites and
curriculum design, highlighting language models' potential for learning and improving
Mathematics and EECS education.
Source: arxiv.org/abs/2306.08997
i.e., GPT-4
scored 100% on
MIT EECS
Curriculum
(Electrical
Engineering and
Computer
Science)
21. “No, GPT4 can’t ace MIT”
Three MIT undergrads have debunked the myth.
- 4% of the questions were unsolvable. (How did GPT-4 achieve 100%?)
- Information leak in some few-shot prompts: for those, the answer was
quasi-given in the question.
- The automatic grading using GPT-4 itself has some severe issues: a prompt
cascade that reprompted (many times) when the given answer was deemed
incorrect. 16% of the questions were multiple-choice questions, hence a
quasi-guaranteed correct response.
- Bugs found in the research script raise serious questions regarding the
soundness of the study.
Source: flower-nutria-41d.notion.site/No-GPT4-can-t-ace-MIT-b27e6796ab5a48368127a98216c76864
Note: The paper has since been withdrawn (see official statement at people.csail.mit.edu/asolar/CoursesPaperStatement.pdf)
22. Chemistry May Not Be ChatGPT’s Cup of Tea
A study conducted by three researchers of the University of
Hertfordshire (UK) showed that ChatGPT is not a fan of
chemistry.
Real exams were used, and the authors note that “[a] well-written
question item aims to create intellectual challenge and to require
interpretation and inquiry. Questions that cannot be easily
‘Googled’ or easily answered through a single click in an
internet search engine is a focus.”
“The overall grade on the year 1 paper calculated from the top
four graded answers would be 34.1%, which does not meet the
pass criteria. The overall grade on the year 2 paper would be
18.3%, which does not meet the pass criteria.”
Source: Fergus et al., 2023, Evaluating Academic Answers Generated Using ChatGPT (pubs.acs.org/doi/10.1021/acs.jchemed.3c00087)
23. The “Drift” Phenomenon
Sources:
- wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0
- Chen et al., 2023, arxiv.org/abs/2307.09009
● New research from Stanford and UC Berkeley
highlights a fundamental challenge in AI
development: "drift."
● Drift occurs when improving one aspect of
complex AI models leads to a decline in
performance in other areas.
● ChatGPT has shown deterioration in basic math
operations despite advancements in other tasks.
● GPT-4 exhibits reduced responsiveness to
chain-of-thought prompting (may be intended to
mitigate potential misuse with malicious
prompts).
The “behavior of the ‘same’ LLM service can
change substantially in a relatively short amount of
time, highlighting the need for continuous monitoring
of LLMs” (Chen et al., 2023).
24. Techniques for Tailoring LLMs to
Specific Problems
Prompt Engineering
Fine-Tuning
Reinforcement Learning From Human Feedback (RLHF)
25. First We Must Have a Problem to Solve…
Source: DeepLearning.AI, licensed under CC BY-SA 2.0
26. Then We Need a Model
Commercial APIs
- Google, OpenAI, Anthropic, Microsoft...
- Privacy concerns may arise.
- No specific hardware requirement.
- Prompt engineering (OpenAI offers prompt fine-tuning).
Use a foundation model (many open-source models are available)
- As is (prompt engineering),
- or fine-tuned (either full or parameter-efficient fine-tuning).
- May require specific hardware/infrastructure for hosting, fine-tuning and
inference.
Train a model from scratch
- Requires huge resources (both data and computing power).
- (e.g., BloombergGPT, arxiv.org/abs/2303.17564.)
27. A Plethora of Open
Source Pre-Trained
Models
huggingface.co/models
Models should be selected
depending on:
● The problem at hand.
● The strength of the model.
● The operating costs (larger
models require more
resources).
● Other considerations (e.g.,
license).
28. Prompt Engineering: “Query Crafting”
Improving the output with actions like phrasing
queries, specifying styles, providing context, or
assigning roles (e.g., 'Act as a mathematics
teacher') (Wikipedia, 2023).
Some hints can be found in OpenAI’s “GPT best
practices” (OpenAi, 2023).
Chain-of-thought: a popular technique that consists
of “guiding [LLMs] to produce a sequence of
intermediate steps before giving the final answer”
(Wei et al., 2022).
Sources:
- Wei, J. et al., 2022, Emergent Abilities of Large Language Models, arxiv.org/abs/2206.07682
- OpenAI, 2023, platform.openai.com/docs/guides/gpt-best-practices/six-strategies-for-getting-better-results
- Wikipedia, 2023, Prompt Engineering, en.wikipedia.org/wiki/Prompt_engineering
(graph from Wei et al., 2022)
About GSM8K benchmark: arxiv.org/abs/2110.14168
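As a concrete sketch of chain-of-thought prompting, the snippet below builds a prompt containing one worked example whose answer spells out intermediate steps, nudging the model to reason the same way on a new question. Both questions are illustrative, not taken from this deck:

```python
# One worked example with explicit intermediate steps (the chain of
# thought), followed by the new question with the answer left blank.
cot_example = (
    "Q: A cafe sold 23 coffees in the morning and 18 in the afternoon. "
    "How many coffees were sold in total?\n"
    "A: In the morning it sold 23 coffees. In the afternoon it sold 18. "
    "23 + 18 = 41. The answer is 41.\n"
)

question = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\nA:"
)

prompt = cot_example + "\n" + question
print(prompt)
```

The prompt string would then be sent to the LLM of your choice; without the worked example, smaller models often skip straight to a (frequently wrong) final answer.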
29. Prompt Engineering: In-Context Learning (ICL)
In-Context Learning (ICL) consists of providing “a few input-output
examples in the model’s context (input) as a preamble
before asking the model to perform the task for an unseen
inference-time example” (Wei et al., 2022).
It is a kind of “ephemeral supervised learning.”
- Zero-shot prompting or zero-shot learning: no example
given (works for the largest LLMs; smaller ones may struggle).
- One-shot prompting: one example provided.
- Few-shot prompting: a few examples (typically 3~6).
⚠ Context window limits (e.g., 4096 tokens).
Tweet: @lufthansa Please find our
missing luggage!!
Sentiment: negative
Tweet: Will be on LH to FRA very soon.
Cheers!
Sentiment: positive
Tweet: Refused to compensate me for 2
days cancelled flights . Joke of a airline
Sentiment:
LLM
negative
Example of an input and
output for two-shot prompting
Source: Wei, J. et al., 2022, Emergent Abilities of Large Language Models, arxiv.org/abs/2206.07682
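The two-shot sentiment example on this slide can be assembled programmatically. A minimal sketch, using the slide's own tweets as the labelled examples:

```python
def few_shot_prompt(examples, query):
    """Assemble an in-context-learning prompt: labelled examples first,
    then the unseen input with its label left blank for the model."""
    parts = [f"Tweet: {text}\nSentiment: {label}" for text, label in examples]
    parts.append(f"Tweet: {query}\nSentiment:")
    return "\n\n".join(parts)

examples = [
    ("@lufthansa Please find our missing luggage!!", "negative"),
    ("Will be on LH to FRA very soon. Cheers!", "positive"),
]
prompt = few_shot_prompt(
    examples,
    "Refused to compensate me for 2 days cancelled flights . Joke of a airline",
)
print(prompt)
```

Note that every example consumes tokens from the context window, which is why few-shot prompting typically stops at a handful of examples.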
30. Fine-Tuning: Introduction
Few-shot learning:
- May not be sufficient for smaller models.
- Consumes tokens from the context window.
Fine-tuning is a supervised learning process
that leads to a new model (in contrast with
in-context learning that is “ephemeral”).
Task-specific prompt-completion pairs are
required.
Base LLM
Fine-tuned
LLM
(Prompt_1, completion_1)
(Prompt_2, completion_2)
…
(Prompt_n, completion_n)
Task-specific prompt-completion
pairs
31. Full Fine-Tuning: Updating All Parameters
Fine-tuning very often means “instruction fine-tuning.”
Instruction fine-tuning: each prompt-completion pair includes a specific
instruction (summarize this, translate that, classify this tweet, …).
● Fine-tuning on a single task (e.g., summarization) may lead to a phenomenon
referred to as “catastrophic forgetting” (arxiv.org/pdf/1911.00202), where the
model loses its abilities on other tasks (may not be a business issue, though).
● Fine-tuning on multiple tasks (e.g., summarization, translation, classification, …).
This requires a lot more training data. (E.g., see FLAN in Wei et al., 2022.)
Full fine-tuning is extremely resource-intensive, even more so for large models.
Source: Wei et al., 2022, Finetuned Language Models Are Zero-Shot Learners. arxiv.org/abs/2109.01652
32. Parameter Efficient Fine-Tuning (PEFT)
Unlike full fine-tuning, PEFT preserves the vast majority of the weights of the original
model.
● Less prone to “catastrophic forgetting” on single task.
● Often a single GPU is enough.
Three methods:
● Selective—subset of initial params to fine-tune.
● Reparameterization—reparameterize model weights using a low-rank
representation, e.g., LoRA (Hu et al., 2021).
● Additive—add trainable layers or parameters to the model, with two approaches:
- Adapters: add new trainable layers to the architecture of the model.
- Soft prompts: focus on manipulating the input (this is not prompt engineering).
Source:
- coursera.org/learn/generative-ai-with-llms/lecture/rCE9r/parameter-efficient-fine-tuning-peft
- Hu et al., 2021, LoRA: Low-Rank Adaptation of Large Language Models. arxiv.org/abs/2106.09685
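The reparameterization idea behind LoRA can be shown in a few lines: the pretrained weight matrix stays frozen, and only a low-rank update is trained. A minimal numerical sketch (dimensions and data are invented; real LoRA applies this inside attention layers):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                         # model dimension, low rank (r << d)
W = rng.normal(size=(d, d))         # pretrained weight matrix: frozen

# LoRA (Hu et al., 2021): learn only the low-rank update B @ A on top of W.
A = rng.normal(size=(r, d)) * 0.01  # trainable (r x d)
B = np.zeros((d, r))                # trainable (d x r), initialized to zero

x = rng.normal(size=(1, d))         # an input activation

# Forward pass: base projection plus the low-rank adaptation.
y = x @ W.T + x @ (B @ A).T

# With B = 0 at initialization, the adapted model matches the base model,
# and only 2*r*d parameters (instead of d*d) are ever updated.
print(np.allclose(y, x @ W.T), 2 * r * d, "vs", d * d)
```

At realistic scales the saving is dramatic: with d in the thousands and r around 8, the trainable parameter count drops by several orders of magnitude, which is why a single GPU is often enough.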
33. The OpenAI API offers
prompt tuning for
gpt-3.5-turbo, but not
“yet” for GPT-4.
platform.openai.com/docs/guides/fine-tuning
Fine-Tuning With
OpenAI GPT
(PEFT)
34. Reinforcement Learning From Human Feedback
LLMs are trained on web data with a lot of irrelevant matter (unhelpful) or, worse,
abundant false (dishonest) and/or harmful information, e.g.,
● Potentially dangerous false medical advice.
● Valid techniques for illegal activities (hacking, deceiving, building weapons, …).
HHH (Helpful, Honest & Harmless) alignment (Askell et al., 2021): ensuring that the
model's behavior and outputs are consistent with human values, intentions, and ethical
standards.
Reinforcement Learning from Human Feedback, or RLHF (Casper et al., 2023)
● “is a technique for training AI systems to align with human goals.”
● “[It] has emerged as the central method used to finetune state-of-the-art [LLMs].”
● It relies on human judgment and consensus.
Source:
- Casper et al., 2023, Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arxiv.org/abs/2307.15217
- Ziegler et al., 2022, Fine-Tuning Language Models from Human Preferences. arxiv.org/abs/1909.08593
- Askell et al., 2021, A General Language Assistant as a Laboratory for Alignment. arxiv.org/abs/2112.00861
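The human-feedback step can be made concrete with a toy sketch: a reward model assigns scores to two candidate outputs, and a Bradley-Terry-style model turns the score gap into the probability that raters prefer one over the other (this is the quantity the reward model is trained to match). The scores here are made up for illustration:

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry model: the probability that the first output is
    preferred is the sigmoid of the reward difference."""
    return 1 / (1 + math.exp(-(reward_chosen - reward_rejected)))

# Hypothetical reward-model scores for two candidate completions.
p = preference_probability(2.0, 0.5)
print(round(p, 3))
```

Equal rewards give probability 0.5; the policy model is then fine-tuned with reinforcement learning to produce outputs the reward model scores highly.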
35. What Is RLHF by Sam Altman
5:59
What is RLHF? Reinforcement Learning with Human Feedback, …
6:07
… So, we trained these models on a lot of text data and, in that process, they
learned the underlying, …. And they can do amazing things.
6:26
But when you first play with that base model, that we call it, after you finish
training, … it can do a lot of, you know, there's knowledge in there. But it's not
very useful or, at least, it's not easy to use, let's say. And RLHF is how we
take some human feedback,
6:45
the simplest version of this is show two outputs, ask which one is better
than the other,
6:50
which one the human raters prefer, and then feed that back into the model
with reinforcement learning.
6:56
And that process works remarkably well with, in my opinion, remarkably little
data to make the model more useful. So, RLHF is how we align the model to
what humans want it to do.
Sam Altman: OpenAI CEO on
GPT-4, ChatGPT, and the Future of
AI | Lex Fridman Podcast #367
(youtu.be/L_Guz73e6fw?si=vfkdtN
CyrQa1RzZR&t=359)
36. Source: Liu et al., 2022, Aligning Generative Language Models with Human Values. aclanthology.org/2022.findings-naacl.18
RLHF: Example of Alignment Tasks
38. Assessing and Comparing LLMs
Metrics while training the model—ROUGE (summary) or BLEU (translation).
Benchmarks—A non-exhaustive list:
- ARC (Abstraction and Reasoning Corpus, arxiv.org/pdf/2305.18354),
- HellaSwag (arxiv.org/abs/1905.07830),
- TruthfulQA (arxiv.org/abs/2109.07958),
- GLUE & SuperGLUE (General Language Understanding Evaluation, gluebenchmark.com),
- HELM (Holistic Evaluation of Language Models, crfm.stanford.edu/helm),
- MMLU (Massive Multitask Language Understanding, arxiv.org/abs/2009.03300),
- BIG-bench (arxiv.org/pdf/2206.04615).
Others—“Auto-Eval of Question-Answering Tasks”
(blog.langchain.dev/auto-eval-of-question-answering-tasks).
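The training-time metrics mentioned above are simple overlap measures. A toy version of ROUGE-1 (unigram overlap between a generated summary and a reference; real ROUGE implementations add stemming and other refinements) looks like this:

```python
from collections import Counter

def rouge1(candidate, reference):
    """Toy ROUGE-1: unigram overlap between a generated summary and a
    reference, reported as recall, precision and F1."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())   # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return recall, precision, f1

r, p, f1 = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(round(r, 2), round(p, 2), round(f1, 2))
```

Such n-gram metrics are cheap but shallow, which is precisely why the benchmark suites listed above exist.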
39. Source: Wu et al., 2023,
BloombergGPT: A Large Language
Model for Finance.
arxiv.org/abs/2303.17564 (Table 13:
“BIG-bench hard results using
standard 3-shot prompting”)
40. Source: Touvron et al., 2023, Llama 2: Open Foundation and Fine-Tuned Chat Models,
scontent-fra3-1.xx.fbcdn.net/v/t39.2365-6/10000000_662098952474184_2584067087619170692_n.pdf
42. Question ChatGPT About the Latest Financial
Reports?
—blog.langchain.dev/tutorial-chatgpt-over-your-data
“[ChatGPT] doesn’t know about
your private data, it doesn’t know
about recent sources of data.
Wouldn’t it be useful if it did?”
43. Workflow Overview
Question
Answer
« Quels vont être les dividendes payés
par action par le Groupe Crit ? »
(“What dividends per share will Groupe Crit pay?”)
« Le Groupe CRIT proposera lors de sa prochaine Assemblée Générale, le 9
juin 2023, le versement d'un dividende exceptionnel de 3,5 € par action. »
(“At its next Annual General Meeting, on 9 June 2023, Groupe CRIT will propose the payment of an exceptional dividend of €3.5 per share.”)
The example (the question and associated
answer) is a real example (the LLM was
“gpt-3.5-turbo” from OpenAI)
Technique described in: Lewis et al., 2020.
Retrieval-augmented generation for knowledge-intensive
nlp tasks. (doi.org/10.48550/arXiv.2005.11401)
[Workflow diagram]
1. Split the documents into chunks and compute embeddings, stored in a vector store.
2. Extract the relevant information (“context”) for the question from the vector store.
3. Generate a prompt accordingly (“question + context”) and submit it to the LLM.
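The retrieval-augmented generation pipeline on this slide (Lewis et al., 2020) can be sketched end to end, with bag-of-words vectors standing in for real embeddings and the final LLM call left out. The document chunks are invented, not the deck's actual AMF filings:

```python
from collections import Counter
import math

# Invented document chunks; a real system would split filings into many more.
chunks = [
    "The board proposes an exceptional dividend of 3.5 euros per share.",
    "Revenue grew 12 percent year over year in the staffing segment.",
    "The next annual general meeting takes place on 9 June 2023.",
]

def vec(text):
    """Bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, k=1):
    """Rank chunks by similarity to the question; return the top k."""
    q = vec(question)
    return sorted(chunks, key=lambda c: cosine(q, vec(c)), reverse=True)[:k]

question = "What dividend per share will be paid?"
context = "\n".join(retrieve(question))
prompt = (
    f"Answer using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)  # this prompt would then be sent to the LLM
```

Swapping in a real embedding model, a persistent vector store, and an actual LLM call yields the workflow shown above; the grounding in retrieved context is what lets the model answer about private or recent data.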
44. Preliminary Prototype
Financial reports retrieved directly from the French AMF (“Autorité
des marchés financiers”) via their API (info-financiere.fr).
XHTML document in
French.
Question and answer
are in English (they
would be in French
should the question be
asked in French).
45. Except where otherwise noted, this work is licensed under
https://creativecommons.org/licenses/by/4.0/
619.io