Generative Models and ChatGPT

  1. Generative Models & ChatGPT Loic Merckel Image generated with DALL.E: “Billboard advertising a robot in a futuristic city at night with bluish neon lit” (and slightly modified with The Gimp) March 19, 2023
  2. Generative Models
Models that learn from a given dataset how to generate new data instances. https://developers.google.com/machine-learning/gan/generative
A generative model is trained using a dataset; it can subsequently generate new data instances.
Music—Google Research introduced MusicLM, which generates music from text. OpenAI released Jukebox: “provided with genre, artist, and lyrics as input, Jukebox outputs a new music sample produced from scratch.”
Image—Both Google (Imagen) and OpenAI (DALL.E) have developed impressive models that generate novel images from text.
Text—OpenAI’s ChatGPT has become widely known, but other players have similar, possibly even better, technology (including Google, with Bard, and Meta, with BlenderBot3).
Others—Recommenders (movies, books, flight destinations), drug discovery…
■ ChatGPT: https://chat.openai.com/ ■ Bard: https://bit.ly/3JpiFkH ■ Recommender: https://arxiv.org/abs/1802.05814 ■ Drug discovery: https://bit.ly/42lguaj ■ MusicLM: https://bit.ly/3Tm4Rfk ■ Jukebox: https://openai.com/research/jukebox ■ Imagen: https://imagen.research.google/ ■ DALL.E: https://labs.openai.com/
  3. Discriminative vs. Generative Models
Given a set of data instances X (and a set of labels Y):
“Discriminative models capture the conditional probability P(Y | X).” Examples: GLM, GBM, SVM, RF, feedforward ANN, …
“Generative models capture the joint probability P(X, Y), or just P(X) if there are no labels.” Examples: GMM, VAE, GAN, Transformers, …
Source: https://developers.google.com/machine-learning/gan/generative
In a regression analysis, Y is continuous. We are then interested in the conditional expectation E(Y | X), which depends on the conditional probability density function.
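To connect the two definitions: a generative model of P(X, Y) also yields the discriminative quantity P(Y | X) via Bayes’ rule, P(Y | X) = P(X | Y) P(Y) / P(X). The minimal sketch below (not from the slides; it uses made-up synthetic data) illustrates this with class-conditional Gaussians, whereas a discriminative model such as a logistic regression would estimate P(Y | X) directly.

```python
# Toy sketch: a generative model of P(X, Y) = P(X | Y) P(Y), from which
# the conditional P(Y | X) is recovered via Bayes' rule.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed synthetic data: a 1-D feature X for two classes Y = 0 and Y = 1.
x0 = rng.normal(60, 8, size=600)   # instances with label Y = 0
x1 = rng.normal(80, 10, size=400)  # instances with label Y = 1

# Generative model: class priors P(Y) and class-conditional densities P(X | Y).
prior0, prior1 = len(x0) / 1000, len(x1) / 1000
pdf0 = norm(x0.mean(), x0.std())
pdf1 = norm(x1.mean(), x1.std())

def p_y1_given_x(x):
    """Bayes' rule: P(Y=1 | X=x) = P(x | Y=1) P(Y=1) / P(x)."""
    joint0 = pdf0.pdf(x) * prior0
    joint1 = pdf1.pdf(x) * prior1
    return joint1 / (joint0 + joint1)

print(p_y1_given_x(65.0))  # probability of class 1 at x = 65
```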
  4. Discriminative Model: 2016 Olympics Athletes
● We know the gender (y) and the weight (X) of each athlete.
● Given a weight, what is the probability of the gender, i.e., P(y | X)?
● P(y = Female | X = 50 kg) ≈ 89.6%
● P(y = Female | X = 65 kg) ≈ 60.4%
● P(y = Female | X = 100 kg) ≈ 2.6%
(Obtained by fitting a simple logistic regression model.)
Dataset: https://www.kaggle.com/datasets/rio2016/olympic-games
[Figure: Female and Male weight distributions, ≈ 69 kg]
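A minimal sketch of how probabilities like the ones quoted above could be obtained, assuming the Rio 2016 Kaggle dataset has been downloaded as athletes.csv with sex and weight columns (the file name, column names, and label encoding are assumptions, not taken from the slides):

```python
# Hedged sketch: logistic regression estimating P(gender | weight).
# File and column names below are assumptions about the Kaggle dataset.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("athletes.csv")[["sex", "weight"]].dropna()
X = df[["weight"]].values                  # weight in kg
y = (df["sex"] == "female").astype(int)    # 1 = Female, 0 = Male (assumed labels)

model = LogisticRegression()
model.fit(X, y)

# Probability of being female at a few example weights (kg).
for w in (50, 65, 100):
    p_female = model.predict_proba([[w]])[0, 1]
    print(f"P(Female | weight = {w} kg) ≈ {p_female:.1%}")
```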
  5. Generative Model: 2016 Olympics Athletes
Let us imagine a situation where we have only the athletes’ weight data (no gender information). We wish to generate more synthetic data that cannot easily be distinguished from the real-world observations.
In this toy case, a Gaussian mixture model can be fitted. Although the model identifies two components, it cannot label them; the labels (‘Female’ and ‘Male’) have been assigned using our knowledge of the context.
[Figure: newly generated data instances]
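A minimal sketch of this generative counterpart, under the same assumed file and column names as the previous sketch: a two-component Gaussian mixture is fitted to the unlabeled weights and then sampled to produce new, synthetic instances (as noted above, the fitted components themselves carry no ‘Female’/‘Male’ labels).

```python
# Hedged sketch: fit a 2-component Gaussian mixture to unlabeled weights
# and sample synthetic data from it.
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Unlabeled weights (kg); file and column names are assumptions, as before.
weights = pd.read_csv("athletes.csv")[["weight"]].dropna().values

gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(weights)

print("Component means (kg):", gmm.means_.ravel())
print("Mixing proportions:", gmm.weights_.ravel())

# Generate new, synthetic weight instances.
new_weights, components = gmm.sample(5)
print("Synthetic weights (kg):", np.round(new_weights.ravel(), 1))
```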
  6. Text Generation Models Image generated with DALL.E: “Writing with a fountain pen”
  7. 1966: ELIZA Image source: https://en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png “While ELIZA was capable of engaging in discourse, it could not converse with true understanding. However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary.” Source: https://en.wikipedia.org/wiki/ELIZA (and references therein).
  8. 2005: SCIgen - An Automatic CS Paper Generator
https://www.nature.com/articles/d41586-021-01436-7
https://news.mit.edu/2015/how-three-mit-students-fooled-scientific-journals-0414
A project using rather rudimentary technology that aimed to “maximize amusement, rather than coherence” still causes trouble today…
https://pdos.csail.mit.edu/archive/scigen/
  9. 2017: Google Revolutionized Text Generation
Google introduced the Transformer, which rapidly became the state-of-the-art approach for solving most NLP problems. OpenAI’s Generative Pre-trained Transformer (DALL.E, 2021; ChatGPT, 2022), as the name suggests, builds on Transformers.
■ Vaswani (2017), Attention Is All You Need (doi.org/10.48550/arXiv.1706.03762)
■ https://openai.com/research/better-language-models
Image generated with DALL.E: “A small robot standing on the shoulder of a giant robot” (and slightly modified with The Gimp)
  10. ● Kiela et al. (2021), Dynabench: Rethinking Benchmarking in NLP: https://arxiv.org/abs/2104.14337 ● Roser (2022), The brief history of artificial intelligence: The world has changed fast – what might be next?: https://ourworldindata.org/brief-history-of-ai
[Chart annotation: “Transformers, 2017”] Text and shapes in blue have been added to the original work by Max Roser.
  11. What are Transformers?
Encoder—Self-attention mechanism: each word is encoded as a numerical vector that is contextualized, i.e., formed by taking the surrounding words into account (left and right, the “context”).
Decoder—Masked self-attention mechanism (left xor right context), cross-attention, and auto-regression (it re-uses its past outputs as inputs of the following steps).
Both the encoder and the decoder can be used as standalone models. Popular LLMs rely only on decoders, whereas, e.g., machine translation may leverage the “full” encoder-decoder Transformer architecture.
[Figures: Encoder / Decoder; Transformer (1-layer); Transformer (4-layer)]
Images source: https://colab.research.google.com/drive/1L42pL04PbauS-nNzVg7IYNtrK0pFYCGY
Source: Vaswani (2017), Attention Is All You Need (doi.org/10.48550/arXiv.1706.03762)
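To make the self-attention mechanism above concrete, here is a minimal NumPy sketch of single-head, unmasked scaled dot-product attention in the spirit of Vaswani (2017); a real Transformer layer adds learned multi-head projections, masking in the decoder, residual connections, layer normalization, and feed-forward blocks.

```python
# Minimal sketch of scaled dot-product self-attention (single head, no mask).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; W*: learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # how much each token attends to every other
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return attn @ V                               # contextualized representations

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8                   # e.g. a 4-token sequence
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)        # (4, 8): one contextual vector per token
```

In the decoder, a causal mask would set the scores of future positions to minus infinity before the softmax, which is the “masked self-attention” mentioned above.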
  12. Going Further…
For a rather high-level understanding: https://youtu.be/LE3NfEULV6k https://youtu.be/H39Z_720T5s https://youtu.be/MUqNwgPjJvQ https://youtu.be/d_ixlCubqQw https://youtu.be/0_4KEb08xrE
For getting your hands dirty: https://colab.research.google.com/drive/1L42pL04PbauS-nNzVg7IYNtrK0pFYCGY
Video lecture on Embeddings: https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
  13. The Mushrooming of Transformer-based LLMs
● Google: PaLM (540b), LaMDA (137b) and others (Bard relies on LaMDA)
● Meta: OPT-IML (175b), Galactica (120b), BlenderBot3 (175b) and perhaps others?
● Baidu: ERNIE 3.0 Titan (260b)
● OpenAI: GPT-3 (175b), GPT-3.5 (?b), more versions coming… (ChatGPT relies on GPT-3.5)
● BigScience: BLOOM (176b)
● Huawei: PanGu-𝛼 (200b)
● AI21 Labs: Jurassic-1 (178b), Jurassic-2 (?b)
● LG: Exaone (300b)
● Microsoft & NVIDIA: Megatron-Turing NLG (530b)
(It appears that all those models rely only on transformer-based decoders.)
  14. ChatGPT
  15. 2022: ChatGPT “ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after launch, making it the fastest-growing consumer application in history” https://www.statista.com/chart/29174/time-to-one-million-users/ Reuters, Feb 1, 2023 https://reut.rs/3yQNlGo
  16. Irrational Exuberance?
“ChatGPT is 'not particularly innovative,' and 'nothing revolutionary', says Meta's chief AI scientist. The public perceives OpenAI's ChatGPT as revolutionary, but the same techniques are being used and the same kind of work is going on at many research labs, says the deep learning pioneer.” (zdnet.com, https://zd.net/3mTlOS0)
https://twitter.com/ylecun/status/1617921903934726144 https://on.ft.com/3JRPM22
  17. Google’s Bard, Meta’s Galactica & Baidu’s Ernie https://bit.ly/3Jnt404 https://bit.ly/3TnLRwS https://reut.rs/3FvarpQ “Bard and ChatGPT are large language models, not knowledge models. They are great at generating human-sounding text, they are not good at ensuring their text is fact-based.” —Jack Krawczyk, the product lead for Bard, March 2, 2023 (https://cnb.cx/3ZXFFy3) https://on.ft.com/3JogEVH (March 16, 2023) “Baidu Inc. surged more than 14% Friday after brokerages including Citigroup tested the company’s just-unveiled ChatGPT-like service and granted it their preliminary approval.” —Bloomberg, March 17, 2023 (https://yhoo.it/3JLxAXI)
  18. “we’ll see” https://bit.ly/3Z365gC https://spectrum.ieee.org/ai-hallucination
  19. Except where otherwise noted, this work is licensed under https://creativecommons.org/licenses/by/4.0/ 619.io