The article compares large language models like ChatGPT to lossy data compression. It argues that ChatGPT retains much of the information from the web but presents it in an approximated, blurred way, similar to how a JPEG retains image information at lower resolution. This blurriness can cause the model to generate fabricated or nonsensical answers. At the same time, the rephrasing of information makes ChatGPT seem more knowledgeable than it is, as if it truly understood the material rather than just restating facts. While large language models are impressive, we must be aware of the difference between understanding complex topics and merely identifying statistical patterns in text data.
This presentation presents an overview of the challenges and opportunities of generative artificial intelligence in Web3. It includes a brief research history of generative AI as well as some of its immediate applications in Web3.
Testing AI involves validating that AI systems perform as intended and are free of unintended behaviors. This includes testing the training data, model architecture, and system outputs. Challenges include the inability to test all possible inputs and scenarios, as well as accurately interpreting ambiguous or uncertain outputs. Emerging techniques use machine learning to automatically generate test cases, fuzz testing to introduce adversarial inputs, and model analysis to evaluate behaviors. Proper testing is crucial to ensure AI systems do not negatively impact users or society.
This CAPTCHA report contains everything you need for a seminar. For more reports, contact rkrakeshkumar99@gmail.com.
1) ChatGPT is an AI conversational agent developed by OpenAI that can understand questions and respond in a human-like manner across a wide range of topics.
2) It works using a deep learning technique called a transformer that processes input text and generates responses through encoding and decoding layers with self-attention.
3) Some potential uses of ChatGPT include building chatbots for customer support, creating content, developing personal assistants, enabling language translation, and aiding education and research.
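The self-attention step mentioned in point 2 can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention, not OpenAI's actual implementation; the toy dimensions and random weights are assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # 4 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Each output vector is a context-dependent blend of all the input tokens, which is what lets a transformer relate words across a sentence.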
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing - Erin Owens
The artificial intelligence tool ChatGPT has taken the world by storm, prompting concerns about student plagiarism. But A.I. text and image generators also pose ethical and legal conundrums for scholarly researchers. This session will delve into some of the emerging issues and developments that may affect faculty in scholarly writing and publishing.
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report a compilation of the most interesting things we’ve seen, with a goal of triggering an informed conversation about the state of AI and its implications for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Produced by Nathan Benaich and the Air Street Capital team
Details regarding how ChatGPT works and its basic use cases can be found in this presentation. The presentation also covers other OpenAI products and their usability, as well as ways in which ChatGPT can be integrated into existing apps and websites.
Thompson Sampling for Machine Learning - Ruben Mak, PyData
PyData Amsterdam 2018
In this talk I hope to give a clear overview of the opportunities for applying Thompson Sampling in machine learning. I will share some technical examples of recent developments (for example, Bayesian neural networks using Edward), but more importantly I hope to trigger the audience to start thinking strategically about how we want our machine learning models to learn from new data.
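The core of Thompson Sampling can be sketched for the simplest case, a Bernoulli bandit: keep a Beta posterior per arm, sample from each posterior, and play the arm with the highest sample. This is a minimal stdlib-only illustration; the hidden success rates and step count are assumptions for demonstration, not from the talk.

```python
import random

def thompson_step(successes, failures, rewards):
    """One round of Thompson Sampling on a Bernoulli bandit."""
    # Sample a plausible success rate per arm from its Beta(s+1, f+1) posterior.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    arm = samples.index(max(samples))    # play the arm that sampled highest
    reward = rewards[arm]()              # observe a 0/1 reward
    successes[arm] += reward
    failures[arm] += 1 - reward
    return arm

random.seed(42)
true_p = [0.3, 0.7]                      # hidden success rates (assumed)
rewards = [lambda p=p: 1 if random.random() < p else 0 for p in true_p]
successes, failures = [0, 0], [0, 0]
for _ in range(1000):
    thompson_step(successes, failures, rewards)
print(successes, failures)
```

After enough rounds, the better arm accumulates far more pulls: exploration happens automatically because uncertain arms still occasionally sample high.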
The document lists various AI tools across different categories including chat/speech tools, artwork generators, writing tools, speech-to-text transcription, visual editors, and video tools. Some of the tools listed include ChatGPT, DALL-E, Stable Diffusion, IBM Watson, Google Cloud, Microsoft Azure, Adobe Sensei, and Synthesia. The tools cover a wide range of applications from chatbots, image generation from text, writing assistance, speech recognition, image editing, and automated video creation.
The document summarizes a seminar on artificial intelligence presented by Mr. Ishwar Bulbule. It discusses the history of AI, including key events and figures. It also covers different approaches to AI, current applications like Siri and Watson, and potential future uses such as automated transportation, cyborg technology, and improved elder care. The conclusion states that AI has increased understanding of intelligence while also revealing its complexity, providing new challenges for the future.
GPT-3 is a large language model trained by OpenAI to be task agnostic. It has 175 billion parameters compared to its predecessor GPT-2 which has 1.5 billion parameters. OpenAI plans to provide API access to select partners to query GPT-3 rather than releasing the full model. This could accelerate the development of NLP applications and allow startups to build minimum viable products without training their own models if GPT-3 performance is good enough. However, startups relying solely on the API may lack expertise to improve upon initial products.
ChatGPT is a natural language processing model developed by OpenAI that can generate human-like text in response to user inputs. The document discusses ChatGPT's capabilities and limitations, including its applications in areas like customer service, education, and entertainment. However, the document also notes that ChatGPT is still undergoing training, its responses may be inaccurate at times, and it cannot match the emotional expressiveness of human interactions.
Large Language Models and Applications in Healthcare - Asma Ben Abacha
The time that medical doctors spend on Electronic Health Record systems was shown to contribute to work-life imbalance, dissatisfaction, high rates of attrition, and a burnout rate exceeding 50%. In particular, doctors spend on average 52 to 102 minutes per day writing clinical notes from their conversations with the patients. Recent studies on clinical note generation have shown that doctors can save a significant amount of time with automatic note generation systems. Progress in LLMs can play a key role in enabling further such systems and improving their performance. However, this requires high-quality datasets and benchmarks and relevant evaluation metrics to assess which model would best serve clinicians in their daily practice. In this lecture, I’ll present LLM-based solutions for the task of clinical note generation from doctor-patient conversations. I’ll also present insights from different evaluation studies and shared tasks that we organized on this topic.
Another important aspect of supporting healthcare providers with documentation and clinical decisions is to detect medical errors in clinical notes and to suggest corrections (e.g., diagnosis, treatment, medication). Such errors require medical expertise and knowledge to be both identified and corrected. Recent LLMs showed promise in being applied on unseen tasks with competitive ability. The second part of the lecture will cover a new research endeavor on medical error detection and correction and present LLM-based solutions for the task.
The document discusses various topics related to artificial intelligence including machine learning, large language models, neural networks, generative bots, ChatGPT, and Midjourney. It describes how AI is being used in applications such as healthcare, customer service, and content creation. The future of AI is explored with possibilities such as more integrated virtual assistants and personalized healthcare through processing of large amounts of medical data.
A brief introduction to generative models in general is given, followed by a succinct discussion about text generation models and the "Transformer" architecture. Finally, the focus is set on a non-technical discussion about ChatGPT with a selection of recent news articles.
This document summarizes an event organized by Pantech Solutions and the Institution of Electronics and Telecommunication (IETE) on the future of artificial intelligence. The event featured several presentations and demos on topics related to AI, including computer vision with deep learning, natural language processing, machine and deep learning, AI applications in various domains like medical, agriculture, autonomous vehicles, and brain-computer interfaces. It also discussed topics like machine learning, deep learning, AI safety concerns, and examples of AI applications in areas like search engines, social media, e-commerce, music and more. The agenda included presentations on object recognition with YOLO, brain enhancement with BCI technology, and a Python AI demo.
The document discusses how generative AI can be used to scale content operations by reducing the time it takes to generate content. It explains that generative AI learns from natural language models and can generate new text or ideas based on prompts provided by users. While generative AI has benefits like speeding up content creation and ideation, it also has limitations such as not being able to conduct original research or ensure quality. The document provides examples of how generative AI can be used for tasks like generating ideas, simplifying complex text, creating visuals, and more. It also discusses challenges like bias in AI models and the low risk of plagiarism.
Explore the transformative impact of ChatGPT, the cutting-edge AI language model, in revolutionizing human-machine conversations. From customer support to content creation, delve into its applications across industries and the ethical considerations it raises. Discover how ChatGPT evolution is shaping the future of AI-driven communication.
The Digital Sociology of Generative AI - Mark Carrigan
Generative AI will diffuse rapidly but messily as tech companies rush to incorporate it to drive growth. This will further concentrate the tech industry among a few large firms. Its effects on work will vary, intensifying rationalization in bureaucratic roles but opening new opportunities for human-machine creativity in collegial roles. It will change the nature of factual descriptions by enabling immediate production of plausible texts without identifiable authors. This will exacerbate epistemic chaos and platform capitalism's tendencies towards credulity, paranoia, and misinformation saturation. Education must prepare students for these changes through critical AI literacy and skills like content curation, evaluation, and creative expression.
This document discusses CAPTCHAs, which are challenges used to distinguish humans from bots by testing pattern recognition. It begins by defining CAPTCHAs and providing background on why they were developed, such as to prevent spam. It then covers various types of CAPTCHAs, including text, image, and audio-based, as well as their applications and how they work. The document also addresses issues with CAPTCHAs, such as accessibility and usability problems, as well as methods that have been used to break existing CAPTCHAs. In conclusion, while CAPTCHAs are generally effective against bots, their implementations still need improvement in areas like accessibility, compatibility, and security.
The document discusses AI chatbots and their uses in various industries like travel, food/beverage, banking, healthcare, and retail. It covers how chatbots have evolved to leverage techniques like natural language processing. Examples are provided of chatbots assisting with tasks like booking travel, making reservations, checking bank accounts, and answering healthcare questions. The document also discusses emotion AI and sentiment analysis, and provides examples of chatbots like Replika and Woebot that analyze emotion. Demonstrations are included of emotion AI chatbots and integrating chatbots with e-commerce platforms and big data analytics.
This document provides a tutorial for ChatGPT, an AI chatbot created by OpenAI. It discusses how ChatGPT works, how to get started using it for free, its capabilities and limitations. Key points include: ChatGPT can be accessed for free on the OpenAI website; it uses neural networks and reinforcement learning; it has the ability to generate text but lacks image generation; and limitations include potential bias, lack of specialized knowledge, and lack of full context for questions. Alternatives to ChatGPT include paid subscription to ChatGPT Plus or using other AI text generators but no direct chatbot equivalents exist.
10 Limitations of Large Language Models and Mitigation Options - Mihai Criveti
10 limitations of large language models and ways to overcome them: dealing with hallucinations, performance, costs, stale training data, injecting private data, token limits and contextual memory, text conversion, lack of transparency, ethical concerns, and training costs.
How Does Generative AI Actually Work? (a quick semi-technical introduction to... - ssuser4edc93
This document provides a technical introduction to large language models (LLMs). It explains that LLMs are based on simple probabilities derived from their massive training corpora, containing trillions of examples. The document then discusses several key aspects of how LLMs work, including that they function as a form of "lossy text compression" by encoding patterns and relationships in their training data. It also outlines some of the key elements in the architecture and training of the most advanced LLMs, such as GPT-4, focusing on their huge scale, transformer architecture, and use of reinforcement learning from human feedback.
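The "probabilities derived from the training corpus" idea above can be illustrated with a toy bigram model, a deliberately simplified sketch: real LLMs use transformer networks rather than lookup tables, and the tiny corpus here is an assumption for demonstration.

```python
from collections import Counter, defaultdict

# A bigram model: for each word, count which words follow it in the corpus,
# then predict the next word from those empirical probabilities.
corpus = "the cat sat on the mat and the cat slept".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_probs(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # cat: 2/3, mat: 1/3
```

Even this trivial model shows the "lossy compression" framing: the counts compress the corpus into a statistical summary, from which the exact original text can no longer be recovered.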
AI art refers to art generated with the assistance of artificial intelligence. The document discusses what AI art is, some popular AI art tools like DeepDream and GauGAN2, applications of AI art, its impact on human artists, and the future outlook of the field. The future of AI art looks promising but also poses challenges as AI systems get better at generating content that mimics human creations.
Artificial intelligence - The science of intelligent programs - Derak Davis
Artificial intelligence (AI) involves creating intelligent computer programs and machines that can interact with the real world similarly to humans. AI uses techniques like machine learning, deep learning, and neural networks to allow programs to learn from data and experience without being explicitly programmed. While AI has potential benefits, some experts warn that advanced AI could pose risks if not developed carefully due to concerns it could become difficult for humans to control once a certain level of intelligence is achieved.
About the Webinar
The digitization of resources can provide expanded access to information as well as a preservation mechanism for now-fragile materials. Preserving the digital copy of the resource is an issue now being addressed, but what about the software used to create digital files? How can software on media which can no longer be read -- or no longer be read easily -- be preserved? If that software can’t be accessed, what happens to the material created by, and only read by, that software?
Progress has been made in formulating standards for the preservation and description of digital materials and a framework for addressing digital item preservation has been proposed. Despite, however, meetings such as the Library of Congress’ “Preserving.exe: Toward a National Strategy for Preserving Software,” no formal standard or framework yet exists for software digitization and preservation. This webinar will feature three presenters who will speak on aspects of software digitization and preservation, including a how-to approach (technical aspects), a metadata component, and observations from the field as part of the continuing discussion on the state of the field and the need for standardization.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Software artifacts: Migration and Emulation
Michael Lesk, Professor of Library and Information Science, Rutgers University
Emulation in practice: Emulation as a Service at Yale University Library: Lessons learnt and plans for the future
Euan Cochrane, Digital Preservation Manager, Yale University Library
No (You Can't Expect To Run Your Files Just Because You Saved Them)
Jon Ippolito, Professor of New Media and Director of the Digital Curation graduate program, University of Maine
HA5 – COMPUTER ARTS BLOG ARTICLE – 3D: The Basics - hamza_123456
This document discusses 3D modeling techniques for movies versus games. It explains that movie models can have millions of polygons, while game models need to be more efficient to maintain performance. Game models often use techniques like normal mapping to add detail without increasing polygon counts. It also discusses differences in level-of-detail models and how not everything needs to be modeled in movies.
Emulation in practice: Emulation as a Service at Yale University Library: Lessons learnt and plans for the future
Euan Cochrane, Digital Preservation Manager, Yale University Library
No (You Can't Expect To Run Your Files Just Because You Saved Them)
Jon Ippolito, Professor of New Media and Director of the Digital Curation graduate program, University of Maine
HA5 – COMPUTER ARTS BLOG ARTICLE – 3D: The Basicshamza_123456
This document discusses 3D modeling techniques for movies versus games. It explains that movie models can have millions of polygons while game models need to be more efficient to maintain performance. Game models often use techniques like normal mapping to add detail without increasing polygons. It also discusses differences in level of detail models and how not everything needs to be modeled in movies.
HA5 – COMPUTER ARTS BLOG ARTICLE – 3D: The Basicshamza_123456
This document provides information on 3D modeling techniques for movies versus games. It discusses how movie models can have millions of polygons while game models need to be more efficient, often using fewer than 10,000 polygons. Normal maps are described as a technique to add surface detail without adding polygons. Level of detail (LOD) modeling is discussed for both movies and games. Overall, the techniques differ due to movies having no interactivity or frame rate requirements, while games need efficient, real-time rendering.
Digital cameras allow users to capture photos digitally instead of on film. They have become multifunctional, capable of recording video and sound as well. While providing benefits like easy distribution and manipulation of images, their use raises ethical considerations regarding responsibilities to subjects. Digital photos are composed of pixels, with higher pixel counts enabling larger print sizes. Optical zoom uses lens magnification while digital zoom crops and enlarges images, reducing quality. Photos are compressed using JPEG to reduce file sizes, with greater compression lowering quality more.
This document discusses 3D modeling techniques for movies versus games. It explains that movies can use higher polygon counts and various modeling techniques, while games need more efficient, lower polygon models to maintain performance. Techniques like normal mapping are used to add detail to game assets without increasing polygon counts. Level of detail (LOD) modeling is also discussed where lower resolution models are used at a distance. The document also covers differences in what needs to be modeled, such as only modeling visible parts for movies but full 360 degree models for games.
In our heated learning of the scope of genetic programming, before ...butest
The document appears to be notes from research into genetic programming and related topics. It contains summaries of various websites on artificial intelligence, genetic algorithms, genetic programming, and artificial life. Many of the sites are outdated or abandoned. Key figures and concepts mentioned include John Koza, gene expression programming, NEAT, and memristors. The document discusses applying genetic programming techniques to games and explores potential applications and limitations.
The document discusses starting a 3D printing lab using the Doodle3D API. It provides an overview of 3D printing technologies like SLA and mentions common open-source 3D printers like the Prusa i3. It then explains how the Doodle3D API allows controlling 3D printers over WiFi using HTTP requests and G-code instructions to move the print head. Finally, it discusses using the API to build a 3D printing GUI.
Isolating Cancellations from Scanned Stamps and Postal HistoryRobert Swanson
Collectors of cancellations, postal markings, and covers often wish to illustrate only the
cancellation or marking as it appears on a cover or stamp. Historically, this operation has
been performed by hand, and is called “tracing”. It is literally an artistic activity, and great skill
is required to create a good facsimile of a cancel for research purposes.
1. File formats are complex with many stakeholders who interpret specifications differently, leading to divergent implementations over time.
2. Specifications are often incomplete, unclear, non-free, or do not reflect reality, making it difficult to determine what a valid file is.
3. Relying on specifications alone is not sufficient - one must also analyze sample files and code to understand how file formats work in practice.
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck is tailored for an audience from the financial industry in mind, its content remains broadly applicable.
(This updated version builds on our previous deck: slideshare.net/LoicMerckel/intro-to-llms.)
Presented at Troopers 2016.
When Infosec and Digipres share interests...
TL;DR
- Attack surface with file formats is too big.
- Specs are useless (just a nice ‘guide’), not representing reality.
- We can’t deprecate formats because we can’t preserve and we can’t define how they really work
- We need open good libraries to simplify landscape, and create a corpus to express the reality of file format, which gives us real “documentation”.
- Then we can preserve and deprecate older format, which reduces attack surface.
- From then on, we can focus on making the present more secure.
- We don't need new formats: reality will diverge from the specs anyway - we need 'alive' (up to date, traceable) specs.
The document discusses various digital graphics file formats including raster graphics and vector graphics.
For raster graphics, it describes common file formats like JPEG, TIFF, GIF, BMP and how they are used. JPEG is commonly used for photos on the internet due to its small file size and compatibility. TIFF maintains image quality and is used for printing. GIF supports animation and transparency. BMP stores uncompressed data but has large file sizes.
For vector graphics, it discusses formats like PSD, AI, FLA, WMF. PSD maintains layers and effects for Photoshop. AI and Illustrator create scalable images. FLA is used for Flash files and animation. WMF contains both vectors and bitmaps.
Polygon count, file size, and rendering times can constrain 3D graphics. A high polygon count means more complex models but larger file sizes that require more processing power. If the polygon count or file size is too high for the available memory and processing, it can cause issues rendering animations or walkthroughs in real-time. While polygons make up 3D objects, triangles are how they are rendered by graphics hardware. Polygon count refers to the number of triangles, and a high triangle or vertex count can impact performance. Rendering is the process of generating 2D images from 3D scene data and requires solving lighting and other effects, which may exceed real-time capabilities without rendering to temporary files.
Today’s blog post described the several aspects of Nuke software and its utilization effects and magic created in movies.
The blog is initiated by the MAAC Kolkata team to acknowledge the readers about the software Nuke.
This document provides an overview of graphics processing units (GPUs). It discusses the history and evolution of GPUs, how they work, and their increasing use for general purpose computing beyond just graphics. Specifically, it notes that GPUs were designed for parallel processing of graphics but are now used more broadly due to their high computational power. The document also summarizes key aspects of GPU architecture, programming, applications, and ongoing work to improve GPU computing tools and techniques.
This document provides an overview of graphics processing units (GPUs). It discusses the history and evolution of GPUs, how they work, and their increasing use for general purpose computing beyond just graphics. Specifically, it outlines how GPUs were originally designed to process graphics but are now highly parallel processors that can be used to accelerate complex computations. It also summarizes some of the key components of GPUs and how their performance advantages have led to a growing field of GPU computing.
The document discusses the topics of Web 3D, WebGL, and 3D interaction on the web. It provides definitions and history for these topics. WebGL allows 3D graphics rendering within web browsers without plugins using OpenGL ES. It works by using shader programs written in GLSL to render 3D graphics on a canvas element. The document discusses challenges with 3D interaction due to the 2D nature of displays and inputs, and covers various techniques for 3D input and output. Examples of 3D applications using these technologies are also mentioned.
This document discusses working as a software engineer at Facebook and some of the key aspects of the role. It covers topics like culture, scale of the platform, technical scope, and fixing bugs. As an example bug, it describes an issue with file descriptor leaks on Android and the steps taken to identify and address the problem, including testing different libraries and approaches to image loading. The engineer provides a potential hybrid solution to improve file descriptor management and memory usage.
The document analyzes four random number algorithms - Linear Congruential Generator, Xorshift, Fibonacci Linear Feedback Shift Register, and Mersenne Twister - to determine the most appropriate for procedural content generation in video game development. Through statistical tests of the algorithms, the Mersenne Twister was found to be most suitable due to its long period and statistical randomness, though the Fibonacci LFSR provides an alternative in time-critical applications where a smaller period is acceptable.
1. The document discusses key concepts related to digital photography including doing, being, becoming, and belonging.
2. It provides an overview of different types of digital cameras and how they allow users to capture, distribute, and manipulate images.
3. The document covers important technical aspects of digital photography such as pixels, pixel count, optical zoom, digital zoom, and file formats like JPEG that allow for compression of large image files.
Similar to ChatGPT Is a Blurry JPEG of the Web (20)
An astonishing, first-of-its-kind, report by the NYT assessing damage in Ukraine. Even if the war ends tomorrow, in many places there will be nothing to go back to.
OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
I have never seen any movie like it, ever. There are no words. Simply, “The Zone of Interest” is the greatest meditation ever made on film about the banality of evil and the capacity of human beings to be indifferent towards cruelty that beggars imagination.
Kai-Fu Lee, an AI expert and prominent investor who helped Google and Microsoft get established in China, says his new startup 01.AI will create the first “killer apps” of generative AI.
This document is the first part of a three-part series exploring issues with measuring and evaluating AI systems. It discusses how AI was traditionally evaluated by benchmarks like chess games, but that benchmarks are limited for large language models like GPT-3. Old tests like the Turing Test are no longer relevant as newer models can mimic humans. The document examines the history of focusing on data-driven approaches to progress AI and how that led to more complex models that are difficult to properly evaluate. It introduces the need to address challenges in evaluating large language models to help guide their development and impact.
Google introduced its new AI model Gemini this week, which showed impressive capabilities. However, Google exaggerated in its promotional video for Gemini by speeding up responses, shortening outputs, and using still images rather than video. While hype videos often take artistic license, people in the AI community felt Google's exaggerations went too far. The misleading marketing tactics damaged Google's credibility with developers and onlookers, though its parent company Alphabet's stock still rose following the announcement.
Previously redacted portions of the Federal Trade Commission’s lawsuit against Amazon allege Bezos gave the go-ahead to make search results worse in favor of increasing advertising revenue
This article discusses 16-year-old Alexandra Duarte's decision to undergo bariatric surgery to address her severe obesity. It describes her struggles over many years trying different diets and programs, as well as bullying due to her weight. The article provides context on the rise of childhood obesity in the US and debates around new guidelines recommending more aggressive treatment, including drugs and surgery. It also explores the biological factors that influence appetite and weight regulation in the brain and genes.
Alleged censorship of social media and disruptions to electricity and internet access have meant people under fire in Gaza can’t get the information they need to survive.
A flood of false information, partisan narratives, and weaponized “fact-checking" has obscured efforts to find out who’s responsible for an explosion at a hospital in Gaza.
The US and EU finalized a long-awaited data-sharing agreement that will allow personal data to continue flowing freely between the two regions. The deal establishes an independent review body for Europeans to appeal potential improper data collection by US intelligence agencies. It also outlines more clearly when intelligence agencies can access personal data of EU residents and how Europeans can appeal such collection. Some privacy advocates and EU lawmakers remain skeptical that it does enough to curb US mass surveillance.
He wrote a book on a rare subject. Then a ChatGPT replica appeared on Amazon.
From recipes to product reviews to how-to books, artificial intelligence text generators are quietly authoring more and more of the internet.
ChatGPT invented a sexual harassment scandal and named a real law prof as the accused. The AI chatbot can misrepresent key facts with great flourish, even citing a fake Washington Post article as evidence.
More from LUMINATIVE MEDIA/PROJECT COUNSEL MEDIA GROUP (20)
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
1. 13/2/23, 12:29 PM
ChatGPT Is a Blurry JPEG of the Web | The New Yorker
Page 1 of 10
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
ChatGPT Is a Blurry JPEG of the Web
OpenAI’s chatbot offers paraphrases, whereas Google offers quotes. Which do we prefer?
By Ted Chiang February 9, 2023
Illustration by Vivek Thakker
In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of the floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house’s three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square metres, respectively. However, in the photocopy, all three rooms were labelled as being 14.13 square metres in size. The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn’t use the physical xerographic process popularized in the nineteen-sixties. Instead, it scans the document digitally, and then prints the resulting image file. Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself.
Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, then the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable. Lossless compression is what’s typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous. Lossy compression is often used for photos, audio, and video in situations in which absolute accuracy isn’t essential. Most of the time, we don’t notice if a picture, song, or movie isn’t perfectly reproduced. The loss in fidelity becomes more perceptible only as files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the fuzziness of the smallest JPEG and MPEG images, or the tinny sound of low-bit-rate MP3s.
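The distinction can be sketched in a few lines of Python. The `zlib` round trip really is lossless; the rounding step below is my own toy stand-in for a lossy codec, not how any real image or audio format works:

```python
import zlib

text = b"The rooms measure 14.13, 21.11, and 17.42 square metres."

# Lossless: compress with zlib, decompress, and recover the exact bytes.
restored = zlib.decompress(zlib.compress(text))
assert restored == text  # no information has been discarded

# Lossy (a toy stand-in): store each room's area rounded to the nearest
# whole metre. The encoded form is smaller, but the decoded values are
# only approximations of the originals.
areas = [14.13, 21.11, 17.42]
encoded = [round(a) for a in areas]    # 14, 21, 17
decoded = [float(e) for e in encoded]
assert decoded != areas  # some information is gone for good
```

The asymmetry is the whole point: the lossless round trip is reversible by construction, while nothing in the lossy decoder can recover the digits that rounding threw away.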
Xerox photocopiers use a lossy compression format known as JBIG2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the photocopier had judged the labels specifying the area of the rooms to be similar enough that it needed to store only one of them—14.13—and it reused that one for all three rooms when printing the floor plan.
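The pattern-matching idea behind this failure can be mimicked in a short sketch. This is a toy model, not the actual JBIG2 algorithm: here a "patch" is just a string, and any two patches that differ in fewer than a threshold number of positions share one stored copy.

```python
# Toy sketch of pattern-substitution compression (not real JBIG2):
# merge any two patches that differ in fewer than `threshold`
# positions, keeping a single shared copy for all of them.
def compress(patches, threshold=2):
    stored, refs = [], []
    for patch in patches:
        for i, rep in enumerate(stored):
            if sum(a != b for a, b in zip(patch, rep)) < threshold:
                refs.append(i)  # "similar enough": reuse the stored copy
                break
        else:
            stored.append(patch)
            refs.append(len(stored) - 1)
    return stored, refs

def decompress(stored, refs):
    return [stored[i] for i in refs]

# Three slightly different area labels; the encoder judges them similar
# enough to collapse into one stored copy, as the copier did.
labels = ["14.13", "14.18", "14.1Z"]
stored, refs = compress(labels)
print(decompress(stored, refs))   # ['14.13', '14.13', '14.13']
```

Every label decodes as "14.13": the output is perfectly readable, and wrong in exactly the way the floor plans were.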
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn’t, in itself, a problem. The problem is that the photocopiers were degrading the image in a subtle way, in which the compression artifacts weren’t immediately recognizable. If the photocopier simply produced blurry printouts, everyone would know that they weren’t accurate reproductions of the originals. What led to problems was the fact that the photocopier was producing numbers that were readable but incorrect; it made the copies seem accurate when they weren’t. (In 2014, Xerox released a patch to correct this issue.)
I think that this incident with the Xerox photocopier is worth bearing in mind today, as we consider OpenAI’s ChatGPT and other similar programs, which A.I. researchers call large language models. The resemblance between a photocopier and a large language model might not be immediately apparent—but consider the following scenario.
Imagine that you’re about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one per cent of the space needed; you can’t use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you have virtually unlimited computational power to throw at this task, your algorithm can identify extraordinarily nuanced statistical regularities, and this allows you to achieve the desired compression ratio of a hundred to one.
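The thought experiment can be caricatured in a few lines: instead of storing the text itself, store which word tends to follow which, then regenerate text by sampling from those counts. This word-pair model is vastly cruder than anything inside a real language model; it is only meant to show statistical regularities standing in for the original words.

```python
import random
from collections import defaultdict

def train(text):
    # The "compressed" form: for each word, the words observed after it.
    model = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, start, length=8):
    # "Decompression": walk the statistics to produce plausible text.
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = train(corpus)
print(generate(model, "the"))   # plausible word sequences, not exact quotes
```

Once the corpus is reduced to these statistics, you can ask for text that sounds like the original, but you can never search it for an exact quote: the words themselves are no longer stored.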
Now, losing your Internet access isn’t quite so terrible; you’ve got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can’t look for information by searching for an exact quote; you’ll never get an exact match, because the words aren’t what’s being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.
What I’ve described sounds a lot like ChatGPT, or most any other large language model. Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry JPEG, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT’s facility at repackaging information found on the Web by using different words. It’s also a way to understand the “hallucinations,” or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone. These hallucinations are compression artifacts, but—like the incorrect labels generated by the Xerox photocopier—they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.
This analogy makes even more sense when we remember that a common
technique used by lossy compression algorithms is interpolation—that is,
estimating what’s missing by looking at what’s on either side of the gap.
When an image program is displaying a photo and has to reconstruct a
pixel that was lost during the compression process, it looks at the nearby
pixels and calculates the average. This is what ChatGPT does when it’s
prompted to describe, say, losing a sock in the dryer using the style of the
Declaration of Independence: it is taking two points in “lexical space” and
generating the text that would occupy the location between them. (“When
in the Course of human events, it becomes necessary for one to separate
his garments from their mates, in order to maintain the cleanliness and
order thereof. . . .”) ChatGPT is so good at this form of interpolation that
people find it entertaining: they’ve discovered a “blur” tool for paragraphs
instead of photos, and are having a blast playing with it.
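The pixel-averaging idea can be sketched in a few lines of Python. This is a toy illustration of interpolation, not how any real image codec (or ChatGPT) is implemented; `fill_missing` assumes a lost pixel is isolated in the interior of the row, and `midpoint` stands in for finding the point between two locations in "lexical space."

```python
# Toy sketch of interpolation: estimate a missing value from its
# neighbors, the way a decoder might reconstruct a lost pixel.
def fill_missing(pixels):
    """Replace None entries with the average of the adjacent pixels.

    Assumes each missing pixel is interior and isolated, so both
    neighbors are known values.
    """
    out = list(pixels)
    for i, p in enumerate(out):
        if p is None:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

# The "lexical space" version of the same move: the point halfway
# between two locations, here represented as plain vectors.
def midpoint(a, b):
    return [(x + y) / 2 for x, y in zip(a, b)]

print(fill_missing([10, None, 30]))      # -> [10, 20.0, 30]
print(midpoint([0.0, 1.0], [1.0, 0.0]))  # -> [0.5, 0.5]
```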
Given that large language models like ChatGPT are often extolled as the
cutting edge of artificial intelligence, it may sound dismissive—or at least
deflating—to describe them as lossy text-compression algorithms. I do
think that this perspective offers a useful corrective to the tendency to
anthropomorphize large language models, but there is another aspect to
the compression analogy that is worth considering. Since 2006, an A.I.
researcher named Marcus Hutter has offered a cash reward—known as

the Prize for Compressing Human Knowledge, or the Hutter Prize—to
anyone who can losslessly compress a specific one-gigabyte snapshot of
Wikipedia smaller than the previous prize-winner did. You have probably
encountered files compressed using the zip file format. The zip format
reduces Hutter’s one-gigabyte file to about three hundred megabytes; the
most recent prize-winner has managed to reduce it to a hundred and
fifteen megabytes. This isn’t just an exercise in smooshing. Hutter
believes that better text compression will be instrumental in the creation
of human-level artificial intelligence, in part because the greatest degree
of compression can be achieved by understanding the text.
To grasp the proposed relationship between compression and
understanding, imagine that you have a text file containing a million
examples of addition, subtraction, multiplication, and division. Although
any compression algorithm could reduce the size of this file, the way to
achieve the greatest compression ratio would probably be to derive the
principles of arithmetic and then write the code for a calculator program.
Using a calculator, you could perfectly reconstruct not just the million
examples in the file but any other example of arithmetic that you might
encounter in the future. The same logic applies to the problem of
compressing a slice of Wikipedia. If a compression program knows that
force equals mass times acceleration, it can discard a lot of words when
compressing the pages about physics because it will be able to
reconstruct them. Likewise, the more the program knows about supply
and demand, the more words it can discard when compressing the pages
about economics, and so forth.
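That reasoning can be made concrete with a toy experiment (a sketch, not the Hutter Prize machinery): a file of addition examples compresses much further if you store only the operands and let a program that "understands" addition regenerate the answers. The corpus below is randomly generated for illustration.

```python
import random
import zlib

random.seed(0)
pairs = [(random.randint(0, 999), random.randint(0, 999))
         for _ in range(100_000)]

# Naive storage: every example written out in full.
full = "\n".join(f"{a} + {b} = {a + b}" for a, b in pairs).encode()

# "Understanding" storage: keep only the operands. The answers can be
# reconstructed perfectly by any program that knows how to add, so
# nothing is lost by discarding them.
operands = "\n".join(f"{a} {b}" for a, b in pairs).encode()

print(len(zlib.compress(full)) > len(zlib.compress(operands)))  # -> True
```

A general-purpose compressor like zlib can squeeze both files, but only the version paired with the rule of arithmetic gets to drop the answers entirely; that is the sense in which deriving the principle achieves the greater compression.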
Large language models identify statistical regularities in text. Any analysis
of the text of the Web will reveal that phrases like “supply is low” often
appear in close proximity to phrases like “prices rise.” A chatbot that
incorporates this correlation might, when asked a question about the
effect of supply shortages, respond with an answer about prices
increasing. If a large language model has compiled a vast number of
correlations between economic terms—so many that it can offer plausible
responses to a wide variety of questions—should we say that it actually
understands economic theory? Models like ChatGPT aren’t eligible for the
Hutter Prize for a variety of reasons, one of which is that they don’t
reconstruct the original text precisely—i.e., they don’t perform lossless
compression. But is it possible that their lossy compression nonetheless
indicates real understanding of the sort that A.I. researchers are interested
in?
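The kind of correlation described above can be sketched with a toy corpus (the sentences here are invented for illustration): nothing about economic theory is stored, only the statistical fact that certain phrases tend to appear near each other.

```python
# A toy corpus standing in for the text of the Web.
corpus = [
    "supply is low so prices rise",
    "when supply is low prices rise quickly",
    "supply is high and prices fall",
    "demand falls and prices fall",
]

# Among sentences mentioning low supply, how often does the phrase
# "prices rise" appear nearby? This is the correlation a chatbot
# might lean on when answering a question about shortages.
low_supply = [s for s in corpus if "supply is low" in s]
rate = sum("prices rise" in s for s in low_supply) / len(low_supply)
print(rate)  # -> 1.0
```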
Let’s go back to the example of arithmetic. If you ask GPT-3 (the large-
language model that ChatGPT was built from) to add or subtract a pair of
numbers, it almost always responds with the correct answer when the
numbers have only two digits. But its accuracy worsens significantly with
larger numbers, falling to ten per cent when the numbers have five digits.
Most of the correct answers that GPT-3 gives are not found on the Web—
there aren’t many Web pages that contain the text “245 + 821,” for
example—so it’s not engaged in simple memorization. But, despite
ingesting a vast amount of information, it hasn’t been able to derive the
principles of arithmetic, either. A close examination of GPT-3’s incorrect
answers suggests that it doesn’t carry the “1” when performing arithmetic.
The Web certainly contains explanations of carrying the “1,” but GPT-3
isn’t able to incorporate those explanations. GPT-3’s statistical analysis of
examples of arithmetic enables it to produce a superficial approximation
of the real thing, but no more than that.
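The failure mode described above can be caricatured in code: a hypothetical adder that works digit by digit but throws away every carry. This is an illustration of the error pattern, not a model of GPT-3 itself.

```python
def add_without_carry(a, b):
    """Add two non-negative integers digit by digit, discarding every
    carry -- a caricature of the error pattern described in the text."""
    width = max(len(str(a)), len(str(b)))
    sa, sb = str(a).zfill(width), str(b).zfill(width)
    return int("".join(str((int(x) + int(y)) % 10)
                       for x, y in zip(sa, sb)))

print(add_without_carry(245, 821))  # -> 66 (the true sum is 1066)
print(add_without_carry(12, 34))    # -> 46 (no carries, so it is correct)
```

When no column overflows, the shortcut happens to give the right answer, which is exactly why such a superficial approximation can look competent on small numbers and fall apart on larger ones.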
Given GPT-3’s failure at a subject taught in elementary school, how can
we explain the fact that it sometimes appears to perform well at writing
college-level essays? Even though large language models often
hallucinate, when they’re lucid they sound like they actually understand
subjects like economic theory. Perhaps arithmetic is a special case, one
for which large language models are poorly suited. Is it possible that, in
areas outside addition and subtraction, statistical regularities in text
actually do correspond to genuine knowledge of the real world?
I think there’s a simpler explanation. Imagine what it would look like if
ChatGPT were a lossless algorithm. If that were the case, it would always
answer questions by providing a verbatim quote from a relevant Web
page. We would probably regard the software as only a slight
improvement over a conventional search engine, and be less impressed by
it. The fact that ChatGPT rephrases material from the Web instead of
quoting it word for word makes it seem like a student expressing ideas in
her own words, rather than simply regurgitating what she’s read; it creates
the illusion that ChatGPT understands the material. In human students,
rote memorization isn’t an indicator of genuine learning, so ChatGPT’s
inability to produce exact quotes from Web pages is precisely what makes
us think that it has learned something. When we’re dealing with
sequences of words, lossy compression looks smarter than lossless
compression.
A lot of uses have been proposed for large language models. Thinking
about them as blurry JPEGs offers a way to evaluate what they might or
might not be well suited for. Let’s consider a few scenarios.
Can large language models take the place of traditional search engines?
For us to have confidence in them, we would need to know that they
haven’t been fed propaganda and conspiracy theories—we’d need to
know that the JPEG is capturing the right sections of the Web. But, even if
a large language model includes only the information we want, there’s still
the matter of blurriness. There’s a type of blurriness that is acceptable,
which is the re-stating of information in different words. Then there’s the
blurriness of outright fabrication, which we consider unacceptable when
we’re looking for facts. It’s not clear that it’s technically possible to retain
the acceptable kind of blurriness while eliminating the unacceptable kind,
but I expect that we’ll find out in the near future.
Even if it is possible to restrict large language models from engaging in
fabrication, should we use them to generate Web content? This would
make sense only if our goal is to repackage information that’s already
available on the Web. Some companies exist to do just that—we usually
call them content mills. Perhaps the blurriness of large language models
will be useful to them, as a way of avoiding copyright infringement.
Generally speaking, though, I’d say that anything that’s good for content
mills is not good for people searching for information. The rise of this type
of repackaging is what makes it harder for us to find what we’re looking
for online right now; the more that text generated by large language
models gets published on the Web, the more the Web becomes a blurrier
version of itself.
There is very little information available about OpenAI’s forthcoming
successor to ChatGPT, GPT-4. But I’m going to make a prediction: when
assembling the vast amount of text used to train GPT-4, the people at
OpenAI will have made every effort to exclude material generated by
ChatGPT or any other large language model. If this turns out to be the
case, it will serve as unintentional confirmation that the analogy between
large language models and lossy compression is useful. Repeatedly
resaving a JPEG creates more compression artifacts, because more
information is lost every time. It’s the digital equivalent of repeatedly
making photocopies of photocopies in the old days. The image quality
only gets worse.
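A crude numeric stand-in for that generation loss (a sketch, not a real JPEG codec): treat each "save" as blurring neighboring values together and snapping them to a coarse grid, then watch the detail wash out of a signal as it is resaved.

```python
def lossy_save(signal, step=5):
    """A caricature of a lossy codec: blur each value with its
    neighbors, then quantize to a multiple of `step`. Every save
    destroys detail that no later step can recover."""
    n = len(signal)
    smoothed = [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]
    return [step * round(v / step) for v in smoothed]

signal = [3, 14, 1, 59, 26, 5, 35, 8, 97, 9]
copy = signal
for _ in range(3):  # save, then resave the copy of the copy
    copy = lossy_save(copy)

# The dynamic range -- a rough proxy for detail -- shrinks as the
# generations pass, and nothing brings it back.
print(max(signal) - min(signal), max(copy) - min(copy))  # -> 96 35
```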
Indeed, a useful criterion for gauging a large language model’s quality
might be the willingness of a company to use the text that it generates as
training material for a new model. If the output of ChatGPT isn’t good
enough for GPT-4, we might take that as an indicator that it’s not good
enough for us, either. Conversely, if a model starts generating text so good
that it can be used to train new models, then that should give us
confidence in the quality of that text. (I suspect that such an outcome
would require a major breakthrough in the techniques used to build these
models.) If and when we start seeing models producing output that’s as
good as their input, then the analogy of lossy compression will no longer
be applicable.
Can large language models help humans with the creation of original
writing? To answer that, we need to be specific about what we mean by
that question. There is a genre of art known as Xerox art, or photocopy art,
in which artists use the distinctive properties of photocopiers as creative
tools. Something along those lines is surely possible with the photocopier
that is ChatGPT, so, in that sense, the answer is yes. But I don’t think that
anyone would claim that photocopiers have become an essential tool in
the creation of art; the vast majority of artists don’t use them in their
creative process, and no one argues that they’re putting themselves at a
disadvantage with that choice.
So let’s assume that we’re not talking about a new genre of writing that’s
analogous to Xerox art. Given that stipulation, can the text generated by
large language models be a useful starting point for writers to build off
when writing something original, whether it’s fiction or nonfiction? Will
letting a large language model handle the boilerplate allow writers to focus
their attention on the really creative parts?
Obviously, no one can speak for all writers, but let me make the argument
that starting with a blurry copy of unoriginal work isn’t a good way to
create original work. If you’re a writer, you will write a lot of unoriginal work
before you write something original. And the time and effort expended on
that unoriginal work isn’t wasted; on the contrary, I would suggest that it is
precisely what enables you to eventually create something original. The
hours spent choosing the right word and rearranging sentences to better
follow one another are what teach you how meaning is conveyed by prose.
Having students write essays isn’t merely a way to test their grasp of the
material; it gives them experience in articulating their thoughts. If students
never have to write essays that we have all read before, they will never
gain the skills needed to write something that we have never read.
And it’s not the case that, once you have ceased to be a student, you can
safely use the template that a large language model provides. The
struggle to express your thoughts doesn’t disappear once you graduate—
it can take place every time you start drafting a new piece. Sometimes it’s
only in the process of writing that you discover your original ideas. Some
might say that the output of large language models doesn’t look all that
different from a human writer’s first draft, but, again, I think this is a
superficial resemblance. Your first draft isn’t an unoriginal idea expressed
clearly; it’s an original idea expressed poorly, and it is accompanied by
your amorphous dissatisfaction, your awareness of the distance between
what it says and what you want it to say. That’s what directs you during
rewriting, and that’s one of the things lacking when you start with text
generated by an A.I.
There’s nothing magical or mystical about writing, but it involves more
than placing an existing document on an unreliable photocopier and
pressing the Print button. It’s possible that, in the future, we will build an
A.I. that is capable of writing good prose based on nothing but its own
experience of the world. The day we achieve that will be momentous
indeed—but that day lies far beyond our prediction horizon. In the
meantime, it’s reasonable to ask, What use is there in having something
that rephrases the Web? If we were losing our access to the Internet
forever and had to store a copy on a private server with limited space, a
large language model like ChatGPT might be a good solution, assuming
that it could be kept from fabricating. But we aren’t losing our access to
the Internet. So just how much use is a blurry JPEG, when you still have
the original? ♦