The document provides an overview of the state of natural language processing (NLP) and Amazon's NLP offering, Amazon Comprehend. It discusses the evolution of NLP from rule-based systems to modern neural models such as the Transformer and BERT, and the increasing complexity of NLP tasks. The document also describes Amazon Comprehend's capabilities in areas such as sentiment analysis, named entity recognition, key phrase extraction, and language detection.
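As a minimal sketch of how these Comprehend capabilities are typically invoked from Python via boto3 (the region, sample text, and client setup are illustrative assumptions, not part of the original deck):

```python
import boto3

# Assumes AWS credentials are already configured; region is an arbitrary choice.
comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "I love the new checkout flow, but shipping to Seattle was slow."

# Language detection
langs = comprehend.detect_dominant_language(Text=text)["Languages"]

# Sentiment analysis
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")

# Named entity recognition and key phrase extraction
entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")["KeyPhrases"]

print(langs[0]["LanguageCode"], sentiment["Sentiment"])
print([e["Text"] for e in entities], [p["Text"] for p in phrases])
```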
How to Enhance your Application using Amazon Comprehend for NLP - AWS Online ... (Amazon Web Services)
Learning Objectives:
- How to use machine learning and NLP to find insights and relationships in text
- How to use new pre-built models in Amazon Comprehend
- How to create new models in Amazon SageMaker
How to fine-tune and develop your own large language model.pptx (Knoldus Inc.)
In this session, we will cover what large language models are and how to fine-tune a pre-trained LLM with our own data, including data preparation, model training, and model evaluation.
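As a rough illustration of that workflow, here is a minimal sketch of fine-tuning a pre-trained causal LM with the Hugging Face Trainer; the base model, dataset file, and hyperparameters are placeholder assumptions, not values from the session:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Data preparation: tokenize a small text corpus (placeholder file name).
raw = load_dataset("text", data_files={"train": "my_corpus.txt"})
def tok(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train = raw["train"].map(tok, batched=True, remove_columns=["text"])

# Model training.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Model evaluation (e.g. held-out perplexity) would follow the same pattern.
```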
OpenAI's GPT-3 Language Model - guest Steve Omohundro (Numenta)
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud (Márton Kodok)
The document discusses Vertex AI, Google Cloud's unified machine learning platform. It provides an overview of Vertex AI's key capabilities including gathering and labeling datasets at scale, building and training models using AutoML or custom training, deploying models with endpoints, managing models with confidence through explainability and monitoring tools, using pipelines to orchestrate the entire ML workflow, and adapting to changes in data. The conclusion emphasizes that Vertex AI offers an end-to-end platform for all stages of ML development and productionization with tools to make ML more approachable and pipelines that can solve complex tasks.
A presentation I did on the what, why, how, and benefits of centralized logging in the enterprise. This presentation was focused on implementing centralized logging in an environment that is mostly .NET/Windows.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s... (Mihai Criveti)
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and an advocate for open source development. Mihai is driving the development of Retrieval Augmented Generation platforms and solutions for Generative AI at IBM that leverage WatsonX, vector databases, LangChain, HuggingFace, and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or "Chat with Documents", platforms and APIs that scale and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, and the use of RAG, vector databases, and fine-tuning to overcome model limitations and build solutions that connect to your data, provide content grounding, limit hallucinations, and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and the Weaviate and ChromaDB vector databases. He'll also share tips on writing code using LLMs, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures (a minimal caching-and-fallback sketch follows this list):
- Vector Database: consider sharding and High Availability
- Fine Tuning: collecting data to be used for fine tuning
- Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
- Chain of Reasoning and Agents
- Caching embeddings and responses
- Personalization and Conversational Memory Database
- Streaming Responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
- Calling 3rd party functions or APIs for reasoning or other types of data (e.g., LLMs are terrible at reasoning and prediction, so consider calling other models)
- Fallback techniques: fall back to a different model, or default answers
- API scaling techniques, rate limiting, etc.
- Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
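To make two of these factors concrete, here is a minimal, library-free sketch of caching embeddings/responses and falling back to a default answer; embed_with_model and call_primary_model are hypothetical stubs standing in for real model calls:

```python
import functools
import hashlib

def embed_with_model(text):
    # Stub standing in for a real embedding-model call.
    return [float(len(text))]

def call_primary_model(prompt):
    # Stub standing in for a real LLM call that may fail or time out.
    return f"(model reply to: {prompt})"

@functools.lru_cache(maxsize=100_000)
def cached_embedding(text):
    # Repeated texts are embedded only once (embedding cache).
    return tuple(embed_with_model(text))

_response_cache = {}

def answer(prompt):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _response_cache:                    # response cache hit
        return _response_cache[key]
    try:
        reply = call_primary_model(prompt)        # primary model
    except Exception:
        reply = "Sorry, I can't answer that right now."  # fallback default answer
    _response_cache[key] = reply
    return reply

print(answer("What is RAG?"))
```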
The GPT-3 model architecture is a transformer-based neural network that was trained on roughly 45 TB of text data. It is non-deterministic, in the sense that, given the same input, multiple runs of the engine may return different responses. It was trained on massive datasets drawn from a large crawl of the web, containing about 500B tokens, and has 175 billion parameters, a more than 100x increase over GPT-2, which was considered state of the art with 1.5 billion parameters.
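The non-determinism comes from sampling the next token from the model's output distribution rather than always taking the argmax; a minimal numpy illustration (the logits and temperature are made-up values, not GPT-3 internals):

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, rng=np.random.default_rng()):
    # Temperature scaling followed by softmax; higher temperature = more randomness.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.5, 0.3, -1.0]   # made-up scores over a tiny vocabulary
print([sample_next_token(logits) for _ in range(5)])  # output varies run to run
```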
Building, Evaluating, and Optimizing your RAG App for Production (Sri Ambati)
The document discusses optimizing question answering systems built as RAG (Retrieval-Augmented Generation) stacks. It outlines challenges with naive RAG approaches and proposes solutions like improved data representations, advanced retrieval techniques, and fine-tuning large language models. Table-stakes optimizations include tuning chunk sizes, prompt engineering, and customizing LLMs. More advanced techniques involve small-to-big retrieval, multi-document agents, embedding fine-tuning, and LLM fine-tuning.
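As one example of a table-stakes knob, here is a minimal sketch of fixed-size chunking with overlap, the kind of parameter RAG stacks typically tune; the sizes are arbitrary assumptions:

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into overlapping character chunks for indexing and retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Smaller chunks tend to improve retrieval precision; larger chunks preserve context.
print(len(chunk_text("word " * 1000, chunk_size=256, overlap=32)))
```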
Langchain Framework is an innovative approach to linguistic data processing, combining the principles of language sciences, blockchain technology, and artificial intelligence. This deck introduces the groundbreaking elements of the framework, detailing how it enhances security, transparency, and decentralization in language data management. It discusses its applications in various fields, including machine learning, translation services, content creation, and more. The deck also highlights its key features, such as immutability, peer-to-peer networks, and linguistic asset ownership, that could revolutionize how we handle linguistic data in the digital age.
Regulating Generative AI - LLMOps pipelines with Transparency (Debmalya Biswas)
The growing adoption of Gen AI, esp. LLMs, has re-ignited the discussion around AI Regulations - to ensure that AI/ML systems are responsibly trained and deployed. Unfortunately, this effort is complicated by multiple governmental organizations and regulatory bodies releasing their own guidelines and policies with little to no agreement on the definition of terms.
Rather than trying to understand and regulate all types of AI, we recommend a different (and practical) approach in this talk based on AI Transparency: to transparently outline the capabilities of the AI system based on its training methodology and set realistic expectations with respect to what it can (and cannot) do.
We outline LLMOps architecture patterns and show how the proposed approach can be integrated at different stages of the LLMOps pipeline, capturing the model's capabilities. In addition, the AI system provider also specifies scenarios where (they believe that) the system can make mistakes, and recommends a "safe" approach with guardrails for those scenarios.
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers (Ivo Andreev)
Thank you for the overview of Florence and vision capabilities. Large foundational models continue advancing multimodal abilities in helpful ways when guided by principles of safety, transparency and accountability.
Holland & Barrett: Gen AI Prompt Engineering for Tech teamsDobo Radichkov
Â
Here are some key factors to consider when choosing between GPT models:
- Response quality: gpt-4/turbo will generally provide higher quality responses, though gpt-3.5 quality can be improved with techniques like few-shot learning.
- Speed: gpt-3.5 is significantly faster than gpt-4 models, processing prompts around 5x faster. This is important for real-time applications.
- Cost: gpt-3.5 is much more cost effective, around 15-30x cheaper per prompt than gpt-4.
So in summary, for applications where response quality is paramount, gpt-4 may be preferable. But for most use cases, gpt-3.5 offers a better balance of quality, speed, and cost.
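A minimal sketch of how such a trade-off might be encoded as a routing rule in application code; the model names and thresholds are illustrative assumptions, not recommendations:

```python
def pick_model(needs_high_quality, latency_budget_ms, cost_sensitive):
    """Toy router encoding the quality/speed/cost trade-off described above."""
    if needs_high_quality and latency_budget_ms > 5000 and not cost_sensitive:
        return "gpt-4-turbo"     # higher quality, slower and pricier
    return "gpt-3.5-turbo"       # faster and far cheaper for most use cases

print(pick_model(needs_high_quality=True, latency_budget_ms=2000, cost_sensitive=True))
```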
Natural language processing and transformer models (Ding Li)
The document discusses several approaches for text classification using machine learning algorithms:
1. Count the frequency of individual words in tweets and sum the counts for each tweet to create feature vectors for classification models like logistic regression. However, this loses some word context information.
2. Use Bayes' rule and calculate word probabilities conditioned on class to perform naive Bayes classification, with Laplace smoothing used to handle zero probabilities (a minimal sketch follows this list).
3. Incorporate word n-grams and context by calculating word probabilities within n-gram contexts rather than independently. This captures more linguistic information than the first two approaches.
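A minimal sketch of approach 2 with scikit-learn, where MultinomialNB's alpha parameter is the Laplace (add-one) smoothing term; the tiny dataset is made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["great game today", "terrible refereeing", "loved the match", "awful performance"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (made-up examples)

# alpha=1.0 is Laplace smoothing for words unseen in a class.
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(tweets, labels)
print(clf.predict(["what a great performance"]))
```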
An introduction to the Transformers architecture and BERT (Suman Debnath)
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures and is mostly used for natural language processing (NLP) tasks. Ever since the advent of the transformer, it has replaced RNNs and LSTMs for various tasks. The transformer also created a major breakthrough in the field of NLP and paved the way for revolutionary new architectures such as BERT.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
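A minimal sketch of low-rank adaptation using the PEFT library; the base model, rank, and target module names are assumptions (the right target modules depend on the architecture), not values from the document:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; varies by model
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```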
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf (Po-Chuan Chen)
The document describes the RAG (Retrieval-Augmented Generation) model for knowledge-intensive NLP tasks. RAG combines a pre-trained language generator (BART) with a dense passage retriever (DPR) to retrieve and incorporate relevant knowledge from Wikipedia. RAG achieves state-of-the-art results on open-domain question answering, abstractive question answering, and fact verification by leveraging both parametric knowledge from the generator and non-parametric knowledge retrieved from Wikipedia. The retrieved knowledge can also be updated without retraining the model.
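The retrieve-then-generate pattern can be sketched without the paper's exact BART/DPR stack; here is a minimal illustration using a SentenceTransformers bi-encoder over a toy passage list (the model name and passages are assumptions, not the paper's setup):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder dense retriever

passages = [
    "Amazon Comprehend is a natural language processing service.",
    "BART is a sequence-to-sequence model pre-trained as a denoising autoencoder.",
    "The Eiffel Tower is located in Paris.",
]
passage_emb = encoder.encode(passages, convert_to_tensor=True)

question = "Where is the Eiffel Tower?"
scores = util.cos_sim(encoder.encode(question, convert_to_tensor=True), passage_emb)[0]
top_passage = passages[int(scores.argmax())]

# A generator (e.g. BART) would then condition on the question plus top_passage.
prompt = f"context: {top_passage} question: {question}"
print(prompt)
```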
This document provides an overview of generative AI tools for project managers and includes prompts and examples for using ChatGPT to generate various project deliverables and analyses. It discusses tailoring prompts, recommended output formats, and includes examples of prompts for tasks like creating a cost-benefit analysis, business case, project charter, requirements traceability matrix, and more. The document aims to demonstrate how generative AI can assist with common project management activities.
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? (Bernard Marr)
GPT-3 is an AI tool created by OpenAI that can generate text in human-like ways. It has been trained on vast amounts of text from the internet. GPT-3 can answer questions, summarize text, translate languages, and generate computer code. However, it has limitations as its output can become gibberish for complex tasks and it operates as a black box system. While impressive, GPT-3 is just an early glimpse of what advanced AI may be able to accomplish.
An overview of some key concepts of chatbots, with some do's and don'ts.
We will happily present the high-resolution version of this presentation, extended with additional detailed slides, and a clear explanation at your offices. Contact us for that.
Delve into this insightful article to explore the current state of generative AI, its ethical implications, and the power of generative AI models across various industries.
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate (Amazon Web Services)
by Pratap Ramamurthy, Partner Solutions Architect, AWS
Natural language holds a wealth of information like user sentiment and conversational intent. In this session, we'll demonstrate the capabilities of Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. We'll show you how to build a VOC (Voice of the Customer) application and integrate it with other AWS services including AWS Lambda, Amazon S3, Amazon Athena, Amazon QuickSight, and Amazon Translate. We'll also show you additional methods for NLP available through Amazon SageMaker.
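A minimal sketch of the kind of Lambda handler such a VOC pipeline might use, translating incoming text to English and then scoring sentiment; the event shape and the surrounding S3/Athena wiring are assumptions, not the session's actual code:

```python
import boto3

translate = boto3.client("translate")
comprehend = boto3.client("comprehend")

def handler(event, context):
    """Hypothetical Lambda: translate a customer comment, then detect sentiment."""
    text = event["comment"]                      # assumed event shape
    translated = translate.translate_text(
        Text=text, SourceLanguageCode="auto", TargetLanguageCode="en"
    )["TranslatedText"]
    sentiment = comprehend.detect_sentiment(Text=translated, LanguageCode="en")
    return {"text": translated, "sentiment": sentiment["Sentiment"]}
```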
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu... (Databricks)
The document summarizes a presentation about state-of-the-art natural language processing (NLP) techniques. It discusses how transformer networks have achieved state-of-the-art results in many NLP tasks using transfer learning from large pre-trained models. It also describes how Hugging Face's Transformers library and Tokenizers library provide tools for tokenization and using pre-trained transformer models through a simple interface.
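A minimal sketch of that simple interface, using the Transformers pipeline API; the checkpoints downloaded are whatever defaults the library currently ships for each task:

```python
from transformers import pipeline

# Tokenization, model loading, and post-processing are hidden behind one call.
classifier = pipeline("sentiment-analysis")
print(classifier("Transfer learning makes state-of-the-art NLP surprisingly accessible."))

ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))
```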
As an AI language model, ChatGPT is a program consisting of a large neural network that has been trained on vast amounts of textual data. Specifically, ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) family of models developed by OpenAI.
Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today's mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you'll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding (a minimal quantization sketch follows this list)
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies' role in scaling on-device generative AI
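As one concrete example of those techniques, here is a minimal sketch of post-training dynamic quantization in PyTorch; the toy model is a placeholder, and real on-device stacks typically use more elaborate, hardware-specific schemes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Replace Linear layers with int8 dynamically quantized versions.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface, smaller weights, faster CPU inference
```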
Insight Asset Management in Jira and eazyBI Powered Insight Reporting (eazyBI)
Assets are all around us in our day-to-day work, whether it's IT assets, employees, customers, facilities, or something else. This talk covers how you can use Insight to help you manage assets within Jira and how eazyBI can provide flexible reports and an overview of your assets.
- Rickard Hyllenstam, Atlassian Consultant at Riada, Sweden
ChatGPT (Chat Generative Pre-trained Transformer) is OpenAI's application that performs human-like interactions. GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real time, right from your editor. The deck contains more details about ChatGPT, AI, AGI, Copilot, the OpenAI API, and use case scenarios.
Generative AI models like LLMs can be customized for specific tasks through techniques like prompt engineering, retrieval augmented generation, and fine-tuning. Prompt engineering involves providing contextual information to steer model responses, while retrieval augmented generation combines generative and retrieval models to improve performance. Fine-tuning customizes foundation models with domain-specific training data. The document discusses these techniques and their benefits, encouraging hands-on experience to learn how to best apply generative AI.
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications (Forward Gradient)
The document outlines an AI and NLP seminar in three parts: an introduction, natural language processing, and speech. Part II, on NLP, covers topics like word representations, sentence representations, NLP benchmarks, multilingual representations, and applications of text and graph embeddings. Part III, on speech, discusses speech recognition approaches and multimodal speech and text for emotion recognition.
This was presented to software developers with the goal of introducing them to the basic machine learning workflow, code snippets, possibilities, and the state of the art in NLP, and giving some clues on where to get started.
RESTing in the ALPS - Mike Amundsen's Presentation from QCon London 2013 (CA API Management)
The document discusses the author's realization that his previous work on H-Factors for describing protocol affordances was missing a consideration of "application affordances". This led to the idea of "vocabularies" for describing shared understanding at the application level. However, vocabularies alone do not describe how to interact with application concepts. The author proposes a new specification called ALPS that combines the description of application concepts ("what") using vocabularies, along with the description of how to interact with those concepts using hypermedia controls and protocols ("how"). ALPS aims to provide shared understanding of both the state and transitions of application domains across different media types and implementations.
Neural Text Embeddings for Information Retrieval (WSDM 2017) (Bhaskar Mitra)
The document describes a tutorial on using neural networks for information retrieval. It discusses an agenda for the tutorial that includes fundamentals of IR, word embeddings, using word embeddings for IR, deep neural networks, and applications of neural networks to IR problems. It provides context on the increasing use of neural methods in IR applications and research.
The NLP muppets revolution! @ Data Science London 2019
video: https://skillsmatter.com/skillscasts/13940-a-deep-dive-into-contextual-word-embeddings-and-understanding-what-nlp-models-learn
event: https://www.meetup.com/Data-Science-London/events/261483332/
Neural networks have a long and rich history in automatic speech recognition. In this talk, we present a brief primer on the origin of deep learning in spoken language, and then explore todayâs world of Alexa. Alexa is the AWS service that understands spoken language and powers Amazon Echo. Alexa relies heavily on machine learning and deep neural networks for speech recognition, text-to-speech, language understanding, and more. We also discuss the Alexa Skills Kit, which lets any developer teach Alexa new skills.
Bridging the gap between AI and UI - DSI Vienna - full version (Liad Magen)
This is a summary of the latest research on model interpretability, including recurrent neural networks (RNNs) for natural language processing (NLP), in terms of what's in an RNN.
In addition, it contains suggestions for improving machine-learning-based user interfaces, to engage users and encourage them to contribute data to adapt the models to them.
This document provides an overview of natural language processing (NLP) and the use of deep learning for NLP tasks. It discusses how deep learning models can learn representations and patterns from large amounts of unlabeled text data. Deep learning approaches are now achieving superior results to traditional NLP methods on many tasks, such as named entity recognition, machine translation, and question answering. However, deep learning models do not explicitly model linguistic knowledge. The document outlines common NLP tasks and how deep learning algorithms like LSTMs, CNNs, and encoder-decoder models are applied to problems involving text classification, sequence labeling, and language generation.
Devoxx - Traitement automatique du langage sur du texte en 2019 (Alexis Agahi)
This document contains a summary of a presentation on natural language processing of text given at Devoxx in April 2019. It discusses using natural language processing for contract management, data extraction, and review. The document also mentions using a machine learning pipeline to analyze documents and extract titles.
Natural language processing (NLP) involves developing systems that can process and understand human language. This document discusses NLP tools and techniques for representing text numerically so it can be analyzed by machine learning algorithms. It covers topics like tokenization, part-of-speech tagging, named entity recognition, vector space models, term frequency-inverse document frequency (TF-IDF) weighting, and word embeddings which represent words as dense vectors of numbers. Popular Python libraries for NLP and text analysis are also introduced.
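A minimal sketch of the TF-IDF representation described here, using scikit-learn; the toy corpus is made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "natural language processing turns text into numbers",
    "word embeddings represent words as dense vectors",
    "tf idf weights words by how distinctive they are",
]
vectorizer = TfidfVectorizer()           # tokenization + TF-IDF weighting
X = vectorizer.fit_transform(corpus)     # sparse document-term matrix
print(X.shape, vectorizer.get_feature_names_out()[:5])
```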
Gadgets pwn us? A pattern language for CALL (Lawrie Hunter)
The document discusses creating a pattern language for computer-assisted language learning (CALL). It explores the concept of a pattern language as defined by Christopher Alexander and proposes a framework for creating a CALL pattern language in the era of web 2.0. The paper seeks to rework concepts from other fields, like "formal learning design expression" and "task arc," and have participants brainstorm elements to include through graphical challenges. The overall goal is to establish foundational patterns for CALL work.
Understanding Names with Neural Networks - May 2020 (Basis Technology)
The document discusses name matching techniques using neural networks. It describes how earlier techniques like Hidden Markov Models (HMMs) had limitations in capturing context around character sequences in names. The researchers at Basis Technology developed a sequence-to-sequence model using long short-term memory (LSTM) neural networks to transliterate names between languages. While more accurate, the LSTM model was slower than HMMs. To address this, they explored using a convolutional neural network which provided speed improvements while maintaining accuracy gains over HMMs. The researchers concluded that name matching remains an open problem but data-driven neural approaches hold promise for continued advances.
The document provides an overview of machine learning for natural language processing (NLP) tasks. It discusses framing NLP problems as supervised learning tasks, preprocessing text, feature extraction using the FEX tool, and examples of NLP tasks like part-of-speech tagging and named entity recognition that can be solved using these techniques. It also describes the typical components of a machine learning system for NLP, including preprocessing, feature extraction, classifiers, and evaluation.
This document provides an overview of natural language processing (NLP) tools and resources that can be used to build a machine learning classifier to identify the fame of people mentioned in news articles. It describes NLP tasks like tokenization, part-of-speech tagging, chunking, named entity recognition, parsing, and coreference resolution. It also introduces libraries like the Curator for accessing NLP tools, Edison for feature extraction, and Learning Based Java for building the classifier. Finally, it demonstrates connecting all the pieces to construct a system that can label famous people as politicians, athletes, or corporate moguls.
Recurrent neural networks (RNNs) are well-suited for analyzing text data because they can model sequential and structural relationships in text. RNNs use gating mechanisms like LSTMs and GRUs to address the problem of exploding or vanishing gradients when training on long sequences. Modern RNNs trained with techniques like gradient clipping, improved initialization, and optimized training algorithms like Adam can learn meaningful representations from text even with millions of training examples. RNNs may outperform conventional bag-of-words models on large datasets but require significant computational resources. The author describes an RNN library called Passage and provides an example of sentiment analysis on movie reviews to demonstrate RNNs for text analysis.
This document provides an introduction to natural language processing (NLP) and the Natural Language Toolkit (NLTK) module for Python. It discusses how NLP aims to develop systems that can understand human language at a deep level, lists common NLP applications, and explains why NLP is difficult due to language ambiguity and complexity. It then describes how corpus-based statistical approaches are used in NLTK to tackle NLP problems by extracting features from text corpora and using statistical models. The document gives an overview of the main NLTK modules and interfaces for common NLP tasks like tagging, parsing, and classification. It provides an example of word tokenization and discusses tokens and types in NLTK.
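A minimal sketch of the word tokenization and tagging interfaces mentioned above; note that the exact resource names passed to nltk.download can vary slightly across NLTK versions:

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Amazon Comprehend analyzes text written in natural language."
tokens = nltk.word_tokenize(sentence)      # tokens (instances)
types = set(tokens)                        # types (distinct tokens)
print(tokens)
print(nltk.pos_tag(tokens))                # part-of-speech tags
```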
By Gianluca Maruzzella, Enrico Bertino, Marija Zdolsek and Mario Beraha (Indigo.ai https://ndg.ai/).
What is the magic behind chatbots? We explore the most advanced deep learning models that allow chatbots to facilitate communication between companies and users. Several example architectures and their applications in the chatbot world are shown; in particular, we cover BiLSTM, Seq2Seq with attention, CNNs, and VAEs, and analyze their pros and cons.
Presented at the event https://www.meetup.com/Milano-Chatbots-Meetup/events/255234805
Similar to State of NLP and Amazon Comprehend (20)
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We'll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we'll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we'll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We'll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you're tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let's turn complexity into clarity and make your workspaces work wonders!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you're at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We'll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don't worry, we can help with all of this!
We'll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We'll provide examples and solutions for those as well. And naturally we'll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, and at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we'll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we'll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources, from PDF floorplans to web pages, using FME transformers like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it's populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We'll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many features that provide convenience and capability also sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
National Security Agency - NSA mobile device best practices
State of NLP and Amazon Comprehend
1. State of NLP & Amazon Comprehend
Brief Overview of the State of NLP and AWS NLP Offering
Egor Pushkin, Principal Engineer, Amazon AI
January 2020
2. Introductions
Presenter: machine learning at AWS, Amazon AI; Internet-scale services and global mobile deployments at Glympse
This Talk
Audience: how many people are familiar with the T5 model? BERT? Transformer?
Given a passage of natural text and a question, find a span in the initial passage that constitutes the answer (if one is available).
4. Dark Ages... or Once Upon a Time...
Rule-based Systems
- hardcoded rules
- regular expressions
Statistical / Shallow Neural
- corpus-based statistics
- feature engineering
- linear models
5. Stepping Stones / Driving Forces
Perceptron / Backpropagation
Automatic Differentiation
Deep Neural / CNN / LSTM / Transformer
Computational Resources / GPU
Data Availability
Attention to the Space
6. Natural Language Understanding
Language Analysis
Task → Dataset → Challenge → Model
What is "progress" in NLU?
higher accuracy on existing tasks and datasets
new (more complex) tasks and datasets
new categories of problems
"Language shapes the way we think, and determines what we can think about."
- Benjamin Lee Whorf
Categories of NLU Tasks
sequence tagging
semantic relations
question answering
classification
language generation
...
7. Journey to common sense reasoning (hopefully) via increasing task complexity
SQuAD 1.1
In 1935, in an annual birthday celebration interview, Tesla announced a method of transmitting mechanical energy with minimal loss over any terrestrial distance, a related new means of communication, and a method of accurately determining the location of underground mineral deposits.
How far did he claim the mechanical energy could be transmitted?
DROP
The median age in the city was 22.1 years. 10.1% of residents were under the age of 18; 56.2% were between the ages of 18 and 24; 16.1% were from 25 to 44; 10.5% were from 45 to 64; and 7% were 65 years of age or older. The gender makeup of the city was 64.3% male and 35.7% female.
Which age group was the second largest?
CommonSense QA
Why do people read gossip magazines?
entertained | get information | learn | improve know how | lawyer told to
...
Non Existing QA Challenge
???
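The DROP example above requires discrete reasoning over the passage rather than simple span extraction. A minimal sketch of that reasoning in Python, with the percentages copied from the passage (the dictionary and variable names are illustrative only):

```python
# Discrete reasoning behind "Which age group was the second largest?"
# Percentages are taken from the DROP passage above.
age_groups = {
    "under 18": 10.1,
    "18 to 24": 56.2,
    "25 to 44": 16.1,
    "45 to 64": 10.5,
    "65 or older": 7.0,
}

# Sort groups by share of residents, largest first, and pick the second entry.
ranked = sorted(age_groups.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[1][0])  # -> "25 to 44", the second-largest group
```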
8. Natural Language Modeling
P( Summer is over | LMs are essential to NLP )
"You shall know the nature of a word by the company it keeps."
- John Rupert Firth
Fundamentals: a language model assigns a probability P(w) to a word sequence and P(w_i | w_{i-n+1:i-1}) to the next word given its history, e.g. P( time | Once upon a ).
Evolution: count-based LMs → continuous-space LMs (shallow) → recurrent/Transformer LMs (>100M parameters, >10GB of pre-training data).
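To ground the count-based end of that evolution, here is a minimal sketch of a bigram language model that estimates P(next word | previous word) from corpus counts; the toy corpus and function names are illustrative, not from the talk:

```python
# A toy count-based bigram language model: P(next | prev) from co-occurrence counts.
from collections import Counter, defaultdict

corpus = ("once upon a time there was a princess . "
          "once upon a time there was a dragon .").split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(nxt, prev):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

print(prob("time", "a"))     # cf. P( time | Once upon a ) on the slide
print(prob("upon", "once"))  # 1.0 in this toy corpus
```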
9. Word Embeddings
Input representation
Semantic embeddings: word2vec, GloVe, FastText (e.g. the walked : walking :: swam : swimming analogy)
Contextualized word embeddings: ELMo, BERT ("Hello? Is there anybody in there?")
Word pieces: surrealistic existentialism → surreal ##istic existent ##ial ##ism ...
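A minimal sketch of word-piece tokenization using the Hugging Face tokenizer for a standard BERT checkpoint (the checkpoint name is an assumption, and the exact splits depend on its vocabulary, so they may differ from the slide's example):

```python
# Word-piece tokenization: rare words are split into sub-word units marked with "##".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("surrealistic existentialism"))
# e.g. ['surreal', '##istic', 'existential', '##ism'] -- exact split depends on the vocabulary
```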
10. Sequence to Sequence
Encoder/decoder architecture
Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
[Diagram: the encoder reads the input sequence "A B C <EOS>" and its state becomes the internal representation; the decoder, starting from <BOS>, generates the output sequence "W X Y Z <EOS>".]
Encoder: input and state as the input sequence is fed into it.
Decoder: state and output while the output sequence is being generated.
Inference: previously generated output is added back to the network, giving it context on what has already been produced.
Training: the actual (or expected) output from the training dataset is used instead when the model is trained with teacher forcing enabled.
Applications: translation, language generation, multi-modal applications.
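A minimal PyTorch sketch of the loop described above, contrasting teacher forcing at training time with feeding generated tokens back at inference time; the vocabulary, token ids, and dimensions are toy assumptions, not the paper's setup:

```python
# Encoder/decoder with teacher forcing (training) vs. feedback of generated tokens (inference).
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 12, 16, 32
embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.GRU(emb_dim, hidden, batch_first=True)
decoder = nn.GRU(emb_dim, hidden, batch_first=True)
project = nn.Linear(hidden, vocab_size)        # decoder state -> output token logits

BOS, EOS = 1, 2
src = torch.tensor([[3, 4, 5, EOS]])           # "A B C <EOS>"
tgt = torch.tensor([[6, 7, 8, 9, EOS]])        # "W X Y Z <EOS>"

# Encoder: its final hidden state is the "internal representation".
_, state = encoder(embed(src))

# Training with teacher forcing: the ground-truth previous token is fed in at every step.
decoder_inputs = torch.cat([torch.tensor([[BOS]]), tgt[:, :-1]], dim=1)
dec_out, _ = decoder(embed(decoder_inputs), state)
logits = project(dec_out)                                       # [1, tgt_len, vocab_size]
loss = nn.functional.cross_entropy(logits.squeeze(0), tgt.squeeze(0))

# Inference: the previously generated token is fed back into the decoder.
token, hidden_state, generated = torch.tensor([[BOS]]), state, []
for _ in range(10):
    out, hidden_state = decoder(embed(token), hidden_state)
    token = project(out).argmax(dim=-1)                         # greedy decoding
    generated.append(token.item())
    if token.item() == EOS:
        break

print(loss.item(), generated)
```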
11. Attention
Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014)
(Prior to attention) the context vector (encoder state) tended to forget things...
The attention mechanism was introduced to memorize long sentences.
Was born for translation.
[Diagram: the decoder state acts as a query q over the encoder states; the weighted encoder states are summed into a context vector for each output token.]
Each item is dot-producted with the query to produce a score describing how well it matches the query. The scores are fed into a softmax to create the attention distribution.
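A minimal sketch of that scoring step: each encoder state is dot-producted with the decoder query, the scores go through a softmax, and the resulting distribution weights the encoder states into a context vector (shapes and tensors here are illustrative):

```python
# Dot-product attention: score, softmax, weighted sum.
import torch

torch.manual_seed(0)
encoder_states = torch.randn(4, 8)        # one vector per source token, e.g. "A B C <EOS>"
query = torch.randn(8)                    # current decoder state q

scores = encoder_states @ query           # how well each item matches the query
weights = torch.softmax(scores, dim=0)    # attention distribution over source tokens
context = weights @ encoder_states        # context vector fed back into the decoder

print(weights, context.shape)
```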
12. Transformer
Multi-head self-attention
Attention Is All You Need (Vaswani et al., 2017)
Encoder/decoder stacks
[Diagram: self-attention derives queries (q), keys (k), and values (v) from the same sequence; the outputs of multiple attention heads are combined.]
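A minimal sketch of multi-head self-attention using PyTorch's built-in module, where queries, keys, and values all come from the same sequence; the sizes are illustrative, not the paper's configuration:

```python
# Multi-head self-attention over a single toy sequence.
import torch
import torch.nn as nn

seq_len, d_model, n_heads = 5, 16, 4
x = torch.randn(1, seq_len, d_model)      # a batch of one token sequence

mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)
out, attn_weights = mha(x, x, x)          # self-attention: query = key = value = x

print(out.shape)           # (1, 5, 16): one combined output per input position
print(attn_weights.shape)  # (1, 5, 5): attention distribution (averaged over heads)
```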
13. BERT
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)
Input: word-pieces, positional encoding, segment embeddings
Learning objectives: masked language modeling, next sentence prediction
Large-scale pre-training: English Wikipedia, BooksCorpus
Bidirectional, autoencoder
[Diagram: tokenized input is mapped to embeddings and passes through a transformer encoder stack (the Transformer's contribution) to produce token representations; in the pre-training setup, the LM learning objective at training time adds a fully-connected layer + GELU + norm and embeddings' + softmax over output probabilities; in the fine-tuning setup, additional layer(s) produce the output of the downstream task.]
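A minimal sketch of the masked language modeling objective at work, using a pre-trained BERT checkpoint through the Hugging Face fill-mask pipeline (the checkpoint name is an assumption for illustration):

```python
# Masked language modeling: predict the token hidden behind [MASK].
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("Once upon a [MASK], in a land far away."):
    # Each candidate carries a predicted token and its probability score.
    print(candidate["token_str"], round(candidate["score"], 3))
```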
14. Transfer Learning via Fine-tuning LM Models
"NLP's ImageNet moment has arrived"
- Sebastian Ruder
Transfer Learning in Natural Language Processing (quick ~230 slide overview)
Example tasks: Classification, NER, Question Answering, Relation Extraction, Coreference Resolution, Natural Language Inference, ...
Transfer learning taxonomy:
- Inductive Transfer Learning (different tasks; labeled data in target domain)
  - Multi-task Learning (tasks learned simultaneously)
  - Sequential Transfer Learning (tasks learned sequentially)
- Transductive Transfer Learning (same tasks; labeled data only in source domain)
  - Domain Adaptation (different domains)
  - Cross-lingual Learning (different languages)
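A minimal sketch of sequential transfer learning via fine-tuning: a pre-trained encoder gets a freshly initialized classification head, and both are trained together on the target task. The checkpoint name, label count, and example sentences are assumptions for illustration:

```python
# Fine-tuning a pre-trained encoder with a new classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["the movie was great", "the movie was terrible"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: encoder weights are updated together with the new head.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
print(outputs.loss.item(), outputs.logits.shape)
```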
15. BERT for QA
"How far ... transmitted?"
Extractive, open-domain question answering based on the given context (SQuAD 1.1).
Training: fine-tune the BERT encoder layers, train the dense layer from scratch; the tokenizer vocabulary stays intact.
Inference:
[Diagram: the passage ("In 1935, in ... mineral deposits.") and the question ("How far did he claim the mechanical energy could be transmitted?") are run through the BERT Tokenizer into
[CLS] 'In' '1935' ',' 'in' ... 'mineral' 'deposits' '.' [SEP] 'how' 'far' ... 'transmitted' '?'
with a 0/1 segment mask separating the passage (p) from the question (q). BERT's token representations feed a fully-connected layer with a weight matrix of size [T*H, 2] that predicts the Start Position and End Position of the answer span "... minimal loss over any terrestrial distance, a related new means of ...".]
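A minimal sketch of the same extractive QA setup using a publicly available SQuAD-fine-tuned checkpoint via the Hugging Face pipeline; the checkpoint choice is an assumption, while the talk's own example is BERT fine-tuned as described above:

```python
# Extractive question answering over the Tesla passage from the slide.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = ("In 1935, in an annual birthday celebration interview, Tesla announced a "
           "method of transmitting mechanical energy with minimal loss over any "
           "terrestrial distance, a related new means of communication, and a method "
           "of accurately determining the location of underground mineral deposits.")
answer = qa(question="How far did he claim the mechanical energy could be transmitted?",
            context=context)
print(answer["answer"], answer["score"])  # likely span: "any terrestrial distance"
```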
18. Pace Of Innovation
Perceptron: 1957
Word Embeddings: ~1960s
CNN: 1989 (~30 years later)
LSTM: 1997 (12 years)
Continuous-space LM: 2001 (4 years)
Multi-task Learning: 2008 (7 years)
Seq to Seq | Attention: Sep 2014 (6 years)
Transformer: Jul 12, 2017 (3 years)
BERT: Oct 11, 2018 (1 year; SoTA on 11 tasks)
Transformer-XL: Jan 9, 2019 (3 months; SoTA on 4 tasks)
XLNet: Jun 19, 2019 (5 months; SoTA on 18 tasks)
RoBERTa: Jul 26, 2019 (1 month; SoTA on 6 tasks)
ERNIE 2.0: Jul 29, 2019 (3 days; SoTA on 9 tasks, Chinese)
ALBERT: Sep 25, 2019 (2 months; SoTA on 10 tasks)
T5: Oct 24, 2019 (1 month; SoTA on 16 tasks)
...
All SoTA results from the Transformer onward are attributed to transformer models.