BERT: Bidirectional Encoder Representations from Transformers (Liangqun Lu)
BERT was developed by Google AI Language and released in October 2018. It has achieved the best performance on many NLP tasks, so if you are interested in NLP, studying BERT is a good place to start.
An introduction to the Transformers architecture and BERT (Suman Debnath)
The Transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, used mostly for natural language processing (NLP) tasks. Since its advent, it has replaced RNNs and LSTMs for many tasks. The Transformer was also a major breakthrough in NLP and paved the way for revolutionary architectures such as BERT.
I summarized the GPT models in these slides and compared GPT-1, GPT-2, and GPT-3.
GPT stands for Generative Pre-Training of a language model and is built on the decoder structure of the Transformer model.
(24th May, 2021)
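To make "decoder structure of the Transformer" concrete, here is a minimal sketch of causal (decoder-style) self-attention in plain NumPy; the function name, shapes, and random inputs are illustrative assumptions, not taken from the slides.

import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model) token representations; Wq/Wk/Wv: (d_model, d_head) projections
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])              # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i (decoder-only behaviour).
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)               # shape (5, 8)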
This is material prepared for a lab seminar on the "Transformer", which underlies recent NLP x deep learning research. I have tried to cite the references accurately, but please point out any errors.
GPT-2: Language Models are Unsupervised Multitask Learners (Young Seok Kim)
Review of the paper "Language Models are Unsupervised Multitask Learners" (GPT-2) by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are written in English, but the presentation is done in Korean)
BERT: Bidirectional Encoder Representations from Transformers.
BERT is a pretrained model from Google for state-of-the-art NLP tasks.
BERT can take into account both the syntactic and semantic meaning of text.
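As a minimal sketch of how such a pretrained BERT is typically used in practice (the Hugging Face transformers library and the bert-base-uncased checkpoint are my own illustrative choices, not something the deck specifies):

from transformers import AutoTokenizer, AutoModel
import torch

# Load a publicly available pretrained BERT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT reads text bidirectionally.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual 768-dimensional vector per token: shape (batch, num_tokens, 768).
print(outputs.last_hidden_state.shape)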
Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of transfer learning methods and architectures which significantly improved upon the state of the art on pretty much every NLP task.
The wide availability and ease of integration of these transfer learning models are strong indicators that these methods will become a common tool in the NLP landscape as well as a major research direction.
In this talk, I'll present a quick overview of modern transfer learning methods in NLP and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks, focusing on open-source solutions.
Website: https://fwdays.com/event/data-science-fwdays-2019/review/transfer-learning-in-nlp
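As one hedged example of what "integrated and adapted in downstream NLP tasks" can look like with open-source tooling, the sketch below fine-tunes a pretrained encoder for binary sentence classification; the checkpoint, toy data, and hyperparameters are assumptions for illustration only.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie", "terrible plot"]          # toy labeled examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                 # a few gradient steps on the toy batch
    out = model(**batch, labels=labels)            # returns cross-entropy loss when labels are given
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()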
Demystifying NLP Transformers: Understanding the Power and Architecture behin... (NILESH VERMA)
In this SlideShare presentation, we delve into the intricate world of NLP Transformers, exploring their underlying architecture and uncovering their immense power in Natural Language Processing (NLP). Join us as we demystify the complexities and provide a comprehensive overview of how Transformers revolutionize tasks such as machine translation, sentiment analysis, question answering, and more. Gain valuable insights into the transformer model, attention mechanisms, self-attention, and the transformer encoder-decoder structure. Whether you're an NLP enthusiast or a beginner, this presentation will equip you with a solid foundation to comprehend and harness the potential of NLP Transformers.
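For reference, the self-attention mechanism mentioned above is, at its core, the scaled dot-product attention of Vaswani et al. (2017):

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

where Q, K, and V are the query, key, and value projections of the token representations and d_k is the key dimension.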
A presentation on Bidirectional Encoder Representations from Transformers (BERT) meant to introduce the model's use cases and training mechanism. Best viewed with PowerPoint since it contains many slide animations.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (gohyunwoong)
This presentation is review material for the SotA NLP models Transformer and BERT. I also reviewed many other models here: Word2Vec, ELMo, GPT, etc.
Reference 1: Kim Dong Ha (https://www.youtube.com/watch?v=xhY7m8QVKjo)
Reference 2: Raimi Karim (https://towardsdatascience.com/attn-illustrated-attention-5ec4ad276ee3)
BERT - Part 1 Learning Notes of Senthil Kumar (Senthil Kumar M)
In this part 1 presentation, I have attempted to provide a '30,000-foot view' of BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art language model in NLP, with high-level technical explanations. I have attempted to collate useful information about BERT from various useful sources.
A Review of Deep Contextualized Word Representations (Peters+, 2018) (Shuntaro Yada)
A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)
The Transformer is an established architecture in natural language processing that is built around self-attention within a deep learning framework.
This presentation was delivered under the mentorship of Mr. Mukunthan Tharmakulasingam (University of Surrey, UK), as a part of the ScholarX program from Sustainable Education Foundation.
Keynote at the Insight@DCU Deep Learning Workshop (https://www.eventbrite.ie/e/insightdcu-deep-learning-workshop-tickets-45474212594) on successes and frontiers of Deep Learning, particularly unsupervised learning and transfer learning.
Deep Learning study group @ Komachi Lab: "Learning Character-level Representations for Part-of-Sp... (Yuki Tomo)
At the Deep Learning study group @ Komachi Lab on 12/22, I presented "Learning Character-level Representations for Part-of-Speech Tagging" by Cícero Nogueira dos Santos and Bianca Zadrozny.
LEPOR: an augmented machine translation evaluation metric - Thesis PPT (Lifeng (Aaron) Han)
Machine translation (MT) has developed into one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation system makes an improvement or not. The traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes have low agreement. On the other hand, the popular automatic MT evaluation methods have some weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but weakly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metric hard to replicate and apply to other language pairs. Thirdly, some popular metrics use incomprehensive factors, which results in low performance on some practical tasks.
In this thesis, to address the existing problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model where the weighting of factors can be optimized according to the characteristics of languages. Thirdly, in the enhanced version of our methods, we design concise linguistic features using POS to show that our methods can yield even higher performance when using external linguistic resources. Finally, we present the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages.
4. Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Goal:
• We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text.
• In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture.
• Introduction:
• Problem: Models that can leverage linguistic information from unlabeled data provide a valuable alternative to gathering more annotation, which can be time-consuming and expensive.
• Issue 1: First, it is unclear what type of optimization objectives are most effective at learning text representations that are useful for transfer.
• Issue 2: Second, there is no consensus on the most effective way to transfer these learned representations to the target task.
• Motivation:
• We explore a semi-supervised approach for language understanding tasks using a combination of unsupervised pre-training and supervised fine-tuning.
• Contribution:
• Our goal is to learn a universal representation that transfers with little adaptation to a wide range of tasks.
5. Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Proposed Method
• Our training procedure consists of two stages.
• The first stage is learning a high-capacity language model on a large corpus of text. This is followed by a fine-tuning stage, where we adapt the model to a discriminative task with labeled data.
• Unsupervised pre-training
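The unsupervised pre-training objective referred to here is the standard left-to-right language modeling likelihood from the paper:

L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

where \mathcal{U} = \{u_1, \ldots, u_n\} is the unlabeled token corpus, k is the context window size, and the conditional probability is modeled by a multi-layer Transformer decoder with parameters \Theta.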
6. Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Proposed Method
• Our training procedure consists of two stages.
• The first stage is learning a high-capacity language model on a large corpus of text. This is followed by a fine-tuning stage, where we adapt the model to a discriminative task with labeled data.
• Supervised fine-tuning
Left section: Transformer architecture and training objectives
Right section: Input transformations for fine-tuning on different tasks
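For completeness, the supervised fine-tuning stage maximizes the labeled-data likelihood from the paper, and adding language modeling as an auxiliary objective is reported to improve generalization and speed up convergence:

L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m), \qquad L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})

where \mathcal{C} is the labeled dataset, P(y \mid x^1, \ldots, x^m) is computed by a linear layer plus softmax on the final Transformer activation, and \lambda weights the auxiliary language modeling loss.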
7. Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Experiment
• We evaluate our approach on four types of language understanding tasks,
• e.g. natural language inference, question answering, semantic similarity, and text classification.
• Five-measure experiment
• Comparison against state-of-the-art methods
• Analysis
• Impact of the number of layers transferred
• Effect of transferring an increasing number of layers from the pre-trained language model
• Plot showing the evolution of zero-shot performance on different tasks as a function of LM pre-training updates
• Zero-shot Behaviors
• We'd like to better understand why language model pre-training of Transformers is effective.
• Zero-shot := learning useful pattern recognition with little or no training data
• Ablation studies
• We examine the performance of our method without the auxiliary LM objective during fine-tuning.
• We analyze the effect of the Transformer by comparing it with a single-layer 2048-unit LSTM using the same framework.
• We also compare with our Transformer architecture directly trained on supervised target tasks, without pre-training.
• Conclusion
• We introduced a framework for achieving strong natural language understanding with a single task-agnostic model through generative pre-training and discriminative fine-tuning.
• By pre-training on a diverse corpus with long stretches of contiguous text, our model acquires significant world knowledge and the ability to process long-range dependencies, which are then successfully transferred to solving discriminative tasks such as question answering, semantic similarity assessment, entailment determination, and text classification, improving the state of the art on 9 of the 12 datasets.
9. Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Goal:
• We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText.
• Introduction
• Problem: Our suspicion is that the prevalence of single-task training on single-domain datasets is a major contributor to the lack of generalization observed in language model systems.
• Motivation
• Effect of conditioning on the task
• Existing: a language model learns p(output | input)
• Adding a condition: a general system should model p(output | input, task)
• Convergence argument for multitask learning:
• The global minimum of the unsupervised objective is also the global minimum of the supervised objective.
• The language model is therefore performing unsupervised multitask learning.
• Contribution
• We demonstrate that language models begin to learn these tasks without any explicit supervision.
• When conditioned on a document plus questions, the language model generates the answers.
• GPT-2 is a 1.5B-parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting.
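Because the task condition in GPT-2 is expressed entirely in natural language, zero-shot task specification reduces to prompt construction. A minimal sketch of the two prompt formats described in the paper (translation via "english sentence = french sentence" example pairs, summarization via an appended "TL;DR:"); the function names are my own:

def translation_prompt(example_pairs, source_sentence):
    # Condition on a few "english = french" example pairs, then ask for a new translation.
    context = "\n".join(f"{en} = {fr}" for en, fr in example_pairs)
    return f"{context}\n{source_sentence} ="

def summarization_prompt(article_text):
    # Summarization behaviour is induced simply by appending "TL;DR:" to the article.
    return article_text + "\nTL;DR:"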
10. Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Proposed Method
• Training dataset
• Our approach motivates building as large and diverse a dataset as possible in order to collect natural language demonstrations of tasks in as varied domains and contexts as possible.
• Input representation
• Combine the empirical benefits of word-level LMs with the generality of byte-level approaches.
• Model
• Adding layer norm changes and skip connections (diagram)
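The input representation alluded to here is byte-level Byte Pair Encoding (BPE). The sketch below shows the core BPE merge loop on a toy character-level corpus; it is a simplification of my own, since GPT-2 additionally operates on raw bytes and restricts merges across character categories.

from collections import Counter

def pair_counts(vocab):
    # vocab maps each word (as a tuple of symbols) to its corpus frequency.
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(pair, vocab):
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])   # merge the pair into one symbol
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for _ in range(10):                                        # learn 10 merges
    counts = pair_counts(vocab)
    if not counts:
        break
    vocab = merge_pair(max(counts, key=counts.get), vocab)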
11. Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Experiment
• Showing out-of-distribution performance of the WebText LMs
• Showing different categories of words using the Children's Book Test dataset
• Showing long-range dependencies using the LAMBADA dataset
• Tasks
• Reading comprehension, summarization, translation, question answering
• Generalization vs. Memorization
• Text memorization, model capacity, diversity, robustness (what matters most?)
• Discussion
• Much research has been dedicated to learning (Hill et al., 2016), understanding (Levy and Goldberg, 2014), and critically evaluating (Wieting and Kiela, 2019) the representations of both supervised and unsupervised pre-training methods.
• Conclusion
• GPT-2 zero-shots to state-of-the-art performance on 7 out of 8 tested language modeling datasets.
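The generalization-vs-memorization analysis comes down to measuring how much of each test set already occurs in the training data; the paper does this with Bloom filters over 8-grams, and a simplified exact-set version of the same check looks like this (my own sketch, not the paper's code):

def ngrams(tokens, n=8):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_rate(train_text, test_text, n=8):
    # Fraction of test-set n-grams that also appear in the training data.
    train_set = ngrams(train_text.lower().split(), n)
    test_set = ngrams(test_text.lower().split(), n)
    return len(test_set & train_set) / len(test_set) if test_set else 0.0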
12. References
• A. Radford et al., Improving Language Understanding by Generative Pre-Training, 2018.
• A. Radford et al., Language Models are Unsupervised Multitask Learners, 2019.
• T. Brown et al., Language Models are Few-Shot Learners, arXiv:2005.14165v4, 2020.