This document summarizes the LLaMa model, which is an open and efficient foundation language model.
[1] LLaMa achieves state-of-the-art performance on various tasks while being trained exclusively on publicly available data and requiring only a single GPU for inference, making it more accessible than other large models.
[2] Key aspects of LLaMa include pre-normalization, SwiGLU activation, rotary embeddings, and efficient implementation techniques. It was trained on 1.4 trillion tokens of publicly available data using 2048 A100 GPUs over 5 months.
[3] Evaluation shows LLaMa outperforms other models on common sense reasoning, question answering, reading comprehension,
LLaMa 2 is a large language model (LLM) developed by Meta AI. It is a successor to the Llama model, and it is one of the most powerful LLMs available today. Llama 2 is trained on a massive dataset of text and code, and it can be used for a wide range of tasks, including:
Generating text, such as articles, poems, and code
Translating languages
Answering questions in a comprehensive and informative way
Following instructions and completing requests thoughtfully
LLama 2 is still under development, but it has already been shown to outperform other LLMs on many benchmarks. For example, Llama 2 outperforms other open source LLMs on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
In this session, we will what are large language models, how we can fin-tune a pre-trained LLM with our data, including data preparation, model training, model evaluation.
Build an LLM-powered application using LangChain.pdfAnastasiaSteele10
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
Langchain Framework is an innovative approach to linguistic data processing, combining the principles of language sciences, blockchain technology, and artificial intelligence. This deck introduces the groundbreaking elements of the framework, detailing how it enhances security, transparency, and decentralization in language data management. It discusses its applications in various fields, including machine learning, translation services, content creation, and more. The deck also highlights its key features, such as immutability, peer-to-peer networks, and linguistic asset ownership, that could revolutionize how we handle linguistic data in the digital age.
LLaMa 2 is a large language model (LLM) developed by Meta AI. It is a successor to the Llama model, and it is one of the most powerful LLMs available today. Llama 2 is trained on a massive dataset of text and code, and it can be used for a wide range of tasks, including:
Generating text, such as articles, poems, and code
Translating languages
Answering questions in a comprehensive and informative way
Following instructions and completing requests thoughtfully
LLama 2 is still under development, but it has already been shown to outperform other LLMs on many benchmarks. For example, Llama 2 outperforms other open source LLMs on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
In this session, we will what are large language models, how we can fin-tune a pre-trained LLM with our data, including data preparation, model training, model evaluation.
Build an LLM-powered application using LangChain.pdfAnastasiaSteele10
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
Langchain Framework is an innovative approach to linguistic data processing, combining the principles of language sciences, blockchain technology, and artificial intelligence. This deck introduces the groundbreaking elements of the framework, detailing how it enhances security, transparency, and decentralization in language data management. It discusses its applications in various fields, including machine learning, translation services, content creation, and more. The deck also highlights its key features, such as immutability, peer-to-peer networks, and linguistic asset ownership, that could revolutionize how we handle linguistic data in the digital age.
Neural Language Generation Head to Toe Hady Elsahar
This is a gentle introduction to Natural language Generation (NLG) using deep learning. If you are a computer science practitioner with basic knowledge about Machine learning. This is a gentle intuitive introduction to Language Generation using Neural Networks. It takes you in a journey from the basic intuitions behind modeling language and how to model probabilities of sequences to recurrent neural networks to large Transformers models that you have seen in the news like GPT2/GPT3. The tutorial wraps up with a summary on the ethical implications of training such large language models on uncurated text from the internet.
LangChain Intro, Keymate.AI Search Plugin for ChatGPT, How to use langchain library? How to implement similar functionality in programming language of your choice? Example LangChain applications.
The presentation revolves around the concept of "langChain", This innovative framework is designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts.
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting Chain of Thought (ReAct) framework. The presenter encourages the audience to utilize these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakages, causing time and money limits to hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it's not necessary to over-optimize if the LLM is clever. The presentation ends on a hopeful note, expressing a need for more clever LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LanGCHAIN framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
And then there were ... Large Language ModelsLeon Dohmen
It is not often even in the ICT world that one witnesses a revolution. The rise of the Personal Computer, the rise of mobile telephony and, of course, the rise of the Internet are some of those revolutions. So what is ChatGPT really? Is ChatGPT also such a revolution? And like any revolution, does ChatGPT have its winners and losers? And who are they? How do we ensure that ChatGPT contributes to a positive impulse for "Smart Humanity?".
During a key note om April 3 and 13 2023 Piek Vossen explained the impact of Large Language Models like ChatGPT.
Prof. PhD. Piek Th.J.M. Vossen, is Full professor of Computational Lexicology at the Faculty of Humanities, Department of Language, Literature and Communication (LCC) at VU Amsterdam:
What is ChatGPT? What technology and thought processes underlie it? What are its consequences? What choices are being made? In the presentation, Piek will elaborate on the basic principles behind Large Language Models and how they are used as a basis for Deep Learning in which they are fine-tuned for specific tasks. He will also discuss a specific variant GPT that underlies ChatGPT. It covers what ChatGPT can and cannot do, what it is good for and what the risks are.
‘Big models’: the success and pitfalls of Transformer models in natural langu...Leiden University
Abstract: Large Language Models receive a lot of attention in the media these days. We have all experienced that generative language models of the GPT family are very fluent and can convincingly answer complex questions. But they also have their limitations and pitfalls. In this presentation I will introduce Transformer-based language models, explain the relation between BERT, GPT, and the 130 thousand other models available on https://huggingface.co. I will discuss their use and applications and why they are so powerful. Then I will point out challenges and pitfalls of Large Language Models and the consequences for our daily work and education.
Build an LLM-powered application using LangChain.pdfStephenAmell4
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
In this episode we'll discuss the different flavors of prompt engineering in the LLM/GPT space. According to your skill level you should be able to pick up at any of the following:
Leveling up with GPT
1: Use ChatGPT / GPT Powered Apps
2: Become a Prompt Engineer on ChatGPT/GPT
3: Use GPT API with NoCode Automation, App Builders
4: Create Workflows to Automate Tasks with NoCode
5: Use GPT API with Code, make your own APIs
6: Create Workflows to Automate Tasks with Code
7: Use GPT API with your Data / a Framework
8: Use GPT API with your Data / a Framework to Make your own APIs
9: Create Workflows to Automate Tasks with your Data /a Framework
10: Use Another LLM API other than GPT (Cohere, HuggingFace)
11: Use open source LLM models on your computer
12: Finetune / Build your own models
Series: Using AI / ChatGPT at Work - GPT Automation
Are you a small business owner or web developer interested in leveraging the power of GPT (Generative Pretrained Transformer) technology to enhance your business processes?
If so, Join us for a series of events focused on using GPT in business. Whether you're a small business owner or a web developer, you'll learn how to leverage GPT to improve your workflow and provide better services to your customers.
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
An April 2023 presentation to the AMIA working group on natural language processing. The talk focuses on three current trends in NLP and how they apply in healthcare: Large language models, No-code, and Responsible AI.
Conversational AI with Transformer ModelsDatabricks
With the advancements in Artificial Intelligence (AI) and cognitive technologies, automation has been a key prospect for many enterprises in various domains. Conversational AI is one such area where many organizations are heavily investing in.
In this session, we discuss the building blocks of conversational agents, Natural Language Understanding Engine with transformer models which have proven to offer state of the art results in standard NLP tasks.
We will first talk about the advantages of Transformer models over RNN/LSTM models and later talk about knowledge distillation and model compression techniques to make these parameter heavy models work in production environments with limited resources.
Key takeaways:
Understanding the building blocks & flow of Conversational Agents.
Advantages of Transformer based models over RNN/LSTMS
Knowledge distillation techniques
Different model compressions techniques including Quantization
Sample code in PyTorch & TF2
We will talk about how we can calculate the Big O in terms of Time Complexity, and then deep dive to understand Big O Rules. And the last, we will learn some techniques to improve the performance of our algorithm.
Neural Language Generation Head to Toe Hady Elsahar
This is a gentle introduction to Natural language Generation (NLG) using deep learning. If you are a computer science practitioner with basic knowledge about Machine learning. This is a gentle intuitive introduction to Language Generation using Neural Networks. It takes you in a journey from the basic intuitions behind modeling language and how to model probabilities of sequences to recurrent neural networks to large Transformers models that you have seen in the news like GPT2/GPT3. The tutorial wraps up with a summary on the ethical implications of training such large language models on uncurated text from the internet.
LangChain Intro, Keymate.AI Search Plugin for ChatGPT, How to use langchain library? How to implement similar functionality in programming language of your choice? Example LangChain applications.
The presentation revolves around the concept of "langChain", This innovative framework is designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts.
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting Chain of Thought (ReAct) framework. The presenter encourages the audience to utilize these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakages, causing time and money limits to hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it's not necessary to over-optimize if the LLM is clever. The presentation ends on a hopeful note, expressing a need for more clever LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LanGCHAIN framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
And then there were ... Large Language ModelsLeon Dohmen
It is not often even in the ICT world that one witnesses a revolution. The rise of the Personal Computer, the rise of mobile telephony and, of course, the rise of the Internet are some of those revolutions. So what is ChatGPT really? Is ChatGPT also such a revolution? And like any revolution, does ChatGPT have its winners and losers? And who are they? How do we ensure that ChatGPT contributes to a positive impulse for "Smart Humanity?".
During a key note om April 3 and 13 2023 Piek Vossen explained the impact of Large Language Models like ChatGPT.
Prof. PhD. Piek Th.J.M. Vossen, is Full professor of Computational Lexicology at the Faculty of Humanities, Department of Language, Literature and Communication (LCC) at VU Amsterdam:
What is ChatGPT? What technology and thought processes underlie it? What are its consequences? What choices are being made? In the presentation, Piek will elaborate on the basic principles behind Large Language Models and how they are used as a basis for Deep Learning in which they are fine-tuned for specific tasks. He will also discuss a specific variant GPT that underlies ChatGPT. It covers what ChatGPT can and cannot do, what it is good for and what the risks are.
‘Big models’: the success and pitfalls of Transformer models in natural langu...Leiden University
Abstract: Large Language Models receive a lot of attention in the media these days. We have all experienced that generative language models of the GPT family are very fluent and can convincingly answer complex questions. But they also have their limitations and pitfalls. In this presentation I will introduce Transformer-based language models, explain the relation between BERT, GPT, and the 130 thousand other models available on https://huggingface.co. I will discuss their use and applications and why they are so powerful. Then I will point out challenges and pitfalls of Large Language Models and the consequences for our daily work and education.
Build an LLM-powered application using LangChain.pdfStephenAmell4
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
In this episode we'll discuss the different flavors of prompt engineering in the LLM/GPT space. According to your skill level you should be able to pick up at any of the following:
Leveling up with GPT
1: Use ChatGPT / GPT Powered Apps
2: Become a Prompt Engineer on ChatGPT/GPT
3: Use GPT API with NoCode Automation, App Builders
4: Create Workflows to Automate Tasks with NoCode
5: Use GPT API with Code, make your own APIs
6: Create Workflows to Automate Tasks with Code
7: Use GPT API with your Data / a Framework
8: Use GPT API with your Data / a Framework to Make your own APIs
9: Create Workflows to Automate Tasks with your Data /a Framework
10: Use Another LLM API other than GPT (Cohere, HuggingFace)
11: Use open source LLM models on your computer
12: Finetune / Build your own models
Series: Using AI / ChatGPT at Work - GPT Automation
Are you a small business owner or web developer interested in leveraging the power of GPT (Generative Pretrained Transformer) technology to enhance your business processes?
If so, Join us for a series of events focused on using GPT in business. Whether you're a small business owner or a web developer, you'll learn how to leverage GPT to improve your workflow and provide better services to your customers.
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
An April 2023 presentation to the AMIA working group on natural language processing. The talk focuses on three current trends in NLP and how they apply in healthcare: Large language models, No-code, and Responsible AI.
Conversational AI with Transformer ModelsDatabricks
With the advancements in Artificial Intelligence (AI) and cognitive technologies, automation has been a key prospect for many enterprises in various domains. Conversational AI is one such area where many organizations are heavily investing in.
In this session, we discuss the building blocks of conversational agents, Natural Language Understanding Engine with transformer models which have proven to offer state of the art results in standard NLP tasks.
We will first talk about the advantages of Transformer models over RNN/LSTM models and later talk about knowledge distillation and model compression techniques to make these parameter heavy models work in production environments with limited resources.
Key takeaways:
Understanding the building blocks & flow of Conversational Agents.
Advantages of Transformer based models over RNN/LSTMS
Knowledge distillation techniques
Different model compressions techniques including Quantization
Sample code in PyTorch & TF2
We will talk about how we can calculate the Big O in terms of Time Complexity, and then deep dive to understand Big O Rules. And the last, we will learn some techniques to improve the performance of our algorithm.
Presented for the first time at INFORMS in November 2013, this deck explains how CPLEX 12.5.1 exploits random performance variability through parallel root cut loops.
This talk was given at INFORMS in November 2014. It presents some of the recent improvements made in CPLEX 12.6.1.
Topics include performance improvements, Local Implied Bound cuts, support for Python 3, Opportunistic Distributed MIP, and MIQP linearization.
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
TensorFlow Korea 논문읽기모임 PR12 270번째 논문 review입니다.
이번 논문은 Baidu에서 나온 PP-YOLO: An Effective and Efficient Implementation of Object Detector입니다. YOLOv3에 다양한 방법을 적용하여 매우 높은 성능과 함께 매우 빠른 속도 두마리 토끼를 다 잡아버린(?) 그런 논문입니다. 논문에서 사용한 다양한 trick들에 대해서 좀 더 깊이있게 살펴보았습니다. Object detection에 사용된 기법 들 중에 Deformable convolution, Exponential Moving Average, DropBlock, IoU aware prediction, Grid sensitivity elimination, MatrixNMS, CoordConv, 등의 방법에 관심이 있으시거나 알고 싶으신 분들은 영상과 발표자료를 참고하시면 좋을 것 같습니다!
논문링크: https://arxiv.org/abs/2007.12099
영상링크: https://youtu.be/7v34cCE5H4k
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide Joonyoung Yi
PPT for A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE).
I made ppt for explaining the paper.
Abstract of the paper:
This paper proposes CF-NADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM) based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CF-NADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CF-NADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CF-NADE, which shows superior performance. Finally, CF-NADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CF-NADE with a single hidden layer beats all previous state-of-the-art methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance.
Paper_An Efficient Garbage Collection in Java Virtual Machine via Swap I/O O...Hyo jeong Lee
This is a presentation for following paper:
Hyojeong Lee, et al. "An Efficient Garbage Collection in Java Virtual Machine via Swap I/O Optimization" (2019).
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...Aalto University
Tutorial on Model-Based User Interface Optimization. Part IV: ADVANCED TOPICS.
Presented by Antti Oulasvirta (Aalto University) at SICSA Summer School on Computational Interaction in 2015 in Glasgow. Note: This one-day lecture is divided into multiple parts.
Visual Prompt Tuning (VPT),Parameter-efficient fine-tuning
지금까지 발표한 논문 :https://github.com/Lilcob/-DL_PaperReadingMeeting
발표자료 : https://www.slideshare.net/taeseonryu/mplug
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개 드릴 논문은 'Visual Prompt Tuning for Transformers with Frozen Weights' 입니다.
오늘 소개하는 논문은 대규모 Transformer 모델을 비전에 효율적이고 효과적으로 미세조정하는 대안인 Visual Prompt Tuning (VPT)를 소개하고 있습니다. VPT는 입력 공간에서 작은 양의 훈련 가능한 매개변수를 도입하면서 모델 백본을 고정합니다.
이 방법을 통해, VPT는 다른 매개변수 효율적인 튜닝 프로토콜에 비해 상당한 성능 향상을 달성하며, 많은 경우에는 전체 미세조정을 능가하면서 작업당 저장 비용을 줄인다는 것을 실험적으로 보여줍니다.
이 논문은 효과성과 효율성 면에서 대규모 사전 훈련된 Transformer를 하위 작업에 적용하는 도전을 다룹니다. 이를 통해, 더 효율적인 방식으로 다양한 비전-언어 작업에 대한 성능을 향상시킬 수 있음을 보여줍니다.
오늘 논문 리뷰를 위해 이미지처리 조경진님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/bVOk-hSYyZw
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Benoit Combemale
Talk given at the 8th ACM SIGPLAN Int'l Conf. on Software Language Engineering (SLE 2015), Pittsburgh, PA, USA on October 27, 2015. Preprint available at https://hal.inria.fr/hal-01182517
This Virtual User Group session, held on 2014-01-22, presents some of the techniques and algorithms used to improve the CPLEX MIP solver in versions 12.5.1 and 12.6.
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
Hyperparameter tuning is critical in model development. And its general form: parameter tuning with an objective function is also widely used in industry. On the other hand, Apache Spark can handle massive parallelism, and Apache Spark ML is a solid machine learning solution.
But we have not seen a general and intuitive distributed parameter tuning solution based on Apache Spark, why?
Not every tuning problem is on Apache Spark ML models. How can Apache Spark handle general models?
Not every tuning problem is a parallelizable grid or random search. Bayesian optimization is sequential, how can Apache Spark help in this case?
Not every tuning problem is single epoch, deep learning is not. How to fit algos such as hyperband and ASHA into Apache Spark?
Not every tuning problem is a machine learning problem, for example simulation + tuning is also common. How to generalize?
In this talk, we are going to show how using Fugue-Tune and Apache Spark together can eliminate these painpoints
Fugue-Tune like Fugue, is a “super framework” – an absraction layer unifying existing solutions such as Hyperopt and Optuna
It firstly models the general tuning problems, independent from machine learning
It is designed for both small and large scale problems. It can always fully parallelize the distributable part of a tuning problem
It works for both classical and deep learning models. With Fugue, running hyperband and ASHA becomes possible on Apache Spark.
In the demo, you will see how to do any type of tuning in a consistent, intuitive, scalable and minimal way. And you will see a live demo of the amazing performance.
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개할 논문은 3D관련 업무를 진행 하시는/ 희망하시는 분들의 필수 논문인 VoxelNET 입니다.
발표자료:https://www.slideshare.net/taeseonryu/mcsemultimodal-contrastive-learning-of-sentence-embeddings
안녕하세요! 딥러닝 논문읽기 모임입니다.
오늘은 자율 주행, 가정용 로봇, 증강/가상 현실과 같은 다양한 응용 분야에서 중요한 문제인 3D 포인트 클라우드에서의 객체 탐지에 대한 획기적인 진전을 소개하고자 합니다. 이를 위해 'VoxelNet'이라는 새로운 3D 탐지 네트워크에 대해 알아보겠습니다.
1. 기존 방법의 한계
기존의 많은 노력은 수동으로 만들어진 특징 표현, 예를 들어 새의 눈 시점 투영 등에 집중해 왔습니다. 하지만 이러한 방법들은 LiDAR 포인트 클라우드와 영역 제안 네트워크(RPN) 사이의 연결을 효과적으로 수행하기 어렵습니다.
2. VoxelNet의 혁신적 접근법
VoxelNet은 3D 포인트 클라우드를 위한 수동 특징 공학의 필요성을 없애고, 특징 추출과 바운딩 박스 예측을 단일 단계, end-to-end 학습 가능한 깊은 네트워크로 통합합니다. VoxelNet은 포인트 클라우드를 균일하게 배치된 3D 복셀로 나누고, 새롭게 도입된 복셀 특징 인코딩(VFE) 레이어를 통해 각 복셀 내의 포인트 그룹을 통합된 특징 표현으로 변환합니다.
3. 효과적인 기하학적 표현 학습
이 방식을 통해 포인트 클라우드는 서술적인 체적 표현으로 인코딩되며, 이는 RPN에 연결되어 탐지를 생성합니다. VoxelNet은 다양한 기하학적 구조를 가진 객체의 효과적인 구별 가능한 표현을 학습합니다.
4. 성능 평가
KITTI 자동차 탐지 벤치마크에서의 실험 결과, VoxelNet은 기존의 LiDAR 기반 3D 탐지 방법들을 큰 차이로 능가했습니다. 또한, LiDAR만을 기반으로 한 보행자와 자전거 탐지에서도 희망적인 결과를 보였습니다.
VoxelNet의 도입은 3D 포인트 클라우드에서의 객체 탐지를 혁신적으로 개선하고 있으며, 이 분야에서의 미래 발전에 중요한 영향을 미칠 것으로 기대됩니다.
오늘 논문 리뷰를 위해 이미지처리 허정원님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/yCgsCyoJoMg
OpineSum Entailment-based self-training for abstractive opinion summarization...taeseon ryu
오늘은 의견 요약 분야에서의 흥미로운 발전에 대해 이야기하고자 합니다. 특히, 제품이나 장소에 대한 수백 개의 리뷰를 요약하는 것은 중요하고도 어려운 과제인데요, 최근에 이를 위한 새로운 자가학습 접근법, 'OpineSum'이 소개되었습니다.
1. 의견 요약의 중요성
일반적으로 제품이나 장소에 대한 리뷰는 많은 양으로 존재합니다. 이러한 리뷰들을 요약하는 것은 사용자가 정보를 빠르게 파악하는 데 도움을 주며, 의사결정 과정을 간소화할 수 있습니다.
2. 기존의 접근 방식과 한계
뉴스 분야에서의 추상적 요약은 수백만 개의 뉴스 기사와 함께 제공되는 인간 작성 요약을 통해 훈련된 감독 시스템에 의해 큰 진전을 보였습니다. 하지만 의견 텍스트의 경우, 이러한 대규모 데이터셋이 드물게 존재합니다.
3. OpineSum의 소개
이러한 문제를 해결하기 위해, 'OpineSum'이라는 새로운 자가학습 접근법이 제안되었습니다. 이 방법은 텍스트 함축의 새로운 응용을 사용하여 여러 리뷰에서의 의견 합의를 포착하는 요약을 구축합니다.
4. OpineSum의 작동 방식
OpineSum은 대규모에서 은근한 표준 요약을 얻을 수 있으며, 비지도 및 소수샷 추상적 요약 시스템 훈련에 사용할 수 있습니다. 이 방법은 SOTA 달성했습니다.
OpineSum은 의견 요약의 새로운 지평을 열고, 대규모 데이터가 부족한 상황에서도 효과적인 요약을 생성할 수 있는 방법을 제시합니다. 이러한 발전은 의견 요약 기술의 미래에 큰 영향을 미칠 것으로 기대됩니다.
오늘 논문 리뷰를 위해 자연어 처리 변현정님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/gqJCWyYPtXQ
"3D Gaussian Splatting for Real-Time Radiance Field Rendering"은 고화질의 실시간 복사장 렌더링을 가능하게 하는 새로운 방법을 소개합니다. 이 방법은 혁신적인 3D 가우시안 장면 표현과 실시간 차별화 렌더러를 결합하여, 장면 최적화 및 새로운 시점 합성에서 상당한 속도 향상을 가능하게 합니다. 기존의 신경 복사장(NeRF) 방법들이 광범위한 훈련과 렌더링 자원을 요구하는 문제에 대한 해결책을 제시하며, 1080p 해상도에서 실시간 성능과 고품질의 새로운 시점 합성을 위해 설계되었습니다. 이는 이전 방법들에 비해 효율성과 품질 면에서 진보를 이루었습니다
이 논문은 컴퓨터 비전 작업, 예를 들면 이미지 분류, 검색 및 몇 번의 학습과 같은 작업에서의 하이퍼볼릭 임베딩의 사용에 대해 논의합니다. 저자들은 이미지 간의 계층적 관계를 임베딩하는 데 하이퍼볼릭 공간이 더 적합하다고 주장하며, 이러한 관계는 컴퓨터 비전 작업에서 흔히 볼 수 있습니다. 그들은 데이터셋의 초계성을 평가하는 방법을 제안하고, 하이퍼볼릭 임베딩이 이미지 분류와 몇 번의 학습을 위해 사용되는 표준 아키텍처의 성능을 향상시킬 수 있다고 보여줍니다. 또한, 이 논문은 하이퍼 볼릭 공간과 하이퍼볼릭 추정에 대한 기억을 상기시켜 줍니다.
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정taeseon ryu
이 논문은 MCSE라는 새로운 접근법을 제시하며, 시각과 텍스트 정보를 결합하여 의미있는 문장 임베딩을 학습합니다. 다양한 데이터셋과 사전 훈련된 인코더에서 성능 향상을 보이며, 의미론적으로 유사한 문장을 잘 정렬합니다. 또한, 비전을 추가 의미 정보로 사용함으로써 문장 표현 학습을 더욱 촉진할 수 있다는 주장을 하고 있습니다. 이 방법은 기존의 문장 임베딩 학습 방법과 비교되며, 그 결과로서 이론과 실제에서 모두 탁월한 성능을 보입니다.
이 논문은 데이터셋 디스틸레이션에 대한 새로운 접근법을 제안합니다. 데이터셋 디스틸레이션은 전체 데이터셋에서 학습된 모델의 테스트 정확도를 일치시킬 수 있는 작은 데이터셋을 합성하는 작업입니다. 제안된 방법은 디스틸레이션 데이터를 최적화하여 실제 데이터로 학습된 네트워크와 유사한 상태로 이끌어냅니다. 이 방법은 기존 방법들을 능가하며, 더 높은 해상도의 시각 데이터를 디스틸레이션할 수 있게 합니다. 데이터셋 디스틸레이션은 지속적인 학습, 신경 아키텍처 검색, 개인정보 보호 ML 등 다양한 응용 분야가 있습니다.
Dataset Distillation by Matching Training Trajectories taeseon ryu
이 논문은 데이터셋 디스틸레이션에 대한 새로운 접근법을 제안합니다. 데이터셋 디스틸레이션은 전체 데이터셋에서 학습된 모델의 테스트 정확도를 일치시킬 수 있는 작은 데이터셋을 합성하는 작업입니다. 제안된 방법은 디스틸레이션 데이터를 최적화하여 실제 데이터로 학습된 네트워크와 유사한 상태로 이끌어냅니다. 이 방법은 기존 방법들을 능가하며, 더 높은 해상도의 시각 데이터를 디스틸레이션할 수 있게 합니다. 데이터셋 디스틸레이션은 지속적인 학습, 신경 아키텍처 검색, 개인정보 보호 ML 등 다양한 응용 분야가 있습니다.
이 논문은 강화 학습(Reinforcement Learning, RL)을 감독 학습(Supervised Learning, SL)의 형태로 변환하는 새로운 접근법을 제안합니다. 이를 'Upside Down RL (UDRL)'이라 부릅니다. 표준 RL은 보상을 예측하는 반면, UDRL은 보상을 작업 정의 입력으로 사용하며, 시간 지향성과 기타 계산 가능한 함수를 이력 데이터와 원하는 미래 데이터에 적용합니다. UDRL은 이러한 입력 관찰을 명령으로 해석하고, 과거 경험에 대한 SL을 통해 이를 행동(또는 행동 확률)에 매핑하여 학습합니다.
UDRL은 높은 보상이나 다른 목표를 달성하기 위해 일반화하며, 이는 "주어진 시간 내에 많은 보상을 얻으라!"와 같은 입력 명령을 통해 이루어집니다. 또한 UDRL은 탐색 전략을 개선하는 방법을 배울 수 있습니다. 별도의 논문에서는 UDRL의 초기 버전이 특정 강화 학습 문제에서 전통적인 기준 알고리즘을 능가할 수 있다는 것을 보여줍니다.
이 논문은 또한 로봇이 사람을 모방하는 방법을 가르치는 접근법을 개념적으로 단순화합니다. 먼저 로봇의 현재 행동을 모방하는 사람들을 비디오로 촬영한 다음, 로봇이 이 비디오(입력 명령으로)를 이러한 행동에 매핑하는 방법을 SL을 통해 학습하게 합니다. 그런 다음 로봇은 일반화하고 이전에 알려지지 않은 행동을 실행하는 사람들의 비디오를 모방하게 됩니다.
오늘 논문 리뷰를 위해 강화학습 이도현님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/bsBvKdKCc1E
Packed Levitated Marker for Entity and Relation Extractiontaeseon ryu
핵심 키워드
Packed Levitated Markers (PL-Marker)
Neighborhood-oriented packing strategy:
Subject-oriented packing strategy
지금까지 발표한 논문 :https://github.com/Lilcob/-DL_PaperReadingMeeting
발표자료 : https://www.slideshare.net/taeseonryu/morel-modelbased-offline-reinforcement-learning
이 논문은 새로운 개체 및 관계 추출 방법인 Packed Levitated Markers (PL-Marker)에 초점을 맞추고 있습니다. PL-Marker는 인코더 내에서 전략적으로 마커를 패킹하여 스팬 간의 상호 관계를 고려합니다.
논문에서는 이웃 중심 패킹 전략과 주제 중심 패킹 전략 두 가지를 제시합니다. 이러한 전략들은 개체 경계 정보와 동일 주제 스팬 쌍 간의 상호 관계를 더 잘 모델링하도록 설계되었습니다.
실험 결과는 제안된 접근법의 효과를 보여줍니다. PL-Marker는 6개의 Named Entity Recognition (NER) 벤치마크에서 이전의 최첨단 모델들을 능가합니다.
오늘 논문 리뷰를 위해 자연어 처리 김유진님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/aiS_iNOOUl8
오늘 소개할 논문은 'MOReL: Model-Based Offline Reinforcement Learning'입니다.
이 논문은 오프라인 강화 학습(Reinforcement Learning, RL)에 초점을 맞추고 있습니다. 오프라인 RL은 행동 정책을 개선하기 위해 사전에 수집된 데이터만을 사용하는 학습 방법입니다. 이 논문에서는 MOReL이라는 새로운 알고리즘 프레임워크를 제시하며, 이는 오프라인 RL을 위한 것입니다.
MOReL은 두 단계로 구성되어 있습니다: 첫째, 오프라인 데이터셋을 사용하여 비관적인 MDP(Model-based Decision Process)를 학습하고, 둘째, 이 P-MDP에서 거의 최적의 정책을 학습합니다. 학습된 P-MDP는 정책 평가와 학습에 대한 좋은 대리자 역할을 하며, 모델 기반 RL의 일반적인 함정인 모델 활용을 극복합니다.
이 논문에서는 MOReL이 오프라인 RL에 대해 최소최대 최적(minimax optimal)이며, 널리 연구된 벤치마크에서 최첨단 성능을 달성함을 보여줍니다. 또한, 이 논문은 오프라인 RL의 중요한 문제인 행동 정책의 안전성에 대한 중요한 통찰력을 제공합니다.
이 논문은 오프라인 강화 학습의 새로운 접근법을 제시하며, 이를 통해 더 효율적인 방식으로 다양한 강화 학습 작업에 대한 성능을 향상시킬 수 있음을 보여줍니다.
Scaling Instruction-Finetuned Language Modelstaeseon ryu
이 논문은 언어 모델에 대한 fine tuning하는 방법에 대해 탐구하고 있습니다. 특히, 작업의 수, 모델 크기, 그리고 체인-오브-소트 데이터를 확장하는 것에 초점을 맞추고 있습니다. 결과적으로, 다양한 모델 클래스와 평가 벤치마크에서 보이는 성능과 미처 보지 못한 작업에 대한 일반화에 있어서 상당한 향상을 보여줍니다.
이 논문은 또한, 강력한 few-shot 성능을 달성하는 Flan-T5 체크포인트를 공개합니다. 지시사항 미세조정은 사전 훈련된 언어 모델의 성능과 사용성을 향상시키는 일반적인 방법입니다.
이 논문은 언어 모델의 미세조정에 대한 새로운 접근법을 제시하며, 이를 통해 더 효율적인 방식으로 다양한 언어 작업에 대한 성능을 향상시킬 수 있음을 보여줍니다.
오늘 논문 리뷰를 위해 자연어처리 박산희님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/lta-rKYtVbg
오늘 영상에서 소개된 논문은 Alibaba의 DAMO Academy가 개발한 새로운 비전-언어 기반 모델인 mPLUG입니다. mPLUG는 cross-modal skip-connections을 사용하여 기존의 사전 훈련된 모델에서 나타나는 계산 효율성이 낮고 정보 불균형 문제를 해결합니다.
mPLUG는 이미지 캡셔닝, 이미지-텍스트 검색, 시각적 그라운딩, 시각적 질문 응답 등 다양한 비전-언어 하위 작업에서 최첨단 결과를 보여줍니다. 또한, 다수의 비디오-언어 작업에 직접 전환할 때 강력한 제로샷 전이성을 보여줍니다.
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
Train중 예상 Return 을 최대화하기 위해 알려지지 않은 환경에서 '탐색'과 '활용' 사이의 균형을 잘 이루는 것이 중요합니다. 이를 이상적으로 수행하는 '베이즈 최적 정책'은 환경 상태뿐만 아니라 에이전트가 환경에 대해 느끼는 불확실성에 따라 행동을 결정합니다. 하지만, 베이즈 최적 정책을 계산하는 것은 작은 작업들에 대해서조차 까다롭습니다. 이 논문에서는, 알려지지 않은 환경에서 근사적으로 추론을 수행하고, 그 불확실성을 행동 선택 과정에 직접 포함시키는 방법, 'variational Bayes-Adaptive Deep RL' (variBAD)를 소개합니다.
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdftaeseon ryu
논문에서는 최적화 컴파일러에서 신경망 연산 그래프의 실행 비용을 최소화하는 깊은 강화 학습 방법을 제시합니다. 이전의 학습 기반 작업들은 최적화될 동일한 그래프에서 최적화기를 훈련하는 것이 필요했지만, 논문은 오프라인에서 최적화기를 훈련하고 이후에는 추가 훈련 없이 이전에 보지 못한 그래프에 일반화하는 학습 접근법을 제안합니다. 이를 통해 논문의 방법은 시간 대신 초 단위로 실제 세계의 텐서플로우(TensorFlow) 그래프에서 고품질의 실행 결정을 생성할 수 있습니다. 논문은 연산 그래프에 대한 두 가지 최적화 작업을 고려합니다: 실행 시간과 최대 메모리 사용량을 최소화하는 것입니다. 광범위한 기준 세트에 비해, 우리의 접근법은 이 두 가지 작업에서 고전적인 방법과 다른 학습 기반 방법에 비해 상당한 개선을 보여줍니다.
이 논문은 신경망 학습에 대한 새로운 방법을 제시하는데, 이 방법의 이름은 'Forward-Forward 알고리즘'입니다. 기존의 딥러닝 방법은 데이터를 앞으로 보내고 결과를 다시 뒤(backward)로 보내는 '앞-뒤' 방식인데, 이 새로운 방법은 '앞-앞' 방식을 사용하니까 'Forward-Forward'라고 부릅니다.
이 알고리즘에서는 '양의 데이터'와 '부정적인 데이터' 두 종류를 사용합니다. '양의 데이터'는 실제로 우리가 가진 정보를 의미하고, '부정적인 데이터'는 신경망이 스스로 생성하는 정보를 말합니다. 이 두 종류의 데이터를 각각 앞으로 보내서, 각 계층이 '양의 데이터'에 대해는 좋은 결과를, '부정적인 데이터'에 대해는 나쁜 결과를 내도록 학습합니다.
핵심은 신경망의 학습 방법에 변화를 주어, 더 효율적이고 간편하게 학습을 진행할 수 있도록 한다는 점입니다. 이 방법을 사용하면 비디오 같은 데이터를 신경망을 통해 처리하면서 복잡한 연산을 중단하거나 데이터를 저장할 필요 없이 진행할 수 있다는 것이 큰 장점입니다.
Towards Robust and Reproducible Active Learning using Neural Networkstaeseon ryu
활성 학습(Active Learning, AL)은 대량의 라벨이 없는 데이터를 처리하고, 데이터 라벨링 비용이 지나치게 높은 영역에서 이를 줄이는데 유망한 머신러닝 패러다임입니다. 최근에 제안된 신경망 기반의 활성 학습 방법들은 이 목표를 달성하기 위해 다양한 휴리스틱을 사용합니다.
이 연구에서는 동일한 실험 조건 하에서, 다른 종류의 활성 학습 알고리즘들(불확실성 기반, 다양성 기반, 커미티 기반)이 무작위 샘플링 기준에 비해 일관성 없는 향상을 보이는 것을 보여줍니다. 다양한 실험을 통해, 확률적 요인을 제어하면서, 활성 학습 알고리즘들이 달성하는 성능 지표의 변동성이 이전에 보고된 결과와 일치하지 않는 결과를 가져올 수 있음을 보여줍니다
BRIO: Bringing Order to Abstractive Summarizationtaeseon ryu
이 논문에서는 추상적 요약 모델의 훈련 방식에 대해 논의하고 있습니다. 일반적으로 이러한 모델은 최대 가능도 추정을 사용하여 훈련되는데, 이는 이상적인 모델이 모든 확률 질량을 참조 요약에 할당할 것이라고 가정하는 결정론적인 목표 분포를 가정합니다. 이런 가정은 추론 과정에서 성능 저하를 초래할 수 있는데, 모델이 참조 요약에서 벗어난 여러 후보 요약을 비교해야 하기 때문입니다. 이 문제를 해결하기 위해, 저자들은 서로 다른 후보 요약들이 그들의 품질에 따라 확률 질량을 할당받는 비결정론적 분포를 가정하는 새로운 훈련 패러다임을 제안합니다. 이 방법은 CNN/DailyMail (47.78 ROUGE-1) 및 XSum (49.07 ROUGE-1) 데이터셋에서 새로운 최고 성능을 달성했습니다.
논문은 환경과 상호작용을 통해 데이터를 샘플링하고 확률적 경사 상승법을 사용하여 "대리" 목표 함수를 최적화하는 강화 학습을 위한 새로운 정책 경사 방법들을 제안합니다. 표준 정책 경사 방법은 데이터 샘플마다 한 번의 경사 업데이트를 수행하지만, 미니배치 업데이트를 여러 에폭 수행할 수 있는 독창적인 목표 함수를 제안합니다. 이 새로운 방법을 근접 정책 최적화(Proximal Policy Optimization, PPO)라고 부르며, 신뢰 영역 정책 최적화(Trust Region Policy Optimization, TRPO)의 일부 이점이 있지만, 구현이 훨씬 간단하고 일반적이며 샘플 복잡성 면에서도 뛰어납니다(실제로). 실험에서는 PPO를 로봇 보행 시뮬레이션과 아타리 게임 등의 벤치마크 작업에 적용하였으며, 이를 통해 PPO가 다른 온라인 정책 경사 방법보다 우수하며, 전반적으로 샘플 복잡성, 단순성, 실제 소요 시간 면에서 유리한 균형을 이루고 있다는 것을 보여줍니다.
학습된 월드 모델은 에이전트의 경험을 요약하여 복잡한 행동 학습을 쉽게 만듭니다. 고차원의 감각 입력으로부터 월드 모델을 학습하는 것이 딥러닝을 통해 가능해지고 있지만, 그로부터 행동을 도출하는 여러 가지 방법이 있습니다. 우리는 Dreamer라는 강화학습 에이전트를 소개합니다. 이 에이전트는 이미지를 기반으로 순수한 잠재 상상력을 통해 장기적인 작업을 해결합니다. 학습된 월드 모델의 간결한 상태 공간에서 상상된 궤적을 통해 학습된 상태 가치의 분석 기울기를 역전파함으로써 행동을 효율적으로 학습합니다. 20개의 어려운 시각적 제어 과제에서 Dreamer는 데이터 효율성, 계산 시간 및 최종 성능 면에서 기존 접근 방식을 능가합니다.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
3. 1. Introduction
• Large Languages Models focus on further scaling
• The best performances are not achieved by the largest models
- Hoffmann et al. (2022)
4. 1. Introduction
• Hoffmann et al. (2022)
recommends training a 10B
model on 200B tokens.
• We find that the performance
of a 7B model continues to
improve even after 1T tokens.
6. 1. Introduction
• Only use publicly available data, making our work compatible with
open-sourcing
• There exist some exceptions, notably OPT(Zhang et al.), GPT-
NeoX(Black et al.), BLOOM(Scaoet et all.) and GLM(Zeng et al.) ,
but none that are competitive with PaLM-62B or Chinchilla.
• An overview of the modifications we made to the transformer
architecture
• Compare with others LLMs on a set of standard benchmarks
7. 2. Approach
Pre-training Data
• Reuse data sources that
have been leveraged to
train other LLMs, with
the restriction of only
using data that is
publicly available.
• Entire training dataset
contains roughly 1.4T
tokens after tokenization.
8. 2. Approach
Architecture
• Training approach is similar to the methods described in
previous work (Brown et al., 2020; Chowdhery et al., 2022)
• Pre-normalization [GPT3]
• SwiGLU activation function [PaLM]
• Rotary Embeddings [GPTNeo]
• AdamW optimizer, with the following hyper-parameters:
β1 = 0.9, β2 = 0.95
9. 2. Approach
Pre-normalization [GPT3] Zhang and Sennrich (2019).
• LayerNorm forces the norm of neurons to be decoupled from the
inputs and weight matrix.(internal covariate shift issue)
• RMSNorm which only focuses on re-scaling invariance and
regularizes the summed inputs simply according to the root mean
square (RMS) statistic:
=> improve the training stability
10. 2. Approach
SwiGLU activation function [PaLM] Shazeer (2020)
• Replace the ReLU non-linearity by the SwiGLU
activation function.
• Use a dimension of
!
"
4𝑑 instead of 4d as in
PaLM
SwiGLU = xW·sigmoid(βxW) @ xV
=> improve the performance
11. 2. Approach
Rotary Embeddings [GPTNeo] Su et al. (2021)
• Remove the absolute positional embeddings, and instead, add rotary positional
embeddings (RoPE)
PE(pos,2i) = sin(pos/100002i/d_model)
PE(pos,2i+1) = cos(pos/100002i/d_model)
12. 2. Approach
Efficient implementation
• Efficient implementation of the causal multi-head attention to reduce memory
usage and runtime
• Reduced the amount of activations that are recomputed during the backward pass
with checkpointing.
• Reduce the memory usage of the model by using model and sequence parallelism
13. 3. Main Results
Zero-shot and few-shot tasks
Report results on a total of 20 benchmarks:
• Common Sense Reasoning
14. 3. Main Results
• Closed-book Question Answering
This model runs on a single V100 GPU during inference.
15. 3. Main Results
• Reading Comprehension
• Mathematical reasoning
• Code generation
• Massive Multitask Language Understanding
17. 3. Results
Carbon footprint wu et al. (2022)
• Estimate that we used 2048 A100-80GB for a period of approximately
5 months to develop our models.
• Wh = GPU-h×(GPU power consumption)×PUE
• tCO2eq = MWh × 0.385.
• 2,638 MWh -> 1,015 tCO2eq.
18. 4. Conclusion
• Show that it is possible to achieve state-of-the-art performance by
training exclusively on publicly available data.
• Accelerate the development of large language models.
• Help efforts to improve their robustness.
• Mitigate known issues such as toxicity and bias.
• We believe that this model will help democratize the access and study
of LLMs, since it can be run on a single GPU.