Super tickets in pre trained language models HyunKyu Jeon
This document discusses finding "super tickets" in pre-trained language models through pruning attention heads and feedforward layers. It shows that lightly pruning BERT models can improve generalization without degrading accuracy (phase transition phenomenon). The authors propose a new pruning approach for multi-task fine-tuning of language models called "ticket sharing" where pruned weights are shared across tasks. Experiments on GLUE benchmarks show their proposed super ticket and ticket sharing methods consistently outperform unpruned baselines, with more significant gains on smaller tasks. Analysis indicates pruning reduces model variance and some tasks share more task-specific knowledge than others.
Synthesizer rethinking self-attention for transformer models HyunKyu Jeon
The document thanks the reader for listening. It provides no other details, context, or information.
This document summarizes Meta Back-Translation, a method for improving back-translation by training the backward model to directly optimize the performance of the forward model during training. The key points are:
1. Back-translation typically relies on a fixed backward model, which can lead the forward model to overfit to its outputs. Meta back-translation instead continually trains the backward model to generate pseudo-parallel data that improves the forward model.
2. Experiments show Meta back-translation generates translations with fewer pathological outputs, such as translations whose length differs greatly from the reference. It also avoids both overfitting and underfitting of the forward model by flexibly controlling the diversity of pseudo-parallel data.
3. Related work leverages mon
Maxmin qlearning controlling the estimation bias of qlearning HyunKyu Jeon
This document summarizes the Maxmin Q-learning paper published at ICLR 2020. Maxmin Q-learning aims to address the overestimation bias of Q-learning and the underestimation bias of Double Q-learning by maintaining multiple Q-functions and using the minimum value across them as the target in the Q-learning update. Both action selection and target construction take the maximum, over actions, of the minimum Q-value for each action. The algorithm initializes multiple Q-functions, then repeatedly selects a random subset of them and updates that subset toward the maxmin target built from the minimum Q-values. This approach reduces the biases seen in prior methods.
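The update described above can be illustrated with a minimal tabular sketch. This is not code from the slides or the paper. The function names, the nested-list Q-table layout, and the choice to update exactly one randomly chosen Q-function per step are assumptions made for illustration.

```python
import random

def min_q_values(q_tables, state):
    """Elementwise minimum over all Q-tables for every action in `state`."""
    n_actions = len(q_tables[0][state])
    return [min(q[state][a] for q in q_tables) for a in range(n_actions)]

def select_action(q_tables, state, epsilon, rng):
    """Epsilon-greedy action selection on the minimum Q-values."""
    q_min = min_q_values(q_tables, state)
    if rng.random() < epsilon:
        return rng.randrange(len(q_min))
    return max(range(len(q_min)), key=q_min.__getitem__)

def maxmin_target(q_tables, reward, next_state, gamma, done):
    """Target = r + gamma * max over actions of the min over Q-tables."""
    if done:
        return reward
    return reward + gamma * max(min_q_values(q_tables, next_state))

def maxmin_update(q_tables, s, a, r, s2, done, alpha, gamma, rng):
    """Move one randomly chosen Q-table toward the maxmin target.

    Updating a single table per step is an assumption of this sketch.
    """
    y = maxmin_target(q_tables, r, s2, gamma, done)
    i = rng.randrange(len(q_tables))
    q_tables[i][s][a] += alpha * (y - q_tables[i][s][a])
    return i, y

# Hypothetical toy problem: 2 Q-tables, 2 states, 2 actions.
rng = random.Random(0)
q_tables = [
    [[1.0, 5.0], [0.0, 0.0]],
    [[3.0, 2.0], [0.0, 0.0]],
]
greedy_action = select_action(q_tables, state=0, epsilon=0.0, rng=rng)
updated_index, target = maxmin_update(
    q_tables, s=1, a=0, r=1.0, s2=0, done=False, alpha=0.5, gamma=0.5, rng=rng
)
```

In state 0 the per-action minimums are 1.0 and 2.0, so the greedy action is action 1 even though the first table alone would rate it at 5.0. Taking the minimum before the maximum is what damps overestimation.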
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake Walaa Eldin Moustafa
Dynamic policy enforcement is becoming increasingly important in today’s world, where data privacy and compliance are top priorities for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
State of Artificial intelligence Report 2023 kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Global Situational Awareness of A.I. and where it's headed vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.