2
Introduction
Problem Statement
• Recent advances in large-scale pretrained models such as BERT, GPT-3, CLIP, and Gopher have shown astonishing achievements across various task domains
• Unlike visual recognition and language modeling, general-purpose user representation learning at scale remains underexplored
3
Introduction
Problem Statement
• Can general-purpose user representations learned from multiple data sources provide promising transfer learning capacity?
• Are pretraining and downstream task performances positively correlated?
• How diverse are the tasks that the pretrained user representations can address?
• Does scaling up the pretraining model improve the generalization performance?
• If so, which factors, such as training data size, model size, behavior sequence length, and batch size, should be scaled up?
4
Introduction
Contribution
• Empirical scaling law
• Transferability improves as the pretraining error decreases
• Transforming tabular data to natural language text provides a common semantic representation
• Advantages of training from multiple service logs
8
Method
Transferability improves as the pretraining error decreases
• Heterogeneous (multi-domain) dataset
• The pretraining loss and the downstream task loss are strongly correlated (see the sketch below)
→ Lower pretraining loss benefits downstream tasks; the pretraining generalizes across various data distributions
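As a minimal sketch of how such a correlation could be checked, assuming hypothetical pretraining and downstream losses collected from a sweep of checkpoints (the values below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical losses from a sweep of pretrained checkpoints (illustrative values only).
pretrain_loss = np.array([2.41, 2.12, 1.95, 1.83, 1.74, 1.69])
downstream_loss = np.array([0.71, 0.64, 0.58, 0.55, 0.52, 0.50])

# Pearson correlation between the two curves; a value close to +1 indicates
# that lower pretraining loss goes together with lower downstream loss.
r = np.corrcoef(pretrain_loss, downstream_loss)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```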
9
Method
Transforming tabular data to natural language text provides a common semantic representation
• They transform all data into natural language text by extracting textual information from tabular data (e.g., product descriptions from a product data table), as sketched below
• This policy alleviates discrepancies in data format across different services; the data format of the same product varies depending on the platform
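A minimal sketch of this tabular-to-text step, assuming a hypothetical product record with illustrative field names and a simple template (the paper's actual fields and templates may differ):

```python
# Hypothetical tabular record for one product (field names are illustrative).
product_row = {
    "brand": "AcmeSound",
    "name": "wireless earbuds",
    "category": "electronics > audio",
    "price": 59.90,
}

def row_to_text(row: dict) -> str:
    """Flatten one tabular record into a natural-language description so that
    records from different services share a common textual format."""
    return (
        f"{row['brand']} {row['name']}, a product in {row['category']}, "
        f"priced at {row['price']:.2f}."
    )

print(row_to_text(product_row))
# -> "AcmeSound wireless earbuds, a product in electronics > audio, priced at 59.90."
```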
10
Method
Advantages of training from multiple service logs
• CLUE learns a multi-modal user embedding space from two services and shows promising results on diverse downstream tasks (a contrastive-learning sketch follows below)
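The deck does not spell out CLUE's objective; as one plausible reading of "learns a multi-modal user embedding space from two services", here is a minimal sketch of a CLIP-style symmetric contrastive loss over paired user embeddings. The embeddings, batch size, and temperature below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE loss: the i-th user's embedding from service A should match
    the i-th user's embedding from service B and repel every other user in the batch."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0))        # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random "user embeddings" from two services.
z_service_a = torch.randn(8, 128)
z_service_b = torch.randn(8, 128)
print(contrastive_loss(z_service_a, z_service_b).item())
```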
11
Experiment
Advantages of training from multiple service logs
• CLUE learns a multi-modal user embedding space from two services and shows promising results on diverse downstream tasks
16
Method
Scaling Law and Generalization
Scaling Laws for Neural Language Models (arXiv, OpenAI)
• Published by OpenAI in early 2020, this paper laid the groundwork for the push toward large models in earnest
17
Method
Scaling Law and Generalization
• Model architecture
• Model size
• Compute required for training
• Training dataset size
→ The paper relates test loss to these factors via power laws (reproduced below)
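For reference, the power-law forms reported in Scaling Laws for Neural Language Models relate test loss to model size N, dataset size D, and compute C when the other factors are not bottlenecks (the exponents are the approximate values reported there):

```latex
% Approximate power-law fits from Kaplan et al., 2020.
L(N) = \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) = \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
\qquad
L(C_{\min}) = \left(\tfrac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \quad \alpha_C^{\min} \approx 0.050
```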
18
Method
Scaling Law and Generalization
• What is a power-law relation?
e.g., the number of cities with a given population is inversely proportional to a power of that population (a power law)
→ Such laws characterize how each scaling factor affects performance across various models (a toy fit is sketched below)
→ Unlike language models, factors such as batch size and sequence length were important in the user representation task.
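To make the power-law idea concrete, a minimal sketch with a toy loss-versus-compute curve (made-up values, not from the paper); a power law L = a · C^(−α) is a straight line in log-log space, so a linear fit there recovers the exponent:

```python
import numpy as np

# Toy learning curve: compute (PF-days) vs. loss, illustrative values only.
compute = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
loss = 2.5 * compute ** -0.05        # exact power law L = a * C^(-alpha)

# log L = log a - alpha * log C, so a straight-line fit in log-log space
# recovers both the exponent alpha and the constant a.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent alpha ~ {-slope:.3f}, fitted a ~ {np.exp(intercept):.3f}")
```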
23
Conclusion
• The success of task-agnostic pretraining in other domains also holds for user representation
• CLUE is trained on billion-scale real-world user behavior data to learn general-purpose user representations
• They further investigate the empirical scaling laws and the generalization ability of the method, and find that a power-law learning curve as a function of compute (PF-days) is observed in the experiments
Editor's Notes
I had already read all the earlier papers that use propagation for rumor detection.