An introduction to the AAAI 2023 paper "Are Transformers Effective for Time Series Forecasting?" and the HuggingFace blog post "Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)".
This document summarizes a research paper on scaling laws for neural language models. Some key findings of the paper include:
- Language model performance depends strongly on model scale and only weakly on model shape. With enough compute and data, the test loss scales as a power law in parameter count, compute, and dataset size (a short sketch follows this list).
- Overfitting follows a universal pattern, with the penalty depending on the ratio of model size to dataset size.
- Larger models are more sample-efficient, reaching the same loss with fewer optimization steps and fewer training examples.
- The paper motivated subsequent work by OpenAI on applying scaling laws to other domains like computer vision and developing increasingly large language models like GPT-3.
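To make the power-law claim concrete, here is a minimal sketch of the loss-versus-scale curves the paper fits. The exponents and constants approximate the fits reported in the paper and should be treated as illustrative assumptions, not authoritative values.

```python
# A minimal sketch of the power-law loss curves described above. The exponents
# and constants approximate the paper's reported fits and are used here only
# as illustrative assumptions.

def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Test loss as a power law of (non-embedding) parameter count N."""
    return (n_c / n_params) ** alpha_n

def loss_vs_data(n_tokens, d_c=5.4e13, alpha_d=0.095):
    """Test loss as a power law of dataset size D, in tokens."""
    return (d_c / n_tokens) ** alpha_d

# Growing the model 10x buys a predictable, diminishing loss reduction:
for n in (1e8, 1e9, 1e10):
    print(f"N={n:.0e}: L={loss_vs_params(n):.3f}")
```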
An introduction to four ICLR 2019 oral papers on NLP, presented at an ICLR/ICML 2019 reading group.
Papers covered:
Shen, Yikang, et al. "Ordered neurons: Integrating tree structures into recurrent neural networks." in Proc. of ICLR, 2019.
Li, Xiang, et al. "Smoothing the Geometry of Probabilistic Box Embeddings." in Proc. of ICLR, 2019.
Wu, Felix, et al. "Pay less attention with lightweight and dynamic convolutions." in Proc. of ICLR, 2019.
Mao, Jiayuan, et al. "The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision." in Proc. of ICLR, 2019.
NIPS2017 Few-shot Learning and Graph Convolution (Kazuki Fujikawa)
The document discusses meta-learning and prototypical networks for few-shot learning. It introduces prototypical networks, which learn a metric space such that classification can be performed by finding the nearest class prototype to a query example in embedding space. The document summarizes results on few-shot image classification benchmarks like Omniglot and miniImageNet, finding that prototypical networks achieve state-of-the-art performance.
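As a concrete illustration, here is a minimal sketch of the nearest-prototype classification rule described above, assuming embeddings have already been produced by some encoder; the shapes and toy data are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of prototypical-network classification: each class is
# represented by the mean of its support embeddings, and a query is assigned
# to the nearest prototype in embedding space.
# Shapes: support_emb is (n_classes, n_shot, dim), query_emb is (dim,).

def classify_by_prototype(support_emb, query_emb):
    prototypes = support_emb.mean(axis=1)                 # (n_classes, dim)
    dists = ((prototypes - query_emb) ** 2).sum(axis=1)   # squared Euclidean
    return int(dists.argmin())                            # nearest prototype wins

rng = np.random.default_rng(0)
support = rng.normal(size=(5, 3, 64))   # 5-way, 3-shot toy embeddings
query = support[2].mean(axis=0)         # a query at class 2's prototype
print(classify_by_prototype(support, query))  # -> 2
```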
Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network (Kazuki Fujikawa)
This document discusses neural message passing networks for modeling quantum chemistry. It defines a message passing network by three components: message functions that compute messages from neighboring node states, vertex update functions that update each node's state based on its accumulated messages, and a readout function that produces an output for the full graph. It provides examples of specific message, update, and readout functions used in existing message passing models such as interaction networks and molecular graph convolutions.
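To make the message/update/readout decomposition concrete, here is a minimal sketch of one message-passing step; the sum aggregation, linear message function, and tanh update are illustrative choices, not the specific functions of any model named above.

```python
import numpy as np

# A minimal sketch of one message-passing step plus a readout, following the
# message/update/readout decomposition. Function choices are illustrative.

def mpnn_step(h, adj, w_msg, w_upd):
    # Message function M: a linear map of each neighbor's state,
    # summed over neighbors via the adjacency matrix.
    messages = adj @ (h @ w_msg)           # (n_nodes, dim)
    # Vertex update function U: combine node state with accumulated messages.
    return np.tanh(h @ w_upd + messages)

def readout(h):
    # Readout R: permutation-invariant pooling over all node states.
    return h.sum(axis=0)

rng = np.random.default_rng(0)
n, d = 4, 8
h = rng.normal(size=(n, d))
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
w_msg, w_upd = rng.normal(size=(d, d)), rng.normal(size=(d, d))
print(readout(mpnn_step(h, adj, w_msg, w_upd)).shape)  # (8,)
```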
SchNet: A Continuous-Filter Convolutional Neural Network for Modeling Quantum Interactions (Kazuki Fujikawa)
The document summarizes a paper about modeling quantum interactions using a continuous-filter convolutional neural network called SchNet. Some key points:
1) SchNet performs convolutions using interatomic distances in 3D space rather than discrete graph connectivity, allowing it to model interactions between arbitrarily positioned nodes.
2) This matters when systems with the same graph connectivity have different 3D configurations that affect their properties, or when graph distance and physical distance diverge.
3) The paper proposes a continuous-filter convolutional layer and an interaction block that incorporate distance information into the graph convolutions performed by SchNet (a minimal sketch follows).
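Here is a minimal sketch of a continuous-filter convolution in this spirit: filters are generated from pairwise distances via a radial-basis expansion and a filter network, instead of being indexed by discrete edges. The single linear filter layer and the layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# A minimal sketch of a continuous-filter convolution: per-pair filters are
# generated from 3D distances, not looked up from graph edges.

def rbf_expand(dists, centers):
    # Expand each pairwise distance on a grid of Gaussian basis functions.
    return np.exp(-((dists[..., None] - centers) ** 2) / 0.1)

def cf_conv(h, coords, centers, w_filter):
    # Pairwise distances computed from 3D positions.
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)              # (n, n)
    filters = rbf_expand(dists, centers) @ w_filter    # (n, n, dim)
    # Each node aggregates neighbor features weighted by its distance filter.
    return (filters * h[None, :, :]).sum(axis=1)       # (n, dim)

rng = np.random.default_rng(0)
n, dim, n_rbf = 5, 16, 20
h = rng.normal(size=(n, dim))
coords = rng.normal(size=(n, 3))
centers = np.linspace(0.0, 3.0, n_rbf)
w_filter = rng.normal(size=(n_rbf, dim))
print(cf_conv(h, coords, centers, w_filter).shape)  # (5, 16)
```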
The document summarizes the paper "Matching Networks for One Shot Learning". It discusses one-shot learning, where a classifier can learn new concepts from only one or a few examples. It introduces matching networks, a new approach that trains an end-to-end nearest neighbor classifier for one-shot learning tasks. The matching networks architecture uses an attention mechanism to compare a test example to a small support set and achieve state-of-the-art one-shot accuracy on Omniglot and other datasets. The document provides background on one-shot learning challenges and related work on siamese networks, memory augmented neural networks, and attention mechanisms.
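As a concrete illustration of that attention mechanism, here is a minimal sketch of the matching-networks classification rule: a similarity-weighted vote over the support set. Cosine similarity and one-hot labels follow the basic formulation; the toy embeddings are assumptions, and the full architecture (e.g. full-context embeddings) is omitted.

```python
import numpy as np

# A minimal sketch of attention-based one-shot classification: the predicted
# label distribution is a softmax-similarity-weighted vote over support labels.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def matching_predict(support_emb, support_labels, query_emb, n_classes):
    # Attention weights: softmax over cosine similarities to support examples.
    sims = support_emb @ query_emb / (
        np.linalg.norm(support_emb, axis=1) * np.linalg.norm(query_emb))
    attn = softmax(sims)                          # (n_support,)
    one_hot = np.eye(n_classes)[support_labels]   # (n_support, n_classes)
    return attn @ one_hot                         # label distribution

rng = np.random.default_rng(0)
support = rng.normal(size=(10, 32))
labels = np.repeat(np.arange(5), 2)               # 5-way, 2-shot
query = support[4] + 0.01 * rng.normal(size=32)   # near a class-2 example
print(matching_predict(support, labels, query, 5).argmax())  # -> 2
```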
References
• Wu, Zonghan, et al. "A comprehensive survey on graph neural networks." arXiv preprint arXiv:1901.00596 (2019).
• Gilmer, Justin, et al. "Neural message passing for quantum chemistry." in Proc. of ICML, 2017.
• Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." in Proc. of NIPS, 2015.
• Xu, Keyulu, et al. "How powerful are graph neural networks?" in Proc. of ICLR, 2019.
• Schütt, Kristof, et al. "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions." in Proc. of NIPS, 2017.