A deeper talk on the Transformer architecture from the webinar at NTR
https://www.ntr.ai/webinar/transformery
Google slides version: https://docs.google.com/presentation/d/1dIadh_nIszxXG8-672vJmvFGT6jBp0mOqzNV4g3e2Lc/edit?usp=sharing
2. ● Transformer architecture understanding
○ Original paper: https://arxiv.org/abs/1706.03762
○ Great visual explanation: http://jalammar.github.io/illustrated-transformer
○ Lecture #12 from my DL course
https://github.com/che-shr-cat/deep-learning-for-biology-hse-2019-course
● This talk is a follow-up to the one from the GDG DevParty
○ https://www.youtube.com/watch?v=KZ9NXYcXVBY
Prerequisites
4. Transformer
A new simple network architecture, the Transformer:
● Is an encoder-decoder architecture
● Is based solely on attention mechanisms (no RNNs/CNNs)
● Has the multi-head self-attention mechanism as its major component
● Is fast: only matrix multiplications
● Gets strong results on standard WMT datasets
7. The transformer adopts scaled dot-product attention: the output is a weighted sum of the values, where the weight assigned to each value is determined by the dot product of the query with all the keys:
Attention(Q, K, V) = softmax(Q·Kᵀ / √dk)·V
The input consists of queries and keys of dimension dk, and values of dimension dv.
Scaled dot-product attention
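To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention (the function and variable names are ours, not from the paper):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, dk), K: (n_keys, dk), V: (n_keys, dv)
    dk = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dk)                 # dot products, scaled by sqrt(dk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of the values

# Toy usage: 4 queries/keys of dimension dk=8, values of dimension dv=16
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(rng.normal(size=(4, 8)),
                                   rng.normal(size=(4, 8)),
                                   rng.normal(size=(4, 16)))
print(out.shape)  # (4, 16)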
8. Problems with vanilla transformers
● It’s a pretty heavy model → hard to train, tricky training schedule
● Its attention mechanism has O(N²) computational complexity → scales poorly (see the quick sketch below)
● It has a limited context span (mostly due to that complexity), typically 512 tokens → can’t process long sequences
● It may need a different inductive bias for other types of data (e.g. images, sound, etc.)
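To see why the O(N²) bullet bites in practice, here is a quick back-of-the-envelope sketch; the head/layer counts are typical base-model numbers assumed here for illustration, not figures from the talk:

# Attention materializes an N x N weight matrix per head per layer.
for n in [512, 2048, 8192]:
    entries = n * n
    bytes_total = entries * 4 * 8 * 12   # float32, 8 heads, 12 layers (assumed)
    print(f"N={n}: {entries:,} weights per head per layer, "
          f"~{bytes_total / 2**30:.2f} GiB across the model")

Doubling the sequence length quadruples this cost, which is why vanilla models stop at around 512 tokens.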
16. Input elements
● Characters: Character-Level Language Modeling with Deeper Self-Attention, https://arxiv.org/abs/1808.04444
● BPE (subword units): most of the transformers
● Words: still an option, but less flexible with out-of-vocabulary words
● Pixels: Image Transformer or iGPT
● MIDI notes: Music Transformer
● ...
17. Image GPT (iGPT)
Just GPT-2 trained on images unrolled into long sequences of pixels!
Waiting for a GPT-3 (which uses sparse attention) trained on images.
https://openai.com/blog/image-gpt/
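A minimal sketch of the core trick: flatten a 2-D image into a 1-D raster-scan sequence so an autoregressive model can treat pixels like tokens (iGPT's color-quantization step is only noted in a comment):

import numpy as np

img = np.arange(32 * 32 * 3).reshape(32, 32, 3)  # toy 32x32 RGB image
seq = img.reshape(-1, 3)                         # raster-scan order: 1024 pixel positions
print(seq.shape)  # (1024, 3) -- a sequence a GPT-style model can consume
# iGPT additionally quantizes each pixel to a small color palette, so every
# position becomes a single discrete token, just like a word id in text.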
18. Dimensionality: 1D, 2D, ...
Axial Transformer: for images and other data organized as high-dimensional tensors.
Axial Attention in Multidimensional Transformers
https://arxiv.org/abs/1912.12180
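A rough sketch of the axial idea for an H×W feature map: run ordinary 1-D self-attention along rows, then along columns, instead of over all H·W positions at once (helper names are ours):

import numpy as np

def _attend(x):
    # plain softmax self-attention over one 1-D sequence x: (L, d)
    s = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def axial_self_attention(x):
    # x: (H, W, d). Attend along each row, then along each column.
    rows = np.stack([_attend(r) for r in x])      # H independent length-W sequences
    cols = rows.transpose(1, 0, 2)                # make columns the sequences
    cols = np.stack([_attend(c) for c in cols])   # W independent length-H sequences
    return cols.transpose(1, 0, 2)                # back to (H, W, d)

out = axial_self_attention(np.random.default_rng(1).normal(size=(8, 8, 16)))
print(out.shape)  # (8, 8, 16); cost O(H*W*(H+W)) instead of O((H*W)**2)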
19. Positional Encoding
1. Sinusoidal Position Encoding (Vaswani et al., 2017, https://arxiv.org/abs/1706.03762)
Uses sine/cosine waves, as in the original paper.
2. Learned Position Encoding (Gehring et al., 2017, https://arxiv.org/abs/1705.03122)
Embeds the absolute position of input elements. Can’t extrapolate to lengths it has never seen during training.
3. Relative Position Representations (Shaw et al., 2018, https://arxiv.org/abs/1803.02155)
Models the input as a labeled, directed, fully-connected graph and learns edge representations.
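A short sketch of option 1, the sinusoidal encoding from the original paper (an even d_model is assumed for simplicity):

import numpy as np

def sinusoidal_positions(n_positions, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positions(512, 64)
print(pe.shape)  # (512, 64); added to the token embeddings before the first layer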
20. Transformer with added recurrence: it can see the previous segment’s representations, so it can process longer sequences.
Recurrence: Transformer-XL
https://arxiv.org/abs/1901.02860
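A simplified sketch of the segment-level recurrence: cached hidden states from the previous segment are prepended to the keys/values (but not the queries) of the current one. Names are ours, and the no-gradient treatment of the memory is only noted in a comment:

import numpy as np

def _attend(q, kv):
    # softmax cross-attention: queries q (L, d) over keys/values kv (M, d)
    s = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

def attention_with_memory(h_current, memory):
    # memory: (M, d) states cached from the previous segment; during training
    # they are treated as constants (no gradient flows into them)
    context = np.concatenate([memory, h_current], axis=0)
    return _attend(h_current, context)  # queries come only from the current segment

rng = np.random.default_rng(5)
seg1, seg2 = rng.normal(size=(128, 32)), rng.normal(size=(128, 32))
print(attention_with_memory(seg2, memory=seg1).shape)  # (128, 32)

The full model also switches to relative positional encodings, so that cached states remain meaningful when reused in the next segment.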
21. The Compressive Transformer keeps a fine-grained memory of past activations, which are then compressed into coarser compressed memories.
Recurrence & Mem: Compressive Transformer
Compressive Transformers for Long-Range Sequence Modelling
https://arxiv.org/abs/1911.05507
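A toy sketch of the compression step: memories older than the regular window are squeezed at rate c; average pooling, used below, is one of the compression functions explored in the paper:

import numpy as np

def compress(old_memories, c=3):
    # old_memories: (M, d); average every c consecutive states into one
    M, d = old_memories.shape
    usable = (M // c) * c
    return old_memories[:usable].reshape(-1, c, d).mean(axis=1)  # (M // c, d)

mem = np.random.default_rng(2).normal(size=(12, 16))
print(compress(mem).shape)  # (4, 16): a 3x coarser compressed memory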
25. Attention mechanism: Image Transformer
Local self-attention: in every self-attention layer, each position in a query block attends to all positions in the memory block.
Image Transformer, https://arxiv.org/abs/1802.05751
26. Sparse factorizations of the attention matrix reduce the complexity to O(N·√N). The model can generate sounds and images.
Attention mechanism: Sparse Transformer
Generating Long Sequences with Sparse Transformers
https://arxiv.org/abs/1904.10509
https://openai.com/blog/sparse-transformer/
27. Generating Long Sequences with Sparse Transformers
https://arxiv.org/abs/1904.10509
https://openai.com/blog/sparse-transformer/
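A sketch of the strided pattern, one of the factorizations described in the paper: each position attends to a local window of the last ℓ positions plus every ℓ-th earlier position; with ℓ ≈ √N this gives O(N·√N) nonzero entries. Head splitting and other details are omitted:

import numpy as np

def strided_sparse_mask(n, stride):
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        j = np.arange(i + 1)                # causal: only positions <= i
        local = j > i - stride              # the previous `stride` positions
        strided = (i - j) % stride == 0     # every stride-th earlier position
        mask[i, : i + 1] = local | strided
    return mask

mask = strided_sparse_mask(64, stride=8)    # stride ~ sqrt(n)
print(mask.sum(), "of", 64 * 64, "entries attended")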
28. Reformer is an optimized transformer:
● Uses less memory (reversible layers do not store activations; feed-forward computations are chunked)
● Calculates attention using LSH (locality-sensitive hashing)
○ O(L²) → O(L·log L)
○ Approximates softmax via LSH (softmax is dominated by the largest elements, so for each query qi we only need to focus on the keys in K that are closest to qi)
● => can process longer sequences: 64K sequences on one GPU!
Reformer
Reformer: The Efficient Transformer
https://arxiv.org/abs/2001.04451
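A minimal sketch of the LSH bucketing step (the angular scheme from the paper): vectors are projected onto random directions, and the argmax over the signed projections assigns each vector to a bucket, so vectors with large dot products tend to share a bucket. Only the hashing is shown, not the sorting/chunking that follows:

import numpy as np

def lsh_buckets(x, n_buckets, rng):
    # x: (n, d). Random rotation R, then argmax over [xR, -xR].
    d = x.shape[-1]
    R = rng.normal(size=(d, n_buckets // 2))
    proj = x @ R                                 # (n, n_buckets / 2)
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

rng = np.random.default_rng(3)
keys = rng.normal(size=(1024, 64))
buckets = lsh_buckets(keys, n_buckets=32, rng=rng)
print(np.bincount(buckets, minlength=32))        # rough bucket occupancy

Attention is then computed only among queries and keys that landed in the same bucket, which is how O(L²) drops to roughly O(L·log L).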
31. ETC: Encoding Long and Structured Data in Transformers
https://arxiv.org/abs/2004.08483
32. Use local sliding window attention + add global attention for pre-selected positions.
Longformer
Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
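A sketch of what such an attention mask looks like: a sliding window around each position, plus full rows/columns for a few designated global positions. The window size and global indices below are arbitrary choices for illustration:

import numpy as np

def longformer_style_mask(n, window, global_idx):
    i = np.arange(n)
    mask = np.abs(i[:, None] - i[None, :]) <= window // 2  # local sliding window
    mask[global_idx, :] = True   # global tokens attend everywhere...
    mask[:, global_idx] = True   # ...and every token attends to them
    return mask

mask = longformer_style_mask(512, window=64, global_idx=[0])
print(mask.sum(), "of", 512 * 512, "entries: grows linearly in n, not quadratically")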
34. ● Another local + global attention.
● Can incorporate structured data into the model!
Extended Transformer Construction (ETC)
ETC: Encoding Long and Structured Data in Transformers
https://arxiv.org/abs/2004.08483
35. Idea:
● Apply ACT to Transformers
● Apply a variable number of repetitions when calculating each position: the Universal Transformer (UT)
● Use dynamic attention span: Adaptive Attention Span in Transformers
Adaptive Computation Time in Transformers
Adaptive Computation Time (ACT) in Neural Networks [3/3]
https://medium.com/@moocaholic/adaptive-computation-time-act-in-neural-networks-3-3-99452b2eff18
36. ● Two flavors of UT in the paper:
○ UT with a fixed number of repetitions.
○ UT with dynamic halting.
● The UT repeatedly refines a series of vector representations for each position of the sequence in parallel, by combining information from different positions using self-attention and applying a recurrent transition function across all time steps (see the sketch below).
○ Fixed flavor: the number of time steps, T, is arbitrary but fixed (no ACT here, just a fixed number of repetitions).
○ Dynamic flavor: the number of time steps, T, is dynamic (a dynamic ACT halting mechanism is applied to each position in the input sequence).
Universal Transformer (UT): Implementation
“Universal Transformers”,
https://arxiv.org/abs/1807.03819
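A PyTorch-flavored skeleton of the fixed-repetition flavor: a single transition block whose weights are reused at every time step. This is a sketch of the idea only; the paper's coordinate (position + timestep) embeddings and other details are omitted:

import torch
import torch.nn as nn

class UniversalEncoder(nn.Module):
    def __init__(self, d_model=64, n_steps=6):
        super().__init__()
        # ONE layer, applied repeatedly: parameters are tied across depth
        self.step_fn = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.n_steps = n_steps

    def forward(self, x):                # x: (batch, seq, d_model)
        for _ in range(self.n_steps):    # depth as recurrence, T fixed
            x = self.step_fn(x)
        return x

h = UniversalEncoder()(torch.randn(2, 10, 64))
print(h.shape)  # torch.Size([2, 10, 64])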
37. UT with a fixed number of repetitions
“Moving Beyond Translation with the Universal Transformer”,
https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html
38. Adaptive UT with dynamic halting
“Universal Transformers”,
https://mostafadehghani.com/2019/05/05/universal-transformers/
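A simplified sketch of ACT-style halting over positions: each position accumulates a halting probability per step and is frozen once it crosses 1 − ε. The remainder and ponder-cost bookkeeping of full ACT is omitted, and step_fn / halt_fn are stand-ins, not the paper's exact functions:

import numpy as np

def act_halting(h, step_fn, halt_fn, max_steps=10, eps=0.01):
    # h: (n_positions, d) per-position states
    cum_p = np.zeros(len(h))
    for _ in range(max_steps):
        running = cum_p < 1.0 - eps            # positions still "thinking"
        if not running.any():
            break
        h[running] = step_fn(h[running])       # refine only the unhalted positions
        cum_p[running] += halt_fn(h[running])  # accumulate halting probability
    return h, cum_p

rng = np.random.default_rng(4)
h, p = act_halting(rng.normal(size=(5, 8)),
                   step_fn=lambda x: 0.9 * x,
                   halt_fn=lambda x: 1 / (1 + np.exp(-x.mean(axis=-1))))
print(p)  # per-position accumulated halting probability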
39. ● The Universal Transformer is a recurrent function (not in time, but in depth) that evolves per-symbol hidden states in parallel, based at each step on the sequence of previous hidden states.
○ In that sense, the UT is similar to architectures such as the Neural GPU and the Neural Turing Machine.
● When running for a fixed number of steps, the Universal Transformer is equivalent to a multi-layer Transformer with tied parameters across its layers.
● Adaptive UT: as the recurrent transition function can be applied any number of times, adaptive UTs can have variable depth (number of per-symbol processing steps).
● The Universal Transformer can be shown to be Turing-complete (or “computationally universal”).
Universal Transformer (UT): Notes
“Universal Transformers”,
https://arxiv.org/abs/1807.03819
40. Related idea: cross-layer parameter sharing (ALBERT)
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
41. ● The problem with the vanilla transformer is its fixed context size (or attention span).
● The span cannot be very large because of the computational cost of the attention mechanism (it requires O(n²) computations).
● Idea: let the layer (or even the attention head) decide the required context size on its own.
● There are two options (see the sketch below):
○ Learnable (the adaptive attention span): let each attention head learn its own attention span independently of the other heads. It is learnable, but still fixed after training is done.
○ ACT-like (the dynamic attention span): the span changes dynamically depending on the current input.
Adaptive Attention Span: Idea & Implementation
“Adaptive Attention Span in Transformers”,
https://arxiv.org/abs/1905.07799
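A sketch of the learnable option's soft mask from the paper: each head has a span parameter z, and attention weights at distance x are multiplied by a ramp m_z(x) = clamp((R + z − x) / R, 0, 1), so gradients can shrink or grow the span. Renormalization and the per-head parametrization are omitted:

import numpy as np

def soft_span_mask(distances, z, R=32):
    # 1 inside the span, a linear ramp of width R at the edge, 0 beyond
    return np.clip((R + z - distances) / R, 0.0, 1.0)

distances = np.arange(1024)   # distance from the current query position
for z in [16, 256]:
    m = soft_span_mask(distances, z)
    print(f"z={z}: nonzero mask over ~{(m > 0).sum()} tokens")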
42. The models are smaller and the performance is better.
Adaptive Attention Span: Performance
“Adaptive Attention Span in Transformers”,
https://arxiv.org/abs/1905.07799
43. Adaptive spans (in log scale) of every attention head in a 12-layer model with a span limit S = 4096. Only a few attention heads require long attention spans.
Adaptive spans are learned to be larger when needed
“Adaptive Attention Span in Transformers”,
https://arxiv.org/abs/1905.07799
44. Example of the average dynamic attention span as a function of the input sequence.
The span is averaged over the layers and heads.
Dynamic spans adapt to the input sequence
“Adaptive Attention Span in Transformers”,
https://arxiv.org/abs/1905.07799
47. ● Transformers are cool and produce great results!
● There are many modifications; it’s kind of like LEGO, and you can combine the pieces.
● More good source code and libraries are available (Hugging Face, Colab notebooks, etc.)
● Definitely more transformers to come!
● GET INVOLVED!
You CAN move things forward!
(just combine several ideas from these slides 🙂)
Wrap up