The Age of Language Models in NLP
Tuesday | 23rd June, 2020
LIVE WEBINAR
Presented by
AGENDA
1. About Tyrone
 The world's highest-performing AI platform – the A100
 Development, training, and inference in one system
 The era of modern mixed workloads
 Tyrone Kubyts™
2. Word Embeddings
 How word embeddings create context-based relationships
 How to create word embeddings
3. Sequence Modelling
 Introduction to deep learning in NLP
 Overview of the model architectures to use
4. Advanced Language Models
 Overview of language models
 How they are created
 Transformers
 BERT, GPT-2, etc.
5. NLP Attention Mechanism
 Overview of the attention mechanism
6. Case Studies
Tyrone Systems at a Glance
NVIDIA HGX A100 PERFORMANCE
New Tensor Core for AI & HPC
New Multi-instance GPU
New Hardware Engines
Increase in GPU interconnect bandwidth
Increase in GPU memory
Increase in memory bandwidth
Speedup in AI performance
54 Billion XTORS | 3rd Gen Tensor Cores | Sparsity Acceleration | Multi-Instance GPU | 3rd Gen NVLink & NVSwitch
NVIDIA A100
Greatest Generational Leap – 20X Volta
54B XTOR | 826mm2 | TSMC 7N | 40GB Samsung HBM2 | 600 GB/s NVLink
Peak (vs. Volta):
FP32 TRAINING: 312 TFLOPS (20X)
INT8 INFERENCE: 1,248 TOPS (20X)
FP64 HPC: 19.5 TFLOPS (2.5X)
MULTI-INSTANCE GPU: 7X GPUs
New TF32 Tensor Cores on A100
20X Higher FLOPS for AI, Zero Code Change
20X faster than Volta FP32 | Works like FP32 for AI, with the range of FP32 and the precision of FP16
No code change required for end users | Supported in PyTorch, TensorFlow, and MXNet framework containers
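As an illustration of the "zero code change" claim, here is a minimal PyTorch sketch (flag names per PyTorch 1.7+; on Ampere GPUs TF32 is used for these operations by default):

import torch

# TF32 kicks in automatically on Ampere Tensor Cores; these flags
# (available since PyTorch 1.7) expose the switch explicitly.
torch.backends.cuda.matmul.allow_tf32 = True  # matrix multiplications
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b  # runs on TF32 Tensor Cores with unmodified FP32 code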
Most flexible AI platform with MULTI-INSTANCE GPU (MIG)
Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service
Up to 7 GPU instances in a single A100:
Simultaneous workload execution with guaranteed quality of service: all MIG instances run in parallel with predictable throughput & latency, with the flexibility to run any type of workload on a MIG instance
Right-sized GPU allocation: different-sized MIG instances based on target workloads
[Diagram: seven MIG instances, each with its own GPU compute and memory, running workloads such as Amber in parallel]
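A sketch of how MIG partitioning is typically driven, invoking nvidia-smi from Python (commands follow NVIDIA's MIG user guide; the profile ID 9, i.e. 3g.20gb, is an illustrative choice, and root privileges are required on the A100 host):

import subprocess

# Enable MIG mode on GPU 0 (may require a GPU reset to take effect).
subprocess.run(["nvidia-smi", "-i", "0", "-mig", "1"], check=True)

# List the GPU instance profiles this A100 supports (1g.5gb ... 7g.40gb).
subprocess.run(["nvidia-smi", "mig", "-lgip"], check=True)

# Create two 3g.20gb GPU instances plus their compute instances (-C).
subprocess.run(["nvidia-smi", "mig", "-cgi", "9,9", "-C"], check=True)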
ONE SYSTEM FOR ALL AI INFRASTRUCTURE
AI Infrastructure Re-Imagined, Optimized, and Ready for Enterprise AI-at-Scale
any job | any size | any node | anytime
Analytics  Training  Inference
Flexible AI infrastructure that adapts to the pace of the enterprise:
• One universal building block for the AI data center
• Uniform, consistent performance across the data center
• Any workload on any node, any time
• Limitless capacity planning with predictably great performance at scale
Game-changing performance for innovators
9x Mellanox ConnectX-6 200Gb/s Network Interface: 450GB/sec peak bi-directional bandwidth
Dual 64-core AMD Rome CPUs and 1TB RAM: 3.2X more cores to power the most intensive AI jobs
8x NVIDIA A100 GPUs with 320GB total GPU memory: 12 NVLinks/GPU, 600GB/sec GPU-to-GPU bi-directional bandwidth
6x NVIDIA NVSwitches: 4.8TB/sec bi-directional bandwidth, 2X more than previous-generation NVSwitch
15TB Gen4 NVMe SSD: 25GB/sec peak bandwidth, 2X faster than Gen3 NVMe SSDs
2U GPU server with up to 4 NVIDIA HGX™ A100 GPUs
Camarero DAS7TGVQ-24RT
Tyrone NVIDIA A100-based SERVERS
• Supports 4x A100 40GB SXM4 GPUs
• Supports CPU TDP up to 280W
• Dual AMD EPYC™ 7002 Series processors with up to 128 cores
• Flexible storage with 4 hot-swap bays for SAS, SATA or NVMe
• PCI-E Gen 4 NVLink for fast GPU-to-GPU connections
• 32 DIMM slots that allow up to 8TB of 3200MHz DDR4 memory
• 4 hot-swap heavy-duty fans
• 2x 2200W redundant power supplies, Titanium Level
PCI-E Gen 4
NEW LAUNCH
NVIDIA NVLink
4U GPU server with up to 8 NVIDIA HGX™ A100 GPUs
Tyrone NVIDIA A100-based SERVERS
NVIDIA NVLink & NVSwitch
NEW LAUNCH
• Supports up to 8 double-width GPUs
• Supports CPU TDP up to 280W
• Dual AMD EPYC™ 7002 Series processors with up to 128 cores
• Flexible storage with 4 hot-swap bays for SAS, SATA or NVMe
• PCI-E Gen 4 NVLink for fast GPU-to-GPU connections
• 32 DIMM slots that allow up to 8TB of 3200MHz DDR4 memory
• 4 hot-swap heavy-duty fans
• 2x 2000W redundant power supplies, Titanium Level
4U GPU server with up to 8 NVIDIA HGX™ A100 GPUs
Tyrone NVIDIA A100-based SERVERS
NVIDIA NVLink
COMING SOON
• Supports Intel Xeon
• Supports NVLink
• 8x NVIDIA Tesla A100 SXM4
Delivers 4X faster training than other GPU-based systems
Your Personal AI Supercomputer
Power-on to Deep Learning in Minutes
Pre-installed with powerful deep learning software
Extend workloads from your desk to the cloud in minutes
Mixed Workloads: Convergence of AI | HPC | Cloud | Containers
The Era of Modern Mixed Workload
F L E X I B L E Is the usage going to be constant?
O P T I M I Z A T I O N Is optimal utilization required?
R E S I L I E N C E Do we need the application to run all the time?
E A S E Is ‘ease of maintenance’ key?
S C A L A B I L I T Y & S P E E D Do we have one size that fits all?
Connectivity and usage
[Diagram: laptops and virtual desktops connecting through the Tyrone Cloud Manager]
Run multiple applications simultaneously
Flow Architecture: Revolutionizing the Deep Learning CPU-GPU Environment
[Speed gauge: 10X to 70X speed with the Tyrone KUBYTS™ client]
Compatible workstations have a repository of 50 containerized applications and 100s of containers in the cloud
Tyrone KUBITS: Revolutionizing the Deep Learning CPU-GPU Environment
• Run different applications simultaneously
• Check for Tyrone KUBITS compatible workstations
• Get access to 100+ containers on Tyrone KUBITS Cloud
• High scalability
• Affordable price
• Has both GPU- and CPU-optimized containers
• Design a simple workstation or large clusters with KUBITS technology
• Talk to our experts & build the right workstation within your budget
KUBITS CLOUD | COMPATIBLE
AGENDA
1. About Tyrone
 The world's highest-performing AI platform – the A100
 Development, training, and inference in one system
 The era of modern mixed workloads
 Tyrone Kubyts™
2. Word Embeddings
 How word embeddings create context-based relationships
 How to create word embeddings
3. Sequence Modelling
 Introduction to deep learning in NLP
 Overview of the model architectures to use
4. Advanced Language Models
 Overview of language models
 How they are created
 Transformers
 BERT, GPT-2, etc.
5. NLP Attention Mechanism
 Overview of the attention mechanism
6. Case Studies
Word Embedding
• Word embedding is a language modeling technique used for mapping words to vectors of real numbers.
• It represents words or phrases in a vector space with several dimensions.
• Word embeddings can be generated using various methods: neural networks, co-occurrence matrices, probabilistic models, etc.
• Word2Vec consists of models for generating word embeddings. These models are shallow, two-layer neural networks with one input layer, one hidden layer, and one output layer. Word2Vec uses two architectures: CBOW and skip-gram.
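As a concrete illustration, a minimal sketch using the gensim 4.x API on a toy corpus (the corpus and parameter values are arbitrary assumptions):

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["language", "models", "learn", "word", "context"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
]

# sg=0 trains CBOW, sg=1 trains skip-gram (the two Word2Vec architectures).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

vector = model.wv["word"]                        # a 100-dimensional embedding
similar = model.wv.most_similar("word", topn=3)  # nearest neighbours in vector space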
CBOW – Continuous Bag of Words
• The CBOW model predicts the current word given the context words within a specific window. The input layer contains the context words and the output layer contains the current word.
• The hidden layer's size is the number of dimensions in which we want to represent the current word present at the output layer.
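A minimal PyTorch sketch of this architecture (the vocabulary size, embedding dimension, and window are illustrative assumptions):

import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # hidden layer: embedding dimensions
        self.out = nn.Linear(embed_dim, vocab_size)       # output layer: score for every word

    def forward(self, context_ids):                   # (batch, 2 * window) context word IDs
        hidden = self.embed(context_ids).mean(dim=1)  # average the context embeddings
        return self.out(hidden)                       # logits for the current (center) word

model = CBOW(vocab_size=5000, embed_dim=100)
context = torch.randint(0, 5000, (8, 4))  # batch of 8, window of 2 on each side
loss = nn.functional.cross_entropy(model(context), torch.randint(0, 5000, (8,)))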
Skip-Gram – Word Embeddings
• Skip-gram predicts the surrounding context words within a specific window, given the current word. The input layer contains the current word and the output layer contains the context words.
• The hidden layer's size is the number of dimensions in which we want to represent the current word present at the input layer.
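Where CBOW averages the context to predict the center word, skip-gram trains on (center, context) pairs. A small sketch of how such pairs can be generated (the function name and window size are illustrative):

def skipgram_pairs(tokens, window=2):
    """Yield (center word, context word) training pairs for skip-gram."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

sentence = ["the", "quick", "brown", "fox", "jumps"]
pairs = list(skipgram_pairs(sentence))
# "brown" yields ("brown", "the"), ("brown", "quick"), ("brown", "fox"), ("brown", "jumps")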
Advanced Language Models and Transformers
ELMo | ULMFiT | BERT | Transformer
Transformer
BERT Architecture
• The Transformer is an attention-based architecture for NLP.
• The Transformer is composed of two parts: an encoding component and a decoding component.
• BERT is a multi-layer bidirectional Transformer encoder.
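For instance, a pre-trained BERT encoder can be loaded in a few lines (a sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint):

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # 12-layer bidirectional encoder

inputs = tokenizer("Language models are changing NLP.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768): one contextual vector per token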
Attention Mechanism
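At its core is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal PyTorch sketch (tensor shapes are illustrative):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # each query scored against every key
    weights = F.softmax(scores, dim=-1)            # normalized attention weights
    return weights @ V                             # weighted sum of the values

Q = K = V = torch.randn(1, 6, 64)  # batch of 1, 6 tokens, 64-dim representations
out = scaled_dot_product_attention(Q, K, V)  # (1, 6, 64)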
BERT vs GPT: BERT uses the Transformer's encoder and attends to context in both directions, while GPT uses the decoder and attends left-to-right only.
Q&A Session
Contact our team if you have any further questions after this webinar:
Hirdey Vikram | Hirdey.vikram@netwebindia.com | India (North)
Niraj | niraj@netwebindia.com | India (South)
Vivek | vivek@netwebindia.com | India (East)
Navin | navin@netwebindia.com | India (West)
Anupriya | anupriya@netwebtech.com | Singapore
Arun | arun@netwebtech.com | UAE
Agam | agam@netwebtech.com | Indonesia
Talk to our AI Experts: ai@netwebtech.com