NLP Supplement

NCUDSC

NLP

Word2vec by Google & GloVe by Stanford University
Word Embedding

Speech to Text: An overview
From traditional models to end-to-end models

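Listen, Attend and Spell (LAS) Model
• Can be understood as a combination of an acoustic model (AM) and a language model (LM)
• The AM is the Encoder in the figure, and the LM is the Decoder
• A multi-layer RNN encodes the audio signal into the model's hidden state vectors, which lets the model learn a multi-level understanding of the input signal
• The attention mechanism learns a weight for the input at each time step, so the model captures the relationship between each output and local regions of the input more effectively. The attention-weighted input forms a context vector that is passed to the Decoder, which then generates the probability distribution over the predicted characters
Unveiling the key technology behind the Google Cloud Speech API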
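
The attention step described in the LAS bullets can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scoring function is a plain dot product chosen for brevity, the names `attend`, `decoder_state`, and `encoder_states` are hypothetical, and the numbers are made-up toy values rather than real audio features.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """One attention step: score every encoder time step, then average.

    Assumption: scores are plain dot products between the decoder state
    and each encoder hidden state.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, h_t))
        for h_t in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted sum of the encoder hidden states.
    context = [
        sum(w * h_t[i] for w, h_t in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Toy values (made up): three encoder time steps, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print("attention weights:", weights)
print("context vector:", context)
```

In a full model the Decoder would consume `context` at every output step to produce the distribution over characters; that part is omitted here.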

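Bidirectional Encoder Representations from Transformers (BERT)
• BERT-LARGE: 24 layers, 1024 hidden dimensions, 16 self-attention heads, and 340M parameters; trained on Wikipedia and BooksCorpus, about 3.3 billion words in total
An architecture that has been almost everywhere in NLP in recent years
And, of course, the pride of Google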
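
As a rough sanity check on the 340M figure, the parameter count can be estimated from the layer count and hidden size quoted above. The sketch below is a back-of-the-envelope calculation, not an official breakdown: the vocabulary size (30,522), maximum sequence length (512), two segment types, a feed-forward width of four times the hidden size, and the final pooler layer are all assumptions that do not appear on the slide.

```python
# Values quoted on the slide for BERT-LARGE.
num_layers = 24
hidden = 1024

# Assumptions not stated on the slide.
vocab_size = 30522
max_positions = 512
segment_types = 2
ffn_hidden = 4 * hidden

def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # one scale and one shift vector

# Token, position, and segment embedding tables, followed by a layer norm.
embeddings = (vocab_size + max_positions + segment_types) * hidden + layer_norm
# Query, key, value, and output projections, followed by a layer norm.
attention = 4 * linear(hidden, hidden) + layer_norm
# Two dense layers (expand, then project back), followed by a layer norm.
feed_forward = linear(hidden, ffn_hidden) + linear(ffn_hidden, hidden) + layer_norm
# Assumed pooling layer on top of the first token.
pooler = linear(hidden, hidden)

total = embeddings + num_layers * (attention + feed_forward) + pooler
print(f"about {total / 1e6:.0f}M parameters")
```

Under these assumptions the estimate lands at roughly 335M, close to the 340M quoted on the slide. The number of attention heads does not appear in the calculation because splitting the hidden dimension into 16 heads does not change the size of the projection matrices.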

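Very Large-Scale Language Models (GPT-3 & MT-NLG)
The tech giants' arms race
● Generative Pre-trained Transformer 3: developed by OpenAI; good at generating natural-language text that humans can understand
● Megatron-Turing Natural Language Generation: jointly developed by Microsoft and Nvidia and announced in October of this year (2021); it is currently the largest natural language model in the world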

GPT-3 Use Case: GitHub Copilot
Feel the power of natural language models!
● GitHub Copilot is a smart tool built jointly by Microsoft, GitHub, and OpenAI to improve development efficiency

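College student uses GPT-3 to write a fake article that tops Hacker News
However, powerful natural language models can also fuel the spread of misinformation
● Liam Porr, a CS undergraduate at UC Berkeley, used GPT-3 to generate an article titled "Feeling Unproductive? Maybe you should stop overthinking"; within a few hours it drew considerable traffic and reached the top of the popular ranking
● Alarming as this may seem, machine-generated text still has many flaws when an article requires rigorous logical argument, so people who write papers are unlikely to be replaced, though the same cannot be said for some media outlets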

Like, subscribe, turn on the notification bell, and remember to fill in the form!
Contact Us

National Central University
