Efficient Parallel Learning of Word2Vec
Jeroen B. P. Vuurens (1), Carsten Eickhoff (2), and Arjen P. de Vries (3)
(1) The Hague University of Applied Sciences
(2) ETH Zurich
(3) Radboud University Nijmegen
June 24, 2016
J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 1 / 14
Word2Vec
Simple method for low-dimensional feature representation of words
Beneficial properties:
Unsupervised
Semantics-preserving (up to a point...)
Recently very popular
Figure courtesy of T. Mikolov et al.
More is more...
Figure courtesy of http://deepdist.com/
Parallel Training
Shared model θ
Parallel SGD threads
Draw a random training example $x_i$
Acquire a lock on θ
Read θ
Update $\theta \leftarrow \theta - \alpha \nabla_\theta L(f_\theta(x_i), y_i)$
Release lock
Lots of waiting...
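The locked scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation benchmarked in the talk: the least-squares model, the function name `locked_sgd`, and all parameter values are assumptions made for the example. Note that CPython's global interpreter lock already serializes threads, so the sketch shows the control flow rather than real contention.

```python
import random
import threading


def locked_sgd(data, dim, n_threads=4, steps_per_thread=2000, alpha=0.1):
    """Parallel SGD on a shared least-squares model guarded by one lock.

    data is a list of (x, y) pairs with x a list of dim floats.
    The loss is assumed to be L = 0.5 * (theta . x - y)^2.
    """
    theta = [0.0] * dim        # shared model
    lock = threading.Lock()    # one lock for the whole model

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps_per_thread):
            x, y = rng.choice(data)            # draw a random training example
            with lock:                         # acquire; released when the block ends
                pred = sum(t * v for t, v in zip(theta, x))   # read theta
                err = pred - y
                for j in range(dim):           # theta <- theta - alpha * gradient
                    theta[j] -= alpha * err * x[j]

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return theta
```

Every step holds the lock for the full read-update cycle, so at most one thread makes progress at a time, which is the waiting the slide refers to.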
Hogwild!
Simply skip the locking:
Draw a random training example $x_i$
Read current state of θ
Update $\theta \leftarrow \theta - \alpha \nabla_\theta L(f_\theta(x_i), y_i)$
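A lock-free variant might look like the sketch below. It is an illustration under stated assumptions, not the code from the talk: the sparse least-squares model, the name `hogwild_sgd`, and the parameter values are all hypothetical. Each example touches only a few coordinates, which is the setting in which unsynchronized updates rarely collide.

```python
import random
import threading


def hogwild_sgd(examples, dim, n_threads=4, steps_per_thread=3000, alpha=0.1):
    """Lock-free parallel SGD on a shared sparse least-squares model.

    examples is a list of (x, y) pairs where x maps feature index -> value.
    """
    theta = [0.0] * dim   # shared model, deliberately unprotected

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps_per_thread):
            x, y = rng.choice(examples)        # draw a random training example
            # Read the current state; other threads may be writing concurrently.
            pred = sum(theta[j] * v for j, v in x.items())
            err = pred - y
            for j, v in x.items():             # write only the touched coordinates
                theta[j] -= alpha * err * v

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return theta
```

Occasional lost or stale updates are tolerated; the bet is that they are rare enough not to hurt convergence when different examples mostly touch different parameters.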
Parallel Word2Vec
Intel Xeon CPU E5-2698 v3, 32 cores
Original C implementation + Gensim
Hierarchical Softmax
Binary Huffman tree
V − 1 internal nodes (V = vocabulary size)
Each word w is represented by a sequence of binary decisions along its path from the root
The tree’s top nodes are part of most paths
Figure courtesy of X. Rong
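The points on this slide can be made concrete with a small sketch. It builds a binary Huffman tree from word counts, records each word's root-to-leaf path, and multiplies one sigmoid per decision to obtain the word's probability. The function names, the toy counts in the usage note, and the convention that bit 0 means "sigmoid branch" are assumptions for illustration only.

```python
import heapq
import itertools
import math


def build_huffman_paths(counts):
    """Return ({word: [(inner_node_id, bit), ...]}, number_of_inner_nodes)."""
    order = itertools.count()   # tie-breaker so heap entries never compare nodes
    heap = [(c, next(order), ("leaf", w)) for w, c in counts.items()]
    heapq.heapify(heap)
    n_inner = 0
    while len(heap) > 1:
        c0, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        c1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c0 + c1, next(order), ("inner", n_inner, left, right)))
        n_inner += 1
    paths, stack = {}, [(heap[0][2], [])]
    while stack:                             # walk the tree, collecting paths
        node, path = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, node_id, left, right = node
            stack.append((left, path + [(node_id, 0)]))
            stack.append((right, path + [(node_id, 1)]))
    return paths, n_inner


def word_probability(path, hidden, node_vectors):
    """Product of one binary decision per inner node on the word's path."""
    p = 1.0
    for node_id, bit in path:
        dot = sum(h * v for h, v in zip(hidden, node_vectors[node_id]))
        branch = 1.0 / (1.0 + math.exp(-dot))
        p *= branch if bit == 0 else 1.0 - branch
    return p
```

With counts such as `{"the": 50, "of": 30, "cat": 10, "sat": 6, "zygote": 1}`, the frequent word gets the shortest path, the probabilities over the vocabulary sum to one, and every path starts at the same root node, which is why the top nodes are updated by almost every training step.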
Zipf’s Law
Figure courtesy of http://wugology.com/
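To get a feel for how Zipf's law concentrates work on a handful of tree nodes, the sketch below estimates what share of all inner-node visits falls on the c most visited nodes of a Huffman tree. It assumes an idealized Zipfian vocabulary (frequency proportional to 1/rank) and treats "top nodes" as the most visited ones; the function name and both assumptions are illustrative and not taken from the talk.

```python
import heapq


def top_node_share(vocab_size, c):
    """Share of inner-node visits that land on the c most visited inner nodes."""
    # Idealized Zipf's law: the word of rank r has relative frequency 1 / r.
    heap = [1.0 / rank for rank in range(1, vocab_size + 1)]
    heapq.heapify(heap)
    visits = []
    while len(heap) > 1:
        # Huffman step: merge the two least frequent subtrees. The new inner
        # node is visited once for every occurrence of any word beneath it.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        visits.append(merged)
        heapq.heappush(heap, merged)
    visits.sort(reverse=True)
    return sum(visits[:c]) / sum(visits)
```

Comparing, say, `top_node_share(100000, 31)` with `31 / 99999` (the share those nodes would receive if visits were spread evenly) shows how skewed the access pattern is under these assumptions.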
Cached Huffman Trees
Cache the top c nodes in the tree
Each thread works on its own, possibly stale, copy of these top nodes
Update the cache every u terms
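One way to picture the caching scheme is the sketch below. It is a guess at the mechanics, not the authors' Cython code: the class `NodeCache`, the use of scalar weights instead of vectors, the convention that the top c nodes have indices 0 to c − 1, and the write-back of accumulated deltas are all assumptions made for illustration.

```python
class NodeCache:
    """Per-thread copy of the top c entries of a shared weight list (hypothetical)."""

    def __init__(self, shared, c, u):
        self.shared = shared   # weights shared by all threads
        self.c = c             # number of cached top nodes
        self.u = u             # synchronize after this many terms
        self.seen = 0
        self._refresh()

    def _refresh(self):
        # Snapshot the shared values; work continues on a private copy.
        self.base = list(self.shared[:self.c])
        self.local = list(self.base)

    def add(self, node, delta):
        if node < self.c:
            self.local[node] += delta    # hot node: no shared-memory write
        else:
            self.shared[node] += delta   # cold node: direct, lock-free update

    def term_done(self):
        self.seen += 1
        if self.seen % self.u == 0:
            self.flush()

    def flush(self):
        # Write back only what this thread changed, then take a fresh snapshot.
        for i in range(self.c):
            self.shared[i] += self.local[i] - self.base[i]
        self._refresh()
```

A worker thread would own one `NodeCache`, call `add` for every node on a word's path, and call `term_done` after each term. Between flushes the private copy can lag behind other threads' updates, which is the staleness the slide mentions.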
Efficiency
Python/Cython implementation of cached Huffman trees
Same problem at c = 0
Significantly better performance at c = 31
Cache Size
Consistent improvements for all c ≤ 31
Best results for 1 ≤ u ≤ 10
Overly large values of u degrade model quality
Effectiveness
Stable model quality
Slight quality edge for Gensim implementation
Conclusion
Hierarchical Softmax scales badly beyond 4-8 nodes
Frequent memory accesses to top nodes
Zipf’s Law
Caching few top nodes
4x speed-up
Constant model quality
Try it yourself: http://cythnn.github.io
Thank You!
j.b.p.vuurens@tudelft.nl
Efficient Parallel Learning of Word2Vec

  • 1. Efficient Parallel Learning of Word2Vec Jeroen B. P. Vuurens1, Carsten Eickhoff2, and Arjen P. de Vries3 1The Hague University of Applied Science 2ETH Zurich 3Radboud University Nijmegen June 24, 2016 J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 1 / 14
  • 2. Word2Vec Figure courtesy of T. Mikolov et al. J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 2 / 14
  • 3. Word2Vec Simple method for low-dimensional feature representation of words Figure courtesy of T. Mikolov et al. J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 2 / 14
  • 4. Word2Vec Simple method for low-dimensional feature representation of words Beneficial properties: Unsupervised Semantics-preserving (up to a point. . . ) Figure courtesy of T. Mikolov et al. J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 2 / 14
  • 5. Word2Vec Simple method for low-dimensional feature representation of words Beneficial properties: Unsupervised Semantics-preserving (up to a point. . . ) Recently very popular Figure courtesy of T. Mikolov et al. J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 2 / 14
  • 6. More is more. . . Figure courtesy of http://deepdist.com/ J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 3 / 14
  • 7. Parallel Training Shared model θ J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 8. Parallel Training Shared model θ Parallel SGD threads J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 9. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 10. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi Acquire a lock on θ J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 11. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi Acquire a lock on θ Read θ J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 12. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi Acquire a lock on θ Read θ Update θ ← (θ − α L(fθ(xi ), yi )) J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 13. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi Acquire a lock on θ Read θ Update θ ← (θ − α L(fθ(xi ), yi )) Release lock J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 14. Parallel Training Shared model θ Parallel SGD threads Draw a random training example xi Acquire a lock on θ Read θ Update θ ← (θ − α L(fθ(xi ), yi )) Release lock Lots of waiting. . . J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 4 / 14
  • 15. Hogwild! Simply skip the locking: J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 5 / 14
  • 16. Hogwild! Simply skip the locking: J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 5 / 14
  • 17. Hogwild! Simply skip the locking: Draw a random training example xi J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 5 / 14
  • 18. Hogwild! Simply skip the locking: Draw a random training example xi Read current state of θ J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 5 / 14
  • 19. Hogwild! Simply skip the locking: Draw a random training example xi Read current state of θ Update θ ← (θ − α L(fθ(xi ), yi )) J. Vuurens et al. Efficient Parallel Learning of Word2Vec June 24, 2016 5 / 14
  • 20. Hogwild! Simply skip the locking: draw a random training example x_i; read current state of θ; update θ ← θ − α ∇L(f_θ(x_i), y_i).
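A lock-free variant of the same loop might look like the sketch below. The setup is assumed for illustration only: a small weight vector, toy examples that each touch a single coordinate (sparse updates are the setting in which Hogwild!-style training is usually argued to work), and hypothetical names such as `hogwild_sgd`. In CPython the global interpreter lock serialises bytecode execution, so this shows the control flow rather than a real speed-up.

```python
import random
import threading

TRUE_WEIGHTS = [1.0, -2.0, 0.5, 3.0]   # assumed toy ground truth

def make_data():
    # Each example touches one coordinate j only: y = TRUE_WEIGHTS[j] * x.
    return [(j, x / 10.0, TRUE_WEIGHTS[j] * x / 10.0)
            for j in range(len(TRUE_WEIGHTS)) for x in range(1, 11)]

def hogwild_sgd(data, dim, n_threads=4, steps_per_thread=5000, alpha=0.05, seed=0):
    theta = [0.0] * dim   # shared model; no lock is created anywhere

    def worker(worker_seed):
        rng = random.Random(worker_seed)
        for _ in range(steps_per_thread):
            j, x, y = rng.choice(data)        # draw a random training example
            current = theta[j]                # read current (possibly stale) state
            # Unsynchronised write: another thread may have changed theta[j]
            # between the read above and this assignment.
            theta[j] = current - alpha * (current * x - y) * x

    threads = [threading.Thread(target=worker, args=(seed + k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return theta

theta_hogwild = hogwild_sgd(make_data(), dim=len(TRUE_WEIGHTS))
```

Occasional overwritten updates are tolerated instead of prevented; on this toy problem the weights still end up close to `TRUE_WEIGHTS`.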
  • 24. Parallel Word2Vec: Intel Xeon CPU E5-2698 v3, 32 cores; original C implementation + Gensim.
  • 29. Hierarchical Softmax: binary Huffman tree; V − 1 internal nodes; each word w is represented by a number of binary decisions; the tree’s top nodes are part of most paths. (Figure courtesy of X. Rong.)
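To make the last point concrete, the sketch below builds a binary Huffman tree over a toy vocabulary and counts how often each internal node lies on a root-to-word path. The vocabulary, its roughly Zipfian counts, and the helper name `build_huffman_paths` are assumptions for illustration; the code records which internal nodes a path visits, not the left/right decision taken at each one.

```python
import heapq
from collections import Counter

def build_huffman_paths(counts):
    """Return ({word: internal node ids from root to leaf}, number of internal nodes)."""
    words = list(counts)
    vocab_size = len(words)
    # Leaves get ids 0..V-1; internal nodes get ids V, V+1, ... as they are created.
    # Heap entries are (count, tie_breaker, node_id); the tie breaker is unique.
    heap = [(counts[w], i, i) for i, w in enumerate(words)]
    heapq.heapify(heap)
    parent = {}
    next_id = vocab_size
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)    # merge the two least frequent nodes
        c2, _, b = heapq.heappop(heap)
        parent[a] = next_id
        parent[b] = next_id
        heapq.heappush(heap, (c1 + c2, next_id, next_id))
        next_id += 1
    paths = {}
    for i, w in enumerate(words):
        path, node = [], i
        while node in parent:             # walk up from the leaf to the root
            node = parent[node]
            path.append(node)
        paths[w] = path[::-1]             # store root first
    return paths, next_id - vocab_size

# Toy vocabulary with roughly Zipfian counts (count proportional to 1 / rank).
counts = {f"w{r}": 1000 // r for r in range(1, 51)}
paths, n_internal = build_huffman_paths(counts)

# Frequency-weighted number of times each internal node is visited in training.
visits = Counter()
for word, path in paths.items():
    for node in path:
        visits[node] += counts[word]

root = 2 * len(counts) - 2                # the last internal node created
total_tokens = sum(counts.values())
```

The root is visited once per training token, and the nodes directly below it nearly as often, so these few weight vectors are read and written by every thread almost all the time.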
  • 30. Zipf’s Law (figure courtesy of http://wugology.com/)
  • 34. Cached Huffman Trees: cache the top c nodes in the tree; every thread works on its own stale copy of these top nodes; update cache every u terms.
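One possible reading of this caching scheme is sketched below. The slides do not say how a thread's cached copy is reconciled with the shared model, so the write-back rule here (add the locally accumulated change to the shared weights, then refresh the copy) is an assumption, as are the class name `TopNodeCache`, the use of one float per node instead of a weight vector, and the single-threaded demonstration.

```python
class TopNodeCache:
    """Per-thread cache of the top c Huffman nodes (hypothetical sketch)."""

    def __init__(self, shared, top_ids, u):
        self.shared = shared              # shared node weights, one float per node
        self.top_ids = list(top_ids)      # ids of the c cached top nodes
        self.u = u                        # synchronise after every u terms
        self.seen = 0
        self._refresh()

    def _refresh(self):
        # Take a private snapshot of the top nodes and remember its starting point.
        self.local = {n: self.shared[n] for n in self.top_ids}
        self.base = dict(self.local)

    def read(self, node):
        if node in self.local:
            return self.local[node]       # possibly stale private copy
        return self.shared[node]          # uncached nodes are read directly

    def add(self, node, delta):
        if node in self.local:
            self.local[node] += delta     # update stays thread-local for now
        else:
            self.shared[node] += delta    # uncached nodes are updated in place

    def end_of_term(self):
        self.seen += 1
        if self.seen % self.u == 0:
            self.sync()

    def sync(self):
        # Assumed rule: push the accumulated change, then take a fresh snapshot.
        for n in self.top_ids:
            self.shared[n] += self.local[n] - self.base[n]
        self._refresh()

# Single-threaded demonstration with c = 2 cached nodes and u = 3 terms.
shared = [0.0] * 8
cache = TopNodeCache(shared, top_ids=[7, 6], u=3)
cache.add(7, 1.0)
cache.end_of_term()
cache.add(7, 1.0)
cache.end_of_term()
shared_before_sync = shared[7]            # cached updates are not visible yet
cache.add(7, 1.0)
cache.add(2, 0.5)                         # node 2 is not cached
cache.end_of_term()                       # third term triggers the write-back
```

With c = 0 nothing is cached and every update goes straight to the shared weights, which corresponds to the uncached behaviour.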
  • 38. Efficiency: Python/Cython implementation of cached Huffman trees; same problem at c = 0; significantly better performance at c = 31.
  • 42. Cache Size: consistent improvements for all c ≤ 31; best results for 1 ≤ u ≤ 10; overly large values of u degrade model quality.
  • 45. Effectiveness: stable model quality; slight quality edge for Gensim implementation.
  • 52. Conclusion: Hierarchical Softmax scales badly beyond 4-8 nodes; frequent memory accesses to top nodes; Zipf’s Law; caching few top nodes; 4x speed-up; constant model quality. Try it yourself: http://cythnn.github.io
  • 54. Thank You! j.b.p.vuurens@tudelft.nl