SlideShare a Scribd company logo
Approximate nearest
neighbors & vector
models
I’m Erik
• @fulhack
• Author of Annoy, Luigi
• Currently CTO of Better
• Previously 5 years at Spotify
What’s nearest
neighbor(s)
• Let’s say you have a bunch of points
Grab a bunch of points
5 nearest neighbors
20 nearest neighbors
100 nearest neighbors
…But what’s the point?
• vector models are everywhere
• lots of applications (language processing,
recommender systems, computer vision)
MNIST example
• 28x28 = 784-dimensional dataset
• Define distance in terms of pixels:
MNIST neighbors
…Much better approach
1. Start with high dimensional data
2. Run dimensionality reduction to 10-1000 dims
3. Do stuff in a small dimensional space
Deep learning for food
• Deep model trained on a GPU on 6M random pics
downloaded from Yelp
156x156x32
154x154x32
152x152x32
76x76x64
74x74x64
72x72x64
36x36x128
34x34x128
32x32x128
16x16x256
14x14x256
12x12x256
6x6x512
4x4x512
2x2x512
2048
2048
128
1244
3x3 convolutions
2x2 maxpool
fully
connected
with dropout
bottleneck
layer
Distance in smaller space
1. Run image through the network
2. Use the 128-dimensional bottleneck layer as an
item vector
3. Use cosine distance in the reduced space
Nearest food pics
Vector methods for text
• TF-IDF (old) – no dimensionality reduction
• Latent Semantic Analysis (1988)
• Probabilistic Latent Semantic Analysis (2000)
• Semantic Hashing (2007)
• word2vec (2013), RNN, LSTM, …
Represent documents and/or
words as f-dimensional vector
Latentfactor1
Latent factor 2
banana
apple
boat
Vector methods for
collaborative filtering
• Supervised methods: See everything from the
Netflix Prize
• Unsupervised: Use NLP methods
CF vectors – examplesIPMF item item:
P(i ! j) = exp(bT
j bi)/Zi =
exp(bT
j bi)
P
k exp(bT
k bi)
VECTORS:
pui = aT
u bi
simij = cos(bi, bj) =
bT
i bj
|bi||bj|
O(f)
i j simi,j
2pac 2pac 1.0
2pac Notorious B.I.G. 0.91
2pac Dr. Dre 0.87
2pac Florence + the Machine 0.26
Florence + the Machine Lana Del Rey 0.81
IPMF item item MDS:
P(i ! j) = exp(bT
j bi)/Zi =
exp( |bj bi|
2
)
P
k exp( |bk bi|
2
)
simij = |bj bi|
2
(u, i, count)
@L
Geospatial indexing
• Ping the world: https://github.com/erikbern/ping
• k-NN regression using Annoy
Nearest neighbors the
brute force way
• we can always do an exhaustive search to find the
nearest neighbors
• imagine MySQL doing a linear scan for every
query…
Using word2vec’s brute
force search
$ time echo -e "chinese rivernEXITn" | ./distance GoogleNews-
vectors-negative300.bin	
!
Qiantang_River		 0.597229	
Yangtse		 0.587990	
Yangtze_River		 0.576738	
lake		 0.567611	
rivers		 0.567264	
creek		 0.567135	
Mekong_river		 0.550916	
Xiangjiang_River		 0.550451	
Beas_river		 0.549198	
Minjiang_River		 0.548721	
real	2m34.346s	
user	1m36.235s	
sys	0m16.362s
Introducing Annoy
• https://github.com/spotify/annoy
• mmap-based ANN library
• Written in C++, with Python and R bindings
• 585 stars on Github
Using Annoy’s search
$ time echo -e "chinese rivernEXITn" | python
nearest_neighbors.py ~/tmp/word2vec/GoogleNews-vectors-
negative300.bin 100000	
Yangtse	 0.907756	
Yangtze_River	 0.920067	
rivers	 0.930308	
creek	 0.930447	
Mekong_river	 0.947718	
Huangpu_River	 0.951850	
Ganges	 0.959261	
Thu_Bon	 0.960545	
Yangtze	 0.966199	
Yangtze_river	 0.978978	
real	0m0.470s	
user	0m0.285s	
sys	0m0.162s
Using Annoy’s search
$ time echo -e "chinese rivernEXITn" | python
nearest_neighbors.py ~/tmp/word2vec/GoogleNews-vectors-
negative300.bin 1000000	
Qiantang_River	 0.897519	
Yangtse	 0.907756	
Yangtze_River	 0.920067	
lake	 0.929934	
rivers	 0.930308	
creek	 0.930447	
Mekong_river	 0.947718	
Xiangjiang_River	 0.948208	
Beas_river	 0.949528	
Minjiang_River	 0.950031	
real	0m2.013s	
user	0m1.386s	
sys	0m0.614s
(performance)
1. Building an Annoy
index
Start with the point set
Split it in two halves
Split again
Again…
…more iterations later
Side note: making trees
small
• Split until K items in each leaf (K~100)
• Takes (n/K) memory instead of n
Binary tree
2. Searching
Nearest neighbors
Searching the tree
Problemo
• The point that’s the closest isn’t necessarily in the
same leaf of the binary tree
• Two points that are really close may end up on
different sides of a split
• Solution: go to both sides of a split if it’s close
Trick 1: Priority queue
• Traverse the tree using a priority queue
• sort by min(margin) for the path from the root
Trick 2: many trees
• Construct trees randomly many times
• Use the same priority queue to search all of them at
the same time
heap + forest = best
• Since we use a priority queue, we will dive down
the best splits with the biggest distance
• More trees always helps!
• Only constraint is more trees require more RAM
Annoy query structure
1. Use priority queue to search all trees until we’ve
found k items
2. Take union and remove duplicates (a lot)
3. Compute distance for remaining items
4. Return the nearest n items
Find candidates
Take union of all leaves
Compute distances
Return nearest neighbors
“Curse of dimensionality”
Are we screwed?
• Would be nice if the data is has a much smaller
“intrinsic dimension”!
Improving the algorithm
Queries/s
1-NN accuracy
more accurate
faster
• https://github.com/erikbern/ann-benchmarks
ann-benchmarks
perf/accuracy tradeoffs
Queries/s
1-NN accuracy
search more nodes
more trees
Things that work
• Smarter plane splitting
• Priority queue heuristics
• Search more nodes than number of results
• Align nodes closer together
Things that don’t work
• Use lower-precision arithmetic
• Priority queue by other heuristics (number of trees)
• Precompute vector norms
Things for the future
• Use a optimization scheme for tree building
• Add more distance functions (eg. edit distance)
• Use a proper KV store as a backend (eg. LMDB) to
support incremental adds, out-of-core, arbitrary
keys: https://github.com/Houzz/annoy2
Thanks!
• https://github.com/spotify/annoy
• https://github.com/erikbern/ann-benchmarks
• https://github.com/erikbern/ann-presentation
• erikbern.com
• @fulhack

More Related Content

What's hot

CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
Vidhya Murali
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
Alluxio, Inc.
 
Bloom filters
Bloom filtersBloom filters
Bloom filters
Devesh Maru
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Emanuele Chioso
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
Erik Bernhardsson
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
Chris Johnson
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
Chris Johnson
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
Aleksandr Kuboskin, CFA
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Understanding Feature Space in Machine Learning
Understanding Feature Space in Machine LearningUnderstanding Feature Space in Machine Learning
Understanding Feature Space in Machine Learning
Alice Zheng
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with Python
Timothy Spann
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
Neo4j
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
Martin Traverso
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
leopauly
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
Kai Wähner
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
DataWorks Summit
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
OVHcloud
 

What's hot (20)

CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
 
Bloom filters
Bloom filtersBloom filters
Bloom filters
 
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...Recsys Challenge 2018 - Creamy Fireflies -  Artist-driven layering and user’s...
Recsys Challenge 2018 - Creamy Fireflies - Artist-driven layering and user’s...
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Understanding Feature Space in Machine Learning
Understanding Feature Space in Machine LearningUnderstanding Feature Space in Machine Learning
Understanding Feature Space in Machine Learning
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with Python
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 

Viewers also liked

김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기
Harns (Nak-Hyoung) Kim
 
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David StelzerDeveloping Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
Jessica Tams
 
영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출
동윤 이
 
Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4
Huey Park
 
NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지
Daehoon Han
 
게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지
Harns (Nak-Hyoung) Kim
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4
동석 김
 
Deep learning as_WaveExtractor
Deep learning as_WaveExtractorDeep learning as_WaveExtractor
Deep learning as_WaveExtractor
동윤 이
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
Erik Bernhardsson
 
Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러
Heungsub Lee
 
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012Esun Kim
 
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
Sumin Byeon
 
Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발
Chris Ohk
 
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
Imseong Kang
 
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
Lee Dustin
 
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
Sumin Byeon
 
Docker
DockerDocker
Docker
Huey Park
 
Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)
Esun Kim
 
8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈
Harns (Nak-Hyoung) Kim
 
버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션
동석 김
 

Viewers also liked (20)

김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기김병관 성공캠프 SNS팀 자원봉사 후기
김병관 성공캠프 SNS팀 자원봉사 후기
 
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David StelzerDeveloping Success in Mobile with Unreal Engine 4 | David Stelzer
Developing Success in Mobile with Unreal Engine 4 | David Stelzer
 
영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출영상 데이터의 처리와 정보의 추출
영상 데이터의 처리와 정보의 추출
 
Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4Behavior Tree in Unreal engine 4
Behavior Tree in Unreal engine 4
 
NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지NDC16 스매싱더배틀 1년간의 개발일지
NDC16 스매싱더배틀 1년간의 개발일지
 
게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지게임회사 취업을 위한 현실적인 전략 3가지
게임회사 취업을 위한 현실적인 전략 3가지
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4
 
Deep learning as_WaveExtractor
Deep learning as_WaveExtractorDeep learning as_WaveExtractor
Deep learning as_WaveExtractor
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러Profiling - 실시간 대화식 프로파일러
Profiling - 실시간 대화식 프로파일러
 
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
자동화된 소스 분석, 처리, 검증을 통한 소스의 불필요한 #if - #endif 제거하기 NDC2012
 
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
[야생의 땅: 듀랑고] 지형 관리 완전 자동화 - 생생한 AWS와 Docker 체험기
 
Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발Re:Zero부터 시작하지 않는 오픈소스 개발
Re:Zero부터 시작하지 않는 오픈소스 개발
 
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
NDC17 게임 디자이너 커리어 포스트모템: 8년, 3개의 회사, 4개의 게임
 
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다레퍼런스만 알면 언리얼 엔진이 제대로 보인다
레퍼런스만 알면 언리얼 엔진이 제대로 보인다
 
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
PyCon 2017 프로그래머가 이사하는 법 2 [천원경매]
 
Docker
DockerDocker
Docker
 
Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)Online game server on Akka.NET (NDC2016)
Online game server on Akka.NET (NDC2016)
 
8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈8년동안 테라에서 배운 8가지 교훈
8년동안 테라에서 배운 8가지 교훈
 
버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션버텍스 셰이더로 하는 머리카락 애니메이션
버텍스 셰이더로 하는 머리카락 애니메이션
 

Similar to Approximate nearest neighbor methods and vector models – NYC ML meetup

Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better Mortgage
MLconf
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer Insight
MapR Technologies
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
MapR Technologies
 
Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25
MapR Technologies
 
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Matthew Lease
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Yan Xu
 
Deep learning trends
Deep learning trendsDeep learning trends
Deep learning trends
준호 김
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
huguk
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
Pranav Ainavolu
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
MapR Technologies
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Databricks
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspective
Marijn van Zelst
 
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Codemotion
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
Databricks
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Neelabha Pant
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
Zachary Thomas
 
Evolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancementsEvolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancements
Chitta Ranjan
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
Ted Dunning
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
Ted Dunning
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
ssuserf583ac
 

Similar to Approximate nearest neighbor methods and vector models – NYC ML meetup (20)

Erik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better Mortgage
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer Insight
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25
 
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 7: Data-Intensive Computing for Text Analysis (Fall 2011)
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Deep learning trends
Deep learning trendsDeep learning trends
Deep learning trends
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
 
Paris Data Geeks
Paris Data GeeksParis Data Geeks
Paris Data Geeks
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspective
 
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...Machine learning from a software engineer's perspective - Marijn van Zelst - ...
Machine learning from a software engineer's perspective - Marijn van Zelst - ...
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
 
Evolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancementsEvolution of Deep Learning and new advancements
Evolution of Deep Learning and new advancements
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 

Recently uploaded

Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
bijceesjournal
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
bjmsejournal
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
PIMR BHOPAL
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
mahaffeycheryld
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
Kamal Acharya
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 

Recently uploaded (20)

Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 

Approximate nearest neighbor methods and vector models – NYC ML meetup