Feature Talk: Real-time Aggregations, Approximations, Similarities, and Recommendations at Scale using Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird
Talk Abstract: Starting with a live, interactive demo that generates audience-specific recommendations, we'll dive deep into each of the key components, including NiFi, Kafka, Stanford CoreNLP, Docker, Word2Vec, LDA, Twitter Algebird, and Spark Streaming, SQL, ML, and GraphX. As a bonus, we'll discuss the latest Netflix recommendations pipeline and related open source projects.
• Live, Interactive Recommendations Demo
• Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird (advancedspark.com)
• Types of Similarity
    • Euclidean vs. Non-Euclidean Similarity
    • Jaccard Similarity
    • Cosine Similarity
    • LogLikelihood Similarity
    • Edit Distance
• Text-based Similarities and Analytics
    • LDA Topic Extraction
• Similarity-based Recommendations
    • Content-based, Item-to-Item (Amazon)
    • Collaborative-based, User-to-Item (Netflix)
    • Graph-based, Item-to-Item "Pathways" (Spotify)
• Aggregations, Approximations, and Similarities at Scale
    • Twitter Algebird
    • MinHash and Bucketing
    • Locality Sensitive Hashing (LSH)
    • CountMin Sketch
• Q & A
Speaker Bio: Chris Fregly is a Research Engineer at Flux Capacitor AI in San Francisco, an Apache Spark Contributor, and a Netflix Open Source Committer.
Chris is also the founder of the global Advanced Apache Spark Meetup and the author of the upcoming book Advanced Spark (advancedspark.com).
Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.