ICML Talk on deep learning for music recommendation - Aloïs Gruson
We present a way to incorporate human knowledge in our acoustic model of music. We train a deep neural network to classify songs into playlists, and keep the generated embedding space to recommend songs and generate personalized radios. Try it for yourself: http://scarlett.fm
Anghami: From Billions Of Streams To Better Recommendations - Ramzi Karam
Anghami is the leading music streaming service in the MENA region. Our users listen to more than a century’s worth of music every single day, with an increasing number of streams coming from personalized recommendations. This talk presents a high-level overview of the Anghami recommendations infrastructure, from the input data to personalized song and playlist recommendations. We discuss the different types of machine learning models used and where they are helpful, as well as how we go from models to serving better recommendations to users.
Generating Natural-Language Text with Neural Networks - Jonathan Mugan
Automatic text generation enables computers to summarize text, to have conversations in customer-service and other settings, and to customize content based on the characteristics and goals of the human interlocutor. Using neural networks to automatically generate text is appealing because they can be trained through examples with no need to manually specify what should be said when. In this talk, we will provide an overview of the existing algorithms used in neural text generation, such as sequence2sequence models, reinforcement learning, variational methods, and generative adversarial networks. We will also discuss existing work that specifies how the content of generated text can be determined by manipulating a latent code. The talk will conclude with a discussion of current challenges and shortcomings of neural text generation.
Presented at the Machine Learning class at Chalmers, Gothenburg.
http://www.cse.chalmers.se/research/lab/courses.php?coid=9
Trying to connect their theoretical machine learning class with industry examples.
I give an overview of the current state of natural language analysis using machine learning algorithms. #naturallanguage
#machinelearning #artificialintelligence
Slides from a talk at a meetup organized by SF Scala at Spotify's San Francisco office. The slides present details of playlist recommendations at Spotify and how Spotify uses Scalding to develop robust and reliable pipelines to generate these recommendations.
Meetup details: http://www.meetup.com/SF-Scala/events/224430674/
7. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(Diagram: ML, Content, User)
8. Product Personalization
• Machine Learning does not trump a bad idea.
• Idea -> Data Driven Product Development -> ML
(More like design than coding)
(Diagram: ML, Content, User)
9. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
10. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(http://musicmachinery.com/2014/02/10/gender-specific-listening/)
11. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(http://musicmachinery.com/2014/02/13/age-specific-listening/)
12. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
• Manually tag attributes
• Curation
13. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(latimes.com)
14. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(https://research.google.com/bigpicture/music/)
15. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
(http://www.theverge.com/2012/3/18/2882372/netflix-recommended-genres-list)
16. Product Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
• Manually tag attributes
• Curation
• CF
18. Recommendation Systems
• Predict user response to options.
• Rich field: Matrix completion, ranking, text models,
latent factor models.
• Several conferences annually: RecSys, NIPS, ICML, etc.
• Industry researchers include NFLX, GOOG, MS and
more…
19. Similarity
Our problem is to figure out how similar two
items are.
Mathematically, this means modeling a function
Similarity(x,y) for all users and items, if possible.
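The slide does not say which function plays the role of Similarity(x,y). As one common choice, and purely as an assumption here, the sketch below uses cosine similarity between two vectors (for example, two tracks described by their per-user play counts). The function name and the zero-vector convention are illustrative.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


# Two hypothetical tracks, each described by play counts from the same three users.
track_a = [14, 2, 1]
track_b = [7, 1, 0]
print(cosine_similarity(track_a, track_b))
```

Cosine similarity ignores overall scale, so a heavily played track and a lightly played one can still count as similar when the same users play both.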
20. Collaborative Filtering
Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!
Model you based on songs you played…
Predict your future based on similar users…
Millions of users and billions of streams…
… so there is someone like you out there.
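The exchange above can be sketched as a tiny user-based collaborative filter. This is a minimal illustration, not the production method: the user names `alice` and `bob`, the Jaccard overlap used as user similarity, and the scoring rule are all assumptions made for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of liked tracks, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=5):
    """Rank tracks the target has not liked by the similarity of the users who like them."""
    mine = likes[target]
    scores = {}
    for user, tracks in likes.items():
        if user == target:
            continue
        weight = jaccard(mine, tracks)
        if weight == 0.0:
            continue  # users with nothing in common contribute nothing
        for track in tracks - mine:
            scores[track] = scores.get(track, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:top_n]]


# The two listeners from the slide, with hypothetical names.
likes = {
    "alice": {"P", "Q", "R", "S"},
    "bob": {"Q", "R", "S", "T"},
}
print(recommend("alice", likes))  # ['T']
print(recommend("bob", likes))    # ['P']
```

With millions of users, comparing every pair of users directly becomes expensive, which is one motivation for latent factor approaches such as matrix completion.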
23. Matrix Completion
Matrix Completion. A matrix expresses a system. We model the
data in the form of a matrix. For example, play counts for all songs
and all users could be:
\[
\text{Users}\left\{
\overbrace{
\begin{pmatrix}
s_{1,1} & s_{1,2} & 14 & \cdots & s_{1,n} \\
s_{2,1} & s_{2,2} & 2 & \cdots & s_{2,n} \\
\vdots & \vdots & \vdots & & \vdots \\
s_{m,1} & s_{m,2} & 1 & \cdots & s_{m,n}
\end{pmatrix}
}^{\text{Song Plays}}
\right.
\approx
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
\begin{pmatrix} t_1 & t_2 & \cdots & t_n \end{pmatrix}
\]
Labels on the matrix: Call Me Maybe (song), Esh (user).
Esh listened to Call Me Maybe once…
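As a small illustration of modeling play counts as a users-by-songs matrix, the sketch below builds one from (user, song, count) records. The record format, the function name, the user names `u1` and `u2`, and the extra song title are assumptions for the example; only the counts 14, 2 and 1 echo the slide.

```python
def build_play_matrix(plays):
    """Return (users, songs, matrix) where matrix[i][j] is how often users[i] played songs[j]."""
    users = sorted({user for user, _, _ in plays})
    songs = sorted({song for _, song, _ in plays})
    user_index = {user: i for i, user in enumerate(users)}
    song_index = {song: j for j, song in enumerate(songs)}
    matrix = [[0] * len(songs) for _ in users]
    for user, song, count in plays:
        matrix[user_index[user]][song_index[song]] += count
    return users, songs, matrix


# Hypothetical play log.
plays = [
    ("u1", "Call Me Maybe", 14),
    ("u2", "Call Me Maybe", 2),
    ("esh", "Call Me Maybe", 1),
    ("u1", "Some Other Song", 3),
]
users, songs, matrix = build_play_matrix(plays)
for user, row in zip(users, matrix):
    print(user, row)
```

A zero here only means that no play was observed. At real scale most entries are unobserved, so a sparse representation would be used instead of nested lists.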
24. Matrix Completion is well studied …
Start with random vectors around the origin. Run alternating least
squares or gradient descent or stochastic gradient descent… All this
is Hadoopable™.
\[
\text{Users}\left\{
\overbrace{
\begin{pmatrix}
s_{1,1} & s_{1,2} & 14 & \cdots & s_{1,n} \\
s_{2,1} & s_{2,2} & 2 & \cdots & s_{2,n} \\
\vdots & \vdots & \vdots & & \vdots \\
s_{m,1} & s_{m,2} & 1 & \cdots & s_{m,n}
\end{pmatrix}
}^{\text{Song Plays}}
\right.
\approx
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
\begin{pmatrix} t_1 & t_2 & \cdots & t_n \end{pmatrix}
\]
Labels on the matrix: Call Me Maybe (song), Esh (user).
Esh listened to Call Me Maybe once…
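The slide names alternating least squares, gradient descent and stochastic gradient descent as options. The sketch below picks stochastic gradient descent on the squared error over observed entries only, starting from small random vectors around the origin. The rank, learning rate, regularization strength, epoch count and toy data are assumptions chosen for illustration, and the code is single-machine Python rather than a Hadoop job.

```python
import random


def factorize(observed, n_users, n_items, rank=2, epochs=500, lr=0.02, reg=0.01, seed=0):
    """Learn user vectors U and track vectors T so that dot(U[i], T[j]) approximates observed[(i, j)]."""
    rng = random.Random(seed)
    # Start with small random vectors around the origin.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    T = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - predict(U, T, i, j)
            for k in range(rank):
                u_ik, t_jk = U[i][k], T[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * t_jk - reg * u_ik)
                T[j][k] += lr * (err * u_ik - reg * t_jk)
    return U, T


def predict(U, T, i, j):
    """Predicted value for user i and track j."""
    return sum(a * b for a, b in zip(U[i], T[j]))


def squared_error(U, T, observed):
    """Sum of squared errors over the observed entries."""
    return sum((value - predict(U, T, i, j)) ** 2 for (i, j), value in observed.items())


# Toy data: (user index, track index) -> play count; missing pairs are unobserved.
observed = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}
U, T = factorize(observed, n_users=3, n_items=3)
print("training error:", squared_error(U, T, observed))
print("prediction for unobserved (0, 2):", predict(U, T, 0, 2))
```

Each update touches only one user vector and one track vector. Alternating least squares instead fixes one side and solves for the other in closed form.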
28. Language Models
• Language models work well too. For example,
a playlist could be considered as a document
and you could learn the latent vectors for tracks
(words).
• Then represent a User as a linear combination
of their Tracks.
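A minimal sketch of the second bullet, assuming the track vectors have already been learned from playlists by some word-embedding style model (that training step is not shown). The slide only says "linear combination"; weighting each track by its share of the user's plays is an assumption, as are the toy vectors and track names.

```python
def user_vector(track_vectors, play_counts):
    """Represent a user as the play-weighted average of the vectors of their tracks."""
    total = sum(play_counts.values())
    if total == 0:
        raise ValueError("user has no plays")
    dim = len(next(iter(track_vectors.values())))
    vec = [0.0] * dim
    for track, count in play_counts.items():
        weight = count / total
        for k, value in enumerate(track_vectors[track]):
            vec[k] += weight * value
    return vec


# Hypothetical two-dimensional track vectors.
track_vectors = {
    "track_a": [1.0, 0.0],
    "track_b": [0.0, 1.0],
}
print(user_vector(track_vectors, {"track_a": 3, "track_b": 1}))  # [0.75, 0.25]
```

Because users and tracks then live in the same vector space, candidate tracks can be ranked for a user with any vector similarity measure.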
35. Tools of the trade
• Build models in Python.
• Jobs in Scalding + Luigi ( https://github.com/spotify/luigi )
• Storm for real time.
• In-house RPC for serving requests.
36. General Tips
• Analyze, prototype and then build.
• Simpler algorithms are easier to test than harder ones.
• Data Science is more art than science. Employ the laugh test when
evaluating your results.
37. Join the band!
• Machine Learning, Data & Backend Gigs.
• Now touring in New York, Boston & Stockholm!