2021 04-20 apache arrow and its impact on the database industry.pptx
KD_MB_MW_poster
1. GENERAL RECOMMENDATION SYSTEM APPLIED TO
MULTI-ATTRIBUTE PRODUCTION METADATA
KURT DACOSTA, MICHAEL BARONE & MATTHEW WOOLHOUSE
dacostak@mcmaster.ca, baronem@mcmaster.ca, woolhouse@mcmaster.ca
1. OBJECTIVES
Music recommendation algorithms are a unique
challenge which provide potential gains for both
academic and private sectors. Current techniques
apply machine learning due to their speed and
potential accuracy (Boriach, Chandola, & Kumar,
2008). We develop Cadence, a general recommen-
dation algorithm that:
1. Processes categorical production metadata.
2. Recommends using production metadata, a
feature largely unexamined in previous aca-
demic literature.
3. Initial state defined by a set of seed tracks.
4. Considers cognitive aspects such as mem-
ory, recency, feature preference, and
secondary-artist preference.
2. ALGORITHM
Seed Tracks
Generate
Finger-
print
Recommend
Production
Database
Like
Dislike
Figure 2. Algorithm Schema
Cadence builds a neural network to analyze a
user’s preferences:
(a) Seed tracks are collected from a user’s play
history; Table 1.
(b) Play history is analyzed to produce feature
values; Equation 1 and Table 2.
(c) Feature values are converted into a
Listening-Fingerprint; Equation 2, Equation
3 and Table 3.
(d) Fingerprint is correlated with production
data of new tracks. Tracks with highest coef-
ficient are recommended.
(e) If recommendation liked, feature values are
added to play history, and values are up-
dated.
(f) If recommendation disliked, feature values
are supressed and new recommendation
generated.
REFERENCES
Boriach, S., Chandola, V., & Kumar, V. (2008).
Similarity measures for categorical data: A
comparative evaluation. SIAM, 52, 243-254.
Ebbinghaus, H. (1885). Memory: A contribution to
experimental psychology. Columbia Univer-
sity.
6. FUTURE RESEARCH
Future work should focus on algorithm sensitivity. This can be most effectively solved through:
1. Enchanced sample of production metadata.
2. Test algorithm efficacy on real user consumption data.
3. Incorporate a time-based loss function with respect to recency feature of algorithm.
4. Weigh repeated song plays more strongly than novel songs.
ACKNOWLEDGEMENTS
Supported by SSHRC Insight Development Grant
(430-2012-0835), the Arts Research Board, and the
Office VP Research, McMaster University.
5. DATA & REQUIREMENTS
Data:
• Algorithm tested using data collected from
a project developed from Rutgers Informa-
tion Visualization graduate course.
• Dataset contains ca. 2000 songs with pro-
ducer and writer information.
Algorithm Requirements:
• Python 2.7.
• Modules: json, random, operator, numpy,
sys.
4. FORGETTING CURVE
The forgetting curve is a psychological phe-
nomenon which illustrates the effect of memory,
learning, and retention over time (Ebbinghaus,
1885). The algorithm preferences more recent
tracks in the fingerprint, in a manner similar to
the forgetting curve.
3. LISTENING FINGERPRINT
Notation & Equations Exemplar
T : Set of seed tracks for user.
P : Set of people found in T.
R : Set of roles found in T.
Tk(p) : Set of tracks where p assumes role k.
Wk : Role weighting for role k.
Ck(p) : Contribution of person p to role k in T.
Rk(t) : Set of people who assume role k in t.
Person-Role Contribution Equation
Ck(p) =
1
|T|
t∈Tk(p)
1
|Rk(t)|
(1)
Role-Weighting Equation
Wk = S{Ck(p)|p∈P } (2)
Listening-Fingerpring Equation
F = {f | ∀p ∈ P, f =
k∈R
Wk ∗ Ck(p)} (3)
Table 1. Seed User History
Track History Producer Writer
1) We Made You Dr.Dre, Eminem Eminem
2) What’s My Name? Dr.Dre Snoop Dogg
3) Still D.R.E. Dr.Dre, Mel-Man Dr.Dre, Jay-Z
Table 2. Person-Role Contributions
Person Artist Producer Writer
Dr.Dre 1.0 2.0 0.5
Eminem 1.0 0.5 0.5
Snoop Dogg 1.0 0.0 1.0
Jay-Z 0.0 0.0 0.5
Mel-Man 0.0 0.5 0.0
Table 3. Listening Fingerprint
Person Value
Eminem 0.75
Dr.Dre 0.43
Snoop Dogg 0.125
Jay-Z 0.03
Mel-Man 0.03