2. Why?
Recommendation engines have become very
popular over the last decade with the explosion
of e-commerce, advertising and dating sites, and
especially of music services.
3. So, what will we do?
We are building a music recommender with
Apache Spark and MLlib, because MLlib includes
collaborative filtering, which makes it possible
to build recommendation models from billions of
records. MLlib implements collaborative filtering
with the alternating least squares (ALS) algorithm.
4. How will it work?
Depending on your past actions or
your interests (e.g. the last song
you listened to), the recommendation
engine will present other songs and
products that are related to that
song and that might interest you.
5. So, what is collaborative
filtering?
Collaborative filtering (CF) brings together the
opinions of people who share the same interests.
It is a method of matching individuals who need
similar kinds of information about a particular
piece of content or data. The goal is the "people
who loved this also loved those" approach.
For example, if two people have given similar
ratings to the same films or books, the system
concludes that “You really are connected to each other.”
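The idea of matching two people by their ratings can be sketched in a few lines of plain Python. This is only an illustration, not the method the project uses: the users, items and ratings are made up, and cosine similarity is one assumed choice among several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); illustrative data only.
ratings = {
    "alice": {"film_a": 5, "film_b": 4, "book_c": 1},
    "bob": {"film_a": 5, "film_b": 5, "book_c": 2},
    "carol": {"film_a": 1, "film_b": 2, "book_c": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(user, all_ratings):
    """Return the other user whose ratings are closest to `user`'s."""
    scored = [
        (cosine_similarity(all_ratings[user], other_ratings), name)
        for name, other_ratings in all_ratings.items()
        if name != user
    ]
    return max(scored)[1]


if __name__ == "__main__":
    print(most_similar("alice", ratings))
```

Here alice and bob rate the same films highly and the same book poorly, so the system would treat them as "connected" and could suggest to one what the other liked.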
6. Why MLlib? Why ALS?
Spark's MLlib library provides a rich set of
analytical methods that scale to large data.
Its Alternating Least Squares algorithm for
Collaborative Filtering is well suited to a
recommendation engine.
By its nature, collaborative filtering is an
expensive operation, because the model must be
updated whenever new user preferences arrive.
Therefore, a distributed computation engine
such as Spark to perform the model computation is
essential for a real-world recommendation engine
like the one we will build.
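To make the "alternating" part of ALS concrete, here is a minimal sketch in plain Python. It is a deliberately simplified rank-1 version (one number per user and one per artist) and is not MLlib's implementation, which uses higher-rank factors and distributed computation. The function names, the toy play counts and the parameters `n_iters` and `lam` are all assumptions made for illustration.

```python
def als_rank1(plays, n_iters=20, lam=0.1):
    """Fit plays[(user, item)] ~= x[user] * y[item] by alternating updates.

    With the item factors fixed, the regularised least-squares solution for
    each user factor has a closed form, and vice versa; ALS alternates
    between the two steps.
    """
    users = sorted({u for u, _ in plays})
    items = sorted({i for _, i in plays})
    x = {u: 1.0 for u in users}  # user factors
    y = {i: 1.0 for i in items}  # item factors
    for _ in range(n_iters):
        # Step 1: hold item factors fixed, solve for every user factor.
        for u in users:
            seen = [(i, r) for (uu, i), r in plays.items() if uu == u]
            num = sum(r * y[i] for i, r in seen)
            den = lam + sum(y[i] ** 2 for i, _ in seen)
            x[u] = num / den
        # Step 2: hold user factors fixed, solve for every item factor.
        for i in items:
            seen = [(u, r) for (u, ii), r in plays.items() if ii == i]
            num = sum(r * x[u] for u, r in seen)
            den = lam + sum(x[u] ** 2 for u, _ in seen)
            y[i] = num / den
    return x, y


def squared_error(plays, x, y):
    """Sum of squared differences between observed and predicted values."""
    return sum((r - x[u] * y[i]) ** 2 for (u, i), r in plays.items())


# Toy play counts (made up). The pair ("u3", "b") is unobserved.
plays = {
    ("u1", "a"): 4, ("u1", "b"): 2,
    ("u2", "a"): 2, ("u2", "b"): 1,
    ("u3", "a"): 6,
}

if __name__ == "__main__":
    x, y = als_rank1(plays)
    print("fit error:", squared_error(plays, x, y))
    print("predicted plays for (u3, b):", x["u3"] * y["b"])
```

Each user update only needs that user's own observations, and each item update only needs that item's, which is why the two steps parallelise well on an engine such as Spark.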
7. What about the dataset?
We will use data from Last.fm, the well-known
music and radio website and application.
Its dataset contains <user, artist-mbid,
artist-name, total-plays> tuples (for approx.
360,000 users) collected from the Last.fm API.
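A tuple of this shape could be read into a typed record as sketched below. The sketch assumes one tuple per line with tab-separated fields; that layout, the `PlayRecord` and `parse_line` names, and the sample line are assumptions for illustration and should be checked against the actual file.

```python
from typing import NamedTuple, Optional


class PlayRecord(NamedTuple):
    user: str
    artist_mbid: str
    artist_name: str
    total_plays: int


def parse_line(line: str) -> Optional[PlayRecord]:
    """Parse one <user, artist-mbid, artist-name, total-plays> line.

    Assumes tab-separated fields. Returns None for lines that do not have
    exactly four fields or whose play count is not an integer.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    user, mbid, name, plays = fields
    try:
        return PlayRecord(user, mbid, name, int(plays))
    except ValueError:
        return None


# Made-up sample line; the values are not taken from the real dataset.
sample = "user_001\tmbid-0001\tExample Artist\t42\n"

if __name__ == "__main__":
    print(parse_line(sample))
```

Returning None for malformed lines lets a loader skip them without stopping, which is usually preferable when reading a large collected dataset.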
9. We have divided the project into two parts.
The first is collecting the datasets.
The second part has three sections:
1. Starting the engine
2. Adding new ratings
3. Making recommendations