6. Data Ingestion:
root
|-- asin: string (nullable = true)
|-- helpful: array (nullable = true)
| |-- element: long (containsNull = true)
|-- overall: double (nullable = true)
|-- reviewText: string (nullable = true)
|-- reviewTime: string (nullable = true)
|-- reviewerID: string (nullable = true)
|-- reviewerName: string (nullable = true)
|-- summary: string (nullable = true)
|-- unixReviewTime: long (nullable = true)
{asin:u'0790734680',
 helpful:[0, 0],
 overall:4.0,
 reviewText:u'Kevin Spacey always gives a credible performance. Curious to knowhow close story in book was followed.Overall I liked it.',
 reviewTime:u'06 28, 2014',
 reviewerID:u'A3MG14J7MXE9CC',
 reviewerName:u'George McGarrity',
 summary:u'Serious murder mixed with comedy',
 unixReviewTime:1403913600}
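The following sketch shows one way a raw record could be checked against the schema above before it reaches the batch layer. It assumes the raw file stores one JSON object per line. The u'...' strings on the slide are presumably Python 2 reprs of such records. The `SCHEMA` table and the functions `parse_review` and `date_from_review_time` are hypothetical names used only for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical type table mirroring the printed schema; every field is nullable.
SCHEMA = {
    "asin": str,
    "helpful": list,            # array<long>, e.g. [0, 0]
    "overall": (int, float),    # star rating, e.g. 4.0
    "reviewText": str,
    "reviewTime": str,          # "MM DD, YYYY"
    "reviewerID": str,
    "reviewerName": str,
    "summary": str,
    "unixReviewTime": int,      # seconds since the Unix epoch
}

def date_from_review_time(text):
    """Parse the human-readable reviewTime field, e.g. '06 28, 2014'."""
    return datetime.strptime(text, "%m %d, %Y").date()

def parse_review(line):
    """Validate one JSON review line and return (reviewerID, asin, overall, date)."""
    rec = json.loads(line)
    for field, expected in SCHEMA.items():
        value = rec.get(field)
        # Nullable fields: None passes, any other wrong type is rejected.
        if value is not None and not isinstance(value, expected):
            raise TypeError(f"{field}: unexpected type {type(value).__name__}")
    ts = rec.get("unixReviewTime")
    review_date = (datetime.fromtimestamp(ts, tz=timezone.utc).date()
                   if ts is not None else None)
    overall = rec.get("overall")
    return (rec.get("reviewerID"), rec.get("asin"),
            float(overall) if overall is not None else None, review_date)
```

For the sample record, `unixReviewTime` 1403913600 falls on 2014-06-28 UTC, the same day as `reviewTime` '06 28, 2014', so either field can serve as the review date. The (reviewerID, asin, overall) triple is the shape that a collaborative-filtering model consumes.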
7. Batch Layer:
Collaborative Filtering Model
● MLlib currently supports model-based collaborative filtering.
● Used to predict the missing entries of the user-item rating matrix.
● Uses the alternating least squares (ALS) algorithm to learn the latent factors.
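To illustrate what "alternating" means, here is a minimal pure-Python sketch of ALS on explicit ratings. It is not Spark MLlib's implementation. Among other differences, MLlib distributes the solves across the cluster and scales the regularization parameter by each user's or item's number of ratings. The functions `solve`, `update`, `als` and `predict` and the parameters `rank`, `lam` and `iters` are hypothetical.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def update(fixed, observed, rank, lam):
    """One ALS half-step: hold the other side's factors `fixed` constant and, for
    each row, solve the ridge regression (F^T F + lam*I) x = F^T r using only
    the entries that row actually observed."""
    solved = {}
    for row, entries in observed.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, rating in entries.items():
            f = fixed[col]
            for i in range(rank):
                b[i] += f[i] * rating
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        solved[row] = solve(a, b)
    return solved

def als(ratings, rank=2, lam=0.1, iters=20, seed=0):
    """ratings: iterable of (user, item, rating) triples.
    Returns (user_factors, item_factors) as dicts of length-`rank` lists."""
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    rng = random.Random(seed)
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, lam)  # items fixed -> solve users
        items = update(users, by_item, rank, lam)  # users fixed -> solve items
    return users, items

def predict(users, items, u, i):
    """Predicted rating = dot product of the user and item latent factors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

Each half-step is an ordinary regularized least-squares problem, which is why ALS parallelizes well: every user (or item) can be solved independently. A missing entry is filled in by `predict`. In PySpark the corresponding estimator is `pyspark.ml.recommendation.ALS`, whose `rank`, `regParam` and `maxIter` parameters play the roles of `rank`, `lam` and `iters` here.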
11. Challenges:
● Spark MLlib: training the collaborative filtering (ALS) model and tuning its parameters
● Scaling recommendations for a group of 2 people:
○ Batch: 2.1M users * 200k movies = 420 billion combinations
○ Streaming with caching: 2 users * 200k movies = 400k combinations
● Implementing the Powerbar®
○ Normalization
○ Consensus function
(Figure: Powerbar®)
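The gap between the two scaling figures explains why caching matters. The batch path scores every (user, movie) pair, while the streaming path scores only the members of the group that made the request. The sketch below assumes the latent factors from the batch ALS model are cached in memory as plain dicts. That layout and the function names are hypothetical.

```python
def batch_combinations(num_users, num_movies):
    """Batch layer: one predicted score for every (user, movie) pair."""
    return num_users * num_movies

def streaming_combinations(group_size, num_movies):
    """Streaming layer: score only the members of the requesting group."""
    return group_size * num_movies

def score_group(group, user_factors, item_factors):
    """Score every movie for each group member from cached latent factors.
    Returns {user: {movie: predicted rating}}; the cost is
    len(group) * len(item_factors) dot products, independent of the user count."""
    return {
        u: {m: sum(x * y for x, y in zip(user_factors[u], f))
            for m, f in item_factors.items()}
        for u in group
    }
```

With 2.1M users and 200k movies, `batch_combinations` gives 420,000,000,000 and `streaming_combinations(2, 200_000)` gives 400,000, a reduction by a factor of about a million.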
12. About me
● Patrick Zheng
● MS Computer Science
● Alternative Drug Recommendation System
● Retrospective Drug Utilization Review System
● Drug Adherence Predictive Modeling
● Movies
● Basketball
● Hearthstone
13. Low-latency, real-time computation based on the user’s input:
Movie group relevance:
Movie group disagreement:
Consensus function:
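The slide's own formulas for these three quantities are not reproduced here. The sketch below uses one common formulation from the group-recommendation literature as an assumed stand-in. Each member's predicted rating is normalized to [0, 1]. Group relevance is the members' average. Group disagreement is the average pairwise absolute difference. The consensus score is w * relevance + (1 - w) * (1 - disagreement). The 1 to 5 star range, the weight `w` and all function names are assumptions.

```python
from itertools import combinations

def normalize(rating, lo=1.0, hi=5.0):
    """Min-max scale a predicted star rating into [0, 1], clipping out-of-range values."""
    return min(max((rating - lo) / (hi - lo), 0.0), 1.0)

def group_relevance(scores):
    """Average of the members' normalized scores for one movie."""
    return sum(scores) / len(scores)

def group_disagreement(scores):
    """Average pairwise absolute difference between members' normalized scores."""
    pairs = list(combinations(scores, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def consensus(scores, w=0.5):
    """Weighted trade-off: high relevance and low disagreement both raise the score."""
    return w * group_relevance(scores) + (1 - w) * (1 - group_disagreement(scores))

def rank_movies(predictions, w=0.5):
    """predictions: {movie: [predicted rating per group member]} -> movies, best first."""
    scored = {m: consensus([normalize(r) for r in rs], w) for m, rs in predictions.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

For a group of 2 people, the disagreement reduces to |s1 - s2|. With w = 0.5, a movie both members would rate 3 stars (consensus 0.75) outranks one rated 5 and 1 (consensus 0.25), even though both have the same average rating.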