Get Raw Data -> Clean Data -> Train Model -> Evaluate Model -> Feedback
Clean Data: ex) Tokenizing, Remove Stop Words, POS Tagging
Train Model: ex) Classification, CNN, RNN, LSTM
Evaluate Model: ex) Confusion Matrix, Chi-squared test, Cross-Validation
RDB
Excel
Source: https://support.office.com/en-ie/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
CSV
Delimiters, encoding
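Delimiters and encoding are the two CSV settings that most often go wrong when files move between tools. A minimal sketch with Python's standard csv module, assuming a semicolon-delimited file encoded as CP949; the sample rows and the helper name read_rows are hypothetical and only illustrate the idea.

```python
import csv
import io

def read_rows(raw: bytes, encoding: str, delimiter: str) -> list:
    """Decode raw CSV bytes with an explicit encoding and delimiter."""
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)

# Hypothetical sample: semicolon-delimited text encoded as CP949.
sample = "user;product;event\nAlice;P1;Click\n철수;P2;Buy\n".encode("cp949")

rows = read_rows(sample, encoding="cp949", delimiter=";")
for row in rows:
    print(row["user"], row["product"], row["event"])
```

Passing the encoding and delimiter explicitly, instead of relying on platform defaults, keeps the same file readable on every machine.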
CF Algorithm
        Item1  Item2  Item3
Alice   1      0      1
Tom     0      1      1
Mary    0      1      0
Source: https://medium.com/radon-dev/item-item-collaborative-filtering-with-binary-or-unary-data-e8f0b465b2c3
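The binary matrix above can drive a simple item-item collaborative filter. The sketch below is only an illustration: it assumes Jaccard similarity between the sets of users who touched each item, which is one common choice for binary data and not necessarily the measure used in the linked article. The function names are hypothetical.

```python
# Binary user-item interactions from the table above (1 = interacted).
interactions = {
    "Alice": {"Item1", "Item3"},
    "Tom": {"Item2", "Item3"},
    "Mary": {"Item2"},
}

def item_users(data):
    """Invert user -> items into item -> set of users."""
    inverted = {}
    for user, items in data.items():
        for item in items:
            inverted.setdefault(item, set()).add(user)
    return inverted

def jaccard(a, b):
    """Jaccard similarity of two user sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, data):
    """Score each unseen item by its summed similarity to the user's items."""
    users_by_item = item_users(data)
    seen = data[user]
    scores = {}
    for candidate in users_by_item:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            jaccard(users_by_item[candidate], users_by_item[s]) for s in seen
        )
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend("Mary", interactions))
```

Mary has only touched Item2, so Item3 (shared with Tom) ranks above Item1, which has no users in common with Item2.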
Source: https://www.manning.com/books/practical-recommender-systems
Event   Score
Click   10
Cart    50
Buy     100
Users   Product  Event  Score  Timestamp
Alice   P1       Click  10     2018-10-01
Bob     P2       Buy    100    2018-09-05
Mary    P2       Click  10     2018-10-04
ex) Rating = Event * (1 / (Days_Since % 365.2425))
Alice: 10 * (1 / (7 % 365.2425)) = 1.4285714286
Bob: 100 * (1 / (34 % 365.2425)) = 2.9411764706
Mary: 10 * (1 / (4 % 365.2425)) = 2.5
Users   Product  Rating
Alice   P1       1.42
Bob     P2       2.94
Mary    P2       2.5
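The event-to-rating conversion above can be sketched in a few lines of Python. The event scores and the Days_Since values (7, 34, 4) are taken from the slides; the function name and the handling of a zero divisor are assumptions added for the example.

```python
YEAR_DAYS = 365.2425
EVENT_SCORE = {"Click": 10, "Cart": 50, "Buy": 100}

def implicit_rating(event, days_since):
    """Rating = Event * (1 / (Days_Since % 365.2425))."""
    decay_days = days_since % YEAR_DAYS
    if decay_days == 0:
        # Assumption: the slides do not say how a zero divisor is handled.
        raise ValueError("days_since must not be a multiple of 365.2425")
    return EVENT_SCORE[event] * (1 / decay_days)

events = [("Alice", "P1", "Click", 7), ("Bob", "P2", "Buy", 34), ("Mary", "P2", "Click", 4)]
for user, product, event, days in events:
    print(user, product, round(implicit_rating(event, days), 2))
```

Because of the modulo, the decay wraps around every year: a purchase 723 days ago is scored as if it were about 358 days old.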
Matrix Singular Value Decomposition
Source: https://www.manning.com/books/practical-recommender-systems
df %>% glimpse
Variables: 8
$ account_id <int> 0, 0, 10xxx, 10xxx, 10xxx, 102xxx, 10xxx, 102xxx, 10...
$ product_id <int> 3xxxxxx, 32xxxxx, 18xxxx, 12xxxx, 27xxxx, 43xxx...
$ buy <dbl> 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,...
$ cart <dbl> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ click <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,...
$ wish <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ days_since <int> 253, 57, 723, 1038, 62, 278, 84, 81, 80, 73, 10, 14, 123...
$ rating <dbl> 1.0748214, 1.0748214, 0.2795189, 0.3251874, 0.4838710, 0...
IQR = Q3 - Q1
Q1 = 25th percentile, Q3 = 75th percentile
…
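The IQR is typically used to trim extreme values before training. A minimal sketch using only the standard library, assuming the common 1.5 * IQR fences (the slides do not state the multiplier) and hypothetical sample values:

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

sample = [1, 2, 3, 4, 5, 6, 100]  # hypothetical values with one outlier
print(iqr_bounds(sample))
print(drop_outliers(sample))
```

The multiplier k is a tuning choice: a larger k keeps more of the tail, a smaller k trims more aggressively.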
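from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

# Reuse the active session (already available as `spark` in the pyspark shell).
spark = SparkSession.builder.getOrCreate()

# Load the implicit ratings and hold out 20% for evaluation.
ratings = spark.read.parquet('hdfs:///data/implicit-rating/2018-02-17')
(training, test) = ratings.randomSplit([0.8, 0.2])

# implicitPrefs=True treats `rating` as confidence in an implicit preference;
# coldStartStrategy="drop" removes test rows whose user or item was unseen in training.
als = ALS(maxIter=5, regParam=0.01, userCol="account_id", itemCol="product_id", ratingCol="rating",
          coldStartStrategy="drop", implicitPrefs=True)
model = als.fit(training)

# With implicit feedback the predictions are preference scores, not ratings,
# so RMSE is only a rough check; a rank-based metric is often more informative.
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("RMSE : " + str(rmse))
model.save('hdfs:///data/als/model/2018-02-19')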
Source: https://github.com/jamenlong/ALS_expected_percent_rank_cv
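As its name suggests, the linked repository evaluates ALS with expected percentile rank, a rank-based metric suited to implicit feedback. The sketch below is a plain-Python illustration of that metric, not the repository's code: it assumes ranks are computed per user over the supplied rows only, with 0.0 for the top-ranked item and 1.0 for the bottom one. Lower values are better, and about 0.5 is what random ordering would give.

```python
from collections import defaultdict

def expected_percentile_rank(rows):
    """rows: iterable of (user, item, observed_rating, predicted_score)."""
    by_user = defaultdict(list)
    for user, _item, rating, score in rows:
        by_user[user].append((score, rating))

    weighted, total = 0.0, 0.0
    for entries in by_user.values():
        # Rank each user's items by predicted score, best first.
        entries.sort(key=lambda e: e[0], reverse=True)
        n = len(entries)
        for position, (_score, rating) in enumerate(entries):
            percentile = position / (n - 1) if n > 1 else 0.0
            weighted += rating * percentile
            total += rating
    return weighted / total if total else float("nan")

# Hypothetical predictions: the heavily rated item is ranked first (good) ...
good = [("u1", "p1", 3.0, 0.9), ("u1", "p2", 1.0, 0.1)]
# ... or last (bad).
bad = [("u1", "p1", 3.0, 0.1), ("u1", "p2", 1.0, 0.9)]
print(expected_percentile_rank(good), expected_percentile_rank(bad))
```

For a small test set, the rows could come from `predictions.select("account_id", "product_id", "rating", "prediction").collect()`; for large data the same computation would need to stay inside Spark.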
Source: https://blogs.rstudio.com/tensorflow/posts/2018-09-26-embeddings-recommender/
Bipartite Graph
Source: https://ko.wikipedia.org/wiki/ _
Source: https://flink.apache.org
Neo4J
// The opening (p1:Account) pattern is an assumption, inferred from the WHERE clause.
MATCH (p1:Account)-[:Bought]-()-[:hasProduct]->(product1)<-[:hasProduct]-()-[:Bought]-(p2:Account)-[:Bought]-()-[:hasProduct]->(product2)
WITH p1, p2, count(product1) AS cnt, collect(product1) AS SharedItems, product2
WHERE NOT ((p1)-[:Bought]-()-[:hasProduct]->(product2)) AND cnt > 2
RETURN DISTINCT product2
LIMIT 100
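The logic of the graph query above can be illustrated without a database. This sketch assumes purchases are flattened to a set of products per account (ignoring the intermediate node between an account and its products) and uses hypothetical data; it mirrors the `cnt > 2` condition by requiring at least three shared products.

```python
def co_purchase_recommendations(target, purchases, min_shared=3):
    """Products bought by accounts that share >= min_shared products with target."""
    mine = purchases[target]
    recommended = set()
    for other, theirs in purchases.items():
        if other == target:
            continue
        shared = mine & theirs
        if len(shared) >= min_shared:      # same as cnt > 2 in the query
            recommended |= theirs - mine   # exclude what target already bought
    return sorted(recommended)

# Hypothetical purchase history: account -> set of products.
purchases = {
    "A": {"P1", "P2", "P3"},
    "B": {"P1", "P2", "P3", "P4"},
    "C": {"P1", "P5"},
}
print(co_purchase_recommendations("A", purchases))
```

Account B shares three products with A, so P4 is recommended; C shares only one, so P5 is not.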
• Mecab (RmecabKo)

• Open Korean Text

• Khaiii

• TF-IDF

• TF-IDF + LSA

• TF-IDF + LDA

• Cosine Distance
Source: https://www.tidytextmining.com
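The TF-IDF and cosine steps listed above can be sketched in plain Python. This is an illustration under stated assumptions: documents are already tokenized (for Korean text, by a morphological analyzer such as Mecab), the weighting is raw term frequency times log(N / df), and the sample product descriptions are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: weight} TF-IDF vectors."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; cosine distance is 1 minus this."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical, already tokenized product descriptions.
docs = [
    ["red", "cotton", "shirt"],
    ["blue", "cotton", "shirt"],
    ["leather", "wallet"],
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

The two shirts share weighted terms and get a positive similarity, while the wallet shares none and scores zero. LSA or LDA would replace the raw TF-IDF vectors with lower-dimensional topic vectors before the same cosine comparison.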
OCR: Tesseract 4
• RFM, CLTV
Q&A

The Courage to Be Disliked: "What does that team think it knows, going around poking into everything in the name of recommendations?"