2. OpenDataScience Slack
Officially started 12.03.2015
● Started as a platform for open data science communication
● The biggest data science community in the world
As of 25.07.2017:
● 1.3M messages, 5400 users (2000 weekly active)
Most active channels:
#deep_learning, #theory_and_practice, #visualization, #_general,
#_meetings, #_jobs, #big_data, #python, #r, #datasets, #nlp, #edu_courses
3. Our data:
● Users
● Messages and Threads
● Response time
● Reactions
● Replies
20. Detection of curious users
● Curious user: a user who asks for help
● NLP techniques: preprocessing, regular expressions
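The slide says only that curious users were found with preprocessing and regular expressions. It does not give the actual patterns. Below is a minimal sketch, assuming a message counts as a help request if it matches one of a few hypothetical patterns: a trailing question mark, words like "help" or "advice", or a question word followed by "?". Users are then ranked by how many such messages they posted. The names `HELP_PATTERNS`, `preprocess`, `is_help_request` and `curious_users` are illustrative and not from the original work.

```python
import re
from collections import Counter

# Hypothetical "asking for help" patterns; the original regexes are not given.
HELP_PATTERNS = [
    r"\?\s*$",                                # message ends with a question mark
    r"\b(help|advice|recommend)\b",           # explicit request words
    r"\b(how|why|what|where|which)\b.*\?",    # question word followed by "?"
    r"\b(does|can) (anyone|anybody)\b",       # appeals to the community
]
HELP_RE = re.compile("|".join(HELP_PATTERNS), re.IGNORECASE)


def preprocess(text):
    """Strip Slack markup (<@U123>, <https://...>) and :emoji: codes, squeeze spaces."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r":[a-z0-9_+\-]+:", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def is_help_request(text):
    """True if the cleaned message matches any of the hypothetical help patterns."""
    return bool(HELP_RE.search(preprocess(text)))


def curious_users(messages, top_n=3):
    """Rank users by the number of help-request messages they posted."""
    counts = Counter(m["user"] for m in messages if is_help_request(m["text"]))
    return counts.most_common(top_n)


if __name__ == "__main__":
    msgs = [
        {"user": "U1", "text": "How do I tune LightGBM?"},
        {"user": "U1", "text": "Can anyone help with pandas merge"},
        {"user": "U2", "text": "Here is the paper <https://arxiv.org/abs/1234>"},
        {"user": "U3", "text": "Thanks :tada:"},
    ]
    print(curious_users(msgs))
```

In practice, pure regex rules like these are cheap but noisy. They are best read as a first filter before any manual review.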
Expert detection
● Experts: users whose messages in threads receive the highest number of specific reactions
Troll detection
● Trolls: users whose messages receive the highest number of specific reactions
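Expert and troll detection as defined above both come down to counting certain reactions per message author. The only differences are which reactions count and whether only thread messages are considered. The slides do not say which reactions were used, so `EXPERT_REACTIONS` and `TROLL_REACTIONS` below are hypothetical placeholders. The sketch assumes Slack-export-style message dicts. In these, `reactions` is a list of `{"name", "count"}` entries, and a thread reply carries its parent's `ts` in `thread_ts`.

```python
from collections import Counter

# Hypothetical reaction sets; the original lists are not given in the slides.
EXPERT_REACTIONS = {"+1", "pray", "fire"}
TROLL_REACTIONS = {"-1", "facepalm", "troll"}


def is_thread_reply(msg):
    """A reply has thread_ts pointing at its parent; the parent has thread_ts == ts."""
    return "thread_ts" in msg and msg["thread_ts"] != msg["ts"]


def rank_users(messages, reaction_names, thread_replies_only=False, top_n=5):
    """Sum the counts of the chosen reactions on each author's messages."""
    scores = Counter()
    for msg in messages:
        if thread_replies_only and not is_thread_reply(msg):
            continue
        for reaction in msg.get("reactions", []):
            if reaction["name"] in reaction_names:
                scores[msg["user"]] += reaction["count"]
    return scores.most_common(top_n)


def experts(messages, top_n=5):
    # Experts: only reactions on messages posted inside threads.
    return rank_users(messages, EXPERT_REACTIONS, thread_replies_only=True, top_n=top_n)


def trolls(messages, top_n=5):
    # Trolls: reactions on any message.
    return rank_users(messages, TROLL_REACTIONS, top_n=top_n)
```

Whether a thread's parent message should count as "in the thread" is a judgment call. This sketch counts only replies.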
Model Info
30. Data Stats
Features:
● Text of main message
● Day and hour of main message
● Length of main message
● Channel
● Mentioned users
● Links in text
● Historical activity
Target variable:
● Waiting time for response
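The feature list and target above can be computed per thread roughly as follows. This is a sketch under several assumptions that are not in the slides:
- Timestamps are Slack `ts` strings, interpreted in UTC.
- Mentions and links use Slack's `<@U…>` and `<https://…>` markup.
- "Historical activity" is a hypothetical precomputed count of earlier messages by the author (`history_counts`).
- The waiting time is measured to the first reply by someone other than the thread author.

```python
import re
from datetime import datetime, timezone

# Slack encodes user mentions as <@U123ABC> (optionally <@U123ABC|name>)
# and links as <https://...>.
MENTION_RE = re.compile(r"<@[UW][A-Z0-9]+(?:\|[^>]*)?>")
LINK_RE = re.compile(r"<https?://[^>]+>")


def message_features(parent, history_counts):
    """Features of the main (thread-starting) message."""
    dt = datetime.fromtimestamp(float(parent["ts"]), tz=timezone.utc)  # assumed UTC
    text = parent.get("text", "")
    return {
        "text": text,
        "weekday": dt.weekday(),            # 0 = Monday
        "hour": dt.hour,
        "length": len(text),
        "channel": parent["channel"],
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_links": len(LINK_RE.findall(text)),
        # Hypothetical "historical activity": earlier messages by the author.
        "user_history": history_counts.get(parent["user"], 0),
    }


def waiting_time_minutes(parent, replies):
    """Minutes until the first reply from someone other than the author, or None."""
    times = [float(r["ts"]) for r in replies if r["user"] != parent["user"]]
    if not times:
        return None
    return (min(times) - float(parent["ts"])) / 60.0
```

Threads with no qualifying reply return `None`. Such threads would need to be dropped or handled separately before fitting a regression model.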
31. Applied Approaches
Approaches:
● Lasso regression (scikit-learn)
MAE = 149 min
● XGBoost regression
MAE = 140 min
● LightGBM regression
MAE = 119 min
Best result: LightGBM
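All three models are compared by mean absolute error (MAE), the average of |actual − predicted| waiting time in minutes. A constant-prediction baseline helps put values like 119 min in context. For constant predictions, the median of the training targets is the value that minimises MAE on the training data. The sketch below is illustrative only; the slides do not report a baseline, and the function names are hypothetical.

```python
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the targets (here: minutes)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_baseline_mae(y_train, y_test):
    """MAE of always predicting the training median waiting time."""
    m = median(y_train)
    return mae(y_test, [m] * len(y_test))


if __name__ == "__main__":
    print(mae([5, 30, 240], [10, 20, 195]))   # errors 5, 10, 45
```

Because MAE is not dominated by a few extreme waits the way squared error is, it suits waiting-time targets with a long tail.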
Plot of actual vs. predicted response time (in minutes) for the #deep_learning channel
32. Further Work
Future improvements:
● New features (for example, the number of active users in the channel and the number of threads started before the new thread)
● Also use answers posted directly in the channel
● Reduce dimensionality
● Take the thread topic into account
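The two proposed channel-level features could be computed over a look-back window before each new thread. These are the number of active users and the number of threads started earlier. Several details here are hypothetical choices for illustration, not taken from the slides:
- The window is 24 hours (`window_seconds`).
- Every top-level message counts as a thread start.
- Features are keyed by `(channel, ts)`.

```python
from bisect import bisect_left


def is_thread_start(msg):
    # Assumption: any top-level message (no thread_ts, or thread_ts == ts)
    # starts a thread.
    return msg.get("thread_ts", msg["ts"]) == msg["ts"]


def channel_context_features(messages, window_seconds=24 * 3600):
    """For each thread start, count earlier thread starts and distinct active
    users in the same channel within the preceding window."""
    by_channel = {}
    for msg in sorted(messages, key=lambda x: float(x["ts"])):
        by_channel.setdefault(msg["channel"], []).append(msg)

    features = {}
    for channel, msgs in by_channel.items():
        times = [float(m["ts"]) for m in msgs]
        for i, msg in enumerate(msgs):
            if not is_thread_start(msg):
                continue
            # First message inside the window [t - window_seconds, t).
            start = bisect_left(times, times[i] - window_seconds)
            window = msgs[start:i]
            features[(channel, msg["ts"])] = {
                "threads_before": sum(1 for w in window if is_thread_start(w)),
                "active_users": len({w["user"] for w in window}),
            }
    return features
```

Sorting once per channel and using `bisect` keeps the window lookup logarithmic per thread, so the sketch scales to a message history the size of the one described above.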