The document discusses using Sina Weibo, a popular Chinese microblogging site, to spread information about Skoltech in China. It proposes analyzing popular posts on topics such as innovation, education, and science to identify influential users and companies. The selection of these influential users is then optimized in a facility location framework to maximize the spread of information about Skoltech at minimum cost. The project worked around Sina Weibo's unstable API and severely limited traffic by developing a state-of-the-art parser to grab the data, which was then interpreted using optimization models.
2. DOMAIN
• Sina Weibo is a Chinese microblogging website
• One of the most popular sites in China
• Over 600 mln. registered users (well over 30% of the Internet)
• 86.6% of the Chinese microblogging market
• ~100 mln. messages posted each day
3. OBJECTIVE
• Investigate how information and influence spread in the network
• Find the most influential users and companies in the IT, Science & Technology sphere
• Find those who will spread the word about Skoltech with minimum cost and maximum effectiveness
4. EXPECTATIONS
• The normal way to do that is to use the API
6. ROADBLOCKS
• the Chinese language is very hard to understand (中国的语言是很难理解)
• the API is essentially non-functional, and its documentation is misleading and confusing
• traffic is severely limited (150 calls/hour)
• the connection is unstable
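The last two roadblocks can be handled on the client side. Below is a minimal sketch, assuming the limit behaves like a plain quota of 150 calls per hour (the figure on the slide) and that dropped connections surface as Python's ConnectionError or TimeoutError. The class name, retry count and delays are hypothetical, and the actual Weibo request is passed in as an ordinary function.

```python
import random
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class ThrottledCaller:
    """Hypothetical helper: allow at most `max_calls` calls per sliding
    `window` seconds and retry transient network errors with backoff."""

    def __init__(
        self,
        max_calls: int = 150,       # quota from the slide: 150 calls/hour
        window: float = 3600.0,     # one hour, in seconds
        retries: int = 5,           # hypothetical
        base_delay: float = 1.0,    # hypothetical, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls < 1 or retries < 1:
            raise ValueError("max_calls and retries must be positive")
        self.max_calls = max_calls
        self.window = window
        self.retries = retries
        self.base_delay = base_delay
        self.clock = clock
        self.sleep = sleep
        self.stamps: Deque[float] = deque()  # start times of recent calls

    def _wait_for_slot(self) -> None:
        now = self.clock()
        # Drop calls that have left the sliding window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_calls:
            # Block until the oldest call in the window expires.
            self.sleep(self.window - (now - self.stamps[0]))
            self.stamps.popleft()
        self.stamps.append(self.clock())

    def call(self, fn: Callable[[], T]) -> T:
        """Run `fn` under the quota, retrying on ConnectionError/TimeoutError."""
        for attempt in range(self.retries):
            self._wait_for_slot()
            try:
                return fn()
            except (ConnectionError, TimeoutError):
                if attempt == self.retries - 1:
                    raise
                # Exponential backoff with jitter: base * 2^attempt * [1, 2).
                self.sleep(self.base_delay * 2 ** attempt * (1 + random.random()))
        raise AssertionError("unreachable: retries >= 1")
```

In a grabber, each request would be wrapped as `caller.call(lambda: fetch_user_stats(uid))`, where `fetch_user_stats` is a hypothetical function around the real HTTP request. Injecting `clock` and `sleep` keeps the throttling logic testable without waiting an hour.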
7. SOLUTION STRUCTURE
To overcome these difficulties, we came up with the following solution:
• refer to the whiteboard
• a state-of-the-art parser/grabber captures the data
• the API is used to get user statistics
• the data is interpreted in the facility location framework
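The slides do not spell out the optimization model. One standard way to write it down is the uncapacitated facility location problem, given here as an assumption rather than the project's exact formulation. Clients $i \in C$ are the ranked post authors with rank-based weights $w_i$, facilities $j \in F$ are the accounts they follow, and $N(i) \subseteq F$ is the set of facilities client $i$ follows. The symbols $f_j$ (cost of engaging facility $j$) and $c_{ij}$ (cost of reaching client $i$ through facility $j$) are our notation. $y_j = 1$ means facility $j$ is opened, and $x_{ij} = 1$ means client $i$ is served by facility $j$.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{j \in F} f_j\, y_j \;+\; \sum_{i \in C} \sum_{j \in N(i)} w_i\, c_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j \in N(i)} x_{ij} = 1 && \forall\, i \in C, \\
& x_{ij} \le y_j && \forall\, i \in C,\ j \in N(i), \\
& x_{ij} \in \{0, 1\},\quad y_j \in \{0, 1\}.
\end{aligned}
```

Relaxing the binary constraints to $0 \le x_{ij} \le 1$ and $0 \le y_j \le 1$ turns this into a linear program, the kind of problem a modeling tool such as CVX solves directly.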
8. APPROACH
• Analyze the most popular posts with tags such as #innovation, #education, #science, #technology
• Create a ranked list of their authors (clients): higher relevance means higher rank
• Find out whom they follow (facilities)
• Optimize: open the facilities that provide the maximum information spread
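The "Optimize" step can be sketched as follows, assuming an uncapacitated facility location model: each facility has an opening cost, each client a weight derived from its rank, and a client can only be served by a facility it follows, at a per-pair assignment cost. This sketch swaps in a greedy "add" heuristic instead of an exact CVX solve, so it carries no optimality guarantee. Function names, facility names and all numbers are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Client = Hashable
Facility = Hashable
Objective = Tuple[int, float]  # (unserved clients, total cost), compared lexicographically


def total_cost(
    open_set: Set[Facility],
    open_cost: Dict[Facility, float],
    assign_cost: Dict[Tuple[Client, Facility], float],
    weight: Dict[Client, float],
) -> Objective:
    """Opening cost plus rank-weighted assignment cost for a set of open facilities.

    A client can only be assigned to a facility it follows (a key in
    assign_cost) and takes the cheapest open one; clients with no open
    followed facility are counted as unserved.
    """
    cost = sum(open_cost[j] for j in open_set)
    unserved = 0
    for i, w in weight.items():
        options = [assign_cost[(i, j)] for j in open_set if (i, j) in assign_cost]
        if options:
            cost += w * min(options)
        else:
            unserved += 1
    return unserved, cost


def greedy_facility_location(
    open_cost: Dict[Facility, float],
    assign_cost: Dict[Tuple[Client, Facility], float],
    weight: Dict[Client, float],
) -> Set[Facility]:
    """Greedy 'add' heuristic: repeatedly open the facility that most improves
    the objective, first serving every client, then lowering the cost."""
    open_set: Set[Facility] = set()
    best = total_cost(open_set, open_cost, assign_cost, weight)
    while True:
        candidates = [
            (total_cost(open_set | {j}, open_cost, assign_cost, weight), j)
            for j in open_cost
            if j not in open_set
        ]
        if not candidates:
            return open_set
        value, j = min(candidates, key=lambda c: c[0])
        if value >= best:  # no strict improvement left
            return open_set
        open_set.add(j)
        best = value


if __name__ == "__main__":
    # Hypothetical data: clients a, b, c (weights from rank), facilities F1..F3.
    open_cost = {"F1": 5.0, "F2": 3.0, "F3": 4.0}
    assign_cost = {("a", "F1"): 1.0, ("a", "F2"): 2.0, ("b", "F1"): 1.0,
                   ("b", "F3"): 1.0, ("c", "F2"): 1.0, ("c", "F3"): 3.0}
    weight = {"a": 3.0, "b": 2.0, "c": 1.0}
    chosen = greedy_facility_location(open_cost, assign_cost, weight)
    print(sorted(chosen), total_cost(chosen, open_cost, assign_cost, weight))
```

On this toy instance the script prints `['F2', 'F3'] (0, 16.0)`, i.e. every client is served at a total cost of 16. Opening F1 and F2 instead would cost 14, so the greedy answer is best treated as a baseline for an exact LP/MILP solve.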
11. POTENTIAL IMPROVEMENTS
• better estimation of assignment costs (based on the ranks of the facilities' posts)
• better source clients (more tags)
• handling the combined influence of posts from multiple facilities on the same client
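One common way to handle the last point, sketched here under an independent-cascade-style assumption that is not from the slides: each open facility $j$ that client $i$ follows reaches the client independently with some probability $p_{ij}$, so the client is reached with probability $1 - \prod_j (1 - p_{ij})$ instead of the sum of the individual influences. The probabilities and function names below are hypothetical.

```python
from math import prod
from typing import Dict, Hashable, Iterable, Tuple

Client = Hashable
Facility = Hashable


def reach_probability(
    client: Client,
    open_facilities: Iterable[Facility],
    p: Dict[Tuple[Client, Facility], float],
) -> float:
    """Chance that at least one open facility reaches `client`.

    Assumes each followed facility j reaches the client independently with
    probability p[(client, j)]; facilities the client does not follow are skipped.
    """
    miss = prod(1.0 - p[(client, j)] for j in open_facilities if (client, j) in p)
    return 1.0 - miss


def expected_spread(
    weight: Dict[Client, float],
    open_facilities: Iterable[Facility],
    p: Dict[Tuple[Client, Facility], float],
) -> float:
    """Rank-weighted expected number of clients reached."""
    facilities = list(open_facilities)  # iterated once per client
    return sum(w * reach_probability(i, facilities, p) for i, w in weight.items())


if __name__ == "__main__":
    p = {("a", "F1"): 0.5, ("a", "F2"): 0.5, ("b", "F1"): 0.2}  # hypothetical
    print(reach_probability("a", ["F1", "F2"], p))  # 0.75, where naive summing gives 1.0
```

The product form never exceeds 1 and rewards a second facility less than the first, which is exactly the overlap that summing per-facility influence ignores.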
12. CREDITS
• Kalan Abe: parser core, PageRank, graph visualization
• Nikita Pestrov: initial concept, raw data processing for optimization via CVX, Chinese language understanding
• Denis Antyukhov: Weibo API, parser, data grabbing, infographics and presentation
13. Our project is available on GitHub:
https://github.com/pestrov/SkolWeng
There you can get the code, screenshots and raw data, and witness the history of our struggle.
Thank you