Bringing China to Skoltech
DOMAIN
• Sina Weibo is a Chinese microblogging
website
• One of the most popular sites in China
• Over 600 mln. registered users
(well over 30% of the Internet)
• 86.6% of the Chinese microblogging
market
• ~100 mln. messages posted each day
OBJECTIVE
• Investigate information and
influence spread in the network
• Find the most influential users
and companies in the IT,
Science & Technology sphere
• Find those who will spread the
word about Skoltech with
minimum cost and maximum
effectiveness
EXPECTATIONS
• The normal way to do that is to use the API
HARSH REALITY
ROADBLOCKS
• The Chinese language is very hard to understand
• API is essentially non-functional and the
documentation is misleading and confusing
• traffic is severely limited (150 calls/hour)
• connection is unstable
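
A minimal sketch of how a grabber could stay under the 150 calls/hour limit mentioned above. This is an illustration, not the project's actual code: the class name `HourlyRateLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable only so the logic can be checked without waiting.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Sliding-window limiter: at most max_calls within any `period` seconds."""

    def __init__(self, max_calls=150, period=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

Calling `limiter.acquire()` before every request keeps the request rate within the limit; retrying on a dropped connection would have to be handled separately.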
SOLUTION STRUCTURE
To overcome the difficulties we came up with the following
solution:
• refer to: whiteboard
• state-of-the-art parser/grabber to capture data
• API is used to get user statistics
• data is interpreted in the facility location framework
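
For reference, one standard way to write the uncapacitated facility location problem is given below. The slides do not state the exact model that was solved, so every symbol here is an assumption: $C$ is the set of clients, $F$ the set of facilities, $f_j$ the cost of opening facility $j$, $c_{ij}$ the cost of assigning client $i$ to facility $j$, $y_j$ indicates that facility $j$ is open, and $x_{ij}$ indicates that client $i$ is served by facility $j$.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{j \in F} f_j\, y_j \;+\; \sum_{i \in C} \sum_{j \in F} c_{ij}\, x_{ij} \\
\text{s.t.}\quad  & \sum_{j \in F} x_{ij} = 1 && \forall i \in C, \\
                  & x_{ij} \le y_j           && \forall i \in C,\ j \in F, \\
                  & x_{ij},\, y_j \in \{0, 1\} && \forall i \in C,\ j \in F.
\end{aligned}
```

Replacing the binary constraints with $0 \le x_{ij},\, y_j \le 1$ gives a linear programming relaxation, which is the kind of problem a convex solver can handle directly.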
APPROACH
• Analyze the most popular posts with tags like
#innovation, #education, #science, #technology
• Create a ranked list of their authors (clients)
(higher relevance = higher rank)
• Find out whom they follow (facilities)
• Optimize: open the facilities which provide
maximum information spread
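
The steps above can be sketched in a few lines of Python. This is a simplified greedy heuristic under a budget, swapped in for illustration; it is not the optimization the project ran and it does not guarantee an optimal selection. The function name, the weights, the costs and the budget are all hypothetical.

```python
def greedy_open_facilities(client_weight, follows, cost, budget):
    """Pick facilities by newly reached client weight per unit of cost.

    client_weight: {client: relevance weight, higher = more relevant}
    follows:       {client: set of facilities that client follows}
    cost:          {facility: opening cost, must be > 0}
    budget:        total opening cost allowed
    """
    # Invert the follow relation: facility -> clients it reaches.
    reach = {}
    for client, facilities in follows.items():
        for facility in facilities:
            reach.setdefault(facility, set()).add(client)

    opened, covered, spent = [], set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for facility in sorted(reach):  # sorted for deterministic ties
            if facility in opened or spent + cost[facility] > budget:
                continue
            # Only clients not yet reached count as gain.
            gain = sum(client_weight[c] for c in reach[facility] - covered)
            ratio = gain / cost[facility]
            if ratio > best_ratio:
                best, best_ratio = facility, ratio
        if best is None:  # nothing affordable adds new coverage
            break
        opened.append(best)
        covered |= reach[best]
        spent += cost[best]
    return opened, covered, spent


# Toy data: three clients (authors) and the three accounts they follow.
client_weight = {"a": 3, "b": 2, "c": 1}
follows = {"a": {"X", "Y"}, "b": {"X"}, "c": {"Z"}}
cost = {"X": 2, "Y": 1, "Z": 1}
opened, covered, spent = greedy_open_facilities(client_weight, follows, cost, budget=3)
```

Counting each client only once, however many opened facilities it follows, is the simplest way to treat overlapping audiences.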
CLIENTS VS FACILITIES
Kai-Fu Lee
THE GREAT GRAPH OF CHINA
POTENTIAL IMPROVEMENTS
• better estimation of assignment costs (based on
the ranks of facility posts)
• better source clients (more tags)
• handling the influence of posts from multiple
facilities on the same client
CREDITS
• Kalan Abe: parser core, PageRank, graph visualization
• Nikita Pestrov: initial concept, raw data processing
for optimization via CVX, Chinese language
understanding
• Denis Antyukhov: Weibo API, parser, data grabbing,
infographics and presentation
Our project is available on GitHub:
https://github.com/pestrov/SkolWeng
where you can get the code, screenshots, raw data
and witness the history of our struggle
Thank You
