The Path to TrstRank: Building One Click Twitter Influence Metrics

A few years ago, we at Infochimps created our own version of Klout, one that took advantage of our vast historical record of Twitter relationships to create an accurate number describing how influential a Twitter user is. It’s called TrstRank, and it ranks a user on a scale of 1-10, with 10 being the most influential you can get.
Coming up with a number like TrstRank is no small task.

Setting aside the issues of getting the data, there are some very real Big Data problems surrounding the product that require special tools for getting it done efficiently. And when you’re a bootstrapped startup like we were at the time, you have to be resourceful if you are to get by.

Learn more at http://infochimps.com

Published in: Technology

The Path to TrstRank: Building One Click Twitter Influence Metrics

  1. The Path to TrstRank: Building One-Click Twitter Influence Metrics

     Since the launch of Twitter, people have clamored for ways to access and “slice and dice” its data. One of the most common ways people use the Twitter data corpus is to measure a person’s importance and influence. Klout is an example of one product that specializes in this kind of “influencer” data.

     A few years ago, we created our own special version of Klout, one that took advantage of our vast historical record of Twitter relationships to create an accurate number describing how influential a Twitter user is. It’s called TrstRank, and it ranks a user on a scale of 1-10, with 10 being the most influential you can get.

     Coming up with a number like TrstRank is no small task. Setting aside the issues of getting the data, there are some very real Big Data problems surrounding the product that require special tools for getting it done efficiently. And when you’re a bootstrapped startup, like we were at the time, you have to be resourceful if you are going to get by.

     The biggest issue with pursuing a new data product like TrstRank is the same one any company faces when it decides to venture into new territory: the high risk of wasting time and money.

     What is TrstRank?

     TrstRank is an Infochimps-developed dataset and API that provides Twitter influence metrics with the click of a button! TrstRank measures Twitter user reputation, importance, and influence in a far more robust way than counting the number of followers. It is a sophisticated measure of a user’s relative importance within the entire Twitter network.

     Wasting Time

     One of the first problems you run into as a small team trying your hand at data science is the excess time spent on server and machine configuration, instead of focusing on modeling, algorithms, and manipulating the data.

     © 2012 Infochimps, Inc. All rights reserved.
  2. Ramp-up time for even the first phase of a project like TrstRank can be a whole day or more of engineering time.

     Wasting Money

     From our earliest days, Infochimps has been based on the Amazon Web Services (AWS) cloud, taking advantage of the flexibility and scalability it provides. With AWS, you pay for what you use, so you are always inclined to eliminate waste. In our early days, we even created decision trees for whether to shut down a cluster, depending on how many hours it would be up but unused. This can set conflicting goals for the data scientist, who would prefer to leave a cluster up overnight, even if it’s unused, so they don’t have to set everything up again the next day!

     Enter Ironfan

     We created Ironfan to solve our own problem of how to save time and money during our data science operations in the cloud. When we came up with the idea for TrstRank, it was a simple operation to spin up a cluster for early analysis and experimentation. We could validate some of our algorithms and ideas on a simple cluster before moving to something more heavyweight.

     Ironfan and TrstRank, Now

     Ironfan has continued as a key tool for our monthly TrstRank operation. We continue to scrape Twitter for follower information, and with the updated data every month we crunch the TrstRank numbers again. With Ironfan, we’re able to run a multi-step operation on 8 billion tweets on clusters of 30 m1.xlarge EC2 machines, while only running the resources we need when they’re needed. TrstRank takes 72 hours to complete, and we pay only for the resources used during that time. Without Ironfan, we’d be looking at 2-3x the costs in time and money!
  3. About Infochimps

     Our mission is to make the world’s data more accessible. Infochimps helps companies understand their data. We provide tools and services that connect their internal data, leverage the power of cloud computing and new technologies such as Hadoop, and provide a wealth of external datasets, which organizations can connect to their own data.

     Contact Us

     Infochimps, Inc.
     1214 W 6th St. Suite 202
     Austin, TX 78703
     1-855-DATA-FUN (1-855-328-2386)
     www.infochimps.com
     info@infochimps.com
     Twitter: @infochimps

     Get a free Big Data consultation

     Let’s talk Big Data in the enterprise! Get a free conference with the leading big data experts regarding your enterprise big data project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop about your project objectives, design, infrastructure, tools, etc. Find out how other companies are solving similar problems. Learn best practices and get recommendations, free.
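
The shut-down-or-keep-running decision described under "Wasting Money" on slide 2 can be illustrated with a little arithmetic. The sketch below is only an illustration, not the decision trees Infochimps actually used: the `Cluster` class, the function names, the machine-hour price, and the engineering rate are all hypothetical values chosen for the example. Only the figures of 30 machines and 72 hours come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    nodes: int              # number of machines in the cluster
    hourly_rate_usd: float  # HYPOTHETICAL price per machine-hour, not a real AWS price


def instance_hours(nodes: int, hours: float) -> float:
    """Machine-hours billed when `nodes` machines run for `hours` hours."""
    return nodes * hours


def idle_cost(cluster: Cluster, idle_hours: float) -> float:
    """Cost of leaving the whole cluster up, unused, for `idle_hours`."""
    return instance_hours(cluster.nodes, idle_hours) * cluster.hourly_rate_usd


def should_shut_down(cluster: Cluster, idle_hours: float,
                     setup_hours: float, engineer_rate_usd: float) -> bool:
    """Shut down only if idling costs more than rebuilding the cluster later.

    The rebuild cost is modeled as engineering time alone, which is a
    simplifying assumption made for this sketch.
    """
    rebuild_cost = setup_hours * engineer_rate_usd
    return idle_cost(cluster, idle_hours) > rebuild_cost


if __name__ == "__main__":
    # Size of the monthly run quoted in the slides: 30 machines for 72 hours.
    print(instance_hours(30, 72))  # 2160 machine-hours

    cluster = Cluster(nodes=30, hourly_rate_usd=0.50)  # hypothetical rate

    # Manual setup taking a full engineering day (8 h at a hypothetical $50/h):
    print(should_shut_down(cluster, idle_hours=12, setup_hours=8, engineer_rate_usd=50))

    # Setup reduced to one hour by automation:
    print(should_shut_down(cluster, idle_hours=12, setup_hours=1, engineer_rate_usd=50))
```

Under these made-up numbers, a day-long manual setup makes it cheaper to leave the cluster idle overnight, while a one-hour automated setup makes shutting down the cheaper choice. That is the tension between the data scientist's convenience and the pay-for-what-you-use bill that the slides describe.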
