The Path to TrstRank
Building One-Click Twitter Influence Metrics

Since the launch of Twitter, people have clamored for ways to access and “slice and dice” its data. One of the most common ways people use the Twitter data corpus is to measure a person’s importance and influence. Klout is an example of one product that specializes in this kind of “influencer” data.

A few years ago, we created our own special version of Klout, one that took advantage of our vast historical record of the relationships to create an accurate number describing how influential a Twitter user is. It’s called TrstRank and it ranks a user on a scale of 1-10, with 10 being the most influential you can get.

Coming up with a number like TrstRank is no small task. Setting aside the issues of getting the data, there are some very real Big Data problems surrounding the product that require special tools for getting it done efficiently. And when you’re a bootstrapped startup, like we were at the time, you have to be resourceful if you are going to get by.

The biggest issue with pursuing a new data product like TrstRank is the same one any company faces when it decides to venture into new territory: the high risk of wasting time and money.

What is TrstRank?
TrstRank is an Infochimps-developed dataset and API that provides Twitter influence metrics with the click of a button! TrstRank measures Twitter user reputation, importance and influence in a far more robust way than counting the number of followers. It is a sophisticated measure of a user’s relative importance within the entire Twitter network.


Wasting Time
One of the first problems you run into as a small team trying your hand at data science is the excess time spent on server and machine configuration, instead of focusing on modeling, algorithms, and manipulating the data.

© 2012 Infochimps, Inc. All rights reserved.                                                           1
Ramp-up time for even the first phase of a project like TrstRank can be a whole day or more of engineering time.


Wasting Money
From our earliest days Infochimps has been based on Amazon Web Services’ (AWS) cloud, taking advantage of the flexibility and scalability it provides. With AWS, you pay for what you use, so you are always inclined to eliminate waste. In our early days we even created decision trees for whether to shut down a cluster, depending on how many hours it would be up but not used.

This can set conflicting goals for the data scientist, who would prefer to leave a cluster up overnight, even if it’s unused, so they don’t have to deal with setting everything up again the next day!
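
The trade-off described above can be sketched as a simple cost comparison. The source does not describe what the original decision trees looked like, so this is only an illustrative rule under stated assumptions: every function name, parameter, and number below is hypothetical, and real pricing would need to be supplied by the reader.

```python
def idle_cost(idle_hours: float, node_count: int, hourly_rate: float) -> float:
    """Cost of leaving a cluster running while nobody is using it."""
    return idle_hours * node_count * hourly_rate


def restart_cost(setup_hours: float, engineer_hourly_cost: float) -> float:
    """Cost of the engineering time needed to set the cluster up again."""
    return setup_hours * engineer_hourly_cost


def should_shut_down(
    idle_hours: float,
    node_count: int,
    hourly_rate: float,
    setup_hours: float,
    engineer_hourly_cost: float,
) -> bool:
    """Shut down only when idling costs more than rebuilding the cluster.

    All inputs are assumptions supplied by the caller; none come from the source.
    """
    idle = idle_cost(idle_hours, node_count, hourly_rate)
    restart = restart_cost(setup_hours, engineer_hourly_cost)
    return idle > restart


if __name__ == "__main__":
    # Hypothetical numbers: 12 idle hours overnight on a 30-node cluster.
    # With a full day of manual setup, leaving the cluster up looks cheaper.
    print(should_shut_down(12, 30, 0.5, setup_hours=8, engineer_hourly_cost=50))
    # With setup reduced to about an hour, shutting down looks cheaper.
    print(should_shut_down(12, 30, 0.5, setup_hours=1, engineer_hourly_cost=50))
```

The point of the comparison is that the answer flips once setup time drops: the cheaper it is to rebuild a cluster, the less reason there is to pay for idle hours.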


Enter Ironfan
We created Ironfan to solve our own problems of how to save time and money during our data science operations in the cloud. When we came up with the idea for TrstRank, it was a simple operation to spin up a cluster for early analysis and experimentation. We could validate some of our algorithms and ideas on a simple cluster before moving to something more heavyweight.

Ironfan and TrstRank, Now
Ironfan has continued as a key tool for our monthly TrstRank operation. We continue to scrape Twitter for follower information, and with the updated data every month we crunch the TrstRank numbers again.

With Ironfan, we’re able to run a multi-step operation on 8 billion tweets on clusters of 30 m1.xlarge EC2 machines, while only running the resources we need when they’re needed. TrstRank takes 72 hours to complete, with resources being paid for commensurately. Without Ironfan, we’d be looking at 2-3x the costs in time and money!
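
To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that a single 30-machine cluster runs for the whole 72 hours, which gives an upper bound on instance-hours; the text says resources are only run when needed, so the real figure would be lower. The function names are hypothetical, and no dollar price is assumed.

```python
def instance_hours(node_count: int, wall_clock_hours: float) -> float:
    """Instance-hours consumed if every node runs for the full duration."""
    return node_count * wall_clock_hours


def cost_range(base: float, low_multiplier: float = 2.0,
               high_multiplier: float = 3.0) -> tuple[float, float]:
    """Scale a baseline by the 2-3x range quoted in the text."""
    return base * low_multiplier, base * high_multiplier


if __name__ == "__main__":
    # Upper bound: 30 machines for 72 hours (an assumption, see the lead-in).
    upper_bound = instance_hours(30, 72)
    low, high = cost_range(upper_bound)
    print(f"Upper bound with on-demand clusters: {upper_bound:.0f} instance-hours")
    print(f"At 2-3x: {low:.0f} to {high:.0f} instance-hours")
```

Multiplying the instance-hours by whatever hourly rate applies to the reader's own account turns these into a cost estimate.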



About Infochimps
Our mission is to make the world’s data more accessible. Infochimps helps companies understand their data. We provide tools and services that connect their internal data, leverage the power of cloud computing and new technologies such as Hadoop, and provide a wealth of external datasets, which organizations can connect to their own data.


Contact Us
Infochimps, Inc.
1214 W 6th St. Suite 202
Austin, TX 78703
1-855-DATA-FUN (1-855-328-2386)
www.infochimps.com
info@infochimps.com
Twitter: @infochimps




Get a free Big Data consultation
Let’s talk Big Data in the enterprise!

Get a free conference with leading big data experts about your enterprise big data project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop about your project objectives, design, infrastructure, tools, etc. Find out how other companies are solving similar problems. Learn best practices and get recommendations, all for free.




