Grandata


  1. GrandData: InfoVis challenge. "We are big-data analysts. We will be a Legion. We work hard. We do not forget scalability. Expect us in your datacenter." grandata.azurewebsites.net/
  2. Data we dealt with. Fetched from PeerIndex: the most influential Twitter users in the UK. For each of them:
       • Popular topics
       • Influence graph (whom do they influence, and who influences them?)
       • Some statistics and data on their activity
       • Their Twitter info
     The data are unstructured (mainly text, with different attributes).
  3. Approaching the problem. Our focus: build a scalable InfoVis solution.
       • If the data grow, everything should scale to keep the response time fixed. At least we hope so.
       • No bottlenecks and no single point of failure in the data-processing flow.
       • The data are unstructured, so: schemaless DB!
     Additionally: 24 hours aren't enough to build a complete system. This is only a fully working prototype.
  4. Considerations
       • Problem: DB scalability and easy prototyping. Solution: use a sharded database -> MongoLab
       • Problem: quick cold start, reliability and easy management. Solution: cloud -> Windows Azure
       • Problem: algorithm scalability. Solution: MapReduce
  5. Vis. Moving data to the browser is not a big-data challenge:
       • Few pieces of data (compared to what is stored)
       • Very effective graphics library, publicly released
     Support for any (recent) browser.
  6. Further considerations
       • Problem: moving data to the browser. Solution: we use MongoLab -> REST calls
       • Problem: a simple frontend that can run everywhere. Solution: stay simple -> HTML, CSS and JavaScript
       • Problem: the user experience must be appealing. Solution: a powerful JS graphics library -> D3.js
  7. Algo complexity. Given N topics and K users, the complexity is O(K*N).
       • Since the big data, in this case, are the users (N grows only slowly over time and can be treated as roughly constant), the complexity can be approximated as O(K).
       • That's linear! Great for a big-data task.
  8. Algo enhancement. Given all the scores of a person, predicting their (near-)future trend is trivial, topic by topic.
       • It's possible to build a time-series prediction of the next value of each score.
     If the data are partially missing, or subsampling has been applied, it's still possible to predict the scores of a generic user.
       • Collaborative filtering based on the user/score matrix.
  9. If anyone wants to sponsor us … Improvements:
       • Add security (authentication/authorization) to the REST calls
       • Unit-test every piece of code
       • Build an online system that automatically loads data gathered from the Internet
  10. Team references: I; Another guy; Yeah, the last one
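
The MapReduce choice (slide 4) and the O(K*N) estimate (slide 7) can be illustrated with a small in-memory sketch. The slides do not say what the job actually computes, so the per-topic average below, the record layout and the function names are all assumptions made for illustration. The point is only that the map step emits one pair per user per topic, which is where K*N comes from.

```python
from collections import defaultdict


def map_user(user):
    """Map step: emit one (topic, score) pair per topic of a single user."""
    for topic, score in user["topics"].items():
        yield topic, score


def reduce_topic(scores):
    """Reduce step: combine all scores emitted for one topic (here: a mean)."""
    return sum(scores) / len(scores)


def map_reduce(users):
    """Run map over every user, group by topic, then reduce each group."""
    grouped = defaultdict(list)
    emitted = 0  # counts map outputs: K users * N topics in the dense case
    for user in users:
        for topic, score in map_user(user):
            grouped[topic].append(score)
            emitted += 1
    result = {topic: reduce_topic(scores) for topic, scores in grouped.items()}
    return result, emitted


# Hypothetical data: K = 3 users, N = 2 topics.
users = [
    {"name": "alice", "topics": {"music": 60, "politics": 40}},
    {"name": "bob", "topics": {"music": 80, "politics": 20}},
    {"name": "carol", "topics": {"music": 70, "politics": 30}},
]
averages, emitted = map_reduce(users)
print(averages, emitted)
```

With N held fixed, doubling the number of users doubles the number of emitted pairs, which matches the O(K) approximation on slide 7. Each user is mapped independently, so the map step can be spread across shards.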
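
Slide 8 mentions a time-series prediction of the next value of each score but does not name a method. One simple possibility, shown here as a sketch and not as the team's approach, is to fit a straight line to a user's past scores for one topic by least squares and extrapolate one step ahead. The function name and the sample history are hypothetical.

```python
def predict_next(scores):
    """Extrapolate the next value from a least-squares line through the scores.

    The scores are assumed to be equally spaced in time (t = 0, 1, 2, ...).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    if n == 1:
        return float(scores[0])  # no trend to estimate from a single point
    mean_t = (n - 1) / 2
    mean_s = sum(scores) / n
    covariance = sum((t - mean_t) * (s - mean_s) for t, s in enumerate(scores))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = mean_s - slope * mean_t
    return intercept + slope * n  # value of the fitted line at the next step


# Hypothetical history of one user's score for one topic.
history = [10, 12, 14, 16]
print(predict_next(history))
```

This runs once per user per topic, so it keeps the same O(K*N) shape as the main algorithm.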
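
For the missing-data case on slide 8, collaborative filtering over the user/score matrix could look roughly like the user-based variant below. The slides do not specify a variant or a similarity measure. Cosine similarity, the weighted average and the tiny matrix are assumptions chosen to keep the sketch short.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the topics for which both users have a score."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_score(matrix, user, topic):
    """Similarity-weighted average of the other users' scores for `topic`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, row in enumerate(matrix):
        if other == user or row[topic] is None:
            continue  # skip the user themselves and users with no score here
        sim = cosine_similarity(matrix[user], row)
        weighted_sum += sim * row[topic]
        weight_total += abs(sim)
    if weight_total == 0:
        return None  # nobody comparable has a score for this topic
    return weighted_sum / weight_total


# Hypothetical user/score matrix: rows are users, columns are topics,
# None marks a missing score.
matrix = [
    [80, 20, None],
    [80, 20, 50],
    [20, 80, 10],
]
predicted = predict_score(matrix, user=0, topic=2)
print(predicted)
```

User 0 has the same profile as user 1 on the shared topics, so the prediction lands closer to user 1's score than to user 2's.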
