Cascade Project

A look at the challenges involved in creating a big data product in the context of the Cascade Project (https://www.cascadeproject.com/)

Published in: Technology

  • Nature of the Beast, comparison to weblogs
  • Transcript

    • 1. Real Time-Big Data-Social Network-Data Science-Gamified! Jason Capehart a.k.a. The Cascade Project 12/12/12 (Okay … that last part of the title isn’t true)
    • 2. 1. Visualization 2. Data 3. Analysis
    • 3. Show Me!
    • 4. The Good, The Bad, The Ugly
    • 5. Surely, You Must Be Joking. Stores and examples: Key-Value (Hadoop, Memcached, Redis); Document (MongoDB, CouchDB); Graph (Neo4j, Giraph, Titan); Real Time (Storm, Impala)
    • 6. Citation: Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
    • 7. Citation: A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
    • 8. 800,000,000 (that’s a lot of users) (cost = 200k for fire hose)
    • 9. Sampled vs. Not Sampled. Citation: Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
    • 10. # Pseudo Code: id_guess = randint(0, 10^9); user = api.get_user(id = id_guess); repeat until tired or rate limited
    • 11. Power Law (xmin = 281, α = 2.19) vs. Lognormal. Discrete Power Law vs. Lognormal: Loglikelihood Ratio = 89.46; Vuong’s Test Statistic = 7.14; p-val (1-sided) > 0.99
    • 12. Power Law (xmin = 222, α = 2.33), Lognormal, Stretched Exponential
    • 13. Conclusions = None! (All work is in progress.) Discussion: Cascade uses open source. Opportunities to give back?
    • 14. References
      1. Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703. (arXiv:0706.1062, doi:10.1137/070710111) Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/
      2. Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351.
      3. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
      4. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
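
The pseudocode on slide 10 (guess a random ID, look it up, repeat until rate limited) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `sample_users`, `RateLimited`, and the `get_user` callable are hypothetical names standing in for whatever API client the project actually used, and the stand-in lookup at the bottom is fake data, not a real service.

```python
import random


class RateLimited(Exception):
    """Hypothetical error raised by a client when the API quota runs out."""


def sample_users(get_user, n_wanted, max_id=10**9, max_tries=10_000, rng=None):
    """Collect users by guessing IDs uniformly at random.

    get_user is an assumed callable: it returns a user record for a valid
    ID, returns None for an unused ID, and raises RateLimited when throttled.
    """
    if rng is None:
        rng = random.Random()
    found = {}
    for _ in range(max_tries):
        if len(found) >= n_wanted:
            break
        id_guess = rng.randint(0, max_id)  # inclusive on both ends
        try:
            user = get_user(id_guess)
        except RateLimited:
            break  # "repeat until tired or rate limited"
        if user is not None:
            found[id_guess] = user
    return found


if __name__ == "__main__":
    # Stand-in lookup: pretend every third ID belongs to a real account.
    def fake_lookup(uid):
        return {"id": uid} if uid % 3 == 0 else None

    print(len(sample_users(fake_lookup, n_wanted=5, rng=random.Random(0))))
```

Guessing IDs uniformly gives each existing account the same chance of being drawn, which is the point of the approach, at the cost of many wasted calls on unused IDs.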
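
Slides 11 and 12 report fitted power-law parameters (xmin and α). As a rough illustration of how α can be estimated once xmin is fixed, the sketch below uses the maximum-likelihood estimator described by Clauset, Shalizi, and Newman (2009): α = 1 + n / Σ ln(x_i / xmin), with xmin replaced by xmin − 1/2 as an approximation for integer data. The names `fit_alpha` and `sample_power_law` and the synthetic data are assumptions for illustration, not the Cascade Project's code, and the sketch neither selects xmin nor runs the likelihood-ratio comparison shown on the slides.

```python
import math
import random


def fit_alpha(data, xmin, discrete=False):
    """Estimate the power-law exponent alpha for the tail x >= xmin.

    Continuous MLE: alpha = 1 + n / sum(ln(x_i / xmin)).
    With discrete=True, xmin is replaced by (xmin - 0.5) inside the log,
    an approximation intended for integer-valued data.
    """
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    base = xmin - 0.5 if discrete else xmin
    log_sum = sum(math.log(x / base) for x in tail)
    if log_sum <= 0:
        raise ValueError("tail is degenerate; cannot estimate alpha")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, xmin, alpha, rng):
    """Draw n continuous power-law samples by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    # Synthetic data only; the parameter values echo slide 11 for illustration.
    rng = random.Random(42)
    synthetic = sample_power_law(20_000, xmin=281, alpha=2.19, rng=rng)
    print(round(fit_alpha(synthetic, xmin=281), 2))
```

On synthetic data drawn from a known exponent, the estimate should land close to the true value, which is a quick sanity check before applying the estimator to real follower counts.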