Scalability and Big Data at Senzari

Published in: Education, Technology

  1. SCALABILITY AND DATA ANALYTICS MATTER. HCB (@boosc)
  2. Agenda
     • Buzzword bingo
     • Data
     • Analytics
     • Scalability
     • Distributed and parallel concepts
     • Technology and tools
     • Senzari and big data
  3. Buzzword Bingo: H-Space, Agents/Bots, Data Engineer, Machine Learning, Support Vector Machines, Big Data, Swarm Intelligence, Gaussian Processes, Genetic Algorithms, Hadoop, PIG, HBase, Cassandra, redis.io, Eucalyptus, Core Dataset, R+, Clustering, NoStats, Natural Language Processing
  4. Data, lots of it
  5. One iPhone has 79 times more CPU power than was used in the Apollo missions
  6. What we can do
  7. Data
  8. Knowledge pyramid
     • Timeline (latest first): 1960s; 1950s Data Processing (Data)
     • Data: Unfiltered, Research, Creation, Gathering
  9. Knowledge pyramid
     • Timeline (latest first): 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
     • Information: Organized Data, Patterns, Presentation
  10. Knowledge pyramid
     • Timeline (latest first): 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
     • Knowledge: Useful Patterns, Predictability, Conversation
  11. Knowledge pyramid
     • Timeline (latest first): 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
     • Intelligence: Choice, Understanding, Decision
  12. Knowledge pyramid
     • Timeline (latest first): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
     • Wisdom: Evaluation, Interpretation, Retrospective
  13. Knowledge pyramid: Yield
     • Timeline (latest first): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
  14. Why you need big data: You Are Here!
     • Yield
     • Timeline (latest first): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s; 1960s; 1950s Data Processing (Data)
  15. Analytics
  16. Even in simple datasets, common statistics (avg, min, max, distribution) fail
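A minimal sketch of the point on this slide, using two made-up datasets (the values are illustrative and not taken from the talk). Both share the same average, minimum and maximum, yet one is concentrated around the centre and the other is split between the extremes. The function name `summary` is an assumption introduced here.

```python
import statistics

# Two invented datasets, chosen so that avg, min and max coincide.
centred = [1, 5, 5, 5, 9]   # most values sit at the centre
extreme = [1, 1, 5, 9, 9]   # most values sit at the edges

def summary(values):
    """Return the common summary statistics named on the slide."""
    return {
        "avg": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

# Identical summaries, different data.
print(summary(centred) == summary(extreme))

# A spread measure starts to tell them apart; a plot would show it at once.
print(statistics.pstdev(centred), statistics.pstdev(extreme))
```

The summaries match even though the datasets do not, which is why looking at the raw values or a plot is worth the extra step.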
  17. Finding clusters, evaluating outliers and interpreting white noise
  18. Two tips for looking at data: (1) plot it, (2) remove all labels
  19. Scalability
  20. Cloud computing is when the IT guys are finally able to explain to business people what they were talking about 20 years ago!
  21. =
  22. Computation on demand + Pay as you go
  23. BASE (Basically Available, Soft state, Eventual consistency), not ACID (Atomicity, Consistency, Isolation, Durability)
  24. How to scale (AWS example)
     • Do not allocate instances manually
     • Each component needs to be independent
     • Plan for failure
     • Actively provoke failure
  25. Human Software
     • Click workers and Mechanical Turks are not just cheap labour
     • They let programmers hand tasks that cannot be handled algorithmically over to humans
     • Make use of it to
       • Do things too complicated for machine learning
       • Pre-populate machine learning spaces
  26. Distributed and parallel concepts
  27. Imperative Programming
     • Step-by-step explanation of what to do
     • Explaining WHAT to do rather than the RESULTS you want
     • Always necessary for basic algorithms
  28. Functional Programming I
     • Combine results to become a program
     • Allows dynamic distribution
     • Map-Reduce is only one way of doing it!
  29. Functional Programming II
     • F(G(H(A, B), C), D)
     • getMusicLikes(getFriends(facebookID)) instead of: for i in getFriends(facebookID): getMusicLikes(i)
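The contrast on the two Functional Programming slides can be sketched in Python as follows. The lookup tables and the names `get_friends`, `get_music_likes`, `likes_imperative`, `likes_functional` and `likes_parallel` are hypothetical stand-ins for the slide's `getFriends` and `getMusicLikes`; a real version would call a social network API. A thread pool is used here only as one simple example of "dynamic distribution", not as the approach used in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data replacing real API calls.
FRIENDS = {"alice": ["bob", "carol"]}
LIKES = {"bob": ["jazz"], "carol": ["rock", "jazz"]}

def get_friends(user_id):
    return FRIENDS.get(user_id, [])

def get_music_likes(user_id):
    return LIKES.get(user_id, [])

def likes_imperative(user_id):
    """Imperative style: the loop fixes the order of every step."""
    result = []
    for friend in get_friends(user_id):
        result.append(get_music_likes(friend))
    return result

def likes_functional(user_id, mapper=map):
    """Functional style: describe the result, let `mapper` decide how to run it."""
    return list(mapper(get_music_likes, get_friends(user_id)))

def likes_parallel(user_id):
    """Same description, distributed over a thread pool without changing the logic."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return likes_functional(user_id, mapper=pool.map)

print(likes_imperative("alice"))
print(likes_parallel("alice"))
```

Because the functional version only states what is applied to what, the execution strategy can be swapped (built-in `map`, a thread pool, or a cluster framework) while the program text stays the same.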
  30. Technology and tools
  31. Data Storage
     • Cassandra - for write performance
     • HBase - for read performance
     • Redis.io - for predictable operation time
  32. Other Data Storage
     • Mongo - NoSQL for beginners (close to SQL, but scalability is very manual)
     • SONOS - Graph DB (Windows based)
     • CouchDB, etc. - nice concepts, lots of great ideas, but communities too small
  33. Distributed Computing
     • Hadoop
     • Zookeeper as DLS
  34. Languages
     • ERLANG
     • HASKELL
     • SCALA
     • Lisp
     • Prolog
     • Mathematica
  35. No, you don't have to learn ERLANG: use Hadoop Streaming with Python
     • [Diagram: Program 1, STDOUT, lines, Program 2]
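A minimal word-count sketch of what a Hadoop Streaming job in Python can look like. Hadoop Streaming feeds input lines to the mapper on standard input, sorts the mapper's tab-separated output by key, and feeds the sorted lines to the reducer. The word-count task, the single-script layout and the names `map_lines` and `reduce_lines` are assumptions chosen for illustration, not taken from the talk.

```python
import sys
from itertools import groupby

def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records sharing a key.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Run as "script.py map" or "script.py reduce"; does nothing without an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = {"map": map_lines, "reduce": reduce_lines}.get(sys.argv[1])
    if step is not None:
        for record in step(sys.stdin):
            print(record)
```

The same pipeline can be checked locally with a shell pipe of the form `cat input | script map | sort | script reduce` before handing the script to the streaming job through its `-mapper` and `-reducer` options.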
  36. Check out my tool list: http://www.hcboos.net/100-links/
  37. Senzari and big data
  38. The AMP3 Platform: Adaptable Music Parallel Processing Platform
  39. Behind AMP
  40. Technologies
     • AWS: EC2, S3, EBS, SNS, ELB
     • Cassandra + Hadoop + Solandra
     • Zookeeper
     • Dynamic scaling server (Lich Lord)
     • Asynchronous messaging system
     • Modules built in Python
  41. Effects
     • Built on top of Python platform
     • Fully automated scaling
     • Fully distributed data processing
     • Message channels allow code decoupling
     • Message channels allow replay
     • Message channels allow outtasking
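The decoupling and replay effects listed above can be illustrated with a toy in-memory channel. This is a sketch under stated assumptions, not the AMP3 messaging system: the class name `Channel` and its methods are invented here, and a production system would persist the log and deliver messages asynchronously.

```python
class Channel:
    """Toy message channel that keeps a log so messages can be replayed."""

    def __init__(self):
        self._log = []          # every message ever published, in order
        self._subscribers = []  # handlers called on each new message

    def subscribe(self, handler):
        """Register a handler; the publisher never needs to know who listens."""
        self._subscribers.append(handler)

    def publish(self, message):
        """Record the message, then deliver it to all current subscribers."""
        self._log.append(message)
        for handler in self._subscribers:
            handler(message)

    def replay(self, handler):
        """Feed the full history to a handler, e.g. a module added later."""
        for message in list(self._log):
            handler(message)


channel = Channel()
seen_live = []
channel.subscribe(seen_live.append)
channel.publish("track played")
channel.publish("track skipped")

# A consumer that did not exist at publish time can still catch up.
seen_late = []
channel.replay(seen_late.append)
print(seen_live == seen_late)
```

Publishers and consumers only share the channel, which is what keeps the modules decoupled, and the stored log is what makes replay possible.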
  42. Thank You for Your Time
  43. Credits
     • „Big Data Just Beginning to Explode“ by CSC, http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
     • „Social media network connections among twitter users“ by Marc Smith, http://www.flickr.com/photos/marc_smith/
     • Asteroid Datasets by Bruce Gary, http://brucegary.net/POVENMIRE/x.htm
