
Scalability and Big Data at Senzari

Published in: Education, Technology

Transcript of "Scalability and Big Data at Senzari"

  1. SCALABILITY AND DATA ANALYTICS MATTER. HCB (@boosc)
  2. Agenda
     • Buzzword bingo
     • Data
     • Analytics
     • Scalability
     • Distributed and parallel concepts
     • Technology and tools
     • Senzari and big data
  3. Buzzword Bingo: H-Space, Agents/Bots, Data Engineer, Machine Learning, Support Vector Machines, Big Data, Swarm Intelligence, Gaussian Processes, Genetic Algorithms, Hadoop, Pig, HBase, Cassandra, redis.io, Eucalyptus, Core Dataset, R+, Clustering, NoStats, Natural Language Processing
  4. Data, lots of it
  5. One iPhone has 79 times more CPU power than was used in the Apollo missions
  6. What we can do
  7. Data
  8. Knowledge pyramid. Layer: 1950s/1960s, Data Processing → Data. Data: Unfiltered, Research, Creation, Gathering
  9. Knowledge pyramid. Layer added: 1970s/1980s, Information Management → Information. Information: Organized Data, Patterns, Presentation
  10. Knowledge pyramid. Layer added: 1990s, Knowledge Management → Knowledge. Knowledge: Useful Patterns, Predictability, Conversation
  11. Knowledge pyramid. Layer added: 2000s, Knowledge Ecology → Intelligence. Intelligence: Choice, Understanding, Decision
  12. Knowledge pyramid. Layer added: 2010s, Systems Thinking → Wisdom. Wisdom: Evaluation, Interpretation, Retrospective
  13. Knowledge pyramid, complete (bottom to top): Data (Data Processing, 1950s/1960s), Information (Information Management, 1970s/1980s), Knowledge (Knowledge Management, 1990s), Intelligence (Knowledge Ecology, 2000s), Wisdom (Systems Thinking, 2010s), topped by Yield
  14. Why you need big data: the complete pyramid again, marked "You Are Here!"
  15. Analytics
  16. Even in simple datasets, common statistics (avg, min, max, distribution) can be misleading
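The slide does not say which dataset it used. The talk's own example may have been different, since the credits list asteroid datasets. A well-known illustration of the same point is Anscombe's quartet (1973). It consists of four small x/y datasets with nearly identical mean, variance and correlation but very different shapes, and only a plot reveals those shapes. The sketch below computes those summaries in plain Python. The rounding precision is an arbitrary display choice.

```python
from statistics import mean, variance

# Anscombe's quartet: four datasets whose summary statistics nearly coincide.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, without external libraries."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that come out the same for all four datasets."""
    return {
        "mean_x": mean(xs),
        "mean_y": round(mean(ys), 2),
        "var_x": variance(xs),          # sample variance
        "var_y": round(variance(ys), 1),
        "corr": round(pearson(xs, ys), 2),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(name, summarize(xs, ys))
```

The four printed rows are identical, yet dataset II is a curve, dataset III is a line with one outlier, and dataset IV is a vertical stack plus one far point.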
  17. Finding clusters, evaluating outliers and interpreting white noise
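One standard way to evaluate outliers is the modified z-score of Iglewicz and Hoaglin. It measures distance from the median in units of the median absolute deviation (MAD). Unlike the mean and standard deviation, the median and MAD are not pulled around by the very outliers being hunted. This is a sketch of that technique, not necessarily the one used in the talk. The 3.5 cutoff is the value those authors suggest, and the sample data is dataset III of Anscombe's quartet.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the points equal the median; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Dataset III of Anscombe's quartet: one point sits off an otherwise exact line.
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
print(mad_outliers(y3))  # [12.74]
```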
  18. Two tips for looking at data: (1) plot it, (2) remove all labels
  19. Scalability
  20. Cloud computing is when the IT guys are finally able to explain to business people what they were talking about 20 years ago!
  21. =
  22. Computation on demand + pay as you go
  23. BASE (Basically Available, Soft state, Eventual consistency), not ACID (Atomicity, Consistency, Isolation, Durability)
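Eventual consistency can be illustrated with a last-write-wins (LWW) map. Replicas accept writes independently and exchange state later. Merging always keeps the entry with the larger (timestamp, node) tag, so the replicas converge whatever the merge order. This is a minimal sketch with hypothetical `Replica` objects and caller-supplied timestamps. Real stores such as Cassandra use related but more involved mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a last-write-wins key-value map."""
    node_id: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, node_id, value)

    def put(self, key, value, timestamp):
        self._apply(key, (timestamp, self.node_id, value))

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        current = self.data.get(key)
        # Compare (timestamp, node_id) only; node_id breaks timestamp ties.
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def merge_from(self, other):
        """Anti-entropy step: pull every entry from another replica."""
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.put("track", "song-1", timestamp=1)  # write accepted by node a
b.put("track", "song-2", timestamp=2)  # concurrent, later write on node b
print(a.get("track"), b.get("track"))  # song-1 song-2 (replicas disagree)
a.merge_from(b)
b.merge_from(a)
print(a.get("track"), b.get("track"))  # song-2 song-2 (converged)
```

Between the writes and the merges, each replica stays available and answers with its own state. That is the "basically available, soft state" part of BASE.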
  24. How to scale (AWS example)
      • Do not allocate instances manually
      • Each component needs to be independent
      • Plan for failure
      • Actively provoke failure
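"Plan for failure" and "actively provoke failure" work together: inject faults on purpose, then check that callers survive them. This is a minimal Python sketch of a generic pattern, not AWS tooling. `FlakyService` is a hypothetical dependency that fails a configurable fraction of calls. `call_with_retries` retries with exponential backoff, and its `sleep` parameter can be swapped out in tests.

```python
import random
import time


class FlakyService:
    """Hypothetical dependency that fails a configurable fraction of calls."""

    def __init__(self, failure_rate, seed=None):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so fault injection is reproducible
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def call_with_retries(fn, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call fn, retrying on ConnectionError with exponential backoff.

    Re-raises the error once all attempts are used up, so callers still
    see failures that retries cannot hide.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.01 s, 0.02 s, 0.04 s, ...


if __name__ == "__main__":
    service = FlakyService(failure_rate=0.5, seed=42)
    print(call_with_retries(service.fetch), "after", service.calls, "call(s)")
```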
  25. Human software
      • Click workers and Mechanical Turks are not just cheap labour
      • They let programmers hand humans the tasks they cannot solve algorithmically
      • Make use of them to:
         • do things too complicated for machine learning
         • pre-populate machine learning spaces
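One way to combine click workers with machine learning, sketched under assumptions: predictions the model is unsure about go to a human review queue instead of being accepted automatically. The humans' answers can later feed back into the training data. `Prediction`, `route` and the 0.8 threshold are hypothetical, and no crowdsourcing API is called.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # the model's probability for its own label, 0..1


def route(predictions, threshold=0.8):
    """Split model output into auto-accepted labels and a human review queue."""
    accepted, for_humans = [], []
    for p in predictions:
        (accepted if p.confidence >= threshold else for_humans).append(p)
    return accepted, for_humans


preds = [
    Prediction("t1", "rock", 0.97),
    Prediction("t2", "jazz", 0.55),
    Prediction("t3", "pop", 0.81),
]
auto, review = route(preds)
print([p.item_id for p in auto])    # ['t1', 't3']
print([p.item_id for p in review])  # ['t2']
```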
  26. Distributed and parallel concepts
  27. Imperative Programming
      • Step-by-step explanation of what to do
      • Explains WHAT to do rather than the RESULTS you want
      • Always necessary for basic algorithms
  28. Functional Programming I
      • Combine the results of functions to form a program
      • Allows dynamic distribution
      • Map-Reduce is only one way of doing it!
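Calls to a side-effect-free function do not depend on each other. They can therefore be spread over machines and their results combined afterwards, which is what makes dynamic distribution possible. Map-Reduce is one instance of that idea. This single-process word-count sketch shows its shape. Python's built-in `map` and `functools.reduce` stand in for a distributed runtime, and the function names are illustrative.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one input line into partial counts, independently of other lines."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: combine two partial results; associative, so grouping does not matter."""
    return left + right


lines = ["Big data matters", "data beats opinions", "big wins"]
partials = map(map_phase, lines)  # each call could run on a different node
totals = reduce(reduce_phase, partials, Counter())
print(totals["data"], totals["big"])  # 2 2
```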
  29. Functional Programming II
      F(G(H(A, B), C), D)
      getMusicLikes(getFriends(facebookID))
      instead of
      for i in getFriends(facebookID): getMusicLikes(i)
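Here is the slide's getMusicLikes/getFriends example as runnable Python, under two assumptions. First, the two lookups are hypothetical stand-ins backed by in-memory dictionaries, not a real Facebook API. Second, `get_music_likes` accepts a whole collection of friend IDs, so it can be composed directly with `get_friends`.

```python
# Hypothetical in-memory stand-ins for social-graph and music-likes lookups.
FRIENDS = {"fb-1": ["fb-2", "fb-3"]}
LIKES = {"fb-2": ["Daft Punk", "Bach"], "fb-3": ["Bach", "Miles Davis"]}


def get_friends(facebook_id):
    return FRIENDS.get(facebook_id, [])


def get_music_likes(friend_ids):
    """Takes a collection of IDs, so it composes with get_friends."""
    return sorted({like for fid in friend_ids for like in LIKES.get(fid, [])})


# Functional style: one composed expression, F(G(x)).
functional = get_music_likes(get_friends("fb-1"))

# Imperative style: spell out each step and manage the accumulator by hand.
imperative = set()
for friend in get_friends("fb-1"):
    imperative.update(LIKES.get(friend, []))

print(functional)                        # ['Bach', 'Daft Punk', 'Miles Davis']
print(functional == sorted(imperative))  # True
```

The composed form states the result wanted and leaves the iteration to the function. A runtime could then parallelise the work inside `get_music_likes` without changing the caller.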
  30. Technology and tools
  31. Data Storage
      • Cassandra: for write performance
      • HBase: for read performance
      • Redis (redis.io): for predictable operation time
  32. Other Data Storage
      • Mongo: NoSQL for beginners (close to SQL, but scaling is very manual)
      • SONOS: graph DB (Windows-based)
      • CouchDB, etc.: nice concepts, lots of great ideas, but communities too small
  33. Distributed Computing
      • Hadoop
      • ZooKeeper as DLS (distributed lock service)
  34. Languages: Erlang • Haskell • Scala • Lisp • Prolog • Mathematica
  35. No, you don't have to learn Erlang. Use Hadoop Streaming with Python
      [Diagram: Program 1 → STDOUT → lines ("Line 1", …) → several instances of Program 2]
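Hadoop Streaming runs any executable that reads lines from STDIN and writes tab-separated key/value lines to STDOUT. The framework sorts the mapper output by key before it reaches the reducer. This is a minimal word-count sketch of that contract in one Python script. The file name `wordcount.py` and its `map`/`reduce` arguments are illustrative choices.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run as `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word; Hadoop sorts these by key."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum counts per word; input arrives sorted, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    stages = {"map": mapper, "reduce": reducer}
    stage = stages.get(sys.argv[1]) if len(sys.argv) > 1 else None
    if stage is None:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
    else:
        for out in stage(sys.stdin):
            print(out)
```

The same pipeline can be tested locally without Hadoop: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the two commands go to the streaming jar's `-mapper` and `-reducer` options, and the script is shipped with `-files`. The jar's path depends on the Hadoop installation.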
  36. Check out my tool list: http://www.hcboos.net/100-links/
  37. Senzari and big data
  38. The AMP3 Platform: Adaptable Music Parallel Processing Platform
  39. Behind AMP
  40. Technologies
      • AWS: EC2, S3, EBS, SNS, ELB
      • Cassandra + Hadoop + Solandra
      • ZooKeeper
      • Dynamic scaling server (Lich Lord)
      • Asynchronous messaging system
      • Modules built in Python
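The slide names a dynamic scaling server ("Lich Lord") but not the rule it applies. As an assumption, here is one common kind of rule: size the worker pool to the backlog of a message queue, within fixed bounds. Every name and parameter below is hypothetical, not the AMP implementation.

```python
import math


def desired_instances(queue_length, msgs_per_instance, current,
                      min_instances=1, max_instances=20, max_step=4):
    """Return the worker count to run next, based on the queue backlog.

    max_step limits how far one decision can move the pool, which damps
    oscillation when the queue length is noisy.
    """
    target = math.ceil(queue_length / msgs_per_instance)
    target = max(min_instances, min(max_instances, target))
    step = max(-max_step, min(max_step, target - current))
    return current + step


print(desired_instances(queue_length=950, msgs_per_instance=100, current=3))  # 7
print(desired_instances(queue_length=0, msgs_per_instance=100, current=7))    # 3
```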
  41. Effects
      • Built on top of a Python platform
      • Fully automated scaling
      • Fully distributed data processing
      • Message channels allow code decoupling
      • Message channels allow replay
      • Message channels allow outtasking
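The slide lists what message channels gave AMP but does not show a channel. This minimal in-memory sketch assumes an append-only log with per-consumer read offsets. Apache Kafka uses that design, but AMP does not necessarily. Producers only append, so they never call consumers directly, which gives decoupling. Rewinding a consumer's offset re-delivers old messages, which gives replay. An outside worker (outtasking) is then just one more consumer name. `Channel` and its methods are hypothetical.

```python
from collections import defaultdict


class Channel:
    """Append-only message log; each consumer tracks its own read offset."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer name -> next index to read

    def publish(self, message):
        self.log.append(message)

    def poll(self, consumer):
        """Return messages the consumer has not seen yet and advance its offset."""
        start = self.offsets[consumer]
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

    def replay(self, consumer, from_offset=0):
        """Rewind a consumer so it processes messages again."""
        self.offsets[consumer] = from_offset


plays = Channel()
plays.publish({"user": "u1", "track": "t9"})
plays.publish({"user": "u2", "track": "t4"})
print(len(plays.poll("stats")))  # 2
print(len(plays.poll("stats")))  # 0
plays.replay("stats")            # e.g. after fixing a bug in the stats module
print(len(plays.poll("stats")))  # 2
```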
  42. Thank You for Your Time
  43. Credits
      • "Big Data Just Beginning to Explode" by CSC, http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
      • "Social media network connections among twitter users" by Marc Smith, http://www.flickr.com/photos/marc_smith/
      • Asteroid datasets by Bruce Gary, http://brucegary.net/POVENMIRE/x.htm