Scalability and Big Data at Senzari
Presentation Transcript

  • SCALABILITY AND DATA ANALYTICS MATTER HCB (@boosc)
  • Agenda
    • Buzzword bingo
    • Data
    • Analytics
    • Scalability
    • Distributed and parallel concepts
    • Technology and tools
    • Senzari and big data
  • Buzzword Bingo: H-Space, Agents/Bots, Data Engineer, Machine Learning, Support Vector Machines, Big Data, Swarm Intelligence, Gaussian Processes, Genetic Algorithms, Hadoop, Pig, HBase, Cassandra, redis.io, Eucalyptus, Core Dataset, R+, Clustering, NoStats, Natural Language Processing
  • Data, lots of it
  • 79 times more CPU power than used in Apollo missions on one iPhone
  • What we can do
  • Data
  • Knowledge pyramid (top to bottom): 1960s, 1950s Data Processing (Data). Data: Unfiltered, Research, Creation, Gathering
  • Knowledge pyramid (top to bottom): 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data). Information: Organized Data, Patterns, Presentation
  • Knowledge pyramid (top to bottom): 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data). Knowledge: Useful Patterns, Predictability, Conversation
  • Knowledge pyramid (top to bottom): 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data). Intelligence: Choice, Understanding, Decision
  • Knowledge pyramid (top to bottom): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data). Wisdom: Evaluation, Interpretation, Retrospective
  • Knowledge pyramid, with axis "Yield" (top to bottom): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data)
  • Why you need big data: the knowledge pyramid with a "You Are Here!" marker and axis "Yield" (top to bottom): 2010s Systems Thinking (Wisdom); 2000s Knowledge Ecology (Intelligence); 1990s Knowledge Management (Knowledge); 1980s Information Management (Information); 1970s, 1960s, 1950s Data Processing (Data)
  • Analytics
  • Even in simple datasets, common statistics fail (average, min, max, distribution)
  • Finding clusters, evaluating outliers and interpreting white noise
  • Two tips for looking at data: 1. Plot it 2. Remove all labels
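
The claim that common statistics can fail even on simple data can be illustrated with a minimal sketch. The two datasets below are invented for this illustration and do not come from the talk. They share the same average, minimum and maximum, yet a crude text plot shows that they are distributed very differently.

```python
from collections import Counter
from statistics import mean

# Two hypothetical datasets, made up for this illustration only.
clustered = [0, 5, 5, 5, 5, 5, 10]    # most values sit in the middle
polarised = [0, 0, 0, 5, 10, 10, 10]  # most values sit at the extremes


def summary(values):
    """Return the common statistics as (average, minimum, maximum)."""
    return (mean(values), min(values), max(values))


def text_histogram(values):
    """Render a crude text plot: one row per distinct value, one '#' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{value:>3} {'#' * counts[value]}" for value in sorted(counts))
```

Calling `summary` on both lists gives identical results, while `text_histogram` makes the difference obvious, which is the reason for the "plot it" tip.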
  • Scalability
  • Cloud computing is when the IT guys are finally able to explain to business people what they were talking about 20 years ago!
  • Computation on demand + Pay as you go
  • BASE (Basically Available, Soft state, Eventual consistency), not ACID (Atomicity, Consistency, Isolation, Durability)
  • How to scale (AWS example)
    • Do not allocate instances manually
    • Each component needs to be independent
    • Plan for failure
    • Actively provoke failure
  • Human Software
    • Click workers and Mechanical Turks are not just cheap labour
    • They allow programmers to hand tasks that they cannot handle algorithmically to humans
    • Make use of it to
      • Do things too complicated for machine learning
      • Pre-populate machine learning spaces
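
The "plan for failure" and "actively provoke failure" points can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in the talk: `provoke_failures`, `call_with_retries` and `InjectedFailure` are hypothetical names, and a real deployment would terminate actual instances rather than raise exceptions in process.

```python
class InjectedFailure(Exception):
    """Raised by the failure injector to simulate a component outage."""


def provoke_failures(func, fail_on_calls):
    """Wrap func so that the call numbers listed in fail_on_calls (1-based) fail."""
    state = {"calls": 0}

    def wrapper(*args, **kwargs):
        state["calls"] += 1
        if state["calls"] in fail_on_calls:
            raise InjectedFailure(f"injected failure on call {state['calls']}")
        return func(*args, **kwargs)

    return wrapper


def call_with_retries(func, *args, attempts=3):
    """Plan for failure: retry up to `attempts` times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return func(*args)
        except InjectedFailure as error:
            last_error = error
    raise last_error
```

Wrapping a component with `provoke_failures` during testing shows whether its callers actually survive an outage, instead of assuming that they do.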
  • Distributed and parallel concepts
  • Imperative Programming
    • Step-by-step explanation of what to do
    • Explaining WHAT to do rather than the RESULTS you want
    • Always necessary for basic algorithms
  • Functional Programming I
    • Combine results to become a program
    • Allows dynamic distribution
    • Map-Reduce is only one way of doing it!
  • Functional Programming II: F(G(H(A, B), C), D), for example getMusicLikes(getFriends(facebookID)) instead of: for i in getFriends(facebookID): getMusicLikes(i)
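
The contrast between the loop and the composed form can be sketched as follows. `get_friends` and `get_music_likes` are hypothetical stand-ins for the calls named on the slide, backed here by invented in-memory data. The `map_fn` parameter is also an assumption, added to show where a distributed or parallel map could be plugged in.

```python
from functools import reduce

# Hypothetical sample data; a real system would query a social graph.
FRIENDS = {"alice": ["bob", "carol"]}
MUSIC_LIKES = {"bob": ["jazz", "funk"], "carol": ["jazz", "techno"]}


def get_friends(facebook_id):
    """Return the friend IDs of a user (empty list if unknown)."""
    return FRIENDS.get(facebook_id, [])


def get_music_likes(user_id):
    """Return the music likes of a user (empty list if unknown)."""
    return MUSIC_LIKES.get(user_id, [])


def likes_imperative(facebook_id):
    """Step-by-step loop: the program fixes the order of execution."""
    likes = []
    for friend in get_friends(facebook_id):
        likes.extend(get_music_likes(friend))
    return likes


def likes_functional(facebook_id, map_fn=map):
    """Map then reduce: map_fn can be swapped for a parallel map."""
    per_friend = map_fn(get_music_likes, get_friends(facebook_id))
    return reduce(lambda acc, likes: acc + likes, per_friend, [])
```

Because `likes_functional` only states how results are combined, the per-friend calls could be distributed, for instance by passing the `map` method of a `concurrent.futures.ThreadPoolExecutor` as `map_fn`, without changing the function itself.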
  • Technology and tools
  • Data Storage
    • Cassandra: for write performance
    • HBase: for read performance
    • Redis.io: for predictable operation time
  • Other Data Storage
    • Mongo: NoSQL for beginners (close to SQL, but scalability is very manual)
    • SONOS: graph DB (Windows based)
    • CouchDB, etc.: nice concepts, lots of great ideas, but communities too small
  • Distributed Computing
    • Hadoop
    • ZooKeeper as DLS
  • Languages
    • Erlang
    • Haskell
    • Scala
    • Lisp
    • Prolog
    • Mathematica
  • No, you don't have to learn Erlang: use Hadoop Streaming with Python. (Diagram: Program 1 writes lines to STDOUT; each line is passed on to an instance of Program 2.)
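
A minimal sketch of the Hadoop Streaming idea, using word counting as an assumed example task (the talk does not say which job was shown). Hadoop Streaming runs ordinary programs that read lines from standard input and write lines to standard output. Here the two programs are written as Python generators, and `run_locally` is a hypothetical helper that imitates the sort step Hadoop performs between them.

```python
from itertools import groupby


def mapper(lines):
    """Program 1: emit one 'word<TAB>1' line for every word read."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Program 2: sum the counts of consecutive lines that share a key.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees for the lines handed to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate the pipeline on one machine: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In an actual job the two functions would live in separate scripts that loop over `sys.stdin` and print their results, and the scripts would be handed to the streaming jar as the mapper and the reducer.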
  • Check out my tool list: http://www.hcboos.net/100-links/
  • Senzari and big data
  • The AMP3 Platform: Adaptable Music Parallel Processing Platform
  • Behind AMP
  • Technologies
    • AWS: EC2, S3, EBS, SNS, ELB
    • Cassandra + Hadoop + Solandra
    • ZooKeeper
    • Dynamic scaling server (Lich Lord)
    • Asynchronous messaging system
    • Modules built in Python
  • Effects
    • Built on top of Python platform
    • Fully automated scaling
    • Fully distributed data processing
    • Message channels allow code decoupling
    • Message channels allow replay
    • Message channels allow outtasking
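
How message channels can give both decoupling and replay is sketched below. The transcript does not describe the messaging system behind AMP3, so the `Channel` class and its methods are hypothetical, and the log is kept in memory rather than in a durable store.

```python
class Channel:
    """In-memory message channel that logs every message so it can be replayed."""

    def __init__(self):
        self._log = []          # every message ever published, in order
        self._subscribers = []  # handlers called on each new message

    def subscribe(self, handler):
        """Register a handler; the publisher never needs to know about it."""
        self._subscribers.append(handler)

    def publish(self, message):
        """Record the message, then deliver it to all current subscribers."""
        self._log.append(message)
        for handler in self._subscribers:
            handler(message)

    def replay(self, handler):
        """Feed every message published so far to a single handler."""
        for message in list(self._log):
            handler(message)
```

A module that is added or fixed later can call `replay` to process the messages it missed, while publishers stay unchanged because they only ever talk to the channel.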
  • Thank You for Your Time
  • Credits
    • "Big Data Just Beginning to Explode" by CSC, http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
    • "Social media network connections among Twitter users" by Marc Smith, http://www.flickr.com/photos/marc_smith/
    • Asteroid Datasets by Bruce Gary, http://brucegary.net/POVENMIRE/x.htm