Skytree big data london meetup - may 2013
Slides from Paul Salazar's talk on SkyTree at the 18th Big Data London meetup.


Published in: Technology, Education
Transcript

  • 1. SAME DATA. BETTER RESULTS. Paul Salazar, paul@skytree.net
  • 2. SKYTREE’S FOCUS: "PRODUCTION GRADE" MACHINE LEARNING
      Machine learning: the modern science of finding patterns and making predictions from data.
      Aka: multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
  • 3. Machine Learning Use Cases
      –  Predict categories and classes: Classification
      –  Predict values and numbers: Regression
      –  Grouping and segmentation: Clustering
      –  Detection and characterization: Density Estimation
      –  Visualization and reduction: Dimension Reduction
      –  Find similar items: Multidimensional Querying
      Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
      Recommendations, Predictions, Outlier Detection
  • 4. What are the current options for ML for Big Data?
      1.  Just use a subset of the data
          –  e.g. just take the first 1,000 rows. Result to expect: Capture only the broadest patterns. → Lower accuracy.
      2.  Just use a simple ML method
          –  e.g. use logistic regression instead of nonlinear SVM. Result to expect: Entire types of patterns cannot be found. → Lower accuracy.
      3.  Just use simple parallelism/MapReduce
          –  i.e. replace all the for-loops with parallel ones. Result to expect: Only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.
      4.  Just throw it in the cloud
          –  i.e. somehow use the large compute power of the cloud. Result to expect: The cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
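
To make point 3 concrete, here is a rough back-of-the-envelope sketch (not from the talk) of how much work remains per core when a quadratic method is split evenly across cores. The function name `ops_per_core`, the dataset size, and the core count are illustrative assumptions; constant factors and communication costs are ignored.

```python
import math

def ops_per_core(n, cores, exponent):
    """Idealised work per core for an O(n**exponent) method split evenly across cores."""
    return n ** exponent / cores

n = 1_000_000   # hypothetical dataset size
cores = 1_000   # hypothetical cluster size

quadratic_parallel = ops_per_core(n, cores, 2)   # O(N^2) with perfect parallelism
linearithmic_serial = n * math.log2(n)           # O(N log N) on a single core

print(f"O(N^2) over {cores} cores: {quadratic_parallel:.2e} ops per core")
print(f"O(N log N) on one core:  {linearithmic_serial:.2e} ops")
```

Under these assumptions, even perfect parallelism over 1,000 cores leaves each core with roughly 50 times more work than a single core running an O(N log N) method on the same data.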
  • 5. Skytree’s Unique Differentiation: Fundamental Technology Breakthrough
      Complexity of State-of-the-Art Machine Learning methods:
      1.  Querying: all-nearest-neighbors O(N^2)
      2.  Density estimation: kernel density estimation O(N^2), kernel conditional density est. O(N^3)
      3.  Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
      4.  Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
      5.  Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
      6.  Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
      7.  Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n=2, 3, 4, …
      ►  Unfortunately O(N^2), O(N^3) are computationally prohibitive for big data
      Skytree has invented a way to reduce the complexity of above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
  • 6. Performance: Up to 10,000x speedups (on one CPU)
  • 7. How Does Skytree Do This?
      –  Deep knowledge of algorithms: Drawing from the latest from academia
      –  Smart programming: Efficient ways to compute order N^2 and N^3
      –  Distributed systems: Take advantage of parallel computing speed
  • 8. Team
      EXECUTIVE TEAM
      –  Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
      –  Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
      –  Paul Salazar, VP Sales: RedHat, Greenplum
      –  Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
      –  Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
      BOARD OF DIRECTORS
      –  Rick Lewis, USVP
      –  Noah Doyle, Javelin Venture Partners
      –  David Toth, Founder and CEO NetRatings (Nielsen)
      TECH ADVISORY BOARD
      –  Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
      –  Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
      –  Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
      –  Prof. James Demmel, UC Berkeley: high-performance computing
      INVESTORS
      –  USVP, Javelin Venture Partners, Scott McNealy, UPS
  • 9. Product Overview
      –  Skytree Adviser for Desktop: Data Science for Everyone
      –  Skytree Server for Enterprises: Enterprise Machine Learning
      Advanced Analytics:
      •  Predict Categories/Classes
      •  Detect Anomalies
      •  Find Trends
      •  Predict Values/Numbers
      •  Identify Patterns
      •  Find Outliers
  • 10. Thank you for learning about Skytree. Read more at www.skytree.net
      •  We’re hiring: check out our careers page.
      •  Download Skytree Adviser for Free.
      •  Pick up a T-Shirt.
