Slides from Paul Salazar's talk on Skytree at the 18th Big Data London meetup.

- 1. Same Data. Better Results. Paul Salazar, paul@skytree.net
- 2. Skytree’s Focus: "Production Grade" Machine Learning
    - Machine learning: the modern science of finding patterns and making predictions from data.
    - Also known as: multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
- 3. Machine Learning Use Cases
    - Predict categories and classes: Classification
    - Predict values and numbers: Regression
    - Grouping and segmentation: Clustering
    - Detection and characterization: Density Estimation
    - Visualization and reduction: Dimension Reduction
    - Find similar items: Multidimensional Querying
    - Example Skytree algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
    - Applications: Recommendations, Predictions, Outlier Detection
- 4. What are the current options for ML for Big Data?
    1. Just use a subset of the data, e.g. just take the first 1,000 rows. Result to expect: capture only the broadest patterns. → Lower accuracy.
    2. Just use a simple ML method, e.g. use logistic regression instead of nonlinear SVM. Result to expect: entire types of patterns cannot be found. → Lower accuracy.
    3. Just use simple parallelism/MapReduce, i.e. replace all the for-loops with parallel ones. Result to expect: only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.
    4. Just throw it in the cloud, i.e. somehow use the large compute power of the cloud. Result to expect: the cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
- 5. Skytree’s Unique Differentiation: Fundamental Technology Breakthrough
    - Complexity of state-of-the-art machine learning methods:
        1. Querying: all-nearest-neighbors O(N^2)
        2. Density estimation: kernel density estimation O(N^2), kernel conditional density estimation O(N^3)
        3. Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
        4. Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
        5. Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
        6. Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
        7. Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n = 2, 3, 4, …
    - ► Unfortunately, O(N^2) and O(N^3) are computationally prohibitive for big data.
    - Skytree has invented a way to reduce the complexity of the above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
- 6. Performance: up to 10,000x speedups (on one CPU)
- 7. How Does Skytree Do This?
    - Deep knowledge of algorithms: drawing from the latest from academia
    - Smart programming: efficient ways to compute O(N^2) and O(N^3) methods
    - Distributed systems: take advantage of parallel computing speed
- 8. Team
    - Executive team:
        - Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
        - Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
        - Paul Salazar, VP Sales: RedHat, Greenplum
        - Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
        - Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
    - Board of directors:
        - Rick Lewis, USVP
        - Noah Doyle, Javelin Venture Partners
        - David Toth, Founder and CEO NetRatings (Nielsen)
    - Tech advisory board:
        - Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
        - Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
        - Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
        - Prof. James Demmel, UC Berkeley: high-performance computing
    - Investors: USVP, Javelin Venture Partners, Scott McNealy, UPS
- 9. Product Overview
    - Skytree Adviser for Desktop: Data Science for Everyone
    - Skytree Server for Enterprises: Enterprise Machine Learning
    - Advanced Analytics:
        - Predict Categories/Classes
        - Detect Anomalies
        - Find Trends
        - Predict Values/Numbers
        - Identify Patterns
        - Find Outliers
- 10. Thank you for learning about Skytree. Read more at www.skytree.net
    - We’re hiring: check out our careers page.
    - Download Skytree Adviser for Free.
    - Pick up a T-Shirt.
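
Slide 5 lists all-nearest-neighbors as an O(N^2) method and says it can be brought down to roughly O(N log N). The slides do not describe how Skytree does this, so the sketch below is only a generic textbook illustration and not Skytree's implementation. It compares a brute-force search with a simple k-d tree, which typically answers each query in about O(log N) time on low-dimensional data. All function names (`brute_force_all_nn`, `build_kdtree`, `nearest`, `all_nn_kdtree`) are hypothetical and chosen for this example.

```python
import math


def brute_force_all_nn(points):
    """O(N^2): compare every point with every other point."""
    result = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        result.append(best_j)
    return result


def build_kdtree(indices, points, depth=0):
    """Split point indices at the median, cycling through the axes.

    Re-sorting at every level keeps the code short; it costs
    O(N log^2 N) overall rather than the optimal O(N log N).
    """
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(indices[:mid], points, depth + 1),
        "right": build_kdtree(indices[mid + 1:], points, depth + 1),
    }


def nearest(node, points, query_index, best=(-1, math.inf)):
    """Return (index, distance) of the closest point other than the query."""
    if node is None:
        return best
    q = points[query_index]
    p = points[node["index"]]
    if node["index"] != query_index:
        d = math.dist(p, q)
        if d < best[1]:
            best = (node["index"], d)
    diff = q[node["axis"]] - p[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, points, query_index, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match so far; this pruning is where the savings come from.
    if abs(diff) < best[1]:
        best = nearest(far, points, query_index, best)
    return best


def all_nn_kdtree(points):
    """Nearest neighbor of every point, using one shared k-d tree."""
    tree = build_kdtree(list(range(len(points))), points)
    return [nearest(tree, points, i)[0] for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
    print(brute_force_all_nn(pts))
    print(all_nn_kdtree(pts))
```

Both functions return, for each point, the index of its nearest neighbor. The pruning in `nearest` becomes less effective as the number of dimensions grows, so the logarithmic query time is a typical case for low-dimensional data rather than a guarantee.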

