Adatao Live Demo at the First Spark Summit

Adatao's Live Product Demo
at the First Spark Summit
December 2, 2013
Nikko Hotel, San Francisco

Published in Technology, Education

Transcript

  • 1. DATA INTELLIGENCE FOR ALL. Adatao Live Demo at the First Spark Summit, Dec 2, 2013, San Francisco (video at the end of this deck). Christopher Nguyen, PhD, Co-Founder & CEO
  • 2. Big-Data Compute Engines, Google Apps Engineering Director, Google Founders’ Award, HKUST Prof, 2 successful enterprise exits, Stanford PhD; Deep engineering & business experience from Google, Yahoo et al.; PhDs in DM & ML from UIUC, Georgia Tech, Stanford, Berkeley, ...; Hadoop distributed/streaming analytics, Yahoo Hadoop Eng, UIUC PhD; Machine learning & machine vision, US Army Research Lab, Johns Hopkins PhD
  • 3. Business Users, Data Scientists, Data Engineers: ONE Integrated Platform for Business & Data Science & Engineering. BIG INSIGHTS: Visually Beautiful Interactive Data Exploration Narrative Web App. BIG COMPUTE: Powerful In-Memory Data Mining, Machine Learning, Big Analytics Platform. BIG DATA: Hadoop HDFS, Cassandra, SQL DBMS, Streaming Data.
  • 4. Architecture Design: One Integrated Platform for Business & Data Science & Engineering (Business Users, Data Scientists, Data Engineers) VS OTHERS (Business Users: stack for business users; Data Scientists: stack for data science; Data Engineers: stack for data eng)
  • 5. for Data Scientists & Engineers: Big Data Mining & Machine Learning. Powerful In-Memory Data Mining & Machine Learning: Model Terabytes in Seconds. Interactive, Cluster-Scale Data Munging & Modeling with Native R, RStudio, Python, SQL, and Java Front-ends. Real-Time Scoring Directly From Trained Models. Share reproducible, live data analysis documents. Hadoop, Cassandra, RDBMS, Streaming Data.
  • 6. for Business Users: Predictive Decision Making. A Beautiful New Way to Create & Share Visual Narratives of Your Analysis. Perform Ad Hoc Queries in Plain English. Publish Streaming, Interactive Dashboards. Collaborate With Others In Real Time. Query Terabytes in Seconds.
  • 7. Demo Deployment Diagram: CLIENT, MASTER, WORKER (x4 shown)
  • 8. Demo Config. Cluster: 8-node x 8-core x 30GB RAM x 1TB Disk. Data Sets: 12GB-100GB, 100M-1B rows; Airline Arrival Data, 1988-2008, from DoT
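As a back-of-the-envelope reading of the demo configuration, the sketch below tallies aggregate cluster resources. It assumes the core, RAM, and disk figures on the slide are per node (the slide does not say so explicitly); the `ClusterSpec` class and its field names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a homogeneous cluster (per-node figures)."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    disk_tb_per_node: int

    def totals(self) -> dict:
        # Aggregate capacity is the per-node figure times the node count.
        return {
            "cores": self.nodes * self.cores_per_node,
            "ram_gb": self.nodes * self.ram_gb_per_node,
            "disk_tb": self.nodes * self.disk_tb_per_node,
        }


# Figures from the slide, read as per-node values (an assumption).
demo = ClusterSpec(nodes=8, cores_per_node=8, ram_gb_per_node=30, disk_tb_per_node=1)
totals = demo.totals()
print(totals)  # {'cores': 64, 'ram_gb': 240, 'disk_tb': 8}
```

Under that assumption, the 240GB of aggregate RAM exceeds the largest listed data set (100GB), which would be consistent with the deck's in-memory framing.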
  • 9. Algorithms
 - LM & supporting statistics (AIC, log-likelihood, R², cross-validation)
 - Binning
 - Classification metrics: confusion matrix, ROC, AUC, F1
 - Logistic Regression with Ref Level for Categorical Vars
 - k-Means
 - Random Forest
 - Naive Bayes
 - Linear SVM
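To make the classification metrics named above concrete, here is a minimal single-machine sketch of a binary confusion matrix, F1, and AUC in plain Python. It is not Adatao's implementation, which the deck does not show; the function names and the toy labels and scores are invented for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as 0.5 (quadratic in the sample size)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

cm = confusion_matrix(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
print(cm, round(f1, 4), round(area, 4))  # (2, 1, 1, 2) 0.6667 0.6667
```

The pairwise AUC here is only practical for small samples; at the row counts quoted in the deck, a sort-based or distributed formulation would be needed.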
  • 10. Algorithm Roadmap
 - Hierarchical Clustering
 - Text Mining (token, POS, LDA, …)
 - SVD
 - Markov Chain Models
 - Ensemble Models
 -…
  • 11. Thank you! See demo video at http://youtu.be/5UAdk7oHoPE?t=7m