Fast Machine Learning
with HPCC Systems + Couchbase
by Fujio Turner
@FujioTurner
Current & Future Problems
Churn Prediction
Truth and Veracity
Recommendations
Online Advertisement
News Aggregation
Scalability
Content Discovery/Search
Intelligent Learning
Machine Learning for Medicine
Source: Abhishek Shivkumar
LexisNexis is a provider of legal,
tax, regulatory, news, business
information, and analysis to
legal, corporate, government,
accounting and academic
markets.
LexisNexis has been in
business since 1977 with over
30,000 employees worldwide. 
Who is LexisNexis? What is HPCC Systems?
LexisNexis Risk is the division
of LexisNexis that focuses
on data, Big Data processing,
linking and vertical expertise
and supports HPCC Systems
as an open source project
under the Apache 2.0 License.
http://hpccsystems.com/
Problems
Data from 10,000+
Different Sources
Different Needs
for the Data
Different Levels
of Proficiency
Lots of Data
Different Needs
for the Data
Different Levels
of Proficiency
A Lot of Data
Normalized / Denormalized
Structured / Unstructured
Data from 10,000+
Different Sources
DEDUP, JOIN , INDEX ,
COUNT , REGEX, K-Means
BETWEEN, GROUP, CASE, Custom
1 Easy Language (ECL)
or
SQL , R , JAVA , Python , C++, SAS
Reliable Data Distribution & Processing
System that scales to exabytes+
Solutions
Machine Learning Built-in
Regression
Linear Regression
Classification
Naive Bayes
Perceptron
Decision Trees
Logistic Regression
Clustering
K-Means
KD Trees
Agglomerative/Hierarchical
Association Analysis
AprioriN
EclatN
Rules
http://hpccsystems.com/ml
Michael Payne, of Clemson University,
on high-speed machine learning with
PB-BLAS in HPCC Systems.
http://youtu.be/s_HWlMwi6iI
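K-Means, listed under Clustering above, can be pictured with a minimal Python sketch. This is not the HPCC Systems ML library or its ECL interface; the function names `kmeans` and `nearest`, their parameters, and the sample points are assumptions made purely for illustration.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                updated.append((mean_x, mean_y))
            else:
                updated.append(centroids[i])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample, k=2)
```

On a cluster, the assignment step is the part that parallelizes naturally, since each point can be compared against the centroids independently.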
“I’m sub-second
fast.”
“I can query all
or part of your
data.”
Thor:
Single Threaded
Hard Disk
Index (optional)
Roxie:
Multi-Threaded
Hard Disk, In-memory, SSD (Either/Both)
Index (optional)
Cluster Architecture
Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) a few hours
Country = ‘US’
Join
Index of
~/facebook_2013
Query is Completed in a Single Job
Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
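The query pictured here (look up Country = 'US' through an index on ~/facebook_2013, then join with ~/twitter_2013) can be sketched in Python. This is an in-memory illustration, not ECL and not how Thor or Roxie execute a job; the record fields (`user_id`, `country`, `likes`, `tweets`) and the helper `build_index` are hypothetical.

```python
def build_index(rows, key):
    """Map each value of `key` to the list of rows that carry it."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


# Hypothetical sample records standing in for the two logical files.
facebook_2013 = [
    {"user_id": 1, "country": "US", "likes": 12},
    {"user_id": 2, "country": "DE", "likes": 7},
    {"user_id": 3, "country": "US", "likes": 3},
]
twitter_2013 = [
    {"user_id": 1, "tweets": 40},
    {"user_id": 2, "tweets": 5},
]

# Filter through the index instead of scanning every record.
country_index = build_index(facebook_2013, "country")
us_rows = country_index.get("US", [])

# Inner join on user_id: keep only US users that also appear in twitter_2013.
twitter_by_user = build_index(twitter_2013, "user_id")
joined = [
    {**fb, **tw}
    for fb in us_rows
    for tw in twitter_by_user.get(fb["user_id"], [])
]
```

The filter and the join are expressed as one pipeline, which mirrors the slide's point that the whole query completes in a single job.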
SORT
GROUP
DEDUP
JOIN
MERGE
BETWEEN
LENGTH
REGEX
ROUND
SUM
COUNT
TRIM
WHEN
AVE
CASE
NORMALIZE
DENORMALIZE
K-MEANS
more ….
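A few of the operations listed above (DEDUP, SORT, GROUP, COUNT) can be pictured with plain Python stand-ins. These are not the ECL built-ins and do not claim to match their exact semantics; the sample rows and the helper name `dedup` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical (name, country) records with one exact duplicate.
rows = [
    ("alice", "US"),
    ("bob", "DE"),
    ("alice", "US"),
    ("carol", "US"),
    ("dave", "DE"),
]


def dedup(records):
    """Drop repeated records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


unique_rows = dedup(rows)

# SORT by country, then by name.
ordered = sorted(unique_rows, key=lambda r: (r[1], r[0]))

# GROUP by country and COUNT the members of each group.
per_country = Counter(country for _, country in unique_rows)
```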
+
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install
HPCC Systems
in 5 Minutes
Download HPCC Systems
Open Source
Community Edition
or
Source Code
https://github.com/hpcc-systems
http://hpccsystems.com/download/
+
Common Big Data Setup
What is Couchbase?
Open Source
Memcached Built-In w/ Replicas
Flexible Schema (JSON)
Key/Value & Distributed
Cross Data Center Replication
SQL++ (N1QL)
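The key/value model with flexible JSON documents and replicas can be pictured with a small in-memory stand-in. This sketch does not use the Couchbase SDK: the class `TinyBucket` and its methods are hypothetical, replication here is a simple synchronous copy (an assumption, not a description of how Couchbase replicates), and the N1QL statement is shown only as a string next to an equivalent Python filter.

```python
import json


class TinyBucket:
    """Hypothetical in-memory stand-in for a key/value JSON store."""

    def __init__(self, replicas=1):
        self.active = {}
        self.replicas = [{} for _ in range(replicas)]

    def upsert(self, key, document):
        """Store a JSON document under key and copy it to every replica."""
        encoded = json.dumps(document)
        self.active[key] = encoded
        for replica in self.replicas:
            replica[key] = encoded

    def get(self, key):
        """Fetch a document by key and decode it from JSON."""
        return json.loads(self.active[key])

    def documents(self):
        """Return all stored documents, decoded."""
        return [json.loads(encoded) for encoded in self.active.values()]


bucket = TinyBucket(replicas=1)
# Flexible schema: the two documents do not share the same fields.
bucket.upsert("user::1", {"name": "Ann", "country": "US"})
bucket.upsert("user::2", {"name": "Bo", "country": "DE", "tags": ["beta"]})

# The kind of question N1QL expresses, shown as text only (not executed):
statement = 'SELECT name FROM users WHERE country = "US"'

# The same filter written directly in Python over the stand-in bucket.
us_names = [doc["name"] for doc in bucket.documents() if doc.get("country") == "US"]
```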
+
Sub-Millisecond
SQL++(N1QL)
JSON
Distributed & Reliable
Distributed & Reliable
1 Language
Flexible Data Types
Ready for the Future
XDCR
Couchbase Mobile
Embedded JSON NoSQL Database
+ Sync Data Online / Offline
+ Sync & Channel Data Peer-To-Peer
+ Sync Data Peer-To-Peer (directly)
Couchbase Mobile + HPCC Systems
Process & Store Data to Scale
INSTALL in 5 Minutes
Download
Source Code
Learning More - Couchbase Server & Lite
http://couchbase.com/download
https://github.com/couchbase
Mountain View, CA
San Francisco, CA
https://www.youtube.com/user/CouchbaseVideo

Big Data - Fast Machine Learning at Scale + Couchbase
