Shiva Amiri, PhD
Chief Product Officer
MLConf Seattle - May 1st 2015
Incorporating the Real Time Component into
Analytics ...

Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15


Incorporating the Real Time Component into Analytics and Machine Learning: Many industries and organizations today want to harness the power of big data analytics and machine learning for their potential to improve margins, enhance discoveries, give insight into the business, and enable fast, data-driven decisions. The challenges include difficulty using available systems, not knowing where to start or which tools make sense for a particular problem, and dealing with data sets that are too big, too fast, or too complicated to handle with traditional systems.

RTDS Inc. has developed SymetryML™, a set of technologies for zero-latency machine learning and analytics/exploration of very large datasets in real time, with a focus on speed, accuracy and simplicity. Our goals have been to cut the memory footprint required to learn large data sets, to provide “reducer” functionality that automatically selects the best attributes for model creation, and to build models on the fly. SymetryML™ is also designed for easy integration into existing business processes via either an easy-to-use Web UI or RESTful APIs.
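
One common way to keep the memory footprint fixed while learning is to retain only running sums rather than the rows themselves. The sketch below is not SymetryML™'s proprietary representation, which the talk does not describe. It is a minimal Python illustration, with the hypothetical class name `RunningStats`, of how a fixed-size summary can learn a value, forget it again without rescanning the data, and be merged with a summary built on another worker.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Fixed-size summary of one numeric variable (hypothetical name)."""
    n: int = 0        # number of values currently learned
    s1: float = 0.0   # running sum of x
    s2: float = 0.0   # running sum of x squared

    def learn(self, x: float) -> None:
        """Fold one value into the summary; the value itself is not stored."""
        self.n += 1
        self.s1 += x
        self.s2 += x * x

    def forget(self, x: float) -> None:
        """Remove a previously learned value without rescanning anything."""
        self.n -= 1
        self.s1 -= x
        self.s2 -= x * x

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine summaries built independently, e.g. on separate workers."""
        return RunningStats(self.n + other.n,
                            self.s1 + other.s1,
                            self.s2 + other.s2)

    @property
    def mean(self) -> float:
        return self.s1 / self.n

    @property
    def variance(self) -> float:
        """Sample variance computed from the sums alone (needs n >= 2)."""
        return (self.s2 - self.s1 * self.s1 / self.n) / (self.n - 1)


# Two partitions are summarised separately, merged, then one value is forgotten.
a, b = RunningStats(), RunningStats()
for x in [2.0, 4.0, 6.0]:
    a.learn(x)
for x in [8.0, 100.0]:
    b.learn(x)

total = a.merge(b)
total.forget(100.0)   # as if 100.0 had never been seen
print(total.n, total.mean, total.variance)
```

The summary stays three numbers per variable however many rows are learned. Plain sums of squares can lose precision on data with a large mean and small spread, so a production system would likely use a numerically safer update.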

This talk will explore some of the functionality of these systems, including real time exploration of data, fast multivariate model prototyping, and our use of GPUs and parallelization. An example of brain-related data and the complexities of its analytics will be discussed, along with a brief overview of other verticals we are exploring. Our work is geared towards making big data make sense in real time and enabling users to gain insights faster than with traditional methods.
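
Fast multivariate prototyping without rescanning can be illustrated with ordinary least squares: once the cross-products of all variables have been accumulated, a model on any subset of variables can be fitted from that matrix alone. The sketch below is an assumption-laden illustration, not the method used by SymetryML™. The names `MomentMatrix`, `fit_subset` and `best_subset` are hypothetical, and exhaustive subset search stands in for whatever variable selection the product actually performs.

```python
from itertools import combinations


class MomentMatrix:
    """Accumulates cross-products of [1, x_1..x_p, y], one row at a time."""

    def __init__(self, n_features: int) -> None:
        self.d = n_features + 2   # intercept + features + target
        self.m = [[0.0] * self.d for _ in range(self.d)]

    def learn(self, x, y, sign=1.0):
        z = [1.0, *x, y]
        for i in range(self.d):
            for j in range(self.d):
                self.m[i][j] += sign * z[i] * z[j]

    def forget(self, x, y):
        """Subtract a previously learned row (decremental update)."""
        self.learn(x, y, sign=-1.0)


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def fit_subset(mm, features):
    """Least-squares fit of y on an intercept plus the chosen features,
    using only the accumulated moments. Returns (weights, residual SS)."""
    idx = [0] + [f + 1 for f in features]
    y = mm.d - 1
    xtx = [[mm.m[i][j] for j in idx] for i in idx]
    xty = [mm.m[i][y] for i in idx]
    w = solve(xtx, xty)
    rss = mm.m[y][y] - sum(wi * bi for wi, bi in zip(w, xty))
    return w, rss


def best_subset(mm, n_features, size):
    """Try every subset of the given size; keep the lowest residual SS."""
    return min(combinations(range(n_features), size),
               key=lambda s: fit_subset(mm, s)[1])


# Toy data in which y = 3 + 2 * x0; x1 and x2 are unrelated to y.
rows = [((1.0, 5.0, 2.0), 5.0), ((2.0, 3.0, 7.0), 7.0),
        ((3.0, 8.0, 1.0), 9.0), ((4.0, 1.0, 4.0), 11.0),
        ((5.0, 6.0, 3.0), 13.0)]
mm = MomentMatrix(n_features=3)
for x, y in rows:
    mm.learn(x, y)

best = best_subset(mm, n_features=3, size=1)
weights, rss = fit_subset(mm, best)
print(best, [round(w, 6) for w in weights])
```

Every candidate model is fitted from the same small matrix, so the cost of trying another combination does not depend on the number of rows learned. Exhaustive search grows combinatorially with the number of variables, so it is only practical here because the example is tiny.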

Published in: Technology


  1. Shiva Amiri, PhD, Chief Product Officer
     MLConf Seattle - May 1st 2015
     Incorporating the Real Time Component into Analytics and Machine Learning
  2. The Challenge
     • One or more structural limitations have significantly constrained successful data mining applications and initiatives
     • Frequently, these problems are associated with the amount of data, the rate of data generation and the number of attributes (variables) to be processed:
        • 1000s of data variables from which to model (dimensionality)
        • 100s of billions of records to model
        • Continuously evolving data elements and changing sets of data
        • The need to execute and adapt in Real Time
     • Increasingly, this “big data” environment expands beyond the capabilities of conventional data mining methods and technology
  3. What are the trends?
     Source: http://www.informationweek.com/big-data/big-data-analytics/5-analytics-bi-data-management-trends-for-2015/a/d-id/1318551 - 09/01/2015
  4. The Market Opportunity
     • IDC reports Big Data Analytics market at $125 billion in 2015
     • Gartner reports the Internet of Things (IoT) will have 25 billion devices with sensors connected by 2020, producing exabytes of data
     • IoT/E market size by 2020 will exceed $14 trillion
     • Bioinformatics market is $7.5 billion according to Gartner
     • Streaming data, Real Time analytics and machine learning remain a significant challenge for multiple sectors
  5. Which verticals are we looking at?
     • Bioinformatics, Computational Biology: genetics, proteomics, EEG data, fMRI, Molecular Dynamics data, etc.
     • Financials: behaviour, signals, patterns
     • Internet of Everything
     • Other fast and massive data sources are also of interest
  6. An example: Complexity of Brain Disorders
     (Figure panels: Disorder X, Disorder Y)
  7. What kinds of questions do we want to ask?
     • How do the genes and proteins in disorders relate to each other? (clustering, regression, classification, etc.)
     • What are the other factors involved in disease onset and progression?
     • What about environment data? Quality of Life? Education? Socioeconomic status? (natural language processing (NLP), classification, predictive modeling, etc.)
     • How can we handle massive amounts of brain sensing and imaging data (EEG, fMRI) and link them to other data (genes and proteins)?
     • Integrative analytics
     • And questions we don’t know we have
  8. Big Data: The Four V’s
  9. RTDS’ SymetryML™: What have we built?
     • SymetryML™ is a distributed, GPU-implemented predictive analysis and modeling technology for our Massive Data universe…
     • V3.5 released: real time analytics of large-scale data
     • Exploration (statistics) and model building, assessment and prediction in real time
     • Robust security and privacy features
     • V4.0 being developed: distributed computing capability
  10. How is SymetryML™ addressing these challenges?
      • The V’s of Big Data
         • SymetryML™ can handle heavy volumes of data (Volume)
         • SymetryML™ can handle streaming data (Velocity)
      • Accelerated hardware with GPUs and distributed computing
      • REST API: flexibility and modular design, seamless integration into existing systems or development of custom systems
      • Simplicity of the design
      • Real Time analytics: exploration and model generation/prediction, handling massive data with unprecedented speed in real time
      • Privacy and security
      • Service Oriented Architecture: XaaS
  11. • Faster: In minutes SymetryML™ can utilize 10,000s+ of variables by constructing 1000s of model combinations and ultimately reducing the variables to a single model; it builds models in real time as it learns
      • Smarter with Scale: Linearly scalable, with no limit on the length of data sets or the depth of categorical data, which allows for unlimited learning from data
      • More Agile on-the-fly: Continuous learning, both distributed and parallel
      • Simply Deployed: SymetryML™ models can be deployed in real time or in the form of scripts (SQL, Java, etc.)
      (Diagram labels: Proprietary Statistical Representation, Data, Learner, Modeler, Predictor, Explorer)
  12. SymetryML™: A few key features
      • Parallel Processing/Distributed Computing
      • Incremental/Decremental Learning (no rescan)
      • Automated Variable Selection
      • Add variables on-the-fly
  13. Component Technologies
      Component            | Project  | Language
      Web UI               | sym-web  | JavaScript
      REST API             | sym-rest | Java
      Core functionalities | sym-core | Java
      NVIDIA GPU support   | sym-core | C/C++
  14. SymetryML™-CORE
      Basic Functionality:
      • Learn / Forget data
      • Univariate Analysis: Mean, StDev, F Test, Z Test, T Test
      • Bivariate Analysis
      • Correlation
      • Hypothesis Testing
      • Chi-square Testing
      • ANOVA
      • Model Selection and Creation
      • Predictions
      • Assessment
      • Persistence
  15. Web-UI - exploration
  16. Web-UI - exploration
  17. Web-UI - modeling
  18. Web-UI - assessment
  19. RTDS Inc. – Headlines
      • Team of 6 engineers and Data Scientists in Toronto, Board in NY
      • Focus on Technology Differentiation
      • Technology timeline
         • March ’13: Launched .NET Based Desktop Version
         • July ’13: Launched SymetryML™ Server with REST API
         • December ’13: Successfully deployed first GPU-based system
         • June ’14: Algorithmic Support Expanded
         • ’15 Roadmap: Aggressive, Attainable and Defensible
      • Proven technology with successful deployment in advertising
      • Current Financing
         • Mogility Capital
  20. Next steps
      • We’ve been successful with this technology in the mobile advertising space… now we want to use the power of this technology in other strategic sectors
      • We are looking for partners as beta users, with unique datasets and use cases: what kinds of questions can we help answer with your data?
      • We are looking for integration partners where we can both enhance our offering
      • Develop the next version (v4.0) of SymetryML™: fully parallel with Apache Spark
  21. Thank you
      Contact: shiva@rtdsinc.com, neil@rtdsinc.com, www.rtdsinc.com
  23. SymetryML™ and GPUs
      • Native libraries that use NVIDIA GPUs are available for:
         • Linux 64 bit (CentOS 5.x and Amazon Linux)
      • Use of GPUs for core operations:
         • Learning / Forgetting data
         • Model Building
         • Model Selection
  24. SymetryML™-WEB
      • Interactive HTML5 application
      • Direct connection to SYM-REST
      • It is de facto a lightweight front end to SYM-REST
      • Based on Sencha Ext-JS 4.x
  25. SYM-REST
      • Provides a RESTful API to sym-core
      • Supported Data Sources:
         • Amazon S3
         • SFTP
         • HTTP/HTTPS
         • Redshift
      • Upcoming Data Sources:
         • HDFS
         • ODBC/JDBC
  26. SYM-REST Security
      • Users of the REST API need an access key
      • We generate these keys
      • Keys are 128-bit AES keys
      • Every REST request is authenticated with an HMAC (SHA1) code based on part of the request
      • If data encryption is needed, HTTPS can be used
  27. Finance data example
      • NASDAQ TotalView-ITCH Intraday Data Modeling
         • 175 GB: one month of raw data
         • 55 GB of transactions for NASDAQ100 constituents
         • 12M rows/400 attributes
         • Univariate analysis across securities
         • Covariance and Hypothesis Testing
         • Model Building: Classification/Regression
         • Prediction of Price Movement
         • Full Order Book Analysis
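
The SYM-REST security slide says every request carries an HMAC (SHA1) code computed with a 128-bit access key over part of the request, but it does not say which part. The sketch below shows the general technique in Python using only the standard library. The string-to-sign layout (method, path, timestamp), the example path, the demo key and the function names are all assumptions made for illustration and are not the SYM-REST protocol.

```python
import hashlib
import hmac


def sign_request(secret_key: bytes, method: str, path: str, timestamp: str) -> str:
    """Return a hex HMAC-SHA1 code over a hypothetical string-to-sign."""
    message = "\n".join([method.upper(), path, timestamp]).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha1).hexdigest()


def verify_request(secret_key: bytes, method: str, path: str,
                   timestamp: str, signature: str) -> bool:
    """Server side: recompute the code and compare in constant time."""
    expected = sign_request(secret_key, method, path, timestamp)
    return hmac.compare_digest(expected, signature)


# Demo values only: a 16-byte (128-bit) key and a made-up endpoint.
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
ts = "2015-05-01T09:00:00Z"
sig = sign_request(key, "GET", "/example/projects", ts)

print(verify_request(key, "GET", "/example/projects", ts, sig))   # same request
print(verify_request(key, "GET", "/example/other", ts, sig))      # tampered path
```

An HMAC code authenticates the request and detects tampering but does not hide the payload, which is consistent with the slide's note that HTTPS is used when encryption is needed.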
