IN-MEMORY NOSQL 
REAL-TIME ANALYTICS ON A 
HIGH PERFORMANCE 
DATABASE PLATFORM 
PETER MILNE 
DIRECTOR OF APPLICATION ENGINEERING 
BIG DATA STRATEGY 
VILNIUS 
MAY 2014 
Aerospike aer·o·spike [air-oh-spahyk]
noun, 1. tip of a rocket that enhances speed and stability 
© 2014 Aerospike, Inc. All rights reserved. Confidential. | Big Data Strategy, Vilnius – May 2014
Where is the wisdom we have lost in 
knowledge? Where is the knowledge we 
have lost in information? 
- T.S. Eliot, 1934
What is: “Analytics”? 
Analytics is: 
■ Finding Knowledge in Information 
■ Finding Signal in Noise
Signal in Noise 
We see/hear noise 
But it's all signal
How do we find Signal in Noise 
■ Smoothing & Filtering 
■ Classification 
■ Similarity 
■ Dimension Reduction 
■ Aggregation 
■ Voodoo + Alchemy (just joking) 
Smoothing & Filtering 
Smoothing 
■ Moving average - each point in
the signal is replaced with the
average of “m” adjacent points,
where “m” is a positive integer
called the smooth width
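A minimal Python sketch of the moving average described above. It assumes a centered window of odd width m and shortens the window at the edges; both choices, and the function name, are illustrative assumptions rather than anything stated on the slide.

```python
def moving_average(signal, m):
    """Replace each point with the mean of the m adjacent points centered on it.

    Assumption: m is a positive odd integer, and the window is truncated
    near the edges to the points that actually exist.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    half = m // 2
    smoothed = []
    for i in range(len(signal)):
        # Slice out the neighbours of point i, clipped to the signal bounds.
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


noisy = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0]
print(moving_average(noisy, 3))
```

A larger smooth width flattens more of the noise but also blurs genuine short-lived features of the signal.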
Filtering 
■ Pass Filters – High, Low, Band, 
Butterworth, etc.
■ Recursive Filters – Kalman Filter 
or linear quadratic estimation
Classification 
Cluster Analysis 
Cluster analysis or clustering is the task of 
grouping a set of objects in such a way 
that objects in the same group (called a 
cluster) are more similar (in some sense 
or another) to each other than to those in 
other groups (clusters). 
Naïve Bayes Classifier 
A naive Bayes classifier is a simple 
probabilistic classifier based on applying 
Bayes' theorem with strong (naive) 
independence assumptions. - Wikipedia 
Similarity
Cosine Similarity 
“Cosine similarity is a measure of 
similarity between two vectors of 
an inner product space that measures 
the cosine of the angle between 
them. …. Cosine similarity is 
particularly used in positive space, 
where the outcome is neatly 
bounded in [0,1].” - Wikipedia
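As a rough illustration of the definition quoted above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses only the Python standard library; the function name is an assumption made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1, orthogonal vectors score 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Because only the angle matters, two vectors with the same proportions but different magnitudes are treated as identical.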
Dynamic Time Warping - DTW 
“In time series analysis, dynamic time 
warping (DTW) is an algorithm for 
measuring similarity between two 
temporal sequences which may vary in 
time or speed. 
… 
A well known application has been 
automatic speech recognition, to cope 
with different speaking speeds. 
… 
Other applications include speaker 
recognition and online signature 
recognition. Also it is seen that it can be 
used in partial shape matching 
application.” - Wikipedia 
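A small sketch of the classic dynamic-programming form of DTW, assuming one-dimensional numeric sequences and absolute difference as the local cost (both assumptions for this example). It returns a distance, so a lower value means the sequences are more similar.

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of s[:i] with t[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            # Extend the best of: advance s only, advance t only, advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence is the first one "spoken" at half speed.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
```

The table costs O(n·m) time and memory, which is why practical implementations often restrict the warping window.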
Dimension Reduction 
Principal Component Analysis – PCA 
The PCA method finds the directions with 
the greatest variance in the data, called 
principal components. 
Eigenfaces – Facial recognition, OpenCV 
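To make "direction of greatest variance" concrete, here is a pure-Python sketch that estimates the first principal component with power iteration on the sample covariance matrix. Power iteration is a technique chosen for this example, not something named on the slide, and it is unrelated to how OpenCV implements Eigenfaces; the function name and iteration count are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Unit vector along the direction of greatest variance in `rows`."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two rows to estimate variance")
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumption: the start vector is not orthogonal to the answer.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along v; keep the current estimate
        v = [x / norm for x in w]
    return v


points = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.0)]
print(first_principal_component(points))
```

Projecting each point onto this vector reduces the two-dimensional data to one dimension while keeping most of its spread.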
Linear Discriminant Analysis - LDA 
“Linear discriminant analysis (LDA) and 
the related Fisher's linear discriminant 
are methods used in statistics, pattern 
recognition and machine learning to find 
a linear combination of features which 
characterizes or separates two or more 
classes of objects or events. The resulting 
combination may be used as a linear 
classifier, or, more commonly, for 
dimensionality reduction before later 
classification.” - Wikipedia 
Bankruptcy prediction 
Facial recognition 
Marketing
My Brain is Full 
Technologies 
Complex Event Processing (CEP) 
Storm
Map Reduce 
Distributed Database Cluster 
Hadoop 
➤Large 
➤Powerful 
➤Capable 
➤Methodical 
➤Batch 
Input: HDFS - PetaBytes of “RAW” 
data 
Output: NoSQL – Signal from 
Noise
HOT ANALYTICS – In Real-time 
Key Challenges 
➤ Handle extremely high rates of read/write transactions with 
concurrent real-time analytics 
➤ Avoid hot spots 
  ■ On a node
  ■ An index
  ■ A key
➤ Pre-qualify data to be processed in Map Reduce 
➤ Maximize parallelism 
➤ Minimize programmer complexity 
➤ In Realtime
Aerospike Architecture
1) Shared-nothing architecture, every node identical
2) No hotspots – DHT with RIPEMD-160
3) Single-row ACID – synchronous replication within cluster
4) Real-time prioritization of transactions + long-running tasks
5) Smart Cluster – zero-touch auto fail-over, rebalancing, rolling upgrades
6) Smart Client – 1 hop to data, no load balancers
Queries + User Defined Functions = Real-time Analytics 
STREAM 
AGGREGATIONS 
(INDEXED MAP-REDUCE) 
Pipe Query results 
through UDFs 
■ Filter, Transform, Aggregate, Map, Reduce
User Defined 
Functions (UDFs) 
for real-time analytics 
and aggregations
Conceptual Stream Processing 
■ Output of a query is a Stream 
■ Stream flows through 
■Filter 
■Mapper 
■Aggregator 
■Reducer 
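The stream stages listed above can be pictured with a small Python analogy. This is not the Aerospike API (the deck's UDFs are written in Lua); every name below is hypothetical, and the round-robin split merely stands in for query results arriving from several nodes.

```python
from functools import reduce


def run_stream(records, keep, mapper, aggregator, reducer, partitions=2):
    """Push records through filter -> map -> aggregate -> reduce."""
    # Split the records to mimic partial streams produced on separate nodes.
    chunks = [records[i::partitions] for i in range(partitions)]
    partials = []
    for chunk in chunks:
        mapped = map(mapper, filter(keep, chunk))
        # Aggregate each partial stream into one small summary.
        partials.append(reduce(aggregator, mapped, {}))
    # Reduce the per-partition summaries into the final answer.
    return reduce(reducer, partials, {})


def add_pair(acc, pair):
    key, n = pair
    acc[key] = acc.get(key, 0) + n
    return acc


def merge(left, right):
    for key, n in right.items():
        left[key] = left.get(key, 0) + n
    return left


records = [{"bin": "a", "n": 1}, {"bin": "b", "n": 5},
           {"bin": "a", "n": 3}, {"bin": "b", "n": -2}]
totals = run_stream(records,
                    keep=lambda r: r["n"] > 0,
                    mapper=lambda r: (r["bin"], r["n"]),
                    aggregator=add_pair,
                    reducer=merge)
print(totals)
```

The point of the split is that only the small per-partition summaries, not the raw records, have to travel to the place where the final reduce runs.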
Hot Analytics Scenario – Airline Late Flights
Data 
■Airline flights in the USA January 2012 
■ ~1,050,000 flight records
Task 
■On a specific date 
■ Which Airline had late flights? 
■ How many flights? 
■ How many were late? 
■ Percentage late flights? 
Performance Requirements 
■Results in < 1 Sec 
■No impact on production transaction performance (300K TPS) 
GitHub Repo - https://github.com/aerospike/flights-analytics
Solution 
■ Index the flight records by Date 
■ Aggregate (Map) late flight data on node 
■ Reduce flight data from each node in the 
“client” 
■ User Defined Functions (UDFs) written in 
Lua 
■ Registered with the Aerospike Cluster 
■ Invoked as part of a secondary index 
query 
Indexed Map Reduce
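The shape of this solution can be sketched in Python. The real example in the GitHub repo uses Lua UDFs invoked from a Java client, so everything below is a hypothetical stand-in: the field names `date`, `carrier` and `arr_delay`, the rule that a positive arrival delay counts as late, and the lists that play the role of nodes.

```python
from functools import reduce


def aggregate_on_node(records, date):
    """Map step: per-carrier flight and late counts for one node's records."""
    stats = {}
    for rec in records:
        if rec["date"] != date:  # stands in for the secondary-index lookup
            continue
        entry = stats.setdefault(rec["carrier"], {"flights": 0, "late": 0})
        entry["flights"] += 1
        if rec["arr_delay"] > 0:  # assumed definition of a late flight
            entry["late"] += 1
    return stats


def merge_stats(left, right):
    """Reduce step: combine two per-carrier summaries without mutating them."""
    merged = {carrier: dict(s) for carrier, s in left.items()}
    for carrier, s in right.items():
        entry = merged.setdefault(carrier, {"flights": 0, "late": 0})
        entry["flights"] += s["flights"]
        entry["late"] += s["late"]
    return merged


def late_report(nodes, date):
    """Client side: reduce the node summaries and add the percentage late."""
    partials = [aggregate_on_node(node, date) for node in nodes]
    totals = reduce(merge_stats, partials, {})
    return {carrier: {**s, "pct_late": 100.0 * s["late"] / s["flights"]}
            for carrier, s in totals.items()}


node_a = [{"date": "2012-01-15", "carrier": "AA", "arr_delay": 12},
          {"date": "2012-01-15", "carrier": "UA", "arr_delay": -3},
          {"date": "2012-01-16", "carrier": "AA", "arr_delay": 40}]
node_b = [{"date": "2012-01-15", "carrier": "AA", "arr_delay": -5},
          {"date": "2012-01-15", "carrier": "UA", "arr_delay": 0}]
print(late_report([node_a, node_b], "2012-01-15"))
```

Each node returns one small summary per carrier, so the client-side reduce handles a handful of entries rather than the full set of flight records.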
Prepare and execute a Query 
Aggregation (Map) function
Reduce function 
Stream Function (StreamUDF) 
Operations (300k TPS) + Analytics (Indexed Map/Reduce) 
■ Java App calculates 
% of late flights by Airline 
■ 300k TPS Operations + 
Process 1 Million 
records 
■ Indexed Map/Reduce 
■ Aggregations 
■ Distributed Queries + UDF 
■ Runs in 0.5 seconds
Operational + Analytics + Adding servers and Re-balancing 
■ 300k TPS Operations + 
Process 1 Million 
records 
■ Runs in 0.5 seconds
■ Add servers, 
auto-rebalance 
while running query 
Cluster 
■ 3 Nodes 
■ Hex core 3.4 GHz
■ RAM 24GB 
■ 2x Micron p420m SSDs 
■ 10GB network
Books
Software 
■ Aerospike 
■http://www.aerospike.com/free-aerospike-3-community-edition/ 
■ Tools 
■Eclipse - http://www.eclipse.org/ 
■Lua Plugin - http://www.eclipse.org/koneki/ldt/ 
■Aerospike Plugin - https://github.com/aerospike/eclipse-tools 
■ Example 
■Flight Analytics - https://github.com/aerospike/flights-analytics
QUESTIONS? 
info@aerospike.com 
www.aerospike.com 

Real Time Analytics

  • 1. IN-MEMORY NOSQL REAL-TIME ANALYTICS ON A HIGH PERFORMANCE DATABASE PLATFORM PETER MILNE DIRECTOR OF APPLICATION ENGINEERING BIG DATA STRATEGY VILNIUS MAY 2014 Aerospike aer . o . spike [air-oh- spahyk] noun, 1. tip of a rocket that enhances speed and stability © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 1
  • 2. Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -T.S.Elliot 1934 © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 2 What is: “Analytics”? Analytics is: ■ Finding Knowledge in Information ■ Finding Signal in Noise
  • 3. © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 3 Signal in Noise We see/hear noise But its all signal
  • 4. How do we find Signal in Noise ■ Smoothing & Filtering ■ Classification ■ Similarity ■ Dimension Reduction ■ Aggregation ■ Voodoo + Alchemy (just joking) © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 4
  • 5. Smoothing & Filtering © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 5
  • 6. © 2014 Aerospike, Inc. All rights reserved. Confidential. | Bid Data Strategy, Vilnius – May 2014 | 6 Smoothing ■ Moving average - each point in the signal is replaced with the average of “m” adjacent points, where “m is a positive integer called the smooth width
  • 7. Filtering ■ Pass Filters – High, Low, Band, Butterworth, etc. ■ Recursive Filters – Kalman Filter or linear quadratic estimation
  • 8. Classification
  • 9. Cluster Analysis: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
  • 10. Naïve Bayes Classifier: A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. - Wikipedia
  • 11. Similarity
  • 12. Cosine Similarity: “Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. … Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1].” - Wikipedia
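As a rough illustration of the definition quoted on the Cosine Similarity slide, the sketch below computes the cosine of the angle between two plain Python lists. The function name `cosine_similarity` and the choice to reject zero-length vectors are assumptions for this example; the deck itself shows no code for this step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero vector (assumed behaviour: reject it).
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Parallel vectors score close to 1, orthogonal vectors score 0.
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))
    print(cosine_similarity([1, 0], [0, 1]))
```

Because only the angle matters, two vectors that differ in magnitude but point the same way are treated as identical, which is why the measure is popular for term-count and preference vectors.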
  • 13. Dynamic Time Warping - DTW “In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. … A well known application has been automatic speech recognition, to cope with different speaking speeds. … Other applications include speaker recognition and online signature recognition. Also it is seen that it can be used in partial shape matching application.” - Wikipedia
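The DTW slide describes the idea only in words, so here is a minimal sketch of the classic dynamic-programming formulation. It assumes one-dimensional numeric sequences and absolute difference as the local cost; the function name `dtw_distance` is hypothetical, and no windowing or path recovery is attempted.

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    points of s with the first j points of t.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])  # assumed local cost: |difference|
            cost[i][j] = local + min(
                cost[i - 1][j],      # s advances while t stays on the same point
                cost[i][j - 1],      # t advances while s stays on the same point
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is the first one "spoken more slowly".
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The table costs O(n·m) time and memory, which is fine for short series; longer series usually need a constrained window, which this sketch leaves out.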
  • 14. Dimension Reduction
  • 15. Principal Component Analysis – PCA: The PCA method finds the directions with the greatest variance in the data, called principal components. Example: Eigenfaces – facial recognition, OpenCV
  • 16. Linear Discriminant Analysis – LDA: “Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.” - Wikipedia. Applications: bankruptcy prediction, facial recognition, marketing
  • 17. My Brain is Full
  • 18. Technologies
  • 19. Complex Event Processing (CEP): Storm
  • 20. Map Reduce Distributed Database Cluster
  • 21. Hadoop ➤ Large ➤ Powerful ➤ Capable ➤ Methodical ➤ Batch. Input: HDFS - petabytes of “RAW” data. Output: NoSQL – Signal from Noise
  • 22. HOT ANALYTICS – In Real-time
  • 23. Key Challenges ➤ Handle extremely high rates of read/write transactions with concurrent real-time analytics ➤ Avoid hot spots (on a node, an index, a key) ➤ Pre-qualify data to be processed in Map Reduce ➤ Maximize parallelism ➤ Minimize programmer complexity ➤ In real time
  • 24. Aerospike Architecture 1) Shared Nothing Architecture, every node identical 2) No hotspots – DHT with RIPEMD160 3) Single row ACID – synchronous replication within cluster 4) Real-time prioritization of transactions + long running tasks 5) Smart Cluster – Zero touch auto fail-over, rebalancing, rolling upgrades 6) Smart Client – 1 hop to data, no load balancers
  • 25. Queries + User Defined Functions = Real-time Analytics. STREAM AGGREGATIONS (INDEXED MAP-REDUCE): Pipe Query results through UDFs ■ Filter, Transform, Aggregate.. Map, Reduce. User Defined Functions (UDFs) for real-time analytics and aggregations
  • 26. Conceptual Stream Processing ■ Output of a query is a Stream ■ Stream flows through ■ Filter ■ Mapper ■ Aggregator ■ Reducer
  • 27. Hot Analytics Scenario – Airline Late Flights. Data ■ Airline flights in the USA, January 2012 ■ 1,050,000 flight records. Task ■ On a specific date ■ Which airline had late flights? ■ How many flights? ■ How many were late? ■ Percentage of late flights? Performance Requirements ■ Results in < 1 sec ■ No impact on production transaction performance (300K TPS). GitHub Repo - https://github.com/aerospike/flights-analytics
  • 28. Solution: Indexed Map Reduce ■ Index the flight records by Date ■ Aggregate (Map) late flight data on node ■ Reduce flight data from each node in the “client” ■ User Defined Functions (UDFs) written in Lua ■ Registered with the Aerospike Cluster ■ Invoked as part of a secondary index query
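The shape of the solution on this slide (aggregate on each node, then reduce in the client) can be sketched without a cluster. The Python below is only a conceptual stand-in: the deck's real implementation uses Lua UDFs invoked from a Java client, and none of that API is reproduced here. The record fields (`date`, `carrier`, `delay_minutes`), both function names, and the rule that a flight is "late" when its delay exceeds `late_after_minutes` are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_on_node(records, date, late_after_minutes=0):
    """Per-node step: count flights and late flights per carrier for one date.

    The date comparison stands in for the secondary-index lookup, so only
    matching records reach the aggregation.
    """
    stats = defaultdict(lambda: {"flights": 0, "late": 0})
    for rec in records:
        if rec["date"] != date:
            continue
        entry = stats[rec["carrier"]]
        entry["flights"] += 1
        if rec["delay_minutes"] > late_after_minutes:  # assumed lateness rule
            entry["late"] += 1
    return dict(stats)


def reduce_in_client(partials):
    """Client step: merge the per-node partial results and add a percentage."""
    merged = defaultdict(lambda: {"flights": 0, "late": 0})
    for partial in partials:
        for carrier, counts in partial.items():
            merged[carrier]["flights"] += counts["flights"]
            merged[carrier]["late"] += counts["late"]
    return {
        carrier: {**counts, "percent_late": 100.0 * counts["late"] / counts["flights"]}
        for carrier, counts in merged.items()
    }


if __name__ == "__main__":
    # Two lists play the role of two cluster nodes holding different records.
    node_a = [
        {"date": "2012-01-01", "carrier": "AA", "delay_minutes": 20},
        {"date": "2012-01-01", "carrier": "AA", "delay_minutes": 0},
        {"date": "2012-01-02", "carrier": "AA", "delay_minutes": 30},
    ]
    node_b = [
        {"date": "2012-01-01", "carrier": "UA", "delay_minutes": 5},
        {"date": "2012-01-01", "carrier": "AA", "delay_minutes": -3},
    ]
    partials = [aggregate_on_node(n, "2012-01-01") for n in (node_a, node_b)]
    print(reduce_in_client(partials))
```

The point of the split is that each node ships back only a small per-carrier summary rather than its raw records, so the client-side reduce stays cheap no matter how many flights match the date.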
  • 29. Prepare and execute a Query
  • 30. Aggregation (Map) function
  • 31. Reduce function
  • 32. Stream Function (StreamUDF)
  • 33. Operations (300k TPS) + Analytics (Indexed Map/Reduce) ■ Java App calculates % of late flights by Airline ■ 300k TPS Operations + Process 1 Million records ■ Indexed Map/Reduce ■ Aggregations ■ Distributed Queries + UDF ■ Runs in 0.5 seconds
  • 34. Operational + Analytics + Adding servers and Re-balancing ■ 300k TPS Operations + Process 1 Million records ■ Runs in 0.5 seconds ■ Add servers, auto-rebalance while running query. Cluster ■ 3 Nodes ■ Hex core 3.4 GHz ■ RAM 24 GB ■ 2x Micron p420m SSDs ■ 10GB network
  • 35. Books
  • 36. Software ■ Aerospike ■ http://www.aerospike.com/free-aerospike-3-community-edition/ ■ Tools ■ Eclipse - http://www.eclipse.org/ ■ Lua Plugin - http://www.eclipse.org/koneki/ldt/ ■ Aerospike Plugin - https://github.com/aerospike/eclipse-tools ■ Example ■ Flight Analytics - https://github.com/aerospike/flights-analytics
  • 37. QUESTIONS? info@aerospike.com www.aerospike.com

Editor's Notes

  1. OpenCV Facial recognition