Search on the fly:
how to lighten your Big Data
Simona Russo
Auro Rolle
CODEMOTION MILAN 25-26 NOVEMBER 2016
Who am I?
Simona Russo
... some numbers (1996 to 2016)
• Head of R&D at FacilityLive: 4 years
• Major Italian software houses: +20 years
• Software products (developed!): 4 sw products
• Finance: 3 years
• Payroll: 3 years
• Logistics: +10 years
• Search: 4 years
… so I’m +20 years old
AGENDA
• Use cases
• Business needs
• Analysis
• Architectural overview
• Some components
• RAM indexes
• Cache management
• Push data from server to client
• Optimize HTTP data serialization
• Scale out
• Stress test and performance test
• Future improvements
Prerequisite: Lucene
Lucene is the de facto standard among search libraries.
Initially developed by Doug Cutting in 1999 (who later also co-created Hadoop),
it joined the Apache Software Foundation in 2001.
See https://en.wikipedia.org/wiki/Apache_Lucene
Apache Lucene is an open source, full-featured search engine library
written entirely in Java:
• many powerful query types: phrase queries, wildcard queries, range queries
• fielded searching (e.g. title, author, contents)
• synonyms and stopwords options
• … and more. See http://lucene.apache.org/ for further information
Use cases: Business needs
“I want constantly updated data”
“I want excellent search response time”
“I want to search all the available data I have (even from different data sources)”
“I don’t want to duplicate my data”
Use cases: Business needs
Search Platform
Use cases: Conceptual high level layer overview
Presentation Layer: search results
Data layer: Lucene Indexes
• e.g. Customers View
• e.g. Invoices View
• e.g. Products View
• …
External Data Services
• DB
• Files (CSV, XLS, …)
• External Services
• …
How to follow business needs
Use cases: Business needs
Data Sources
Analysis: how to integrate different data sources and search on them?
Simple! We can create one or more Lucene Indexes that integrate different data sources
(schema-less feature).
• Joining data sources
• Transforming data values
• Enriching, cleansing and indexing data values
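The three steps above can be pictured with a small JDK-only sketch. It is not Lucene code: the "document" is just a flat field map, and the record layout, the field names (`customer_id`, `name`, `label`, the `billing_` prefix) and the class name are hypothetical, chosen only to show joining, transforming and enriching before indexing.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: builds one schema-less "document" (a flat field map)
// from two data sources that describe the same customer.
class DocumentIntegrationSketch {

    static Map<String, String> toDocument(Map<String, String> crmRecord,
                                          Map<String, String> billingRecord) {
        Map<String, String> doc = new LinkedHashMap<>();
        // Joining: fields from both sources end up in the same document.
        doc.putAll(crmRecord);
        for (Map.Entry<String, String> e : billingRecord.entrySet()) {
            doc.put("billing_" + e.getKey(), e.getValue());
        }
        String name = doc.get("name");
        if (name != null) {
            // Transforming / cleansing: normalize the value before indexing it.
            name = name.trim().toUpperCase(Locale.ROOT);
            doc.put("name", name);
            // Enriching: add a derived field that exists in neither source.
            doc.put("label", name + " [" + doc.get("customer_id") + "]");
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> crm = new LinkedHashMap<>();
        crm.put("customer_id", "C42");
        crm.put("name", "  Mario Rossi ");
        Map<String, String> billing = new LinkedHashMap<>();
        billing.put("last_invoice", "INV-2016-001");
        System.out.println(toDocument(crm, billing));
    }
}
```

Because the document is schema-less, a third source could add its own fields without changing the ones already there.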
Can we use data virtualization/federation middleware from other vendors?
• Yes, we can, but then we have two data integrations:
1. from the Data Sources to the Data Virtualization middleware
2. from the Data Virtualization middleware to the Lucene Indexes.
Data Sources → Data Virtualization/Federation middleware → Data Layer (Lucene Indexes)
This may add latency, consistency, flexibility and revenue issues in a search-driven
platform.
So we decided not to use it and to integrate the data into the Lucene Indexes directly instead.
“I want to search all the available data I have (even from different data sources)”
Use cases: Business needs
Analysis: How to avoid data source duplication?
With Lucene we index only the metadata, that is, the
data that the user wants to search.
All other data (e.g. PDF, HTML pages, …) can reside on the
source system and be accessed on demand.
So we significantly reduce data duplication.
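A minimal sketch of the idea, assuming a toy in-memory list instead of a Lucene index: only a searchable title and a pointer to the source system are stored, and a search returns the pointer rather than the content. The class name, the fields and the example URIs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: the index keeps metadata and a reference to the
// source system; the full content (PDF, HTML page, ...) is never copied.
class MetadataOnlyIndexSketch {

    static final class Entry {
        final String title;      // searchable metadata
        final String sourceUri;  // where the full content still lives

        Entry(String title, String sourceUri) {
            this.title = title;
            this.sourceUri = sourceUri;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String title, String sourceUri) {
        entries.add(new Entry(title, sourceUri));
    }

    // The search runs on metadata only and returns the source references,
    // so the caller can fetch the content on demand.
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> uris = new ArrayList<>();
        for (Entry e : entries) {
            if (e.title.toLowerCase(Locale.ROOT).contains(needle)) {
                uris.add(e.sourceUri);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        MetadataOnlyIndexSketch index = new MetadataOnlyIndexSketch();
        index.add("Invoice November 2016", "dms://invoices/2016-11.pdf");
        index.add("Contract renewal", "dms://contracts/renewal.pdf");
        System.out.println(index.search("invoice"));
    }
}
```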
“I don’t want to duplicate my data”
Use cases: Business needs
Analysis: how to get constantly updated data?
Constantly updated data must constantly call external services
to retrieve updated data!
Issues:
• Overload external data services
• Worst search time (before every search call service, create
indexes and then search it!)
• Not feasible if I have to index ALL business data (Big Data) at
search time or very small time interval
Use cases: Business needs
Analysis: how do I get constantly updated data?
We need to define the scope to verify that the business needs are satisfied.
We need to define a path.
Use cases: Constraints
Consider the example of a Call Center use case where the Operator has a
set of keywords to identify the caller (for example: phone number, fiscal
code, customer id, …) and needs to find all the related business
information.
The data is retrieved and indexed once the customer is identified and then
the operator refines the search to find specific customer information
related to the call.
If the operator does not interact with the system for a predefined time
interval (for example 1 minute) the indexed data can be discarded and the
next operator interaction will produce updated data.
Use cases: Constraints
The use case applies when the data concerns a single entity (for example
the customer of a call center) and is found using a known, specific key.
The technique is described as on-the-fly because the data is retrieved and
indexed in real time.
The application uses the data to enable the user to find the specific information
required for the business transaction without recontacting the source systems.
“I want constantly updated data”
“I don’t want to duplicate my data”
Use cases: Business needs
Analysis: How to get excellent response time?
With the previous use case constraints we index only
the data correlated to the single entity searched:
• So we can create an in-memory Lucene Index (RAM)
because we have a smaller data set. RAM Indexes perform
better than on-disk indexes, but they must be
evicted after a predefined time (CACHE with eviction policy).
How to prevent one external data service from delaying the
search response time?
• Push the data from each single service to the browser as soon
as it is available (SSE).
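The second point, not letting one slow service delay the others, can be sketched with the JDK's `CompletableFuture`. This is only an illustration under assumptions: the services are modeled as plain `Supplier<String>` objects, `push` stands in for whatever sends data to the browser, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: every service is called asynchronously and each result
// is pushed as soon as that service answers, without waiting for the slowest.
class PushAsAvailableSketch {

    static CompletableFuture<Void> searchAll(List<Supplier<String>> services,
                                             Consumer<String> push) {
        CompletableFuture<?>[] pending = new CompletableFuture<?>[services.size()];
        for (int i = 0; i < services.size(); i++) {
            pending[i] = CompletableFuture.supplyAsync(services.get(i)).thenAccept(push);
        }
        // Completes when every service has answered and been pushed.
        return CompletableFuture.allOf(pending);
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> pushed = new CopyOnWriteArrayList<>();
        Supplier<String> slow = () -> { pause(300); return "invoices"; };
        Supplier<String> fast = () -> "customers";
        searchAll(Arrays.asList(slow, fast), pushed::add).join();
        // The fast service is normally pushed first, although it was listed second.
        System.out.println(pushed);
    }
}
```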
“I want excellent search response time”
Architectural overview: Components
• Lucene RAM Indexes
  • Improve response time
  • Reduce data duplications
• Cache
  • Improve response time of refined search
  • Reduce data duplications (eviction policy)
• Push data when the data are available
  • Improve total response time
Lucene RAM Index
A memory-resident implementation of a Lucene Index.
WARNING!
The Lucene RAMDirectory implementation is not intended to work with huge indexes.
Everything beyond several hundred megabytes will waste resources (GC cycles),
because it uses an internal buffer size of 1024 bytes, producing millions of
byte[1024] arrays. This class is optimized for small memory-resident indexes. It also
has bad concurrency in multithreaded environments.
See https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/store/RAMDirectory.html
• We use it in an environment that is limited in terms of data (single entity) and concurrent users
Lucene RAM Indexes
How do we use RAM Indexes?
A RAM index is identified by:
• UserId
• Name (logical name of the data set); it depends on the user customization, e.g.
customer, product, invoices
• Context Id (alphanumeric value); a new context id is assigned when the search
returns empty results.
So the first search gets a new context id, and the refined search keeps the same context
id.
The same search from a different user leads to the creation of new indexes.
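One way to picture this identification scheme is a composite key with value equality. The sketch below is an assumption-laden illustration, not the platform's code: the class and method names are hypothetical, the context id is generated with a UUID as one possible alphanumeric value, and a plain `Object` stands in for the RAM index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch: RAM indexes looked up by (userId, name, contextId).
class RamIndexRegistrySketch {

    static final class IndexKey {
        final String userId;
        final String name;
        final String contextId;

        IndexKey(String userId, String name, String contextId) {
            this.userId = userId;
            this.name = name;
            this.contextId = contextId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IndexKey)) {
                return false;
            }
            IndexKey k = (IndexKey) o;
            return userId.equals(k.userId) && name.equals(k.name)
                    && contextId.equals(k.contextId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId, name, contextId);
        }
    }

    // Placeholder objects stand in for the real RAM indexes.
    private final Map<IndexKey, Object> indexes = new HashMap<>();

    // A first search gets a fresh context id (a UUID here, as an assumption).
    static String newContextId() {
        return UUID.randomUUID().toString();
    }

    // A refined search passes the same context id and gets the same index back;
    // any other user, name or context id creates a new one.
    Object indexFor(String userId, String name, String contextId) {
        return indexes.computeIfAbsent(new IndexKey(userId, name, contextId),
                key -> new Object());
    }

    int size() {
        return indexes.size();
    }

    public static void main(String[] args) {
        RamIndexRegistrySketch registry = new RamIndexRegistrySketch();
        String context = newContextId();
        Object first = registry.indexFor("userA", "customer", context);
        Object refined = registry.indexFor("userA", "customer", context);
        Object otherUser = registry.indexFor("userB", "customer", context);
        System.out.println(first == refined);   // same user, same context
        System.out.println(first == otherUser); // different user
    }
}
```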
Lucene RAM Index
In a cluster environment the RAM Indexes related to a user search always reside
on a dedicated node (load balancer with stickiness by user).
If this node fails, the refined search is redirected to another node, so the RAM
index is created again.
User A, User B → Load Balancer
Node 1 RAM: (A) (A) (A)
Node 2 RAM: (B) (B)
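Stickiness by user can be approximated by hashing the user id onto the list of live nodes. This is only a sketch under that assumption: real load balancers often use cookies or consistent hashing instead, and the node names and the `nodeFor` method are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the same user is routed to the same node for as long as
// the list of live nodes does not change.
class StickyRoutingSketch {

    static String nodeFor(String userId, List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes");
        }
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(userId.hashCode(), liveNodes.size());
        return liveNodes.get(bucket);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2");
        String home = nodeFor("userA", nodes);
        System.out.println("userA -> " + home);

        // If the home node fails, the user lands on a surviving node, where the
        // RAM index does not exist yet and has to be created again.
        String survivor = home.equals("node1") ? "node2" : "node1";
        System.out.println("after failure -> "
                + nodeFor("userA", Collections.singletonList(survivor)));
    }
}
```

With plain modulo hashing, a change in the node list can also move users whose node did not fail; that is acceptable here only because a moved user simply gets the index rebuilt.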
Why Cache?
Because we use RAM Indexes and we want to access Lucene Indexes when
the user wants to refine the previous search.
Why not ConcurrentHashMap?
Don’t reinvent the wheel! A cache library already provides many built-in
functions that we would otherwise have to build ourselves, eviction policies among them!
Eviction policies:
Every cache needs to remove values using different criteria: by time, size,
weight.
By Time (expiration), remove elements that have reached:
• idle time, span of time when no operation is performed with the
given key (get/put key);
• total lived time, maximum span of time an element can spend in the
cache.
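The two time-based criteria differ only in which timestamp they compare against. A minimal sketch, with hypothetical method names and all times expressed in milliseconds:

```java
// Hypothetical sketch of the two expiration checks described above.
class ExpirationSketch {

    // Idle time: measured from the last get/put on the key.
    static boolean idleExpired(long lastAccessMillis, long nowMillis, long maxIdleMillis) {
        return nowMillis - lastAccessMillis >= maxIdleMillis;
    }

    // Total lived time: measured from when the element entered the cache,
    // no matter how often it has been accessed since.
    static boolean lifetimeExpired(long createdMillis, long nowMillis, long maxLifeMillis) {
        return nowMillis - createdMillis >= maxLifeMillis;
    }

    public static void main(String[] args) {
        long created = 0L;         // inserted at t = 0 s
        long lastAccess = 50_000L; // last read at t = 50 s
        long now = 100_000L;       // checked at t = 100 s

        // Idle for 50 s with a 60 s limit: still valid under the idle policy.
        System.out.println(idleExpired(lastAccess, now, 60_000L));
        // Alive for 100 s with a 90 s limit: expired under the lifetime policy.
        System.out.println(lifetimeExpired(created, now, 90_000L));
    }
}
```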
Caches benchmark
JMH microbenchmark: put/get (4 threads + 4 threads)
1000 alphanumeric keys per iteration pre-inserted (plus 1000 with put operations)
JMH options: -i 10 -wi 2 -r 2s -f 5
[Bar chart: Put (ops/sec) and Get (ops/sec), in millions (axis from 0 to 60), for Cache2k, Guava, EhCache and Infinispan]
Machine: MacBook Pro, Intel Core i7 CPU 2.5 GHz, 4 cores (2 threads per core), 16 GB RAM
• Cache2K 0.28-BETA
• Guava version 20.0 (conc. 100)
• Infinispan 8.2.4.Final (conc. 100)
• EhCache 3.1.3
We tested and selected the libs in 2015 (JDK 1.7).
These benchmarks are updated with the latest lib versions (JDK 1.8).
Cache2K Status
“ … We use every cache2k release within production environments. However, some of
the basic features are still evolving and there may be API breaking changes until we
reach version 1.0.”
https://cache2k.org/
Cache: Guava
We selected Guava for the best overall performance results (after cache2k) and
because we need a simple local cache (not distributed) compatible with JDK 1.7.
The following is an example of how to create a cache with an idle-time expiration
policy:
We use the idle-time expiration policy to remove expired indexes from the cache.
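As a rough stand-in that needs no dependency, the sketch below mimics an idle-time expiration policy with the JDK only. It is not Guava's implementation, and the class name, the `LongSupplier` clock and the method names are hypothetical; in Guava itself the equivalent policy is configured with `CacheBuilder.newBuilder().expireAfterAccess(duration, unit)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of a cache whose entries expire after a period of idleness.
class IdleExpiringCacheSketch<K, V> {

    private static final class Holder<T> {
        final T value;
        volatile long lastAccess;

        Holder(T value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<K, Holder<V>> map = new ConcurrentHashMap<>();
    private final long maxIdleMillis;
    private final LongSupplier clock; // injected so that tests can control time

    IdleExpiringCacheSketch(long maxIdleMillis, LongSupplier clock) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Holder<>(value, clock.getAsLong()));
    }

    // An expired entry is never returned; it is physically removed on access.
    V get(K key) {
        Holder<V> holder = map.get(key);
        if (holder == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - holder.lastAccess >= maxIdleMillis) {
            map.remove(key, holder);
            return null;
        }
        holder.lastAccess = now;
        return holder.value;
    }

    // Counts entries that have expired but have not been removed yet.
    int size() {
        return map.size();
    }

    // Forces the physical removal of every expired entry.
    void cleanUp() {
        long now = clock.getAsLong();
        map.values().removeIf(holder -> now - holder.lastAccess >= maxIdleMillis);
    }

    public static void main(String[] args) {
        IdleExpiringCacheSketch<String, String> cache =
                new IdleExpiringCacheSketch<>(60_000L, System::currentTimeMillis);
        cache.put("userA/customer/ctx-1", "RAM index placeholder");
        System.out.println(cache.get("userA/customer/ctx-1"));
    }
}
```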
Cache: Guava
Expired elements
• Logically removed
  When an element expires, it is not automatically removed from the cache.
  Expired entries will never be visible to read or write operations.
  Cache.size() also counts expired entries.
• Physically removed
  • When trying to write or access an expired entry
  • When calling Cache.cleanUp() (forces eviction)
CleanUp is an expensive operation, so it is not performed automatically by
Guava; the developer has to call it.
How to push data to the browser?
Push data from server to browser
Polling
Client repeatedly sends new requests to a server. If the server has the response data
it sends the response, otherwise it sends an empty response.
Push data from server to client
Web Browser → Server: REQUEST
Server → Web Browser: RESPONSE (with data or empty)
Web Browser → Server: REQUEST
Server → Web Browser: RESPONSE (with data or empty)
Long Polling
Client sends request to a server: if the server has no data it holds the connection and
waits until data is available and then sends the data back to the client.
Push data from server to client
Web Browser → Server: REQUEST
Server → Web Browser: RESPONSE (with data)
Web Browser → Server: REQUEST
Server → Web Browser: RESPONSE (with data)
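The difference between polling and long polling can be sketched on the server side with a blocking queue. This is an illustration under assumptions, not an HTTP server: an empty string models the empty response, the long-poll timeout is an addition that the slides do not mention, and the class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: how a server could answer a poll versus a long poll.
class PollingSketch {

    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();

    // Called when new data becomes available on the server.
    void publish(String data) {
        outbox.add(data);
    }

    // Polling: answer immediately, with data or with an empty response.
    String handlePoll() {
        String data = outbox.poll();
        return data != null ? data : "";
    }

    // Long polling: hold the request until data arrives. The timeout is an
    // assumption of this sketch, so that a request cannot hang forever.
    String handleLongPoll(long timeoutMillis) {
        try {
            String data = outbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return data != null ? data : "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }

    public static void main(String[] args) {
        PollingSketch server = new PollingSketch();
        System.out.println("poll: '" + server.handlePoll() + "'");
        server.publish("v1");
        System.out.println("long poll: '" + server.handleLongPoll(1_000L) + "'");
    }
}
```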
Server Sent Events
Similar to long polling: the client sends a request to a server, which waits until
data is available and then sends it back to the client as one or more events.
The client processes the data without closing the connection until the server sends
the last event.
Push data from server to client
Web Browser → Server: REQUEST
Server → Web Browser: EVENT (with data)
Server → Web Browser: EVENT (with data)
Server → Web Browser: EVENT (close connection)
Server Sent Events (SSE)
• The server transmits data to the browser as a continuous stream with the
text/event-stream content type, over a connection which is left open
• It is transported over HTTP, so it can be polyfilled with JavaScript to backport SSE
to browsers that do not support it yet.
• Java EE 8, JSR 370 (JAX-RS 2.1), proposed specification:
“SSE is a new technology defined as part of the HTML5 set of recommendations for a
client (e.g., a browser) to automatically get updates from a server via HTTP. It is
commonly employed for one-way streaming data transmissions in which a server updates
a client periodically or every time an event takes place”
• Java library to manage client/server SSE connections:
• https://jersey.java.net/documentation/latest/sse.html
• WebSocket?
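On the wire, each event in a text/event-stream response is a small block of `field: value` lines terminated by a blank line. The sketch below only formats such a block; it does not open a connection, and the class name, the method name and the event name `result` are hypothetical.

```java
// Hypothetical sketch: formats one Server-Sent Event for a text/event-stream body.
class SseFormatSketch {

    static final String CONTENT_TYPE = "text/event-stream";

    static String formatEvent(String eventName, String data) {
        StringBuilder sb = new StringBuilder();
        if (eventName != null) {
            sb.append("event: ").append(eventName).append('\n');
        }
        // Multi-line payloads are sent as one "data:" line per payload line.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line marks the end of the event.
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.print(formatEvent("result", "{\"view\":\"v1\"}"));
        System.out.print(formatEvent("result", "{\"view\":\"v2\"}"));
    }
}
```

In the browser, an `EventSource` object reads this stream and dispatches one event per block.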
Push data from server to client
We use Kryo, a Java framework for fast and efficient object serialization, to reduce
the size of the objects serialized over an HTTP connection.
This reduces the HTTP service response time by up to 50%!
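To see why the serialization format affects payload size, the sketch below compares standard Java serialization with a hand-written compact binary encoding of the same object. It does not use Kryo and says nothing about Kryo's actual numbers; the `SearchHit` class and both method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: same object, two encodings, different sizes.
class SerializationSizeSketch {

    static final class SearchHit implements Serializable {
        private static final long serialVersionUID = 1L;
        final int rank;
        final String title;

        SearchHit(int rank, String title) {
            this.rank = rank;
            this.title = title;
        }
    }

    // Standard Java serialization: writes class metadata along with the fields.
    static byte[] javaSerialize(SearchHit hit) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(hit);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compact encoding: writes only the field values, in a fixed order.
    static byte[] compactSerialize(SearchHit hit) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(hit.rank);
                out.writeUTF(hit.title);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SearchHit hit = new SearchHit(1, "Invoice 2016/001");
        System.out.println("java serialization: " + javaSerialize(hit).length + " bytes");
        System.out.println("compact encoding:   " + compactSerialize(hit).length + " bytes");
    }
}
```

The compact form is smaller because it carries no class description, which is the kind of overhead a dedicated serialization framework aims to avoid.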
Optimize HTTP data serialization
https://github.com/EsotericSoftware/kryo
Benchmark:
https://github.com/eishay/jvm-serializers/wiki
Architectural high level overview
Components: Presentation Layer (search results v1 … v5), Search Manager, GUAVA Cache
holding the RAM Indexes, Service Mappers, Kryo serializer, external data services.
First search:
1. Search by “keyword”
2. Search result: search the “Context” indexes
3. No results! Access by keyword
4. HTTP REST access by keyword (1 per service, ASYNC)
5. Create RAM Indexes
6. Send response (1 per service) over SSE: send event resp v1, send event resp v2, … send event resp vx
Refined search:
1. Refine search
Scale out
• Presentation Layer (search results): several nodes behind a Load Balancer (stickiness by user)
• Search Layer: a cluster behind a Load Balancer (stickiness by user)
• Each Search Layer node: Search Manager, GUAVA Cache with RAM Indexes, Service Mappers
Test environment
Complete system test with 600 Vuser: 27 [Tr/s], i.e. 9 [Tr/s] / 200 Vuser per chain FE-BE
32 GB RAM, 8 Core, Heap 20 GB
Phase 1: incremental tests to find the max transaction frequency
• 400 Vuser
• 800 Vuser
Phase 2: environment minimization and direct load on the search engine
Phase 1, test 400 Vuser: 18 [Tr/s], linear gain
Phase 1, test 800 Vuser: 24 [Tr/s], sub-linear gain, max found
Phase 2: direct load injection on a single BE via API
Search and indexing transactions with the same document weight used in phase 1:
• 19 binary Add document on 19 Lucene indexes, average size 20k per Add
• 1 Search with 24 results
Phase 2, test 200 Vuser: 13 [Tr/s], clean up 70s
Phase 2, test 200 Vuser: 43 [Tr/s], clean up 15s
        Samples   Average response time [ms]   Errors %   Transactions per second
TOTAL   1107087   66                           0.00%      860.0
Future improvements
• We are migrating to a microservices architecture:
  • For example: Service Mapper and Search Manager can
  easily be converted to microservices and scaled out
  independently
• We are testing the benefits and drawbacks of migrating to HTTP/2
  • Header compression
  • Server push
  • Binary protocol

More Related Content

What's hot

Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetHortonworks
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache HiveMurtaza Doctor
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)Nicolas Kourtellis
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Understanding apache-druid
Understanding apache-druidUnderstanding apache-druid
Understanding apache-druidSuman Banerjee
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data TestingQA InfoTech
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Databricks
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantRoman Nikitchenko
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 

What's hot (20)

Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Understanding apache-druid
Understanding apache-druidUnderstanding apache-druid
Understanding apache-druid
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data Testing
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephant
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 

Viewers also liked

Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Codemotion
 
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...Codemotion
 
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016Codemotion
 
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...Codemotion
 
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...Codemotion
 
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016 Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016 Codemotion
 
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016Codemotion
 
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Codemotion
 
Dove sono i tuoi vertici e di cosa stanno parlando?
Dove sono i tuoi vertici e di cosa stanno parlando?Dove sono i tuoi vertici e di cosa stanno parlando?
Dove sono i tuoi vertici e di cosa stanno parlando?Codemotion
 
Reactive Android: RxJava and beyond - Fabio Tiriticco - Codemotion Amsterdam ...
Reactive Android: RxJava and beyond - Fabio Tiriticco - Codemotion Amsterdam ...Reactive Android: RxJava and beyond - Fabio Tiriticco - Codemotion Amsterdam ...
Reactive Android: RxJava and beyond - Fabio Tiriticco - Codemotion Amsterdam ...Codemotion
 
Making your conferences more memorable with Sketchnoting - Linda van der Pal ...
Making your conferences more memorable with Sketchnoting - Linda van der Pal ...Making your conferences more memorable with Sketchnoting - Linda van der Pal ...
Making your conferences more memorable with Sketchnoting - Linda van der Pal ...Codemotion
 
The recurring nightmare - Rosa Gutierrez - Codemotion Amsterdam 2016
The recurring nightmare  - Rosa Gutierrez - Codemotion Amsterdam 2016The recurring nightmare  - Rosa Gutierrez - Codemotion Amsterdam 2016
The recurring nightmare - Rosa Gutierrez - Codemotion Amsterdam 2016Codemotion
 
Beautiful Authentication - Tiffany Conroy - Codemotion Milan 2016
Beautiful Authentication - Tiffany Conroy - Codemotion Milan 2016Beautiful Authentication - Tiffany Conroy - Codemotion Milan 2016
Beautiful Authentication - Tiffany Conroy - Codemotion Milan 2016Codemotion
 
Come rendere il proprio prodotto una bomba creandogli una intera community in...
Come rendere il proprio prodotto una bomba creandogli una intera community in...Come rendere il proprio prodotto una bomba creandogli una intera community in...
Come rendere il proprio prodotto una bomba creandogli una intera community in...Codemotion
 
Understanding Angular 2 - Shmuela Jacobs - Codemotion Milan 2016
Understanding Angular 2 - Shmuela Jacobs - Codemotion Milan 2016Understanding Angular 2 - Shmuela Jacobs - Codemotion Milan 2016
Understanding Angular 2 - Shmuela Jacobs - Codemotion Milan 2016Codemotion
 
Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016Codemotion
 
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...Codemotion
 
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...Codemotion
 
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...Codemotion
 
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...Codemotion
 

Viewers also liked (20)

Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...
 
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...
Cyber Analysts: who they are, what they do, where they are - Marco Ramilli - ...
 
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016
Lo sviluppo di Edge Guardian VR - Maurizio Tatafiore - Codemotion Milan 2016
 
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...
Master the chaos: from raw data to analytics - Andrea Pompili, Riccardo Rossi...
 
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...
DevOps in Cloud, dai Container all'approccio Codeless - Gabriele Provinciali,...
 
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016 Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016
Milano Chatbots Meetup - Vittorio Banfi - Bot Design - Codemotion Milan 2016
 
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016
Games of Simplicity - Pozzi; Molinari - Codemotion Milan 2016
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - Codemotion Milan 2016

  • 1. Search on the fly: how to lighten your Big Data Simona Russo Auro Rolle MILAN 25-26 NOVEMBER 2016
  • 2. Who am I? Simona Russo ... some numbers • Head of R&D at FacilityLive: 4 years • Major Italian software houses: +20 years • Software products (developed!): 4 • Finance: 3 years • Payroll: 3 years • Logistics: +10 years • Search: 4 years … so I’m +20 years old
  • 3. Who Am I ? 1996 2016
  • 4. AGENDA • Use cases • Business needs • Analysis • Architectural overview • Some components • RAM indexes • Cache management • Push data from server to client • Optimize HTTP data serialization • Scale out • Stress test and performance test • Future improvements
  • 5. Prerequisite: Lucene Lucene is the de facto standard for search libraries. Initially developed by Doug Cutting in 1999 (also co-creator of Hadoop), it joined the Apache Software Foundation in 2001. See https://en.wikipedia.org/wiki/Apache_Lucene Apache Lucene is an open source, full-featured search engine library written entirely in Java: • many powerful query types: phrase queries, wildcard queries, range queries • fielded searching (e.g. title, author, contents) • synonym and stopword options • … and more. See http://lucene.apache.org/ for further information
  • 6. Use cases: Business needs “I want constantly updated data” “I want excellent search response time” “I want to search all the available data I have (even from different data sources)” “I don’t want to duplicate my data”
  • 8. Search Platform Use cases: Conceptual high level layer overview External Data Services DB Files CSV, XLS, … External Services … … How to follow business needs Presentation Layer: search results Data layer i.e. Customers View Lucene Indexes i.e. Invoices View i.e. Products View …
  • 9. Use cases: Business needs Analysis: how to integrate different data sources and search on them? Simple! We can create one or many Lucene indexes that integrate different data sources (schema-less feature): • Joining data sources • Transforming data values • Enriching, cleansing and indexing data values Can we use data virtualization/federation middleware from other vendors? • Yes, we can, but then we have two data integrations: 1. from the data source to the data virtualization middleware 2. from the data virtualization middleware to the Lucene indexes. [Diagram labels: Data Sources, Data Virtualization/Federation middleware, Data Layer, Lucene Indexes] This may add latency, consistency, flexibility and revenue issues in a search-driven platform. So we decided not to use it and instead to integrate data into the Lucene indexes directly. “I want to search all the available data I have (even from different data sources)”
  • 10. Use cases: Business needs Analysis: how to avoid data source duplication? With Lucene we index only the metadata, i.e. the data that the user wants to search. All other data (e.g. PDF files, HTML pages, …) can reside on the source system and be accessed on demand. So we significantly reduce data duplication. “I don’t want to duplicate my data”
  • 11. Use cases: Business needs Analysis: how to get constantly updated data? To have constantly updated data we must constantly call the external services to retrieve it! Issues: • Overloads the external data services • Worse search time (before every search: call the services, create the indexes and then search them!) • Not feasible if we have to index ALL business data (Big Data) at search time or at very short time intervals
  • 12. Use cases: Business needs Analysis: how do I get constantly updated data? We need to define the scope in order to verify that the business needs are satisfied. We need to define a path.
  • 13. Use cases: Constraints Consider the example of a Call Center use case where the Operator has a set of keywords to identify the caller (for example: phone number, fiscal code, customer id, …) and needs to find all the related business information. The data is retrieved and indexed once the customer is identified and then the operator refines the search to find specific customer information related to the call. If the operator does not interact with the system for a predefined time interval (for example 1 minute) the indexed data can be discarded and the next operator interaction will produce updated data.
  • 14. Use cases: Constraints The use case applies when the data has a single-entity nature (for example, the customer of a call center) and is found using a known, specific key. The technique is described as on-the-fly because the data is retrieved and indexed in real time. The application uses the data to enable the user to find the specific information required for the business transaction without recontacting the source systems. “I want constantly updated data” “I don’t want to duplicate my data”
  • 15. Use cases: Business needs Analysis: how to get excellent response time? With the previous use case constraints we index only the data correlated to the single entity searched: • So we can create an in-memory Lucene index (RAM) because we have a smaller data set. RAM indexes perform better than on-disk indexes, but they must be evicted after a predefined time (CACHE with eviction policy). How to prevent one external data service from delaying the search response time? • Push the data from each service to the browser as soon as it is available (SSE). “I want excellent search response time”
  • 16. Architectural overview: Components • Lucene RAM Indexes • Improve response time • Reduce data duplications • Cache • Improve response time of refined search • Reduce data duplications (eviction policy) • Push data when the data are available • Improve total response time
  • 17. Lucene RAM Index A memory-resident implementation of a Lucene index. WARNING! The Lucene RAMDirectory implementation is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency in multithreaded environments. See https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/store/RAMDirectory.html • So we use it in a limited environment in terms of data (single entity) and concurrent users
  • 18. Lucene RAM Indexes How do we use RAM indexes? RAM indexes are identified by: • UserId • Name (logical name of the data set), which depends on the user customization, e.g. customer, product, invoices • Context Id (alphanumeric value): a new context id is assigned when the search returns no results. So the first search gets a new context id, and the refined search keeps the same context id. The same search from a different user leads to the creation of new indexes.
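
The three-part identifier described on this slide can be pictured as a composite cache key. The following is a minimal JDK-only sketch; the class name `RamIndexKey`, the use of a UUID as context id, and the sample values are assumptions made for illustration and are not taken from the talk.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical composite key for one RAM index: user, logical data-set name, search context. */
final class RamIndexKey {
    private final String userId;
    private final String name;
    private final String contextId;

    RamIndexKey(String userId, String name, String contextId) {
        this.userId = Objects.requireNonNull(userId);
        this.name = Objects.requireNonNull(name);
        this.contextId = Objects.requireNonNull(contextId);
    }

    /** Generates a fresh context id for a first search (assumption: any unique string works). */
    static String newContextId() {
        return UUID.randomUUID().toString();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RamIndexKey)) return false;
        RamIndexKey other = (RamIndexKey) o;
        return userId.equals(other.userId)
                && name.equals(other.name)
                && contextId.equals(other.contextId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, name, contextId);
    }

    @Override
    public String toString() {
        return userId + "/" + name + "/" + contextId;
    }
}
```

With a key like this, a refined search that carries the same user, name and context id finds the index built by the first search, while the same search issued by another user produces a different key and therefore a new index.
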
  • 19. Lucene RAM Index In a cluster environment the RAM indexes related to a user search always reside on a dedicated node (load balancer with stickiness by user). If this node fails, the refined search is redirected to another node, so the RAM index is created again. [Diagram: a load balancer routes User A's requests to Node 1, whose RAM holds the (A) indexes, and User B's requests to Node 2, whose RAM holds the (B) indexes]
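
Stickiness by user, and the effect of a node failure, can be illustrated with a toy router. This is only a sketch under stated assumptions: the class `StickyRouter`, the node names and the modulo-hash rule are hypothetical, and a production load balancer typically achieves stickiness with cookies or consistent hashing rather than this rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical user-sticky router: the same user always maps to the same live node. */
final class StickyRouter {
    private final List<String> liveNodes = new ArrayList<>();

    StickyRouter(List<String> nodes) {
        liveNodes.addAll(nodes);
    }

    /** Picks a node from the user id only, so repeated requests land on the same node. */
    String nodeFor(String userId) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes");
        }
        int slot = Math.floorMod(userId.hashCode(), liveNodes.size());
        return liveNodes.get(slot);
    }

    /** Removes a failed node; its users are re-routed and their RAM indexes must be rebuilt. */
    void markFailed(String node) {
        liveNodes.remove(node);
    }
}
```

After `markFailed`, the user's next refined search reaches a node that has no RAM index for that context, which matches the slide's statement that the index is created again.
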
  • 20. Why Cache? Because we use RAM indexes and we want to access those Lucene indexes again when the user refines the previous search. Why not ConcurrentHashMap? Don’t reinvent the wheel! A cache library already provides many built-in functions, including eviction policies, that we would otherwise have to build ourselves. Eviction policies: every cache needs to remove values using different criteria: by time, size, weight. By time (expiration), remove elements that have reached: • idle time, the span of time in which no operation (get/put) is performed on the given key; • total lived time, the maximum span of time an element can spend in the cache.
  • 21. Caches benchmark JMH microbenchmark: put/get (4 threads + 4 threads), 1000 alphanumeric keys pre-inserted per iteration (plus 1000 added with put operations), options -i 10 -wi 2 -r 2s -f 5 [Chart: Put (ops/sec) and Get (ops/sec) in millions, axis 0 to 60, for Cache2k, Guava, EhCache and Infinispan] MacBookPro Intel Core i7 CPU 2.5 GHz, 4 cores (2 threads per core), 16 GB RAM • Cache2K 0.28-BETA • Guava version 20.0 (conc. 100) • Infinispan 8.2.4.Final (conc. 100) • EhCache 3.1.3 We tested and selected the libraries in 2015 (JDK 1.7). These benchmarks were rerun with the latest library versions (JDK 1.8). Cache2K status: “ … We use every cache2k release within production environments. However, some of the basic features are still evolving and there may be API breaking changes until we reach version 1.0.” https://cache2k.org/
  • 22. Cache: Guava We selected GUAVA for its overall best performance results (after cache2k) and because we need a simple local cache (not distributed) compatible with JDK 1.7. The following is an example of how to create a cache with an idle-time expiration policy: [code image] We use the idle-time expiration policy to remove expired indexes from the cache.
  • 23. Cache: Guava Expired elements • Logical removal: when an element expires, it is not automatically removed from the cache. Expired entries will never be visible to read or write operations, but Cache.size() also counts expired entries. • Physical removal happens when: • trying to write/access an expired entry • calling Cache.cleanUp() (forces eviction). cleanUp is an expensive operation, so GUAVA does not perform it automatically; the developer has to call it.
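
In Guava itself, idle-time expiration is configured with `CacheBuilder.newBuilder().expireAfterAccess(duration, unit)`. The sketch below is not Guava and not the code from the slides: it is a small JDK-only stand-in, with a hypothetical class `IdleExpiringCache` and an injectable clock, written only to make the difference between logical and physical removal from the two slides above observable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** JDK-only illustration of idle-time expiration (assumption: simplified, not Guava's algorithm). */
final class IdleExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        long lastAccessMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccessMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long maxIdleMillis;
    private final LongSupplier clock;

    IdleExpiringCache(long maxIdleMillis, LongSupplier clock) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the value, or null if the key is absent or has been idle for too long. */
    synchronized V getIfPresent(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - entry.lastAccessMillis >= maxIdleMillis) {
            entries.remove(key); // physical removal triggered by the access
            return null;
        }
        entry.lastAccessMillis = now; // a successful read resets the idle timer
        return entry.value;
    }

    /** Counts entries still held in memory, including expired ones not yet physically removed. */
    synchronized int size() {
        return entries.size();
    }

    /** Forces physical removal of every expired entry. */
    synchronized void cleanUp() {
        long now = clock.getAsLong();
        entries.values().removeIf(e -> now - e.lastAccessMillis >= maxIdleMillis);
    }
}
```

With a one-minute idle limit, an entry that nobody touches stays counted by `size()` until either an access to its key or an explicit `cleanUp()` removes it, which is why a periodic clean-up matters when the cached values are whole in-memory indexes.
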
  • 24. HOW to push data to browser? Push data from server to browser
  • 25. Push data from server to client: Polling The client repeatedly sends new requests to a server. If the server has the response data it sends the response, otherwise it sends an empty response. [Diagram: Web Browser and Server exchanging REQUEST / RESPONSE (with data or empty), repeated]
  • 26. Push data from server to client: Long Polling The client sends a request to a server: if the server has no data it holds the connection, waits until data is available and then sends the data back to the client. [Diagram: Web Browser and Server exchanging REQUEST / RESPONSE (with data), repeated]
  • 27. Push data from server to client: Server Sent Events Similar to long polling: the client sends a request to a server, which waits until data is available and then sends the data back to the client as one or more events. The client processes the data without closing the connection until the server sends the last event. [Diagram: Web Browser sends REQUEST; Server replies with EVENT (with data), EVENT (with data), EVENT (close connection)]
  • 28. Push data from server to client: Server Sent Events (SSE) • The server transmits data to the browser as a continuous stream with the event-stream content type, over a connection which is left open • It is transported over HTTP, so it can be polyfilled with JavaScript to backport SSE to browsers that do not support it yet. • Java EE 8, JSR 370 (JAX-RS 2.1): proposed specification “SSE is a new technology defined as part of the HTML5 set of recommendations for a client (e.g., a browser) to automatically get updates from a server via HTTP. It is commonly employed for one-way streaming data transmissions in which a server updates a client periodically or every time an event takes place” • Java library to manage client/server SSE connections: • https://jersey.java.net/documentation/latest/sse.html • WebSocket?
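
To make the event-stream format concrete, here is a minimal JDK-only sketch of how a single event is laid out on a `text/event-stream` response: an optional `event:` line, one `data:` line per payload line, and a blank line that ends the event. The class name `SseFormat` and the sample event name are illustrative assumptions; in a real application a library such as Jersey produces this framing.

```java
/** Formats one Server-Sent Event as text for a text/event-stream response body. */
final class SseFormat {
    private SseFormat() {
    }

    static String event(String eventName, String data) {
        StringBuilder sb = new StringBuilder();
        if (eventName != null && !eventName.isEmpty()) {
            sb.append("event: ").append(eventName).append('\n');
        }
        // Each line of the payload is sent as its own "data:" field.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n'); // a blank line terminates the event
        return sb.toString();
    }
}
```

For example, `SseFormat.event("invoices", "{\"count\":3}")` yields the three lines `event: invoices`, `data: {"count":3}` and an empty line; the browser's `EventSource` dispatches the event when it reads the empty line.
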
  • 29. Optimize HTTP data serialization We use KRYO, a Java framework for fast and efficient object serialization, to reduce the size of the objects serialized over an HTTP connection. We reduce HTTP service response time by up to 50%! https://github.com/EsotericSoftware/kryo Benchmark: https://github.com/eishay/jvm-serializers/wiki
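
Why a compact binary encoding shrinks the payload can be shown without Kryo at all. The sketch below is explicitly not Kryo: it compares standard Java serialization with a hand-written `DataOutputStream` encoding of the same object, as a rough stand-in for the idea of writing only field values instead of verbose class metadata. The `Hit` class and its fields are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedSizeDemo {
    private SerializedSizeDemo() {
    }

    /** Hypothetical search hit, used only for this illustration. */
    static final class Hit implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String title;
        final float score;

        Hit(String id, String title, float score) {
            this.id = id;
            this.title = title;
            this.score = score;
        }
    }

    /** Size in bytes with standard Java serialization (includes class and field descriptors). */
    static int javaSerializedSize(Hit hit) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(hit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Size in bytes when only the field values are written, in a fixed order. */
    static int compactSize(Hit hit) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(hit.id);
            out.writeUTF(hit.title);
            out.writeFloat(hit.score);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The gap between the two sizes is largest for small objects, where the class metadata dominates; actual gains with Kryo depend on the object graph and on how classes are registered, so the 50% figure on the slide should be read as the authors' own measurement.
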
  • 30. Architectural high level overview [Diagram components: Presentation Layer: search results (views v1 … v5); Search Manager; GUAVA Cache holding RAM Indexes; Service Mappers; Kryo serializer; External data services] First search flow: 1. Search by “keyword” 2. Search result (“Context” indexes search) 3. No results! Access by keyword (ASYNC) 4. HTTP REST access by keyword (1 per service) 5. Create RAM indexes 6. Send response (1 per service) via SSE: send event resp v1, send event resp v2, … send event resp vx. Refined search flow: 1. Refine search
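
The asynchronous part of this flow (one call per service, one response pushed per service as soon as it is ready) can be sketched with `CompletableFuture`. This is an illustration under assumptions, not the authors' Search Manager: the class `FanOutSearch`, the service functions and the `push` callback are hypothetical, and index creation and SSE transport are reduced to a callback invocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

final class FanOutSearch {
    private FanOutSearch() {
    }

    /**
     * Calls every service asynchronously with the keyword and hands each result
     * to push as soon as that service answers, without waiting for the others.
     * The returned future completes when all services have answered.
     */
    static CompletableFuture<Void> search(String keyword,
                                          Map<String, Function<String, String>> services,
                                          Consumer<String> push) {
        List<CompletableFuture<Void>> calls = new ArrayList<>();
        for (Map.Entry<String, Function<String, String>> service : services.entrySet()) {
            calls.add(CompletableFuture
                    .supplyAsync(() -> service.getValue().apply(keyword))
                    .thenAccept(result -> push.accept(service.getKey() + ": " + result)));
        }
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]));
    }
}
```

Because each result is pushed independently, a slow service delays only its own view in the presentation layer. In this sketch a failing service makes the combined future complete exceptionally; a real implementation would probably handle errors per service.
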
  • 31. Scale out [Diagram: a Presentation Layer cluster (search results) behind a load balancer (stickiness by user), and a Search Layer cluster behind a load balancer (stickiness by user); each Search Layer node contains a Search Manager, a GUAVA Cache with RAM Indexes, and Service Mappers]
  • 33. Complete system test with 600 Vuser: 27 [Tr/s] in total, 9 [Tr/s] / 200 Vuser per FE-BE chain
  • 34. Test environment: 32 GB RAM, 8 cores, 20 GB heap. Phase 1: incremental tests to find the max transaction frequency • 400 Vuser • 800 Vuser Phase 2: environment minimization and direct load on the search engine
  • 35. Phase 1 Test 400 Vuser 18 [Tr/s] – Linear gain
  • 36. Phase 1 Test 800 Vuser 24 [Tr/s] – Sub linear gain – Max found
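
The "linear" and "sub linear" labels on the Phase 1 slides can be checked with simple arithmetic. The sketch below assumes that the 9 [Tr/s] per 200 Vuser figure from the complete system test is the baseline; that assumption, and the class name `ScalingCheck`, are not stated in the talk.

```java
final class ScalingCheck {
    private ScalingCheck() {
    }

    /** Throughput expected at the given user count if it scaled linearly from the baseline. */
    static double linearProjection(double baseUsers, double baseTps, double users) {
        return baseTps * users / baseUsers;
    }

    /** Measured throughput as a fraction of the linear projection (1.0 means linear). */
    static double efficiency(double baseUsers, double baseTps, double users, double measuredTps) {
        return measuredTps / linearProjection(baseUsers, baseTps, users);
    }
}
```

Under that baseline, 400 Vuser projects to 18 Tr/s, which equals the measured value (efficiency 1.0), while 800 Vuser projects to 36 Tr/s against 24 Tr/s measured, an efficiency of about 0.67.
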
  • 37. Phase 2 Direct load injection on a single BE via API. Search and indexing transactions with the same document weight used in Phase 1: 19 binary Add-document operations on 19 Lucene indexes (average size 20k per Add) and 1 search with 24 results
  • 38. Phase 2 Test 200 Vuser 13 [Tr/s] Clean UP 70s
  • 39. Phase 2 Test 200 Vuser 43 [Tr/s] Clean UP 15s Results (TOTAL): samples 1107087; average response time 66 ms; errors 0.00%; transactions per second 860.0
  • 40. Future improvements • We are migrating to a microservices architecture: • For example, the Service Mapper and the Search Manager can easily be converted to microservices and scaled out independently • We are testing the benefits and drawbacks of migrating to HTTP/2 • Header compression • Server push • Binary protocol
  • 41. Search on the fly: how to lighten your Big Data MILAN 25-26 NOVEMBER 2016