SlideShare a Scribd company logo
1 of 37
Search and Nosql for an information
management application
@LucianPrecup
@nosqlmatters #nosql14
2014-04-30
whoami
• CTO of Adelean
• Integrate search, nosql and big data
technologies to support ETL, BI, data mining,
data processing and data visualization use
cases.
2014-04-30 2@LucianPrecup @nosqlmatters #nosql14
Poll - How many of you are …
• Familiar with Elasticsearch or Solr ?
• Using Hibernate or other ORM framework ?
Hibernate search ?
• Familiar with NoSQL ?
• Developing information management
applications ?
• Familiar with SOA ? Developing web services ?
• Familiar with ETL and BI ?
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 3
Conventions used in the slides
2014-04-30 4@LucianPrecup @nosqlmatters #nosql14
Information management application?
• Backed by a relational
database
• CRUD operations
• The user is generally an
employee of the
company
2014-04-30 5@LucianPrecup @nosqlmatters #nosql14
Enterprise information management
• ORM Mapping (like
Hibernate)
• The relational database
is the central point
2014-04-30 6@LucianPrecup @nosqlmatters #nosql14
SOA and information management
• The service layer
becomes the
central point
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 7
Information management, ETL and BI
• Batch extraction,
transformation and
loading during non-
peak periods
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 8
User interface
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 9
http://www.salesboom.com/images/screenshots/leads-management-software-snapshot.jpg
User interface
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 10
http://www.marvelsoft.co.in/images/stories/screenshot.png
Adding "Google like" functionality
• Operators are using the applications and
– Are asking customers for information
– Need to be fast
– Sometimes make typing errors
• Customers
– Give information to operators (sometimes over the
phone)
– Do not know their id (contract id, customer id) by
heart
– Expect operators to be fast and accurate
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 11
Server side result pagination
Faceting for additional
search criteria
Auto-completion (result
suggestion) and highlighting
"Full text" search
Display the total
number of results
Order all the results
on different criteria
Display heterogeneous results
Advanced search
Fuzzy searches and
spelling suggestions
2014-04-30 12@LucianPrecup @nosqlmatters #nosql14
SELECT … FROM … WHERE … AND
ROWNUM > 0 AND ROWNUM <= 10
SELECT CUSTOMER_STATUS,
COUNT(*) FROM … GROUP BY
CUSTOMER_STATUS
SELECT * WHERE
FIRST_NAME LIKE …% OR …
if (text contains searchTerm)
then replace text with
<b>text<b>
Split searchText into tokens
SELECT * WHERE
(FIRST_NAME = token1 AND
LAST_NAME = token2 AND
ZIP_CODE = token3) OR
(FIRST_NAME = token2 AND
LAST_NAME = token1 AND
ZIP_CODE = token3) OR …
SELECT COUNT(*) FROM …
SELECT * FROM …
ORDER BY …
SELECT NULL as COMPANY_ID, …
FROM PERSON
UNION ALL
SELECT NULL as BIRTH_DATE, …
FROM COMPANY
???
2014-04-30 13@LucianPrecup @nosqlmatters #nosql14
O.K. Simpler
SELECT CUSTOMER_TYPE,
COUNT(*) FROM … GROUP BY
CUSTOMER_TYPE
1
2
3
4
Pagination handled natively ("I am feeling lucky" !)
Search facets calculated
with the results. Almost no
additional cost.
Indexing with EdgeNGram 
very performing "auto-
complete" search
Highlighting handled by the
search engine
"Full text" search
available by default
Hits count available
with the results
Faster (cached) sort
“Schema free” right !
"Structured" search
also available
Fast and pertinent fuzzy
searches and suggesters
2014-04-30 14@LucianPrecup @nosqlmatters #nosql14
A search engine for information management ?
• User experience
– "Full text" search and "Google like" navigation
– Direct access to data
– Facetted search and navigation
– "out of the box" pagination
– Performance  fluidity
– Search improved with auto-completion and
search suggestions
• Additional functionality
– Fuzzy search
– Phonetic analysis
– Spell checking
– Synonyms and technical terms handling
– Compound words handling
• Faster than SQL for
– Sorting
– Pagination
– Facets and aggregations
– Filtering
2014-04-30 15@LucianPrecup @nosqlmatters #nosql14
Auto-completion with EdgeNGram
Input 
Indexing Searching
Id Nom
1 Céline
2 Celia
Ascii folding  Celine, Celia
Lowercase  celine, celia
EdgeNGram  ce cel celi celin celine
ce cel celi celia
Key Document id
ce 1, 2
cel 1, 2
celi 1, 2
celin 1
celine 1
celia 2
Index
 Search term
Nom
Célin
Celin  Ascii folding
celin  Lowercase
ce cel celi celin  EdgeNGram
2014-04-30 16@LucianPrecup @nosqlmatters #nosql14
Auto-completion with EdgeNGram
Input 
Indexing Searching
Id Nom
1 Céline
2 Celia
Ascii folding  Celine, Celia
Lowercase  celine, celia
EdgeNGram  ce cel celi celin celine
ce cel celi celia
Key Document id
ce 1, 2
cel 1, 2
celi 1, 2
celin 1
celine 1
celia 2
Index
 Search term
Nom
Célin
Celin  Ascii folding
celin  Lowercase
2014-04-30 17@LucianPrecup @nosqlmatters #nosql14
Information management and search
• Enter "polyglot persistence"
• How to synchronize the two
databases ?
• Triggers ? Hard to implement
– Especially for external triggers
– And for large transactions
(when many tables are
concerned)
• How to provide seamless
integration for the
application ?
• Luckily we have the Service
Layer …
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 18
SOA and a search engine
• The Service Layer ensures
seamless integration for
applications
• The Service Layer can help
with the synchronization :
Creates, Updates, Deletes
are passing through
• But how to deal with
legacy applications and
batch processes ?
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 19
Search integration in enterprise environment
• Hooks in the Service
Layer to intercept
Creates, Updates,
Deletes
• Batch indexing to
catch up with legacy
applications
• The Search Services
Layer ensures
seamless integration
and security
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 20
Search integration in enterprise environment
• Hooks into the Services Layer
– Send messages asynchronously to Elasticsearch
• Real time indexing for everything that passes
through the Services Layer
– If messages are self contained proceed with
indexing, if not use a Read operation on the Service
Layer
• Batch indexing to catch up with legacy
applications
– Delta extraction
– Full database extraction
• Search Services Layer is important :
– Constructs Elasticsearch queries and supplies a high
level interface, ready to integrate into applications
– Acts as a proxy to Elasticsearch, ensuring
authentication and authorizations
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 21
The world of a search engine is flat
Normalized database Elasticsearch document
{"film" : {
"id" : "183070",
"title" : "The Artist",
"published" : "2011-10-12",
"genre" : ["Romance", "Drama", "Comedy"],
"language" : ["English", "French"],
"persons" : [
{"person" : { "id" : "5079", "name" : "Michel
Hazanavicius", "role" : "director" }},
{"person" : { "id" : "84145", "name" : "Jean
Dujardin", "role" : "actor" }},
{"person" : { "id" : "24485", "name" : "Bérénice
Bejo", "role" : "actor" }},
{"person" : { "id" : "4204", "name" : "John
Goodman", "role" : "actor" }}
]
}}
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 22
How to create the "flattened" document
• Joining data from the previous example:
• Solr DataImport Handler : Entities and referencing ${parent.ID}
• "Crawls" the database  very bad performance
• WM_CONCAT, LISTAGG, GROUP_CONCAT
• Nice workaround if your database allows it
• Elasticsearch jdbc river
• https://github.com/jprante/elasticsearch-river-jdbc/wiki/Structured-Objects
• Use your own batch !
• Use your own code to create the document sent to the search engine
• If you are using Java, you can integrate SpringBatch for example
2014-04-30 23@LucianPrecup @nosqlmatters #nosql14
“Crawling" the database 
very bad performance
Solr’s DataImportHandler
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver"
url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document name="products">
<entity name="item" query="select * from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<field column="MANU" name="manu" />
<field column="WEIGHT" name="weight" />
<field column="PRICE" name="price" />
<field column="POPULARITY" name="popularity" />
<field column="INSTOCK" name="inStock" />
<field column="INCLUDES" name="includes" />
<entity name="feature" query="select description from feature
where item_id='${item.ID}'">
<field name="features" column="description" />
</entity>
<entity name="item_category" query="select CATEGORY_ID from
item_category where item_id='${item.ID}'">
<entity name="category" query="select description from
category where id = '${item_category.CATEGORY_ID}'">
<field column="description" name="cat" />
</entity>
</entity>
</entity>
</document>
</dataConfig>
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 24
“Crawling" the database 
very bad performance
Solr’s DataImportHandler
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 25
http://wiki.apache.org/solr/DataImportHandler
OK for HTML pages but
KO for db relations
The issue with joins :-)
• Let’s say you have two relational entities: Persons
and Contracts
– A Person has zero, one or more Contracts
– A Contract is attached to one or more Persons (eg. the
Subscriber, the Grantee, …)
• Need a search services :
– S1: getPersonsDetailsByContractProperties
– S2: getContractsDetailsByPersonProperties
• Simple solution with SQL:
SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A‘
SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A'
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 26
The issue with joins - solutions
• Solution 1
– Index Persons with Contracts together for S1
{"person" : { "details" : …, … , "contracts" : ["contract" :{"id" : 1, …}, …] }}
– Index Contracts with Persons together for S2
{"contract" : { "details" : …, …, "persons" : ["person" :{"id" : 1, "role" : "S", …}, …]}}
• Issues with solution 1:
– A lot of data duplication
– Have to get Contracts when indexing Persons and vice-versa
• Solution 2
– Elasticsearch’s Parent/Child
• Issues with solution 2:
– Works in one way but not the other (only one parent for n children, a 1 to n relationship)
• Solution 3
– Index Persons and Contracts separately
– Launch two Elasticsearch queries to get the response
– For S1 : First get all Contract ids by Contract properties, then get Persons by Contract ids (terms
query)
– For S2 : First get all Persons ids by Person properties, then get Contracts by Person ids (terms
query)
– The response to the second query can be returned “as is” to the client (pagination, etc.)
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 27
SQL : 1, Elasticsearch : 2 – another example
• Let’s say you have a database containing lines of monthly payments by customer
companies
• Let’s say you need to display company information together with the aggregated
income for each company
• With SQL :
SELECT Company.id, Company.A, Company.B SUM(Payments.amount) FROM Company, Payments
WHERE Company.id=Payments.cid GROUP BY Company.id, Company.A, Company.B
• With Elasticsearch :
Launch
"facets" : {
"Total" : {
"terms_stats" : {
"key_field" : "Payments.cid",
"value_field" : "Payments.amount"
}
}
}
then
GET /index/Company/_mget
{
"ids" : ["cid1", "cid2", …]
}
Or, you could use the relational database itself if it is fast enough (IN query)
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 28
Other tips and tricks
• Define your own mapping for each field (even if you are
tempted by the “schema free”)
– For example: you might not want to use regular stop words for
fields representing person names
• Beware of idf
– Values that are rare are boosted by default. This can lead to
some “unexpected” behavior.
The query
{"bool": {
"should": [
{"text": {"name": "john"}},
{"text": {"place": "sunnyvale"}}
]
}}
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 29
Name Place
John Doe Sunnyvale
Alice O’Malley Sunnyvale
John Dean John Palo Alto
Could result in
More tips and tricks
• Use index alias when launching
the full batch indexing to keep
the service running 24/7
• When searching structured information you may have
many identical scores. To keep the order and pagination
deterministic, use a sort by default. For example first by
_score, then by id.
• For user defined sorting (there are many criteria in this kind
of use cases : name, user_id, zipcode, location, etc.) be sure
to have enough memory. For example sorting on a 34
million records database by user_id (unique, of size 8 bytes)
you must count about 260 MB of fielddata cache.
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 30
Search and BI
• Elasticsearch is
already in sync
with the main
database.
• Kibana is capable
of displaying
useful
information
about the data.
• Why not having
some BI
dashboards in
real time ?
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 31
Example of Kibana dashboard
-- http://www.elasticsearch.org/overview/kibana/
2014-04-30 32@LucianPrecup @nosqlmatters #nosql14
Datawarehouse
• Elasticsearch is
conceived for Big
Data so using it
as a Data
Warehouse
makes sense
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 33
NoSQL, Search and RDBMS
Real time
synchronization
Traditional BI
alternative
Facets
(simple or
complex)
Better performance,
better user experience,
more functionality
Auto-complétion,
suggestions, fuzzy
searches, …
Direct access to
pertinent data
2014-04-30
34
@LucianPrecup @nosqlmatters #nosql14
Elasticsearch integration – global view
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 35
Why not NoRDBMS ?
• Legacy applications
• The need for transactions
and consistency
• The need for data
normalization
• The need for structure
• Polyglot persistence is better
• NoSQL databases are lacking
features for now (see
keynote from
@ted_dunning)
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 36
Thank you
Q & A

More Related Content

Viewers also liked

Mona Vale Town Centre - Balancing Place Making and Traffic Engineering
Mona Vale Town Centre - Balancing Place Making and Traffic EngineeringMona Vale Town Centre - Balancing Place Making and Traffic Engineering
Mona Vale Town Centre - Balancing Place Making and Traffic Engineering
JumpingJaq
 
презентация компании "КапиталСити"
презентация компании "КапиталСити" презентация компании "КапиталСити"
презентация компании "КапиталСити"
Капитал Сити
 
11 steps to success with Salesforce: adoption to addiction
11 steps to success with Salesforce: adoption to addiction11 steps to success with Salesforce: adoption to addiction
11 steps to success with Salesforce: adoption to addiction
Daniel Plume
 
Передвижной комплекс связи "Энергия"
Передвижной комплекс связи "Энергия"Передвижной комплекс связи "Энергия"
Передвижной комплекс связи "Энергия"
Datamodel
 
The Future Development of Traffic Signals and the Impact of Autonomous Vehicles
The Future Development of Traffic Signals and the Impact of Autonomous VehiclesThe Future Development of Traffic Signals and the Impact of Autonomous Vehicles
The Future Development of Traffic Signals and the Impact of Autonomous Vehicles
JumpingJaq
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
Flink Forward
 

Viewers also liked (20)

Indexation d'une base documentaire pour Liberation
Indexation d'une base documentaire pour LiberationIndexation d'une base documentaire pour Liberation
Indexation d'une base documentaire pour Liberation
 
Mdday2010 modelisation-agilite
Mdday2010 modelisation-agiliteMdday2010 modelisation-agilite
Mdday2010 modelisation-agilite
 
Expert Report on Geologic Hazards in the Karst Regions of Virginia and West V...
Expert Report on Geologic Hazards in the Karst Regions of Virginia and West V...Expert Report on Geologic Hazards in the Karst Regions of Virginia and West V...
Expert Report on Geologic Hazards in the Karst Regions of Virginia and West V...
 
mod.org.in - cities for people - colab lecture - may 2011
mod.org.in - cities for people - colab lecture - may 2011mod.org.in - cities for people - colab lecture - may 2011
mod.org.in - cities for people - colab lecture - may 2011
 
Mona Vale Town Centre - Balancing Place Making and Traffic Engineering
Mona Vale Town Centre - Balancing Place Making and Traffic EngineeringMona Vale Town Centre - Balancing Place Making and Traffic Engineering
Mona Vale Town Centre - Balancing Place Making and Traffic Engineering
 
Bike Boulevards - Conquering the Last Mile
Bike Boulevards - Conquering the  Last MileBike Boulevards - Conquering the  Last Mile
Bike Boulevards - Conquering the Last Mile
 
How good are our models?
How good are our models?How good are our models?
How good are our models?
 
Grammer month (8)
Grammer month (8)Grammer month (8)
Grammer month (8)
 
презентация компании "КапиталСити"
презентация компании "КапиталСити" презентация компании "КапиталСити"
презентация компании "КапиталСити"
 
11 steps to success with Salesforce: adoption to addiction
11 steps to success with Salesforce: adoption to addiction11 steps to success with Salesforce: adoption to addiction
11 steps to success with Salesforce: adoption to addiction
 
Передвижной комплекс связи "Энергия"
Передвижной комплекс связи "Энергия"Передвижной комплекс связи "Энергия"
Передвижной комплекс связи "Энергия"
 
A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...
A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...
A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...
 
Moteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUGMoteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUG
 
L' Analyse documentaire : indexation, classification, clusters
L' Analyse documentaire : indexation, classification, clustersL' Analyse documentaire : indexation, classification, clusters
L' Analyse documentaire : indexation, classification, clusters
 
Psm master1 - pre-requis seo - referencement naturel
Psm   master1 - pre-requis seo - referencement naturelPsm   master1 - pre-requis seo - referencement naturel
Psm master1 - pre-requis seo - referencement naturel
 
Tv Sync Hybrid
Tv Sync HybridTv Sync Hybrid
Tv Sync Hybrid
 
What is a good technology stack today?
What is a good technology stack today?What is a good technology stack today?
What is a good technology stack today?
 
Realweb & re:Store. У кого look alike длиннее
Realweb & re:Store. У кого look alike длиннееRealweb & re:Store. У кого look alike длиннее
Realweb & re:Store. У кого look alike длиннее
 
The Future Development of Traffic Signals and the Impact of Autonomous Vehicles
The Future Development of Traffic Signals and the Impact of Autonomous VehiclesThe Future Development of Traffic Signals and the Impact of Autonomous Vehicles
The Future Development of Traffic Signals and the Impact of Autonomous Vehicles
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 

Similar to Search and nosql for information management @nosqlmatters Cologne

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 
Your Content, Your Search, Your Decision
Your Content, Your Search, Your DecisionYour Content, Your Search, Your Decision
Your Content, Your Search, Your Decision
Agnes Molnar
 
SplunkLive! Analytics with Splunk Enterprise
SplunkLive! Analytics with Splunk EnterpriseSplunkLive! Analytics with Splunk Enterprise
SplunkLive! Analytics with Splunk Enterprise
Splunk
 
SplunkLive! Data Models 101
SplunkLive! Data Models 101SplunkLive! Data Models 101
SplunkLive! Data Models 101
Splunk
 
Analytics with splunk - Advanced
Analytics with splunk - AdvancedAnalytics with splunk - Advanced
Analytics with splunk - Advanced
jenny_splunk
 

Similar to Search and nosql for information management @nosqlmatters Cologne (20)

Find Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle TextFind Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 
Introduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application InsightsIntroduction to Elasticsearch for Business Intelligence and Application Insights
Introduction to Elasticsearch for Business Intelligence and Application Insights
 
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014
 
Omnisearch in AEM 6.2 - Search All the Things
Omnisearch in AEM 6.2 - Search All the ThingsOmnisearch in AEM 6.2 - Search All the Things
Omnisearch in AEM 6.2 - Search All the Things
 
EVOLVE'16 | Enhance | Oscar Bolaños & Justin Edelson | Search All the Things:...
EVOLVE'16 | Enhance | Oscar Bolaños & Justin Edelson | Search All the Things:...EVOLVE'16 | Enhance | Oscar Bolaños & Justin Edelson | Search All the Things:...
EVOLVE'16 | Enhance | Oscar Bolaños & Justin Edelson | Search All the Things:...
 
Oracle by Muhammad Iqbal
Oracle by Muhammad IqbalOracle by Muhammad Iqbal
Oracle by Muhammad Iqbal
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Your Content, Your Search, Your Decision
Your Content, Your Search, Your DecisionYour Content, Your Search, Your Decision
Your Content, Your Search, Your Decision
 
PostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty databasePostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty database
 
SplunkLive! Analytics with Splunk Enterprise
SplunkLive! Analytics with Splunk EnterpriseSplunkLive! Analytics with Splunk Enterprise
SplunkLive! Analytics with Splunk Enterprise
 
SplunkLive! Data Models 101
SplunkLive! Data Models 101SplunkLive! Data Models 101
SplunkLive! Data Models 101
 
Analytics with splunk - Advanced
Analytics with splunk - AdvancedAnalytics with splunk - Advanced
Analytics with splunk - Advanced
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Tulsa Techfest 2016 : Pragmatic Governace by Scott Mitchell
Tulsa Techfest 2016 : Pragmatic Governace by Scott MitchellTulsa Techfest 2016 : Pragmatic Governace by Scott Mitchell
Tulsa Techfest 2016 : Pragmatic Governace by Scott Mitchell
 
Boolean Search.. A Basic Level for internal KT/Reference material
Boolean Search.. A Basic Level for internal KT/Reference materialBoolean Search.. A Basic Level for internal KT/Reference material
Boolean Search.. A Basic Level for internal KT/Reference material
 

More from Lucian Precup

La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !
Lucian Precup
 

More from Lucian Precup (8)

Enrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolatorEnrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolator
 
Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015
 
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
 
Search, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de rechercheSearch, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de recherche
 
ALM et Agilite : la convergence
ALM et Agilite : la convergenceALM et Agilite : la convergence
ALM et Agilite : la convergence
 
La revue de code : facile !
La revue de code : facile !La revue de code : facile !
La revue de code : facile !
 
La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !
 
Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Search and nosql for information management @nosqlmatters Cologne

  • 1. Search and Nosql for an information management application @LucianPrecup @nosqlmatters #nosql14 2014-04-30
  • 2. whoami • CTO of Adelean • Integrate search, nosql and big data technologies to support ETL, BI, data mining, data processing and data visualization use cases. 2014-04-30 2@LucianPrecup @nosqlmatters #nosql14
  • 3. Poll - How many of you are … • Familiar with Elasticsearch or Solr ? • Using Hibernate or other ORM framework ? Hibernate search ? • Familiar with NoSQL ? • Developing information management applications ? • Familiar with SOA ? Developing web services ? • Familiar with ETL and BI ? 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 3
  • 4. Conventions used in the slides 2014-04-30 4@LucianPrecup @nosqlmatters #nosql14
  • 5. Information management application? • Backed by a relational database • CRUD operations • The user is generally an employee of the company 2014-04-30 5@LucianPrecup @nosqlmatters #nosql14
  • 6. Enterprise information management • ORM Mapping (like Hibernate) • The relational database is the central point 2014-04-30 6@LucianPrecup @nosqlmatters #nosql14
  • 7. SOA and information management • The service layer becomes the central point 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 7
  • 8. Information management, ETL and BI • Batch extraction, transformation and loading during non- peak periods 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 8
  • 9. User interface 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 9 http://www.salesboom.com/images/screenshots/leads-management-software-snapshot.jpg
  • 10. User interface 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 10 http://www.marvelsoft.co.in/images/stories/screenshot.png
  • 11. Adding "Google like" functionality • Operators are using the applications and – Are asking customers for information – Need to be fast – Sometimes make typing errors • Customers – Give information to operators (sometimes over the phone) – Do not know their id (contract id, customer id) by heart – Expect operators to be fast and accurate 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 11
  • 12. Server side result pagination Faceting for additional search criteria Auto-completion (result suggestion) and highlighting "Full text" search Display the total number of results Order all the results on different criteria Display heterogeneous results Advanced search Fuzzy searches and spelling suggestions 2014-04-30 12@LucianPrecup @nosqlmatters #nosql14
  • 13. SELECT … FROM … WHERE … AND ROWNUM > 0 AND ROWNUM <= 10 SELECT CUSTOMER_STATUS, COUNT(*) FROM … GROUP BY CUSTOMER_STATUS SELECT * WHERE FIRST_NAME LIKE …% OR … if (text contains searchTerm) then replace text with <b>text<b> Split searchText into tokens SELECT * WHERE (FIRST_NAME = token1 AND LAST_NAME = token2 AND ZIP_CODE = token3) OR (FIRST_NAME = token2 AND LAST_NAME = token1 AND ZIP_CODE = token3) OR … SELECT COUNT(*) FROM … SELECT * FROM … ORDER BY … SELECT NULL as COMPANY_ID, … FROM PERSON UNION ALL SELECT NULL as BIRTH_DATE, … FROM COMPANY ??? 2014-04-30 13@LucianPrecup @nosqlmatters #nosql14 O.K. Simpler SELECT CUSTOMER_TYPE, COUNT(*) FROM … GROUP BY CUSTOMER_TYPE 1 2 3 4
  • 14. Pagination handled natively ("I am feeling lucky" !) Search facets calculated with the results. Almost no additional cost. Indexing with EdgeNGram  very performing "auto- complete" search Highlighting handled by the search engine "Full text" search available by default Hits count available with the results Faster (cached) sort “Schema free” right ! "Structured" search also available Fast and pertinent fuzzy searches and suggesters 2014-04-30 14@LucianPrecup @nosqlmatters #nosql14
  • 15. A search engine for information management ? • User experience – "Full text" search and "Google like" navigation – Direct access to data – Facetted search and navigation – "out of the box" pagination – Performance  fluidity – Search improved with auto-completion and search suggestions • Additional functionality – Fuzzy search – Phonetic analysis – Spell checking – Synonyms and technical terms handling – Compound words handling • Faster than SQL for – Sorting – Pagination – Facets and aggregations – Filtering 2014-04-30 15@LucianPrecup @nosqlmatters #nosql14
  • 16. Auto-completion with EdgeNGram Input  Indexing Searching Id Nom 1 Céline 2 Celia Ascii folding  Celine, Celia Lowercase  celine, celia EdgeNGram  ce cel celi celin celine ce cel celi celia Key Document id ce 1, 2 cel 1, 2 celi 1, 2 celin 1 celine 1 celia 2 Index  Search term Nom Célin Celin  Ascii folding celin  Lowercase ce cel celi celin  EdgeNGram 2014-04-30 16@LucianPrecup @nosqlmatters #nosql14
  • 17. Auto-completion with EdgeNGram Input  Indexing Searching Id Nom 1 Céline 2 Celia Ascii folding  Celine, Celia Lowercase  celine, celia EdgeNGram  ce cel celi celin celine ce cel celi celia Key Document id ce 1, 2 cel 1, 2 celi 1, 2 celin 1 celine 1 celia 2 Index  Search term Nom Célin Celin  Ascii folding celin  Lowercase 2014-04-30 17@LucianPrecup @nosqlmatters #nosql14
  • 18. Information management and search • Enter "polyglot persistence" • How to synchronize the two databases ? • Triggers ? Hard to implement – Especially for external triggers – And for large transactions (when many tables are concerned) • How to provide seamless integration for the application ? • Luckily we have the Service Layer … 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 18
  • 19. SOA and a search engine • The Service Layer ensures seamless integration for applications • The Service Layer can help with the synchronization : Creates, Updates, Deletes are passing through • But how to deal with legacy applications and batch processes ? 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 19
  • 20. Search integration in enterprise environment • Hooks in the Service Layer to intercept Creates, Updates, Deletes • Batch indexing to catch up with legacy applications • The Search Services Layer ensures seamless integration and security 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 20
  • 21. Search integration in enterprise environment • Hooks into the Services Layer – Send messages asynchronously to Elasticsearch • Real time indexing for everything that passes through the Services Layer – If messages are self contained proceed with indexing, if not use a Read operation on the Service Layer • Batch indexing to catch up with legacy applications – Delta extraction – Full database extraction • Search Services Layer is important : – Constructs Elasticsearch queries and supplies a high level interface, ready to integrate into applications – Acts as a proxy to Elasticsearch, ensuring authentication and authorizations 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 21
  • 22. The world of a search engine is flat Normalized database Elasticsearch document {"film" : { "id" : "183070", "title" : "The Artist", "published" : "2011-10-12", "genre" : ["Romance", "Drama", "Comedy"], "language" : ["English", "French"], "persons" : [ {"person" : { "id" : "5079", "name" : "Michel Hazanavicius", "role" : "director" }}, {"person" : { "id" : "84145", "name" : "Jean Dujardin", "role" : "actor" }}, {"person" : { "id" : "24485", "name" : "Bérénice Bejo", "role" : "actor" }}, {"person" : { "id" : "4204", "name" : "John Goodman", "role" : "actor" }} ] }} 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 22
  • 23. How to create the "flattened" document • Joining data from the previous example: • Solr DataImport Handler : Entities and referencing ${parent.ID} • "Crawls" the database  very bad performance • WM_CONCAT, LISTAGG, GROUP_CONCAT • Nice workaround if your database allows it • Elasticsearch jdbc river • https://github.com/jprante/elasticsearch-river-jdbc/wiki/Structured-Objects • Use your own batch ! • Use your own code to create the document sent to the search engine • If you are using Java, you can integrate SpringBatch for example 2014-04-30 23@LucianPrecup @nosqlmatters #nosql14
  • 24. “Crawling" the database  very bad performance Solr’s DataImportHandler <dataConfig> <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" /> <document name="products"> <entity name="item" query="select * from item"> <field column="ID" name="id" /> <field column="NAME" name="name" /> <field column="MANU" name="manu" /> <field column="WEIGHT" name="weight" /> <field column="PRICE" name="price" /> <field column="POPULARITY" name="popularity" /> <field column="INSTOCK" name="inStock" /> <field column="INCLUDES" name="includes" /> <entity name="feature" query="select description from feature where item_id='${item.ID}'"> <field name="features" column="description" /> </entity> <entity name="item_category" query="select CATEGORY_ID from item_category where item_id='${item.ID}'"> <entity name="category" query="select description from category where id = '${item_category.CATEGORY_ID}'"> <field column="description" name="cat" /> </entity> </entity> </entity> </document> </dataConfig> 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 24 “Crawling" the database  very bad performance
  • 25. Solr’s DataImportHandler 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 25 http://wiki.apache.org/solr/DataImportHandler OK for HTML pages but KO for db relations
  • 26. The issue with joins :-) • Let’s say you have two relational entities: Persons and Contracts – A Person has zero, one or more Contracts – A Contract is attached to one or more Persons (eg. the Subscriber, the Grantee, …) • Need a search services : – S1: getPersonsDetailsByContractProperties – S2: getContractsDetailsByPersonProperties • Simple solution with SQL: SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A‘ SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A' 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 26
  • 27. The issue with joins - solutions • Solution 1 – Index Persons with Contracts together for S1 {"person" : { "details" : …, … , "contracts" : ["contract" :{"id" : 1, …}, …] }} – Index Contracts with Persons together for S2 {"contract" : { "details" : …, …, "persons" : ["person" :{"id" : 1, "role" : "S", …}, …]}} • Issues with solution 1: – A lot of data duplication – Have to get Contracts when indexing Persons and vice-versa • Solution 2 – Elasticsearch’s Parent/Child • Issues with solution 2: – Works in one way but not the other (only one parent for n children, a 1 to n relationship) • Solution 3 – Index Persons and Contracts separately – Launch two Elasticsearch queries to get the response – For S1 : First get all Contract ids by Contract properties, then get Persons by Contract ids (terms query) – For S2 : First get all Persons ids by Person properties, then get Contracts by Person ids (terms query) – The response to the second query can be returned “as is” to the client (pagination, etc.) 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 27
  • 28. SQL : 1, Elasticsearch : 2 – another example • Let’s say you have a database containing lines of monthly payments by customer companies • Let’s say you need to display company information together with the aggregated income for each company • With SQL : SELECT Company.id, Company.A, Company.B SUM(Payments.amount) FROM Company, Payments WHERE Company.id=Payments.cid GROUP BY Company.id, Company.A, Company.B • With Elasticsearch : Launch "facets" : { "Total" : { "terms_stats" : { "key_field" : "Payments.cid", "value_field" : "Payments.amount" } } } then GET /index/Company/_mget { "ids" : ["cid1", "cid2", …] } Or, you could use the relational database itself if it is fast enough (IN query) 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 28
  • 29. Other tips and tricks • Define your own mapping for each field (even if you are tempted by the “schema free”) – For example: you might not want to use regular stop words for fields representing person names • Beware of idf – Values that are rare are boosted by default. This can lead to some “unexpected” behavior. The query {"bool": { "should": [ {"text": {"name": "john"}}, {"text": {"place": "sunnyvale"}} ] }} 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 29 Name Place John Doe Sunnyvale Alice O’Malley Sunnyvale John Dean John Palo Alto Could result in
  • 30. More tips and tricks • Use index alias when launching the full batch indexing to keep the service running 24/7 • When searching structured information you may have many identical scores. To keep the order and pagination deterministic, use a sort by default. For example first by _score, then by id. • For user defined sorting (there are many criteria in this kind of use cases : name, user_id, zipcode, location, etc.) be sure to have enough memory. For example sorting on a 34 million records database by user_id (unique, of size 8 bytes) you must count about 260 MB of fielddata cache. 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 30
  • 31. Search and BI • Elasticsearch is already in sync with the main database. • Kibana is capable of displaying useful information about the data. • Why not having some BI dashboards in real time ? 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 31
  • 32. Example of Kibana dashboard -- http://www.elasticsearch.org/overview/kibana/ 2014-04-30 32@LucianPrecup @nosqlmatters #nosql14
  • 33. Datawarehouse • Elasticsearch is conceived for Big Data so using it as a Data Warehouse makes sense 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 33
  • 34. NoSQL, Search and RDBMS Real time synchronization Traditional BI alternative Facets (simple or complex) Better performance, better user experience, more functionality Auto-complétion, suggestions, fuzzy searches, … Direct access to pertinent data 2014-04-30 34 @LucianPrecup @nosqlmatters #nosql14
  • 35. Elasticsearch integration – global view 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 35
  • 36. Why not NoRDBMS ? • Legacy applications • The need for transactions and consistency • The need for data normalization • The need for structure • Polyglot persistence is better • NoSQL databases are lacking features for now (see keynote from @ted_dunning) 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 36

Editor's Notes

  1. http://www.salesboom.com/images/screenshots/leads-management-software-snapshot.jpg
  2. http://www.marvelsoft.co.in/images/stories/screenshot.png
  3. Recherche rapide « full text » Input: une zone de texte « à la Google » (un texte représentant un nom, un prénom, un identifiant quelconque, un numéro de téléphone, une adresse e-mail, etc.) Output: Résultats, surbrillance et facettes. La liste de résultats n’est pas forcement homogène, chaque résultat pouvant être affiché sous forme de « mini-fiche » L’utilisation des facettes lors de l’affichage du résultat rendra la recherche multicritères optionnelle. Recherche multicritères (recherche avancée) Input: termes de recherche par champ (nom, prénom, id, ville, code postal) Output: liste (homogène) avec résultats de la recherche Auto-complétion (suggestion des résultats) Input: toute ou partie d’un terme recherché localisé à un champ de texte à remplir Output: liste déroulante avec suggestions du terme recherché et surbrillance Facettes Groupement des résultats par catégorie en fonction de la valeur d’un champs (ex. Type Client, Profil client, Sexe) Calculées et remontées en même temps que le résultat de recherche Recherche approximative Sources des erreurs de saisie: fautes de frappe, éléments mal compris par téléphone (phonétique), noms saisis partiellement, noms composés, caractères accentués Suggestions « voulez-vous dire … » Termes se rapprochant des termes initialement cherchés et pouvant remonter potentiellement plus de résultats. Pagination Le moteur gère la pagination Chaque requête précise, en plus des critères de recherche, un indice de départ et une taille de la page La première page est, en général, remontée le plus rapidement La réponse contient le nombre total de résultats, permettant à l’IHM de proposer les liens vers toutes les pages suivantes Tri Le tri par défaut est le tri par pertinence moteur D’autres tris peuvent être demandés (alphabétique par nom, par date de naissance, etc.). Dans ce cas, le tri se fait sur l’ensemble de résultats et pas seulement sur la page en cours.
  4. Recherche rapide « full text » Input: une zone de texte « à la Google » (un texte représentant un nom, un prénom, un identifiant quelconque, un numéro de téléphone, une adresse e-mail, etc.) Output: Résultats, surbrillance et facettes. La liste de résultats n’est pas forcement homogène, chaque résultat pouvant être affiché sous forme de « mini-fiche » L’utilisation des facettes lors de l’affichage du résultat rendra la recherche multicritères optionnelle. Recherche multicritères (recherche avancée) Input: termes de recherche par champ (nom, prénom, id, ville, code postal) Output: liste (homogène) avec résultats de la recherche Auto-complétion (suggestion des résultats) Input: toute ou partie d’un terme recherché localisé à un champ de texte à remplir Output: liste déroulante avec suggestions du terme recherché et surbrillance Facettes Groupement des résultats par catégorie en fonction de la valeur d’un champs (ex. Type Client, Profil client, Sexe) Calculées et remontées en même temps que le résultat de recherche Recherche approximative Sources des erreurs de saisie: fautes de frappe, éléments mal compris par téléphone (phonétique), noms saisis partiellement, noms composés, caractères accentués Suggestions « voulez-vous dire … » Termes se rapprochant des termes initialement cherchés et pouvant remonter potentiellement plus de résultats. Pagination Le moteur gère la pagination Chaque requête précise, en plus des critères de recherche, un indice de départ et une taille de la page La première page est, en général, remontée le plus rapidement La réponse contient le nombre total de résultats, permettant à l’IHM de proposer les liens vers toutes les pages suivantes Tri Le tri par défaut est le tri par pertinence moteur D’autres tris peuvent être demandés (alphabétique par nom, par date de naissance, etc.). Dans ce cas, le tri se fait sur l’ensemble de résultats et pas seulement sur la page en cours.
  5. Recherche rapide « full text » Input: une zone de texte « à la Google » (un texte représentant un nom, un prénom, un identifiant quelconque, un numéro de téléphone, une adresse e-mail, etc.) Output: Résultats, surbrillance et facettes. La liste de résultats n’est pas forcement homogène, chaque résultat pouvant être affiché sous forme de « mini-fiche » L’utilisation des facettes lors de l’affichage du résultat rendra la recherche multicritères optionnelle. Recherche multicritères (recherche avancée) Input: termes de recherche par champ (nom, prénom, id, ville, code postal) Output: liste (homogène) avec résultats de la recherche Auto-complétion (suggestion des résultats) Input: toute ou partie d’un terme recherché localisé à un champ de texte à remplir Output: liste déroulante avec suggestions du terme recherché et surbrillance Facettes Groupement des résultats par catégorie en fonction de la valeur d’un champs (ex. Type Client, Profil client, Sexe) Calculées et remontées en même temps que le résultat de recherche Recherche approximative Sources des erreurs de saisie: fautes de frappe, éléments mal compris par téléphone (phonétique), noms saisis partiellement, noms composés, caractères accentués Suggestions « voulez-vous dire … » Termes se rapprochant des termes initialement cherchés et pouvant remonter potentiellement plus de résultats. Pagination Le moteur gère la pagination Chaque requête précise, en plus des critères de recherche, un indice de départ et une taille de la page La première page est, en général, remontée le plus rapidement La réponse contient le nombre total de résultats, permettant à l’IHM de proposer les liens vers toutes les pages suivantes Tri Le tri par défaut est le tri par pertinence moteur D’autres tris peuvent être demandés (alphabétique par nom, par date de naissance, etc.). Dans ce cas, le tri se fait sur l’ensemble de résultats et pas seulement sur la page en cours.