2. WHAT IS SCHEMA?
A mapping between each field name and its type.
SOLR needs a schema to define what type each field
belongs to. Internally it maps these types to Lucene
Types.
FieldName    title     price    id
FieldType    String    Float    String
3. SCHEMA LESS SOLR?
SOLR can’t function properly without a schema.
So, it comes with a schemaless mode that builds a
schema in the background as you index.
However, it has its own problems…
4. Indexing Documents in Schemaless Mode
Doc 1:
{"title": "Fantastic Beasts", "price": 200, "distID": "2017-01-07"}
Doc 2:
{"title": "Train Your Dragon", "price": 22.3, "distID": "112-uuiw-0"}
Types inferred from Doc 1:
FieldName       title     price    distID
TypeInferred    String    Long     Date
Doc 2 then fails with errors:
Float not supported for fieldType Long!!
String not supported for fieldType Date!!
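The failure above can be reproduced with a small sketch. This is not Solr's code: it is a hypothetical Python model that assumes each field's type is guessed from the first value seen and then locked. The names `guess_type`, `accepts`, and `index_stream` are invented for illustration, and the date check assumes a simple `YYYY-MM-DD` pattern.

```python
from datetime import datetime

def guess_type(value):
    """Guess a type name from one value (illustrative rules, not Solr's)."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Long"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # assumed date pattern
            return "Date"
        except ValueError:
            return "String"
    return "String"

def accepts(field_type, value):
    """Return True if the value fits a type that has already been locked."""
    if field_type == "String":
        return isinstance(value, str)
    return guess_type(value) == field_type

def index_stream(docs):
    """Lock each field's type on first sight, then collect type errors."""
    schema, errors = {}, []
    for doc in docs:
        for name, value in doc.items():
            if name not in schema:
                schema[name] = guess_type(value)
            elif not accepts(schema[name], value):
                errors.append(
                    f"{guess_type(value)} not supported for fieldType {schema[name]}"
                )
    return schema, errors

docs = [
    {"title": "Fantastic Beasts", "price": 200, "distID": "2017-01-07"},
    {"title": "Train Your Dragon", "price": 22.3, "distID": "112-uuiw-0"},
]
schema, errors = index_stream(docs)
print(schema)
print(errors)
```

The first document alone decides the schema, so a perfectly valid second document is rejected. Seeing more of the stream before committing to a type avoids this.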
7. WHAT CAN IT DO?
Learn from the document stream and suggest the
following for every field:
  The most suitable FieldType
  SingleValued or MultiValued
Point out possible ‘type anomalies’ in a document
stream.
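One way to picture this kind of learning is the sketch below. It is a hypothetical Python model, not the tool's implementation: it counts the value types observed per field, picks the narrowest type that fits them all (the widening rules are an assumption), flags fields that ever carried a list as multi-valued, and reports fields with mixed types as anomalies. The third document and its `genre` field are made up to show the multi-valued case.

```python
from collections import Counter, defaultdict
from datetime import datetime

def observe(value):
    """Name the type of a single value (illustrative rules only)."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Long"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # assumed date pattern
            return "Date"
        except ValueError:
            return "String"
    return "String"

def train(docs):
    """Count observed types per field and note which fields held lists."""
    seen = defaultdict(Counter)
    multi = defaultdict(bool)
    for doc in docs:
        for name, value in doc.items():
            is_list = isinstance(value, list)
            multi[name] = multi[name] or is_list
            for item in (value if is_list else [value]):
                seen[name][observe(item)] += 1
    return seen, multi

def suggest(seen, multi):
    """Pick one type per field: exact match, numeric widening, else String."""
    result = {}
    for name, counts in seen.items():
        types = set(counts)
        if len(types) == 1:
            best = next(iter(types))
        elif types <= {"Long", "Float"}:
            best = "Float"
        else:
            best = "String"
        result[name] = {"type": best, "multiValued": multi[name]}
    return result

def anomalies(seen):
    """Fields whose values did not all share one type, with the counts."""
    return {name: dict(c) for name, c in seen.items() if len(c) > 1}

stream = [
    {"title": "Fantastic Beasts", "price": 200, "distID": "2017-01-07"},
    {"title": "Train Your Dragon", "price": 22.3, "distID": "112-uuiw-0"},
    # Hypothetical document, added only to show a multi-valued field.
    {"title": "Hypothetical Doc", "price": 15, "genre": ["fantasy", "family"]},
]
seen, multi = train(stream)
suggested = suggest(seen, multi)
mixed = anomalies(seen)
print(suggested)
print(mixed)
```

With these rules, `price` widens to Float and `distID` falls back to String, and both show up in the anomaly report so a person can decide whether the outliers are bad data.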
8. WHO NEEDS IT?
Multi Tenant Search Platforms
Indexing Documents from Multiple Sources
Getting an Idea of Your Data
Getting started with SOLR
9. SCHEMA-TRAINING APIs
1. Get A Training ID:
POST: /schema/train/start
Response: <NewTrainingID>
2. Start Training:
POST: /schema/train/<trainingID> -d [{f1:v1, f2:v2…}]
3. Get The Schema Trained So Far:
GET: /schema/train/<trainingID>/trainedSchema
Response: {Generated Schema}
4. Stop The Training:
DELETE: /schema/train/<trainingID>
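The four calls above could be driven from a client along these lines. This is a minimal sketch that only builds the HTTP requests. The paths and methods come from the list above, while the base URL, the JSON content type, and the helper names are assumptions. Response parsing is left out because the response format is not specified here.

```python
import json
from urllib.request import Request

# Assumption: host, port, and core name are placeholders, not from the slides.
BASE = "http://localhost:8983/solr/demo"

def start_request(base=BASE):
    """1. Ask for a new training ID."""
    return Request(f"{base}/schema/train/start", data=b"", method="POST")

def train_request(training_id, docs, base=BASE):
    """2. Send a batch of documents to an ongoing training session."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base}/schema/train/{training_id}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )

def trained_schema_request(training_id, base=BASE):
    """3. Fetch the schema trained so far."""
    return Request(f"{base}/schema/train/{training_id}/trainedSchema", method="GET")

def stop_request(training_id, base=BASE):
    """4. Stop the training session."""
    return Request(f"{base}/schema/train/{training_id}", method="DELETE")

batch = [{"title": "Fantastic Beasts", "price": 200}]
req = train_request("abc123", batch)  # "abc123" is a made-up training ID
print(req.get_method(), req.full_url)
```

Each returned `Request` can be passed to `urllib.request.urlopen` against a server that exposes these endpoints.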
12. TO DO…
Replay the data internally to index it.
Learn from the queries and suggest:
Most suitable field types (string field vs. text field)
DocValues: true vs. false
Stored: true vs. false
Default search fields (qf)