Database Indexing Framework (Version 1.0)
Incremental Indexing Process (the need for Database Views)

The following slides describe an incremental indexing approach that we think fits our requirements. In this approach, the data to be indexed is exposed through Database Views, and indexing runs as a Batch Process rather than in real time.

First, why Database Views? When a user searches the index, the results page shows some details and a summary for each result. For results to appear instantly, these details need to be stored in the index itself, so that we do not have to hit the database just to display collated results on the results page.

It therefore makes little sense to index every table individually when creating the Solr index, because each table has its own dependencies on child and parent tables. We would either have to recreate those dependencies in the index or design the index intelligently around the search needs. The latter involves creating appropriate joins across tables to fetch all the data relevant to a search result in one shot.

A database view can do this job: it collates data from the parent and child tables in a representation that exactly matches the requirements of the search index. This keeps the application layer simple. It just picks everything from the view and indexes it as is.
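As a minimal sketch of the idea, the snippet below builds a view that joins a child table to its parent so that each row already holds everything a search result needs. The table names, columns, and sample rows (`customers`, `orders`, `order_search_view`) are hypothetical and not part of the framework; SQLite is used only so the example runs on its own.

```python
import sqlite3

# In-memory database with a hypothetical parent table and child table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT,
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'Keyboard', 49.5), (11, 2, 'Monitor', 180.0);

    -- One row per search result, with the parent data already joined in.
    CREATE VIEW order_search_view AS
    SELECT o.id AS order_id, o.item AS item, o.total AS total,
           c.name AS customer_name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
""")

# The application layer only reads the view; it knows nothing about the joins.
cursor = conn.execute("SELECT * FROM order_search_view ORDER BY order_id")
columns = [description[0] for description in cursor.description]
rows = [dict(zip(columns, values)) for values in cursor.fetchall()]

for row in rows:
    print(row)
```

If the search requirements change, only the view definition needs to change; the code that reads the view stays the same.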
Incremental Indexing Process (the need for Batch Indexing)

Next, why can a Batch Indexing process work well for us? Most of our search requirements involve historic data. Cases where we search for data that was entered only moments ago should be rare, and even these can be handled by setting the Batch Process interval to a very small value.

Real-time indexing can become expensive when a large amount of data is entered at short intervals. The batch process also gives us the flexibility of working on a copy of the database, so that the whole indexing process runs offline.
Incremental Indexing Batch Process (the flow)

[Diagram: Indexing Job Scheduler, Database Indexer (the controller class), Data Fetcher, Database, Result Set to XML Converter, Index Manager, and SOLR, connected by the numbered messages listed below. Components in green are explained in detail in the next slide.]

Numbered messages in the diagram:
(1) Indexing Job Name
(2) Database View Name
(3) Query
(4) Result Set
(5) Result Set
(6) Solr XML
(7) Solr XML
(8) Solr XML
(9) Solr XML

Indexing Job - Trigger Config file (Indexing Job Schedules)
- Trigger Time 1 - Indexing Job 1
- Trigger Time 2 - Indexing Job 2
- Trigger Time 3 - Indexing Job 3

Indexing Job – Database View Mapping file
More than one DB view might need to be indexed at the same time, so these can be grouped as an Indexing Job.
- Indexing Job 1 – Database View1, Database View2, Database View3, Database View4
- Indexing Job 2 – Database View5, Database View6

DB View Column name to Solr field mapping
- Database View 1
  - Column 1 - Solr Field 1
  - Column 2 - Solr Field 2
  - Column 3 - Solr Field 3
- Database View 2
  - Column 1 - Solr Field 3
  - Column 2 - Solr Field 2
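The three configuration files above could be modelled roughly as follows. This is only a sketch: the slides do not specify a file format, so plain Python dictionaries stand in for the files, and the trigger values (`start_time`, `interval_minutes`) as well as the helper names are assumptions made for illustration.

```python
# Indexing Job - Trigger Config: when each job runs (values are made up).
JOB_TRIGGERS = {
    "indexing_job_1": {"start_time": "01:00", "interval_minutes": 60},
    "indexing_job_2": {"start_time": "01:30", "interval_minutes": 1440},
}

# Indexing Job - Database View Mapping: which views each job indexes.
JOB_VIEWS = {
    "indexing_job_1": ["database_view_1", "database_view_2",
                       "database_view_3", "database_view_4"],
    "indexing_job_2": ["database_view_5", "database_view_6"],
}

# DB View Column name to Solr field mapping, one entry per view.
FIELD_MAPPINGS = {
    "database_view_1": {"column_1": "solr_field_1",
                        "column_2": "solr_field_2",
                        "column_3": "solr_field_3"},
    "database_view_2": {"column_1": "solr_field_3",
                        "column_2": "solr_field_2"},
}


def views_for_job(job_name):
    """Return the database views that belong to the given indexing job."""
    return JOB_VIEWS[job_name]


def to_solr_fields(view_name, row):
    """Rename a row's columns to Solr field names, dropping unmapped columns."""
    mapping = FIELD_MAPPINGS[view_name]
    return {mapping[column]: value
            for column, value in row.items() if column in mapping}


print(views_for_job("indexing_job_2"))
print(to_solr_fields("database_view_2", {"column_1": "a", "column_2": "b"}))
```

Note that two views may map different columns to the same Solr field, as Database View 1 and Database View 2 do with Solr Field 3 in the slide.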
Incremental Indexing Batch Process (the components)

An Indexing Job is defined as the indexing of the set of Database Views that need to be indexed at the same time and at the same interval.

Triggers hold the time information: the start time, the time interval, and other time-related details. When an Indexing Job is associated with a trigger, the job runs according to the start time and interval specified in that trigger.

The Indexing Job - Trigger Config file holds all Indexing Job Schedules. It maps triggers to indexing jobs.

The Indexing Job – Database View Mapping file defines the Indexing Jobs. It associates Database Views with each Indexing Job. If one database view, such as the one for the messages module, needs to be picked up at a shorter interval than another, such as the one for the shopping module, the two views belong to different indexing jobs with different Triggers.

The Database Indexer acts as the controller of the database indexing process. It calls the Data Fetcher to get database records in XML format and sends them to the Index Manager, which posts them to Solr.

The Data Fetcher communicates with the database to get all the new and updated records for a given database view, along with the records that have been marked for deletion. It then feeds this data to the Result Set to XML Converter to have it converted to the XML format that Solr recognizes.

The Result Set to XML Converter is a utility class that converts database records to XML. A new or updated record is put in the <add> tag. A record marked for deletion is put in the <delete> tag. The converter picks up the Solr field names corresponding to the DB view column names from the DB View Column name to Solr field mapping file.
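The converter's behaviour can be sketched as below, using Solr's XML update message format (`<add><doc><field name="...">` for documents and `<delete><id>` for deletions by unique key). The function name, the `is_deleted` flag column, and the sample records are assumptions for illustration; the slides do not say how a record is marked for deletion.

```python
import xml.etree.ElementTree as ET


def records_to_solr_xml(records, field_mapping,
                        id_column="id", deleted_column="is_deleted"):
    """Convert records (dicts) to Solr update messages.

    New or updated records go into <add>; records flagged for deletion
    go into <delete>. Returns a list of XML strings, one per message.
    """
    add = ET.Element("add")
    delete = ET.Element("delete")
    for record in records:
        if record.get(deleted_column):
            # Delete by the document's unique key value.
            ET.SubElement(delete, "id").text = str(record[id_column])
            continue
        doc = ET.SubElement(add, "doc")
        for column, solr_field in field_mapping.items():
            if record.get(column) is not None:
                field = ET.SubElement(doc, "field", name=solr_field)
                field.text = str(record[column])
    messages = []
    if len(add):      # at least one <doc> was added
        messages.append(ET.tostring(add, encoding="unicode"))
    if len(delete):   # at least one <id> was added
        messages.append(ET.tostring(delete, encoding="unicode"))
    return messages


# Hypothetical column-to-field mapping and records.
mapping = {"id": "id", "title": "title_t"}
records = [
    {"id": 1, "title": "Hello", "is_deleted": False},
    {"id": 2, "title": "Old", "is_deleted": True},
]
for message in records_to_solr_xml(records, mapping):
    print(message)
# <add><doc><field name="id">1</field><field name="title_t">Hello</field></doc></add>
# <delete><id>2</id></delete>
```

Building the XML with an XML library rather than string concatenation means special characters in the data, such as `&` or `<`, are escaped automatically.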
Incremental Indexing Batch Process (the flow)

The indexing process is started by the Indexing Job Scheduler. The scheduler triggers an indexing job according to the settings of the trigger that the job is associated with in the Indexing Job - Trigger Config file.

The Indexing Job Scheduler calls the Database Indexer, passing the name of the job to be done as an argument.

The Database Indexer acts as the controller for the whole process. From the Indexing Job – Database View Mapping file, it picks up the names of the Database Views to be indexed for the Indexing Job sent by the Indexing Job Scheduler.

The Database Indexer loops over the set of Database Views and calls the Data Fetcher for each view.

The Data Fetcher queries the database to get all the latest records from the view. The result set is sent to the Result Set to XML Converter, which returns the Solr XML.

This Solr XML is sent back to the Database Indexer, which in turn sends it to the Index Manager for posting to Solr.
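The control flow described above can be sketched as a small controller class. This is an illustration under stated assumptions, not the framework's actual code: the constructor signature and `run_job` method are hypothetical, and the Data Fetcher and Index Manager are replaced by stand-in callables so that the example runs without a database or a Solr server.

```python
class DatabaseIndexer:
    """Controller: resolves a job to its views and passes Solr XML along."""

    def __init__(self, job_views, data_fetcher, index_manager):
        self.job_views = job_views          # job name -> list of view names
        self.data_fetcher = data_fetcher    # callable: view name -> Solr XML or None
        self.index_manager = index_manager  # callable: Solr XML -> None

    def run_job(self, job_name):
        """Index every view of the job; return the views that produced XML."""
        posted_views = []
        for view_name in self.job_views[job_name]:
            solr_xml = self.data_fetcher(view_name)
            if not solr_xml:
                continue  # the fetcher found nothing to index for this view
            self.index_manager(solr_xml)
            posted_views.append(view_name)
        return posted_views


def fake_data_fetcher(view_name):
    """Stand-in for the Data Fetcher plus converter: canned XML per view."""
    pending = {
        "messages_view": '<add><doc><field name="id">1</field></doc></add>',
    }
    return pending.get(view_name)


# Stand-in for the Index Manager: collect the XML instead of posting to Solr.
posted = []

indexer = DatabaseIndexer(
    job_views={"indexing_job_1": ["messages_view", "shopping_view"]},
    data_fetcher=fake_data_fetcher,
    index_manager=posted.append,
)
# The Indexing Job Scheduler would make this call at each trigger time.
result = indexer.run_job("indexing_job_1")
print(result)
print(posted)
```

Passing the fetcher and the index manager in as arguments keeps the controller free of database and HTTP details, which also makes it easy to test in isolation.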
[Diagram, repeated from the flow slide: Indexing Job Scheduler, Database Indexer, Database, and SOLR Index Manager, together with the Job - Trigger Config file (Indexing Job Schedules), the Indexing Job to Database Views mapping file, and the DB View Column name to Solr field mapping. Numbered messages: (1) Indexing Job Name, (2) Database View Name, (3) View Query, (4) Result Set, (5) Result Set, (6) Solr XML, (7) Solr XML, (8) Solr XML, (9) Solr XML.]

The Indexing Job Scheduler triggers the Database Indexer with an Indexing Job based on the trigger times in the Job - Trigger Config file.

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 

Database indexing framework

  • 1. Database Indexing Framework ( Version 1.0 )
  • 5. Incremental Indexing Process (the need for Database Views): The following slides discuss an incremental indexing approach that we thought would work well for our requirements. In this approach, the data relevant to the search index is collated using Database Views, and indexing runs as a Batch Process rather than in real time. First, we need to understand why Database Views are needed. When a term is searched for in the index, the results page shows some details and a summary of each result. For instant results, these details need to be stored in the index itself so that we don't have to hit the database just to display collated results on the results page. It therefore doesn't make much sense to index all the tables individually when creating the Solr index, because each table has its own dependencies on child and parent tables. We would either have to recreate similar dependencies in the index or design our indexes around the search needs. The latter involves creating appropriate joins across tables to fetch all the data relevant to a search result in one shot. A database view can do this job of collating data from the parent and child tables in a representation that exactly matches the requirements of the search index. This keeps the application layer simple: it just picks everything from the view and indexes it as is.
  • 6. Incremental Indexing Process (the need for Batch indexing): Next, we need to understand why the Batch Indexing process can work well for us. Most of our search requirements involve searching historic data. Cases where we search for data that has only just been entered should be rare, and even these can be handled by setting the Batch Process interval to a very short time. Real-time indexing can become expensive when a large amount of data is entered at short intervals. The batch process also gives us the flexibility of working on a copy of the database, which makes the whole indexing process an offline one.
  • 7. Incremental Indexing Batch Process (the flow). Components in green are explained in detail in the next slide.
      Components: Indexing Job Scheduler, Database Indexer (the controller class), Data Fetcher, Database, Result Set to XML Converter, Index Manager, SOLR.
      Labeled steps in the diagram: (1) Indexing Job Name, (2) Database View Name, (3) Query, (4) Result Set, (5) Result Set, (6) Solr XML, (7) Solr XML, (8) Solr XML, (9) Solr XML.
      Indexing Job - Trigger Config file (Indexing Job Schedules):
        Trigger Time 1 - Indexing Job 1
        Trigger Time 2 - Indexing Job 2
        Trigger Time 3 - Indexing Job 3
      Indexing Job – Database View Mapping file. More than one DB view might need to be indexed at the same time, so these can be grouped as an Indexing Job:
        Indexing Job 1 – Database View1, Database View2, Database View3, Database View4
        Indexing Job 2 – Database View5, Database View6
      DB View Column name to Solr field mapping:
        Database View 1: Column 1 - Solr Field 1, Column 2 - Solr Field 2, Column 3 - Solr Field 3
        Database View 2: Column 1 - Solr Field 3, Column 2 - Solr Field 2
  • 8. Incremental Indexing Batch Process (the components): An Indexing Job is the set of Database Views that need to be indexed at the same time and at the same interval. A Trigger holds the timing information: the start time, the time interval and other such time-related details. When an Indexing Job is associated with a Trigger, the job runs according to the start time and interval given in the Trigger. The Indexing Job - Trigger Config file holds all Indexing Job schedules; it maps Triggers to Indexing Jobs. The Indexing Job – Database View Mapping file defines the Indexing Jobs; it associates Database Views with each Indexing Job. If a database view, such as the one for the messages module, needs to be picked up at a shorter interval than the one for the shopping module, the two views belong to different Indexing Jobs with different Triggers. The Database Indexer acts as the controller of the database indexing process. It calls the Data Fetcher to get database records in XML format and sends them to the Index Manager, which posts them to Solr. The Data Fetcher queries the database for all new and updated records of a given database view, along with the records that have been marked for deletion. It then feeds this data to the Result Set to XML Converter to get it converted to the XML format that Solr recognizes. The Result Set to XML Converter is a utility class that converts database records to XML. A new or updated record is put in an <add> tag; a record marked for deletion is put in a <delete> tag. The converter looks up the Solr field name for each DB View column name in the DB View Column name to Solr field mapping file.
  • 9. Incremental Indexing Batch Process (the flow): The indexing process is triggered by the Indexing Job Scheduler. An Indexing Job is fired according to the settings of the Trigger with which it is associated in the Indexing Job - Trigger Config file. The Indexing Job Scheduler calls the Database Indexer, passing the name of the job to be done as an argument. The Database Indexer acts as the controller for the whole process. From the Indexing Job – Database View Mapping file, it picks up the names of the Database Views to be indexed for the Indexing Job sent by the Indexing Job Scheduler. The Database Indexer loops over this set of Database Views and calls the Data Fetcher for each view. The Data Fetcher queries the database to get all the latest records from the view. The result set is sent to the Result Set to XML Converter, which returns the Solr XML. This Solr XML is sent back to the Database Indexer, which in turn sends it to the Index Manager for posting to Solr.
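
The Result Set to XML Converter described in slide 8 could be sketched roughly as follows. This is a minimal illustration in Python, not the framework's actual class. The mapping `MESSAGE_VIEW_FIELDS`, the function name `rows_to_solr_xml`, and the `is_deleted` marker column are all assumptions made for the example, since the slides do not say how deletions are flagged. Rows are assumed to arrive as dictionaries keyed by view column name.

```python
import xml.etree.ElementTree as ET

# Hypothetical "DB View Column name to Solr field mapping" for one view.
MESSAGE_VIEW_FIELDS = {
    "message_id": "id",
    "subject": "title",
    "body": "text",
}


def rows_to_solr_xml(rows, column_to_field, id_column, deleted_column="is_deleted"):
    """Convert view rows to Solr update XML.

    Returns (add_xml, delete_xml); either is None when there is nothing to send.
    The deleted_column flag is an assumed convention, not taken from the slides.
    """
    add = ET.Element("add")
    delete = ET.Element("delete")
    for row in rows:
        if row.get(deleted_column):
            # Records marked for deletion go into the <delete> message by id.
            ET.SubElement(delete, "id").text = str(row[id_column])
            continue
        # New or updated records become one <doc> inside the <add> message.
        doc = ET.SubElement(add, "doc")
        for column, field in column_to_field.items():
            value = row.get(column)
            if value is None:
                continue  # skip NULL columns rather than indexing "None"
            ET.SubElement(doc, "field", name=field).text = str(value)
    add_xml = ET.tostring(add, encoding="unicode") if len(add) else None
    delete_xml = ET.tostring(delete, encoding="unicode") if len(delete) else None
    return add_xml, delete_xml
```

Building the XML with `ElementTree` instead of string concatenation means characters such as `&` and `<` in the data are escaped automatically. The two messages are returned separately because Solr's XML update format treats `<add>` and `<delete>` as distinct commands.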
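
The flow in slide 9, together with the database view idea from slide 5, can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database so that it runs anywhere. The tables, the view `message_search_view`, the job name `messages_job`, the `updated_at` column and the `since` timestamp are all hypothetical: the slides say only that the Data Fetcher gets the new and updated records, not how changes are detected. The converter and the Index Manager are passed in as plain callables (`to_solr_xml`, `post_to_solr`) so that the controller loop stays visible.

```python
import sqlite3

# Hypothetical "Indexing Job - Database View Mapping": job name -> view names.
JOB_VIEWS = {
    "messages_job": ["message_search_view"],
}
KNOWN_VIEWS = {view for views in JOB_VIEWS.values() for view in views}


def build_demo_database():
    """Create an in-memory database with a parent table, a child table and a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE messages (
            message_id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(user_id),
            subject TEXT,
            updated_at TEXT
        );
        INSERT INTO users VALUES (1, 'Asha');
        INSERT INTO messages VALUES (10, 1, 'Old message', '2024-01-01 00:00:00');
        INSERT INTO messages VALUES (11, 1, 'New message', '2024-02-01 00:00:00');
        -- The view collates everything a search result needs into one row.
        CREATE VIEW message_search_view AS
            SELECT m.message_id, m.subject, u.name AS sender, m.updated_at
            FROM messages AS m JOIN users AS u ON u.user_id = m.user_id;
        """
    )
    return conn


def fetch_changed_rows(conn, view_name, since):
    """Data Fetcher: return rows of one view changed after the previous run."""
    # Identifiers cannot be bound as SQL parameters, so only configured views are allowed.
    if view_name not in KNOWN_VIEWS:
        raise ValueError(f"unknown view: {view_name}")
    cursor = conn.execute(
        f"SELECT * FROM {view_name} WHERE updated_at > ?", (since,)
    )
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def run_indexing_job(job_name, conn, since, to_solr_xml, post_to_solr):
    """Database Indexer: loop over the job's views, convert and post each batch."""
    indexed = 0
    for view_name in JOB_VIEWS[job_name]:
        rows = fetch_changed_rows(conn, view_name, since)
        if rows:
            post_to_solr(to_solr_xml(rows))
            indexed += len(rows)
    return indexed
```

A scheduler would call `run_indexing_job` with the job name at each trigger time and record the time of the run to use as `since` on the next one. Comparing timestamps as text works here only because they are stored in a sortable `YYYY-MM-DD HH:MM:SS` format.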