Carlos Oliveira / May 31, 2012
Agenda
 Oracle Text Overview
    Introduction
    Oracle Text Overview
    Types of Index
    Text Query Application
    Document Presentation and Highlighting
    Document Samples
    Oracle Text Indexing Process
    Indexing Classes
    Examples
    Contains Operators
    POC
    Training & Reference
    Questions
Introduction
I am a forward-looking Information Systems Architect with a
solid Oracle DBA background comprising the daily
infrastructure tasks of the DBA, several projects as a Data
Modeler, and performance management projects.

I started in the mainframe business and soon took a deep dive
into application development for Oracle databases. After
acquiring an Oracle certification, I worked on performance
enhancement for applications using Oracle databases, then
spent several years as an infrastructure DBA. Later I worked
on data modeling projects and, more recently, on a performance
management project covering both the application and database layers.
“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein
What is Oracle Text
•A database option that extends text indexing
•It is a free option for Oracle DB (EE, SE, and PE)
•Has cataloging, referencing and classification features
•Handles tagged documents, such as HTML or XML
•Extends indexing to:
     •Documents stored in tables or referenced
     •PDF, MS Word, XML, text, ...
     •Data types such as BLOB, BFILE, CLOB, LONG, ...
     •Even web pages, stored or referenced
Oracle Text Overview
Types of Index
Type of Index   Query Operator   Description
CONTEXT         CONTAINS         Use this index to build a text retrieval application when your text
                                 consists of large coherent documents. You can index documents of
                                 different formats such as Microsoft Word, HTML, XML, or plain text.
                                 You can customize your index in a variety of ways.
CTXCAT          CATSEARCH        Use this index type to improve mixed query performance. Suitable for
                                 querying small text fragments with structured criteria like dates,
                                 item names, and prices that are stored across columns.
CTXRULE         MATCHES          Use to build a document classification application. You create this
                                 index on a table of queries, where each query has a classification.
                                 Single documents (plain text, HTML, or XML) can be classified by
                                 using the MATCHES operator.
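As a rough illustration of the two less common index types in the table, the sketch below creates a CTXCAT index queried with CATSEARCH and a CTXRULE index queried with MATCHES. The tables auction and doc_rules, their columns, and the index set name auction_iset are hypothetical and exist only for this example.

```sql
-- Hypothetical table of short text fragments with a structured column
CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(100),
  price   NUMBER
);

-- CTXCAT: the index set lists the structured columns usable inside CATSEARCH
BEGIN
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');
END;
/

CREATE INDEX auction_titlex ON auction(title)
  INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set auction_iset');

-- Mixed query: text criterion plus a structured criterion on price
SELECT title, price
FROM   auction
WHERE  CATSEARCH(title, 'camera', 'price <= 100') > 0;

-- CTXRULE: index a table of queries, then classify a document against them
CREATE TABLE doc_rules (
  rule_id      NUMBER PRIMARY KEY,
  category     VARCHAR2(30),
  query_string VARCHAR2(200)
);

INSERT INTO doc_rules VALUES (1, 'databases', 'oracle');
INSERT INTO doc_rules VALUES (2, 'pets', 'dog OR cat');
COMMIT;

CREATE INDEX doc_rules_idx ON doc_rules(query_string)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Returns the categories whose stored query matches the given text
SELECT category
FROM   doc_rules
WHERE  MATCHES(query_string, 'Oracle released a new database version') > 0;
```

The structured clause of CATSEARCH can only refer to columns that were added to the index set, which is why price is registered before the index is created.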
Text Query Application
Document Presentation and Highlighting
Output                                                Procedure
Plain text version, no highlights                     CTX_DOC.FILTER
HTML version of document, no highlights               CTX_DOC.FILTER
Highlighted document, plain text version              CTX_DOC.MARKUP
Highlighted document, HTML version                    CTX_DOC.MARKUP
Highlight offset information for plain text version   CTX_DOC.HIGHLIGHT
Highlight offset information for HTML version         CTX_DOC.HIGHLIGHT
Theme summaries and gist of document.                 CTX_DOC.GIST
List of themes in document.                           CTX_DOC.THEMES
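A minimal sketch of one of these procedures, CTX_DOC.MARKUP, using its in-memory CLOB form. It assumes the CONTEXT index dbapp.IDX_st_address_3 from the DDL example in this deck, that the indexed table has a primary key, and that '1' is an existing key value; the key value and the tag strings are assumptions made for illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_markup CLOB;  -- receives the highlighted document
BEGIN
  CTX_DOC.MARKUP(
    index_name => 'dbapp.IDX_st_address_3',
    textkey    => '1',                 -- hypothetical primary-key value
    text_query => 'OSCAR AND STONE',
    restab     => l_markup,
    plaintext  => TRUE,                -- plain text version of the document
    starttag   => '<b>',               -- inserted before each hit
    endtag     => '</b>');             -- inserted after each hit

  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(l_markup, 4000, 1));
  DBMS_LOB.FREETEMPORARY(l_markup);    -- release the temporary CLOB
END;
/
```

If the table has no primary key, CTX_DOC.SET_KEY_TYPE('ROWID') lets textkey be passed as a rowid instead.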
Document Samples
Oracle Text Indexing Process
Indexing Classes
Class         Description
Datastore     How are your documents stored?
Filter        How can the documents be converted to plain text?
Lexer         What language is being indexed?
Wordlist      How should stem and fuzzy queries be expanded?
Storage       How should the index data be stored?
Stop List     What words or themes are not to be indexed?
Section Group How are document sections defined?
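The lexer, wordlist, stoplist and storage classes appear in the Example Parameters slide. As a complementary sketch, the block below shows the remaining three classes (datastore, filter, section group). The preference names demo_ds, demo_flt, demo_sg and the table demo_docs are hypothetical.

```sql
BEGIN
  -- Datastore: the text is stored directly in the indexed column
  ctx_ddl.create_preference('demo_ds', 'DIRECT_DATASTORE');

  -- Filter: content is already plain text or HTML, so no conversion is needed
  ctx_ddl.create_preference('demo_flt', 'NULL_FILTER');

  -- Section group: parse HTML tags and expose <body> as the zone section "body"
  ctx_ddl.create_section_group('demo_sg', 'HTML_SECTION_GROUP');
  ctx_ddl.add_zone_section('demo_sg', 'body', 'body');
END;
/

-- Hypothetical table holding HTML documents in a CLOB column
CREATE TABLE demo_docs (
  doc_id   NUMBER PRIMARY KEY,
  doc_text CLOB
);

CREATE INDEX demo_docs_idx ON demo_docs(doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE demo_ds FILTER demo_flt SECTION GROUP demo_sg');

-- With the zone section defined, WITHIN can restrict a term to the body
SELECT doc_id
FROM   demo_docs
WHERE  CONTAINS(doc_text, 'cat WITHIN body', 1) > 0;
```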
Example Parameters
EXEC ctx_ddl.drop_preference('address_lx');
EXEC ctx_ddl.drop_preference('address_wl');
EXEC ctx_ddl.drop_preference('address_st');
EXEC ctx_ddl.drop_stoplist('address_sl');

begin
  ctx_ddl.create_preference('address_lx','BASIC_LEXER');
  -- removes diacritics
  ctx_ddl.set_attribute('address_lx','base_letter','YES');
  ctx_ddl.create_preference('address_wl','BASIC_WORDLIST');
  ctx_ddl.create_stoplist('address_sl', 'BASIC_STOPLIST');
  ctx_ddl.add_stopclass('address_sl', 'NUMBERS');
  ctx_ddl.add_stopword('address_sl', 'a');
  -- ...
  ctx_ddl.add_stopword('address_sl', 'vocês');
  ctx_ddl.create_preference('address_st', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('address_st','i_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','k_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','r_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','n_table_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','i_index_clause','TABLESPACE IDM2');
  ctx_ddl.set_attribute('address_st','p_table_clause','TABLESPACE IDM2');
end;
/
Example DDL and Query
DROP INDEX dbapp.IDX_st_address_3;
CREATE INDEX dbapp.IDX_st_address_3 ON
dbapp.st_address(NOM_st_address)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('LEXER address_lx
WORDLIST address_wl
STOPLIST address_sl
STORAGE address_st')
PARALLEL 8;
COMMIT;

BEGIN
  SYS.DBMS_STATS.GATHER_TABLE_STATS (
   OwnName          => 'dbapp'
  ,TabName          => 'st_address'
  ,Estimate_Percent => NULL
  ,Method_Opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO'
  ,Degree           => 8
  ,Cascade          => TRUE
  ,No_Invalidate    => FALSE);
END;
/

SET DEFINE OFF;
SELECT NOM_st_address
FROM dbapp.st_address
WHERE CONTAINS (NOM_st_address, 'ST&MAJOR&OSCAR&STONE', 1) > 0;

Plan
SELECT STATEMENT CHOOSE Cost: 18 Bytes: 118 Cardinality: 1
  2 TABLE ACCESS BY INDEX ROWID dbapp.st_address_3 Cost: 18 Bytes: 118 Cardinality: 1
    1 DOMAIN INDEX dbapp.st_address_3 Cost: 15

NOM_st_address
--------------------------------------------------
OSCAR STONE MAJOR

SET DEFINE OFF;
SELECT NOM_st_address
FROM dbapp.st_address
WHERE CONTAINS (NOM_st_address, 'ST&OSCAR&STONE', 1) > 0;

NOM_st_address
--------------------------------------------------
JOSE OSCAR STONE
........
OSCAR STONE MAJOR
OSCAR WEBBER STONE

28 rows selected.
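With default settings a CONTEXT index is not updated as part of each DML statement, so rows inserted or changed after the index build stay invisible to CONTAINS until the index is synchronized. The sketch below shows one way to check and synchronize the index from the example. It assumes the session is connected as the index owner (dbapp) and has execute rights on CTX_DDL.

```sql
-- Rows waiting to be indexed (one row per pending document)
SELECT COUNT(*) AS pending_rows
FROM   ctx_user_pending
WHERE  pnd_index_name = 'IDX_ST_ADDRESS_3';

-- Process the pending rows so that CONTAINS can see them
EXEC ctx_ddl.sync_index('dbapp.IDX_st_address_3');

-- Documents that failed to index, most recent first
SELECT err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'IDX_ST_ADDRESS_3'
ORDER  BY err_timestamp DESC;
```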
CONTAINS Operators
•EQUIValence (=)
•NEAR (;)
•weight (*), threshold (>)
•MINUS (-)
•NOT (~)
•WITHIN
•AND (&)
•OR (|)
•ACCUMulate ( , )
•Wildcard Characters
•ABOUT
•stem ($)
•Fuzzy
•soundex (!)

Query Expression            Order of Evaluation
w1 | w2 & w3                (w1) | (w2 & w3)
w1 & w2 | w3                (w1 & w2) | w3
?w1, w2 | w3 & w4           (?w1), (w2 | (w3 & w4))
abc = def ghi & jkl = mno   ((abc = def) ghi) & (jkl=mno)
dog and cat WITHIN body     dog and (cat WITHIN body)
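A few of these operators can be tried against the address index from the earlier example. The queries below are a sketch: the search terms are made up, and the word forms AND, OR and NOT are used in place of the symbols so that SQL*Plus does not treat & as a substitution variable.

```sql
-- AND binds tighter than OR: evaluated as MAJOR OR (OSCAR AND STONE)
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'MAJOR OR OSCAR AND STONE', 1) > 0;

-- Parentheses override the default order of evaluation
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, '(MAJOR OR OSCAR) AND STONE', 1) > 0;

-- NOT excludes rows that contain the second term
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'OSCAR NOT MAJOR', 1) > 0;

-- NEAR: both terms within a span of 2 words
SELECT NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'NEAR((OSCAR, STONE), 2)', 1) > 0;

-- Fuzzy and wildcard matching, ranked by relevance score
SELECT SCORE(1) AS relevance, NOM_st_address
FROM   dbapp.st_address
WHERE  CONTAINS(NOM_st_address, 'FUZZY(OSKAR) AND STO%', 1) > 0
ORDER  BY SCORE(1) DESC;
```

The third argument of CONTAINS is a label that ties the predicate to SCORE(1), which is how the last query orders rows by relevance.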
Training
Resources at Oracle website

• Text Application Developer's Guide
http://docs.oracle.com/cd/B10501_01/text.920/a96517/toc.htm

• Text Reference
http://docs.oracle.com/cd/B10501_01/text.920/a96518/toc.htm
Thank you
