
Oracle Database 11g New Search Features and Roadmap
Roger Ford
Senior Principal Product Manager
Contents

• Oracle’s Search Products
• Oracle Text 11g New Features
• Oracle Text 11.2.0.2 New Features
– Entity Extraction
– Name Search
– Result Set Interface
• Search Product Roadmap
– Oracle Text
– Secure Enterprise Search

Oracle’s Search Products

• Oracle Text
– A SQL- and PL/SQL-based toolkit for building full-text search applications
– Included free with all database editions
– Previously known as ConText Option and interMedia Text

• Secure Enterprise Search
– A complete search product built on Oracle Text capabilities
– Crawlers for data sources such as web sites, email, document repositories and databases
– End-user query application, plus APIs for embedding search in other applications

Oracle Text 11g New Features
• Composite Domain Indexes and SDATA sections
– Allow structured information (e.g. numbers, dates) to be stored within the text index
– Make “mixed” queries (text plus structured criteria) much faster

• Auto Lexer
– Automatic language recognition
– Segmentation and stemming for 32 languages
– Context-sensitive stemming for 23 of these languages

• Off-line and time-limited index creation
– Enables indexes to be rebuilt offline during quiet periods, for true 24x7 operation

Demo: Auto Lexer

11.2.0.2 New Features - Summary
1. Entity Extraction
– Find “entities” such as people, countries, cities, states, zip codes, phone numbers, etc. in the text
– Use the default dictionary and rules, or define your own dictionary and rules based on regular expressions

2. Name Search (NDATA sections)
– Inexact searches that cope with misspellings, segmentation errors, contractions and word reversal
– Useful for many searches, but particularly good for names

3. ResultSet Interface
– Query request in XML and results returned as XML
– Avoids the SQL layer and the need to work within “SELECT” semantics

Entity Extraction
• Identify names, places, dates, times, etc.
• Tag each occurrence with a type and subtype
• Entities are defined by a DICTIONARY and RULES
• Implemented by the CTX_ENTITY package
– create_extract_policy – create a policy to which you can add extraction rules
• Choose whether or not to use the built-in rules and dictionary
– add_extract_rule – create an XML-based rule that defines an entity
– add_stop_entity – prevent specified entities from being extracted
– compile – build the policy with its rules
– extract – get an XML-based list of the entities in a document

• ctxload can also be used to load a user dictionary

Demo: Entity Extraction

Entities: built-in types

• building
• city
• company
• country
• currency
• date
• day
• email_address
• geo_political
• holiday
• location_other
• month
• non_profit
• organization_other
• percent
• person_jobtitle
• person_name
• person_other
• phone_number
• postal_address
• product
• region
• ssn
• state
• time_duration
• tod
• url
• zip_code

Entity Extraction – Example 1: Defaults
ctx_entity.create_extract_policy('my_default_policy');
ctx_entity.compile('my_default_policy');
ctx_entity.extract('my_default_policy', mydoc, mylang, myresults);

• Output in "myresults":
<entities>
<entity id="0" offset="75" length="8" source="SuppliedDictionary">
<text>New York</text>
<type>city</type>
</entity>
<entity id="1" offset="55" length="16" source="SuppliedRule">
<text>Hupplewhite Inc.</text>
<type>company</type>
</entity>
</entities>
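The extract call returns its results as an XML document. As a minimal sketch of consuming that output in client code, the Python below parses the `<entities>` document shown above with the standard-library ElementTree module. The helper `parse_entities` and its tuple layout are illustrative assumptions, not part of Oracle Text.

```python
import xml.etree.ElementTree as ET

# Output copied from the Example 1 slide (the source document itself is not shown).
SAMPLE = """<entities>
<entity id="0" offset="75" length="8" source="SuppliedDictionary">
<text>New York</text>
<type>city</type>
</entity>
<entity id="1" offset="55" length="16" source="SuppliedRule">
<text>Hupplewhite Inc.</text>
<type>company</type>
</entity>
</entities>"""


def parse_entities(xml_text):
    """Return (offset, length, type, text, source) tuples sorted by offset.

    Hypothetical client-side helper, not an Oracle API.
    """
    root = ET.fromstring(xml_text)
    found = []
    for ent in root.findall("entity"):
        found.append((
            int(ent.get("offset")),
            int(ent.get("length")),
            ent.findtext("type"),
            ent.findtext("text"),
            ent.get("source"),
        ))
    # In the sample, the entity at offset 75 is listed before the one at
    # offset 55, so sort by offset to recover document order.
    return sorted(found)


for entity in parse_entities(SAMPLE):
    print(entity)
```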

Entity Extraction – Example 2: User rule
ctx_entity.create_extract_policy('mypolicy');
ctx_entity.add_extract_rule('mypolicy', 5,
'<rule>
<expression>((North|South)? America)</expression>
<type refid="1">xContinent</type>
</rule>');
ctx_entity.compile('mypolicy');
ctx_entity.extract('mypolicy', mydoc, mylang, myresults);

• Note the parentheses around the whole expression: refid="1" selects the text matched by the first parenthesized group, so the entity is "North America", "South America" or just "America".
• User-defined types must be prefixed with "x", hence "xContinent"
• Output in "myresults":
<entities>
<entity id="0" offset="75" length="13" source="UserRule">
<text>North America</text>
<type>xContinent</type>
</entity>
</entities>
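To see which text refid="1" selects, the Python sketch below applies the same pattern with the standard `re` module and reports group 1 of every match. This is only an approximation: Oracle Text evaluates rules with its own regular-expression engine, and `extract_group` is a hypothetical helper. One detail worth noticing is that the space sits outside the optional `(North|South)?` group, so in Python's engine a bare "America" is matched together with its leading space.

```python
import re

# Same expression as the xContinent rule; group 1 is the outer parentheses.
CONTINENT_RULE = re.compile(r"((North|South)? America)")


def extract_group(pattern, text, refid=1):
    """Return (offset, length, text) for the chosen group of every match.

    Hypothetical helper mimicking refid: the refid-th parenthesized group
    supplies the entity text, its offset and its length.
    """
    results = []
    for match in pattern.finditer(text):
        start, end = match.span(refid)
        results.append((start, end - start, match.group(refid)))
    return results


print(extract_group(CONTINENT_RULE, "Trade between North America and Europe"))
print(extract_group(CONTINENT_RULE, "Made in America"))
```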

Entity Extraction: Adding a user dictionary
• Create the file ud.xml:

<dictionary> <entities>
<entity> <value>Dow Jones Industrial Average</value> <type>xIndex</type> </entity>
<entity> <value>S&amp;P 500</value> <type>xIndex</type> </entity>
</entities> </dictionary>

• Create the policy with ctxload (rules can be added later):
ctxload -user scott/tiger -extract -name pol1 -file ud.xml
• Compile the policy:
ctx_entity.compile('pol1');
• Results of ctx_entity.extract with policy pol1:
<entity id="69" offset="1010" length="7" source="UserDictionary">
<text>S&amp;P 500</text>
<type>xIndex</type>
</entity>
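When the dictionary file is written by hand, it is easy to forget to escape characters such as the ampersand in "S&P 500". A minimal sketch, assuming only the layout shown above (`<dictionary>`, `<entities>`, `<entity>`, `<value>`, `<type>`), is to generate the XML with Python's ElementTree, which escapes text automatically. The helper `build_user_dictionary` is hypothetical, and the printed string would still need to be saved as ud.xml before ctxload is run.

```python
import xml.etree.ElementTree as ET


def build_user_dictionary(entries):
    """Build user-dictionary XML from (value, type) pairs.

    Hypothetical helper. Per the slides, user-defined types must start with "x".
    """
    dictionary = ET.Element("dictionary")
    entities = ET.SubElement(dictionary, "entities")
    for value, etype in entries:
        if not etype.startswith("x"):
            raise ValueError(f"user-defined type {etype!r} must start with 'x'")
        entity = ET.SubElement(entities, "entity")
        ET.SubElement(entity, "value").text = value
        ET.SubElement(entity, "type").text = etype
    # ElementTree escapes "&" as "&amp;" when serializing.
    return ET.tostring(dictionary, encoding="unicode")


xml_text = build_user_dictionary([
    ("Dow Jones Industrial Average", "xIndex"),
    ("S&P 500", "xIndex"),
])
print(xml_text)
```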

Entity Extraction – other stuff
• Extracting only certain entity types, by passing a comma-separated list of types as the fifth argument:
– ctx_entity.extract('p1', mydoc, null, myresults,
'city,company,xContinent');
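Oracle applies the type list itself during extraction. Purely as a client-side illustration of the same selection, the sketch below filters entities that have already been parsed into (offset, length, type, text) tuples. The entities are taken from the examples above (they come from different documents), and `filter_by_types` is a hypothetical helper.

```python
def filter_by_types(entities, type_list):
    """Keep entities whose type appears in a comma-separated list.

    entities: iterable of (offset, length, type, text) tuples.
    type_list: e.g. "city,company,xContinent"; surrounding whitespace is ignored.
    """
    wanted = {name.strip() for name in type_list.split(",") if name.strip()}
    return [e for e in entities if e[2] in wanted]


sample = [
    (75, 8, "city", "New York"),               # Example 1
    (55, 16, "company", "Hupplewhite Inc."),   # Example 1
    (75, 13, "xContinent", "North America"),   # Example 2
    (1010, 7, "xIndex", "S&P 500"),            # user-dictionary example
]
print(filter_by_types(sample, "city,company,xContinent"))
```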

Name Search
• Searching names has many difficulties:
– Spelling (steven = stephen)
– Alternate names (fred = alfred, chuck = charles)
– Transcription (copying from spoken to written form)
– Transliteration (copying from one writing system to another)
– Segmentation (Mary Jane, Maryjane)
– First, middle and last name classification

• Name search does intelligent matching across all
these issues
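The slide does not describe how NDATA matching works internally, so the following is only a toy illustration of how some of these difficulties can be handled. It is not Oracle's algorithm. It folds accented letters to their base letters, maps nicknames through a small hypothetical thesaurus, ignores segmentation and word order, and scores the remaining spelling differences with `difflib.SequenceMatcher`. First/middle/last name classification is not attempted.

```python
import difflib
import unicodedata

# Hypothetical mini-thesaurus of alternate names (nickname -> canonical form).
ALTERNATE_NAMES = {"fred": "alfred", "chuck": "charles"}


def base_letters(text):
    """Strip diacritics, e.g. "José" -> "Jose"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(name):
    """Lower-case, fold to base letters and map alternate names."""
    tokens = base_letters(name).lower().split()
    return [ALTERNATE_NAMES.get(tok, tok) for tok in tokens]


def name_similarity(a, b):
    """Score two names in [0, 1], tolerant of segmentation and word order."""
    ta, tb = normalize(a), normalize(b)
    # Joining without spaces handles segmentation (Mary Jane / Maryjane);
    # sorting the tokens first handles reversal (Smith John / John Smith).
    candidates = [
        ("".join(ta), "".join(tb)),
        ("".join(sorted(ta)), "".join(sorted(tb))),
    ]
    return max(difflib.SequenceMatcher(None, x, y).ratio() for x, y in candidates)


for pair in [("Mary Jane", "Maryjane"), ("Smith John", "John Smith"),
             ("Fred Jones", "Alfred Jones"), ("steven", "stephen")]:
    print(pair, round(name_similarity(*pair), 3))
```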

Demo: Name Search

NDATA section type
• Basic implementation for name search
• Limitations
– 511 characters
– 255 whitespace-delimited terms
– No offset information, therefore no:
• Highlighting / Markup
• NEAR or phrase search with NDATA

• Uses WORDLIST preference attributes:
– NDATA_ALTERNATE_SPELLING
– NDATA_BASE_LETTER
– NDATA_THESAURUS (for alternate names; a default thesaurus is provided)
– NDATA_JOIN_PARTICLES (a list such as 'de:du:mc:mac')

• Query Syntax
– NDATA(fieldname, search terms [, order [, proximity ] ] )
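The limits listed above can be checked before the data is loaded. The sketch below is a hypothetical pre-insert validation in Python that takes the limits exactly as the slide states them (511 characters, 255 whitespace-delimited terms). `ndata_problems` is an illustrative name, not an Oracle function.

```python
NDATA_MAX_CHARS = 511   # limit quoted on the slide
NDATA_MAX_TERMS = 255   # whitespace-delimited terms, per the slide


def ndata_problems(value):
    """Return a list of reasons a value exceeds the NDATA limits.

    Hypothetical helper: an empty list means the value fits.
    """
    problems = []
    if len(value) > NDATA_MAX_CHARS:
        problems.append(f"{len(value)} characters (max {NDATA_MAX_CHARS})")
    terms = value.split()
    if len(terms) > NDATA_MAX_TERMS:
        problems.append(f"{len(terms)} terms (max {NDATA_MAX_TERMS})")
    return problems


print(ndata_problems("Mary Jane Smith"))
print(ndata_problems("a " * 300))
```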

Result Set Interface
• Some queries are difficult to express in SQL:
– e.g. "Give me the top 5 hits in each category"

• Result set interface uses a simple text query and an
XML result set descriptor
• Hitlist is returned in XML according to result set
descriptor
• Uses SDATA sections for
– Grouping
– Counting

Result Set Example Query
ctx_query.result_set('docidx', 'oracle',
'<ctx_result_set_descriptor>
<count/>
<hitlist start_hit_num="1" end_hit_num="2" order="pubDate desc, score desc">
<score/> <rowid/>
<sdata name="author"/>
<sdata name="pubDate"/>
</hitlist>
<group sdata="pubDate">
<count/>
</group>
<group sdata="author">
<count/>
</group>
</ctx_result_set_descriptor> ', rs);
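Because the descriptor is plain XML, it can be generated rather than hand-written. A minimal Python sketch, assuming only the elements and attributes used in the example above (`count`, `hitlist` with `start_hit_num`/`end_hit_num`/`order`, `score`, `rowid`, `sdata`, `group`), builds an equivalent descriptor string that could be passed as the third argument of `ctx_query.result_set`. `build_descriptor` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_descriptor(start, end, order, sdata_fields, group_fields):
    """Build a ctx_result_set_descriptor like the one in the example.

    Hypothetical helper covering only the elements shown on the slide.
    """
    root = ET.Element("ctx_result_set_descriptor")
    ET.SubElement(root, "count")
    hitlist = ET.SubElement(root, "hitlist", start_hit_num=str(start),
                            end_hit_num=str(end), order=order)
    ET.SubElement(hitlist, "score")
    ET.SubElement(hitlist, "rowid")
    for name in sdata_fields:
        ET.SubElement(hitlist, "sdata", name=name)
    # One <group> per SDATA section to count hits by value.
    for name in group_fields:
        group = ET.SubElement(root, "group", sdata=name)
        ET.SubElement(group, "count")
    return ET.tostring(root, encoding="unicode")


descriptor = build_descriptor(1, 2, "pubDate desc, score desc",
                              ["author", "pubDate"], ["pubDate", "author"])
print(descriptor)
```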

Result Set Output
<ctx_result_set>
<hitlist>
<hit>
<score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
<sdata name="AUTHOR">John</sdata>
<sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
</hit>
<hit>
<score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
<sdata name="AUTHOR">John</sdata>
<sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
</hit>
</hitlist>
<count>100</count>

Result Set Output - Continued
<groups sdata="PUBDATE">
<group value="2001-01-01 00:00:00"><count>25</count></group>
<group value="2001-01-02 00:00:00"><count>50</count></group>
<group value="2001-01-03 00:00:00"><count>25</count></group>
</groups>
<groups sdata="AUTHOR">
<group value="John"><count>50</count></group>
<group value="Mike"><count>25</count></group>
<group value="Steve"><count>25</count></group>
</groups>
</ctx_result_set>
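As a minimal sketch of consuming this output in a client, the Python below parses the result set (the two output slides joined into one document) with ElementTree. It returns the total hit count, the hit rows, and the per-group counts that would drive faceted navigation. `parse_result_set` is hypothetical and handles only the elements shown here.

```python
import xml.etree.ElementTree as ET

# Result set from the two "Result Set Output" slides, joined into one document.
RESULT_SET = """<ctx_result_set>
<hitlist>
<hit>
<score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
<sdata name="AUTHOR">John</sdata>
<sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
</hit>
<hit>
<score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
<sdata name="AUTHOR">John</sdata>
<sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
</hit>
</hitlist>
<count>100</count>
<groups sdata="PUBDATE">
<group value="2001-01-01 00:00:00"><count>25</count></group>
<group value="2001-01-02 00:00:00"><count>50</count></group>
<group value="2001-01-03 00:00:00"><count>25</count></group>
</groups>
<groups sdata="AUTHOR">
<group value="John"><count>50</count></group>
<group value="Mike"><count>25</count></group>
<group value="Steve"><count>25</count></group>
</groups>
</ctx_result_set>"""


def parse_result_set(xml_text):
    """Return (total, hits, facets) from a ctx_result_set document.

    hits: list of dicts holding score, rowid and the SDATA values.
    facets: {sdata_name: {group_value: count}}.
    """
    root = ET.fromstring(xml_text)
    total = int(root.findtext("count"))  # direct child only, not group counts
    hits = []
    for hit in root.findall("hitlist/hit"):
        row = {"score": int(hit.findtext("score")), "rowid": hit.findtext("rowid")}
        for sdata in hit.findall("sdata"):
            row[sdata.get("name")] = sdata.text
        hits.append(row)
    facets = {}
    for groups in root.findall("groups"):
        facets[groups.get("sdata")] = {
            g.get("value"): int(g.findtext("count")) for g in groups.findall("group")
        }
    return total, hits, facets


total, hits, facets = parse_result_set(RESULT_SET)
print(total, len(hits), facets["AUTHOR"])
```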

Preview

Roadmap – merging Text and SES

Oracle Text: Full Control
• Fine-grained index options
• Data storage options
• Lexer options
• Stoplists
• Use existing database
• RAC, Exadata

Secure Enterprise Search: Full Featured
• Built-in database and mid-tier
• Crawlers for many sources
• Simple query interface
• End-user GUI / API
• Embedded security

Coming Search Features
• Natural Language Processing enhancements
– Ontology based classification
– Question answering

• Automatic Partitioning
– Query load balancing

• Full support for faceted navigation (MVDATA sections)
• Functional completeness for Result Set Interface
– Result Iterator – streaming support
– Parallel Query

• Replication Support
– GoldenGate / Logical Standby / Streams

• Operator improvements
– NEAR2 – best query in one operator
– MNOT – mild NOT, e.g. YORK MNOT NEW YORK
– Nested NEAR

• Substring index and query performance improvements
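MNOT is presented here as a future operator, so no syntax or behaviour beyond the slide's example is assumed. As a toy model of the "mild not" idea in YORK MNOT NEW YORK, the Python below accepts a document if the term occurs at least once outside the phrase. It rejects a document only when every occurrence of the term is part of the phrase. This illustrates the concept only and is not Oracle's implementation. `mild_not` is a hypothetical name.

```python
import re


def tokens(text):
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def mild_not(text, term, phrase):
    """True if `term` occurs at least once outside every occurrence of `phrase`.

    Toy model of "term MNOT phrase": a document is excluded only when all
    occurrences of the term fall inside the phrase.
    """
    words = tokens(text)
    term = term.lower()
    phrase_words = tokens(phrase)
    n = len(phrase_words)
    covered = set()  # word positions that belong to some phrase occurrence
    for start in range(len(words) - n + 1):
        if words[start:start + n] == phrase_words:
            covered.update(range(start, start + n))
    return any(w == term and i not in covered for i, w in enumerate(words))


print(mild_not("I love New York", "york", "new york"))
print(mild_not("New York is not York", "york", "new york"))
```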
Coming Search Features - Continued
• Multiple enhancements to query performance
– BIGIO leverages SecureFiles CLOBs
– Automatic optimization of indexes with “stage index”
– Two-level index – keep common search terms in memory

• Partition maintenance without reindexing
• Off-load filtering from the database server
• Section-specific index options
– Choose different options, e.g. language, stopwords, PRINTJOINS, for each section

• Regular-expression-based stopwords
• Forward Index
– Hugely improved performance for highlighting and snippets

• PDF “Native” Highlighting
• Unlimited SDATA, MDATA and Field Sections

The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.

26
27

More Related Content

What's hot

Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
Lucidworks
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Andy Jackson
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
Stefan Schmidt
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
lucenerevolution
 
Xml and webdata
Xml and webdataXml and webdata
Xml and webdata
Harry Potter
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksIntroduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Lucidworks
 
AAC Room
AAC RoomAAC Room
AAC Room
선옥 장
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Alexandre Rafalovitch
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2Marco Gralike
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
lucenerevolution
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
Mark Tabladillo
 

What's hot (15)

Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Xml and webdata
Xml and webdataXml and webdata
Xml and webdata
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksIntroduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
 
AAC Room
AAC RoomAAC Room
AAC Room
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 2
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 

Similar to Oracle by Muhammad Iqbal

(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
BIOVIA
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
Robert Viseur
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
Kais Hassan, PhD
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
Trey Grainger
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution design
Alexander Tokarev
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
Vinay Kumar
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptx
SonuShaw16
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Petter Skodvin-Hvammen
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
David Phillips
 
PostgreSQL - Object Relational Database
PostgreSQL - Object Relational DatabasePostgreSQL - Object Relational Database
PostgreSQL - Object Relational Database
Mubashar Iqbal
 
Getting started with Splunk - Break out Session
Getting started with Splunk - Break out SessionGetting started with Splunk - Break out Session
Getting started with Splunk - Break out Session
Georg Knon
 
Getting started with Splunk
Getting started with SplunkGetting started with Splunk
Getting started with Splunk
Splunk
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
DataArt
 
Rdbms
RdbmsRdbms
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data Persistence
David Hoerster
 
SplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with SplunkSplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with Splunk
Splunk
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
Trey Grainger
 

Similar to Oracle by Muhammad Iqbal (20)

(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Tagging search solution design
Tagging search solution designTagging search solution design
Tagging search solution design
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
AZMS PRESENTATION.pptx
AZMS PRESENTATION.pptxAZMS PRESENTATION.pptx
AZMS PRESENTATION.pptx
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
 
PostgreSQL - Object Relational Database
PostgreSQL - Object Relational DatabasePostgreSQL - Object Relational Database
PostgreSQL - Object Relational Database
 
Getting started with Splunk - Break out Session
Getting started with Splunk - Break out SessionGetting started with Splunk - Break out Session
Getting started with Splunk - Break out Session
 
Getting started with Splunk
Getting started with SplunkGetting started with Splunk
Getting started with Splunk
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Rdbms
RdbmsRdbms
Rdbms
 
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data Persistence
 
SplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with SplunkSplunkLive! - Getting started with Splunk
SplunkLive! - Getting started with Splunk
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 

Recently uploaded

TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
CarlosHernanMontoyab2
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 

Recently uploaded (20)

TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 

Oracle by Muhammad Iqbal

  • 1. 1
  • 2. <Insert Picture Here> Oracle Database 11g New Search Features and Roadmap Roger Ford Senior Principal Product Manager
  • 3. Contents • Oracle’s Search Products • Oracle Text 11g New Features • Oracle Text 11.2.0.2 New Features <Insert Picture Here> – Entity Extraction – Name Search – Result Set Interface • Search Product Roadmap – Oracle Text – Secure Enterprise Search 3
  • 4. Oracle’s Search Products • Oracle Text – A SQL and PL/SQL based toolkit for creating full-text search applications – Free with all database versions – Previously known as Context Option, interMedia Text • Secure Enterprise Search – A complete search based on Oracle Text capabilities – Crawlers for datasources such as web, email, document repositories, databases – End-user query application and APIs for embedding 4
  • 5. Oracle Text 11g New Features • Composite Domain Indexes and SDATA sections – Allows storage of structured info (eg numbers, dates) within text index – Makes for much faster “mixed” queries • Auto Lexer – Automatic Language Recognition – Segmentation and Stemming for 32 languages – Context-sensitive stemming for 23 of these languages • Off-line and time-limited index creation – Enables rebuild of indexes offline in quiet periods for true 24x7 operation 5
  • 7. 11.2.0.2 New Features - Summary 1. Entity Extraction – – Find “entities” such as people, countries, cities, states, zip codes, phone numbers etc from the text Use default dictionary and rules or define your own dictionary and rules based on regular expressions 2. Name Search (NDATA sections) – – Inexact searches, copes with mis-spellings, segmentation errors, contractions and word reversal Useful for many searches, but particular good for names 3. ResultSet Interface – – Query request in XML and results returned as XML Avoids SQL layer and requirement to work within “SELECT” semantics 7
  • 8. Entity Extraction • • • • Indentify names, places, dates, times, etc Tag each occurence with type and subtype Entities are defined by DICTIONARY and RULES Implemented by CTX_ENTITY package – create_extract_policy – create a policy to which you can add extract rules • Choose to use/not use built in rules and dictionary – add_extract_rule – create an XML-based rule to define an entity – add_stop_entity – prevent defined entities from being used – compile – build the policy with its rules – extract – get an XML-based list of entities for a doc • Also can use ctxload to load user dictionary 8
  • 11. Entity Extraction – Example 1: Defaults
    ctx_entity.create_extract_policy('my_default_policy');
    ctx_entity.compile('my_default_policy');
    ctx_entity.extract('my_default_policy', mydoc, mylang, myresults);
    • Output in "myresults":
    <entities>
      <entity id="0" offset="75" length="8" source="SuppliedDictionary">
        <text>New York</text>
        <type>city</type>
      </entity>
      <entity id="1" offset="55" length="16" source="SuppliedRule">
        <text>Hupplewhite Inc.</text>
        <type>company</type>
      </entity>
    </entities>
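The extract result is plain XML, so a client can consume it with any XML parser. A minimal Python sketch, assuming the result CLOB has already been fetched into a string (fetching it is not shown); `parse_entities` is a hypothetical helper, and sorting by offset is a choice made here because the sample lists entities out of document order.

```python
import xml.etree.ElementTree as ET

# Output copied from the example above.
SAMPLE_RESULTS = """<entities>
  <entity id="0" offset="75" length="8" source="SuppliedDictionary">
    <text>New York</text>
    <type>city</type>
  </entity>
  <entity id="1" offset="55" length="16" source="SuppliedRule">
    <text>Hupplewhite Inc.</text>
    <type>company</type>
  </entity>
</entities>"""


def parse_entities(xml_text):
    """Convert an <entities> result document into a list of dicts,
    sorted by offset so entities appear in document order."""
    root = ET.fromstring(xml_text)
    entities = []
    for ent in root.findall("entity"):
        entities.append({
            "id": int(ent.get("id")),
            "offset": int(ent.get("offset")),
            "length": int(ent.get("length")),
            "source": ent.get("source"),
            "text": ent.findtext("text"),
            "type": ent.findtext("type"),
        })
    return sorted(entities, key=lambda e: e["offset"])


for e in parse_entities(SAMPLE_RESULTS):
    print(e["offset"], e["type"], e["text"])
# 55 company Hupplewhite Inc.
# 75 city New York
```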
  • 12. Entity Extraction – Example 2: User rule
    ctx_entity.create_extract_policy('mypolicy');
    ctx_entity.add_extract_rule('mypolicy', 5,
      '<rule>
         <expression>((North|South)? America)</expression>
         <type refid="1">xContinent</type>
       </rule>');
    ctx_entity.compile('mypolicy');
    ctx_entity.extract('mypolicy', mydoc, mylang, myresults);
    • Note the parentheses around the expression: refid="1" means the entity text is taken from the first parenthesized group, so either "North America" or just "America".
    • User-defined types must be prefixed with "x", hence "xContinent".
    <entities>
      <entity id="0" offset="75" length="13" source="UserRule">
        <text>North America</text>
        <type>xContinent</type>
      </entity>
    </entities>
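To see what refid selects, Python's `re` module can stand in for the rule engine. This is an approximation: `apply_rule` is a hypothetical helper, and Oracle's regular-expression dialect and tokenization may differ in edge cases.

```python
import re


def apply_rule(text, expression, refid, entity_type):
    """Hypothetical stand-in for a user extraction rule: for every match of
    `expression`, report the span of capture group `refid` as the entity."""
    found = []
    for m in re.finditer(expression, text):
        start, end = m.span(refid)
        if start == -1:  # the group did not take part in this match
            continue
        found.append((m.group(refid), start, end - start, entity_type))
    return found


RULE = r"((North|South)? America)"
print(apply_rule("Sales grew across North America last year.", RULE, 1, "xContinent"))
# [('North America', 18, 13, 'xContinent')]
print(apply_rule("Trade with America", RULE, 1, "xContinent"))
# [(' America', 10, 8, 'xContinent')]
```

In Python, the slide's pattern captures a bare match together with its leading space, because the space belongs to the mandatory part of the expression. Moving the space into the optional group, as in `((North |South )?America)`, yields just "America".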
  • 13. Entity Extraction – Adding a user dictionary
    • Create file ud.xml:
    <dictionary>
      <entities>
        <entity>
          <value>Dow Jones Industrial Average</value>
          <type>xIndex</type>
        </entity>
        <entity>
          <value>S&amp;P 500</value>
          <type>xIndex</type>
        </entity>
      </entities>
    </dictionary>
    • Create the policy with CTXLOAD (rules can be added later):
    ctxload -user scott/tiger -extract -name pol1 -file ud.xml
    • Compile the policy:
    ctx_entity.compile('pol1');
    • Results:
    <entity id="69" offset="1010" length="7" source="UserDictionary">
      <text>S&amp;P 500</text>
      <type>xIndex</type>
    </entity>
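A rough Python model of dictionary lookup, assuming exact, case-sensitive matching of each `<value>`; Oracle's actual matching rules (case, word boundaries, overlaps) are not described on the slide. `load_dictionary` and `dictionary_matches` are hypothetical helpers. The XML parser turns `S&amp;P 500` back into `S&P 500`, which is consistent with the reported length of 7.

```python
import re
import xml.etree.ElementTree as ET

# The ud.xml dictionary from the slide.
UD_XML = """<dictionary>
  <entities>
    <entity><value>Dow Jones Industrial Average</value><type>xIndex</type></entity>
    <entity><value>S&amp;P 500</value><type>xIndex</type></entity>
  </entities>
</dictionary>"""


def load_dictionary(xml_text):
    """Return (value, type) pairs; &amp; is unescaped by the parser."""
    root = ET.fromstring(xml_text)
    return [(e.findtext("value"), e.findtext("type")) for e in root.iter("entity")]


def dictionary_matches(text, dictionary):
    """Find every exact occurrence of each dictionary value, in document order."""
    hits = []
    for value, etype in dictionary:
        for m in re.finditer(re.escape(value), text):
            hits.append({"text": value, "type": etype,
                         "offset": m.start(), "length": len(value)})
    return sorted(hits, key=lambda h: h["offset"])


doc = "The S&P 500 rose while the Dow Jones Industrial Average fell."
for h in dictionary_matches(doc, load_dictionary(UD_XML)):
    print(h["offset"], h["length"], h["type"], h["text"])
```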
  • 14. Entity Extraction – Other Options
    • Extracting only certain entity types:
      – ctx_entity.extract('p1', mydoc, null, myresults, 'city,company,xContinent');
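Passing the type list to extract restricts the output at extraction time. If results were already extracted without a list, a similar selection can be applied on the client; a minimal Python sketch with a hypothetical `filter_by_type` helper and sample entities taken from the earlier slides (not real output for one document).

```python
# Sample entities from the earlier examples, not output for a single document.
ENTITIES = [
    {"text": "Hupplewhite Inc.", "type": "company"},
    {"text": "New York", "type": "city"},
    {"text": "North America", "type": "xContinent"},
    {"text": "S&P 500", "type": "xIndex"},
]


def filter_by_type(entities, type_list):
    """Keep only entities whose type appears in a comma-separated list,
    the same format as the type-list argument shown above."""
    wanted = {t.strip() for t in type_list.split(",") if t.strip()}
    return [e for e in entities if e["type"] in wanted]


print([e["text"] for e in filter_by_type(ENTITIES, "city,company,xContinent")])
# ['Hupplewhite Inc.', 'New York', 'North America']
```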
  • 15. Name Search
    • Searching names has many difficulties
      – Spelling (steven = stephen)
      – Alternate names (fred = alfred, chuck = charles)
      – Transcription (copying from spoken to written form)
      – Transliteration (copying from one writing system to another)
      – Segmentation (Mary Jane, Maryjane)
      – First, middle and last name classification
    • Name search does intelligent matching across all these issues
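To make a few of these issues concrete, here is a deliberately naive Python matcher covering alternate names, word reversal and segmentation only. It is not Oracle's algorithm: the two-entry nickname table is a hypothetical stand-in for a real alternate-name thesaurus, and spelling variation and transliteration are not handled.

```python
# Hypothetical nickname table; a real system would use a full thesaurus.
ALTERNATES = {"fred": "alfred", "chuck": "charles"}


def canonical_tokens(name):
    """Lower-case the name, split on whitespace and map nicknames to full names."""
    return [ALTERNATES.get(t, t) for t in name.lower().split()]


def names_match(a, b):
    ta, tb = canonical_tokens(a), canonical_tokens(b)
    # Word reversal / first-last ordering: compare tokens ignoring order.
    if sorted(ta) == sorted(tb):
        return True
    # Segmentation: "Mary Jane" vs "Maryjane" agree once spaces are removed.
    return "".join(ta) == "".join(tb)


print(names_match("Fred Smith", "Smith Alfred"))           # True
print(names_match("Mary Jane Watson", "Maryjane Watson"))  # True
print(names_match("Mary Watson", "Jane Watson"))           # False
```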
  • 17. NDATA section type
    • Basic implementation for name search
    • Limitations
      – 511 characters
      – 255 whitespace-delimited terms
      – No offset information, therefore no:
        • Highlighting / markup
        • NEAR or phrase search with NDATA
    • Uses WORDLIST preference attributes:
      – NDATA_ALTERNATE_SPELLING
      – NDATA_BASE_LETTER
      – NDATA_THESAURUS (for alternate names – default thesaurus provided)
      – NDATA_JOIN_PARTICLES (list such as 'de:du:mc:mac')
    • Query syntax
      – NDATA(fieldname, search terms [, order [, proximity ] ] )
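As an illustration of what base-letter folding and join particles could mean for matching, here is a minimal Python normalizer. Assumptions: base letters are obtained by stripping Unicode combining marks, and a listed particle is glued to the word that follows it; Oracle's actual NDATA processing is not described on the slide, and `normalize_ndata` is hypothetical. The length checks mirror the 511-character and 255-term limits listed above.

```python
import unicodedata


def base_letters(s):
    """Strip accents: decompose to NFD and drop combining marks (é -> e)."""
    return "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))


def normalize_ndata(name, join_particles="de:du:mc:mac"):
    """Hypothetical normalizer: base letters, lower case, and join each
    listed particle to the word that follows it."""
    if len(name) > 511 or len(name.split()) > 255:
        raise ValueError("exceeds NDATA limits (511 chars / 255 terms)")
    particles = set(join_particles.split(":"))
    tokens = base_letters(name).lower().split()
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in particles and i + 1 < len(tokens):
            out.append(tokens[i] + tokens[i + 1])  # "mac donald" -> "macdonald"
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


print(normalize_ndata("José Mac Donald"))    # ['jose', 'macdonald']
print(normalize_ndata("Jose MacDonald"))     # ['jose', 'macdonald']
print(normalize_ndata("Charles de Gaulle"))  # ['charles', 'degaulle']
```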
  • 18. Result Set Interface
    • Some queries are difficult to express in SQL
      – e.g. "Give me the top 5 hits in each category"
    • The result set interface takes a simple text query and an XML result set descriptor
    • The hitlist is returned in XML, as specified by the result set descriptor
    • Uses SDATA sections for
      – Grouping
      – Counting
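For comparison, this is the kind of grouping the descriptor expresses declaratively, done by hand on the client: a Python sketch of "top N hits per category", assuming the hits (each with a score and a category value) have already been fetched. `top_n_per_group` is a hypothetical helper.

```python
import heapq
from collections import defaultdict


def top_n_per_group(hits, group_key, n):
    """Bucket hits by group_key, then keep the n highest-scoring hits per bucket."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit[group_key]].append(hit)
    return {group: heapq.nlargest(n, bucket, key=lambda h: h["score"])
            for group, bucket in buckets.items()}


hits = [
    {"rowid": "a", "category": "news", "score": 10},
    {"rowid": "b", "category": "news", "score": 30},
    {"rowid": "c", "category": "news", "score": 20},
    {"rowid": "d", "category": "sport", "score": 5},
    {"rowid": "e", "category": "sport", "score": 15},
]
for group, best in top_n_per_group(hits, "category", 2).items():
    print(group, [h["rowid"] for h in best])
# news ['b', 'c']
# sport ['e', 'd']
```

Done this way, every hit has to be fetched before any grouping can happen.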
  • 19. Result Set Example Query
    ctx_query.result_set('docidx', 'oracle',
      '<ctx_result_set_descriptor>
         <count/>
         <hitlist start_hit_num="1" end_hit_num="2" order="pubDate desc, score desc">
           <score/>
           <rowid/>
           <sdata name="author"/>
           <sdata name="pubDate"/>
         </hitlist>
         <group sdata="pubDate">
           <count/>
         </group>
         <group sdata="author">
           <count/>
         </group>
       </ctx_result_set_descriptor>
      ', rs);
  • 20. Result Set Output
    <ctx_result_set>
      <hitlist>
        <hit>
          <score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
          <sdata name="AUTHOR">John</sdata>
          <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
        </hit>
        <hit>
          <score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
          <sdata name="AUTHOR">John</sdata>
          <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
        </hit>
      </hitlist>
      <count>100</count>
  • 21. Result Set Output - Continued
      <groups sdata="PUBDATE">
        <group value="2001-01-01 00:00:00"><count>25</count></group>
        <group value="2001-01-02 00:00:00"><count>50</count></group>
        <group value="2001-01-03 00:00:00"><count>25</count></group>
      </groups>
      <groups sdata="AUTHOR">
        <group value="John"><count>50</count></group>
        <group value="Mike"><count>25</count></group>
        <group value="Steve"><count>25</count></group>
      </groups>
    </ctx_result_set>
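A client can turn the result set document into plain data structures: the hit list, the total count and the per-value group counts. A minimal Python sketch using the output from the two slides above as a sample; `parse_result_set` is a hypothetical helper, and fetching `rs` from the database is not shown.

```python
import xml.etree.ElementTree as ET

# Sample output assembled from the two slides above.
RESULT_SET = """<ctx_result_set>
  <hitlist>
    <hit>
      <score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
    </hit>
    <hit>
      <score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid>
      <sdata name="AUTHOR">John</sdata>
      <sdata name="PUBDATE">2001-01-03 00:00:00</sdata>
    </hit>
  </hitlist>
  <count>100</count>
  <groups sdata="PUBDATE">
    <group value="2001-01-01 00:00:00"><count>25</count></group>
    <group value="2001-01-02 00:00:00"><count>50</count></group>
    <group value="2001-01-03 00:00:00"><count>25</count></group>
  </groups>
  <groups sdata="AUTHOR">
    <group value="John"><count>50</count></group>
    <group value="Mike"><count>25</count></group>
    <group value="Steve"><count>25</count></group>
  </groups>
</ctx_result_set>"""


def parse_result_set(xml_text):
    """Return (hits, total_count, facets) from a <ctx_result_set> document."""
    root = ET.fromstring(xml_text)
    hits = [{
        "score": int(hit.findtext("score")),
        "rowid": hit.findtext("rowid"),
        "sdata": {s.get("name"): s.text for s in hit.findall("sdata")},
    } for hit in root.findall("./hitlist/hit")]
    # Only the top-level <count> is the overall total; group counts are nested.
    total = int(root.findtext("count"))
    facets = {groups.get("sdata"): {g.get("value"): int(g.findtext("count"))
                                    for g in groups.findall("group")}
              for groups in root.findall("groups")}
    return hits, total, facets


hits, total, facets = parse_result_set(RESULT_SET)
print(total, len(hits), facets["AUTHOR"])
# 100 2 {'John': 50, 'Mike': 25, 'Steve': 25}
```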
  • 23. Roadmap – merging Text and SES
    • Oracle Text: Full Control
      – Fine-grained index options
      – Data storage options
      – Lexer options
      – Stoplists
      – Use existing database
      – RAC, Exadata
    • Secure Enterprise Search: Full Featured
      – Built-in database and mid-tier
      – Crawlers for many sources
      – Simple query interface
      – End-user GUI / API
      – Embedded security
  • 24. Coming Search Features
    • Natural language processing enhancements
      – Ontology-based classification
      – Question answering
    • Automatic partitioning
      – Query load balancing
    • Full support for faceted navigation (MVDATA sections)
    • Functional completeness for the Result Set Interface
      – Result iterator – streaming support
      – Parallel query
    • Replication support
      – GoldenGate / Logical Standby / Streams
    • Operator improvements
      – NEAR2 – best query in one operator
      – MNOT – mild not, e.g. YORK mnot NEW YORK
      – Nested NEAR
    • Substring index and query performance improvements
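Based only on the one-line description above, MNOT differs from NOT in that a document is still returned if it mentions the left term somewhere outside the excluded phrase. A Python sketch of that reading; `mnot_matches` is hypothetical and is not how Oracle evaluates the operator.

```python
import re


def _spans(text, phrase):
    """Character spans of whole-word, case-insensitive occurrences of phrase."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return [m.span() for m in re.finditer(pattern, text.lower())]


def mnot_matches(text, term, excluded_phrase):
    """True if `term` occurs at least once outside every `excluded_phrase` occurrence."""
    blocked = _spans(text, excluded_phrase)
    return any(not any(ps <= start and end <= pe for ps, pe in blocked)
               for start, end in _spans(text, term))


print(mnot_matches("Flights to New York", "york", "new york"))    # False
print(mnot_matches("From York to New York", "york", "new york"))  # True
print(mnot_matches("Visit Leeds", "york", "new york"))            # False
```

With an ordinary NOT, the second document would be rejected because it contains NEW YORK.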
  • 25. Coming Search Features - Continued
    • Multiple enhancements to query performance
      – BIGIO leverages SecureFiles CLOBs
      – Automatic optimization of indexes with a “stage index”
      – Two-level index – keep common search terms in memory
    • Partition maintenance without reindexing
    • Offload filtering from the database server
    • Section-specific index options
      – Choose different options, e.g. language, stopwords, PRINTJOINS, for each section
    • Regular-expression-based stopwords
    • Forward index
      – Greatly improved performance for highlighting and snippets
    • PDF “native” highlighting
    • Unlimited SDATA, MDATA and FIELD sections
  • 26. The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.