Oracle® Text Reference 11g Release 2 (11.2) Part Number E10944-01 |
|
|
View PDF |
This chapter describes the SQL statements and Oracle Text operators for creating and managing Text indexes and performing Text queries.
The following statements are described in this chapter:
Note:
This section describes theALTER
INDEX
statement as it pertains to managing a Text domain index.
For a complete description of the ALTER
INDEX
statement, see Oracle Database SQL Language Reference.
Purpose
Use ALTER
INDEX
to make changes to, or perform maintenance tasks for a CONTEXT
, CTXCAT
, or CTXRULE
index.
All Index Types
Use ALTER
INDEX
to perform the following tasks on all Oracle Text index types:
Rename the index or index partition. See ALTER INDEX RENAME Syntax.
Rebuild the index using different preferences. Some restrictions apply for the CTXCAT
index type. See ALTER INDEX REBUILD Syntax.
Add stopwords to the index. See ALTER INDEX REBUILD Syntax.
CONTEXT and CTXRULE Index Types
Use ALTER
INDEX
to perform the following tasks on CONTEXT
and CTXRULE
index types:
Resume a failed index operation (creation/optimization).
Add sections and stop sections to the index.
Replace index metadata.
See Also:
ALTER INDEX REBUILD Syntax to learn more about performing these tasks.Overview of ALTER INDEX Syntax
The syntax for ALTER
INDEX
is fairly complex. The major divisions are covered in the following sections:
ALTER INDEX MODIFY PARTITION Syntax—use this to modify an index partition's metadata.
ALTER INDEX PARAMETERS Syntax—use this to modify the parameters of a non-partitioned index, or to modify all partitions of a local partitioned index, without rebuilding the index.
ALTER INDEX RENAME Syntax—use this to rename an index or index partition.
ALTER INDEX REBUILD Syntax—use this to rebuild an index or index partition. With this command, you can also replace index metadata; add stopwords, sections, and stop sections to an index; and resume a failed operation.
The parameters for ALTER INDEX REBUILD
have their own syntax, which is a subset of the syntax for ALTER
INDEX
. For example, the ALTER INDEX REBUILD PARAMETERS
command can take either REPLACE
or RESUME
as an argument, and ALTER INDEX REBUILD PARAMETERS ('REPLACE')
can take several arguments. Valid examples of ALTER INDEX REBUILD
include the following statements:
ALTER INDEX REBUILD PARALLEL n ALTER INDEX REBUILD PARAMETERS ('SYNC memsize') ALTER INDEX REBUILD PARAMETERS ('REPLACE DATASTORE datastore_pref') ALTER INDEX REBUILD PARAMETERS ('REPLACE WORDLIST wordlist_pref')
ALTER INDEX MODIFY PARTITION Syntax
Use the following syntax to modify the metadata of an index partition:
ALTER INDEX index_name MODIFY PARTITION partition_name PARAMETER (paramstring)
Specify the name of the index whose partition metadata you want to modify.
Specify the name of the index partition whose metadata you want to modify.
The only valid argument here is 'REPLACE METADATA
'. This follows the same syntax as ALTER INDEX REBUILD PARTITION PARAMETERS ('REPLACE METADATA')
; refer to the REPLACE METADATA
subsection of the ALTER INDEX REBUILD Syntax section for more information. (The two commands are equivalent. ALTER INDEX MODIFY PARTITION
is offered for ease of use, and is the recommended syntax.)
ALTER INDEX PARAMETERS Syntax
Use the following syntax for modifying the parameters either of non-partitioned or local partitioned indexes, without rebuilding the index. For partitioned indexes, this command works at the index level, not at the partition level. This command changes information for the entire index, including all partitions.
ALTER INDEX index_name PARAMETERS (paramstring)
ALTER INDEX PARAMETERS
accepts the following arguments for paramstring;
'REPLACE METADATA
'
Replaces current metadata. Refer to the REPLACE METADATA
subsection of the ALTER INDEX REBUILD Syntax section for more information.
'ADD STOPWORD
'
Dynamically adds a stopword to an index. Refer to the ADD STOPWORD
subsection of the ALTER INDEX REBUILD Syntax section for more information.
'ADD FIELD SECTION
'
Dynamically adds a field section to an index. Refer to the ADD FIELD
subsection of the ALTER INDEX REBUILD Syntax section for more information.
'ADD ZONE SECTION
'
Dynamically adds a zone section to an index. Refer to the ADD ZONE
subsection of the ALTER INDEX REBUILD Syntax section for more information.
'ADD ATTR SECTION
'
Dynamically adds an attribute section to an index Refer to the ADD ATTR
subsection of the ALTER INDEX REBUILD Syntax section for more information.
Each of the above commands has an equivalent ALTER INDEX REBUILD PARAMETERS
version. For example, ALTER INDEX PARAMETERS ('REPLACE METADATA')
is equivalent to ALTER INDEX REBUILD PARAMETERS ('REPLACE METADATA')
. However, the ALTER INDEX PARAMETERS
versions work on either partitioned or non-partitioned indexes, whereas the ALTER INDEX REBUILD PARAMETERS
versions work only on non-partitioned indexes.
Use the following syntax to rename an index or index partition:
ALTER INDEX [schema.]index_name RENAME TO new_index_name; ALTER INDEX [schema.]index_name RENAME PARTITION part_name TO new_part_name;
Specify the name of the index to rename.
Specify the new name for schema.index.
The new_index_name
parameter can be no more than 25 bytes, and 21 bytes for a partitioned index. If you specify a name longer than 25 bytes (or longer than 21 bytes for a partitioned index), then Oracle Text returns an error and the renamed index is no longer valid.
Note:
Whennew_index_name
is more than 25 bytes (21 for local partitioned index) and less than 30 bytes, Oracle Text renames the index, even though the system returns an error. To drop the index and associated tables, you must drop new_index_name
with the DROP
INDEX
statement and then re-create and drop index_name
.Specify the name of the index partition to rename.
Specify the new name for partition.
Use ALTER
INDEX
REBUILD
to rebuild an index, rebuild an index partition, resume a failed operation, replace index metadata, add stopwords to an index, or add sections and stop sections to an index.
ALTER
INDEX
REBUILD
has its own sub-syntax. That is, its parameters have their own syntax. For example, the ALTER
INDEX
REBUILD
PARAMETERS
command can take either REPLACE
or RESUME
as an argument, and ALTER
INDEX
REBUILD
PARAMETERS
('REPLACE')
has several arguments it can take.
Valid examples of ALTER
INDEX
REBUILD
include the following statements:
ALTER INDEX REBUILD PARALLEL n ALTER INDEX REBUILD PARAMETERS (SYNC memsize) ALTER INDEX REBUILD PARAMETERS (REPLACE DATASTORE datastore_pref) ALTER INDEX REBUILD PARAMETERS (REPLACE WORDLIST wordlist_pref)
This is the syntax for ALTER
INDEX
REBUILD
:
ALTER INDEX [schema.]index REBUILD [PARTITION partname] [ONLINE] [PARAMETERS (paramstring)][PARALLEL N] ;
Rebuilds the index partition partname
. Only one index partition can be built at a time.
When you rebuild a partition you can specify only RESUME
or REPLACE
in paramstring. These operations work only on the partname
you specify.
With the REPLACE
operation, you can only specify MEMORY
and STORAGE
for each index partition.
Adding Partitions To add a partition to the base table, use the ALTER
TABLE
SQL statement. When you add a partition to an indexed table, Oracle Text automatically creates the metadata for the new index partition. The new index partition has the same name as the new table partition. Change the index partition name with ALTER
INDEX
RENAME
.
Splitting or Merging Partitions Splitting or merging a table partition with ALTER
TABLE
renders the index partition(s) invalid. You must rebuild them with ALTER
INDEX
REBUILD
.
ONLINE
enables you to continue to perform updates, inserts, and deletes on a base table. It does not enable you to query the base table.
Note:
You can specifyREPLACE
or RESUME
when rebuilding an index or an index partition ONLINE
.Optionally specify paramstring
. If you do not specify paramstring
, then Oracle Text rebuilds the index with existing preference settings.
The syntax for paramstring is as follows:
paramstring =
'REPLACE [DATASTORE datastore_pref] [FILTER filter_pref] [LEXER lexer_pref] [WORDLIST wordlist_pref] [STORAGE storage_pref] [STOPLIST stoplist] [SECTION GROUP section_group] [MEMORY memsize [[POPULATE | NOPOPULATE] [INDEX SET index_set] [METADATA preference new_preference] [[METADATA] SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)] [[METADATA] TRANSACTIONAL|NONTRANSACTIONAL | RESUME [memory memsize] | OPTIMIZE [token index_token | fast | full [maxtime (time | unlimited)] | SYNC [memory memsize] | ADD STOPWORD word [language language] | ADD ZONE SECTION section_name tag tag | ADD FIELD SECTION section_name tag tag [(VISIBLE | INVISIBLE)] | ADD ATTR SECTION section_name tag tag@attr | ADD STOP SECTION tag'
Rebuilds an index. You can optionally specify your own preferences, or system-defined preferences.
You can only replace preferences that are supported for that index type. For instance, you cannot replace index set for a CONTEXT
or CTXRULE
index. Similarly, for the CTXCAT
index type, you can replace lexer, wordlist, storage index set, and memory preferences.
POPULATE
is the default and does not need to be specified. If you want to empty the index of its contents, then specify NOPOPULATE
. Clear an index of its contents when you need to rebuild your index incrementally. The NOPOPULATE
choice is available for a specific partition of the index, and not just for the entire index.
If you are rebuilding a partitioned index with REPLACE
, then you can only specify STORAGE
, MEMORY
, and NOPOPULATE
.
See Also:
Chapter 2, "Oracle Text Indexing Elements" for more information about creating and setting preferences, including information about system-defined preferences.Replaces the existing preference
class settings, including SYNC
parameters, of the index with the settings from new_preference
. Only index preferences and attributes are replaced. The index is not rebuilt.
This command is useful for when you want to replace a preference and its attribute settings after the index is built, without reindexing all data. Reindexing data can require significant time and computing resources.
This command is also useful for changing the type of SYNC
, which can be automatic, manual, or on-commit.
ALTER INDEX REBUILD PARAMETER ('REPLACE METADATA')
does not work for a local partitioned index at the global level for the index. You cannot, for example, use this syntax to change a global preference, such as filter or lexer type, without rebuilding the index. Use ALTER INDEX PARAMETERS
instead to change the metadata of an index at the global level, including all partitions. Refer to "ALTER INDEX PARAMETERS Syntax".
When should I use the METADATA keyword? REPLACE METADATA
should be used only when the change in index metadata will not lead to an inconsistent index, which can lead to incorrect query results.
For example, use this command in the following instances:
To go from a single-language lexer to a multi-lexer in anticipation of multilingual data. For an example, see "Replacing Index Metadata: Changing Single-lexer to Multi-lexer".
To change the WILDCARD_MAXTERMS
setting in BASIC_WORDLIST.
To change the type of SYNC
, which can be automatic, manual, or on-commit.
These changes are safe and will not lead to an inconsistent index that might adversely affect your query results
Caution:
The REPLACE
METADATA
command can result in inconsistent index data, which can lead to incorrect query results. As such, Oracle does not recommend using this command, unless you carefully consider the effect it will have on the consistency of your index data and subsequent queries.
There can be many instances when changing metadata can result in inconsistent index data. For example, Oracle recommends against using the METADATA
keyword after doing the following procedures:
Changing the USER_DATASTORE procedure to a new PL/SQL stored procedure that has different output.
Changing the BASIC_WORDLIST attribute PREFIX_INDEX
from NO
to YES
because no prefixes have been generated for already-existing documents. Changing it from YES
to NO
is safe.
Adding or changing BASIC_LEXER
printjoin and skipjoin characters, because new queries with these characters would be lexed differently from how these characters were lexed at index time.
In these unsafe cases, Oracle recommends rebuilding the index.
Specify SYNC
for automatic synchronization of the CONTEXT
index when a DML change has occurred to the base table. You can specify one of the SYNC
methods shown in Table 1-1, "ALTER INDEX Sync Methods".
Table 1-1 ALTER INDEX Sync Methods
Sync Type | Description |
---|---|
|
No automatic synchronization. This is the default. You must manually synchronize the index with Use |
|
Automatically synchronize the index at a regular interval specified by the value of interval-string. interval-string takes the same syntax as that for scheduler jobs. Automatic synchronization using Make sure that interval-string is set to a long enough period so that any previous sync jobs will have completed. Otherwise, the sync job may hang. interval-string must be enclosed in double quotes ('' ''). See "Enabling Automatic Index Synchronization" for an example of automatic sync syntax. |
|
Synchronize the index immediately after a commit. The commit does not return until the sync is complete. (Because the synchronization is performed as a separate transaction, there may be a time period, usually small, when the data is committed but index changes are not.) The operation uses the memory specified with the memory parameter. Note that the sync operation has its own transaction context. If this operation fails, the data transaction still commits. Index synchronization errors are logged in the See "Enabling Automatic Index Synchronization" for an example of |
Each partition of a locally partitioned index can have its own type of sync: (ON COMMIT
, EVERY
, or MANUAL
). The type of sync specified in master parameter strings applies to all index partitions unless a partition specifies its own type.
With automatic (EVERY
) synchronization, you can specify memory size and parallel synchronization. The syntax is:
... EVERY interval_string MEMORY mem_size PARALLEL paradegree ...
ON
COMMIT
synchronizations can only be executed serially and at the same memory size as what was specified at index creation.
Note:
This command rebuilds the index. When you want to change theSYNC
setting without rebuilding the index, use the REBUILD REPLACE METADATA SYNC (MANUAL | ON COMMIT)
operation.This parameter enables you to turn the TRANSACTIONAL
property on or off. For more information, see "TRANSACTIONAL".
Using this parameter only succeeds if there are no rows in the DML pending queue. Therefore, you may need to sync the index before issuing this command.
To turn on the TRANSACTIONAL
index property:
ALTER INDEX myidx REBUILD PARAMETERS('replace metadata transactional');
or
ALTER INDEX myidx REBUILD PARAMETERS('replace transactional');
To turn off the TRANSACTIONAL
index property:
ALTER INDEX myidx REBUILD PARAMETERS('replace metadata nontransactional');
or
ALTER INDEX myidx REBUILD PARAMETERS('replace nontransactional');
Resumes a failed index operation. You can optionally specify the amount of memory to use with memsize
.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
and CTXRULE
indexes. It does not apply to CTXCAT
indexes.Note:
ThisALTER
INDEX
operation will not be supported in future releases.
To optimize your index, use CTX_DDL.OPTIMIZE_INDEX.
Optimizes the index. Specify token
, fast,
or full
optimization. You typically optimize after you synchronize the index.
When you optimize in token
mode, Oracle Text optimizes only index_token
. Use this method of optimization to quickly optimize index information for specific words.
When you optimize in fast
mode, Oracle Text works on the entire index, compacting fragmented rows. However, in fast
mode, old data is not removed.
When you optimize in full
mode, you can optimize the whole index or a portion. This method compacts rows and removes old data (deleted rows).
Note:
Optimizing infull
mode runs even when there are no deleted document rows. This is useful when you need to optimize time-limited batches with the maxtime
parameter.Use the maxtime
parameter to specify in minutes the time Oracle Text is to spend on the optimization operation. Oracle Text starts the optimization where it left off and optimizes until complete or until the time limit has been reached, whichever comes first. Specifying a time limit is useful for automating index optimization, where you set Oracle Text to optimize the index for a specified time on a regular basis.
When you specify maxtime
unlimited
, the entire index is optimized. This is the default. When you specify 0 for maxtime
, Oracle Text performs minimal optimization.
Log the progress of optimization by writing periodic progress updates to the CTX_OUTPUT
log. An event for CTX_OUTPUT.ADD_EVENT
, called CTX_OUTPUT.EVENT_OPT_PRINT_TOKEN
, prints each token as it is being optimized.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
and CTXRULE
indexes. It does not apply to CTXCAT
indexes.Note:
ThisALTER
INDEX
operation will not be supported in future releases.
To synchronize your index, use CTX_DDL.SYNC_INDEX.
Synchronizes the index. You can optionally specify the amount of runtime memory to use with memsize
. Synchronize the index when you have DML operations on your base table.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
and CTXRULE
indexes. It does not apply to CTXCAT
indexes.Memory Considerations The memory parameter memsize specifies the amount of memory Oracle Text uses for the ALTER
INDEX
operation before flushing the index to disk. Specifying a large amount of memory improves indexing performance because there is less I/O and improves query performance and maintenance because there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful if you want to track indexing progress or when run-time memory is scarce.
Dynamically adds a stopword word to the index.
Index entries for word
that existed before this operation are not deleted. However, subsequent queries on word are treated as though it has always been a stopword.
When your stoplist is a multi-language stoplist, you must specify language
.
The index is not rebuilt by this statement.
Dynamically adds the zone section section_name
identified by tag
to the existing index.
The added section section_name
applies only to documents indexed after this operation. For the change to take effect, you must manually re-index any existing documents that contain the tag.
The index is not rebuilt by this statement.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
and CTXRULE
indexes. It does not apply to ctxcat
indexes.See Also:
"Notes"Dynamically adds the field section section_name
identified by tag
to the existing index.
Optionally specify VISIBLE
to make the field sections visible. The default is INVISIBLE
.
See Also:
CTX_DDL.ADD_FIELD_SECTION for more information on visible and invisible field sections.The added section section_name
applies only to documents indexed after this operation. For the change to affect previously indexed documents, you must explicitly re-index the documents that contain the tag.
This statement does not rebuild the index.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
CTXRULE
indexes. It does not apply to CTXCAT
indexes.See Also:
"Notes".Dynamically adds an attribute section section_name to the existing index. You must specify the XML tag and attribute in the form tag@attr
. You can add attribute sections only to XML section groups.
The added section section_name applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag.
The index is not rebuilt by this statement.
Note:
ThisALTER
INDEX
operation applies only to CONTEXT
CTXRULE
indexes. It does not apply to CTXCAT
indexes.See Also:
"Notes".Dynamically adds the stop section identified by tag
to the existing index. As stop sections apply only to automatic sectioning of XML documents, the index must use the AUTO_SECTION_GROUP
section group. The tag you specify must be case sensitive and unique within the automatic section group or else ALTER
INDEX
raises an error.
The added stop section tag
applies only to documents indexed after this operation. For the change to affect previously indexed documents, you must explicitly re-index the documents that contain the tag.
The text within a stop section is always searchable.
The number of stop sections you can add is unlimited.
The index is not rebuilt by this statement.
See Also:
"Notes"Note:
ThisALTER
INDEX
operation applies only to CONTEXT
indexes. It does not apply to CTXCAT
indexes.Optionally specify with n the parallel degree for parallel indexing. This parameter is supported only when you use SYNC
, REPLACE
, and RESUME
in paramstring
. The actual degree of parallelism might be smaller depending on your resources.
Parallel indexing can speed up indexing when you have large amounts of data to index and when your operating system supports multiple CPUs.
The following statement resumes the indexing operation on newsindex
with 2 megabytes of memory:
ALTER INDEX newsindex REBUILD PARAMETERS('resume memory 2M');
The following statement rebuilds the index, replacing the stoplist preference with new_stop
.
ALTER INDEX newsindex REBUILD PARAMETERS('replace stoplist new_stop');
Rebuilding a Partitioned Index
The following example creates a partitioned text table, populates it, and creates a partitioned index. It then adds a new partition to the table and then rebuilds the index with ALTER
INDEX
as follows:
PROMPT create partitioned table and populate it create table part_tab (a int, b varchar2(40)) partition by range(a) (partition p_tab1 values less than (10), partition p_tab2 values less than (20), partition p_tab3 values less than (30)); insert into part_tab values (1,'Actinidia deliciosa'); insert into part_tab values (8,'Distictis buccinatoria'); insert into part_tab values (12,'Actinidia quinata'); insert into part_tab values (18,'Distictis Rivers'); insert into part_tab values (21,'pandorea jasminoides Lady Di'); insert into part_tab values (28,'pandorea rosea'); commit; PROMPT create partitioned index create index part_idx on part_tab(b) indextype is ctxsys.context local (partition p_idx1, partition p_idx2, partition p_idx3);
PROMPT add a partition and populate it alter table part_tab add partition p_tab4 values less than (40); insert into part_tab values (32, 'passiflora citrina'); insert into part_tab values (33, 'passiflora alatocaerulea'); commit;
The following statement rebuilds the index in the newly populated partition. In general, the index partition name for a newly added partition is the same as the table partition name, unless the name has already been used. In this case, Oracle Text generates a new name.
alter index part_idx rebuild partition p_tab4;
The following statement queries the table for the two hits in the newly added partition:
select * from part_tab where contains(b,'passiflora') >0; The following statement queries the newly added partition directly: select * from part_tab partition (p_tab4) where contains(b,'passiflora') >0;
Replacing Index Metadata: Changing Single-lexer to Multi-lexer
The following example demonstrates how an application can migrate from single-language documents (English) to multi-language documents (English and Spanish) by replacing the index metadata for the lexer.
REM create a simple table, which stores only english (American) text create table simple (text varchar2(80)); insert into simple values ('the quick brown fox'); commit; REM we'll create a simple lexer to lex this english text begin ctx_ddl.create_preference('us_lexer','basic_lexer'); end; / REM create a text index on the simple table create index simple_idx on simple(text) indextype is ctxsys.context parameters ('lexer us_lexer'); REM we can query easily select * from simple where contains(text, 'fox')>0; REM now suppose we want to start accepting spanish documents. REM first we have to extend the table with a language column alter table simple add (lang varchar2(10) default 'us'); REM now let's create a spanish lexer, begin ctx_ddl.create_preference('e_lexer','basic_lexer'); ctx_ddl.set_attribute('e_lexer','base_letter','yes'); end; / REM Then we create a multi-lexer incorporating our english and spanish lexers. REM Note that the DEFAULT lexer is the exact same lexer that we have already REM indexed all the documents with. begin ctx_ddl.create_preference('m_lexer','multi_lexer'); ctx_ddl.add_sub_lexer('m_lexer','default','us_lexer'); ctx_ddl.add_sub_lexer('m_lexer','spanish','e_lexer'); end; / REM now let's replace our metadata alter index simple_idx rebuild parameters ('replace metadata language column lang lexer m_lexer'); REM we're ready for some spanish data. Note that we could have inserted REM this BEFORE the alter index, as long as we didn't SYNC. insert into simple values ('el zorro marrón rápido', 'e'); commit; exec ctx_ddl.sync_index('simple_idx'); REM now we can query the spanish data with base lettering: select * from simple where contains(text, 'rapido')>0;
Optimizing the Index
To optimize your index, use CTX_DDL.OPTIMIZE_INDEX.
Synchronizing the Index
To synchronize your index, use CTX_DDL.SYNC_INDEX.
To add to the index the zone section author
identified by the tag <author>
, enter the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add zone section author tag author');
To add a stop section identified by tag <fluff>
to the index that uses the AUTO_SECTION_GROUP
, enter the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add stop section fluff');
Assume that the following text appears in an XML document:
<book title="Tale of Two Cities">It was the best of times.</book>
and that you want to create a separate section for the title attribute and you want to name the new attribute section booktitle
. To do so, enter the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add attr section booktitle tag title@book');
Using Flashback Queries
If a Text query is flashed back to a point before an ALTER
INDEX
statement was issued on the Text index for which the query is being run, then:
The query optimizer will not choose the index access path for that given index because the index is treated according to its creation time with ALTER
INDEX
. Therefore, to the query optimizer, the index is perceived not to exist.
The functional processing of the Text operator will fail with ORA-01466 or ORA-08176 errors if the ALTER
INDEX
statement involves re-creation of DR$
index tables.
To work around this issue, use the DBMS_FLASHBACK
package. For example:
EXEC dbms_flashback.enable_at_system_change_number(:scn); SELECT id from documents WHERE CONTAINS(text, 'oracle')>0; EXEC dbms_flashback.disable;
Note:
In previous releases, flashback Text queries usingAS
OF
predicates with Text operators such as CONTAINS
and CATSEARCH
are not supported.Notes
Before altering the index section information, Oracle Text checks the new section against the existing sections to ensure that all validity constraints are met. These constraints are the same for adding a section to a section group with the CTX_DDL
PL/SQL package and are as follows:
You cannot add zone, field, or stop sections to a NULL_SECTION_GROUP
.
You cannot add zone, field, or attribute sections to an automatic section group.
You cannot add attribute sections to anything other than XML section groups.
You cannot have the same tag for two different sections.
Section names for zone, field, and attribute sections cannot intersect.
You cannot exceed 64 fields per section.
You cannot add stop sections to basic, HTML, XML, or news section groups.
SENTENCE
and PARAGRAPH
are reserved section names.
Related Topics
CTX_DDL.SYNC_INDEX in Chapter 7, "CTX_DDL Package"
CTX_DDL.OPTIMIZE_INDEX in Chapter 7, "CTX_DDL Package"
Note:
This section describes theALTER
TABLE
statement as it pertains to adding and modifying a partitioned text table with a context domain index.
For a complete description of the ALTER
TABLE
statement, see Oracle Database SQL Language Reference.
Purpose
Use ALTER
TABLE
to add, modify, split, merge, exchange, or drop a partitioned text table with a context domain index. The following sections describe some of the ALTER
TABLE
operations.
Modify Partition Syntax
Unusable Local Indexes
ALTER TABLE [schema.]table MODIFY PARTITION partition UNUSABLE LOCAL INDEXES
Marks the index partition corresponding to the given table partition UNUSABLE
. You might mark an index partition unusable before you rebuild the index partition as described in "Rebuild Unusable Local Indexes".
If the index partition is not marked unusable, then the rebuild command returns without actually rebuilding the local index partition.
Rebuild Unusable Local Indexes
ALTER TABLE [schema.]table MODIFY PARTITION partition REBUILD UNUSABLE LOCAL INDEXES
Rebuilds the index partition corresponding to the specified table partition that has an UNUSABLE
status.
Note:
If the index partition status is alreadyVALID
before you enter this command, then this command does not rebuild the index partition. Do not depend on this command to rebuild the index partition unless the index partition status is UNUSABLE
.Add Partition Syntax
ALTER TABLE [schema.]table ADD PARTITION [partition] VALUES LESS THAN (value_list) [partition_description]
Adds a new partition to the high end of a range partitioned table.
To add a partition to the beginning or to the middle of the table, use ALTER TABLE SPLIT PARTITION
.
The newly added table partition is always empty, and the context domain index (if any) status for this partition is always VALID
. After doing DML, if you want to synchronize or optimize this newly added index partition, then you must look up the index partition name and enter the ALTER
INDEX
REBUILD
PARTITION
statement. For this newly added partition, index partition name is usually the same as the table partition name, but if the table partition name is already used by another index partition, the system assigns a name in the form of SYS_Pn
.
By querying the USER_IND_PARTITIONS
view and comparing the HIGH_VALUE
field, you can determine the index partition name for the newly added partition.
Merge Partition Syntax
ALTER TABLE [schema.]table MERGE PARTITIONS partition1, partition2 [INTO PARTITION [new_partition] [partition_description]] [UPDATE GLOBAL INDEXES]
Applies only to a range partition. This command merges the contents of two adjacent partitions into a new partition and then drops the original two partitions. If the resulting partition is non-empty, then the corresponding local domain index partition is marked UNUSABLE
. You can use ALTER TABLE MODIFY PARTITION
to rebuild the partition index.
For a global, non-partitioned index, if you perform the merge operation without an UPDATE GLOBAL INDEXES
clause, then the resulting index (if not NULL
) will be invalid and must be rebuilt. If you specify the UPDATE GLOBAL INDEXES
clause after the operation and the sync type is manual, then the index will be valid, but you will still need to synchronize the index with CTX_DDL.SYNC_INDEX for the update to take place.
The naming convention for the resulting index partition is the same as in ALTER TABLE ADD PARTITION
.
Split Partition Syntax
ALTER TABLE [schema.]table SPLIT PARTITION partition_name_old AT (value_list) [into (partition_description, partition_description)] [prallel_clause] [UPDATE GLOBAL INDEXES]
Applies only to range partition. This command divides a table partition into two partitions, thus adding a new partition to the table. The local corresponding index partitions will be marked UNUSABLE
if the corresponding table partitions are non-empty. Use ALTER TABLE MODIFY PARTITION
to rebuild the partition indexes.
For a global, non-partitioned index, if you perform the split operation without an UPDATE GLOBAL INDEXES
clause, then the resulting index (if not NULL
) will be invalid and must be rebuilt. If you specify the UPDATE GLOBAL INDEXES
clause after the operation and the sync type is manual, then the index will be valid, but you will still need to synchronize the index with CTX_DDL.SYNC_INDEX for the update to take place.
The naming convention for the two resulting index partition is the same as in ALTER TABLE ADD PARTITION
.
Exchange Partition Syntax
ALTER TABLE [schema.]table EXCHANGE PARTITION partition WITH TABLE table [INCLUDING|EXCLUDING INDEXES} [WITH|WITHOUT VALIDATION] [EXCEPTIONS INTO [schema.]table] [UPDATE GLOBAL INDEXES]
Converts a partition to a non-partitioned table, and converts a table to a partition of a partitioned table by exchanging their data segments. Rowids are preserved.
If EXCLUDING INDEXES
is specified, all the context indexes corresponding to the partition and all the indexes on the exchanged table are marked as UNUSABLE
. To rebuild the new index partition in this case, enter ALTER TABLE MODIFY PARTITION
.
If INCLUDING INDEXES
is specified, then for every local domain index on the partitioned table, there must be a non-partitioned domain index on the non-partitioned table. The local index partitions are exchanged with the corresponding regular indexes.
For a global, non-partitioned index, if you perform the exchange operation without an UPDATE GLOBAL INDEXES
clause, then the resulting index (if not NULL
) will be invalid and must be rebuilt. If you specify the UPDATE GLOBAL INDEXES
clause after the operation and the sync type is manual, then the index will be valid, but you will still need to synchronize the index with CTX_DDL.SYNC_INDEX for the update to take place.
Field Sections
Field section queries might not work the same if the non-partitioned index and local index use different section IDs for the same field section.
Storage
Storage is not changed. So if the index on the non-partitioned table $I table was in tablespace XYZ, then after the exchange partition, it will still be in tablespace XYZ, but now it is the $I table for an index partition.
Storage preferences are not switched, so if you switch and then rebuild the index, then the table may be created in a different location.
Restrictions
Both indexes must be equivalent. They must use the same objects and the same settings for each object. Note that Oracle Text only checks that the indexes are using the same object. But they should use the same exact everything.
No index object can be partitioned, that is, when the user has used the storage object to partition the $I, $N tables.
If either index or index partition does not meet all these restrictions an error is raised and both the index and index partition will be INVALID
. The user needs to manually rebuild both index and index partition using ALTER INDEX REBUILD
.
Truncate Partition Syntax
ALTER TABLE [schema.]table TRUNCATE PARTITION [DROP|REUSE STORAGE] [UPDATE GLOBAL INDEXES]
Removes all rows from a partition in a table. Corresponding CONTEXT
index partitions are also removed.
For a global, non-partitioned index, if you perform the truncate operation without an UPDATE GLOBAL INDEXES
clause, then the resulting index (if not NULL
) will be invalid and must be rebuilt. If you specify the UPDATE GLOBAL INDEXES
clause after the operation, the index will be valid.
ALTER TABLE Examples
Global Index on Partitioned Table Examples
The following example creates a range-partitioned table with three partitions. Each partition is populated with two rows. A global, non-partitioned CONTEXT
index is then created. To demonstrate the UPDATE GLOBAL INDEXES
clause, the partitions are split and merged with an index synchronization.
create table tdrexglb_part(a int, b varchar2(40)) partition by range(a) (partition p1 values less than (10), partition p2 values less than (20), partition p3 values less than (30)); insert into tdrexglb_part values (1,'row1'); insert into tdrexglb_part values (8,'row2'); insert into tdrexglb_part values (11,'row11'); insert into tdrexglb_part values (18,'row18'); insert into tdrexglb_part values (21,'row21'); insert into tdrexglb_part values (28,'row28'); commit; create index tdrexglb_parti on tdrexglb_part(b) indextype is ctxsys.context; create table tdrexglb(a int, b varchar2(40)); insert into tdrexglb values(20,'newrow20'); commit; PROMPT make sure query works select * from tdrexglb_part where contains(b,'row18') >0; PROMPT split partition alter table tdrexglb_part split partition p2 at (15) into (partition p21, partition p22) update global indexes; PROMPT before sync select * from tdrexglb_part where contains(b,'row11') >0; select * from tdrexglb_part where contains(b,'row18') >0; exec ctx_ddl.sync_index('tdrexglb_parti') PROMPT after sync select * from tdrexglb_part where contains(b,'row11') >0; select * from tdrexglb_part where contains(b,'row18') >0; PROMPT merge partition alter table tdrexglb_part merge partitions p22, p3 into partition pnew3 update global indexes; PROMPT before sync select * from tdrexglb_part where contains(b,'row18') >0; select * from tdrexglb_part where contains(b,'row28') >0; exec ctx_ddl.sync_index('tdrexglb_parti'); PROMPT after sync select * from tdrexglb_part where contains(b,'row18') >0; select * from tdrexglb_part where contains(b,'row28') >0; PROMPT drop partition alter table tdrexglb_part drop partition p1 update global indexes; PROMPT before sync select * from tdrexglb_part where contains(b,'row1') >0; exec ctx_ddl.sync_index('tdrexglb_parti'); PROMPT after sync select * from tdrexglb_part where contains(b,'row1') >0; PROMPT exchange partition alter table tdrexglb_part exchange partition pnew3 with table tdrexglb update global indexes; PROMPT before sync select * from tdrexglb_part where contains(b,'newrow20') >0; select * from tdrexglb_part where contains(b,'row28') >0; exec ctx_ddl.sync_index('tdrexglb_parti'); PROMPT after sync select * from tdrexglb_part where contains(b,'newrow20') >0; select * from tdrexglb_part where contains(b,'row28') >0; PROMPT move table partition alter table tdrexglb_part move partition p21 update global indexes; PROMPT before sync select * from tdrexglb_part where contains(b,'row11') >0; exec ctx_ddl.sync_index('tdrexglb_parti'); PROMPT after sync select * from tdrexglb_part where contains(b,'row11') >0; PROMPT truncate table partition alter table tdrexglb_part truncate partition p21 update global indexes; update global indexes;
Use the CATSEARCH
operator to search CTXCAT indexes. Use this operator in the WHERE
clause of a SELECT
statement.
The CATSEARCH
operator also supports database links. You can identify a remote table or materialized view by appending @dblink
to the end of its name. The dblink
must be a complete or partial name for a database link to the database containing the remote table or materialized view. (Indexing of remote views is not supported.)
The grammar of this operator is called CTXCAT
. You can also use the CONTEXT
grammar if your search criteria require special functionality, such as thesaurus, fuzzy matching, proximity searching or stemming. To utilize the CONTEXT
grammar, use the Query Template Specification in the text_query
parameter as described in this section.
About Performance
Use the CATSEARCH
operator with a CTXCAT index mainly to improve mixed query performance. Specify your text query condition with text_query
and your structured condition with structured_query
.
Internally, Oracle Text uses a combined b-tree index on text and structured columns to quickly produce results satisfying the query.
Limitations
If the optimizer chooses to use the functional query invocation, then your query will fail. The optimizer might choose functional invocation when your structured clause is highly selective.
The structured_query
argument of the CATSEARCH
operator must reference columns used during CREATE
INDEX
sets; otherwise, error DRG-10845 will be raised. For example, the error will be raised if you issue a CATSEARCH
query on a view created on top of a table with CTXCAT
index on it, and the name of the logical column on the view is different from the actual column name on the physical table. The columns referenced by the structured_query
argument of CATSEARCH
operator must be the physical column name used during CREATE
INDEX
sets, not the logical column on the view.
Syntax
CATSEARCH(
[schema.]column, text_query [VARCHAR2|CLOB], structured_query VARCHAR2,
RETURN NUMBER;
Specify the text column to be searched on. This column must have a CTXCAT
index associated with it.
Specify one of the following to define your search in column
.
Query Template Specification (for using CONTEXT
grammar)
CATSEARCH query operations
The CATSEARCH
operator supports only the following query operations:
Logical AND
Logical OR
(|)
Logical NOT
(-)
" " (quoted phrases)
Wildcarding
The following table provides the syntax for these operators.
Table 1-2 CATSEARCH Query Operators
Operation | Syntax | Description of Operation |
---|---|---|
Logical |
a b c |
Returns rows that contain a, b and c. |
Logical |
a | b | c |
Returns rows that contain a, b, or c. |
Logical |
a - b |
Returns rows that contain a and not b. |
Hyphen with no space |
a-b |
Hyphen treated as a regular character. For example, if the hyphen is defined as skipjoin, words such as web-site are treated as the single query term website. Likewise, if the hyphen is defined as a printjoin, words such as web-site are treated as web-site in the |
" " |
"a b c" |
Returns rows that contain the phrase "a b c". For example, entering "Sony CD Player" means return all rows that contain this sequence of words. |
( ) |
(A B) | C |
Parentheses group operations. This query is equivalent to the |
Wildcard (right and double truncated) |
term* a*b |
The wildcard character matches zero or more characters. For example, do* matches dog, and gl*s matches glass. Left truncation not supported. Note: Oracle recommends that you create a prefix index if your application uses wildcard searching. Set prefix indexing with the BASIC_WORDLIST preference. |
The following limitations apply to these operators:
The left-hand side (the column name) must be a column named in at least one of the indexes of the index set.
The left-hand side must be a plain column name. Functions and expressions are not allowed.
The right-hand side must be composed of literal values. Functions, expressions, other columns, and subselects are not allowed.
Multiple criteria can be combined with AND
. Note that OR
is not supported.
When querying a remote table through a database link, the database link must be specified for CATSEARCH
as well as for the table being queried.
For example, these expressions are supported:
catsearch(text, 'dog', 'foo > 15') catsearch(text, 'dog', 'bar = ''SMITH''') catsearch(text, 'dog', 'foo between 1 and 15') catsearch(text, 'dog', 'foo = 1 and abc = 123') catsearch@remote(text, 'dog', 'foo = 1 and abc = 123')
And these expressions are not supported:
catsearch(text, 'dog', 'upper(bar) = ''A''') catsearch(text, 'dog', 'bar LIKE ''A%''') catsearch(text, 'dog', 'foo = abc') catsearch(text, 'dog', 'foo = 1 or abc = 3')
Specify a marked-up string that specifies a query template. Specify one of the following templates:
Query rewrite, used to expand a query string into different versions
Progressive relaxation, used to progressively enter less restrictive versions of a query to increase recall
Alternate grammar, used to specify CONTAINS
operators (See CONTEXT Query Grammar Examples)
Alternate language, used to specify alternate query language
Alternate scoring, used to specify alternate scoring algorithms
See Also:
The text_query parameter description forCONTAINS
for more information about the syntax for these query templates.Specify the structured conditions and the ORDER
BY
clause. There must exist an index for any column you specify. For example, if you specify 'category_id=1 order by bid_close'
, you must have an index for 'category_id, bid_close'
as specified with CTX_DDL
.ADD_INDEX
.
With structured_query
, you can use standard SQL syntax with only the following operators:
=
<=
>=
>
<
IN
BETWEEN
AND
(to combine two or more clauses)
Note:
You cannot use parentheses () in thestructured_query
parameter.Examples
Create the Table
The following statement creates the table to be indexed.
CREATE TABLE auction (category_id number primary key, title varchar2(20), bid_close date);
The following table inserts the values into the table:
INSERT INTO auction values(1, 'Sony CD Player', '20-FEB-2000'); INSERT INTO auction values(2, 'Sony CD Player', '24-FEB-2000'); INSERT INTO auction values(3, 'Pioneer DVD Player', '25-FEB-2000'); INSERT INTO auction values(4, 'Sony CD Player', '25-FEB-2000'); INSERT INTO auction values(5, 'Bose Speaker', '22-FEB-2000'); INSERT INTO auction values(6, 'Tascam CD Burner', '25-FEB-2000'); INSERT INTO auction values(7, 'Nikon digital camera', '22-FEB-2000'); INSERT INTO auction values(8, 'Canon digital camera', '26-FEB-2000');
Create the CTXCAT
Index
The following statements create the CTXCAT
index:
begin
ctx_ddl.create_index_set('auction_iset'); ctx_ddl.add_index('auction_iset','bid_close');
end; / CREATE INDEX auction_titlex ON auction(title) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS ('index set auction_iset');
Query the Table
A typical query with CATSEARCH
might include a structured clause as follows to find all rows that contain the word camera ordered by bid_close
:
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 8 Canon digital camera 26-FEB-00 7 Nikon digital camera 22-FEB-00
The following query finds all rows that contain the phrase Sony CD Player and that have a bid close date of February 20, 2000:
SELECT * FROM auction WHERE CATSEARCH(title, '"Sony CD Player"', 'bid_close=''20-FEB-00''')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 1 Sony CD Player 20-FEB-00
The following query finds all rows with the terms Sony and CD and Player:
SELECT * FROM auction WHERE CATSEARCH(title, 'Sony CD Player', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 4 Sony CD Player 25-FEB-00 2 Sony CD Player 24-FEB-00 1 Sony CD Player 20-FEB-00
The following query finds all rows with the term CD and not Player:
SELECT * FROM auction WHERE CATSEARCH(title, 'CD - Player', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 6 Tascam CD Burner 25-FEB-00
The following query finds all rows with the terms CD or DVD or Speaker:
SELECT * FROM auction WHERE CATSEARCH(title, 'CD | DVD | Speaker', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 3 Pioneer DVD Player 25-FEB-00 4 Sony CD Player 25-FEB-00 6 Tascam CD Burner 25-FEB-00 2 Sony CD Player 24-FEB-00 5 Bose Speaker 22-FEB-00 1 Sony CD Player 20-FEB-00
The following query finds all rows that are about audio equipment:
SELECT * FROM auction WHERE CATSEARCH(title, 'ABOUT(audio equipment)', NULL)> 0;
CONTEXT Query Grammar Examples
The following examples show how to specify the CONTEXT
grammar in CATSEARCH
queries using the template feature.
PROMPT PROMPT fuzzy: query = ?test PROMPT should match all fuzzy variations of test (for example, text) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> ?test </textquery> </query>','')>0 order by pk; PROMPT PROMPT fuzzy: query = !sail PROMPT should match all soundex variations of bot (for example, sell) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> !sail </textquery> </query>','')>0 order by pk; PROMPT PROMPT theme (ABOUT) query PROMPT query: about(California) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> about(California) </textquery> </query>','')>0 order by pk;
The following example shows a field section search against a CTXCAT
index using CONTEXT
grammar by means of a query template in a CATSEARCH
query.
-- Create and populate table create table BOOKS (ID number, INFO varchar2(200), PUBDATE DATE); insert into BOOKS values(1, '<author>NOAM CHOMSKY</author><subject>CIVIL RIGHTS</subject><language>ENGLISH</language><publisher>MIT PRESS</publisher>', '01-NOV-2003'); insert into BOOKS values(2, '<author>NICANOR PARRA</author><subject>POEMS AND ANTIPOEMS</subject><language>SPANISH</language> <publisher>VASQUEZ</publisher>', '01-JAN-2001'); insert into BOOKS values(1, '<author>LUC SANTE</author><subject>XML DATABASE</subject><language>FRENCH</language><publisher>FREE PRESS</publisher>', '15-MAY-2002'); commit; -- Create index set and section group exec ctx_ddl.create_index_set('BOOK_INDEX_SET'); exec ctx_ddl.add_index('BOOKSET','PUBDATE'); exec ctx_ddl.create_section_group('BOOK_SECTION_GROUP', 'BASIC_SECTION_GROUP'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','AUTHOR','AUTHOR'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','SUBJECT','SUBJECT'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','LANGUAGE','LANGUAGE'); exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','PUBLISHER','PUBLISHER'); -- Create index create index books_index on books(info) indextype is ctxsys.ctxcat parameters('index set book_index_set section group book_section_group'); -- Use the index -- Note that: even though CTXCAT index can be created with field sections, it -- cannot be accessed using CTXCAT grammar (default for CATSEARCH). -- We need to use query template with CONTEXT grammar to access field -- sections with CATSEARCH select id, info from books where catsearch(info, '<query> <textquery grammar="context"> NOAM within author and english within language </textquery> </query>', 'order by pubdate')>0;
Related Topics
Syntax for CTXCAT Index Type in this chapter.
Use the CONTAINS
operator in the WHERE
clause of a SELECT
statement to specify the query expression for a Text query.
The CONTAINS
operator also supports database links. You can identify a remote table or materialized view by appending @dblink
to the end of its name. The dblink
must be a complete or partial name for a database link to the database containing the remote table or materialized view. (Querying of remote views is not supported.)
CONTAINS
returns a relevance score for every row selected. Obtain this score with the SCORE operator.
The grammar for this operator is called the CONTEXT
grammar. You can also use CTXCAT
grammar if your application works better with simpler syntax. To do so, use the Query Template Specification in the text_query
parameter as described in this section.
See Also:
"The CONTEXT Grammar" topic in Oracle Text Application Developer's GuideSyntax
CONTAINS( [schema.]column, text_query [VARCHAR2|CLOB] [,label NUMBER]) RETURN NUMBER;
Specify the text column to be searched on. This column must have a Text index associated with it.
Specify one of the following:
The query expression that defines your search in column
.
A marked-up document that specifies a query template. Use one of the following templates:
Use this template to automatically write different versions of a query before you submit the query to Oracle Text. This is useful when you need to maximize the recall of a user query. For example, you can program your application to expand a single phrase query of 'cat dog' into the following queries:
{cat} {dog} {cat} ; {dog} {cat} AND {dog} {cat} ACCUM {dog}
These queries are submitted as one query and results are returned with no duplication. In this example, the query returns documents that contain the phrase cat dog as well as documents in which cat is near dog, and documents that have cat and dog.
This is done with the following template:
<query> <textquery lang="ENGLISH" grammar="CONTEXT"> cat dog <progression> <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq> <seq><rewrite>transform((TOKENS, "{", "}", " ; "))</rewrite></seq> <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq> <seq><rewrite>transform((TOKENS, "{", "}", "ACCUM"))</rewrite></seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>
The operator TRANSFORM
is used to specify the rewrite rules and has the following syntax (note that it uses double parentheses):
TRANSFORM((terms, prefix, suffix, connector))
Table 1-3 TRANSFORM Parameters
Parameter | Description |
---|---|
term |
Specify the type of terms to be produced from the original query. Specify either Specifying |
prefix |
Specify the literal string to be prepended to all terms. |
suffix |
Specify the literal string to be appended to all terms. |
connector |
Specify the literal string to connect all terms after applying prefix and suffix. |
Note:
An error will be raised if the input Text query string specified in the Query Rewrite Template withTRANSFORM
rules contains any Oracle Text query operators (such as AND
, OR
, or SOUNDEX
). Also, any special characters (such as %
or $
) in the input Text query string will need to be escaped, or an error will be raised.Use this template to progressively relax your query. Progressive relaxation is when you increase recall by progressively issuing less restrictive versions of a query, so that your application can return an appropriate number of hits to the user.
For example, the query of black pen can be progressively relaxed to:
black pen black NEAR pen black AND pen black ACCUM pen
This is done with the following template
<query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>black pen</seq> <seq>black NEAR pen</seq> <seq>black AND pen</seq> <seq>black ACCUM pen</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>
Use this template to specify an alternate grammar, such as CONTEXT
or CATSEARCH
. Specifying an alternate grammar enables you to enter queries using different syntax and operators.
For example, with CATSEARCH
, enter ABOUT
queries using the CONTEXT
grammar. Likewise with CONTAINS
, enter logical queries using the simplified CATSEARCH
syntax.
The phrase 'dog cat mouse' is interpreted as a phrase in CONTAINS
. However, with CATSEARCH
this is equivalent to a AND
query of 'dog AND cat AND mouse'. To specify that CONTAINS
use the alternate grammar, enter the following template:
<query> <textquery grammar="CTXCAT">dog cat mouse</textquery> <score datatype="integer"/> </query>
Use this template to specify an alternate language.
<query><textquery lang="french">bon soir</textquery></query>
Use this template to specify an alternative scoring algorithm.
The following example specifies that the query use the CONTEXT
grammar and return integer scores using the COUNT
algorithm. This algorithm returns score as the number of query occurrences in the document.
<query> <textquery grammar="CONTEXT" lang="english"> mustang </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>
The following example uses the normalization_expr
attribute to add SDATA
(price) into the score returned by the query, and uses it as the final score.
<query> <textquery grammar="CONTEXT" lang="english"> DEFINESCORE(dog, RELEVANCE) and cat </textquery> <score algorithm="COUNT" normalization_expr ="doc_score+ SDATA(price)"/> </query>
The normalization_expr
attribute is used only with the alternate scoring template, and is an arithmetic expression that consists of:
Arithmetic operators: + - * /. The operator precedence is the same as that for SQL operator precedence.
Grouping operators: (). Parentheses can be used to alter the precedence of the arithmetic operators.
Absolute function: ABS(n) returns the absolute value of n; where n is any expression that returns a number.
Logarithmic function: LOG(n): returns the base-10 logarithmic value of n; where n is any expression that returns a number.
Predefined components: The doc_score
predefined component can be used to return the initial query score of a particular document.
SDATA
component: SDATA
(name) returns the value of the SDATA
with the specified name as the score.
Only SDATA
with NUMBER
or DATE
datatype is allowed. An error is raised otherwise.
The sdata string and the SDATA
name are case-insensitive.
As an SDATA
section value can be NULL, any expression with NULL SDATA
section value is evaluated as 0. For example: the normalization_expr "doc_score + SDATA(price)"
will be evaluated to 0 if SDATA(price)
for a given document has NULL value.
Numeric literals: Any number literal that conforms to SQL's pattern of NUMBER
literal and is within the range of the double precision floating point (-3.4e38
to 3.4e38
).
Date Literals: Date literals must be enclosed with DATE
(). Only the following format is allowed: YYYY-MM-DD
or YYYY-MM-DD HH24:MI:SS
. For example: DATE(2005-11-08)
.Consistent with SQL, if no time is specified, then 00:00:00
is assumed.
Only the minus (-
) operator is allowed between date-type data (DATE
literals and date-type SDATA
). Using other operators will result in an error. Subtracting two date-type data will produce a number (float) that represents the difference in number of days between the two dates. For example, the following expression is allowed:
SDATA(dob) – DATE(2005-11-08)
and the following expression is not allowed:
SDATA(dob) + DATE(2005-11-08)
The plus (+
) and minus (-
) operators are allowed between numeric data and date type of data. The number operand is interpreted as the number or fraction of day(s). For example, the following expression is allowed:
DATE(2005-11-08) + 1 = 9 NOV 2005
and the following expression is not allowed:
DATE(2005-11-08)* 3 = ERROR
Template Attribute Values
Table 1-4 gives the possible values for template attributes:
Table 1-4 Template Attribute Values
Tag Attribute | Description | Possible Values | Meaning |
---|---|---|---|
grammar= |
Specify the grammar of the query. |
|
|
datatype= |
Specify the type of number returned as score. |
|
Returns score as integer between 0 and 100. Returns score as its high precision floating point number between 0 and 100. |
algorithm= |
Specify the scoring algorithm to use. |
|
Default. Returns scores as the number of occurrences in document. |
lang= |
Specify the language name. |
Any language supported by Oracle Database. See the Oracle Database Globalization Support Guide. |
Template Grammar Definition
The query template interface is an XML document. Its grammar is defined with the following XML DTD:
<!ELEMENT query (textquery, score?)> <!ELEMENT textquery (#PCDATA|progression)*> <!ELEMENT progression (seq)+> <!ELEMENT seq (#PCDATA|rewrite)*> <!ELEMENT rewrite (#PCDATA)> <!ELEMENT score EMPTY> <!ATTLIST textquery grammar (context | ctxcat) #IMPLIED> <!ATTLIST textquery language CDATA #IMPLIED> <!ATTLIST score datatype (integer | float) "integer"> <!ATTLIST score algorithm (default | count) "default">
All tags and attributes values are case-sensitive.
See Also:
Chapter 3, "Oracle Text CONTAINS Query Operators" for more information about the operators in query expressions.Optionally, specify the label that identifies the score generated by the CONTAINS
operator.
Returns
For each row selected, CONTAINS
returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle Text found no matches in the row.
Note:
You must use theSCORE
operator with a label to obtain this number.The following example searches for all documents in the in the text
column that contain the word oracle. The score for each row is selected with the SCORE
operator using a label of 1:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0;
The CONTAINS
operator must be followed by an expression such as > 0, which specifies that the score value calculated must be greater than zero for the row to be selected.
When the SCORE
operator is called (for example, in a SELECT
clause), the CONTAINS
clause must reference the score label value as in the following example:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
The following example specifies that the query be parsed using the CATSEARCH
grammar:
SELECT id FROM test WHERE CONTAINS (text, '<query> <textquery lang="ENGLISH" grammar="CATSEARCH"> cheap pokemon </textquery> <score datatype="INTEGER"/> </query>' ) > 0;
Grammar Template Example
The following example shows how to use the CTXCAT
grammar in a CONTAINS
query. The example creates a CTXCAT
and a CONTEXT
index on the same table, and compares the query results:
PROMPT create context and ctxcat indexes both with theme indexing on PROMPT create index tdrbqcq101x on test(text) indextype is ctxsys.context parameters ('lexer theme_lexer'); create index tdrbqcq101cx on test(text) indextype is ctxsys.ctxcat parameters ('lexer theme_lexer'); PROMPT ***** San Diego *********** PROMPT ***** CONTEXT grammar *********** PROMPT ** should be interpreted as phrase query ** select pk||' ==> '||text from test where contains(text,'San Diego')>0 order by pk; PROMPT ***** San Diego *********** PROMPT ***** CTXCAT grammar *********** PROMPT ** should be interpreted as AND query *** select pk||' ==> '||text from test where contains(text, '<query> <textquery grammar="CTXCAT">San Diego</textquery> <score datatype="integer"/> </query>')>0 order by pk; PROMPT ***** Hitlist from CTXCAT index *********** select pk||' ==> '||text from test where catsearch(text,'San Diego','')>0 order by pk;
Alternate Scoring Query Template Example
The following query template adds price SDATA
section (or SDATA
filter-by column) value into the score returned by the query and uses it as the final score.
<query> <textquery grammar="CONTEXT" lang="english"> DEFINESCORE(dog, RELEVANCE) and cat </textquery> <score algorithm="COUNT" normalization_expr ="doc_score+SDATA(price)"/> </query>
Query Relaxation Template Example
The following query template defines a query relaxation sequence. The query of black pen is entered in sequence as black pen then black NEAR pen then black AND pen then black ACCUM pen. Query hits are returned in this sequence with no duplication as long as the application needs results.
select id from docs where CONTAINS (text, ' <query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>black pen</seq> <seq>black NEAR pen</seq> <seq>black AND pen</seq> <seq>black ACCUM pen</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0;
Query relaxation is most effective when your application needs the top n hits to a query, which you can obtain with the FIRST_ROWS
hint or in a PL/SQL cursor.
Query Rewrite Example
The following template defines a query rewrite sequence. The query of kukui nut is rewritten as follows:
{kukui} {nut}
{kukui} ; {nut}
{kukui} AND {nut}
{kukui} ACCUM {nut}
select id from docs where CONTAINS (text, ' <query> <textquery lang="ENGLISH" grammar="CONTEXT"> kukui nut <progression> <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq> <seq><rewrite>transform((TOKENS, "{", "}", " ; "))</rewrite>/seq> <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite><seq/> <seq><rewrite>transform((TOKENS, "{", "}", "ACCUM"))</rewrite><seq/> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0;
Notes
Querying Multi-Language Tables
With the multi-lexer preference, you can create indexes from multi-language tables. At query time, the multi-lexer examines the session's language setting and uses the sub-lexer preference for that language to parse the query. If the language setting is not mapped, then the default lexer is used.
When the language setting is mapped, the query is parsed and run as usual. The index contains tokens from multiple languages, so such a query can return documents in several languages.
To limit your query to returning document of a given language, use a structured clause on the language column.
Query Performance Limitation with a Partitioned Index
Oracle Text supports the CONTEXT
indexing and querying of a partitioned text table.
However, for optimal performance when querying a partitioned table with an ORDER
BY
SCORE
clause, query the partition. If you query the entire table and use an ORDER
BY
SCORE
clause, the query might not perform optimally unless you include a range predicate that can limit the query to a single partition.
For example, the following statement queries the partition p_tab4 partition directly:
select * from part_tab partition (p_tab4) where contains(b,'oracle') > 0 ORDER BY SCORE DESC;
Related Topics
Syntax for CONTEXT Index Type in this chapter
Chapter 3, "Oracle Text CONTAINS Query Operators"
"The CONTEXT Grammar" topic in Oracle Text Application Developer's Guide
SCORE in this chapter
This section describes the CREATE
INDEX
statement as it pertains to creating an Oracle Text domain index and composite domain index.
See Also:
Oracle Database SQL Language Reference for a complete description of theCREATE
INDEX
statementPurpose
Use CREATE
INDEX
to create an Oracle Text index. An Oracle Text index is an Oracle Database domain index or composite domain index of type CONTEXT
, CTXCAT,
CTXRULE,
or CTXXPATH
. A domain index is an application-specific index. A composite domain index (CDI) is an Oracle Text index that not only indexes and processes a specified text column, but also indexes and processes FILTER
BY
and ORDER
BY
structured columns, which are specified during index creation.
You must create an appropriate Oracle Text index to enter CONTAINS
, CATSEARCH
, or MATCHES
queries.
You cannot create an Oracle Text index on an index-organized table.
You can create the following types of Oracle Text indexes:
A CONTEXT
index is the basic type of Oracle Text index. This is an index on a text column. A CONTEXT
index is useful when your source text consists of many large, coherent documents. Query this index with the CONTAINS
operator in the WHERE
clause of a SELECT
statement. This index requires manual synchronization after DML. See "Syntax for CONTEXT Index Type".
The CTXCAT
type of index is a combined index on a text column and one or more other columns. CTXCAT
is typically used to index small documents or text fragments, such as item names, prices and descriptions found in catalogs. Query this index with the CATSEARCH
operator in the WHERE
clause of a SELECT
statement. This type of index is optimized for mixed queries. This index is transactional, automatically updating itself with DML to the base table. See "Syntax for CTXCAT Index Type".
A CTXRULE
index is used to build a document classification application. The CTXRULE
index is an index created on a table of queries or a column containing a set of queries, where the queries serve as rules to define the classification criteria. Query this index with the MATCHES
operator in the WHERE
clause of a SELECT
statement. See "Syntax for CTXRULE Index Type".
Create this index when you need to speed up existsNode()
queries on an XMLType column. See "Syntax for CTXXPATH Index Type".
You do not need the CTXAPP
role to create an Oracle Text index. If you have Oracle Database grants to create a b-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is consistent with Oracle standards for creating regular B-tree indexes.
Use a CONTEXT
index to create an index on a text column. Query this index with the CONTAINS
operator in the WHERE
clause of a SELECT
statement. This index requires manual synchronization after DML.
CREATE INDEX [schema.]index ON [schema.]table(txt_column) INDEXTYPE IS ctxsys.context [ONLINE] [FILTER BY filter_column[, filter_column]...] [ORDER BY oby_column[desc|asc][, oby_column[desc|asc]]...] [LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE]];
Specify the name of the Text index to create.
Specify the name of the table and column to index. txt_column is the name of the domain index column on which the CONTAINS()
operator will be invoked.
Your table can optionally contain a primary key if you prefer to identify your rows as such when you use procedures in CTX_DOC
. When your table has no primary key, document services identifies your documents by ROWID
.
The column that you specify must be one of the following types: CHAR
, VARCHAR
, VARCHAR2
, BLOB
, CLOB
, BFILE
, XMLType
, or URIType
.
The table that you specify can be a partitioned table. If you do not specify the LOCAL
clause, then a global, non-partitioned index is created.
DATE
, NUMBER
, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be indexed, provided that they are atomic data types.
Attempting to create an index on a Virtual Private Database (VPD) protected table will fail unless one of the following criteria is true:
The VPD policy is created such that it does not apply to INDEX
statement type.
The policy function returns a NULL predicate for the current user.
The user (or index owner) is SYS
.
The user has the EXEMPT
ACCESS
POLICY
privilege.
Indexes on multiple columns are not supported with the CONTEXT
index type. You must specify only one column in the column list.
Note:
With theCTXCAT
index type, you can create indexes on text and structured columns. See "Syntax for CTXCAT Index Type" in this chapter.Creates the index while enabling DML inserts/updates/deletes on the base table.
During indexing, Oracle Text enqueues DML requests in a pending queue. At the end of the index creation, Oracle Text locks the base table. During this time DML is blocked. You must synchronize the index in order for DML changes to be available.
Limitations
The following limitations apply to using ONLINE
:
At the very beginning or very end of the ONLINE
process, DML might fail.
ONLINE
is supported for CONTEXT
indexes only.
This is the structured indexed column on which a range or equality predicate in the WHERE
clause of a mixed query will operate. You can specify one or more structured columns for filter_column
, on which the relational predicates are expected to be specified along with the CONTAINS()
predicate in a query.
The cost-based optimizer (CBO) will consider pushing down the structured predicates on these FILTER
BY
columns with the following relational operators: <
, <=
, =
, >=
, >
, between
, and LIKE
(for VARCHAR2
).
These columns can only be of CHAR
, NUMBER
, DATE
, VARCHAR2
, or RAW
type. Additionally, VARCHAR2
and RAW
types are only supported if the maximum length is specified and is limited to no more than 249
. ADT
attributes of supported types (CHAR
, NUMBER
, DATE
, VARCHAR2
, or RAW
) are also allowed. An error is raised for all other datatypes. Expressions, for example, func(cola)
, and virtual columns are not allowed.
txt_column
is allowed in the FILTER
BY
column list.
DML operations on FILTER
BY
columns are always transactional.
This is the structured indexed column on which a structured ORDER
BY
mixed query will be based. A list of structured oby_columns can be specified in the ORDER
BY
clause of a CONTAINS()
query.
These columns can only be of CHAR
, NUMBER
, DATE
, VARCHAR2
, or RAW
type. Additionally, VARCHAR2
and RAW
types are only supported if the maximum length is specified and is limited to no more than 249
. Expressions, for example, func(cola)
, and virtual columns are not allowed.
The order of the specified columns matters. The cost based optimizer will consider pushing the sort into the composite domain index only if the ORDER
BY
clause in the text query contains:
entire ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
,
The cost based optimizer (CBO) will consider pushing the sort into the composite domain index (CDI) only if the ORDER
BY
clause in the text query contains:
entire ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
,
score
followed by the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
, or
score
following the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
.
CBO will consider pushing the sort into the CDI only if the ORDER
BY
clause in the text query contains:
entire ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
,
only the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY clause during CREATE
INDEX
,
score followed by the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
,
or score following the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
.
The following example illustrates CBO behavior with regard to ORDER
BY
columns:
CREATE INDEX foox ON foo(D) INDEXTYPE IS CTXSYS.CONTEXT FILTER BY B, C ORDER BY A, B desc;
Consider the following query:
SELECT A, SCORE(1) FROM foo WHERE CONTAINS(D, 'oracle',1)>0 AND C>100 ORDER BY col_list;
Note:
If you setNLS_SORT
or NLS_COMP
parameters (that is, alter session set NLS_SORT = <some lang>;
), then CBO will not push the Sort or related structured predicate into the CDI. This behavior is consistent with regular btree indexes.CBO will consider pushing the sort into CDI if col_list
has the following values:
B B,A SCORE(1), B B, SCORE(1) A, B, C A, B asc
(or simply A
, B
)
CBO will NOT consider to push the sort into CDI if col_list has the following values:
B B,A SCORE(1), B B, SCORE(1) A, B, C A, B asc (or simply A, B)
(or simply A
, B
)
score
followed by the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
,
or score
following the prefix of the ordered ORDER
BY
columns declared by the ORDER
BY
clause during CREATE
INDEX
.
Expressions, for example, func(cola)
, are not allowed.
txt_column
appearing in the ORDER
BY
column list is allowed.
DML operations on ORDER
BY
columns are always transactional.
Limitations
The following limitations apply to FILTER
BY
and ORDER
BY
:
A structured column is allowed in FILTER
BY
and ORDER
BY
clauses. However, a column that is mapped to MDATA
in a FILTER
BY
clause cannot also appear in the ORDER
BY
clause. An error will be raised in this case.
The maximum length for CHAR
, VARCHAR2
and RAW
columns cannot be greater than 249. Additionally, if the VARCHAR2
or RAW
column is mapped to an MDATA
column, then the specified maximum length cannot exceed 64 and 32 bytes, respectively. (Note that MDATA
does not support CHAR
datatypes. If a FILTER
BY
column of CHAR
datatype is mapped to an MDATA
section, then an error will be raised during CREATE
INDEX
.)
The sum of number of FILTER
BY
and ORDER
BY
columns cannot be greater than 32.
Note:
As with concatenated btree indexes or bitmap indexes, performance degradation may occur in DML as the number of FILTER
BY
and ORDER
BY
columns increases.
Mapping a FILTER
BY
column to MDATA
is not recommended if the FILTER
BY
column contains sequential values or has very high cardinality. Doing so can result in very long and narrow $I
table and reduced $X
performance. An example is a column of type DATE
. For columns of this type, mapping to SDATA
is recommended.
Note:
An index table with the nameDR$
indextable$S
is created to store FILTER
BY
and ORDER
BY
columns that are mapped to SDATA
sections. If nothing is mapped to an SDATA
section, then the $S
table will not be created.
$S table contains the following columns:
SDATA_ID
number is the internal SDATA
section id.
SDATA_LAST
number, the last docid, which is analogous to token_last
.
SDATA_DATA
RAW
(2000)
, the compressed SDATA
values. Note that if $S is created on a tablespace with 4K db block size, then it will be defined as RAW
(1500)
.
Restriction: For performance reasons, $S
table must be created on a tablespace with db block size >= 4K without overflow segment and without PCTTHRESHOLD
clause. If $S
is created on a tablespace with db block size < 4K, or is created with an overflow segment or with PCTTHRESHOLD
clause, then appropriate errors will be raised during CREATE
INDEX
.
Restriction on Exporting and Importing text tables with Composite Domain Index created with FILTER
BY
and/or ORDER
BY
clauses:
Regular exp
and imp
will not support exporting and importing of Composite Domain Index. Doing so will lead to the following error: EXP-00113: Feature Composite Domain Index is unsupported
.
To export a text table with Composite Domain Index, you must use Data Pump Export and Import utilities (invoked with the expdp
and impdp
commands, respectively), or DBMS_DATAPUMP
PL/SQL package.
Limitations of using ALTER INDEX and ALTER TABLE with FILTER BY and ORDER BY columns of the Composite Domain Index, which are imposed by Extensible Indexing Framework in Oracle Database:
(These limitations are imposed by Extensible Indexing Framework in Oracle Database.)
Using ALTER
INDEX
to add or drop FILTER
BY
and ORDER
BY
columns is currently not supported. You must re-create the index to add or drop FILTER
BY
or ORDER
BY
columns.
To use ALTER
TABLE
MODIFY
COLUMN
to modify the datatype of a column that has the composite domain index built on it, you must first drop the composite domain index before modifying the column.
To use ALTER
TABLE
DROP
COLUMN
to drop a column that is part of the composite domain index, you must first drop the composite domain index before dropping the index column.
The following limitations apply to FILTER
BY
and ORDER
BY
when used with PL/SQL packages:
Mapping FILTER
BY
columns to sections is optional. If section mapping does not exist for a FILTER
BY
column, then it is mapped to an SDATA
section by default. The section name assumes the name of the FILTER
BY
column.
If a section group is not specified during CREATE
INDEX
of a composite domain index, then system default section group settings are used. An SDATA
section is created for each of the FILTER
BY
and ORDER
BY
columns.
Note:
Since section name does not allow certain special characters and is case insensitive, if the column name is case sensitive or contains special characters, then an error will be raised. To work around this problem, you need to map the column to anMDATA
or SDATA
section before creating the index. See CTX_DDL.ADD_MDATA_COLUMN or CTX_DDL.ADD_SDATA_COLUMN.An error is raised if a column that is mapped to MDATA
also appears in the ORDER
BY
column clause.
Column section names are unique to their section group. That is, you cannot have an MDATA
column section named FOO
if you already have an MDATA
column section named FOO
. Nor can you have a field section named FOO
if you already have an SDATA
column section named FOO
. This is true whether it is implicitly created (by CREATE
INDEX
for FILTER
BY
or ORDER
BY
clauses) or explicitly created (by CTX_DDL.ADD_SDATA_COLUMN
).
One section name can only be mapped to one FILTER
BY
column, and vice versa. Mapping a section to more than one column, or mapping a column to more than one section is not allowed.
Column sections can be added to any type of section group, including the NULL section group.
If a section group with sections added by CTX_DDL.ADD_MDATA_COLUMN
or CTX_DDL.ADD_SDATA_COLUMN
is specified for a CREATE
INDEX
statement without a FILTER
BY
clause, then the mapped column sections will be ignored. However, the index will still get created without those column sections. The same is true for a FILTER
BY
clause that does not contain mapped column(s) in the specified section group.
See Also:
CTX_DDL.ADD_SDATA_COLUMNSpecify LOCAL
to create a local partitioned context index on a partitioned table. The partitioned table must be partitioned by range. Hash, composite and list partitions are not supported.
You can specify the list of index partition names with partition_name. If you do not specify a partition name, then the system assigns one. The order of the index partition list must correspond to the table partition by order.
The PARAMETERS
clause associated with each partition specifies the parameters string specific to that partition. You can only specify sync (manual|every |on commit), memory and storage for each index partition.
The PARAMETERS
clause also supports POPULATE
and NOPOPULATE
. See POPULATE | NOPOPULATE.
Query the views CTX_INDEX_PARTITIONS or CTX_USER_INDEX_PARTITIONS to find out index partition information, such as index partition name, and index partition status.
See Also:
"Creating a Local Partitioned Index"Query Performance Limitation with Partitioned Index
For optimal performance when querying a partitioned index with an ORDER
BY
SCORE
clause, query the partition. If you query the entire table and use an ORDER
BY
SCORE
clause, the query might not perform optimally unless you include a range predicate that can limit the query to the fewest number of partitions, which is optimally a single partition.
Optionally specify with n the parallel degree for parallel indexing. The actual degree of parallelism might be smaller depending on your resources.
You can use this parameter on non-partitioned tables. However, creating a non-partitioned index in parallel does not turn on parallel query processing.
Parallel indexing is supported for creating a local partitioned index.
See Also:
"Creating a Local Partitioned Index in Parallel"
"Performance Tuning" chapter in Oracle Text Application Developer's Guide
Performance
Parallel indexing can speed up indexing when you have large amounts of data to index and when your operating system supports multiple CPUs.
Note:
UsingPARALLEL
to create a local partitioned index enables parallel queries. (Creating a non-partitioned index in parallel does not turn on parallel query processing.)
Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after creating a local index. To do so, use ALTER
INDEX
NOPARALLEL
.
For more information on parallel querying, see the "Performance Tuning" chapter in Oracle Text Application Developer's Guide
Limitations
The following limitations apply to using PARALLEL
:
Parallel indexing is supported only for CONTEXT
index type.
Create an unusable index. This creates index metadata only and exits immediately.
You might create an unusable index when you need to create a local partitioned index in parallel.
Optionally specify indexing parameters in paramstring
. You can specify preferences owned by another user using the user.preference
notation.
The syntax for paramstring
is as follows:
paramstring =
'[DATASTORE datastore_pref] [FILTER filter_pref] [CHARSET COLUMN charset_column_name] [FORMAT COLUMN format_column_name] [LEXER lexer_pref] [LANGUAGE COLUMN language_column_name] [WORDLIST wordlist_pref] [STORAGE storage_pref] [STOPLIST stoplist] [SECTION GROUP section_group] [MEMORY memsize] [POPULATE | NOPOPULATE] [SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)] [TRANSACTIONAL]'
Create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE and then specify them in the paramstring.
Note:
When you specify no paramstring, Oracle Text uses the system defaults.For more information about these defaults, see "Default Index Parameters" in Chapter 2.
Specify the name of your datastore preference. Use the datastore preference to specify where your text is stored.See Datastore Types in Chapter 2, "Oracle Text Indexing Elements".
Specify the name of your filter preference. Use the filter preference to specify how to filter formatted documents to plain text or HTML. See Filter Types in Chapter 2, "Oracle Text Indexing Elements".
Specify the name of the character set column. This column must be in the same table as the text column, and it must be of type CHAR
, VARCHAR
, or VARCHAR2
. Use this column to specify the document character set for conversion to the database character set. The value is case insensitive. You must specify a globalization support character set string, such as JA16EUC.
When the document is plain text or HTML, the AUTO_FILTER
and CHARSET
filter use this column to convert the document character set to the database character set for indexing.
Use this column when you have plain text or HTML documents with different character sets or in a character set different from the database character set.
NLS_LENGTH_SEMANTICS = CHAR
is not supported at the database level. This parameter is supported for the following columns:
The CHARSET
COLUMN
, for example:
VARCHAR2 <size> CHAR CHAR <size> CHAR
Creating an index on a VARCHAR2
and CHAR
column
Using a VARCHAR2
and CHAR
column for FILTER
BY
and ORDER
BY
clause of CREATE
INDEX
FORMAT
COLUMN
Specify the name of the format column. The format column must be in the same table as the text column and it must be CHAR
, VARCHAR
, or VARCHAR2
type.
FORMAT COLUMN
determines how a document is filtered, or, in the case of the IGNORE
value, if it is to be indexed.
The AUTO_FILTER
uses the format column when filtering documents. Use this column with heterogeneous document sets to optionally bypass filtering for plain text or HTML documents.
In the format column, you can specify one of the following options:
TEXT
BINARY
IGNORE
TEXT
indicates that the document is either plain text or HTML. When TEXT
is specified, the document is not filtered, but may have the character set converted.
BINARY
indicates that the document is a format supported by the AUTO_FILTER
object other than plain text or HTML, for example PDF. BINARY
is the default, if the format column entry cannot be mapped.
IGNORE
indicates that the row is to be ignored during indexing. Use this value when you need to bypass rows that contain data incompatible with text indexing such as image data, or rows in languages that you do not want to process. The difference between documents with TEXT
and IGNORE
format column types is that the former are indexed but ignored by the filter, while the latter are not indexed at all. Thus, IGNORE
can be used with any filter type.
Note:
Documents are not marked for re-indexing when only the format column changes. The indexed column must be updated to flag the re-index.Specify the name of your lexer or multi-lexer preference. Use the lexer preference to identify the language of your text and how text is tokenized for indexing. See Lexer Types in Chapter 2, "Oracle Text Indexing Elements".
Specify the name of the language column when using a multi-lexer preference. See MULTI_LEXER in Chapter 2, "Oracle Text Indexing Elements".
This column must exist in the base table. It cannot be the same column as the indexed column. Only the first 30 bytes of the language column is examined for language identification.
Note:
Documents are not marked for re-indexing when only the language column changes. The indexed column must be updated to flag the re-index.Specify the name of your wordlist preference. Use the wordlist preference to enable features such as fuzzy, stemming, and prefix indexing for better wildcard searching. See Wordlist Type in Chapter 2, "Oracle Text Indexing Elements".
Specify the name of your storage preference for the Text index. Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2, "Oracle Text Indexing Elements".
Specify the name of your stoplist. Use stoplist to identify words that are not to be indexed. See CTX_DDL.CREATE_STOPLIST in Chapter 7, "CTX_DDL Package".
Specify the name of your section group. Use section groups to create searchable sections in structured documents. See CTX_DDL.CREATE_SECTION_GROUP in Chapter 7, "CTX_DDL Package".
Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:
memsize = number[K|M|G]
where K stands for kilobytes., M stands for megabytes, and G stands for gigabytes.
The value you specify for memsize
must be between 1M and the value of MAX_INDEX_MEMORY
in the CTX_PARAMETERS view. To specify a memory size larger than the MAX_INDEX_MEMORY
, you must reset this parameter with CTX_ADM.SET_PARAMETER to be larger than or equal to memsize
.
The default is the value specified for DEFAULT_INDEX_MEMORY
in CTX_PARAMETERS
.
The memsize parameter specifies the amount of memory Oracle Text uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance because there are fewer I/O operations and improves query performance and maintenance, because there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful when run-time memory is scarce.
Specify NOPOPULATE
to create an empty index. The default is POPULATE
.
Note:
POPULATE
| NOPOPULATE
is the only option whose default value cannot be set with CTX_ADM.SET_PARAMETER.
This option is not valid with CTXXPATH
indexes.
Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table. You might also create an empty index when you require only theme and Gist output from a document set.
Specify SYNC
for automatic synchronization of the CONTEXT
index when there are inserts, updates or deletes to the base table. You can specify one of the following SYNC
methods:
Table 1-5 SYNC Types
SYNC type | Description |
---|---|
|
No automatic synchronization. This is the default. You must manually synchronize the index with |
EVERY "interval-string" |
Automatically synchronize the index at a regular interval specified by the value of interval-string. interval-string takes the same syntax as that for scheduler jobs. Automatic synchronization using Make sure that interval-string is set to a long enough period that any previous sync jobs will have completed; otherwise, the sync job may hang. interval-string must be enclosed in double quotes, and any single quote within interval-string must be escaped with another single quote. See "Enabling Automatic Index Synchronization" for an example of automatic sync syntax. |
ON COMMIT |
Synchronize the index immediately after a commit. The commit does not return until the sync is complete. (Because the synchronization is performed as a separate transaction, there may be a period, usually small, when the data is committed but index changes are not.) The operation uses the memory specified with the memory parameter. Note that the sync operation has its own transaction context. If this operation fails, the data transaction still commits. Index synchronization errors are logged in the See "Enabling Automatic Index Synchronization" for an example of |
Each partition of a locally partitioned index can have its own type of sync (ON COMMIT
, EVERY
, or MANUAL
). The type of sync specified in master parameter strings applies to all index partitions unless a partition specifies its own type.
With automatic (EVERY
) synchronization, users can specify memory size and parallel synchronization. That syntax is:
... EVERY interval_string MEMORY mem_size PARALLEL paradegree ...
ON
COMMIT
synchronizations can only be run serially and must use the same memory size that was specified at index creation.
See Also:
Oracle Database Administrator's Guide for information on job schedulingSpecify that documents can be searched immediately after they are inserted or updated. If a text index is created with TRANSACTIONAL
enabled, then, in addition to processing the synchronized rowids already in the index, the CONTAINS
operator will process unsynchronized rowids as well. Oracle Text does in-memory indexing of unsynchronized rowids and processes the query against the in-memory index.
TRANSACTIONAL
is an index-level parameter and does not apply at the partition level.
You must still synchronize your text indexes from time to time (with CTX_DDL.SYNC_INDEX
) to bring pending rowids into the index. Query performance degrades as the number of unsynchronized rowids increases. For that reason, Oracle recommends setting up your index to use automatic synchronization with the EVERY
parameter. (See SYNC (MANUAL | EVERY "interval-string" | ON COMMIT).)
Transactional querying for indexes that have been created with the TRANSACTIONAL
parameter can be turned on and off (for the duration of a user session) with the PL/SQL variable CTX_QUERY.disable_transactional_query
. This is useful, for example, if you find that querying is slow due to the presence of too many pending rowids. Here is an example of setting this session variable:
exec ctx_query.disable_transactional_query := TRUE;
If the index uses AUTO_FILTER
, queries involving unsynchronized rowids will require filtering of unsynchronized documents.
CREATE INDEX: CONTEXT Index Examples
The following sections give examples of creating a CONTEXT
index.
Creating CONTEXT Index Using Default Preferences
The following example creates a CONTEXT
index called myindex
on the docs
column in mytable
. Default preferences are used.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context;
See Also:
For more information about default settings, see "Default Index Parameters" in Chapter 2.Also refer to Oracle Text Application Developer's Guide.
Creating CONTEXT Index with Custom Preferences
The following example creates a CONTEXT
index called myindex
on the docs
column in mytable
. The index is created with a custom lexer preference called my_lexer
and a custom stoplist called my_stop
.
This example also assumes that the preference and stoplist were previously created with CTX_DDL.CREATE_PREFERENCE for my_lexer
, and CTX_DDL.CREATE_STOPLIST for my_stop
. Default preferences are used for the unspecified preferences.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER my_lexer STOPLIST my_stop');
Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer
and my_stop
exist in the schema that belongs to user kenny
:
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER kenny.my_lexer STOPLIST kenny.my_stop');
Enabling Automatic Index Synchronization
You can create your index and specify that the index be synchronized at regular intervals for inserts, updates and deletes to the base table. To do so, create the index with the SYNC
(
EVERY
"
interval-string")
parameter.
To use job scheduling, you must log in as a user who has DBA privileges and then grant CREATE JOB
privileges.
The following example creates an index and schedules three synchronization jobs for three index partitions. The first partition uses ON
COMMIT
synchronization. The other two partitions are synchronized by jobs that are scheduled to be executed every Monday at 3 P.M.
CONNECT system/manager GRANT CREATE JOB TO dr_test CREATE INDEX tdrmauto02x ON tdrmauto02(text) INDEXTYPE IS CTXSYS.CONTEXT local (PARTITION tdrm02x_i1 PARAMETERS(' MEMORY 20m SYNC(ON COMMIT)'), PARTITION tdrm02x_i2, PARTITION tdrm02x_i3) PARAMETERS(' SYNC (EVERY "NEXT_DAY(TRUNC(SYSDATE), ''MONDAY'') + 15/24") ');
See the Oracle Database Administrator's Guide for information on job scheduling syntax.
Creating CONTEXT Index with Multi-Lexer Preference
The multi-lexer decides which lexer to use for each row based on a language column. This is a character column in the table which stores the language of the document in the text column. For example, create the table globaldoc
to hold documents of different languages:
CREATE TABLE globaldoc ( doc_id NUMBER PRIMARY KEY, lang VARCHAR2(10), text CLOB );
Assume that global_lexer
is a multi-lexer preference you created. To index the global_doc
table, specify the multi-lexer preference and the name of the language column as follows:
CREATE INDEX globalx ON globaldoc(text) INDEXTYPE IS ctxsys.context PARAMETERS ('LEXER global_lexer LANGUAGE COLUMN lang');
See Also:
For more information about creating multi-lexer preferences, see "MULTI_LEXER" in Chapter 2.Creating a Local Partitioned Index
The following example creates a text table that is partitioned into three, populates it, and then creates a partitioned index.
PROMPT create partitioned table and populate it CREATE TABLE part_tab (a int, b varchar2(40)) PARTITION BY RANGE(a)
(partition p_tab1 values less than (10), partition p_tab2 values less than (20), partition p_tab3 values less than (30));
PROMPT create partitioned index CREATE INDEX part_idx on part_tab(b) INDEXTYPE IS CTXSYS.CONTEXT
LOCAL (partition p_idx1, partition p_idx2, partition p_idx3);
Note:
The limit for the number of partitions in Oracle Text is the same as the maximum number of partitions per table in Oracle Database.Using FILTER BY and ORDER BY Clauses
The following example creates an index on table docs and orders the documents by author's publishing date.
First, create the table:
CREATE TABLE docs ( docid NUMBER, pub_date DATE, author VARCHAR2(30), category VARCHAR2(30), document CLOB );
Create the index with FILTER
BY
and ORDER
BY
clauses:
CREATE INDEX doc_idx on docs(document) indextype is ctxsys.context FILTER BY category, author ORDER BY pub_date desc, docid PARAMETERS ('memory 500M');
Parallel indexing can improve index performance when you have multiple CPUs.
To create an index in parallel, use the PARALLEL
clause with a parallel degree. This example uses a parallel degree of 3:
CREATE INDEX myindex ON mytab(pk) INDEXTYPE IS ctxsys.context PARALLEL 3;
Creating a Local Partitioned Index in Parallel
Creating a local partitioned index in parallel can improve performance when you have multiple CPUs. With partitioned tables, you can divide the work. You can create a local partitioned index in parallel in two ways:
Use the PARALLEL
clause with the LOCAL
clause in CREATE INDEX.
In this case, the maximum parallel degree is limited to the number of partitions you have. See "Parallelism with CREATE INDEX".
Create an unusable index first, then run the DBMS_PCLXUTIL.BUILD_PART_INDEX
utility. This method can result in a higher degree of parallelism, especially if you have more CPUs than partitions. See "Parallelism with DBMS_PCLUTIL.BUILD_PART_INDEX".
If you attempt to create a local partitioned index in parallel, and the attempt fails, you may see the following error message:
ORA-29953: error in the execution of the ODCIIndexCreate routine for one or more of the index partitions
To determine the specific reason why the index creation failed, query the CTX_USER_INDEX_ERRORS view.
Parallelism with CREATE INDEX
You can achieve local index parallelism by using the PARALLEL
and LOCAL
clauses in CREATE INDEX.
In this case, the maximum parallel degree is limited to the number of partitions that you have.
The following example creates a table with three partitions, populates them, and then creates the local indexes in parallel with a degree of 2:
create table part_tab3(id number primary key, text varchar2(100)) partition by range(id) (partition p1 values less than (1000), partition p2 values less than (2000), partition p3 values less than (3000)); begin for i in 0..2999 loop insert into part_tab3 values (i,'oracle'); end loop; end; / create index part_tab3x on part_tab3(text) indextype is ctxsys.context local (partition part_tabx1, partition part_tabx2, partition part_tabx3) parallel 2;
Parallelism with DBMS_PCLUTIL.BUILD_PART_INDEX
You can achieve local index parallelism by first creating an unusable CONTEXT
index, and then running the DBMS_PCLUTIL.BUILD_PART_INDEX
utility. This method can result in a higher degree of parallelism, especially when you have more CPUs than partitions.
In this example, the base table has three partitions. We create a local partitioned unusable index first, then run DBMS_PCLUTIL.BUILD_PART_INDEX
, which builds the 3 partitions in parallel (referred to as inter-partition parallelism). Also, inside each partition, index creation proceeds in parallel (called intra-partition parallelism) with a parallel degree of 2. Therefore, the total parallel degree is 6 (3 times 2).
create table part_tab3(id number primary key, text varchar2(100)) partition by range(id) (partition p1 values less than (1000), partition p2 values less than (2000), partition p3 values less than (3000)); begin for i in 0..2999 loop insert into part_tab3 values (i,'oracle'); end loop; end; / create index part_tab3x on part_tab3(text) indextype is ctxsys.context local (partition part_tabx1, partition part_tabx2, partition part_tabx3) unusable; exec dbms_pclxutil.build_part_index(jobs_per_batch=>3, procs_per_job=>2, tab_name=>'PART_TAB3', idx_name=>'PART_TAB3X', force_opt=>TRUE);
After a CREATE
INDEX
or ALTER
INDEX
operation, you can view index errors with Oracle Text views. To view errors on your indexes, query the CTX_USER_INDEX_ERRORS view. To view errors on all indexes as CTXSYS
, query the CTX_INDEX_ERRORS view.
For example, to view the most recent errors on your indexes, enter the following statements:
SELECT err_timestamp, err_text FROM ctx_user_index_errors ORDER BY err_timestamp DESC;
To clear the index error view, enter the following statement:
DELETE FROM ctx_user_index_errors;
The CTXCAT
index is a combined index on a text column and one or more other columns. Query this index with the CATSEARCH
operator in the WHERE
clause of a SELECT
statement. This type of index is optimized for mixed queries. This index is transactional, automatically updating itself with DML to the base table.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat
[PARAMETERS
('[index set index_set] [lexer lexer_pref] [storage storage_pref] [stoplist stoplist] [section group sectiongroup_pref [wordlist wordlist_pref] [memory memsize]');
Specify the name of the table and column to index.
The column that you specify when you create a CTXCAT
index must be of type CHAR
or VARCHAR2
. No other types are supported for CTXCAT
.
Attempting to create an index on a Virtual Private Database (VPD) protected table will fail unless one of the following options is true:
The VPD policy is created such that it does not apply to INDEX
statement type, which is the default
The policy function returns a null predicate for the current user.
The user (index owner) is SYS
.
The user has the EXEMPT
ACCESS
POLICY
privilege.
Specify the index set preference to create the CTXCAT
index. Index set preferences name the columns that make up your sub-indexes. Any column that is named in an index set column list cannot have a NULL
value in any row of the base table, or else you get an error.
Always ensure that your columns have non-NULL values before and after indexing.
Index Performance and Size Considerations
Although a CTXCAT
index offers query performance benefits, creating this type of index has its costs. The time that it takes Oracle Text to create a CTXCAT
index depends on the total size of the index.
The total size of a CTXCAT
index is directly related to:
Total text to be indexed
Number of component indexes in the index set
Number of columns in the base table that make up the component indexes
Having many component indexes in your index set also degrades DML performance because more indexes must be updated.
Because of these added costs in creating a CTXCAT
index, you should carefully consider the query performance benefit that each component index gives your application before adding it to your index set.
See Also:
Oracle Text Application Developer's Guide for more information about creatingCTXCAT
indexes and the benefitsWhen you create an index of type CTXCAT
, you can use the supported index preferences listed in Table 1-6 in the parameters
string.
Table 1-6 Supported CTXCAT Index Preferences
Preference Class | Supported Types |
---|---|
Datastore |
This preference class is not supported for |
Filter |
This preference class is not supported for |
Lexer |
BASIC_LEXER ( |
Wordlist |
|
Storage |
|
Stoplist |
Supports single language stoplists only ( |
Section Group |
Only Field Section is supported for |
Unsupported Preferences and Parameters
When you create a CTXCAT
index, you cannot specify datastore and filter preferences. For section group preferences, only the field section preference is supported. You also cannot specify language, format, or charset columns as with a CONTEXT
index.
Creating a CTXCAT Index
This section gives a brief example for creating a CTXCAT
index. For a more complete example, see the Oracle Text Application Developer's Guide.
Consider a table called AUCTION
with the following schema:
create table auction(
item_id number, title varchar2(100), category_id number, price number, bid_close date);
Assume that queries on the table involve a mandatory text query clause and optional structured conditions on price
. Results must be sorted based on bid_close
. This means that we need an index to support good response time for the structured and sorting criteria.
You can create a catalog index to support the different types of structured queries a user might enter. For structured queries, a CTXCAT
index improves query performance over a context index.
To create the indexes, first, create the index set preference, next, optionally, add the storage preference, and, finally, add the required indexes to it:
begin
ctx_ddl.create_index_set('auction_iset'); ctx_ddl.add_index('auction_iset','bid_close'); ctx_ddl.add_index('auction_iset','price, bid_close');
end;
Optionally, create the storage preference:
begin ctx_ddl.create_preference('auction_st_pref', 'BASIC_STORAGE'); ctx_ddl.set_attribute('auction_st_pref', 'I_TABLE_CLAUSE', 'tablespace TEXT storage (initial 5M)'); ctx_ddl.set_attribute('auction_st_pref', 'I_ROWID_INDEX_CLAUSE', 'tablespace TEXT storage (initial 5M)'); ctx_ddl.set_attribute('auction_st_pref', 'I_INDEX_CLAUSE', 'tablespace TEXT storage (initial 5M) compress 2'); end; /
Then, create the CTXCAT
index with CREATE
INDEX
as follows:
create index auction_titlex on AUCTION(title) indextype is CTXSYS.CTXCAT parameters ('index set auction_iset storage auction_st_pref');
Querying a CTXCAT Index
To query the title column for the word pokemon, enter regular and mixed queries as follows:
select * from AUCTION where CATSEARCH(title, 'pokemon',NULL)> 0; select * from AUCTION where CATSEARCH(title, 'pokemon', 'price < 50 order by bid_close desc')> 0;
CTXRULE
is an index on a column containing a set of queries. Query this index with the MATCHES
operator in the WHERE
clause of a SELECT
statement.
CREATE INDEX [schema.]index on [schema.]table(rule_col) INDEXTYPE IS
ctxsys.ctxrule [PARAMETERS ('[lexer lexer_pref] [storage storage_pref]
[section group section_pref] [wordlist wordlist_pref] [classifier classifier_pref]');
[PARALLEL n];
Specify the name of the table and rule column to index. The rules can be query compatible strings, query template strings, or binary support vector machine rules.
The column you specify when you create a CTXRULE
index must be VARCHAR2
, CLOB or BLOB
. No other types are supported for CTXRULE
.
Attempting to create an index on a Virtual Private Database (VPD) protected table will fail unless one of the following is true:
The VPD policy does not have the INDEX
statement type turned on (which is the default)
The policy function returns a null predicate for the current user.
The user (index owner) is SYS
.
The user has the EXEMPT
ACCESS
POLICY
privilege.
Specify the lexer preference to be used for processing queries and later for the documents to be classified with the MATCHES
function.
With both classifiers SVN_CLASSFIER
and RULE_CLASSIFIER
, you can use the BASIC_LEXER
, CHINESE_LEXER
, JAPANESE_LEXER
, or KOREAN_MORPH_LEXER
lexer. (See "Classifier Types" and "Lexer Types".)
For processing queries, these lexers support the following operators: ABOUT
, STEM
, AND
, NEAR
, NOT
, OR
, and WITHIN
.
The thesaural operators (BT*
, NT*
, PT
, RT
, SYN
, TR
, TRSYS
, TT,
and so on) are supported. However, these operators are expanded using a snapshot of the thesaurus at index time, not when the MATCHES
function is entered. This means that if you change your thesaurus after you index, you must re-index your query set.
Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See "Storage Types" in Chapter 2, "Oracle Text Indexing Elements".
Specify the section group. This parameter does not affect the queries. It applies to sections in the documents to be classified. The following section groups are supported for the CTXRULE
index type:
See "Section Group Types" in Chapter 2, "Oracle Text Indexing Elements".
CTXRULE
does not support special sections.
Specify the wordlist preferences. This is used to enable stemming operations on query terms. See Wordlist Type in Chapter 2, "Oracle Text Indexing Elements".
Specify the classifier preference. See "Classifier Types" in Chapter 2, "Oracle Text Indexing Elements". You must use the same preference name you specify with CTX_CLS.TRAIN
.
Example for Creating a CTXRULE Index
See the Oracle Text Application Developer's Guide for a complete example of using the CTXRULE
index type in a document routing application.
Syntax for CTXXPATH Index Type
This indextype if provided only for backward compatibility. Create a CTXXPATH
index when you need to speed up existsNode()
queries on an XMLType
column.
CREATE INDEX [schema.]index on [schema.]table(XMLType column) INDEXTYPE IS ctxsys.CTXXPATH [PARAMETERS ('[storage storage_pref] [memory memsize]')];
Specify the name of the table and column to index.
The column you specify when you create a CTXXPATH
index must be XMLType
. No other types are supported for CTXXPATH
.
Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See "Storage Types" in Chapter 2, "Oracle Text Indexing Elements".
Specify the amount of run-time memory to use for indexing. The syntax for memsize
is as follows:
memsize = number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.
The value you specify for memsize
must be between 1M and the value of MAX_INDEX_MEMORY
in the CTX_PARAMETERS view. To specify a memory size larger than the MAX_INDEX_MEMORY
, you must reset this parameter with CTX_ADM.SET_PARAMETER to be larger than or equal to memsize
.
The default is the value specified for DEFAULT_INDEX_MEMORY
in CTX_PARAMETERS
.
Index creation on an XMLType
column:
CREATE INDEX xml_index ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH; or CREATE INDEX xml_index ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH PARAMETERS('storage my_storage memory 40M');
Querying the table with existsNode
:
select xml_id from xml_tab x where x.col_xml.existsnode('/book/chapter[@title="XML"]') > 0;
Related Topics
CTX_DDL.CREATE_PREFERENCE in Chapter 7, "CTX_DDL Package".
CTX_DDL.CREATE_STOPLIST in Chapter 7, "CTX_DDL Package".
CTX_DDL.CREATE_SECTION_GROUP in Chapter 7, "CTX_DDL Package".
Note:
This section describes theDROP
INDEX
statement as it pertains to dropping a Text domain index.
For a complete description of the DROP
INDEX
statement, see Oracle Database SQL Language Reference.
Purpose
Use DROP
INDEX
to drop a specified Text index.
Syntax
DROP INDEX [schema.]index [force];
Optionally force the index to be dropped. Use force option when Oracle Text cannot determine the state of the index, such as when an indexing operation crashes.
Oracle recommends against using this option by default. Use it as a last resort when a regular call to DROP
INDEX
fails.
Example
The following example drops an index named doc_index
in the current user's database schema.
DROP INDEX doc_index;
Related Topics
Use the MATCHES
operator to find all rows in a query table that match a given document. The document must be a plain text, HTML, or XML document.
The MATCHES
operator also supports database links. You can identify a remote table or materialized view by appending @dblink
to the end of its name. The dblink
must be a complete or partial name for a database link to the database containing the remote table or materialized view. (Querying of remote views is not supported.)
This operator requires a CTXRULE
index on your set of queries.
When the SVM_CLASSIFIER classifier type is used, MATCHES
returns a score in the range 0 to 100; a higher number indicates a greater confidence in the match. Use the label
parameter and MATCH_SCORE
to obtain this number. Then use the matching score to apply a category-specific threshold to a particular category.
If SVM_CLASSIFIER
is not used, then this operator returns either 100 (the document matches the criteria) or 0 (the document does not match).
Limitation
If the optimizer chooses to use the functional query invocation with a MATCHES
query, your query will fail.
Syntax
MATCHES(
[schema.]column, document VARCHAR2 or CLOB [,label INTEGER])
RETURN NUMBER;
Specify the column containing the indexed query set.
Specify the document to be classified. The document can be plain-text, HTML, or XML. Binary formats are not supported.
Optionally specify the label that identifies the score generated by the MATCHES
operator. Use this label with MATCH_SCORE.
Matches Example
The following example creates a table querytable
, and populates it with classification names and associated rules. It then creates a CTXRULE
index.
The example enters the MATCHES
query with a document string to be classified. The SELECT
statement returns all rows (queries) that are satisfied by the document:
create table querytable (classification varchar2(64), text varchar2(4000)); insert into querytable values ('common names', 'smith OR jones OR brown'); insert into querytable values ('countries', 'United States OR Great Britain OR France'); insert into querytable values ('Oracle DB', 'oracle NEAR database'); create index query_rule on querytable(text) indextype is ctxsys.ctxrule; SELECT classification FROM querytable WHERE MATCHES(text, 'Smith is a common name in the United States') > 0; CLASSIFICATION ---------------------------------------------------------------- common names countries
Related Topics
CTX_CLS.TRAIN
The Oracle Text Application Developer's Guide contains extended examples of simple and supervised classification, which make use of the MATCHES
operator.
Use the MATCH_SCORE
operator in a statement to return scores produced by a MATCHES
query.
The MATCH_SCORE
operator also supports database links. You can identify a remote table or materialized view by appending @dblink
to the end of its name. The dblink
must be a complete or partial name for a database link to the database containing the remote table or materialized view. (Querying of remote views is not supported.)
When the SVM_CLASSIFIER
classifier type is used, this operator returns a score in the range 0 to 100. Then use the matching score to apply a category-specific threshold to a particular category.
If SVM_CLASSIFIER
is not used, then this operator returns either 100 (the document matches the criteria) or 0 (the document does not match).
Syntax
MATCH_SCORE(label NUMBER)
Specify a number to identify the score produced by the query. Use this number to identify the MATCHES
clause that returns this score.
Example
To get the matching score, use
select cat_id, match_score(1) from training_result where matches(profile, text,1)>0;
Related Topics
Use the SCORE
operator in a SELECT
statement to return the score values produced by a CONTAINS query. The SCORE
operator can be used in a SELECT
, ORDER
BY
, or GROUP
BY
clause.
The SCORE
operator also supports database links. You can identify a remote table or materialized view by appending @dblink
to the end of its name. The dblink
must be a complete or partial name for a database link to the database containing the remote table or materialized view. (Querying of remote views is not supported.)
Syntax
SCORE(label NUMBER)
Specify a number to identify the score produced by the query. Use this number to identify the CONTAINS
clause which returns this score.
Example
Single CONTAINS
When the SCORE
operator is called (for example, in a SELECT
clause), the CONTAINS
clause must reference the score label value as in the following example:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
Multiple CONTAINS
Assume that a news database stores and indexes the title and body of news articles separately. The following query returns all the documents that include the words Oracle in their title and java in their body. The articles are sorted by the scores for the first CONTAINS
(Oracle) and then by the scores for the second CONTAINS
(java).
SELECT title, body, SCORE(10), SCORE(20)
FROM news WHERE CONTAINS (news.title, 'Oracle', 10) > 0 OR
CONTAINS (news.body, 'java', 20) > 0 ORDER BY SCORE(10), SCORE(20);
Related Topics