12 Character Set Scanner Utilities

This chapter introduces the Language and Character Set File Scanner and the Database Character Set Scanner. The character set scanner utilities are tools for detecting and verifying valid and invalid data. The Language and Character Set File Scanner supports text files, while the Database Character Set Scanner scans data inside the database.

This chapter contains these topics:

The Language and Character Set File Scanner
The Database Character Set Scanner
Scan Modes in the Database Character Set Scanner
Installing and Starting the Database Character Set Scanner
Database Character Set Scanner Parameters
Database Character Set Scanner Sessions: Examples
Database Character Set Scanner Reports
How to Handle Convertible or Lossy Data in the Data Dictionary
Storage and Performance Considerations in the Database Character Set Scanner
Database Character Set Scanner CSALTER Script
Database Character Set Scanner Views
Database Character Set Scanner Error Messages

The Language and Character Set File Scanner

The Language and Character Set File Scanner (LCSSCAN) is a high-performance, statistically based utility for determining the language and character set for unknown file text. It can automatically identify a wide variety of language and character set pairs. With each text, the language and character set detection engine sets up a series of probabilities, each probability corresponding to a language and character set pair. The most statistically probable pair identifies the dominant language and character set.

The purity of the text affects the accuracy of the language and character set detection. The ideal case is literary text of one single language with no spelling or grammatical errors. These types of text may require 100 characters of data or more and can return results with a very high factor of confidence. On the other hand, some technical documents can require longer segments before they are recognized. Documents that contain a mix of languages or character sets or text such as addresses, phone numbers, or programming language code may yield poor results. For example, if a document has both French and German embedded, then the accuracy of guessing either language successfully is statistically reduced. Both plain text and HTML files are accepted. If the format is known, you should set the FORMAT parameter to improve accuracy.

This section includes the following topics:

Syntax of the LCSSCAN Command
Examples: Using the LCSSCAN Command
Getting Command-Line Help for the Language and Character Set File Scanner
Supported Languages and Character Sets
LCSSCAN Error Messages

Syntax of the LCSSCAN Command

Start the Language and Character Set File Scanner with the LCSSCAN command. Its syntax is as follows:

LCSSCAN  [RESULTS=number] [FORMAT=file_type] [BEGIN=number] [END=number] FILE=file_name

The parameters are described in the rest of this section.

RESULTS

The RESULTS parameter is optional.

Property	Description
Default value	`1`
Minimum value	`1`
Maximum value	`3`
Purpose	The number of language and character set pairs that are returned. They are listed in order of probability. The comparative weight of the first choice cannot be quantified. The recommended value for this parameter is the default value of `1`.

FORMAT

The FORMAT paramater is optional.

Property	Description
Default Value	`text`
Purpose	This parameter identifies the type of file to be scanned. The possible values are `html`, `text`, and `auto`.

BEGIN

The BEGIN parameter is optional.

Property	Description
Default value	`1`
Minimum value	`1`
Maximum value	Number of bytes in file
Purpose	The byte of the input file where `LCSSCAN` begins the scanning process. The default value is the first byte of the input file.

END

The END parameter is optional.

Property	Description
Default value	End of file
Minimum value	`3`
Maximum value	Number of bytes in file
Purpose	The last byte of the input file that `LCSSCAN` scans. The default value is the last byte of the input file.

FILE

The FILE parameter is required.

Property	Description
Default value	None
Purpose	Specifies the name of a text file to be scanned

Examples: Using the LCSSCAN Command

Example 12-1 Specifying Only the File Name in the LCSSCAN Command

LCSSCAN FILE=example.txt

In this example, the entire example.txt file is scanned because the BEGIN and END parameters have not been specified. One language and character set pair will be returned because the RESULTS parameter has not been specified.

Example 12-2 Specifying the Format as HTML

LCSSCAN FILE=example.html FORMAT=html

In this example, the entire example.html file is scanned because the BEGIN and END parameters have not been specified. The scan will strip HTML tags before the scan, thus results are more accurate. One language and character set pair will be returned because the RESULTS parameter has not been specified.

Example 12-3 Specifying the RESULTS and BEGIN Parameters for LCSSCAN

LCSSCAN RESULTS=2 BEGIN=50 FILE=example.txt

The scanning process starts at the 50th byte of the file and continues to the end of the file. Two language and character set pairs will be returned.

Example 12-4 Specifying the RESULTS and END Parameters for LCSSCAN

LCSSCAN RESULTS=3 END=100 FILE=example.txt

The scanning process starts at the beginning of the file and ends at the 100th byte of the file. Three language and character set pairs will be returned.

Example 12-5 Specifying the BEGIN and END Parameters for LCSSCAN

LCSSCAN BEGIN=50 END=100 FILE=example.txt

The scanning process starts at the 50th byte and ends at the 100th byte of the file. One language and character set pair will be returned because the RESULTS parameter has not been specified.

Getting Command-Line Help for the Language and Character Set File Scanner

To obtain a summary of the Language and Character Set File Scanner parameters, enter the following command:

LCSSCAN HELP=y

The resulting output shows a summary of the Language and Character Set Scanner parameters.

Supported Languages and Character Sets

The Language and Character Set File Scanner supports several character sets for each language.

When the binary values for a language match two or more encodings that have a subset/superset relationship, the subset character set is returned. For example, if the language is German and all characters are 7-bit, then US7ASCII is returned instead of WE8MSWIN1252, WE8ISO8859P15, or WE8ISO8859P1.

When the character set is determined to be UTF-8, the Oracle character set UTF8 is returned by default unless 4-byte characters (supplementary characters) are detected within the text. If 4-byte characters are detected, then the character set is reported as AL32UTF8.

See Also:

"Language and Character Set Detection Support" for a list of supported languages and character sets

LCSSCAN Error Messages

LCD-00001 An unknown error occured.: Cause: An error occurred accessing an internal structure.; Action: Report this error to Oracle Support.

LCD-00002 NLS data could not be loaded.: Cause: An error occurred accessing $ORACLE_HOME/nls/data.; Action: Check to make sure $ORACLE_HOME/nls/data exists and is accessible. If not found check $ORA_NLS10 directory.

LCD-00003 An error occurred while reading the profile file.: Cause: An error occurred accessing $ORACLE_HOME/nls/data.; Action: Check to make sure $ORACLE_HOME/nls/data exists and is accessible. If not found check $ORA_NLS10 directory.

LCD-00004 The beginning or ending offset has been set incorrectly.: Cause: The beginning and ending offsets must be an integer greater than 0.; Action: Change the offset to a positive number.

LCD-00005 The ending offset has been set incorrectly.: Cause: The ending offset must be greater than the beginning offset.; Action: Change the ending offset to be greater than the beginning offset.

LCD-00006 An error occurred when opening the input file.: Cause: The file was not found or could not be opened.; Action: Check the name of the file specified. Make sure the full file name is specified and that the file is not in use.

LCD-00007 The beginning offset has been set incorrectly.: Cause: The beginning offset must be less than the number of bytes in the file.; Action: Check the size of the file and specify a smaller beginning offset.

LCD-00008 No result was returned.: Cause: Not enough text was inputted to produce a result.; Action: A larger sample of text needs to be inputted to produce a reliable result.

The Database Character Set Scanner

The Database Character Set Scanner assesses the feasibility of migrating an Oracle database to a new database character set. The Database Character Set Scanner checks all character data in the database and tests for the effects and problems of changing the character set encoding. A summary report is generated at the end of the scan that shows the scope of work required to convert the database to a new character set.

Based on the information in the summary report, you can decide on the most appropriate method to migrate the database's character set. The methods are:

Export and Import utilities
CSALTER script
CSALTER script with Export and Import utilities on selected tables

Note:
If the Database Character Set Scanner shows conversion exceptions, then these problems must be fixed before using any of the described methods. This may involve further data analysis and modifying the problem data to eliminate those exceptions. In extreme cases, both the database and the application might need to be modified. Oracle recommends you contact Oracle Consulting Services for services in database character set migration.

See Also:
"Changing the Database Character Set of an Existing Database"

Conversion Tests on Character Data

The Database Character Set Scanner reads the character data and tests for the following conditions on each data cell:

Do character code points of the data cells change when converted to the new character set?
Can the data cells be successfully converted to the new character set?
Will the post-conversion data fit into the current column size?

The Database Character Set Scanner reads and tests for data in CHAR, VARCHAR2, LONG, CLOB, NCHAR, NVARCHAR2, NCLOB and VARRAY columns as well as nested tables. The Database Character Set Scanner does not perform post-conversion column size testing for LONG, CLOB, and NCLOB columns.

Scan Modes in the Database Character Set Scanner

The Database Character Set Scanner provides four modes of database scan:

Full Database Scan
User Scan
Table Scan
Column Scan

Full Database Scan

The Database Character Set Scanner reads and verifies the character data of all tables belonging to all users in the database including the data dictionary (such as SYS and SYSTEM users), and it reports on the effects of the simulated migration to the new database character set. It scans all schema objects including stored packages, procedures and functions, and object definitions stored as part of the data dictionary.

To understand the feasibility of migrating your database to a new database character set, you need to perform a full database scan.

User Scan

The Database Character Set Scanner reads and verifies character data of all tables belonging to the specified user and reports on the effects on the tables of changing the character set.

Table Scan

The Database Character Set Scanner reads and verifies the character data of the specified tables, and reports the effects on the tables of changing the character set.

Column Scan

The Database Character Set Scanner reads and verifies the character data of the specified columns, and reports the effects on the tables of changing the character set.

Installing and Starting the Database Character Set Scanner

This section describes how to install and start the Database Character Set Scanner. It includes the following topics:

Access Privileges for the Database Character Set Scanner
Installing the Database Character Set Scanner System Tables
Starting the Database Character Set Scanner
Creating the Database Character Set Scanner Parameter File
Getting Command-Line Help for the Database Character Set Scanner

Access Privileges for the Database Character Set Scanner

To use the Database Character Set Scanner, you must have DBA privileges for Oracle Database.

Installing the Database Character Set Scanner System Tables

Before using the Database Character Set Scanner, you must run the csminst.sql script to set up the necessary system tables on the database that you plan to scan. The csminst.sql script needs to be run only once. The script performs the following tasks to prepare the database for scanning:

Creates a user named CSMIG
Assigns the necessary privileges to CSMIG
Assigns the default tablespace to CSMIG
Creates the Character Set Scanner system tables under CSMIG

You can modify the default tablespace for CSMIG by editing the csminst.sql script. Modify the following statement in csminst.sql to assign the preferred tablespace to CSMIG as follows:

ALTER USER csmig DEFAULT TABLESPACE tablespace_name;

Ensure that there is sufficient storage space available in the assigned tablespace before scanning the database. The amount of space required depends on the type of scan and the nature of the data in the database.

On UNIX platforms, run csminst.sql using these commands and SQL statement:

% cd $ORACLE_HOME/rdbms/admin 
% sqlplus sys/password as sysdba
SQL> START csminst.sql

Starting the Database Character Set Scanner

You can start the Database Character Set Scanner from the command line by one of these methods:

Using the Database Character Set Scanner parameter file

CSSCAN username/password PARFILE=file_name

Using the command line to specify parameter values. For example:

CSSCAN username/password FULL=y TOCHAR=al32utf8 ARRAY=10240 PROCESS=3

Using an interactive session

CSSCAN username/password

In an interactive session, the Database Character Set Scanner prompts you for the values of the following parameters:

FULL/TABLE/USER
    TOCHAR
    ARRAY
    PROCESS

If you want to specify other parameters, then use the Database Character Set Parameter file or specify the parameter values in the CSSCAN command.

Creating the Database Character Set Scanner Parameter File

The Database Character Set Scanner parameter file enables you to specify Database Character Set Scanner parameters in a file where they can be easily modified or reused. Create a parameter file using a text editor.

Use one of the following formats to specify parameters in the Database Character Set Scanner parameter file:

parameter_name=value
parameter_name=(value1, value2, ...)

You can add comments to the parameter file by preceding them with the pound sign (#). All characters to the right of the pound sign are ignored.

The following is an example of a parameter file:

USERID=system/manager
USER=HR # scan HR tables
TOCHAR=al32utf8
ARRAY=4096000
PROCESS=2 # use two concurrent scan processes
FEEDBACK=1000

Getting Command-Line Help for the Database Character Set Scanner

The Database Character Set Scanner provides command-line help. Enter the following command:

CSSCAN HELP=Y

The resulting output shows a summary of the Database Character Set Scanner parameters.

Database Character Set Scanner Parameters

The following table shows a summary of parameters for the Database Character Set Scanner. The rest of this section contains detailed descriptions of the parameters.

Parameter	Default	Prompt	Description
USERID	-	yes	Username/password
FULL	`N`	yes	Scan entire database
USER	`-`	yes	Owner of the tables to be scanned
TABLE	-	yes	List of tables to scan
EXCLUDE	-	no	List of tables to exclude
TOCHAR	-	yes	New database character set name
FROMCHAR	-	no	Current database character set name
TONCHAR	-	no	New national character set name
FROMNCHAR	-	no	Current national character set name
ARRAY	`1024000`	yes	Size of array fetch buffer
PROCESS	`1`	yes	Number of concurrent scan processes
MAXBLOCKS	-	no	The maximum number of blocks that can be in a table without the table being split
CAPTURE	`N`	no	Capture convertible data
COLUMN	-	no	List of columns to scan
QUERY	-	no	Query to apply to restrict output before scan
SUPPRESS	-	no	Maximum number of exceptions logged for each table
FEEDBACK	-	no	Report progress every n rows
BOUNDARIES	-	no	List of column size boundaries for summary report
LASTRPT	`N`	no	Generate report of the previous database scan
LOG	`scan`	no	Base file name for report files
PARFILE	-	no	Parameter file name
PRESERVE	`N`	no	Preserve existing scan results
LCSD	`N`	no	Enable language and character set detection
LCSDDATA	`LOSSY`	no	Define the scope of the language and character set detection
HELP	`N`	no	Show help screen

ARRAY

Property	Description
Default value	`1024000`
Minimum value	`4096`
Maximum value	Unlimited
Purpose	Specifies the size in bytes of the array buffer used to fetch data. The size of the array buffer determines the number of rows fetched by the Database Character Set Scanner at any one time.

The following formula estimates the number of rows fetched at one time for a given table:

rows fetched = 
(ARRAY value) / [(sum of all the CHAR and VARCHAR2 column sizes) + 
(number of CLOB columns * 4000) + (number of VARRAY columns * 4000)]

For example, suppose table A contains two CHAR columns (5 bytes and 10 bytes), two VARCHAR2 columns (100 bytes and 200 bytes), and one CLOB column. If ARRAY=1024000 (the default), then the number of rows fetched is calculated as follows:

1024000/[5 + 10 + 100 + 200 + (1*4000) + (0*4000)] = 237.3

The Database Character Set Scanner can fetch 23 rows of data at one time from table A.

If the sum in the denominator exceeds the value of the ARRAY parameter, then the Database Character Set Scanner fetches only one row at a time. Tables with LONG columns are fetched only one row at a time.

This parameter affects the duration of a database scan. In general, the larger the size of the array buffer, the shorter the duration time. Each scan process allocates the specified size of array buffer.

BOUNDARIES

Property	Description
Default value	None
Purpose	Specifies the list of column boundary sizes that are used for an application data conversion summary report. This parameter is used to locate the distribution of the application data for the `CHAR`, `VARCHAR2`, `NCHAR`, and `NVARCHAR2` data types.

For example, if you specify a BOUNDARIES value of (10, 100, 1000), then the application data conversion summary report produces a breakdown of the CHAR data into the following groups by their column length, CHAR(1..10), CHAR(11..100) and CHAR(101..1000). The behavior is the same for the VARCHAR2, NCHAR, and NVARCHAR2 data types.

CAPTU RE

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Indicates whether to capture the information on the individual convertible rows, as well as the default of storing only the exception rows. Information regarding the convertible rows is written to the `CSM$ERRORS` table if the `CAPTURE` parameter is set to `Y`. It records the data that needs to be converted during the conversion to the target character set. When `CAPTURE` is set to `Y`, the data dictionary `CONVERTIBLE` data cells are also listed in the database scan individual exception report `scan.err`. With `CAPTURE` set to `Y`, the amount of time required to complete the scan can increase and more storage space may be required.

COLU MN

Property	Description
Default value	None
Purpose	Specifies the names of the columns to be scanned

When this parameter is specified, the Database Character Set Scanner scans the specified columns. You can specify the following when you specify the name of the column:

schemaname specifies the names of the user's schema from which to scan the table
tablename specifies the name of the table from which to scan the column
columnname specifies the name of the column to be scanned

For example, the following command scans the columns LASTNAME and FIRSTNAME in the hr sample schema:

CSSCAN system/manager COLUMN=(HR.EMPLOYEES.LASTNAME, HR.EMPLOYEES.FIRSTNAME) ...

EXCL UDE

Property	Description
Default value	None
Purpose	Specifies the names of the tables to be excluded from the scan

When this parameter is specified, the Database Character Set Scanner excludes the specified tables from the scan. You can specify the following when you specify the name of the table:

schemaname specifies the name of the user's schema from which to exclude the table
tablename specifies the name of the table or tables to be excluded

For example, the following command scans all of the tables that belong to the hr sample schema except for the employees and departments tables:

CSSCAN system/manager USER=HR EXCLUDE=(HR.EMPLOYEES , HR.DEPARTMENTS) ...

FEEDBACK

Property	Description
Default value	None
Minimum value	`100`
Maximum value	`100000`
Purpose	Specifies that the Database Character Set Scanner should display a progress meter in the form of a dot for every `N` number of rows scanned

Fo r example, if you specify FEEDBACK=1000, then the Database Character Set Scanner displays a dot for every 1000 rows scanned. The FEEDBACK value applies to all tables being scanned. It cannot be set for individual tables.

FROMCHAR

Property	Description
Default value	None
Purpose	Specifies the current character set name for `CHAR`, `VARCHAR2`, `LONG`, and `CLOB` data types in the database. By default, the Database Character Set Scanner assumes the character set for these data types to be the database character set.

Us e this parameter to override the default database character set definition for CHAR, VARCHAR2, LONG, and CLOB data in the database.

FROMNCHAR

Property	Description
Default value	The current database character set.
Purpose	Specifies the current national database character set name for `NCHAR`, `NVARCHAR2`, and `NCLOB` data types in the database. By default, the Database Character Set Scanner assumes the character set for these data types to be the database national character set.

Us e this parameter to override the default database character set definition for NCHAR, NVARCHAR2, and NCLOB data in the database.

FULL

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Indicates whether to perform the full database scan (that is, to scan the entire database including the data dictionary). Specify `FULL=Y` to scan in full database mode.

See Also:

"Scan Modes in the Database Character Set Scanner" for more information about full database scans

HELP

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Displays a help message with the descriptions of the Database Character Set Scanner parameters

LASTRPT

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Indicates whether to regenerate the Database Character Set Scanner reports based on statistics gathered from the previous database scan

I f LASTRPT=Y is specified, then the Database Character Set Scanner does not scan the database, but creates the report files using the information left by the previous database scan session instead.

If LASTRPT=Y is specified, then only the USERID, BOUNDARIES, and LOG parameters take effect.

LCSD

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Indicates whether to apply language and character set detection during scanning

If LCSD=Y is specified, then the Database Character Set Scanner (CSSCAN) performs language and character set detection on the data cells categorized by the LCSDDATA parameter. The accuracy of the detection depends greatly on the size and the quality of the text being analyzed. The ideal case is literary text of one single language with no spelling or grammatical errors. Data cells that contain a mixture of languages or character sets or text such as addresses and names can yield poor results. When CSSCAN cannot determine the most likely language and character set, it may return up to three most likely languages and character sets for each cell. In some cases it may return none. CSSCAN ignores any data cells with less than 10 bytes of data and returns UNKNOWN for their language and character set.

The language and character set detection is a statistically-based technology, so its accuracy varies depending on the quality of the input data. The goal is to provide CSSCAN users with additional information about unknown data inside the database. It is important for CSSCAN users to review the detection results and the data itself before migrating the data to another character set.

Note that language and character set detection can affect the performance of the Database Character Set Scanner, depending on the amount of data that is being analyzed.

LCSDDATA

Property	Description
Default value	`LOSSY`
Range of values	`LOSSY`, `TRUNCATION`, `CONVERTIBLE`, `CHANGELESS`, `ALL`
Purpose	Specifies the scope of the language and character set detection. The default is to apply the detection to only the `LOSSY` data cells.

This parameter takes effect only when LCSD=Y is specified. For example, if LCSD=Y and LCSDDATA=LOSSY, CONVERTIBLE, then the Database Character Set Scanner tries to detect the character sets and languages of the data cells that are either LOSSY or CONVERTIBLE. Data that is classified as CHANGELESS and TRUNCATION will not be processed. Setting LCSDDATA=ALL results in language and character set detection for all data cells scanned in the current session.

After language and character set detection has been applied to CONVERTIBLE and TRUNCATION data cells, some data cells may change from their original classification to LOSSY. This occurs when the character set detection process determines that the character set of these data cells is not the character set specified in the FROMCHAR parameter.

LOG

Property	Description
Default value	`scan`
Purpose	Specifies a base file name for the following Database Character Set Scanner report files: Database Scan Summary Report file, whose extension is `.txt` Individual Exception Report file, whose extension is `.err` Screen log file, whose extension is `.out`

By de fault, the Database Character Set Scanner generates the three text files, scan.txt, scan.err, and scan.out in the current directory.

MAXBLOCKS

Property	Description
Default value	`None`
Minimum value	`1000`
Maximum value	Unlimited
Purpose	Specifies the maximum block size for each table, so that large tables can be split into smaller chunks for the Database Character Set Scanner to process

For example, if the MAXBLOCKS parameter is set to 1000, then any tables that are greater than 1000 blocks in size are divided into n chunks, where n=CEIL(table block size/1000).

Dividing large tables into smaller pieces is beneficial only when the number of processes set with PROCESS is greater than 1. If the MAXBLOCKS parameter is not set, then the Database Character Set Scanner attempts to split up large tables based on its own optimization rules.

PARFILE

Property	Description
Default value	`None`
Purpose	Specifies the name for a file that contains a list of Database Character Set Scanner parameters

PRE SERVE

Property	Description
Default value	`N`
Range of values	`Y` or `N`
Purpose	Indicates whether to preserve the statistics gathered from the previous scan session

If PRESERVE=Y is specified, then the Database Character Set Scanner preserves all of the statistics from the previous scan. It adds (if PRESERVE=Y) or overwrites (if PRESERVE=N) the new statistics for the tables being scanned in the current scan request.

PROCESS

Property	Description
Default value	`1`
Minimum value	`1`
Maximum value	`32`
Purpose	Specifies the number of concurrent scan processes to utilize for the database scan

QU ERY

Property	Description
Default value	None
Purpose	Applies a filter to restrict the data to be scanned by specifying a clause for a `SELECT` statement, which is applied to all tables and columns in the scanner session

The value of the query parameter is a string that contains a WHERE clause for a SELECT statement that will be applied to all tables and columns listed in the TABLE and COLUMN parameters.

Only one query clause is allowed per scan session. The QUERY parameter is only applicable when performing table or column scans. The parameter will be ignored when performing a Full database or a User scan. QUERY can be applied to multiple tables and columns scans, however, the identical WHERE clause will be appended to all specified tables and columns.

For example, the following command scans the employess who were hired within the last 30 days:

CSSCAN system/manager TABLE=HR.EMPLOYEES QUERY= 'hire_date > SYSDATE - 180' ...

Note that the WHERE clause is not required inside the QUERY parameter. CSSCAN will automatically remove the WHERE clause if it is found to be the first five characters in the QUERY parameter.

SUPPRESS

Property	Description
Default value	Unset (results in unlimited number of rows)
Minimum value	`0`
Maximum value	Unlimited
Purpose	Specifies the maximum number of data exceptions being logged for each table

Th e Database Character Set Scanner inserts information into the CSM$ERRORS table when an exception is found in a data cell. The table grows depending on the number of exceptions reported.

This parameter is used to suppress the logging of individual exception information after a specified number of exceptions are inserted for each table. For example, if SUPPRESS is set to 100, then the Database Character Set Scanner records a maximum of 100 exception records for each table.

TABLE

Property	Description
Default value	`None`
Purpose	Specifies the names of the tables to scan

You can specify the following when you specify the name of the table:

schemaname specifies the name of the user's schema from which to scan the table
tablename specifies the name of the table or tables to be scanned

For example, the following command scans the employees and departments tables in the HR sample schema:

CSSCAN system/manager TABLE=(HR.EMPLOYEES, HR.DEPARTMENTS)

TOCH AR

Property	Description
Default value	`None`
Purpose	Specifies a target database character set name for the `CHAR`, `VARCHAR2`, `LONG`, and `CLOB` data

TONCHAR

Property	Description
Default value	`None`
Purpose	Specifies a target database character set name for the `NCHAR`, `NVARCHAR2`, and `NCLOB` data

If you do not specify a value for TONCHAR, then the Database Character Set Scanner does not scan NCHAR, NVARCHAR2, and NCLOB data.

USER

Property	Description
Default value	`None`
Purpose	Specifies the owner of the tables to be scanned

If the USER parameter is specified, then the Database Character Set Scanner scans all tables belonging to the specified owner. For example, the following statement scans all tables belonging to HR:

CSSCAN system/manager USER=hr ...

USERID

Property	Description
Default value	`None`
Purpose	Specifies the username and password (and optional connect string) of the user who scans the database. If you omit the password, then the Database Character Set Scanner prompts you for it

The following formats are all valid:

username/password
username/password@connect_string
username
username@connect_string

Database Character Set Scanner Sessions: Examples

The following examples show you how to use the command-line and parameter-file methods for the Full Database, User, Single Table, and Column scan modes.

Full Database Scan Examples

The following examples show how to scan the full database to see the effects of migrating it to AL32UTF8. These examples assume that the current database character set is WE8ISO8859P1.

Example 12-6 Database Scan Using the Parameter-File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

full=y
tochar=al32utf8
array=4096000
process=4

Example 12-7 Database Scan Using the Command-Line Method

% csscan system/manager full=y tochar=al32utf8 array=4096000 process=4

Database Character Set Scanner Messages

The scan.out file shows which tables were scanned. The default file name for the report can be changed by using the LOG parameter.

See Also:

"LOG"

Database Character Set Scanner vx.x : Release xx.x.x.x.x - Production

Copyright (c) 1982, 200x, Oracle.  All rights reserved.

Connected to:
Oracle Database xxx Enterprise Edition Release xx.x.x.x.x - Production
With the Partitioning and Data Mining options

Enumerating tables to scan...

. process 1 scanning SYS.SOURCE$[AAAABHAABAAAAJqAAA]
. process 2 scanning SYS.TAB$[AAAAACAABAAAAA0AAA]
. process 2 scanning SYS.CLU$[AAAAACAABAAAAA0AAA]
. process 2 scanning SYS.ICOL$[AAAAACAABAAAAA0AAA]
. process 2 scanning SYS.COL$[AAAAACAABAAAAA0AAA]
. process 1 scanning SYS.IND$[AAAAACAABAAAAA0AAA]
. process 1 scanning SYS.TYPE_MISC$[AAAAACAABAAAAA0AAA]
. process 1 scanning SYS.LOB$[AAAAACAABAAAAA0AAA]
.
.
.
. process 1 scanning IX.AQ$_ORDERS_QUEUETABLE_G
. process 2 scanning IX.AQ$_ORDERS_QUEUETABLE_I

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

User Scan Examples

The following examples show how to scan the user tables to see the effects of migrating them to AL32UTF8. These examples assume that the current database character set is US7ASCII, but the actual data stored is in Western European WE8MSWIN1252 encoding.

Example 12-8 User Scan with Parameter-File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

user=hr
fromchar=we8mswin1252
tochar=al32utf8
array=4096000
process=1

Example 12-9 User Scan with Command-Line Method

% csscan system/manager user=hr fromchar=we8mswin1252 
    tochar=al32utf8 array=4096000 process=1

Database Character Set Scanner Messages

The scan.out file shows which tables were scanned.

Database Character Set Scanner vx.x : Release xx.x.x.x.x - Production

Copyright (c) 1982, 200x, Oracle.  All rights reserved.

Connected to:
Oracle Database xxx Enterprise Edition Release xx.x.x.x.x - Production
With the Partitioning and Data Mining options
Copyright (c) 1983, 200x Oracle. All rights reserved.

Enumerating tables to scan...

. process 1 scanning HR.JOBS
. process 1 scanning HR.DEPARTMENTS
. process 1 scanning HR.JOB_HISTORY
. process 1 scanning HR.EMPLOYEES

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Single Table Scan Examples

The following examples show how to scan a single table to see the effects of migrating it to WE8MSWIN1252. These examples assume that the current database character set is in US7ASCII. Language and character set detection is performed on the LOSSY data cells.

Example 12-10 Single Table Scan with Parameter-File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

table=hr.employees
tochar=we8mswin1252
array=4096000
process=1
suppress=100
lcsd=y

Example 12-11 Single Table Scan with Command-Line Method

% csscan system/manager table=hr.employees tochar=we8mswin1252 array=4096000 process=1 supress=100 lcsd=y

Database Character Set Scanner Messages

The scan.out file shows which tables were scanned.

Database Character Set Scanner vx.x : Release xx.x.x.x.x - Production

Copyright (c) 1982, 200x, Oracle.  All rights reserved.

Connected to:
Oracle Database xxx Enterprise Edition Release xx.x.x.x.x - Production
With the Partitioning and Data Mining options
Copyright (c) 1983, 200x Oracle. All rights reserved.

. process 1 scanning HR.EMPLOYEES

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

The following example shows how to scan a single table to see the effect of migrating it to WE8MSWIN1252. Before scanning, a query is run against the table to limit the rows that will be scanned. This example assumes the current database character set is in US7ASCII. Language and character set detection is performed on the LOSSY data cells.

Example 12-12 Single Table Scan with Parameter-File Method, Showing the Effect of Migrating to WE8MSWIN1252

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

table=hr.employees
query='hire_date > SYSDATE - 180'
tochar=we8mswin1252
array=4096000
process=1
suppress=100
lcsd=y

Example 12-13 Single Table Scan with Command-File Method, Showing the Effect of Migrating to WE8MSWIN1252

% csscan system/manager table=hr.employees query='hire_date > SYSDATE - 180' tochar=we8mswin1252 array=4096000 process=1 supress=100 lcsd=y

Database Character Set Scanner Messages

The scan.out file shows which tables were scanned.

Database Character Set Scanner vx.x : Release xx.x.x.x.x - Production

Copyright (c) 1982, 200x, Oracle.  All rights reserved.

Connected to:
Oracle Database xxx Enterprise Edition Release xx.x.x.x.x - Production
With the Partitioning and Data Mining options
Copyright (c) 1983, 200x Oracle. All rights reserved.

. process 1 scanning HR.EMPLOYEES

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Column Scan Examples

The following example shows how to scan columns within a table to see the effects of migrating it to WE8MSWIN1252. This example assumes the current database character set is in US7ASCII. Language and character set detection is performed on the LOSSY data cells.

Example 12-14 Column Scan with Parameter-File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

column=(hr.employees.lastname, hr.employees.firstname)
tochar=we8mswin1252
array=4096000
process=1
suppress=100
lcsd=y

Example 12-15 Column Scan with Command-Line Method

% csscan system/manager column=(hr.employees.lastname, hr.employees.firstname)
 tochar=we8mswin1252 array=4096000 process=1 supress=100 lcsd=y

Database Character Set Scanner Messages

The scan.out file shows which tables were scanned.

Database Character Set Scanner vx.x : Release xx.x.x.x.x - Production

Copyright (c) 1982, 200x, Oracle.  All rights reserved.

Connected to:
Oracle Database xxx Enterprise Edition Release xx.x.x.x.x - Production
With the Partitioning and Data Mining options
Copyright (c) 1983, 200x Oracle. All rights reserved.

. process 1 scanning HR.EMPLOYEES

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Database Character Set Scanner Reports

The Database Character Set Scanner generates two reports for each scan:

Database Scan Summary Report
Database Scan Individual Exception Report

The Database Scan Summary Report is found in the scan.txt file. The Database Scan Individual Exception Report is found in the scan.err file.

The default file names for the reports can be changed by using the LOG parameter.

See Also:

"LOG"

Database Scan Summary Report

The scan.txt file contains the Database Scan Summary Report. The following output is an example of the report header. This section contains the time when each process of the scan was performed.

Database Scan Summary Report

Time Started  : 200x-12-16 20:35:56
Time Completed: 200x-12-16 20:37:31

Process ID         Time Started       Time Completed
---------- -------------------- --------------------
         1  200x-12-16 20:36:07  2002-12-16 20:37:30
         2  200x-12-16 20:36:07  2002-12-16 20:37:30
---------- -------------------- --------------------

The report consists of the following sections:

Database Size
Database Scan Parameters
Scan Summary
Data Dictionary Conversion Summary
Application Data Conversion Summary
Application Data Conversion Summary Per Column Size Boundary
Distribution of Convertible Data Per Table
Distribution of Convertible Data Per Column
Indexes To Be Rebuilt
Truncation Due To Character Semantics

The information available for each section depends on the type of scan and the parameters you select.

Database Size

This section reports on the current database size as well as identifying the amount of potential data expansion after the character set migration.

The following output is an example.

Tablespace                           Used            Free           Total       Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM                            206.63M         143.38M         350.00M         588.00K
SYSAUX                              8.25M         131.75M         140.00M            .00K
------------------------- --------------- --------------- --------------- ---------------
Total                             214.88M         275.13M         490.00M         588.00K

The size of the largest CLOB is 57370 bytes

Database Scan Parameters

This section describes the parameters selected and the type of scan you chose. The following output is an example.

Parameter                      Value                                           
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1 
Instance Name                  rdbms06                                         
Database Version               10.2.0.0.0                                      
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Database character set         WE8ISO8859P1                                    
FROMCHAR                       WE8ISO8859P1                                    
TOCHAR                         AL32UTF8                                        
Scan NCHAR data?               NO                                              
Array fetch buffer size        1024000                                         
Number of processes            2                                               
Capture convertible data?      NO                                              
Charset Language Detections    Yes                                             
Charset Language Parameter     LOSSY                                           
------------------------------ ------------------------------------------------

Scan Summary

This section indicates the feasibility of the database character set migration. There are two basic criteria that determine the feasibility of the character set migration of the database. One is the condition of the data dictionary and the other is the condition of the application data.

The Scan Summary section consists of two status lines: one line reports on the data dictionary, and the other line reports on the application data.

The following is sample output from the Scan Summary:

All character type data in the data dictionary are convertible to the new character set
Some character type application data are not convertible to the new character set

Table 12-1 shows the types of status that can be reported for the data dictionary and application data.

Table 12-1 Possible Status of the Data Dictionary and Application Data

Data Dictionary Status	Application Data Status
All character-type data in the data dictionary remains the same in the new character set.	All character-type application data remains the same in the new character set.
All character-type data in the data dictionary is convertible to the new character set.	All character-type application data is convertible to the new character set.
Some character-type data in the data dictionary is not convertible to the new character set.	Some character-type application data is not convertible to the new character set.

When all data remains the same in the new character set, it means that the encoding values of the original character set are identical in the target character set. For example, ASCII data is stored using the same binary values in both WE8ISO8859P1 and AL32 UTF8. In this case, the database character set can be migrated using the CSALTER script.

If all the data is convertible to the new character set, then the data can be safely migrated using the Export and Import utilities. However, the migrated data will have different encoding values in the target character set.

See Also:

"Database Scan Individual Exception Report" for more information about non-convertible data
"Migrating a Character Set Using the CSALTER Script"
"Migrating Character Data Using a Full Export and Import"

Data Dictionary Conversion Summary

This section contains the statistics about the conversion of the data in the data dictionary. The numbers of data cells with each type of status are reported by data type. To achieve a comprehensive data dictionary conversion summary, you need to use a full database scan.

Table 12-2 describes the possible types of status of a data cell.

Table 12-2 Possible Status of Data

Status	Description
`Changeless`	Data remains the same in the new character set
`Convertible`	Data can be successfully converted to the new character set
`Truncation`	Data will be truncated if conversion takes place
`Lossy`	Character data will be lost if conversion takes place

In order to provide more information on how to convert data in the data dictionary, the Data Dictionary Conversion Summary report has been divided into two parts: Data Dictionary Tables, and XML CSX Dictionary Tables. Compact binary XML (CSX) is a new format for XML documents in XML DB in Oracle Database 11g Release 2 (11.2). When you insert an XML document into a CSX column, the database saves CSX data into a BLOB column in the application data part. Simultaneously, the database saves the data into the tables of the XDB user, which belong to data dictionary. In the data dictionary, these tables are called as XML CSX Dictionary Tables.

Similarly, the tables for "Distribution of Convertible, Truncated and Lossy Data by Table" and "Distribution of Convertible, Truncated and Lossy Data by Column" have been organized into three parts, respectively: Data Dictionary Tables, XML CSX Dictionary Tables, and Application Data.

The following output is an example of the expanded Data Dictionary Conversion Summary report.

Database Scan Summary Report
 
Time Started  : 2006-08-15 01:13:24
Time Completed: 2006-08-15 01:16:32
 
Process ID         Time Started       Time Completed
---------- -------------------- --------------------
         1  2006-08-15 01:13:28  2006-08-15 01:16:30
---------- -------------------- --------------------
 
[Database Size]
 
Tablespace                           Used            Free           Total       Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM                            542.77M           9.82M         552.59M           1.41M
SYSAUX                             80.00M            .00K          80.00M         574.00K
TEMP                                 .00K            .00K            .00K            .00K
XDB_RESINFO                        51.81M          48.19M         100.00M          18.00K
XDB_TEMP                             .00K            .00K            .00K            .00K
------------------------- --------------- --------------- --------------- ---------------
Total                             674.59M          58.01M         732.59M           1.98M

The size of the largest CLOB is 61115 bytes
 
[Database Scan Parameters]
 
Parameter                      Value                                           
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1                                            
Instance Name                  cm2                                             
Database Version               11.1.0.0.0                                      
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Database character set         WE8ISO8859P1                                    
FROMCHAR                       we8iso8859p1                                    
TOCHAR                         utf8                                            
Scan NCHAR data?               YES                                             
NCHAR character set            AL16UTF16                                       
FROMNCHAR                      al16utf16                                       
TONCHAR                        utf8                                            
Array fetch buffer size        102400                                          
Number of processes            1                                               
Capture convertible data?      NO                                              
------------------------------ ------------------------------------------------

[Scan Summary]
 
All character type data in the data dictionary are convertible to the new character set
All character type application data are convertible to the new character set
 
[Data Dictionary Conversion Summary]
 
Data Dictionary Tables:
 
Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     3,294,477                0                0                0
CHAR                             1,941                0                0                0
LONG                           133,551                0                0                0
CLOB                            21,633              841                0                0
NVARCHAR2                            0                0                0                0
NCHAR                                0                0                0                0
NCLOB                                0                0                0                0
VARRAY                          26,965                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        3,478,567              841                0                0
Total in percentage             99.976%           0.024%           0.000%           0.000%
 
The data dictionary can be safely migrated using the CSALTER script

XML CSX Dictionary Tables:
 
Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                           173                8                0                0
CHAR                                 0                0                0                0
LONG                                 0                0                0                0
CLOB                                 0                0                0                0
NVARCHAR2                            0                0                0                0
NCHAR                                0                0                0                0
NCLOB                                0                0                0                0
VARRAY                               0                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                              173                8                0                0
Total in percentage             95.580%           4.420%           0.000%           0.000%
 
[Application Data Conversion Summary]
 
Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                        36,011                0                0                0
CHAR                             5,533                0                0                0
LONG                                 0                0                0                0
CLOB                                27               26                0                0
NVARCHAR2                            0           17,280                0                0
NCHAR                                0                0                0                0
NCLOB                                4                0                0                0
VARRAY                             319                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                           41,894           17,306                0                0
Total in percentage             70.767%          29.233%           0.000%           0.000%

[Distribution of Convertible, Truncated and Lossy Data by Table]
 
         Data Dictionary Tables:
 
         USER.TABLE                                              Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         SYS.EXTERNAL_TAB$                                                 1                0                0
         SYS.METASTYLESHEET                                              145                0                0
         SYS.RULE$                                                         1                0                0
         SYS.SCHEDULER$_EVENT_LOG                                         26                0                0
         SYS.WRH$_SQLTEXT                                                239                0                0
         SYS.WRH$_SQL_PLAN                                               217                0                0
         SYS.WRI$_ADV_DIRECTIVE_META                                       5                0                0
         SYS.WRI$_ADV_OBJECTS                                              1                0                0
         SYS.WRI$_DBU_FEATURE_METADATA                                    94                0                0
         SYS.WRI$_DBU_FEATURE_USAGE                                        3                0                0
         SYS.WRI$_DBU_HWM_METADATA                                        17                0                0
         SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY                                71                0                0
         SYS.WRI$_REPT_FILES                                               4                0                0
         XDB.XDB$RESCONFIG                                                 9                0                0
         XDB.XS$PRINCIPALS                                                 1                0                0
         XDB.XS$SECURITYCLASS                                              7                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------

         XML CSX Dictionary Tables:
 
         USER.TABLE                                              Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         XDB.X$QN0S0VE2C21FAJ8E0G5PHDLAO3BO                                8                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------
 
         Application Data:
 
         USER.TABLE                                              Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         OE.PRODUCT_DESCRIPTIONS                                      17,280                0                0
         OE.WAREHOUSES                                                     4                0                0
         PM.ONLINE_MEDIA                                                  14                0                0
         PM.PRINT_MEDIA                                                    8                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------
 
         [Distribution of Convertible, Truncated and Lossy Data by Column]

         Data Dictionary Tables:
 
         USER.TABLE|COLUMN                                       Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         SYS.EXTERNAL_TAB$|PARAM_CLOB                                      1                0                0
         SYS.METASTYLESHEET|STYLESHEET                                   145                0                0
         SYS.RULE$|CONDITION                                               1                0                0
         SYS.SCHEDULER$_EVENT_LOG|ADDITIONAL_INFO                         26                0                0
         SYS.WRH$_SQLTEXT|SQL_TEXT                                       239                0                0
         SYS.WRH$_SQL_PLAN|OTHER_XML                                     217                0                0
         SYS.WRI$_ADV_DIRECTIVE_META|DATA                                  5                0                0
         SYS.WRI$_ADV_OBJECTS|ATTR4                                        1                0                0
         SYS.WRI$_DBU_FEATURE_METADATA|INST_CHK_LOGIC                     11                0                0
         SYS.WRI$_DBU_FEATURE_METADATA|USG_DET_LOGIC                      83                0                0
         SYS.WRI$_DBU_FEATURE_USAGE|FEATURE_INFO                           3                0                0
         SYS.WRI$_DBU_HWM_METADATA|LOGIC                                  17                0                0
         SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY|EXPRESSION                     71                0                0
         SYS.WRI$_REPT_FILES|SYS_NC00005$                                  4                0                0
         XDB.XDB$RESCONFIG|XMLDATA                                         9                0                0
         XDB.XS$PRINCIPALS|XMLDATA                                         1                0                0
         XDB.XS$SECURITYCLASS|XMLDATA                                      7                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------
 
         XML CSX Dictionary Tables:
 
         USER.TABLE|COLUMN                                       Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         XDB.X$QN0S0VE2C21FAJ8E0G5PHDLAO3BO|LOCALNAME                      8                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------
 
         Application Data:

         USER.TABLE|COLUMN                                       Convertible       Truncation            Lossy
         -------------------------------------------------- ---------------- ---------------- ----------------
         OE.PRODUCT_DESCRIPTIONS|TRANSLATED_DESCRIPTION                8,640                0                0
         OE.PRODUCT_DESCRIPTIONS|TRANSLATED_NAME                       8,640                0                0
         OE.WAREHOUSES|SYS_NC00003$                                        4                0                0
         PM.ONLINE_MEDIA|SYS_NC00042$                                      7                0                0
         PM.ONLINE_MEDIA|SYS_NC00062$                                      7                0                0
         PM.PRINT_MEDIA|AD_FINALTEXT                                       4                0                0
         PM.PRINT_MEDIA|AD_SOURCETEXT                                      4                0                0
         -------------------------------------------------- ---------------- ---------------- ----------------

[Indexes to be Rebuilt]
 
USER.INDEX on USER.TABLE(COLUMN)                                                         
-----------------------------------------------------------------------------------------
SYS.I_WRI$_OPTSTAT_HH_ST on SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(SYS_NC00024$)
SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST on SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(SYS_NC00024$)
SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST on SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(OBJ#)
SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST on SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(INTCOL#)
SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST on SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(COLNAME)
OE.PROD_NAME_IX on OE.PRODUCT_DESCRIPTIONS(TRANSLATED_NAME)
XDB.X$QQ0S0VE2C21FAJ8E0G5PHDLAO3BO on XDB.X$QN0S0VE2C21FAJ8E0G5PHDLAO3BO(NMSPCID)
XDB.X$QQ0S0VE2C21FAJ8E0G5PHDLAO3BO on XDB.X$QN0S0VE2C21FAJ8E0G5PHDLAO3BO(LOCALNAME)
XDB.X$QQ0S0VE2C21FAJ8E0G5PHDLAO3BO on XDB.X$QN0S0VE2C21FAJ8E0G5PHDLAO3BO(FLAGS)
-----------------------------------------------------------------------------------------

The following output is an example of the Data Dictionary Conversion Summary report in previous releases.

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     1,214,557                0                0                0
CHAR                               967                0                0                0
LONG                            88,657                0                0                0
CLOB                               138              530                0                0
VARRAY                              18                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        1,304,337              530                0                0
Total in percentage             99.959%           0.041%           0.000%           0.000%

The data dictionary can be safely migrated using the CSALTER script.

If the numbers of data cells recorded in the Convertible, Truncation, and Lossy columns are zero in the data dictionary tables and the XML CSX dictionary tables, then no data conversion is required to migrate the data dictionary from the FROMCHAR character set to the TOCHAR character set.

If the numbers in the Truncation and Lossy columns are zero and some numbers in the Convertible columns are not zero in both data dictionary tables and XML CSX dictionary tables, then all data in the data dictionary is convertible to the new character set. However, it is dangerous to convert the data in the data dictionary without understanding their impact on the database. All convertible cells in the XML CSX dictionary tables can be safely converted, so they are output explicitly out of the data dictionary tables. The CSALTER script can convert some of the convertible cells in the data dictionary. The message that follows the conversion summary table indicates whether this conversion can be supported by the CSALTER script.

If the numbers in the Lossy column are not zero in the data dictionary tables or XML CSX dictionary tables, then there is data in the data dictionary that is not convertible. Therefore, it is not feasible to migrate the current database to the new character because the export and import processes cannot convert the data into the new character set. For example, you might have a table name with invalid characters or a PL/SQL procedure with a comment line that includes data that cannot be mapped to the new character set. These changes to schema objects must be corrected manually before migration to a new character set.

If the numbers in the Truncation column are not zero in the data dictionary tables or XML CSX dictionary tables, then the export and import process will truncate the data.

See Also:

"Database Character Set Scanner CSALTER Script"
"How to Handle Convertible or Lossy Data in the Data Dictionary"

Application Data Conversion Summary

This section contains the statistics on conversion summary of the application data. The numbers of data cells with each type of status are reported by data type. Table 12-2 describes the types of status that can be reported.

The following output is an example.

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                        37,757                3                0                0
CHAR                             6,404                0                0                0
LONG                                 4                0                0                0
CLOB                                23               20                0                1
VARRAY                             319                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                           44,507               23                0                1
Total in percentage             99.946%           0.051%               0%           0.002%

Application Data Conversion Summary Per Column Size Boundary

This section contains the conversion summary of the CHAR and VARCHAR2 application data. The number of data cells with each type of status are reported by column size boundaries specified by the BOUNDARIES parameter. Table 12-2 describes the possible types of status. This information is available only when the BOUNDARIES parameter is specified. The following output is an example.

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2(1..30)                 28,702                2                0                0
VARCHAR2(31..4000)               9,055                1                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
CHAR(1..30)                      6,404                0                0                0
CHAR(31..4000)                       0                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                           44,161                3                0                0

Distribution of Convertible Data Per Table

This section shows how Convertible, Truncation, and Lossy data is distributed within the database. The statistics are reported by table. If the list contains only a few rows, then the Convertible data is localized. If the list contains many rows, then the Convertible data occurs throughout the database. The following output is an example.

USER.TABLE                        Convertible       Truncation            Lossy
---------------------------- ---------------- ---------------- ----------------
HR.EMPLOYEES                                1                0                0
OE.CUSTOMERS                                2                0                0
PM.ONLINE_MEDIA                            13                0                0
PM.PRINT_MEDIA                              7                0                1
SYS.EXTERNAL_TAB$                           1                0                0
SYS.METASTYLESHEET                         80                0                0
---------------------------- ---------------- ---------------- ----------------

Distribution of Convertible Data Per Column

This section shows how Convertible, Truncation, and Lossy data is distributed within the database. The statistics are reported by column. The following output is an example.

USER.TABLE|COLUMN                          Convertible       Truncation            Lossy
------------------------------------- ---------------- ---------------- ----------------
HR.EMPLOYEES|FIRST_NAME                              1                0                0
OE.CUSTOMERS|CUST_EMAIL                              1                0                0
OE.CUSTOMERS|CUST_FIRST_NAME                         1                0                0
PM.ONLINE_MEDIA|SYS_NC00042$                         6                0                0
PM.ONLINE_MEDIA|SYS_NC00062$                         7                0                0
PM.PRINT_MEDIA|AD_FINALTEXT                          3                0                1
PM.PRINT_MEDIA|AD_SOURCETEXT                         4                0                0
SYS.EXTERNAL_TAB$|PARAM_CLOB                         1                0                0
SYS.METASTYLESHEET|STYLESHEET                       80                0                0
------------------------------------- ---------------- ---------------- ----------------

Indexes To Be Rebuilt

This generates a list of all the indexes that are affected by the database character set migration. These can be rebuilt after the data has been imported. The following output is an example.

USER.INDEX on USER.TABLE(COLUMN)                                                      
--------------------------------------------------------------------------------------
HR.EMP_NAME_IX on HR.EMPLOYEES(FIRST_NAME)
HR.EMP_NAME_IX on HR.EMPLOYEES(LAST_NAME)
OE.CUST_EMAIL_IX on OE.CUSTOMERS(CUST_EMAIL)
--------------------------------------------------------------------------------------

Truncation Due To Character Semantics

This section applies only to columns that are defined using character semantics. The Truncation Due to Character Semantics section identifies the number of data cells that would be truncated if they were converted to the target character set (for example, by the SQL CONVERT function or another inline conversion process) before the database character set is updated with the CSALTER script. If the data conversion occurs after the database character set is changed, then this section can be ignored.

For example, a VARCHAR2(5 char) column in a WE8MSWIN1252 database can store up to 5 characters, using 5 bytes. When these characters are migrated to AL32UTF8, the same 5 characters can expand to as much as 20 bytes in length. Because the physical byte limits allocated for the column are determined by the current database character set, this column must be manually expanded to 20 bytes before the data can be converted in the target character set. Alternatively, you can apply the character set conversion to this column after the database character set has been changed. Then the same VARCHAR2(5 char) definition will automatically allocate 20 bytes, and no special handling is required.

The following output is an example of the Truncation Due To Character Semantics section of the report.

USER.TABLE|COLUMN                                        Truncation
-------------------------------------------------- ----------------
HR.EMPLOYEES|FIRST_NAME                                           1

Character Set Detection Result

This section appears when the language and character set detection is turned on by the LCSD parameter. It displays a list of character sets detected by the Database Character Set Scanner. The character sets are ordered by occurrence. NUMBER refers to the number of data cells.

The following output is an example of the Character Set Detection Result section.

CHARACTER SET                            NUMBER       PERCENTAGE
------------------------------ ---------------- ----------------
WE8MSWIN1252                                 38          97.436%
UNKNOWN                                       1           2.564%
------------------------------ ---------------- ----------------

Language Detection Result

This section appears when the language and character set detection is turned on by the LCSD parameter. It displays a list of the languages detected by the Database Character Set Scanner. The languages are ordered by occurrence.

The following output is an example of the Language Detection Result Section.

LANGUAGE                                 NUMBER       PERCENTAGE
------------------------------ ---------------- ----------------
ENGLISH                                      36          92.308%
FRENCH                                        2           5.128%
UNKNOWN                                       1           2.564%
------------------------------ ---------------- ----------------

Database Scan Individual Exception Report

The scan.err file contains the Individual Exception Report. It consists of the following summaries:

Database Scan Parameters
Data Dictionary Individual Exceptions
Application Data Individual Exceptions

Database Scan Parameters

This section describes the parameters and the type of scan chosen. The following output is an example.

Parameter                      Value                                           
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1                                            
Instance Name                  rdbms06                                         
Database Version               10.2.0.0.0                                      
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Database character set         WE8ISO8859P1                                    
FROMCHAR                       WE8ISO8859P1                                    
TOCHAR                         AL32UTF8                                        
Scan NCHAR data?               NO                                              
Array fetch buffer size        1024000                                         
Number of processes            2                                               
Capture convertible data?      NO                                              
Charset Language Detection     Yes                                             
Charset Language Parameter     LOSSY                                           
------------------------------ ------------------------------------------------

Data Dictionary Individual Exceptions

This section reports on whether data dictionary data is convertible or has exceptions. There are two types of exceptions:

exceed column size
lossy conversion

The following output is an example for a data dictionary that contains convertible data.

User  : SYS
Table : OBJ$
Column: NAME
Type  : VARCHAR2(30)
Number of Exceptions         : 0
Max Post Conversion Data Size: 30

ROWID              Exception Type      Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AAAAASAABAAAikLAAQ convertible              Aufträge
------------------ ------------------ ----- ------------------------------

See Also:

"Application Data Individual Exceptions" for more information about exceptions
"How to Handle Convertible or Lossy Data in the Data Dictionary"

Application Data Individual Exceptions

This report identifies the data with exceptions so that this data can be modified if necessary.

There are two types of exceptions:

exceed column size

The column size should be extended if the maximum column width has been surpassed. Otherwise, data truncation occurs.
lossy conversion

The data must be corrected before migrating to the new character set. Otherwise the invalid characters are converted to a replacement character. Replacement characters are usually specified as ? or ¿ or as a character that is linguistically similar to the source character.

The following example shows an individual exception report that illustrates some possible problems when changing the database character set from WE8ISO8859P1 to AL32UTF8.

User:   USER1
Table:  TEST
Column: NAME
Type:   VARCHAR2(10)
Number of Exceptions: 2  
Max Post Conversion Data Size: 11

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAAA2fAAFAABJwQAAg exceed column size    11 Ährenfeldt
AAAA2fAAFAABJwQAAu lossy conversion         órâclë8™
AAAA2fAAFAABJwQAAu exceed column size    11 órâclë8™
------------------ ------------------ ----- ------------------------------

The values Ährenfeldt and órâclë8™ exceed the column size (10 bytes) because each of the characters Ä, ó, â, and ë occupies one byte in WE8ISO8859P1 but two bytes in AL32UTF8. The value órâclë8™ has lossy conversion to AL32UTF8 because the trademark sign ™ (code 153) is not a valid WE8ISO8859P1 character. It is a WE8MSWIN1252 character, which is a superset of WE8ISO8859P1.

You can view the data that has an exception by issuing a SELECT statement:

SELECT name FROM user1.test WHERE ROWID='AAAA2fAAFAABJwQAAu';

You can modify the data that has the exception by issuing an UPDATE statement:

UPDATE user1.test SET name = 'Oracle8 TM'
WHERE ROWID='AAAA2fAAFAABJwQAAu';

If the language and character set detection option is enabled, then CSSCAN attempts to provide the most probable languages and character sets for the data cells specified by the LCSDDATA parameter.

The following example shows an individual exception report that illustrates language and character set detection results for lossy data cells when changing the database character set from US7ASCII to AL32UTF8.

User:   USER2
Table:  TEST
Column: NAME
Type:   VARCHAR2(30)
Number of Exceptions: 2  
Max Post Conversion Data Size: 11

ROWID              Exception Type      Size Cell Data(first 30 bytes)   Language & Character Set         
------------------ ------------------ ----- --------------------------- ------------------------
AAAA2fAAFAABJwQAAt lossy conversion         C'est français              (French,UTF8)   
AAAA2fAAFAABJwQAAu lossy conversion         Ciò è italiana              (Italian,WE8MSWIN1252)
------------------ ------------------ ----- --------------------------- ------------------------

See Also:

How to Handle Convertible or Lossy Data in the Data Dictionary

Unlike modifying user application data, updating and changing the contents of data dictionary tables directly is not supported. Updating the system tables without understanding the internal dependencies can lead to database corruption.

If the data dictionary is convertible, then the data cells are encoded in the FROMCHAR character set. Three common scenarios for the existence of convertible data in the data dictionary are:

CLOB data in the data dictionary

For single-byte character sets, CLOB data is stored in the database character set encoding. For multibyte character sets, CLOB data is stored in an internal Oracle Database format which is compatible with UCS-2. For example, the byte representation of the string 'ABC' stored inside a VARCHAR2 column in a US7ASCII database remains unchanged when migrated to AL32UTF8. The same string stored inside a CLOB column doubles in size and is stored completely differently. When migrating from a single-byte character set to a multibyte character set, CLOB data is never CHANGELESS.
XML CSX data in the data dictionary

All XML CSX data in the data dictionary can be safely converted like convertible application data.
Migrating a database to a character set that is a superset in the sense that it contains all of the characters of the original character set, but the binary values of some characters is not the same in the two character sets

This is similar to user application data whose data cells need to be manually converted to the new character set. A common cause of this is that the user 's object definitions (such as table names, column names, package names and package bodies) were created using non-ASCII characters. These are typically characters or symbols that are part of the user's native language.

The easiest approach to migrating convertible data dictionary data is to create a new database in the target character set and then re-create all of the data dictionary and schema definitions by exporting and importing. However, this method means creating a new database.

If you want to migrate the existing database instead of building a new one, then the CSALTER script is the simplest way to migrate convertible CLOB data inside the data dictionary and to change the existing database character set to the target character set.

For data dictionary CONVERTIBLE data cells that are not CLOB data, you must find the schema objects containing the convertible data. Then you can choose to do one of the following actions:

Amend the object definitions (such as removing non-ASCII characters from comments inside a package body) so that the data cells become CHANGELESS
Drop these objects from the database schema altogether and then re-create them after the database character set has been migrated to the target character set

LOSSY dictionary data cells require further examination of the data dictionary to see whether the current FROMCHAR character set is the actual character set of the database. If it is, you must correct these object definitions (such as removing the offending characters from comments inside a package body) so that the data cells become CHANGELESS and they can be migrated safely to the target character set.

Three SQL scripts are included with the Database Character Set Scanner to help users to locate the CONVERTIBLE and LOSSY data cells in the data dictionary:

analyze_source.sql
analyze_histgrm.sql
analyze_rule.sql

The scripts are stored in the $ORACLE_HOME/nls/csscan/sql directory. They perform SQL SELECT operations on the SYS.SOURCE$, SYS.HISTGRM$ and SYS.RULE$ data dictionary tables so that the offending data dictionary objects can be identified.

The following example shows output from the analyze_source.sql script:

SQL> @$ORACLE_HOME/nls/csscan/sql/analyze_source.sql

Table: SYS.SOURCE$
Error: CONVERTIBLE DATA

no rows selected

Table: SYS.SOURCE$
Error: EXCEPTIONAL DATA

OWNER       OBJECT_NAME     OBJECT_TYPE     EXCEPTIONAL
----------- --------------  -------------   -----------
SCOTT       FOO             FUNCTION        1

Storage and Performance Considerations in the Database Character Set Scanner

This section describes storage and performance issues in the Database Character Set Scanner. It contains the following topics:

Storage Considerations for the Database Character Set Scanner
Performance Considerations for the Database Character Set Scanner
Recommendations and Restrictions for the Database Character Set Scanner

Storage Considerations for the Database Character Set Scanner

This section describes the size and the growth of the Database Character Set Scanner's system tables, and explains the approach to maintain them. There are three system tables that can increase rapidly depending on the nature of the data stored in the database.

You may want to assign a large tablespace to the user CSMIG by amending the csminst.sql script. By default, the SYSTEM tablespace is assigned to the user CSMIG.

This section includes the following topics:

CSM$TABLES
CSM$COLUMNS
CSM$ERRORS

CSM$TABLES

The Database Character Set Scanner enumerates all tables that need to be scanned into the CSM$TABLES table.

You can look up the number of tables (to get an estimate of how large CSM$TABLES can become) in the database by issuing the following SQL statement:

SELECT COUNT(*) FROM DBA_TABLES;

CSM$COLUMNS

The Database Character Set Scanner stores statistical information for each column scanned into the CSM$COLUMNS table.

You can look up the number of character type columns (to get an estimate of how large CSM$COLUMNS can become) in the database by issuing the following SQL statement:

SELECT COUNT(*) FROM DBA_TAB_COLUMNS
WHERE DATA_TYPE IN ('CHAR', 'VARCHAR2', 'LONG', 'CLOB', 'VARRAY');

CSM$ERRORS

When exce ptions are detected with cell data, the Database Character Set Scanner inserts individual exception information into the CSM$ERRORS table. This information then appears in the Individual Exception Report and facilitates identifying records to be modified if necessary.

If your database contains a lot of data that is signaled as Exceptional or Convertible (when the parameter CAPTURE=Y is set), then the CSM$ERRORS table can grow very large. You can prevent the CSM$ERRORS table from growing unnecessarily large by using the SUPPRESS parameter.

The SUPPRESS parameter applies to all tables. The Database Character Set Scanner suppresses inserting individual Exceptional information after the specified number of exceptions is inserted. Limiting the number of exceptions to be recorded may not be useful if the exceptions are spread over different tables.

Performance Considerations for the Database Character Set Scanner

This section describes how to improve performance when scanning the database.

Using Multiple Scan Processes

If you plan to scan a relatively large database, for example, over 50GB, then you might want to consider using multiple scan processes. This shortens the duration of database scans by using hardware resources such as CPU and memory available on the machine. A guideline for determining the number of scan processes to use is to set the number equal to the CPU_COUNT initialization parameter.

Setting the Array Fetch Buffer Size

The Database Character Set Scanner fetches multiple rows at a time when an array fetch is allowed. You can usually improve performance by letting the Database Character Set Scanner use a bigger array fetch buffer. Each process allocates its own fetch buffer.

Optimizing the QUERY Clause

When the Character Set Scanner is run without specifying the QUERY parameter, each SELECT operation created by CSSCAN will automatically include a /*+ROWID*/ hint. This is to enable faster access, so that all data retrieval will be performed using ROWID when performing table scans. When a QUERY parameter is supplied, the scanner assumes that the condition in the WHERE clause may already be optimized by other means, hence the ROWID hint will not be added to the QUERY clause.

To ensure optimal performance when using the QUERY parameter, careful tuning (for example, creating additional indexes) will be needed in the SELECT statements.

Suppressing Exception and Convertible Log

The Database Character Set Scanner inserts individual Exceptional and Convertible (when CAPTURE=Y) information into the CSM$ERRORS table. In general, insertion into the CSM$ERRORS table is more costly than data fetching. If your database has a lot of data that is signaled as Exceptional or Convertible, then the Database Character Set Scanner issues many insert statements, causing performance degradation. Oracle recommends setting a limit on the number of exception rows to be recorded using the SUPPRESS parameter.

Recommendations and Restrictions for the Database Character Set Scanner

All the character-based data in CHAR, VARCHAR2, LONG, CLOB, and VARRAY columns are stored in the database character set, which is specified with the CREATE DATABASE statement when the database is first created. However, in some configurations, it is possible to store data in a different character set from the database character set either intentionally or unintentionally. This happens most often when the NLS_LANG character set is the same as the database character set, because in such cases Oracle Database sends and receives data as is, without conversion or validation being guaranteed. It can also happen if one of the two character sets is a superset of the other, in which case many of the code points appear as if they were not converted. For example, if NLS_LANG is set to WE8ISO8859P1 and the database character set is WE8MSWIN1252, then all code points except the range 128-159 are preserved through the client/server conversion.

The same binary code point value can be used to represent different characters between the different character sets. Most European character sets share liberal use of the 8-bit range to encode native characters, so it is very possible for a cell to be reported as convertible but for the wrong reasons. When you set the FROMCHAR parameter, the assumption is that all character data is encoded in that character set, but the Database Character Set Scanner may not be able to accurately determine its validity.

For example, this can occur when the Database Character Set Scanner is used with the FROMCHAR parameter set to WE8MSWIN1252. This single-byte character set encodes a character in every available code point so that no matter what data is being scanned, the scanner always identifies a data cell as being available in the source character set.

Scanning Database Containing Data Not in the Database Character Set

If a database contains data that is not in the database character set, but it is encoded in another character set, then the Database Character Set Scanner can perform a scan if the FROMCHAR parameter specifies the encoded character set.

Scanning Database Containing Data from Two or More Character Sets

If a database contains data from more than one character set, then the Database Character Set Scanner cannot accurately test the effects of changing the database character set by a single scan. If the data can be divided into separate tables, one for each character set, then the Database Character Set Scanner can perform multiple table scans to verify the validity of the data.

For each scan, use a different value of the FROMCHAR parameter to tell the Database Character Set Scanner to treat all target columns in the table as if they were in the specified character set.

Database Character Set Scanner CSALTER Script

The CSALTER script is a DBA tool for special character set migration. Similar to the obsolete ALTER DATABASE CHARACTER SET SQL statement, CSALTER should be used only by the system administrator. System administrators must run the Database Character Set Scanner first to confirm that the proper conditions exist for running CSALTER. Also, the database must be backed up before running CSALTER.

To run the CSALTER script, start SQL*Plus and connect to the database whose character set is to be migrated. Note that the Database Character Set Scanner must be run before the CSALTER script. Then enter the following command:

sqlplus> @@CSALTER.PLB

The CSALTER script includes the following phases:

Checking Phase of the CSALTER Script
Updating Phase of the CSALTER Script

Checking Phase of the CSALTER Script

In the checking phase, the CSALTER script performs the following tasks:

It checks whether the user login is SYS. Only user SYS is allowed to run the script.
It checks whether a full database scan has been previously run within the last 7 days. If a full database scan has not been previously run, then the script stops and reports an error. It is the DBA's responsibility to ensure that no one updates the database between the times when the full database scans and the CSALTER script is run.
It checks whether CLOB columns in the data dictionary that were created by Oracle Database are changeless or convertible. Convertible CLOB columns occur when migrating from a single-byte character set to a multibyte character set. If there are any lossy cells found in CLOB columns in the data dictionary, then the script stops. The lossy CLOB columns may need to be specially handled; contact Oracle Support Services for more information.

Any table that belongs to the following schemas is considered to be part of the data dictionary:

SYS

SYSTEM

CTXSYS

DIP

DMSYS

EXFSYS

LBACSYS

MDSYS

ORDDATA

ORDPLUGINS

ORDSYS

SI_INFORMTN_SCHEMA

XDB

It checks whether all CLOB columns in the Sample Schemas created by Oracle Database are changeless or convertible. Convertible CLOB columns occur when migrating from a single-byte character set to a multibyte character set. The tables that belong to the following schemas are part of the Sample Schemas:

HR

OE

SH

PM

It checks whether the CLOB dataype is the only data type that contains convertible data in the data dictionary and Sample Schemas. It checks that all other users' tables have no convertible data for all data types including the CLOB data type. Because the CSALTER script is meant to be run only when the current database is a proper subset of the new database, all data should be changeless with the possible exception of the CLOB data. When migrating from a single-byte character set to a multibyte character set, user-created CLOB data requires conversion and must first be exported and deleted from the schema. The database must be rescanned in order to run the CSALTER script. Cells of all other data types that are reported to be convertible or subject to truncation must be corrected before the Database Character Set Scanner is rerun.
See Also:
- "Subsets and Supersets"
- "Migrating a Character Set Using the CSALTER Script" for more information about the CSALTER script and CLOB data

Updating Phase of the CSALTER Script

After the CSALTER script confirms that every CLOB in the data dictionary passes the checks described in "Checking Phase of the CSALTER Script", the CSALTER script performs the conversion. After all CLOB data in the data dictionary and the Sample Schemas have been updated, the script commits the change and saves the information in the CSM$TABLES view. After all CLOB data in the data dictionary have been updated, the CSALTER script updates the database metadata to the new character set. The entire migration process is then completed.

The CSALTER script is resumable. If the update of the database to the new character set fails at any time, then the DBA must shut down and restart the database and rerun the CSALTER script before doing anything else. Because the updated information is already saved in the CSM$TABLES view, the script will not update the CLOB data in the data dictionary tables twice. The process of migration is simply resumed to finish the update of the database to the new character set.

If the CSALTER script fails, then use the following method to resume the update:

From the SQL*Plus session where the CSALTER script was run, enter the following command immediately:

SHUTDOWN ABORT

Start up the database and open it, because CSALTER requires an open database.

STARTUP OPEN

Run the CSALTER script:

@@CSALTER.PLB

Shut down the database with either the IMMEDIATE or the NORMAL option.
Start up the database.

Database Character Set Scanner Views

The Database Character Set Scanner uses the following views:

CSMV$COLUMNS
CSMV$CONSTRAINTS
CSMV$ERRORS
CSMV$INDEXES
CSMV$TABLES

CSMV$COLUMNS

This view contains statistical information about columns that were scanned.

Column Name	Data Type	NULL	Description
`OWNER_ID`	`NUMBER`	`NOT NULL`	User ID of the table owner
`OWNER_NAME`	`VARCHAR2(30)`	`NOT NULL`	User name of the table owner
`TABLE_ID`	`NUMBER`	`NOT NULL`	Object ID of the table
`TABLE_NAME`	`VARCHAR2(30)`	`NOT NULL`	Object name of the table
`COLUMN_ID`	`NUMBER`	`NOT NULL`	Column ID
`COLUMN_INTID`	`NUMBER`	`NOT NULL`	Internal column ID (for abstract data types)
`COLUMN_NAME`	`VARCHAR2(30)`	`NOT NULL`	Column name
`COLUMN_TYPE`	`VARCHAR2(9)`	`NOT NULL`	Column data type
`TOTAL_ROWS`	`NUMBER`	`NOT NULL`	Number of rows in this table
`NULL_ROWS`	`NUMBER`	`NOT NULL`	Number of `NULL` data cells
`CONV_ROWS`	`NUMBER`	`NOT NULL`	Number of data cells that need to be converted
`ERROR_ROWS`	`NUMBER`	`NOT NULL`	Number of data cells that have exceptions
`EXCEED_SIZE_ROWS`	`NUMBER`	`NOT NULL`	Number of data cells that have truncations
`DATA_LOSS_ROWS`	`NUMBER`	-	Number of data cells that undergo lossy conversion
`MAX_POST_CONVERT_SIZE`	`NUMBER`	-	Maximum post-conversion data size

CSMV$CONSTRAINTS

This view contains statistical information about constraints that were scanned.

Column Name	Data Type	NULL	Description
`OWNER_ID`	`NUMBER`	`NOT NULL`	User ID of the constraint owner
`OWNER_NAME`	`VARCHAR2(30)`	`NOT NULL`	User name of the constraint owner
`CONSTRAINT_ID`	`NUMBER`	`NOT NULL`	Object ID of the constraint
`CONSTRAINT_NAME`	`VARCHAR2(30)`	`NOT NULL`	Object name of the constraint
`CONSTRAINT_TYPE#`	`NUMBER`	`NOT NULL`	Constraint type number
`CONSTRAINT_TYPE`	`VARCHAR2(11)`	`NOT NULL`	Constraint type name
`TABLE_ID`	`NUMBER`	`NOT NULL`	Object ID of the table
`TABLE_NAME`	`VARCHAR2(30)`	`NOT NULL`	Object name of the table
`CONSTRAINT_RID`	`NUMBER`	`NOT NULL`	Root constraint ID
`CONSTRAINT_LEVEL`	`NUMBER`	`NOT NULL`	Constraint level

CSMV$ERRORS

This view contains individual exception information for cell data and object definitions.

Column Name	Data Type	NULL	Description
`OWNER_ID`	`NUMBER`	`NOT NULL`	User ID of the table owner
`OWNER_NAME`	`VARCHAR2(30)`	`NOT NULL`	User name of the table owner
`TABLE_ID`	`NUMBER`	`NOT NULL`	Object ID of the table
`TABLE_NAME`	`VARCHAR2(30)`	`-`	Object name of the table
`COLUMN_ID`	`NUMBER`	`-`	Column ID
`COLUMN_INTID`	`NUMBER`	`-`	Internal column ID (for abstract data types)
`COLUMN_NAME`	`VARCHAR2(30)`	`-`	Column name
`DATA_ROWID`	`VARCHAR2(1000)`	`-`	The rowid of the data
`COLUMN_TYPE`	`VARCHAR2(9)`	`-`	Column data type of object type
`ERROR_TYPE`	`VARCHAR2(11)`	`-`	Type of error encountered

CSMV$INDEXES

This view contains individual exception information for indexes.

Column Name	Data Type	NULL	Description
`INDEX_OWNER_ID`	`NUMBER`	`NOT NULL`	User ID of the index owner
`INDEX_OWNER_NAME`	`VARCHAR2(30)`	`NOT NULL`	User name of the index owner
`INDEX_ID`	`NUMBER`	`NOT NULL`	Object ID of the index
`INDEX_NAME`	`VARCHAR2(30)`	`-`	Object name of the index
`INDEX_STATUS#`	`NUMBER`	`-`	Status number of the index
`INDEX_STATUS`	`VARCHAR2(8)`	`-`	Status of the index
`TABLE_OWNER_ID`	`NUMBER`	`-`	User ID of the table owner
`TABLE_OWNER_NAME`	`VARCHAR2(30)`	`-`	User name of the table owner
`TABLE_ID`	`NUMBER`	`-`	Object ID of the table
`TABLE_NAME`	`VARCHAR2(30)`	`-`	Object name of the table
`COLUMN_ID`	`NUMBER`	`-`	Column ID
`COLUMN_INTID`	`NUMBER`	`-`	Internal column ID (for abstract data types)
`COLUMN_NAME`	`VARCHAR2(30)`	`-`	Column name

CSMV$TABLES

This view contains information about database tables to be scanned. The Database Character Set Scanner enumerates all tables to be scanned into this view.

Column Name	Data Type	NULL	Description
`OWNER_ID`	`NUMBER`	`NOT NULL`	User ID of the table owner
`OWNER_NAME`	`VARCHAR2(30)`	`NOT NULL`	User name of the table owner
`TABLE_ID`	`NUMBER`	`-`	Object ID of the table
`TABLE_NAME`	`VARCHAR2(30)`	`-`	Object name of the table
`MIN_ROWID`	`VARCHAR2(18)`	`-`	Minimum rowid of the split range of the table
`MAX_ROWID`	`VARCHAR2(18)`	`-`	Maximum rowid of the split range of the table
`BLOCKS`	`NUMBER`	`-`	Number of blocks in the split range
`SCAN_COLUMNS`	`NUMBER`	`-`	Number of columns to be scanned
`SCAN_ROWS`	`NUMBER`	`-`	Number of rows to be scanned
`SCAN_START`	`VARCHAR2(8)`	`-`	Time table scan started
`SCAN_END`	`VARCHAR2(8)`	`-`	Time table scan completed

Database Character Set Scanner Error Messages

The Database Character Set Scanner has the following error messages:

CSS-00100: failed to allocate memory size of number (%d): Cause: An attempt was made to allocate memory with size 0 or a value larger than the maximum allowed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00101: failed to release memory: Cause: An attempt was made to release memory with an invalid pointer.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00102: failed to release memory, null pointer given: Cause: An attempt was made to release memory with a null pointer.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00105: failed to parse BOUNDARIES parameter: Cause: BOUNDARIES parameter was specified in an invalid format.; Action: Refer to the documentation for the correct syntax.

CSS-00106: failed to parse SPLIT parameter: Cause: SPLIT parameter was specified in an invalid format.; Action: Refer to the documentation for the correct syntax.

CSS-00107: Character set migration utility schema not installed: Cause: CSM$VERSION table not found in the database.; Action: Run CSMINST.SQL on the database.

CSS-00108: Character set migration utility schema not compatible: Cause: Incompatible CSM$* tables found in the database.; Action: Run CSMINST.SQL on the database.

CSS-00110: failed to parse USERID: Cause: USERID parameter was specified in an invalid format.; Action: Refer to the documentation for the correct syntax.

CSS-00111: failed to get RDBMS version: Cause: Failed to retrieve the value of the version of the database.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00112: database version not supported: Cause: The database version was older than release 8.0.5.0.0.; Action: Upgrade the database to release 8.0.5.0.0 or later. Then try again.

CSS-00113: user username (%s) is not allowed to access data dictionary: Cause: The specified user cannot access the data dictionary.; Action: Set O7_DICTIONARY_ACCESSIBILITY parameter to TRUE, or use SYS user.

CSS-00114: failed to get database character set name characterset_name (%s): Cause: Failed to retrieve value of NLS_CHARACTERSET or NLS_NCHAR_CHARACTERSET parameter from NLS_DATABASE_PARAMETERS view.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00115: invalid character set name characterset_name (%s): Cause: The specified character set was not a valid Oracle character set.; Action: Retry with a valid Oracle character set name.

CSS-00116: failed to reset NLS_LANG or NLS_NCHAR parameter: Cause: Failed to force NLS_LANG character set to be the same as the database character set.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00117: failed to clear previous scan log: Cause: Failed to delete all rows from CSM$* tables.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00118: failed to save command parameters: Cause: Failed to insert rows into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00119: failed to save scan start time: Cause: Failed to insert a row into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00120: failed to enumerate tables to scan: Cause: Failed to enumerate tables to scan into CSM$TABLES table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00121: failed to save scan complete time: Cause: Failed to insert a row into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00122: failed to create scan report: Cause: Failed to create database scan report.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00123: failed to check if user username (%s) exists: Cause: SELECT statement that checks if the specified user exists in the database failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00124: user username (%s) not found: Cause: The specified user does not exist in the database.; Action: Check to make sure that the the user name exists and/or is spelled correctly.

CSS-00125: failed to check if table username.tablename (%s.%s) not found: Cause: SELECT statement that checks if the specified table exists in the database failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00126: table username.tablename (%s.%s) not found: Cause: The specified table does not exist in the database.; Action: Check the user name and table name for correctness.

CSS-00127: user username (%s) does not have DBA privilege: Cause: The specified user did not have DBA privileges, which are required to scan the database.; Action: Choose a user with DBA privileges.

CSS-00128: failed to get server version string: Cause: Failed to retrieve the version string of the database.; Action: None.

CSS-00129: query cannot be used in ALL or USER scan mode: Cause: The query parameter should not be used with FULL Database Scan and User Scan mode.; Action: Remove the query parameter.

CSS-00130: failed to initialize semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00131: failed to spawn scan process number (%d): Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00132: failed to destroy semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00133: failed to wait semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00134: failed to post semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00140: failed to scan table (tid=number (%d), oid=number (%d)): Cause: Data scan on specified table failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00141: failed to save table scan start time: Cause: Failed to update a row in the CSM$TABLES table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00142: failed to get table information: Cause: Failed to retrieve various information from userID and object ID of the table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00143: failed to get column attributes: Cause: Failed to retrieve column attributes of the table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00144: failed to scan table username.tablename (%s.%s): Cause: Data scan on specified table failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00145: failed to save scan result for columns: Cause: Failed to insert rows into CSM$COLUMNS table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00146: failed to save scan result for table: Cause: Failed to update a row of CSM$TABLES table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00147: unexpected data truncation: Cause: Scanner allocated the exactly same size of memory as the column byte size for fetch buffer, resulting in unexpected data truncation.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00150: failed to enumerate table: Cause: Failed to retrieve the specified table information.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00151: failed to enumerate user tables: Cause: Failed to enumerate all tables that belong to the specified user.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00152: failed to enumerate all tables: Cause: Failed to enumerate all tables in the database.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00153: failed to enumerate character type columns: Cause: Failed to enumerate all CHAR, VARCHAR2, LONG, and CLOB columns of tables to scan.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00154: failed to create list of tables to scan: Cause: Failed to enumerate the tables into CSM$TABLES table.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00155: failed to split tables for scan: Cause: Failed to split the specified tables.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00156: failed to get total number of tables to scan: Cause: SELECT statement that retrieves the number of tables to scan failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00157: failed to retrieve list of tables to scan: Cause: Failed to read all table IDs into the scanner memory.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00158: failed to retrieve index defined on column: Cause: SELECT statement that retrieves index defined on the column failed.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00160: failed to open summary report file: Cause: FILE OPEN function returned error.; Action: Check if you have create or write privilege on the disk and check if the file name specified for the LOG parameter is valid.

CSS-00161: failed to report scan elapsed time: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00162: failed to report database size information: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00163: failed to report scan parameters: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00164: failed to report scan summary: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00165: failed to report conversion summary: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00166: failed to report conversion per table and column: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00167: failed to open exception report file: Cause: FILE OPEN function returned error.; Action: Check if you have create or write privilege on the disk and check if the file name specified for LOG parameter is valid.

CSS-00168: failed to report individual exceptions: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00170: failed to retrieve size of tablespace tablespacename (%s): Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00171: failed to retrieve free size of tablespace tablespacename (%s): Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00172: failed to retrieve total size of tablespace tablespacename (%s): Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00173: failed to retrieve used size of the database: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00174: scanner capture data not available: Cause: Unknown.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00175: TOCHAR or TONCHAR parameter not specified: Cause: TOCHAR or TONCHAR parameter was not specified.; Action: Specify the TOCHAR or TONCHAR parameter.

CSS-00176: failed to enumerate user tables in bitmapped tablespace: Cause: Failed to enumerate tables in bitmapped tablespace.; Action: This is an internal error. Contact Oracle Support Services.

CSS-00177: failed to retrive max LOB size: Cause: Unknown; Action: This is an internal error. Contact Oracle Support Services.

CSS-00178: failed to check if column username.tablename.columnname(%s.%s.%s) exists: Cause: SELECT statement that checks if the specified column exists in the database failed; Action: This is an internal error. Contact Oracle Support Services.

CSS-00179: column %s.%s.%s not found: Cause: The specified column did not exist in the database; Action: Check the user name, table name, and column name.

CSS-00180: failed to check the index on the csm$columns table: Cause: SELECT statement that checks if the index IDX_CSM$COLUMNS exists failed; Action: This is an internal error. Contact Oracle Support Services.

CSS-00181: failed to drop index on the csm$columns table: Cause: The SQL Statement to drop the index IDX_CSM$COLUMNS failed; Action: This is an internal error. Contact Oracle Support Services.