Oracle® Text Reference 11g Release 2 (11.2) Part Number E10944-01
This appendix lists the document formats supported by the automatic (AUTO_FILTER) filtering technology.
The automatic filtering technology in Oracle Text uses Oracle Outside In HTML Export technology. This technology also enables you to convert documents to HTML for document presentation with the CTX_DOC package.
To use automatic filtering for indexing and DML processing, you must specify the AUTO_FILTER object in your filter preference.
To use automatic filtering technology for converting documents to HTML with the CTX_DOC package, you need not use the AUTO_FILTER indexing preference, but you must still set up your environment to use this filtering technology, as described in this appendix.
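A client program can also carry out the preference setup described above. The following Python sketch assumes two things: a DB-API 2.0 connection to Oracle Database (for example, one opened with the python-oracledb driver), and a user that is allowed to call CTX_DDL. The helper names `auto_filter_index_ddl` and `run_ddl` are hypothetical, and so are the preference, table, column, and index names in the usage note. The sketch builds and runs only two statements: a `CTX_DDL.CREATE_PREFERENCE` call that creates an AUTO_FILTER preference, and a `CREATE INDEX ... INDEXTYPE IS CTXSYS.CONTEXT` statement whose `FILTER` parameter names that preference.

```python
import re

# Unquoted Oracle identifiers (30-character limit in 11g). Names are spliced
# into SQL text, so anything else is rejected rather than quoted.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")


def auto_filter_index_ddl(pref: str, table: str, column: str, index: str) -> list:
    """Build the statements that create an AUTO_FILTER preference and a
    CONTEXT index that uses it as its filter."""
    for name in (pref, table, column, index):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    return [
        # Anonymous PL/SQL block: the trailing "END;" is required.
        f"BEGIN ctx_ddl.create_preference('{pref}', 'AUTO_FILTER'); END;",
        # Plain SQL DDL: no trailing semicolon when sent through a driver.
        f"CREATE INDEX {index} ON {table} ({column}) "
        f"INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('FILTER {pref}')",
    ]


def run_ddl(conn, statements: list) -> None:
    """Execute each statement, in order, on a DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
    finally:
        cur.close()
```

Consider a call such as `run_ddl(conn, auto_filter_index_ddl("my_filter", "docs", "text", "docs_idx"))`, where `conn` is obtained from `oracledb.connect(...)`. It would create both objects. DDL cannot take bind variables, so the helper rejects anything that is not a plain identifier instead of trying to quote it.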
Note:
The underlying technology used by Oracle Text was migrated to Oracle Outside In HTML Export in release 11.1.0.7. See "Formats No Longer Supported in 11.1.0.7" for a list of formats that are no longer supported as a result of this migration. Applications that require support for those formats can use USER_FILTER to plug in third-party filtering technology that supports them. See "USER_FILTER" for more information.
The supported platforms and formats listed in this appendix apply to this release. The list of supported formats is updated with patch releases. To view the latest formats, refer to the Oracle Technology Network:
http://www.oracle.com/technology/products/text
Password-protected documents and documents with password-protected content are not supported by the AUTO_FILTER filter.
For other limitations, refer to the sections in this appendix that cover specific document types.
AUTO_FILTER filter technology is supported on the following platforms:
Platform | Operating System Versions |
---|---|
Windows (x86 32-bit) | Windows 2000, Windows 2003, Windows XP, and Windows Vista |
Windows (Itanium 64-bit) | Windows .NET Server 2003 Enterprise Edition |
Windows (x86 64-bit) | Windows 2003 x64 Standard, Enterprise, and Datacenter Editions (64-bit Extended Systems) |
HP-UX (PA-RISC 64-bit) | 11i |
HP-UX (Itanium 64-bit) | 11i |
IBM AIX (32-bit pSeries) | 5.1 - 5.3 |
iSeries (OS/400 using PASE) | V5R2 |
Red Hat Linux (x86) | Advanced Server 3, 4, and 5 |
Red Hat Linux (x86) | Red Hat Enterprise Linux (RHEL) 4 |
Red Hat Linux (Itanium 64-bit) | Advanced Server 3, 4, and 5 |
Red Hat Linux (zSeries, 31-bit) | Advanced Server 3 and 4 |
Red Hat Enterprise Linux AS/ES (x86-64, AMD64/EM64T) | 3.0, 4.0, and 5.0 |
Oracle Enterprise Linux (x86-64, AMD64/EM64T) | 4.0 and 5.0 |
SuSE Linux (x86) | 9, 10, and Enterprise Server 9.0 |
SuSE Linux (x86 64-bit) | SUSE Enterprise Server (SLES) 9 and 10 |
SuSE Linux (Itanium 64-bit) | Enterprise Server 8 |
SuSE Linux (zSeries, 31-bit) | 9 |
Sun Solaris (SPARC 64-bit) | 9.x - 10.x |
Sun Solaris (x86 64-bit) | 10.x |
Note that some of these platforms may not be supported by the Oracle Database.
A PDF document can have different levels of security settings as follows:
Table B-1 AUTO_FILTER Behavior with PDF Security Settings
Security Setting Level | Description | AUTO_FILTER Behavior |
---|---|---|
Level 1 | Requires a password for opening the document. | Unable to filter. Doc-level error will be raised. |
Level 2 | Disallows user printing of the document. | |
Level 3 | Disallows user modification or change of the document. | |
Level 4 | Disallows the user from copying or extracting content from the document. | |
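Level 1 documents raise a document-level error, so it can help to flag encrypted PDFs before loading them. The sketch below is a hypothetical pre-check and is not part of Oracle Text. It scans the raw bytes for an `/Encrypt` entry. PDF writers add this entry for any of the four levels, so a positive result means "review this file", not "this file will fail". The helper name `pdf_is_encrypted` is an assumption.

```python
import re

# An /Encrypt entry in the trailer (or cross-reference stream dictionary) is
# either an indirect reference such as "/Encrypt 12 0 R" or an inline "<< ... >>".
_ENCRYPT = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def pdf_is_encrypted(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes declare an encryption dictionary.

    Any of security levels 1-4 can produce this entry, so True means
    "review before indexing", not "AUTO_FILTER will fail".
    """
    # The %PDF- header must appear near the start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("input does not look like a PDF file")
    return _ENCRYPT.search(data) is not None
```

For a file on disk, read it in binary mode first, for example `pdf_is_encrypted(open(path, "rb").read())`.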
The following limitations apply when filtering PDF files:
Multi-byte PDFs are supported, provided the PDF document is created using Character ID-keyed (CID) fonts, predefined CJK CMap files, or ToUnicode font encodings, and the document does not contain embedded fonts.
Embedded fonts in a PDF document are not filtered correctly. They are usually displayed using the question mark (?) replacement character.
Hyperlinks in a PDF are not active when displayed in a browser or a viewing window.
Annotations, such as notes, sound, or movies, are not supported.
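A similar byte scan gives a rough check of the multi-byte conditions above. This hypothetical helper, `pdf_font_report`, reports whether a PDF's uncompressed objects mention any of the following:

- CID-keyed fonts (`/CIDFontType0` or `/CIDFontType2`)
- `/ToUnicode` maps
- embedded font programs (`/FontFile`, `/FontFile2`, `/FontFile3`)

It is a heuristic only. PDF 1.5 and later can store these dictionaries inside compressed object streams, and the scan will not see them there.

```python
import re

# CID-keyed descendant fonts use these /Subtype values.
_CID_FONT = re.compile(rb"/Subtype\s*/CIDFontType[02]\b")
# A /ToUnicode entry maps character codes to Unicode.
_TO_UNICODE = re.compile(rb"/ToUnicode\b")
# FontDescriptor keys that hold embedded font programs:
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF-based).
_EMBEDDED = re.compile(rb"/FontFile[23]?\b")


def pdf_font_report(data: bytes) -> dict:
    """Heuristic scan of uncompressed PDF objects for font features."""
    return {
        "cid_fonts": _CID_FONT.search(data) is not None,
        "to_unicode": _TO_UNICODE.search(data) is not None,
        "embedded_fonts": _EMBEDDED.search(data) is not None,
    }
```

Suppose the report for a multi-byte document has `embedded_fonts` set to True. That suggests the filtered text may contain question mark (?) replacement characters.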
The tables in this section list the document formats that Oracle Text supports for filtering.
Document filtering is used for indexing, for DML processing, and for converting documents to HTML with the CTX_DOC package.
Note:
These lists do not represent the complete list of formats that Oracle Text is able to process. The USER_FILTER and PROCEDURE_FILTER filters enable Oracle Text to process any document format, provided an external filter exists that can convert it to a textual format such as plain text, HTML, or XML.
Word Processing Formats
Format | Version |
---|---|
Adobe FrameMaker (MIF) | Versions 3.0, 4.0, 5.0, and 6.0 and Japanese 3.0, 4.0, 5.0, and 6.0 (text only) |
ANSI Text | 7 and 8 bit |
ASCII Text | 7 and 8 bit |
DEC WPS Plus (DX) | Versions through 3.1 |
DEC WPS Plus (WPL) | Versions through 4.1 |
DisplayWrite 2 and 3 (TXT) | All versions |
EBCDIC | All versions |
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
Framework | Version 3.0 |
Hangul | Versions 97, 2002, and 2005 |
IBM FFT | All versions |
IBM Revisable Form Text | All versions |
IBM Writing Assistant | Version 1.01 |
JustSystems Ichitaro | Versions 4.x through 6.x, 8.x through 13.x and 2004 |
JustWrite | Versions through 3.0 |
Legacy | Version 1.1 |
Lotus AMI/AMI Professional | Version 3.1 |
Lotus Manuscript | Version 2.0 |
Lotus Word Pro (non-Windows) | Versions SmartSuite 97, Millennium, and Millennium 9.6 (text only) |
Lotus Word Pro (Windows) | Versions SmartSuite 96, 97, Millennium, and Millennium 9.6 |
MacWrite II | Version 1.1 |
MASS11 | Versions through 8.0 |
Microsoft Rich Text Format (RTF) | All versions |
Microsoft Word (DOS) | Versions through 6.0 |
Microsoft Word (Mac) | Versions 4.0 - 2004 |
Microsoft Word (Windows) | Versions through 2007 |
Microsoft WordPad | All versions |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Windows Write | Versions through 3.0 |
MultiMate | Versions through 4.0 |
Navy DIF | All versions |
Nota Bene | Version 3.0 |
Novell Perfect Works | Version 2.0 |
Novell/Corel WordPerfect (DOS) | Versions through 6.1 |
Novell/Corel WordPerfect (Mac) | Versions 1.02 through 3.0 |
Novell/Corel WordPerfect (Windows) | Versions through 12.0 |
Office Writer | Versions 4.0 - 6.0 |
OpenOffice Writer (Windows and UNIX) | OpenOffice version 1.1 and 2.0 |
PC-File Letter | Versions through 5.0 |
PC-File+ Letter | Versions through 3.0 |
PFS:Write | Versions A, B, and C |
Professional Write Plus (Windows) | Version 1.0 |
Q&A (DOS) | Version 2.0 |
Q&A Write (Windows) | Version 3.0 |
Samna Word | Versions through Samna Word IV+ |
Signature | Version 1.0 |
SmartWare II | Version 1.02 |
Sprint | Versions through 1.0 |
StarOffice Writer | Version 5.2 (text only) and 6.x through 8.x |
Total Word | Version 1.2 |
Unicode Text | All versions |
UTF-8 | All versions |
Volkswriter 3 and 4 | Versions through 1.0 |
Wang PC (IWP) | Versions through 2.6 |
WordMARC | Versions through Composer Plus |
WordStar (Windows) | Version 1.0 |
WordStar 2000 (DOS) | Versions through 3.0 |
XyWrite | Versions through III Plus |
Spreadsheet Formats
Format | Version |
---|---|
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
Framework | Version 3.0 |
Lotus 1-2-3 (DOS & Windows) | Versions through 5.0 |
Lotus 1-2-3 (OS/2) | Versions through 2.0 |
Lotus 1-2-3 Charts (DOS & Windows) | Versions through 5.0 |
Lotus 1-2-3 for SmartSuite | Versions 97 - Millennium 9.6 |
Lotus Symphony | Versions 1.0, 1.1, and 2.0 |
Mac Works | Version 2.0 |
Microsoft Excel Charts | Versions 2.x - 7.0 |
Microsoft Excel (Mac) | Versions 3.0 - 4.0, 98, 2001, 2002, 2004, and v.X |
Microsoft Excel (Windows) | Versions 2.2 through 2007 |
Microsoft Multiplan | Version 4.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Mosaic Twin | Version 2.5 |
Novell Perfect Works | Version 2.0 |
PFS:Professional Plan | Version 1.0 |
Quattro Pro (DOS) | Versions through 5.0 (text only) |
Quattro Pro (Windows) | Versions through 12.0 (text only) |
SmartWare II | Version 1.02 |
StarOffice/OpenOffice Calc (Windows and UNIX) | StarOffice versions 5.2 (text only) through 8.x and OpenOffice version 1.1 and 2.0 |
SuperCalc 5 | Version 4.0 |
VP Planner 3D | Version 1.0 |
Presentation Formats
Format | Version |
---|---|
Corel/Novell Presentations | Versions through 12.0 |
Harvard Graphics (DOS) | Versions 2.x and 3.x |
Harvard Graphics (Windows) | Windows versions |
Freelance (Windows) | Versions through Millennium 9.6 |
Freelance (OS/2) | Versions through 2.0 |
Microsoft PowerPoint (Windows) | Versions 3.0 through 2007 |
Microsoft PowerPoint (Mac) | Versions 4.0 through v.X |
StarOffice/OpenOffice Impress (Windows and UNIX) | StarOffice versions 5.2 (text only) and 6.x through 8.x (full support) and OpenOffice version 1.1 and 2.0 (text only) |
Database Formats
Format | Version |
---|---|
Access | Versions through 2.0 |
dBASE | Versions through 5.0 |
DataEase | Version 4.x |
dBXL | Version 1.3 |
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
FoxBase | Version 2.1 |
Framework | Version 3.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Paradox (DOS) | Versions through 4.0 |
Paradox (Windows) | Versions through 1.0 |
Personal R:BASE | Version 1.0 |
R:BASE 5000 | Versions through 3.1 |
R:BASE System V | Version 1.0 |
Reflex | Version 2.0 |
Q & A | Versions through 2.0 |
SmartWare II | Version 1.02 |
When filtering an archive file, the contents of all files inside the archive, including the files in any subfolders, are exported to a single output file.
Table B-2 lists the archive formats that Oracle Text supports.
Format | Version |
---|---|
Microsoft Outlook Folder (PST) | Microsoft Outlook Folder and Microsoft Outlook Offline Folder files versions 97, 98, 2000, 2002, 2003, and 2007 |
Microsoft Outlook Message (MSG) | Microsoft Outlook Message and Microsoft Outlook Form Template versions 97, 98, 2000, 2002, 2003, and 2007 |
MIME | MIME-encoded mail messages. |
For MIME-encoded mail messages, the following formats are supported:
MIME formats:
EML
MHT (Web Archive)
NWS (Newsgroup single-part and multi-part)
Simple Text Mail (defined in RFC 2822)
TNEF format
MIME encodings, including:
base64 (defined in RFC 1521)
binary (defined in RFC 1521)
binhex (defined in RFC 1741)
btoa
quoted-printable (defined in RFC 1521)
utf-7 (defined in RFC 2152)
uue
xxe
yenc
In addition, the body of a message can be encoded in several ways. The following encodings are supported:
HTML
RTF
TNEF
Text/enriched (defined in RFC 1523)
Text/richtext (defined in RFC 1341)
Embedded mail message (defined in RFC 822), which is handled as a link to a new message
The attachments of a MIME message can be stored in many formats. Oracle Text processes every attachment whose format is supported by its filtering technology.
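The sketch below illustrates the transfer encodings and attachment handling described above. It uses Python's standard `email` package to walk a MIME message and decode each leaf part. This is not how Oracle Text processes mail. It only shows what decoding base64 and quoted-printable parts involves. The function name `mime_parts` is hypothetical. Note that `get_payload(decode=True)` does not handle every encoding in the list above; binhex, btoa, xxe, and yenc, for example, are not decoded.

```python
from email import policy
from email.parser import BytesParser


def mime_parts(raw: bytes) -> list:
    """Return (content_type, transfer_encoding, decoded_bytes) for each leaf part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        encoding = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        # decode=True reverses base64 and quoted-printable transfer encodings.
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), encoding, payload))
    return parts
```

Each returned attachment would still need a filter for its own format (for example, one of the formats listed in this appendix) before its text could be indexed.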
Other Formats
Format | Version |
---|---|
Executable (EXE, DLL) | |
HTML | Versions through 3.0, with some limitations |
Macromedia Flash | Macromedia Flash 6.x, Macromedia Flash 7.x, and Macromedia Flash Lite (text only) |
Microsoft Project | Versions 98 - 2003 (text only) |
MP3 | ID3 information |
vCard, vCalendar | Version 2.1 |
Windows Executable | |
WML | Version 5.2 |
XML | Text only |
Yahoo Instant | |
Table B-3 lists the graphic formats that the AUTO_FILTER filter recognizes. Indexing a text column that contains documents in any of these formats produces no error, so it is safe for the column to contain them.
Formats are categorized as either embedded graphics or standalone graphics. Embedded graphics are inserted or referenced within a document; standalone graphics are separate graphics files.
Note:
The AUTO_FILTER filter cannot extract textual information from graphics.
Table B-3 Supported Graphics Formats for AUTO_FILTER Filter
Graphics Format | Version |
---|---|
Adobe Photoshop (PSD) | Version 4.0 |
Windows Icon Cursor (ICO) | No specific version |
Adobe Illustrator | Versions 7.0 and 9.0 |
Adobe FrameMaker graphics (FMV) | Vector/raster through 5.0 |
Adobe Acrobat (PDF) | Versions 1.0, 2.1, 3.0, 4.0, 5.0, 6.0, and 7.0 (including Japanese PDF) |
Ami Draw (SDW) | Ami Draw |
AutoCAD Interchange and Native Drawing formats (DXF and DWG) | AutoCAD Drawing Versions 2.5 - 2.6, 9.0 - 14.0, 2000i, and 2002 |
AutoShade Rendering (RND) | Version 2.0 |
Binary Group 3 Fax | All versions |
Bitmap (BMP, RLE, ICO, CUR, OS/2 DIB, and WARP) | All versions |
CALS Raster (GP4) | Type I and Type II |
Corel Clipart format (CMX) | Versions 5 through 6 |
Corel Draw (CDR) | Versions 3.x - 8.x |
Corel Draw (CDR with TIFF header) | Versions 2.x - 9.x |
Computer Graphics Metafile (CGM) | ANSI, CALS NIST version 3.0 |
Encapsulated PostScript (EPS) | TIFF header only |
GEM Paint (IMG) | All versions |
Graphics Environment Mgr (GEM) | Bitmap and vector |
Graphics Interchange Format (GIF) | All versions |
Hewlett Packard Graphics Language (HPGL) | Version 2.0 |
IBM Graphics Data Format (GDF) | Version 1.0 |
IBM Picture Interchange Format (PIF) | Version 1.0 |
Initial Graphics Exchange Spec (IGES) | Version 5.1 |
JBIG2 | JBIG2 graphic embeddings in PDF files |
JFIF (JPEG not in TIFF format) | All versions |
JPEG (including EXIF) | All versions |
Kodak Flash Pix (FPX) | All versions |
Kodak Photo CD (PCD) | Version 1.0 |
Lotus PIC | All versions |
Lotus Snapshot | All versions |
Macintosh PICT1 and PICT2 | Bitmap only |
MacPaint (PNTG) | All versions |
Micrografx Draw (DRW) | Versions through 4.0 |
Micrografx Designer (DRW) | Versions through 3.1 |
Micrografx Designer (DFS) | Windows 95, version 6.0 |
Novell PerfectWorks (Draw) | Version 2.0 |
OS/2 PM Metafile (MET) | Version 3.0 |
Paint Shop Pro 6 (PSP) | Windows only, versions 5.0 - 6.0 |
PC Paintbrush (PCX and DCX) | All versions |
Portable Bitmap (PBM) | All versions |
Portable Graymap (PGM) | No specific version |
Portable Network Graphics (PNG) | Version 1.0 |
Portable Pixmap (PPM) | No specific version |
PostScript (PS) | Levels 1-2 |
Progressive JPEG | No specific version |
Sun Raster (SRS) | No specific version |
StarOffice/OpenOffice Draw (Windows and UNIX) | StarOffice versions 5.2 (text only) through 8.x and OpenOffice version 1.1 and 2.0 |
TIFF | Versions through 6 |
TIFF CCITT Group 3 and 4 | Versions through 6 |
Truevision TGA (TARGA) | Version 2 |
Visio (preview) | Version 4 |
Visio | Versions 5, 2000, 2002, and 2003 |
WBMP | No specific version |
Windows Enhanced Metafile (EMF) | No specific version |
Windows Metafile (WMF) | No specific version |
WordPerfect Graphics (WPG and WPG2) | Versions through 2.0 |
X-Windows Bitmap (XBM) | x10 compatible |
X-Windows Dump (XWD) | x10 compatible |
X-Windows Pixmap (XPM) | x10 compatible |
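Graphics in Table B-3 index without error but contribute no text. You may therefore want to know how many rows of a column hold graphics. This hypothetical helper, `sniff_graphic`, identifies a few of the standalone formats in the table by their leading magic bytes: PNG, GIF, JPEG, TIFF, Photoshop PSD, and BMP. It is a client-side sketch only, and Oracle Text performs its own format detection. A two-byte signature such as the `BM` used by BMP files can also match unrelated data.

```python
from typing import Optional

# Leading "magic" bytes for a few standalone formats listed in Table B-3.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"8BPS", "PSD"),
    (b"BM", "BMP"),  # short signature: checked last, may false-positive
)


def sniff_graphic(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known graphics signature."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```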
Certain document formats are no longer supported if you upgrade from release 11.1.0.6 to 11.1.0.7, because Oracle Text filtering technology was migrated to Oracle Outside In HTML Export technology. To filter these formats, you can plug in third-party filtering technology by using USER_FILTER. See "USER_FILTER" for more information.
Table B-4 lists the formats supported in release 11.1.0.6, but not in 11.1.0.7.
Table B-4 Formats Supported in Release 11.1.0.6 and not in 11.1.0.7
Format | Versions |
---|---|
Word Processing Formats | |
Applix Words (AW) | 3.11, 4.0, 4.1, 4.2, 4.3, 4.4 |
JustSystems Ichitaro (JTD) | 2005 |
Folio Flat File (FFF) | 3.1 |
Fujitsu Oasys (OA2) | 7 |
Lotus Word Pro (LWP) | 9.7, 9.8 |
WordPerfect for Linux | All versions |
Desktop Publishing Formats | |
Adobe FrameMaker (MIF) | 7 |
Spreadsheet Formats | |
Applix Spreadsheets (AS) | 4.2, 4.3, 4.4 |
Lotus 1-2-3 (123) | Millennium Edition R9, 9.8 |
Microsoft Works Spreadsheet (DOS) | 3.4 |
Microsoft Works Spreadsheet (Mac) | 3.4 |
Comma-Separated Values (CSV) | N/A |
Presentation Formats | |
Applix Presents (AG) | 4.0, 4.2, 4.3, 4.4 |
Lotus Freelance Graphics (PRE) | Millennium Edition R9, 9.8 |
Microsoft Visio XML Format | 2003 |
Graphic Formats | |
SGI RGB Image (RGB) | No specific version |
Windows Animated Cursor (ANI) | No specific version |
WordPerfect Graphics 2 (WPG2) | 7 |
Microsoft Office Drawing (MSO) | No specific version |
Windows Icon Cursor (ICO) | No specific version |
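The Comma-Separated Values row in Table B-4 is a simple case for a USER_FILTER plug-in. The following Python script is a hypothetical filter executable; the name `csv2txt` is an assumption. It also assumes three things:

- Oracle Text invokes the filter with an input file path and an output file path as its two arguments.
- The input is UTF-8.
- Tab-separated plain text is acceptable for indexing.

Check "USER_FILTER" for the exact requirements, such as where the executable must be installed and how the preference refers to it.

```python
#!/usr/bin/env python3
"""Hypothetical USER_FILTER executable: convert a CSV file to plain text.

Assumed invocation: csv2txt <input_file> <output_file>
"""
import csv
import sys


def csv_to_text(in_path: str, out_path: str) -> int:
    """Write each CSV record as one tab-separated line; return records written."""
    rows = 0
    with open(in_path, newline="", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for record in csv.reader(src):
            dst.write("\t".join(record) + "\n")
            rows += 1
    return rows


def main(argv: list) -> int:
    if len(argv) != 3:
        sys.stderr.write("usage: csv2txt INPUT OUTPUT\n")
        return 2
    csv_to_text(argv[1], argv[2])
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The script exits with status 2 when it is not given exactly two file arguments. Quoted fields that contain commas, such as `"Smith, Jane"`, are emitted as a single value.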