Oracle® Text Application Developer's Guide 11g Release 2 (11.2) Part Number E10945-01 |
|
|
View PDF |
This appendix describes how to build a simple Web search application using the CONTEXT
index type, whether by writing your own code or by using the Oracle Text Wizard. The following topics are covered:
A common use of Oracle Text is to index HTML files on Web sites and provide search capabilities to users. The sample application in this appendix indexes a set of HTML files stored in the database and uses a Web server connected to Oracle Database to provide the search service.
This appendix describes two versions of the Web query application:
One using PL/SQL Server Pages (PSP)
One using Java Server Pages (JSP)
Both versions of these applications can be produced by means of a query application wizard, which produces the necessary code automatically.
You can view and download both the PSP and JSP application code, as well as the text query application wizard, at the Oracle Technology Network Web site:
http://www.oracle.com/technology/products/text
The text query application wizard Web page also contains complete instructions on how to use the wizard.
Figure A-1 shows what the JSP version of the text query application looks like. This application was created with the Oracle Text application wizard.
Figure A-2 shows the results of the text query.
Figure A-2 The Text Query Application with Results
The application returns links to documents containing the search term. Each document has four links:
The HTML link displays the document.
Graphics are not displayed in the filtered document. (You can see the source document for the first hit by looking at Figure 5-1, "Sample Document for Highlighting, Gisting, and Theme Extraction".)
The Highlight link displays the document with the search term highlighted. Figure 5-2, "Pet Highlighted in Pet Magnet Document" shows an example of highlighting.
The Theme link shows the top 50 themes associated with the document. Figure 5-3, "Query Application Displaying Document Themes" shows an example of theme extraction.
The Gist link displays a short summary of the document. Figure 5-4, "Query Application Presenting Document Gist" shows an example of this gisting feature.
This application is based on PL/SQL server pages. Figure A-3, "The PSP Web Application" illustrates how the browser calls the PSP-stored procedure on Oracle Database through a Web server.
This application has the following requirements:
Your Oracle Database (version 8.1.6 or higher) is up and running.
You have the Oracle PL/SQL gateway running
You have a Web server such as Apache up and running and correctly configured to send requests to Oracle Database.
This section describes how to build the PSP Web application.
You must create a text table to store your HTML files. This example creates a table called search_table
as follows:
create table search_table (tk numeric primary key, title varchar2(2000), text clob);
You must load the text table with the HTML files. This example uses the control file loader.ctl to load the files named in loader.dat. The SQL*Loader statement is as follows:
% sqlldr userid=scott/tiger control=loader.ctl
If you are using the text query wizard: The wizard produces a script to create an index. (See the instructions on the download Web page for the wizard.) Run that script.
If you are not using the wizard: Index the HTML files by creating a CONTEXT
index on the text column as follows. Because you are indexing HTML, this example uses the NULL_FILTER
preference type for no filtering and uses the HTML_SECTION_GROUP
type:
create index idx_search_table on search_table(text) indextype is ctxsys.context parameters ('filter ctxsys.null_filter section group CTXSYS.HTML_SECTION_GROUP');
The application must present selected documents to the user. To do so, Oracle Database must read the documents from the CLOB
in search_table
and output the result for viewing, This is done by calling procedures in the search_htmlservices package. The file search_htmlservices.sql must be compiled. You can do this at the SQL*Plus prompt:
SQL> @search_htmlservices.sql Package created.
The search page is invoked by calling search_html.psp from a browser. You compile search_html in Oracle Database with the loadpsp
command-line program:
% loadpsp -replace -user scott/tiger search_html.psp "search_html.psp": procedure "search_html" created.
See Also:
Oracle Database Application Developer's Guide - Fundamentals for more information about using PSPYou must configure your Web server to accept client PSP requests as a URL. Your Web server forwards these requests to Oracle Database and returns server output to the browser. Refer to Figure A-3.
You can use the Oracle WebDB Web listener or Oracle Application Server, which includes the Apache Web server. See your Web server documentation for more information.
You can access the query application from a browser using a URL. You configure the URL with your Web server. An example URL might look like:
http://server.example.com:7777/mypath/search_html
The application displays a query entry box in your browser and returns the query results as a list of HTML links, as shown in Figure A-1 and Figure A-2.
This section lists the code used to build the example Web application. It includes the following files:
This example shows a sample loader.ctl
file. It is used by sqlldr
to load the data file, loader.dat
.
LOAD DATA INFILE 'loader.dat' INTO TABLE search_table REPLACE FIELDS TERMINATED BY ';' (tk INTEGER, title CHAR, text_file FILLER CHAR, text LOBFILE(text_file) TERMINATED BY EOF)
This example shows a sample loader.dat
file. Each row contains three fields: a reference number for the document, a label (or "title"), and the name of the HTML document to load into the text column of search_table
. The file has been truncated for this example.
1; Pizza Shredder;Pizza.html 2; Refrigerator w/ Front-Door Auto Cantaloupe Dispenser;Cantaloupe.html 3; Self-Tipping Couch;Couch.html 4; Home Air Dirtier;Mess.html 5; Set of Pet Magnets;Pet.html 6; Esteem-Building Talking Pillow;Snooze.html . . . 28; Shaggy Found Inspiration For Success In Jamaica ;shaggy_found.html 29; Solar Flare Eruptions Likely ;solar_flare.html 30; Supersonic Plane Breaks Food Barrier ;food_barrier.html 31; SOUNDSCAN REPORT: Recipe for An Aspiring Top Ten;urban_groove_1.html . . .
set define off create or replace package search_htmlServices as procedure showHTMLDoc (p_id in numeric); procedure showDoc (p_id in varchar2, p_query in varchar2); end; / show errors; create or replace package body search_htmlServices as procedure showHTMLDoc (p_id in numeric) is v_clob_selected CLOB; v_read_amount integer; v_read_offset integer; v_buffer varchar2(32767); begin select text into v_clob_selected from search_table where tk = p_id; v_read_amount := 32767; v_read_offset := 1; begin loop dbms_lob.read(v_clob_selected,v_read_amount,v_read_offset,v_buffer); htp.print(v_buffer); v_read_offset := v_read_offset + v_read_amount; v_read_amount := 32767; end loop; exception when no_data_found then null; end; end showHTMLDoc; procedure showDoc (p_id in varchar2, p_query in varchar2) is v_clob_selected CLOB; v_read_amount integer; v_read_offset integer; v_buffer varchar2(32767); v_query varchar(2000); v_cursor integer; begin htp.p('<html><title>HTML version with highlighted terms</title>'); htp.p('<body bgcolor="#ffffff">'); htp.p('<b>HTML version with highlighted terms</b>'); begin ctx_doc.markup (index_name => 'idx_search_table', textkey => p_id, text_query => p_query, restab => v_clob_selected, starttag => '<i><font color=red>', endtag => '</font></i>'); v_read_amount := 32767; v_read_offset := 1; begin loop dbms_lob.read(v_clob_selected,v_read_amount,v_read_offset,v_buffer); htp.print(v_buffer); v_read_offset := v_read_offset + v_read_amount; v_read_amount := 32767; end loop; exception when no_data_found then null; end; exception when others then null; --showHTMLdoc(p_id); end; end showDoc; end; / show errors set define on
<%@ plsql procedure="search_html" %> <%@ plsql parameter="query" default="null" %> <%! v_results numeric := 0; %> <html> <head> <title>search_html Search </title> </head> <body> <% If query is null Then %> <center> <form method=post action="search_html"> <b>Search for: </b> <input type=text name="query" size=30> <input type=submit value=Search> </center> <hr> <% Else %> <p> <%! color varchar2(6) := 'ffffff'; %> <center> <form method=post action="search_html"> <b>Search for:</b> <input type=text name="query" size=30 value="<%= query %>"> <input type=submit value=Search> </form> </center> <hr> <p> <% -- select statement for doc in ( select /*+ FIRST_ROWS */ rowid, tk, title, score(1) scr from search_table where contains(text, query,1) >0 order by score(1) desc ) loop v_results := v_results + 1; if v_results = 1 then %> <center> <table border="0"> <tr bgcolor="#6699CC"> <th>Score</th> <th>Title</th> </tr> <% end if; %> <tr bgcolor="#<%= color %>"> <td> <%= doc.scr %>% </td> <td> <%= doc.title %> [<a href="search_htmlServices.showHTMLDoc?p_id= <%= doc.tk %>">HTML</a>] [<a href="search_htmlServices.showDoc?p_id= <%= doc.tk %>&p_query=<%= query %>">Highlight</a>] </td> </tr> <% if (color = 'ffffff') then color := 'eeeeee'; else color := 'ffffff'; end if; end loop; %> </table> </center> <% end if; %> </body></html>
Creating the JSP-based Web application involves most of the same steps as those used in building the PSP-based application (see "Building the Web Application"). You can use the same loader.dat
and loader.ctl
files. However, with the JSP-based application, you do not need to do the following:
Compile the search_htmlservices
package
Compile the search_html
PSP page with loadpsp
This application has the following requirements:
Your Oracle database (version 8.1.6 or higher) is up and running.
You have a Web server such as Apache up and running and correctly configured to send requests to Oracle Database.
This section lists the Java code used to build the example Web application. It includes the following files:
The code for this file was generated by the text query application wizard. (Some longer lines have been split to make the code easier to read.)
<%@ page import="java.sql.*, java.util.*, java.net.*, oracle.jdbc.*, oracle.jsp.dbutil.*" %> <%@ page contentType="text/html;charset=UTF-8" %> <% oracle.jsp.util.PublicUtil.setReqCharacterEncoding(request, "UTF-8"); %> <jsp:useBean id="name" class="oracle.jsp.jml.JmlString" scope ="request" > <jsp:setProperty name="name" property="value" param="query" /> </jsp:useBean> <% String connStr="jdbc:oracle:thin:@jsmith-pc.us.oracle.com:1521:zippy922"; java.util.Properties info=new java.util.Properties(); Connection conn = null; ResultSet rset = null; OracleCallableStatement callStmt = null; Statement stmt = null; String userQuery = null; String myQuery = null; URLEncoder myEncoder; int count=0; int loopNum=0; int startNum=0; if (name.isEmpty()) { %> <html> <title>Text Search</title> <body> <table width="100%"> <tr bgcolor="#336699"> <td><font face="arial, helvetica" align="left" color="#CCCC99" size=+2>Text Search</td> </tr> </table> <center> <form method = post> Search for: <input type=text name=query size = 30> <input type=submit value="Search"> </form> </center> </body> </html> <%} else { %> <html> <title>Text Search</title> <body text="#000000" bgcolor="#FFFFFF" link="#663300" vlink="#996633" alink="#ff6600"> <table width="100%"> <tr bgcolor="#336699"> <td><font face="arial, helvetica" align="left" color="#CCCC99" size=+2>Text Search</td> </tr> </table> <center> <form method = post action="TextSearchApp.jsp"> Search for: <input type=text name="query" value="<%=name.getValue() %>" size = 30> <input type=submit value="Search"> </form> </center> <% try { DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver() ); info.put ("user", "jsmith"); info.put ("password","hello"); conn = DriverManager.getConnection(connStr,info); stmt = conn.createStatement(); userQuery = request.getParameter("query"); myQuery = URLEncoder.encode(userQuery); String numStr = request.getParameter("sn"); if(numStr!=null) startNum=Integer.parseInt(numStr); String theQuery = translate(userQuery); callStmt =(OracleCallableStatement)conn.prepareCall("begin "+ "?:=ctx_query.count_hits(index_name=>'ULTRA_IDX1', "+ "text_query=>?"+ "); " + "end; "); callStmt.setString(2,theQuery); callStmt.registerOutParameter(1, OracleTypes.NUMBER); callStmt.execute(); count=((OracleCallableStatement)callStmt).getNUMBER(1).intValue(); if(count>=(startNum+20)){ %> <font color="#336699" FACE="Arial,Helvetica" SIZE=+1>Results <%=startNum+1%> - <%=startNum+20%> of <%=count%> matches <% } else if(count>0){ %> <font color="#336699" FACE="Arial,Helvetica" SIZE=+1>Results <%=startNum+1%> - <%=count%> of <%=count%> matches <% } else { %> <font color="#336699" FACE="Arial,Helvetica" SIZE=+1>No match found <% } %> <table width="100%"> <TR ALIGN="RIGHT"> <% if((startNum>0)&(count<=startNum+20)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum-20 %>&query= <%=myQuery %>">previous20</a> </TD> <% } else if((count>startNum+20)&(startNum==0)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum+20 %>&query=<%=myQuery %>">next20</a> </TD> <% } else if((count>startNum+20)&(startNum>0)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum-20 %>&query= <%=myQuery %>">previous20</a> <a href="TextSearchApp.jsp?sn=<%=startNum+20 %>&query= <%=myQuery %>">next20</a> </TD> <% } %> </TR> </table> <% String ctxQuery = "select /*+ FIRST_ROWS */ rowid, 'TITLE', score(1) scr from 'ULTRA_TAB1' where contains('TEXT', '"+theQuery+"',1 ) > 0 order by score(1) desc"; rset = stmt.executeQuery(ctxQuery); String color = "ffffff"; String rowid = null; String fakeRowid = null; String[] colToDisplay = new String[1]; int myScore = 0; int items = 0; while (rset.next()&&items< 20) { if(loopNum>=startNum) { rowid = rset.getString(1); fakeRowid = URLEncoder.encode(rowid); colToDisplay[0] = rset.getString(2); myScore = (int)rset.getInt(3); items++; if (items == 1) { %> <center> <table BORDER=1 CELLSPACING=0 CELLPADDING=0 width="100%" <tr bgcolor="#CCCC99"> <th><font face="arial, helvetica" color="#336699">Score</th> <th><font face="arial, helvetica" color="#336699">TITLE</th> <th> <font face="arial, helvetica" color="#336699">Document Services</th> </tr> <% } %> <tr bgcolor="#FFFFE0"> <td ALIGN="CENTER"> <%= myScore %>%</td> <td> <%= colToDisplay[0] %> <td> </td> </tr> <% if (color.compareTo("ffffff") == 0) color = "eeeeee"; else color = "ffffff"; } loopNum++; } } catch (SQLException e) { %> <b>Error: </b> <%= e %><p> <% } finally { if (conn != null) conn.close(); if (stmt != null) stmt.close(); if (rset != null) rset.close(); } %> </table> </center> <table width="100%"> <TR ALIGN="RIGHT"> <% if((startNum>0)&(count<=startNum+20)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum-20 %>&query= <%=myQuery %>">previous20</a> </TD> <% } else if((count>startNum+20)&(startNum==0)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum+20 %>&query= <%=myQuery %>">next20</a> </TD> <% } else if((count>startNum+20)&(startNum>0)) { %> <TD ALIGN="RIGHT"> <a href="TextSearchApp.jsp?sn=<%=startNum-20 %>&query= <%=myQuery %>">previous20</a> <a href="TextSearchApp.jsp?sn=<%=startNum+20 %>&query= <%=myQuery %>">next20</a> </TD> <% } %> </TR> </table> </body></html> <%} %> <%! public String translate (String input) { Vector reqWords = new Vector(); StringTokenizer st = new StringTokenizer(input, " '", true); while (st.hasMoreTokens()) { String token = st.nextToken(); if (token.equals("'")) { String phrase = getQuotedPhrase(st); if (phrase != null) { reqWords.addElement(phrase); } } else if (!token.equals(" ")) { reqWords.addElement(token); } } return getQueryString(reqWords); } private String getQuotedPhrase(StringTokenizer st) { StringBuffer phrase = new StringBuffer(); String token = null; while (st.hasMoreTokens() && (!(token = st.nextToken()).equals("'"))) { phrase.append(token); } return phrase.toString(); } private String getQueryString(Vector reqWords) { StringBuffer query = new StringBuffer(""); int length = (reqWords == null) ? 0 : reqWords.size(); for (int ii=0; ii < length; ii++) { if (ii != 0) { query.append(" & "); } query.append("{"); query.append(reqWords.elementAt(ii)); query.append("}"); } return query.toString(); } %>