
GoSWISH/2: Help for Creating a GoSWISH/2 index

GoSWISH/2 Information

GoSWISH/2 Index Name
The optional index name is used to store your selected options. In the future, you can regenerate the SWISH index (and description index) by choosing this index name.

If you do not specify an index name, GoSWISH/2 will automatically generate one for you.

GoSWISH/2 Index Title
An optional title. This can be used to briefly describe what is being indexed.

GoSWISH/2 Index Description
An optional description. This can be used to describe what is being indexed, and how it is being indexed (that is, what options you've selected).

Indexing options

Directories to index
You can enter a space-delimited list of relative directories to index -- GoSWISH/2 will instruct SWISH to index files in these directories (and in subdirectories of these directories).


Examples: DIRX or PHYSICS/EXPER/FEB98 or /JOKES/OLD

Relative directories are assumed to be subdirectories of the default data directory (which may be host-specific).

More examples (assume that the default data directory is D:\WWW):

  • If:   directory to index = /samples
    then files in D:\WWW\SAMPLES (and its subdirectories) will be indexed
  • If:   directory to index = cars/pickups
    then files in D:\WWW\CARS\PICKUPS (and its subdirectories) will be indexed
Notes:
  • SWISH will index the subdirectories of each directory that you enter. To suppress indexing of files in subdirectories, use the * (wildcard) character as a filename.
    For example:
      • samples/ means all files in samples/, and in subdirectories of samples/
      • samples/* means all files in samples/, but NOT in subdirectories of samples/
      • samples/foo*.* means all files that match foo*.* in samples/, but NOT in subdirectories of samples/
  • Leading and trailing / (or \) characters in a relative directory entry can be omitted (they will be added, and converted, as needed).
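
The directory rules above can be sketched as follows. This is a minimal Python illustration, not GoSWISH/2's own code: the helper names resolve_entry and covers are hypothetical, and the sketch assumes that an entry whose last component contains * or ? is a file pattern (so subdirectories are skipped), and that file-name matching ignores letter case. PureWindowsPath is used so the D:\ style paths work on any operating system.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath
from typing import Tuple


def resolve_entry(entry: str, data_dir: str) -> Tuple[str, bool]:
    """Map one 'directories to index' entry onto the default data directory.

    Returns (path, recursive). Hypothetical helper: an entry whose last
    component contains * or ? is treated as a file pattern, so only files
    directly in that directory are covered.
    """
    # Leading/trailing / or \ are optional: normalise separators, then drop empty parts.
    parts = [p for p in entry.replace("\\", "/").split("/") if p]
    recursive = not (parts and any(c in parts[-1] for c in "*?"))
    return str(PureWindowsPath(data_dir, *parts)), recursive


def covers(file_path: str, entry: str, data_dir: str) -> bool:
    """Would this entry cause file_path to be indexed? (case-insensitive sketch)"""
    base, recursive = resolve_entry(entry, data_dir)
    f, b = PureWindowsPath(file_path), PureWindowsPath(base)
    if recursive:
        # PureWindowsPath comparisons ignore case; any ancestor match counts.
        return b in f.parents
    # Pattern entry: same directory only, and the file name must match the pattern.
    return f.parent == b.parent and fnmatchcase(f.name.lower(), b.name.lower())
```

For instance, with D:\WWW as the data directory, covers(r"D:\WWW\SAMPLES\OLD\A.HTM", "samples/", r"D:\WWW") is true, while the same call with "samples/*" is false, because A.HTM sits in a subdirectory of SAMPLES.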

The SWISH index file
The main purpose of this form is to tell GoSWISH/2 how to create a SWISH index file. SWISH index files contain optimized lookup information, which is used by GoSWISH/2's "search" mode.

These indices are stored in GoSWISH/2's INDEX subdirectory (for example, x:\sre2003\srehttp2\addon\goswish2\index).

Search form document
GoSWISH/2 will generate a search form document that references the SWISH index you are about to create. This HTML document can be used as a front end for the SWISH search engine. In other words, clients who want to search the directories you are about to index can use this document as-is (though you might want to customize it)!

There are several options for this entry:

  1. A file name: a file of this name will be created in the (possibly host-specific) default data directory.
    Example: SEARCHME.HTML
  2. A file name with a relative path: a file of this name will be created in that subdirectory of the (possibly host-specific) default data directory.
    Example: CANDY/CHOCOLATE/SEARCHME.HTML
  3. A fully qualified file name, and a URL that points to it (separated by a space): the fully qualified file will be created. You must also include the URL; it is used to form a link (to this search form) in GoSWISH/2's response to you.
    Example: D:\WWW\GIANTS\FOOBAR.HTM   /GIANTS/FOOBAR.HTM
  4. A subdirectory name that ends with a /: a file with a randomly generated name will be created in this subdirectory of the (possibly host-specific) default data directory.
    Example: /_GOSWISH2/
  5. Or, you can leave this field blank, in which case a file with a randomly generated name will be created in the (possibly host-specific) default data directory.
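
A rough Python sketch of how the five kinds of entry could be told apart is shown below. It is illustrative only: the function name search_form_target is hypothetical, the random-name format (GS plus six hex digits plus .HTM) is invented for the sketch, and it assumes the default data directory is served at the URL "/", which GoSWISH/2 does not state.

```python
import secrets
from pathlib import PureWindowsPath
from typing import Tuple


def search_form_target(entry: str, data_dir: str) -> Tuple[str, str]:
    """Return (file_path, url) for a 'search form document' entry.

    Hypothetical sketch. Assumes the data directory is served at URL "/",
    and invents a random-name format (GS + 6 hex digits + .HTM).
    """
    fields = entry.split()
    if len(fields) == 2:
        # Option 3: fully qualified file name, then the URL that points to it.
        if not PureWindowsPath(fields[0]).is_absolute():
            raise ValueError("a file name followed by a URL must be fully qualified")
        return fields[0], fields[1]
    if len(fields) > 2:
        raise ValueError(f"cannot interpret search form entry {entry!r}")
    rel = entry.strip().replace("\\", "/")
    if rel == "" or rel.endswith("/"):
        # Options 4 and 5: random name, in the given subdirectory or the data directory itself.
        rel = rel.strip("/") + "/GS" + secrets.token_hex(3).upper() + ".HTM"
    # Options 1 and 2: a file name, possibly with a relative path.
    parts = [p for p in rel.split("/") if p]
    return str(PureWindowsPath(data_dir, *parts)), "/" + "/".join(parts)
```

Checking for exactly two space-separated fields is what distinguishes option 3; every other form is resolved relative to the data directory.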

Replacement rules
The exact syntax of the replace rules field is:
    DIR_PREFIX  url_prefix , [ DIR_PREFIX url_prefix]
where:
    DIR_PREFIX: a drive and directory 
    url_prefix: a sitename (with possible path information)
and where [ DIR_PREFIX url_prefix] 
   are optional repetitions of additional DIR_PREFIX, url_prefix pairs. 
Notes:
  • You should put each DIR_PREFIX url_prefix pair on a separate line.
  • The [ and ] should not be written.
  • If you specify any replace rules, the default replace rules will not be generated.
  • Replace rules are case sensitive. It's safest to enter all capital letters for the DIR_PREFIX, and to use capital letters in your "directories to index".
  • When entering directories (in the DIR_PREFIX), use / characters (a legacy of SWISH's Unix heritage).
Examples:
 *  D:/WWW  http://www.mysite.org/
 *  D:/WWW/SAMPLES http://www.mysite.org/sampdir
 *  D:/WWW/SAMPLES  http://www.mysite.org/sampdir
     D:/WWW/TEST http://www.mysite.org/test/ver1
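
The intended effect of a replace rule (turning an indexed file's path into a URL) can be modelled in a few lines of Python. This is a sketch, not the code SWISH or GoSWISH/2 actually runs; parse_replace_rules and to_url are hypothetical names, and the sketch assumes /-separated paths (per the note above), case-sensitive matching on a whole directory boundary, that the longest matching DIR_PREFIX wins, and that exactly one / joins the url_prefix to the rest of the path.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (DIR_PREFIX, url_prefix)


def parse_replace_rules(field: str) -> List[Rule]:
    """Split the replace rules field into (DIR_PREFIX, url_prefix) pairs, one per line."""
    rules: List[Rule] = []
    for line in field.splitlines():
        words = line.strip().rstrip(",").split()
        if not words:
            continue  # skip blank lines
        if len(words) != 2:
            raise ValueError(f"expected 'DIR_PREFIX url_prefix', got {line!r}")
        rules.append((words[0], words[1]))
    return rules


def to_url(path: str, rules: List[Rule]) -> Optional[str]:
    """Rewrite a /-separated file path into a URL using the longest matching DIR_PREFIX."""
    best: Optional[Rule] = None
    for dir_prefix, url_prefix in rules:
        d = dir_prefix.rstrip("/")
        # Case-sensitive match on a whole directory boundary (D:/WWW must not match D:/WWWX).
        if path == d or path.startswith(d + "/"):
            if best is None or len(d) > len(best[0]):
                best = (d, url_prefix)
    if best is None:
        return None
    d, url_prefix = best
    return url_prefix.rstrip("/") + "/" + path[len(d):].lstrip("/")
```

Because matching is case-sensitive, a lower-case path such as d:/www/samples/a.htm matches none of the upper-case prefixes, which is why entering everything in capitals is the safest choice.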
MetaNames
The MetaNames field is used to define keyword classes based on <META> elements in HTML documents. When specified, words that fall within such elements will be indexed separately.

That is: each of these entries defines a keyword, with values obtained from matching <META> elements in HTML documents. Words that appear in the content of such an element will not be found by standard searches.

Instead, you can search under the appropriate MetaName keyword, using search strings with the syntax:
    AMetaName=keywords
This means: search for "keywords" in the CONTENT field of <META NAME="AMetaName" ... > elements.

Search string example:
    description = (dogs or cats) not (lions and tigers)
This query will retrieve all the files in which the "description" MetaName is associated with either "dogs" or "cats", and that do not contain both "lions" and "tigers", where "lions" and "tigers" are not associated with any MetaName (that is, "lions" and "tigers" could appear anywhere in the file).
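
To make these semantics concrete, here is a small query evaluator in Python. It is a hypothetical model, not SWISH's search engine: each document is represented as a set of body words plus one set of words per MetaName (keys in lower case), "name = ..." restricts the following word or parenthesized group to that MetaName, bare words search only the body, and and/or/not are evaluated left to right with no precedence (an assumption of the sketch; "A not B" is read as "A and not B").

```python
import re
from typing import Callable, Dict, List, Optional, Set

Doc = Dict[Optional[str], Set[str]]  # None -> body words; "name" -> words from <META NAME="name">
Pred = Callable[[Doc], bool]


def _and(a: Pred, b: Pred) -> Pred:
    return lambda d: a(d) and b(d)


def _or(a: Pred, b: Pred) -> Pred:
    return lambda d: a(d) or b(d)


def _not(a: Pred) -> Pred:
    return lambda d: not a(d)


def compile_query(text: str) -> Pred:
    """Compile a query like 'description = (dogs or cats) not (lions and tigers)'."""
    toks: List[str] = re.findall(r"[()=]|[^\s()=]+", text.lower())
    pos = 0

    def peek() -> Optional[str]:
        return toks[pos] if pos < len(toks) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(toks):
            raise ValueError("unexpected end of query")
        pos += 1
        return toks[pos - 1]

    def expr(field: Optional[str]) -> Pred:
        left = unary(field)
        while peek() in ("and", "or", "not"):
            op = take()
            right = unary(field)
            if op == "and":
                left = _and(left, right)
            elif op == "or":
                left = _or(left, right)
            else:  # binary "not": left and not right
                left = _and(left, _not(right))
        return left

    def unary(field: Optional[str]) -> Pred:
        if peek() == "not":
            take()
            return _not(unary(field))
        return primary(field)

    def primary(field: Optional[str]) -> Pred:
        tok = take()
        if tok == "(":
            inner = expr(field)
            if take() != ")":
                raise ValueError("missing )")
            return inner
        if tok in (")", "="):
            raise ValueError(f"unexpected {tok!r}")
        if peek() == "=":
            take()
            return primary(tok)  # the MetaName applies to the next word or group
        return lambda d: tok in d.get(field, set())

    result = expr(None)
    if pos != len(toks):
        raise ValueError(f"unexpected token {toks[pos]!r}")
    return result
```

In this model a plain query for "dogs" does not match a document whose only "dogs" is inside a description META element, mirroring the statement that such words are not found by standard searches.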

Document Properties
Document properties are defined (for HTML documents) by use of <META> tags. For example, the following META tags set the value of the "author" property to "Jefferson", and set the "description" property to "Blueprints for Monticello".

<meta name="author" content="Jefferson">
<meta name="description" content="Blueprints for Monticello">

During search, you can tell GoSWISH/2 which of these property names to display.
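
Extracting properties from META tags, and then keeping only the property names chosen for display, might look like this in Python. MetaProperties and show_properties are hypothetical names built on the standard library's html.parser; treating property names case-insensitively and keeping the last value when a name repeats are assumptions of this sketch, not documented GoSWISH/2 behavior.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaProperties(HTMLParser):
    """Collect <META NAME=... CONTENT=...> pairs as document properties."""

    def __init__(self) -> None:
        super().__init__()
        self.props: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name"), a.get("content")
        if name and content is not None:
            # Names compared case-insensitively; a repeated name keeps the last value.
            self.props[name.lower()] = content.strip()


def show_properties(html: str, wanted: List[str]) -> Dict[str, str]:
    """Return only the requested properties that the document actually defines."""
    parser = MetaProperties()
    parser.feed(html)
    parser.close()
    return {n: parser.props[n.lower()] for n in wanted if n.lower() in parser.props}
```

Requested names that a document does not define are simply omitted from the result.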

For well-organized sites, you can use properties instead of generating descriptive summaries.

Generating descriptive summaries
When displaying "hits", GoSWISH/2 can also display a short descriptive summary of each file's contents. To do this, a set of descriptive summaries must first be created (these are then stored in a separate descriptive-summaries cache file).

GoSWISH/2 has two means of generating these descriptions: either by examining the contents of text files, or by explicitly defining a description in a directory-specific-description file.

The "Read from directory-specific descriptions file" option means: look for a descriptive summary in the directory-specific-description file.
The "... and generate if necessary" option means: try the directory-specific-description file first; if that doesn't work, try to generate a descriptive summary by examining the contents of the file.
More details:
  • By examining the contents of a file
    GoSWISH/2 has two modes for generating descriptive summaries:
      • By examining the HTML tags in HTML files: descriptive summaries are created for all HTML documents (as identified by the List of HTML document extensions). This descriptive summary is generated from the META NAME="DESCRIPTION" element in the HTML document's <HEAD> section, and from its <H1> and <H2> elements.
      • By extracting the beginning of all text documents (including HTML documents): the first several lines of the text file are used as a descriptive summary. Note: all files that are identified as "files to index" are assumed to be text documents.
  • From a directory-specific-description file
    Before attempting to generate a descriptive summary (from the contents of a file), GoSWISH/2 first checks the appropriate directory-specific-description file (typically named DESCRIBE.TXT).
    The syntax of these files is:
     filename.ext  a description
     filenam2.ext  another description
     filenam3.ext  another description, this one 
      | on 2 lines (continuation of filenam3.ext)
    Note the use of | as a continuation character.

    Here, "directory-specific" means: use the description file (that is, DESCRIBE.TXT) in the directory that contains the document.
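
The "... and generate if necessary" behavior (check DESCRIBE.TXT first, then fall back to the first few lines of the file) can be sketched in Python as below. It covers only the plain-text fallback, not the HTML-tag mode; the function names, the three-line limit, and the case-insensitive file-name lookup are assumptions for illustration, not GoSWISH/2 settings.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_describe(text: str) -> Dict[str, str]:
    """Parse DESCRIBE.TXT-style text: 'filename.ext  description', with | continuation lines."""
    descs: Dict[str, str] = {}
    last: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("|"):
            if last is not None:
                # Continuation: append to the previous file's description.
                descs[last] += " " + line[1:].strip()
            continue
        parts = line.split(None, 1)
        last = parts[0].lower()  # file names looked up case-insensitively (assumption)
        descs[last] = parts[1].strip() if len(parts) > 1 else ""
    return descs


def describe(doc: Path, generate: bool = True, max_lines: int = 3) -> Optional[str]:
    """generate=False: 'Read from directory-specific descriptions file' only.
    generate=True: '... and generate if necessary' (plain-text fallback)."""
    desc_file = doc.parent / "DESCRIBE.TXT"
    if desc_file.is_file():
        found = parse_describe(desc_file.read_text(errors="replace")).get(doc.name.lower())
        if found:
            return found
    if not generate:
        return None
    # Fallback: join the first few non-blank lines of the document itself.
    lines: List[str] = []
    with doc.open(errors="replace") as fh:
        for ln in fh:
            if ln.strip():
                lines.append(ln.strip())
                if len(lines) == max_lines:
                    break
    return " ".join(lines) or None
```

With the sample DESCRIBE.TXT shown above, the entry for filenam3.ext comes back as a single description joined across its two lines.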