Using GoSWISH to create & use SWISH Indices

SWISH is a search tool that allows clients to search your site. To use SWISH, you must:
  1. Create a SWISH index file, and possibly generate descriptive summaries, of your site (or of a portion of your site)
  2. Create an HTML document containing a FORM that will search this file
  3. Make this document available to the world
GoSWISH makes this easy. Just use one of the forms on this page (though step 3 is up to you!).

Several modes are supported:

  • Quick mode: useful if you are willing to accept the default parameters.
  • Regular: lets you set the most important parameters.
  • Custom: allows you to set most of the SWISH parameters.
  • Expert: create a configuration file and run SWISH from the command line!

  • Or, you can
  • Since GoSwish is designed to search files, the following forms have been crippled -- they won't work, but they should give you a flavor of the GoSwish look and feel.
    To see GoSwish in action, try the SREhttp/2 search engine

    Quick Mode

    Use this Quick Mode to create a search index, and the necessary tools for searching your site, using default options. All you need to do is specify the "web directory" to search (and, optionally, whether to generate "descriptive summaries")!

    Create a "default options" SWISH index, and a front end, for:
    You can also generate descriptive summaries for files included in this index (highly recommended if you want short summaries to be returned with each match):
     Do NOT create
     Read from directory-specific descriptions file
     ... and generate if necessary


    Regular Mode

    The Regular Mode allows you to set the most important parameters, without burdening you with a multitude of SWISH parameters that you probably will never change.

    Directories to index:
    Create an index of all files in these directories (and in their subdirectories). You can enter relative directories or fully qualified directories.
    Generate descriptive summaries:
     Do NOT create
     Read from directory-specific descriptions file
     ... and generate if necessary
    Along with the URL, TITLE (or filename), and relevancy score of matching documents, GoSWISH can also display a descriptive summary that is generated from the contents of the document.
    Files to index:
    Only files with these (case insensitive) extensions will be indexed. Both the file name and the contents of the file will be indexed. To index all files, enter a *.
    Do not index contents of these files:
    Files with these (case insensitive) extensions will not have their contents indexed (just the filenames will be indexed). This list must be a subset of the files to index list.
    To ignore this suppression, enter a * (but do you really want to index the contents of .GIF files?).
    List of properties to save:
    If these <META ... > tags are defined for an HTML document, their CONTENTs can be returned as part of the search results (along with the rank, file name, title, and document size).
    List of <META> names to index under:
    This is a list of <META> names to index under. Words that appear in the CONTENT of these <META> names will be indexed separately.

    Custom Mode

    Use this to set a number of parameters, including file exclusion rules and the name of the output file.
    The only required field is the directories to index. For all other fields, you can accept the defaults, or leave them empty (in which case GoSWISH will generate a default value). In general, use of the defaults is recommended!

       
    Parameter Description
    Directory & file names
    Directories to index:
    Create an index of all files in these directories (and in their subdirectories). You can enter relative directories (that do NOT contain drive information), or fully qualified directories (that do contain drive information).

    Relative directories are assumed to be subdirectories of this server's web-root directory.
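
The distinction between the two kinds of entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name resolve_index_dir and the web-root D:\WWW are hypothetical, and GoSWISH's own handling may differ in detail.

```python
import ntpath  # drive-letter aware path handling, usable on any platform


def resolve_index_dir(entry, web_root):
    """Return the full directory for one entry (hypothetical helper)."""
    drive, _ = ntpath.splitdrive(entry)
    if drive:
        # Fully qualified: the entry contains drive information, use it as given.
        return ntpath.normpath(entry)
    # Relative: treated as a subdirectory of the web-root directory.
    return ntpath.normpath(ntpath.join(web_root, entry.lstrip("/\\")))


print(resolve_index_dir("samples/new", "D:\\WWW"))
print(resolve_index_dir("E:\\DOCS", "D:\\WWW"))
```

The first call yields a directory under the web-root; the second is returned unchanged because it names a drive.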

    SWISH index name:

    Search-form document to create:

    The name of the SWISH index file to create. If not specified, a random name will be used.
    The search-form document is an HTML document that will contain a link to the search mode of GoSWISH.

    You can leave both of these fields empty; GoSWISH will automatically generate filenames.

    Indexing Rules
    Replace Rules (case sensitive) :

    Leave empty to use defaults; enter NONE to suppress Replace Rules
    This can contain (sets of) replace rules. These are mainly used to convert filenames into URLs. For example:
    d:\w3\samp http://mysite.net/samp
    By default, a (set of) replace rules that generate URLs back to the indexed files will be generated. If you are not indexing any fully qualified directories, use of this default is highly recommended.
    Files to index:
    Only files with these (case insensitive) extensions will be indexed. Both the file name and the contents of the file will be indexed.
    To index all files, enter a * (if empty, a default set of files will be indexed).
    Do not index contents of these files: Files with these (case insensitive) extensions will not have their contents indexed (just the filenames will be indexed). This list must be a subset of the files to index list.
    To ignore this suppression, enter a * (if empty, a default set of files will not have their contents checked).
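
How the two extension lists interact can be sketched as follows. This is a minimal illustration, not GoSWISH's code: the function name index_action and the three return labels are assumptions introduced here.

```python
import os


def _normalize(exts):
    """Lower-case the extensions and drop any leading dot."""
    return {e.lower().lstrip(".") for e in exts}


def index_action(filename, index_exts, no_content_exts):
    """Decide what to do with one file (hypothetical helper).

    Returns "skip", "name_only", or "name_and_contents".
    """
    index_exts = _normalize(index_exts)
    no_content_exts = _normalize(no_content_exts)
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    # A * in the "files to index" list means every file is indexed.
    if "*" not in index_exts and ext not in index_exts:
        return "skip"
    # A * in the "do not index contents" list turns the suppression off.
    if "*" not in no_content_exts and ext in no_content_exts:
        return "name_only"
    return "name_and_contents"


print(index_action("LOGO.GIF", ["htm", "gif"], ["gif"]))
print(index_action("index.htm", ["htm", "gif"], ["gif"]))
```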
    Do not index rules.
    PathName:
    Directory:
    Filename:
    Title:
    These file rules are used to limit what directories and files are searched. The first word should be "contains", followed by a space-delimited list of strings.
    • PathName: If the pathname (to the file, or to the directory) contains any of these strings, do not index.
    • Directory: If one of these files is in the directory, do not index any file in the directory.
    • Filename: If the filename contains one of these strings, do not index the file.
    • Title: If the title contains one of these strings, do not index.
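
The four rule types above can be sketched as a single check. This is an illustration under assumptions: the names parse_rule and should_skip are hypothetical, matching is done as a case-sensitive substring test, and the Directory rule is read as an exact match against the names of files in the directory.

```python
def parse_rule(text):
    """Turn 'contains a b c' into ['a', 'b', 'c']; an empty field gives no strings."""
    words = text.split()
    if not words:
        return []
    if words[0].lower() != "contains":
        raise ValueError("rule must start with 'contains'")
    return words[1:]


def should_skip(path, filename, title, dir_listing, rules):
    """Return True if any rule excludes this file.

    rules maps "pathname", "directory", "filename", "title" to lists of strings;
    dir_listing is the list of file names found in the file's directory.
    """
    if any(s in path for s in rules.get("pathname", [])):
        return True
    # Directory rule: a listed file present in the directory excludes everything in it.
    if any(name in dir_listing for name in rules.get("directory", [])):
        return True
    if any(s in filename for s in rules.get("filename", [])):
        return True
    return any(s in title for s in rules.get("title", []))


rules = {
    "pathname": parse_rule("contains private"),
    "filename": parse_rule("contains .bak"),
}
print(should_skip("D:/WWW/private/a.htm", "a.htm", "A page", [], rules))
print(should_skip("D:/WWW/public/a.htm", "a.htm", "A page", [], rules))
```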
    More Options
    Two limits (percent #_files) to use to identify common words:
    After indexing, swish can automatically tell which words are the most common and omit them from the index according to these parameters. For example:
    IgnoreLimit 75 250 -- ignore all words that occur in over 75% of the files and that also occur in over 250 different files.
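
The two limits combine with a logical AND, which the following sketch makes explicit. The function name common_words and the doc_freq mapping are assumptions for illustration; SWISH computes the counts internally while indexing.

```python
def common_words(doc_freq, total_files, percent, min_files):
    """Return the words that exceed both limits.

    doc_freq maps each word to the number of files that contain it.
    """
    return {
        word
        for word, n in doc_freq.items()
        # Over `percent` % of all files AND in more than `min_files` files.
        if n * 100 > percent * total_files and n > min_files
    }


# With 400 files and limits 75 250, "the" (390 files) is dropped, "swish" is kept.
print(common_words({"the": 390, "swish": 120}, 400, 75, 250))
```

Note that on a small site (say 200 files) no word can exceed the second limit of 250, so nothing is dropped regardless of the percentage.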
    Common words (to be ignored):
    Ignore these "commonly occurring" words. If you leave this blank (or enter SwishDefault), a default set (of about a thousand words) will be used.
    Ignore HTML comments: If you check this, GoSWISH will ignore all text within HTML comments (all text enclosed by <!-- and -->).
    The name of this index:
    the administrator:
    the description:
    a pointer:
    These are strictly optional items used to identify the index. Leave them blank and some basic client, server, and selector information will be used.
    Use a "word stemming" algorithm when creating the index: A "word stemming" algorithm removes common suffixes, such as plurals and "ing" endings.
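
The effect of suffix removal can be shown with a deliberately naive sketch. This is not the stemming algorithm SWISH uses; the function name naive_stem and its two suffix rules are assumptions chosen only to illustrate plurals and "ing" endings.

```python
def naive_stem(word):
    """Strip an 'ing' ending or a plural 's' (toy rules, for illustration only)."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


for word in ("indexing", "files", "class"):
    print(word, "->", naive_stem(word))
```

With stemming enabled, a search for "index" can then also match documents that only contain "indexing".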
    List of properties to save: If these <META ... > tags are defined for an HTML document, their CONTENTs can be returned as part of the search results (along with the rank, file name, title, and document size).
    List of <META> names to index under: This is a list of <META> names to index under. Words that appear in the CONTENT of these <META> names will be indexed separately.
    Summary Creation
    Generate descriptive summaries
     Do NOT create
     Read from directory-specific descriptions file
     ... and generate if necessary
    For every match found during a search of a SWISH index, the URL of the matching document, its TITLE, and a relevancy score are displayed. You can also display a descriptive summary that is generated from the contents of the document.
    List of HTML document extensions:
    Files ending with these extensions are assumed to be HTML documents (descriptive summaries will be generated from meta elements, headers, etc.)
    Name of directory-specific-description file:
    Filename.ext only: do not include a path or drive.
    To allow you to specify your own descriptive summaries (say, for image files), you can create a (set of) directory-specific-description files.

    Enter a title that will be displayed at the top of the search form. You can include HTML tags. Leave this empty and a default title will be used.
    Enter a title that will be displayed when results are returned. You can include HTML tags. Leave this empty and a default title will be used.
    Do you want to monitor SWISH while it runs: Yes || No      


    Expert Mode

    You can always write your own SWISH configuration file (see GOSWISH.TXT, or the SWISH home page, for more information)!

    Some further definitions

    Directories to index
    You can enter a space-delimited list of several directories to index -- GoSWISH will instruct SWISH to index files in these directories (and in subdirectories of these directories).

    There are two forms of directory entries, relative and fully qualified:

    Some examples might help (the following assume that the web-root directory is D:\WWW).

    Notes:
    The SWISH index file
    The main purpose of this form is to tell GoSWISH how to create a SWISH index file. SWISH index files contain optimized lookup information, which is used by GoSWISH's "search" mode.

    If a relative filename is given (for example, a file name with no directory or drive specified), it will be created in the INDEX subdirectory of this server's SWISH_DIR directory. If you leave this field blank, a randomly generated name (of the form INDEXn.SWI) is created (in this INDEX subdirectory).

    Search form document
    GoSwish will generate a search form document that references the SWISH index you are about to create. This HTML document can be used as a front-end for the SWISH search engine. In other words, this document can be used as-is (though you might want to customize it) by clients interested in searching the directories you are about to index!

    There are several options for this entry:

    1. A file name: a file (of this name) will be created in the server's web_root directory.
      Example: SEARCHME.HTML
    2. A file and a relative path: a file (of this name) will be created in the subdirectory (of the relative path) under the server's web_root directory.
      Example: CANDY/CHOCOLATE/SEARCHME.HTML
    3. A fully qualified file name, and a URL that points to it (separated by a space): The fully qualified file name will be created. You must also include this URL; it is used to form a link (to this search form) in GoSWISH's response to you.
      Example: D:\WWW\GIANTS\FOOBAR.HTM   /GIANTS/FOOBAR.HTM
    4. A subdirectory name that ends with a / : a randomly derived name (e.g., SEARCH2.HTM) will be created in this subdirectory of the web_root directory.
    5. Or, you can leave this field blank, in which case a randomly derived name (e.g., SEARCH2.HTM) will be created in the web_root directory.
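
The five options above can be summarized as one resolution step, sketched here in Python. Everything in it is an assumption made for illustration: the helper name resolve_search_form, the range used for the random number, and the idea that the URL for options 1, 2, 4, and 5 is simply the path relative to the web_root.

```python
import ntpath
import random


def resolve_search_form(entry, web_root, rng=random):
    """Return (file_path, url) for a search-form entry (hypothetical helper)."""
    parts = entry.split()
    if parts and ntpath.splitdrive(parts[0])[0]:
        # Option 3: fully qualified file name plus the URL that points to it.
        if len(parts) != 2:
            raise ValueError("a fully qualified file name must be followed by a URL")
        return parts[0], parts[1]
    rel = parts[0].replace("\\", "/") if parts else ""
    if rel == "" or rel.endswith("/"):
        # Options 4 and 5: generate a name such as SEARCH2.HTM.
        rel += "SEARCH%d.HTM" % rng.randint(1, 99)
    rel = rel.lstrip("/")
    # Options 1 and 2: the file goes under the web_root directory.
    return ntpath.normpath(ntpath.join(web_root, rel)), "/" + rel


print(resolve_search_form("CANDY/CHOCOLATE/SEARCHME.HTML", "D:\\WWW"))
print(resolve_search_form("D:\\WWW\\GIANTS\\FOOBAR.HTM   /GIANTS/FOOBAR.HTM", "D:\\WWW"))
```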

    Replacement rules
    The exact syntax of this replace rules field is:
        DIR_PREFIX  url_prefix , [ DIR_PREFIX url_prefix]
    where:
        DIR_PREFIX: a drive and directory 
        url_prefix: a sitename (with possible path information)
    and where [ DIR_PREFIX url_prefix] 
       are optional repetitions of additional DIR_PREFIX, url_prefix pairs. 
    Notes: Examples:
     *  D:/WWW  http://www.mysite.org/
     *  D:/WWW/SAMPLES http://www.mysite.org/sampdir
     *  D:/WWW/SAMPLES  http://www.mysite.org/sampdir  
         D:/WWW/TEST http://www.mysite.org/test/ver1
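
To show what a set of replace rules does to a filename, here is a minimal Python sketch. It is not SWISH's implementation: the function names parse_replace_rules and filename_to_url are hypothetical, and trying longer directory prefixes first is an assumption made so that overlapping rules behave predictably.

```python
def parse_replace_rules(text):
    """Parse 'DIR_PREFIX url_prefix , DIR_PREFIX url_prefix' into a list of pairs."""
    rules = []
    for chunk in text.split(","):
        fields = chunk.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError("expected 'DIR_PREFIX url_prefix', got %r" % chunk)
        rules.append((fields[0], fields[1]))
    return rules


def filename_to_url(filename, rules):
    """Replace a matching directory prefix with its URL prefix (case sensitive)."""
    name = filename.replace("\\", "/")
    # Assumption: longer prefixes are tried first, so D:/WWW/SAMPLES wins over D:/WWW.
    for dir_prefix, url_prefix in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        prefix = dir_prefix.replace("\\", "/")
        if name.startswith(prefix):
            return url_prefix + name[len(prefix):]
    return filename  # no rule applies; leave the filename untouched


rules = parse_replace_rules(
    "D:/WWW/SAMPLES http://www.mysite.org/sampdir , "
    "D:/WWW/TEST http://www.mysite.org/test/ver1"
)
print(filename_to_url("D:\\WWW\\SAMPLES\\intro.htm", rules))
```

This is plain prefix substitution, so a trailing slash on either prefix carries through to the result unchanged.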
    MetaNames
    The MetaNames field is used to define keyword classes based on <META> elements in HTML documents. When specified, words that fall within such elements will be indexed separately.

    That is: each of these entries defines a keyword, with values obtained from matching <META> elements in HTML documents. Words that appear in the content of such an element will not be found by standard searches.

    Instead, you can search under the appropriate MetaName keyword, using search strings with the syntax:    AMetaName=keywords
    This means: search for "keywords" in the CONTENT field of <META NAME="AMetaName" ... > elements.

    Search string example:
        description = (dogs or cats) not (lions and tigers)
    This query will retrieve all the files in which the "description" is associated with either "dogs" or "cats", and that do not contain both of the words "lions" and "tigers". Here "lions" and "tigers" are not associated with any meta name (that is, "lions" and "tigers" could appear anywhere in the file).

    Document Properties
    Document properties are defined (for HTML documents) by use of <META> tags. For example, the following META tags set the value of the "author" property to "Jefferson", and set the "description" property to "Blueprints for Monticello".

    <meta name="author" content="Jefferson">
    <meta name="description" content="Blueprints for Monticello">

    During search, you can tell GoSWISH which of these property names to display.

    For well organized sites, you can use properties instead of generating descriptive summaries.
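
As a rough picture of how property values are read out of a document, the sketch below collects the CONTENT of selected <META> names using Python's standard html.parser module. The class name MetaProperties and the helper extract_properties are assumptions for illustration; this is not how SWISH itself parses HTML.

```python
from html.parser import HTMLParser


class MetaProperties(HTMLParser):
    """Collect the CONTENT of <META NAME=...> tags whose name is wanted."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {name.lower() for name in wanted}
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in self.wanted and attrs.get("content") is not None:
            self.properties[name] = attrs["content"].strip()


def extract_properties(html, wanted):
    """Return {property name: content} for the wanted META names."""
    parser = MetaProperties(wanted)
    parser.feed(html)
    return parser.properties


page = (
    '<meta name="author" content="Jefferson">\n'
    '<meta name="description" content="Blueprints for Monticello">'
)
print(extract_properties(page, ["author", "description"]))
```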

    Generating descriptive summaries
    When displaying "hits", GoSwish can also display a short descriptive summary of the file's contents. To do this, a set of descriptive summaries must first be created (which are then stored in a separate descriptive-summaries cache file).

    GoSwish has two means of generating these descriptions: either by examining the contents of text files, or by explicitly defining a description in a directory-specific-description file.

    The Read from directory-specific descriptions file option means: look for a descriptive summary in the directory-specific-description file.
    The ... and generate if necessary option means: try the directory-specific-description file first; and if that doesn't work, try to generate a descriptive summary by examining the contents of the file.
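
The three summary options amount to a simple fallback order, sketched here. The mode names "none", "read", and "read_or_generate" are hypothetical labels for the three choices on the form, and pick_summary is not a GoSWISH function.

```python
def pick_summary(filename, described, mode, generate):
    """Choose a descriptive summary for one file.

    described: {filename: summary} read from the directory-specific-description file.
    mode: "none", "read", or "read_or_generate" (hypothetical labels).
    generate: callable that builds a summary from the file's contents.
    """
    if mode == "none":
        return None  # Do NOT create
    summary = described.get(filename)
    if summary:
        return summary  # found in the description file
    if mode == "read_or_generate":
        return generate(filename)  # ... and generate if necessary
    return None


described = {"logo.gif": "Site logo"}
make_summary = lambda name: "summary generated from " + name
print(pick_summary("logo.gif", described, "read", make_summary))
print(pick_summary("intro.htm", described, "read_or_generate", make_summary))
```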