What is TRIST?

TRIST is a software tool that creates indexes of textual documents and lets users search those indexes by keyword.

The principal aim of TRIST was to give local users the ability to search local files, that is to say, files and users on an individual machine. As the tool developed, it became apparent that it should also offer searches of the index to network users via a locally hosted web interface. This led to the final release, in which web interface components can be integrated into users’ personal sites to search files stored on their web server, giving a quick and easy way to add search facilities to their web-sites.

Potentially, the TRIST tool allows a user to build the index of their site and then use that index to return results in a manner akin to common search engines.

Example: A corporate site includes a knowledge base or FAQ section. The web-site designer could use the TRIST tool to add keyword searching of the documents, so that a visitor to the site could request pages relevant to them instead of navigating through categories.

The results from TRIST are generated from a keyword search. This is a kind of search that internet users will be familiar with from most popular search engines. The sample web interface in TRIST has a single text field and a search button. To run a search, a site visitor keys in the keywords to be looked up and clicks the search button. If the visitor requested a search for the three keywords ‘TRIST SEARCH TOOL’, then all the indexed documents that contain an occurrence of the three words would be selected and displayed to the user in order of relevance.
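The selection step can be pictured with a small in-memory keyword index. The following Java sketch is illustrative only: the class and method names are hypothetical, TRIST itself stores its index in a database rather than in memory, and the sketch assumes a document must contain every keyword in the query to be selected, which the description above does not state explicitly.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory keyword index; not TRIST's actual implementation.
class KeywordIndexSketch {
    // Maps each lower-cased word to the names of the documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Record every word of the given text against the document name.
    void add(String documentName, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(documentName);
            }
        }
    }

    // Return the documents that contain every keyword in the query
    // (assumption: all keywords are required, not just any one of them).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String keyword : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (keyword.isEmpty()) {
                continue;
            }
            Set<String> matches = postings.getOrDefault(keyword, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(matches);
            } else {
                result.retainAll(matches);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

With two documents added, one containing "TRIST is a search tool" and another containing "A search for keywords", the query ‘TRIST SEARCH TOOL’ would select only the first, while the query ‘search’ would select both.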

Document relevance is determined by counting the number of occurrences of each keyword within the document. Each count is weighted depending on the keyword, and the weighted counts are summed to give a total value for the document. The document with the highest value is displayed first in the results, followed by the remaining documents in descending order of value.
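As a rough illustration of this ranking, the Java sketch below counts keyword occurrences, multiplies each count by a per-keyword weight, and sorts documents by the resulting total. The class name, method names, tokenisation rule and weight values are all assumptions made for the example; how TRIST actually chooses its weights is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of weighted keyword scoring; not TRIST's actual code.
class RelevanceSketch {

    // Count how often each keyword occurs in the text (case-insensitive),
    // multiply the count by that keyword's weight, and sum the results.
    static double score(String text, Map<String, Double> keywordWeights) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        double total = 0.0;
        for (Map.Entry<String, Double> entry : keywordWeights.entrySet()) {
            String keyword = entry.getKey().toLowerCase(Locale.ROOT);
            int occurrences = 0;
            for (String word : words) {
                if (word.equals(keyword)) {
                    occurrences++;
                }
            }
            total += occurrences * entry.getValue();
        }
        return total;
    }

    // Return document names ordered from the highest score to the lowest.
    static List<String> rank(Map<String, String> documents,
                             Map<String, Double> keywordWeights) {
        List<String> names = new ArrayList<>(documents.keySet());
        names.sort(Comparator.comparingDouble(
                (String name) -> score(documents.get(name), keywordWeights)).reversed());
        return names;
    }
}
```

For example, with an assumed weight of 2.0 for "trist" and 1.0 for "search", a document containing "trist trist search" totals 5.0 and is listed ahead of one containing "search search", which totals 2.0.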

The tool currently accepts documents in three formats. These have been chosen to cover a good cross-section of applications.

HTML

To enable indexing of web-based content, the tool supports HTML files with extensions .htm and .html. The tool ignores any ‘meta-tags’, scripts, styles or formatting and focuses only on words that are visible when the document is rendered in a web browser.

RTF

To accommodate the need to index documents saved in a common ‘word-processor’ format, where heavy formatting applies but the text content is still of relevance, the tool supports RTF files. As with HTML, the tool only indexes words that are visible in the rendered form. In all but a few cases, this will be the text as written by the author in whichever word processing package they have used. Traditionally, RTF files are associated with Microsoft Word as an alternative to the normal .doc format. Word formats are not supported by TRIST.

TXT

To allow for general applications, the tool also indexes plain text files with a .txt extension. This format is included because of the vast amount of software that can export to TXT regardless of its native formats and conventions. As a result, it is possible to index log files and configuration files, provided the file has a .txt extension.
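The three formats above could be told apart by file extension, as in the following Java sketch. It is a minimal illustration rather than TRIST's own logic: the class, enum and method names are hypothetical, and the .rtf extension is assumed, since only the .htm, .html and .txt extensions are named above.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of format detection by file extension.
class FormatSketch {
    enum Format { HTML, RTF, TXT }

    // Return the format for a file name, or an empty Optional when the
    // extension is not one the tool is described as supporting.
    static Optional<Format> detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".htm") || lower.endsWith(".html")) {
            return Optional.of(Format.HTML);
        }
        if (lower.endsWith(".rtf")) { // assumed extension for RTF documents
            return Optional.of(Format.RTF);
        }
        if (lower.endsWith(".txt")) {
            return Optional.of(Format.TXT);
        }
        return Optional.empty();
    }
}
```

Under this sketch a Word document such as report.doc yields an empty result, matching the statement that Word formats are not supported.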

The tool comprises three components. The first is a database, provided in Microsoft Access 2000 format. Although this is not the best database platform available, it is widely available and understood.

Secondly, there is the TRIST Jar file. This is a Java program. The Jar represents the back office of the TRIST tool, and it is with this that target documents are identified and indexed. The tool connects to the database and stores the index it creates within the database’s table structure. As a result, the database must be set up before the Jar is executed.

The third and final component is the sample interface. This collection of files can be used independently where a suitable web server is in place, or integrated with existing sites to provide a search option.

Files are selected for indexing by creating ‘Watch Folders’. These are folders of files that are defined within the TRIST tool. The contents of these folders and their subfolders are searched, and files of compatible types are added to a ‘pending list’. This is a list of files waiting to be indexed. They are indexed when the user selects the ‘Index Pending Files’ option. The tool indexes the files and also stores details about each file itself. This allows the tool to identify files that have been modified and re-index them. The tool also identifies files that have been removed and deletes them from the index.
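The watch-folder cycle described above can be sketched in Java as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the stored file details are reduced to a last-modified timestamp held in memory, and the .rtf extension is assumed. TRIST itself keeps these details in its database, and which details it compares is not specified here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a watch-folder scan; not TRIST's actual code.
class WatchFolderSketch {
    // Last-modified time (epoch millis) recorded when each file was indexed.
    final Map<Path, Long> indexed = new HashMap<>();

    // True for the extensions the tool is described as supporting (.rtf assumed).
    static boolean isCompatible(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".htm") || name.endsWith(".html")
                || name.endsWith(".rtf") || name.endsWith(".txt");
    }

    // Walk the folder and its subfolders and return the compatible files
    // that are new or have changed since they were last indexed.
    List<Path> findPending(Path watchFolder) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(watchFolder)) {
            candidates = paths.filter(Files::isRegularFile)
                    .filter(WatchFolderSketch::isCompatible)
                    .collect(Collectors.toList());
        }
        List<Path> pending = new ArrayList<>();
        for (Path file : candidates) {
            long modified = Files.getLastModifiedTime(file).toMillis();
            Long recorded = indexed.get(file);
            if (recorded == null || recorded != modified) {
                pending.add(file);
            }
        }
        return pending;
    }

    // Stand-in for 'Index Pending Files': record each file's current timestamp.
    void markIndexed(List<Path> files) throws IOException {
        for (Path file : files) {
            indexed.put(file, Files.getLastModifiedTime(file).toMillis());
        }
    }

    // Return previously indexed files that no longer exist and forget them.
    List<Path> removeMissing() {
        List<Path> missing = indexed.keySet().stream()
                .filter(file -> !Files.exists(file))
                .collect(Collectors.toList());
        indexed.keySet().removeAll(missing);
        return missing;
    }
}
```

A typical cycle under this sketch would call findPending on each watch folder, pass the result to markIndexed once the files have been processed, and call removeMissing to drop entries for deleted files.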