Microsoft Internet Information Server 4.0
INDEX SERVER

Introduction

Index Server is a full-featured content indexing and search utility that is tightly integrated with IIS to perform searches on Web sites and intranets. It is installed as a separate component of the Windows NT 4.0 Option Pack, is launched at the end of installation, and is started when the first search query is initiated. As much as possible, Index Server's functions run automatically without user intervention. Index Server runs as a Windows NT service and can be stopped and started like any other NT service from the Services applet in Control Panel. Index Server can search the following file types:
Text Files-.txt
HTML Files-.htm
MS Word Files-.doc
MS Excel Files-.xls
MS PowerPoint Files-.ppt
You can extend the range of supported file types by installing content filters, which are DLL files that allow Index Server to search other file types such as Adobe Acrobat PDF. Index Server supports multiple languages because its indexes are stored in Unicode, a 16-bit character encoding that can represent large alphabets.

The Indexing Process


Default Corpus and Scope
The corpus is the collection of documents that Index Server indexes. By default it includes all of the local and virtual directories in the default Web site.
Content Filters
When a document is indexed, Index Server applies the content filter that matches its file type (.txt, .htm, etc.) to extract its text.
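Filter selection can be sketched in Python as a table that maps file extensions to functions returning plain text. This is a minimal illustration under stated assumptions, not Index Server's filter interface: real filters are DLLs loaded by the service, and the names `FILTERS`, `register_filter` and `extract_text` are hypothetical.

```python
import html
import re
from pathlib import Path
from typing import Callable

# A "filter" here is any function that turns a file's raw bytes into plain text.
Filter = Callable[[bytes], str]


def text_filter(data: bytes) -> str:
    # Plain text: decode as UTF-8, replacing any undecodable bytes.
    return data.decode("utf-8", errors="replace")


def html_filter(data: bytes) -> str:
    # Crude HTML filter: replace tags with spaces, then unescape entities like &amp;.
    text = re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))
    return html.unescape(text)


# Hypothetical registry keyed by lower-case file extension.
FILTERS: dict[str, Filter] = {".txt": text_filter, ".htm": html_filter}


def register_filter(extension: str, flt: Filter) -> None:
    """Add a new file type, loosely mirroring an installed filter DLL."""
    FILTERS[extension.lower()] = flt


def extract_text(filename: str, data: bytes) -> str:
    """Pick the filter that matches the file's extension and run it."""
    ext = Path(filename).suffix.lower()
    flt = FILTERS.get(ext)
    if flt is None:
        raise ValueError(f"no content filter registered for {ext!r}")
    return flt(data)
```

For example, `extract_text("page.htm", b"<p>Index &amp; search</p>")` yields text whose words are `Index`, `&` and `search`. Registering a new extension with `register_filter` is the sketch's stand-in for installing an extra content filter.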
Word Breaker
Next, a language-specific word breaker identifies and separates individual words.
Normalizer
A normalizer is a clean-up tool that takes the words from the word breaker and removes capitalization, punctuation, and pluralization.
Noise Words
Noise words, such as "about", "at", and "are", that offer no useful content are filtered out.
Indexes
Words that the word breaker identifies, that pass through the normalizer, and that are not noise words are then stored in the index.
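The stages above (word breaking, normalization, noise-word removal, storage) can be sketched as a small inverted index in Python. The regular-expression word breaker, the crude English plural stripping and the short `NOISE_WORDS` set are assumptions for illustration; Index Server's own word breakers and noise-word lists are language-specific and far more careful. Python's `\w` is Unicode-aware, so the word breaker is not limited to ASCII letters.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; the real lists are per language and much longer.
NOISE_WORDS = {"a", "about", "and", "are", "at", "of", "the"}


def break_words(text: str) -> list[str]:
    # Word breaker: split the text into runs of word characters.
    return re.findall(r"\w+", text)


def normalize(word: str) -> str:
    # Normalizer: fold case and crudely strip English plurals.
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each indexed word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for raw in break_words(text):
            word = normalize(raw)
            if word not in NOISE_WORDS:  # noise words are never stored
                index[word].add(name)
    return dict(index)


idx = build_index({
    "a.txt": "Index Server catalogs files",
    "b.htm": "Queries are about files",
})
```

Here `idx["file"]` contains both documents, `idx["query"]` contains only `b.htm`, and neither "are" nor "about" appears as a key.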
Word List Indexes
A word list holds the words that Index Server extracts from a new or changed file. Word lists exist only in RAM and are re-created when Index Server is launched.
Persistent Indexes
Index Server also maintains indexes saved to disk so that indexed data survives a system shutdown or power loss.
Shadow Index
A persistent index file created by merging one or more word lists.
Master Index
The master index, also called the catalog, is created by merging all shadow indexes with the current master index. Index Server maintains one master index per corpus.
Merges
Merges combine smaller indexes into larger ones. They occur automatically when certain configuration thresholds are exceeded.
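The relationship between word lists, shadow indexes and the master index can be modeled as a hierarchy of merges. The sketch below is a hypothetical model: `Catalog`, `MAX_WORD_LISTS` and the method names are invented, nothing is actually written to disk, and deleted or changed documents are ignored. It does show why a query has to consult every level, since recently indexed words may exist only in a word list or shadow index until the next master merge.

```python
from collections import defaultdict

Postings = dict[str, set[str]]  # word -> set of document names


def merge(*indexes: Postings) -> Postings:
    """Combine several indexes into one by unioning their postings."""
    merged: dict[str, set[str]] = defaultdict(set)
    for idx in indexes:
        for word, docs in idx.items():
            merged[word] |= docs  # updates a fresh set, never an input set
    return dict(merged)


class Catalog:
    # Hypothetical threshold standing in for Index Server's merge settings.
    MAX_WORD_LISTS = 3

    def __init__(self) -> None:
        self.word_lists: list[Postings] = []      # RAM only in the real service
        self.shadow_indexes: list[Postings] = []  # persistent in the real service
        self.master: Postings = {}                # one per corpus

    def add_word_list(self, word_list: Postings) -> None:
        self.word_lists.append(word_list)
        if len(self.word_lists) > self.MAX_WORD_LISTS:
            self.shadow_merge()

    def shadow_merge(self) -> None:
        # Fold all in-memory word lists into one new shadow index.
        self.shadow_indexes.append(merge(*self.word_lists))
        self.word_lists.clear()

    def master_merge(self) -> None:
        # Combine every shadow index with the current master index.
        self.master = merge(self.master, *self.shadow_indexes)
        self.shadow_indexes.clear()

    def query(self, word: str) -> set[str]:
        # Consult every level, or recently indexed documents would be missed.
        hits = set(self.master.get(word, set()))
        for idx in self.shadow_indexes + self.word_lists:
            hits |= idx.get(word, set())
        return hits
```

Adding a fourth word list to a fresh `Catalog` triggers a shadow merge; a later `master_merge()` empties the shadow indexes, yet `query` still finds words that remain only in newer word lists.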
Index Server Queries
Queries are resolved against all of the current indexes, including word lists, shadow indexes, and the master index. Queries are initiated through a form, an interface that site visitors use to start a search. Index Server supports Web forms in standard HTML, ASP pages, and SQL forms. To view a sample form, open your browser and type the following in the address field: http://youripaddress/Iissamples/Issamples/default.htm. You can edit and customize these forms.
The IDQ File
Defines a query's parameters, including its scope (which portion of the corpus to search) and its restrictions (the conditions a document must match).
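How an .idq file's scope and restriction feed a query can be sketched by parsing an INI-style `[Query]` section. The Ci-prefixed parameter names are modeled on Index Server's sample .idq files, but treat them, the values, the template path and the `%Name%` substitution rule shown here as assumptions. `load_query` and `in_scope` are hypothetical helpers; `in_scope` illustrates the difference between a deep scope (subdirectories included) and a shallow one.

```python
import configparser

# Hypothetical .idq contents; values and the template path are made up.
IDQ_TEXT = """
[Query]
CiColumns=filename,size,vpath,DocTitle
CiScope=/docs
CiFlags=DEEP
CiRestriction=%CiRestriction%
CiTemplate=/search/results.htx
"""


def substitute(value: str, fields: dict[str, str]) -> str:
    # Replace each %Name% placeholder with the matching form field value.
    for name, field_value in fields.items():
        value = value.replace(f"%{name}%", field_value)
    return value


def load_query(idq_text: str, form_fields: dict[str, str]) -> dict[str, str]:
    """Read the [Query] section and fill in values posted by the search form."""
    parser = configparser.ConfigParser(interpolation=None)  # keep % literal
    parser.optionxform = str  # preserve the CiXxx capitalization
    parser.read_string(idq_text)
    return {key: substitute(value, form_fields) for key, value in parser["Query"].items()}


def in_scope(vpath: str, scope: str, flags: str) -> bool:
    """DEEP includes subdirectories; SHALLOW only the scope directory itself."""
    scope = scope.rstrip("/") + "/"
    if not vpath.startswith(scope):
        return False
    return flags.upper() == "DEEP" or "/" not in vpath[len(scope):]


params = load_query(IDQ_TEXT, {"CiRestriction": "index server"})
```

With these values, `params["CiRestriction"]` becomes `"index server"`, and `/docs/sub/b.htm` is in scope only because the flags say `DEEP`.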
The HTX File
An HTML extension (.htx) template used to format query results as an HTML page.
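Result formatting can be illustrated with a tiny renderer that repeats a detail section once per matching document. The `<%...%>` markers, including `begindetail` and `enddetail`, and the variable names are modeled loosely on .htx syntax; this is a simplified, hypothetical imitation, not Index Server's template engine.

```python
import html
import re

# Simplified, hypothetical template modeled loosely on .htx syntax.
TEMPLATE = """<p><%CiMatchedRecordCount%> documents matched.</p>
<ul>
<%begindetail%><li><a href="<%vpath%>"><%DocTitle%></a></li>
<%enddetail%></ul>"""


def fill(text: str, values: dict[str, str]) -> str:
    # Replace each <%name%> with its HTML-escaped value (unknown names become "").
    return re.sub(r"<%(\w+)%>", lambda m: html.escape(values.get(m.group(1), "")), text)


def render(template: str, summary: dict[str, str], rows: list[dict[str, str]]) -> str:
    """Fill the header and footer once and the detail section once per row."""
    head, rest = template.split("<%begindetail%>", 1)
    detail, tail = rest.split("<%enddetail%>", 1)
    body = "".join(fill(detail, row) for row in rows)
    return fill(head, summary) + body + fill(tail, summary)


out = render(
    TEMPLATE,
    {"CiMatchedRecordCount": "2"},
    [
        {"vpath": "/docs/a.htm", "DocTitle": "Setup & Tuning"},
        {"vpath": "/docs/b.htm", "DocTitle": "Queries"},
    ],
)
```

Escaping every substituted value keeps document titles such as "Setup & Tuning" from breaking the generated HTML.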