Advanced Modern Campus CMS Website Search
When using Modern Campus CMS Website Search, users can perform an advanced search to better find the information they are searching for. While each implementation of Website Search may look slightly different, the options present in an advanced search remain the same.
Applying Search Logic
Texis and Metamorph, the drivers behind the search engine, use set logic for text queries. Set logic is easier to use and more capable than Boolean logic. The examples below refer to single keywords, but keep in mind that each keyword can represent an entire list of things or any of the special pattern matchers.
Sets (or lists) of things are specified by placing the elements within parentheses, separated by commas. Example: (bob,joe,sam,sue). In the examples below, a list like this could replace any of the keywords.
The default behavior of the search is to locate an intersection (or 'AND') of every element within a query. This means that the query "microsoft bob interface" is equivalent to the Boolean query "microsoft AND bob AND interface".
- - (without): The - (minus) is the most commonly used logic symbol. It means the answer should EXCLUDE references to that item.
- + (mandatory): The + (plus) symbol in front of a search item means that the answer MUST INCLUDE that item. This is generally used in conjunction with the permutation operation.
- @N (permute): The @ followed by a number indicates how many intersections to locate of the terms in your query. This may be confusing at first, but it is very powerful.
Query | Finds |
---|---|
bob sam joe | Bob with Sam and Joe |
bob sam -joe | Bob with Sam without Joe |
bob sam joe @1 | Bob with Sam, or Bob with Joe, or Joe with Sam |
A B C D @1 | AB or AC or AD or BC or BD or CD |
+A B C D @1 | ABC or ABD or ACD |
A B C -D @1 | ( AB or AC or BC ) without D |
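The set logic in the table above can be sketched in Python. This is only an illustrative model, not the Metamorph implementation: the function names `parse_query` and `matches` are hypothetical, each document is reduced to a plain set of terms, and `@N` is assumed to require at least N + 1 of the unmarked terms in addition to every `+` term.

```python
import re

def parse_query(query):
    """Split a query into required, optional, and excluded terms plus @N."""
    required, optional, excluded, intersections = [], [], [], None
    for token in query.lower().split():
        if re.fullmatch(r"@\d+", token):
            intersections = int(token[1:])
        elif token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, optional, excluded, intersections

def matches(query, document_terms):
    """Return True if the set of terms in a document satisfies the query."""
    required, optional, excluded, intersections = parse_query(query)
    present = {term.lower() for term in document_terms}
    if any(term in present for term in excluded):
        return False  # a "-" term appears in the document
    if not all(term in present for term in required):
        return False  # a "+" term is missing
    found = sum(1 for term in optional if term in present)
    if intersections is None:
        # Default behavior: every unmarked term must be present (AND).
        return found == len(optional)
    # Assumption: @N means N intersections, that is, N + 1 unmarked terms.
    return found >= min(intersections + 1, len(optional))
```

Under these assumptions, `matches("+A B C D @1", {"a", "b", "d"})` is true because A is present along with two of the remaining terms, while a document containing only B, C, and D is rejected.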
Adjusting Linguistic Controls
Concept sets can be edited to include special vocabulary, acronyms, and slang. The built-in vocabulary is sufficient to make good use of the program immediately upon installation, so editing is not required. However, such customization is encouraged to keep searches aligned with users' needs, especially as search routines and vocabulary evolve.
A word need not be "known" by Metamorph for it to be processed. Having associations for a word stored in the thesaurus makes concept abstraction possible, but is not required to match word forms. Word-stemming knowledge is built in, and any string of characters can be matched exactly as entered.
You can edit the special word lists Metamorph uses to process English. Because it may not be immediately apparent how much these word lists affect general searching, edit them sparingly and with experience. Even so, what Metamorph deems to be Noise, Prefixes, and Suffixes is all under user control.
Controlling Proximity
The Proximity control is a feature in the Advanced Search screen of Modern Campus CMS Website Search.
Mastering proximity lets you locate answers with greater precision. The Parametric Search Appliance input form gives you several options to control the search proximity:
- Line: All query terms must occur on the same line
- Sentence: Query items should all reside within the same sentence
- Paragraph: Items must reside within the same paragraph or text block
- Page (default): All items must occur within the same HTML document
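The effect of the proximity options above can be sketched as follows. This is a rough model only: the delimiter patterns in `SPLITTERS` and the function name `within_proximity` are assumptions made for illustration, and the search engine's own definitions of a line, sentence, or paragraph may differ.

```python
import re

# Hypothetical delimiters for each proximity option.
SPLITTERS = {
    "line": r"\n",
    "sentence": r"(?<=[.!?])\s+",
    "paragraph": r"\n\s*\n",
    "page": None,  # the whole document is treated as one unit
}

def within_proximity(text, terms, proximity="page"):
    """Return True if every term occurs inside a single unit of text."""
    pattern = SPLITTERS[proximity]
    units = [text] if pattern is None else re.split(pattern, text)
    wanted = [term.lower() for term in terms]
    for unit in units:
        words = set(re.findall(r"\w+", unit.lower()))
        if all(term in words for term in wanted):
            return True
    return False
```

With a narrower setting such as "sentence", two terms that appear in adjacent sentences no longer count as a match, which is how proximity trades recall for precision.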
A bar graph is displayed whenever a ranking search is performed (for example, all searches except Show Parents).
Keywords, Phrases, and Wildcards
Letter case is ignored when matching queries.
The wild-card character * (asterisk) may be used to match just the prefix of a word or to ignore the middle of something.
If the desired item is more complicated than the simple * wild-card can accomplish, try using the regular expression matcher.
To locate a number of adjacent words in a specific order, surround them with quotation marks ("). Putting a hyphen (-) between words also forces order and one-word proximity.
Query | Locates |
---|---|
john | john, John |
"john public" | John Public |
web-browser | Web browser, web-browser |
John*Public | John Q. Public, John Public |
456*a*def | 1-456-789-ABCDEF |
activate | activate, activation, activated, ... * |
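As a minimal sketch of how the wildcard rows in the table behave, the following Python translates a `*` term into a case-insensitive regular expression. The helper names are hypothetical, and the assumption that `*` stands for any run of characters (including none) is a simplification of the real matcher.

```python
import re

def wildcard_to_regex(term):
    """Translate a * wildcard term into a case-insensitive regex."""
    parts = [re.escape(part) for part in term.split("*")]
    # Assumption: * may stand for any run of characters, including none.
    return re.compile(".*".join(parts), re.IGNORECASE)

def wildcard_search(term, text):
    """Return True if the wildcard term is found anywhere in the text."""
    return wildcard_to_regex(term).search(text) is not None
```

For example, `wildcard_search("456*a*def", "1-456-789-ABCDEF")` succeeds because letter case is ignored and each `*` skips over the characters in between.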
Locating Misspellings, Typos, and Approximations
The approximate pattern matcher lets users find results that are similar to their search query. To invoke an approximate match, precede the word or pattern with the percent sign (%).
This pattern matcher locates items by examining how closely the text matches the query item. It uses percentage of similarity as a measure, and by default finds items that are at least 80 percent similar. Users may specify a different percentage on the query line by following the "%" with a two-digit number that represents the desired value.
This method:
- Handles character transpositions and omissions.
- Can be used on non-word items like addresses.
- Will match foreign language constructions.
- Finds accidental OCR errors or character insertions.
Expression | Will Find |
---|---|
william %shakespeare | William Shekespeere, William Shake~eare, William 5hakespeare |
%75MYPARTNO9045d/6a | Anything within 75% of looking like MYPARTNO9045d/6a |
ki* %lear | king leer, ki lear, etc... |
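The threshold idea can be illustrated with Python's standard `difflib` module. This is a stand-in, not the algorithm the approximate pattern matcher actually uses: `SequenceMatcher.ratio()` is simply one readily available similarity measure, and the function names below are hypothetical.

```python
import re
from difflib import SequenceMatcher

def parse_approx_term(term, default_percent=80):
    """Split a term like '%75word' into (threshold, pattern)."""
    m = re.fullmatch(r"%(\d{2})?(.+)", term)
    if m is None:
        raise ValueError(f"not an approximate term: {term!r}")
    percent = int(m.group(1)) if m.group(1) else default_percent
    return percent / 100.0, m.group(2)

def approx_match(term, candidate):
    """Return True if the candidate is similar enough to the pattern."""
    threshold, pattern = parse_approx_term(term)
    # Stand-in similarity measure; the real matcher may score differently.
    ratio = SequenceMatcher(None, pattern.lower(), candidate.lower()).ratio()
    return ratio >= threshold
```

With this measure, "Shekespeere" passes the default 80 percent threshold for `%shakespeare` but fails a stricter `%90shakespeare`.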
Numeric Pattern Matcher
The numeric pattern matcher allows users to find quantities in textual information in any way they may be represented.
To invoke a numeric value search within a query, precede the value with a hash symbol (#).
To Find | Syntax | Example Match |
---|---|---|
any value | ## | a few dozen |
equal to | #5000 | five thousand |
greater than | #>5000 | 2.2 million |
less than | #<5000 | 1,000.00 |
greater than or equal | #>=5000 | 5,000.01 |
less than or equal | #<=5000 | four thousand nine hundred and 59/100 |
between | #>5000<6000 | 5.5 kilotons |
Expression | Will Find |
---|---|
#>0<1 | 15 percent, 500 milligrams, 0.25, 15 sixteenths, 5/32 |
#665 | six hundred three score and five |
#>1e6<1e12 | five gigabytes, 5,000,000,000, 2.2 million |
Note: The expression "greater than 0, less than one" above is useful for finding statistical information in the text. For example, if a user enters the query "votes #>0<1", the program will find "One third of the voters cast their ballots for the incumbent."
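The comparison syntax in the tables above can be modeled as a small parser that turns a `#` term into a test on a numeric value. This sketch covers only the comparison step: recognizing quantities written as words (such as "five thousand") is the hard part of the real matcher and is not attempted here. The name `numeric_predicate` is hypothetical, and accepting `<=` as the upper bound of a range is an assumption.

```python
import re

_NUM = r"[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
_TERM = re.compile(rf"#(>=|<=|>|<)?({_NUM})(?:(<=|<)({_NUM}))?")

def numeric_predicate(term):
    """Turn a term such as '#>5000<6000' into a function of one number."""
    if term == "##":
        return lambda value: True  # any value
    m = _TERM.fullmatch(term)
    if m is None:
        raise ValueError(f"unsupported numeric term: {term!r}")
    op, first, op2, second = m.groups()
    low = float(first)
    lower_checks = {
        None: lambda v: v == low,
        ">": lambda v: v > low,
        ">=": lambda v: v >= low,
        "<": lambda v: v < low,
        "<=": lambda v: v <= low,
    }
    check_low = lower_checks[op]
    if op2 is None:
        return check_low
    high = float(second)
    check_high = (lambda v: v < high) if op2 == "<" else (lambda v: v <= high)
    return lambda v: check_low(v) and check_high(v)
```

For instance, `numeric_predicate("#>0<1")(1 / 3)` is true, which mirrors how "One third" satisfies the "votes #>0<1" query once it has been read as a number.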
Query Protection
The following apicp settings alter the set of query syntax and features that are allowed. Metamorph has a powerful search syntax, but poorly constructed queries, whether entered improperly or inadvertently, can take a long time to resolve. In a high-load environment such as a web search engine, this can bog down a server, slowing all users for the sake of one bad search.
Therefore, Vortex is by default highly restrictive of the queries it will allow, denying some specialized features for the sake of quicker resolution of all queries. By altering these settings, script authors can "open up" Texis and Metamorph to allow more powerful searches, at the risk of higher load for special searches.
- alequivs (boolean, off by default): If on, allows equivalences in queries. If off, only the actual terms in a query will be searched for; no equivalences. This is regardless of ~ usage or the setting of keepeqvs. Note that the equivalence file will still be used to check for phrases in the query, however. Turning this on allows greater search flexibility, as equivalent words to a term can be searched for, but decreases search speed. Note: In tsql version 5 and earlier the default was on.
- alintersects (boolean, off by default): If on, allow use of the @ (intersections) operator in queries. Queries with few or no intersections (for example @0) may be slower, as they can generate a copious number of hits. Note: In tsql version 5 and earlier the default was on.
- allinear (boolean, off by default): If on, an all-linear query (one without any indexable "anchor" words) is allowed. A query like "/money #million" where all the terms use unindexable pattern matchers (REX, NPM or XPM) is an example. Such a query requires that the entire table be linearly searched, which can be very slow for a table of significant size. Note: In tsql version 5 and earlier the default was on.
If allinear is off, all queries must have at least one term that can be resolved with the Metamorph index, and a Metamorph index must exist on the field. Under such circumstances, other unindexable terms in the query can generally be resolved quickly, if the "anchor" term limits the linear search to a tiny fraction of the table. The error message "Query would require linear search" may be generated by linear queries if allinear is off.
Note that an otherwise indexable query like "rocket" may become linear if there is no Metamorph index on its field, or if an index for another part of the SQL query is favored instead by Texis. For example, with the SQL query "select Title from Books where Date > 'May 1998' and Title like 'gardening'" Texis may use a Date index rather than a Title Metamorph index for speed. In such a case it may be necessary to enable linear processing for a complicated query to proceed, since part of the table is being linearly searched.
- alnot (boolean, on by default): If on, allows "NOT" logic (for example the - operator) in a query.
- alpostproc (boolean, off by default): If on, post-processing of queries is allowed when needed after an index lookup, for example to resolve unindexable terms like REX expressions, or like queries with a non-inverted Metamorph index. If off, some queries are faster, but may not be as accurate if they aren't completely resolved. The error message "Query would require post-processing" may be generated by such queries if alpostproc is off. Note: In tsql version 5 and earlier the default was on.
- alwild (boolean, on by default): If on, wildcards are allowed in queries. Wildcards can slow searches because potentially many words must be looked for.
- alwithin (boolean, off by default): If on, "within" operators (w/) are allowed. These generally require a post-process to resolve, and hence can slow searches. If off, the error message "'delimiters' not allowed in query" will be generated if the within operator is used in a query. Note: In tsql version 5 and earlier the default was on.
- builtindefaults: Restore all settings to builtin Thunderstone factory defaults, ignoring any texis.ini [Apicp] changes. Added in Texis version 6.
- defaults: Restore all settings to defaults set in the texis.ini [Apicp] section (or builtin defaults for settings not set there).
- denymode (string or integer; warning by default): What action to take when a disallowed query is attempted:
  - silent or 0: Silently remove the offending set or operation.
  - warning or 1: Remove the term and warn about it with a putmsg-catchable message.
  - error or 2: Fail the query.
A message such as "'delimiters' not allowed in query" may be generated when a disallowed query is attempted and denymode is not silent.
- qmaxsets (integer, 100 by default): The maximum number of sets (terms) allowed in a query. Note: also settable as qmaxterms for back-compatibility with earlier versions.
- qmaxsetwords (integer, 500 by default, unlimited by default in tsql): The maximum number of search words allowed per set (term), after equivalence and wildcard expansion. Some wildcard searches can potentially match thousands of distinct words in an index, many of which may be garbage or typos but still have to be looked up, slowing a query. If this limit is exceeded, a message such as "Max words per set exceeded at word 'xyz*' in query 'xyz* abc'" is generated, and the entire set is considered a noise word and not looked up in the index. A value of 0 means unlimited.
The set may only be partially dropped (with the message "Partially dropping term 'xyz*' in query 'xyz* abc'") depending on the setting of dropwordmode (which must be set with a SQL set statement). If dropwordmode is 0 (the default), the root word, valid suffixes, and more-common words are still searched, up to the qmaxsetwords limit if possible; the remaining wildcard matches are dropped. If dropwordmode is 1, the entire set is dropped as if a noise word.
Note that qmaxsetwords is the max number of search words, not the number of matching hits after the search. Thus a single but often-occurring word like "html" counts as one word in this context. Note: In tsql version 5 and earlier the default was unlimited.
- qmaxwords (integer, 1100 by default): The maximum number of words allowed in the entire query, after equivalence and wildcard expansion. If this limit is exceeded, a message such as "Max words per query exceeded at word 'xyz*' in query 'xyz* abc'" is generated, and the query cannot be resolved. 0 means unlimited. Like qmaxsetwords, this is distinct search words, not hits. dropwordmode also applies here. Note: In tsql version 5 and earlier the default was unlimited.
- qminprelen (integer, 2 by default): The minimum allowed length of the prefix (non-* part) of a wildcard term. Short prefixes (for example "a*") may match many words and thus slow the search. Note: In tsql version 5 and earlier the default was 1.
- qminwordlen (integer, 2 by default): The minimum allowed length of a word in a query. Note that this is different from minwordlen, the minimum word length for prefix/suffix processing to occur. Note: In tsql version 5 and earlier the default was 1.
- querysettings (string or integer): Container for changing all or a group of settings to a certain mode. (Explicit texis.ini [Apicp] settings still apply, as with all non-builtin "...defaults" settings). The argument may be one of the following:
  - defaults or 0: Set Vortex defaults; same as <apicp defaults>.
  - texis5defaults or 1: Set defaults for Texis (for example tsql, not Vortex) version 5 and earlier.
    - Some of these defaults are in common with Texis 6 and later:
      - alprefixproc, keepnoise, keepeqvs are off
      - alwild, alnot are on
      - minwordlen 255
      - sdexp/edexp are empty
      - eqprefix set to "builtin"
      - ueqprefix set to "eqvsusr"
      - denymode is "warning"
      - qmaxsets is 100
    - The rest are different from Texis 6 and later:
      - alpostproc, allinear, alwithin, alintersects, alequivs, alexactphrase are on (instead of off in version 6)
      - qminwordlen, qminprelen are 1 (instead of 2 in version 6)
      - qmaxsetwords is unlimited (instead of 500 in version 6)
      - qmaxwords is unlimited (instead of 1100 in version 6)
  - vortexdefaults or 2: Set Vortex defaults; same as <apicp defaults>.
  - protectionoff or 3: Turn off query protection settings, for example set all al... settings on (allowed), exactphrase on, qmin... limits to minimums, qmax... limits to maximum (unlimited), denymode to warning.
Added in Texis version 6.
- texisdefaults: Restore Texis (as opposed to Vortex) version 5 and earlier default values. Note: This setting is deprecated in Texis version 6 and later (as Texis defaults have changed to match Vortex defaults for consistency), and may be removed in a future release. Set querysettings texis5defaults instead. The texisdefaults setting is still respected, but will cause a warning noting that it is deprecated. If legacy scripts cannot be updated to use querysettings texis5defaults instead, this warning can be silenced with the texis.ini setting [Texis] Texis Defaults Warning = off
Setting texisdefaults turns off query protection, for example it will enable linear searches, post-processing, within operators, etc. Note: this will permit some queries to run that can potentially take an inordinate amount of time, even with a Metamorph index. Use with caution.
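To make the interaction between the allow settings and denymode concrete, here is a simplified Python model. It is not the Texis implementation: the default values are taken from the list above, but the function names, the `QueryDenied` exception, the way terms are recognized by their first characters, and the wording of the messages are all hypothetical.

```python
# Defaults as described in the settings list (Texis version 6 and later).
DEFAULTS = {
    "alintersects": False,
    "alnot": True,
    "alwild": True,
    "alwithin": False,
    "denymode": "warning",
    "qminprelen": 2,
}

class QueryDenied(Exception):
    """Raised when denymode is 'error' and a term is disallowed."""

def disallowed_reason(term, settings):
    """Return why a term is disallowed, or None if it is allowed."""
    if term.startswith("@") and not settings["alintersects"]:
        return "intersections (@) not allowed"
    if term.startswith("-") and not settings["alnot"]:
        return "NOT logic (-) not allowed"
    if term.startswith("w/") and not settings["alwithin"]:
        return "within operator (w/) not allowed"
    if "*" in term:
        if not settings["alwild"]:
            return "wildcards not allowed"
        prefix = term.split("*", 1)[0]
        if len(prefix) < settings["qminprelen"]:
            return "wildcard prefix too short"
    return None

def apply_protection(query, settings=None):
    """Return (kept_terms, warnings) after applying the protection rules."""
    settings = {**DEFAULTS, **(settings or {})}
    kept, warnings = [], []
    for term in query.split():
        reason = disallowed_reason(term, settings)
        if reason is None:
            kept.append(term)
        elif settings["denymode"] in ("error", 2):
            raise QueryDenied(f"{reason}: {term!r}")
        elif settings["denymode"] in ("warning", 1):
            warnings.append(f"{reason}: {term!r}")
        # "silent" (0): drop the term without a message
    return kept, warnings
```

Under the defaults, a query containing `@1` loses that term with a warning; passing `{"alintersects": True}` keeps it, and `{"denymode": "error"}` fails the whole query instead.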
Ranking Factors
Ranking Factors are parameters that let users re-rank search results. Depending on these settings, certain search results will be given higher priority than others based upon the parameters selected.
Each ranking factor can be given a certain amount of Importance, which is selected from the drop-down menu next to each factor, ranging from "off" to "max." The higher the importance, the higher rank in the results Website Search will give to pages that have "high scores" in that parameter.
The Ranking Factors available are as follows:
- Word Ordering: Whether the words in the document are in the same order as they are in the search field. If set at "max," documents with the search terms in the same order as they appear in the search field have a higher rank.
- Word Proximity: How close the words in the search field are to each other in the document. If set at "max," documents that have all the words in the search field grouped closely together have a higher rank.
- Database Frequency: How frequently the search terms appear in the database table (usually the database table consists of all the documents being searched in the site). If a search term appears more frequently in the database, its rank decreases, because it is likely a more common term that won't lead users to the document they want. If set at "max," terms with a high amount of database frequency see a large reduction in their ranking in the search results.
- Document Frequency: How frequently the search terms appear in the document. If set at "max," pages with the search terms appearing more frequently have a higher rank.
- Position in Text: How close to the beginning of the document the search terms are found. If set at "max," pages with the search terms at the beginning of the document have a higher rank.
- Depth in Site: How "deep" the document is in the site file structure. If set at "max," documents at or near the root of the site are given priority.
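One way to picture how Importance settings shift the ordering of results is a weighted average of per-factor scores. The formula below is purely an assumption for illustration, since the actual ranking computation is not described here; the function name, the factor keys, and the 0.0 ("off") to 1.0 ("max") weight scale are all hypothetical.

```python
def rank_score(factor_scores, importance):
    """Weighted average of per-factor scores, each assumed to be in [0, 1].

    importance maps a factor name to a weight from 0.0 ("off") to
    1.0 ("max"); factors missing from the mapping are treated as off.
    """
    total_weight = sum(importance.get(name, 0.0) for name in factor_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * importance.get(name, 0.0)
                   for name, score in factor_scores.items())
    return weighted / total_weight

# Hypothetical per-document scores for two of the factors.
doc_a = {"word_proximity": 0.9, "depth_in_site": 0.2}
doc_b = {"word_proximity": 0.3, "depth_in_site": 0.9}

favor_proximity = {"word_proximity": 1.0, "depth_in_site": 0.0}
favor_depth = {"word_proximity": 0.0, "depth_in_site": 1.0}
```

In this model `doc_a` outranks `doc_b` when Word Proximity is set to max and Depth in Site is off, and the order reverses when the two settings are swapped.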
Regular Expression Pattern Matcher
The regular expression pattern matcher allows you to find those items that cannot be located with a simple wildcard search.
To invoke the regular expression pattern matcher within a query, precede the expression with a forward slash (/).
Expression | Purpose |
---|---|
/19[789][0-9] | Find years between 1970 and 1999 |
/[1-9][01][0-9]-?[0-9]{3}-=[0-9]{4} | Any USA Phone Number |
/\n=\space+Shakespeare | Find 'Shakespeare' as the first word |
/\upper=\lower+\space\upper=\lower+ | Proper Names without initials |
/(abc904) | finds '(abc904)' anywhere it exists |
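The expressions in the table use the search engine's own regular expression syntax, which differs from the syntax of most programming languages. For readers more familiar with Python, the sketch below gives approximate equivalents of two rows using the standard `re` module. The translation assumes that `=` means exactly one occurrence and that `\upper`, `\lower`, and `\space` are character classes for uppercase letters, lowercase letters, and whitespace; treat it as an aid to reading the table rather than a definitive mapping.

```python
import re

# /19[789][0-9] reads the same way in Python.
YEARS = re.compile(r"19[789][0-9]")

# /\upper=\lower+\space\upper=\lower+ under the assumptions stated above.
PROPER_NAME = re.compile(r"[A-Z][a-z]+\s[A-Z][a-z]+")

def find_years(text):
    """Return every year from 1970 to 1999 found in the text."""
    return YEARS.findall(text)

def find_proper_names(text):
    """Return capitalized two-word names without initials."""
    return PROPER_NAME.findall(text)
```

Note that a leading `/` is only the query-line marker that invokes the pattern matcher, so it is not part of the Python pattern.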
Thesaurus Expansion
The Parametric Search Appliance has a vocabulary of over 250,000 word and phrase associations. Each entry is generally classifiable by either its meaning or part of speech.
Depending on the administrator's Synonyms setting for this profile, synonyms may already be included for each term in your query. If not, synonyms may be included for individual terms within your query by preceding them with a tilde (~).
Word Forms
The Word Forms options are found in the Advanced Search screen of Website Search. They give you control over how many variations of your query terms will be sought in your search. There are four Word Form options available:
- Exact match: Only exact matches will be allowed (default).
- Plurals & possessives: Plural and possessive forms will be found (s, es, 's).
- Any word forms: As many word forms as can be derived will be located.
- Custom: Uses the Custom Suffix List, Custom Suffix Default Removal, and Custom Suffix Min Length settings to create your own custom behavior.
Word | "Exact" Option | "Plural" Option | "Any" Option |
---|---|---|---|
president | president | president + presidents, president's | president + presidents, president's + presidential, presidency, preside, presides, presiding, presided |
tight | tight | tight + tights | tight + tights + tightly, tightening, tightened, tighter, tightest |
program | program | program + programs, program's | program + programs, program's + programming, programmatic, programmed, programmer, programmable |
This process is called morpheme processing, and it is generally smarter than a traditional "stemming" algorithm. It checks to see if a word could be a valid form of the search term.
Notes: Thesaurus terms are also treated in the same manner. Words shorter than five characters aren't morpheme processed.
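The "Plurals & possessives" option can be approximated with a simple suffix check, shown below. This is a much cruder test than the morpheme processing described above: the function name is hypothetical, only the three suffixes listed for that option are considered, and applying the five-character minimum to this option is an assumption based on the note.

```python
SUFFIXES = ("'s", "es", "s")

def plural_possessive_match(query_word, text_word, min_length=5):
    """Return True if text_word is query_word or a plural/possessive of it."""
    query_word, text_word = query_word.lower(), text_word.lower()
    if query_word == text_word:
        return True
    if len(query_word) < min_length:
        return False  # short words are matched exactly only (assumption)
    return any(text_word == query_word + suffix for suffix in SUFFIXES)
```

So "presidents" and "president's" match a query for "president", while "presidential" does not and would require the "Any word forms" option.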