The commercial advantages of
Westlight’s approach to keywording
Ever since photo agencies began publishing images on CD-ROM, Westlight has been
one of the agencies most intensely involved in keywording. We have invested more
than $1 million in keywording since 1992.
Our efforts have always focused on maximizing usability for the customer. We
recognized early on that in a highly competitive market, just having great images
wouldn’t be enough if clients couldn’t find them easily. Having too few keywords would
unduly limit the client’s options and neglect many sales opportunities, whereas having
too many keywords would confuse the client and imply a lack of discipline. Efficient
searching would be as important as content and, for clients overloaded with choices,
might become the deciding factor.
Westlight was well prepared to address the issues associated with keywording (and with
digital imaging in general) because we had been using computers longer than most
other agencies. By 1992 we already had nearly a decade’s worth of information about
our business—representing tens of thousands of transactions—stored in our computer
network. Armed with this information and the insight of an experienced sales staff, we
did not have to theorize about what our clients might want, but could make practical,
informed decisions based on facts gathered on the live battlefield of day-to-day
business.
Through analysis of our sales data and discussions with our salespeople, we were able
to identify not only which kinds of images sold well, but the ways clients asked for them
and the criteria that led to their selection. Westlight owner Craig Aurness came up with
the idea of translating these criteria into searchable database fields that clients could
use to simulate their live interaction with Westlight’s researchers. The resulting search
“language,” marketed under the name QUESTock, was an immediate hit with clients.
QUESTock allows clients to search for images based on their layout requirements in
addition to content. Included among the search options are graphic criteria such as
colors, lighting, and camera angles, each placed in a separate database field. This
fragmented database structure was designed to be flexible and dynamic because we
knew the number of images would grow, increasing the search possibilities.
Most other image browsers hide the keywords, so users must use the keyboard to type
them in, essentially learning the vocabulary by trial and error. Our proprietary interface
removes the mystery, making the search options visible so users can use the mouse to
click on words.
Our language structure enables us to specify shades of meaning that would be
impossible to express concisely if only conventional keywords were used. For example,
clients can find “images of people in which you can’t tell their ethnicity,” or “images that
weren’t shot in Alaska but look as if they could have been,” or “images that contain both
trees and water, but in which the trees are dominant.” Some of these searches are
© 1998 Westlight Page 1 of 18
11/12/2016
accomplished through the use of special fields, others by spelling the keyword in a
special way.
Each new edition of QUESTock is thoroughly field-tested through the release of beta
versions. More recently we have begun analyzing the various image search engines
that have appeared on the World Wide Web. We have learned from our and others’
mistakes and continue to adapt our product to current market demands.
Our database structure was so carefully planned from the beginning that its basic
appearance has remained virtually unchanged since our first disc. We have continued
to make additions to the search options, but have not had to remove any options. Our
systems are streamlined to the point that the speed at which we can keyword new
images has nearly doubled since we started. Thus, our initial investment has proved to
be highly cost-effective.
Craig and Daphne Aurness continue to assume the central role of overseeing
QUESTock’s evolution, maintaining its dominant position as the most innovative
commercial image search system ever devised.
Behind the scenes, writer-editor Richard Carson has been involved in all aspects of
keywording labor. Richard, whose relationship with Westlight dates back to 1985,
conceives and maintains most of the in-house systems we use to enter and process
keywords. These systems make use of multiple databases and programs written in
Microsoft FoxPro. This use of a full-strength database program, outside the environment
of the image browser, is the key to our success in maintaining keyword quality. Richard
continues to do most of our keywording but has introduced quality control measures into
our programs, so that other workers can enter keywords without deviating from our
stylistic standards.
Summary of keywording issues
This book shows how we have addressed the following issues related to keywording.
While they are not all equally important, at least some of them should sound familiar to
anyone who has been involved in a keywording project.
• Quickly generating a few basic keywords for in-house use, before sending the
images on to the keyworders.
• Getting data from similar images that have already been keyworded, both to avoid
duplication of effort and to maintain consistency.
• Calling up a list of suggested words in response to a common photo subject.
• Automatically generating words—both mandatory and suggested—based on data
the image already has.
• Double-checking keywords against related information in other database fields.
• Maintaining consistency in spelling.
• Changing the appearance of the keywords for use in projects having different
editorial standards.
• Identifying synonym pairs so the system will respond to either keyword in the same
way.
• Translating the keywords for distribution to overseas markets.
• Controlling the placement of geographical information so that, for example, a park is
not identified as a city.
• Identifying keyword subsets so that only certain words are exported, as when a
single keyword field is to be broken up into multiple fields.
• Placing keywords that have never been used before in a temporary file so they can
be reviewed by an editor.
• Controlling the flow of keywords as they pass through work stages requiring the use
of different programs.
• Using statistical output to determine how often one word appears with another and
to identify cases of over- or under-usage.
• Placing words in linguistic categories so they can be sorted in various ways on
reports.
Preliminary steps before keywording
As mentioned earlier, Westlight already had been using computers for many years
before stock agencies began scanning and keywording images. Therefore, our images
already were passing through work stages that involved data entry. The only difference
is that the data entry involved information exclusively for our use and not directly
accessible to clients. For example, we used computers to trace an image’s duping
status and other facts related to the handling of the physical transparency.
When we introduced keywording, we created a new department in our company for it.
We did not allow this new work step to intrude on the existing data entry tasks to which
the library workers had become accustomed. Transparencies make their way into our
library as fast as they always have. Indeed, a new image usually is in our files and
accessible to clients some weeks before it has been fully keyworded.
We found that our library workers had all along been entering certain information about
images that suggested keywords. For example, they were entering the photographer’s
name, the name of the physical file in which they planned to put the transparency, and
the image’s orientation. We found ways of pulling this information so that images that
have not yet been formally keyworded can be combined in the in-house browser with
those that have. Thus, our researchers immediately have some way of finding the
image electronically while they wait for the keyworder to provide more detailed
information.
One of the very first facts to be entered about an image is the name of the physical file
into which we intend to place the transparency (our library is broken down into more
than 2,200 named files). We use this name to generate as many keywords as we can
determine from it. For example, if the image is being filed under “Monument Valley,” we
know that in addition to this term, it also can take the keyword “deserts.” These
keywords are temporary and will be overwritten by whatever data the keyworder later
provides.
The database that contains the above information is separate from the one that contains
finished keywords. When we output image data for in-house use by our researchers, the
output routine looks in both databases for the image. If it finds the image in the keyword
database, it generates finished data; if it doesn’t find the image there, it gets
rudimentary keywords from the other database.
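The two-database lookup described above can be sketched as follows. This is only an illustration in Python, not the FoxPro routine itself; the image numbers, the dictionary names, and the extra keywords are hypothetical.

```python
# Hypothetical stand-ins for the two databases described above.
# File-subject database: image number -> name of the physical file.
filing_db = {"W1001": "Monument Valley", "W1002": "Monument Valley"}

# Keywords that can be inferred from a file name alone (illustrative data).
file_keywords = {"Monument Valley": ["Monument Valley", "deserts"]}

# Keyword database: image number -> finished keywords from the keyworder.
keyword_db = {"W1001": ["Monument Valley", "deserts", "buttes", "sunsets"]}


def keywords_for_output(image_id):
    """Return finished keywords if they exist, else rudimentary ones."""
    if image_id in keyword_db:
        return keyword_db[image_id]
    # Not keyworded yet: fall back to whatever the file name implies.
    file_name = filing_db.get(image_id)
    return file_keywords.get(file_name, [])
```

Because the fallback happens at output time, nothing has to be cleaned up once the keyworder finishes: the finished record simply takes precedence.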
The workers who enter the file subjects also have certain limited keywording options.
They can choose a few commonly used words from two lists of major categories and
subcategories. They also can enter the geographical location. At this point, the level of
keyword quality need not be high. The person formally in charge of keywording the
image (usually Richard Carson) will determine what keywords it should have, and these
initial keywords will not be used again.
Quick keywording of similars
As the number of digital images grows, the chances increase that a new image will
closely resemble an older one. Similar images need to have similar keywords. Clients
will not trust a search system that reveals great differences in the keywords of obviously
similar images; they want to know that a single search will find them all. At the same
time, we don’t want to have to go to great pains to compare all such images to be sure
we are giving them the same keywords.
Our keyworders don’t have to track down and study the data for an older image every
time they encounter a new image that is similar. All they need to know is the number of
the older image.
If the new image is a close similar of one that has been keyworded before, the
keyworder can simply enter the number of the older image that it resembles. The
program displays the title of the older image and asks for confirmation. It then copies all
of that image’s keywords to the new image. Only visual descriptions are copied;
in-house statistics unique to each image are unaffected. The photographer’s name is not
copied, so this method can be used with similar images by different photographers.
This “cloning” option is especially useful when many similar images are being worked
on in succession.
Even when the images are not identical, this option can be used as a starting point. For
example, the keyworder may have a series of abstract background patterns that differ
only in color. The data from the first such image can be copied to all the others, and
then the data relating to color can be changed manually, perhaps by another worker
who is in charge of entering colors.
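A minimal sketch of the cloning option, written in Python for illustration. The field names, image numbers, and sample data are assumptions; the source states only that visual descriptions are copied while the photographer's name and in-house statistics are not.

```python
# Assumed names for the fields that hold visual descriptions.
COPIED_FIELDS = ("keywords", "colors")


def clone_keywords(images, source_id, target_id, confirm):
    """Copy visual descriptions from an older image to a new one."""
    source, target = images[source_id], images[target_id]
    # Show the older image's title and ask before copying anything.
    if not confirm(source["title"]):
        return False
    for field in COPIED_FIELDS:
        target[field] = list(source[field])  # copy, so later edits stay separate
    return True


images = {
    "W2001": {"title": "Blue marble pattern", "photographer": "A. Example",
              "keywords": ["abstracts", "backgrounds", "patterns"],
              "colors": ["blue"], "times_sold": 12},
    "W2002": {"title": "", "photographer": "B. Example",
              "keywords": [], "colors": [], "times_sold": 0},
}

clone_keywords(images, "W2001", "W2002", confirm=lambda title: True)
# The color data can then be changed by hand, as in the background example.
images["W2002"]["colors"] = ["red"]
```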
Getting keywords based on subject
In addition to our physical filing system, we have constructed a separate system of
subject codes exclusively for keywording, based on what kinds of images we have
tended to select for digital scanning.
This system is more visually oriented and makes finer distinctions between certain
compositional elements that specifically affect keywording. For example, out on the floor
we may have just one physical file for all mountain climbing shots, whereas when
keywording we will distinguish between images that contain one and two people,
because of the different concepts suggested by each.
The subject codes are organized into a hierarchy that can be represented by an outline,
rather like the Dewey Decimal System (except that our system doesn’t have to account
for every possible subject, only those in which we specialize). These codes act as
macros that call up short lists of selected keywords associated with the subject. The use
of codes is entirely optional, but helpful.
At his or her discretion, the keyworder can determine what subject code (if any) applies
to the image and then enter that code to get a list of suggested keywords. He or she
then can deselect any unwanted words before appending the words to the image.
(Alternatively, the keyworder can simply accept all the words and wait until a later step
before examining them.) More than one code can be entered.
Though we store the last code entered, it is only a means of copying keywords and
is rarely referred to again. Thus, we can make updates to the coding system,
possibly making some codes obsolete in the process, without affecting the data for older
images.
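The macro-like behavior of subject codes might look like the sketch below. The codes and word lists are invented for illustration (the real ones are in Appendix A), and Python stands in for the in-house FoxPro programs.

```python
# Hypothetical subject codes, each calling up a short list of keywords.
subject_codes = {
    "MC1": ["mountain climbing", "mountains", "solitude"],
    "MC2": ["mountain climbing", "mountains", "teamwork", "pairs"],
}


def suggest_from_codes(codes, deselected=()):
    """Collect the words called up by one or more codes, minus rejected ones."""
    suggestions = []
    for code in codes:
        for word in subject_codes.get(code, []):
            if word not in deselected and word not in suggestions:
                suggestions.append(word)
    return suggestions
```

Only the resulting words are appended to the image, which is why a code can later be retired without touching older records.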
Appendix A shows our subject coding system in detail. The short list is for the
keyworder’s reference while working and shows only enough information to allow all
codes to be seen at a glance. The long document reveals the entire contents of the
code database, showing the list of words that each code calls up. This system is entirely
flexible and enables us to add, delete, and move codes or words as our knowledge
improves. Changes to the codes require that the keyworders be given updated lists, but
changes to the words do not.
Geographical locations
For reasons that will become clear, we ask our keyworders to look at the location
information for the image, if applicable, before proceeding to the keywords field.
Locations carry with them a whole group of considerations that other keywords do not have:
• They are numerous (about a third of all keywords).
• They occur in clusters rather than individually: “Hollywood, Los Angeles, California,
United States, North America.” Some sort of automation is required to ensure that all
terms in the cluster will be included every time.
• Any attempt to keep track of them by categorizing them creates problems with
proper placement. For example, creating a separate data field for cities results in
terms being put there that are not cities. (Westlight has separate fields for cities,
states, countries, and continents, plus a special fifth field we call region—and many
location terms are not any of these five.)
• Many terms are ambiguous; for example, Victoria is a city in both Canada and China
(Hong Kong) and a state in Australia.
Westlight has vigorously tackled the problem of monitoring the constant creation of new
location terms, and we believe we have it largely under control. Part of our success is
due to our creating a temporary storage space for newly created terms, a technique we
discuss in the next section.
Deciding when to include the location
Even when an image’s location is known, it is not necessarily relevant. For example, on
the CDs we send to clients, we generally omit the location information for generic indoor
shots of offices or laboratories. Nevertheless, we have allowed ourselves the option of
changing our minds by creating a storage area for this information that does not require
us to include it.
As stated above, our database contains separate fields for city, state, country,
continent, and region. If the photographer has provided us with such information, we
always enter it in these fields. However, this information will not be included in anything
we release to clients unless we take the additional step of copying it to the keywords
field. Copying the data results in the automatic addition of the keyword “locations.” Our
output program looks for this word and interprets it to mean that the contents of the five
location fields are to be included in the output.
If we decide that including the location was a mistake, we can delete the location terms
from the keywords field. The location information will no longer be generated, but we
still have it stored in the separate location fields for possible later use.
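The role of the “locations” keyword as a switch can be illustrated with a short Python sketch. The record layout and function name are assumptions; only the five field names and the marker word come from the text.

```python
LOCATION_FIELDS = ("city", "state", "country", "continent", "region")


def output_keywords(record):
    """Include the location fields only when the marker keyword is present."""
    words = list(record["keywords"])
    if "locations" in words:
        for field in LOCATION_FIELDS:
            term = record.get(field)
            if term and term not in words:
                words.append(term)
    return words


office_shot = {
    "keywords": ["offices", "locations"],
    "city": "Los Angeles", "state": "California",
    "country": "United States", "continent": "North America", "region": "",
}
```

Removing “locations” from the keywords turns the location output off, while the five fields keep their contents for later use.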
Controlling proper entry of locations
Because Westlight maintains five separate fields for location data, we have to make
sure not only that terms are spelled correctly, but also that they are placed in the correct
one of the five fields. When clients click on the country field, they expect to see only
countries and only one spelling for each country.
Any time we enter location data, our spell checking database is active. If we misspell a
name and the spell checker recognizes the misspelling, the spelling will be corrected on
the spot. If the misspelling is not in the spell checker, it probably will be flagged as a
“new” word and stored in a cache where it later can be evaluated and possibly added to
the spell checker.
The spell checker recognizes all two-letter postal abbreviations and replaces them with
fully spelled-out names, thereby allowing the keyworder to enter U.S. states quickly. It
also converts certain other abbreviations, such as “USA,” “NYC,” “L.A.,” and “S.F.”
After checking the spelling, the data entry routine then looks at our thesaurus. This
database generates new words in response to existing words; for example, it
guarantees that every image that says “Los Angeles” also will say “California.” (See
“Other additions to keywords” and Appendix D.) The thesaurus is where we store the
information about whether a term is a continent, country, state, city, or region. A number
code from 1 to 5 indicates which it is. The data entry routine places the word in the
appropriate field based on the number it finds here.
The keyworder needs only to enter the minimum that he or she knows the routine will
recognize, and it will fill in everything else. For example, with Los Angeles, the
keyworder need only enter “L.A.” The spell checker will change this to “Los Angeles,”
and the thesaurus then will fill in all the fields based on “Los Angeles.” The original entry
“L.A.” doesn’t even have to be put in the right field; no matter where it is entered, all five
fields will be filled in correctly.
For that matter, the original entry doesn’t have to belong in any of the five fields at all.
For example, “Grand Canyon” does not fall into any of our five categories. With Grand
Canyon images, “Arizona” is the most specific term that can be entered in the location
fields. Nevertheless, if the keyworder incorrectly enters “Grand Canyon” as, say, a city,
the thesaurus not only will add “Arizona” and all other terms that derive from it, but also
will delete “Grand Canyon” (because that term does not show a number code from 1 to
5). When the keyworder sees the original term disappear, he or she knows that it can be
entered only in the general keywords field and not in one of the special location fields.
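The interplay of spell checker and thesaurus during location entry can be sketched as below. This is a simplified Python illustration under stated assumptions: the text says only that a code from 1 to 5 identifies the field, so the particular numbering used here is a guess, and the table contents are sample data.

```python
# Abbreviation and misspelling -> approved spelling (sample entries).
spell_checker = {"L.A.": "Los Angeles", "CA": "California", "USA": "United States"}

# Assumed numbering; the source does not say which number means which field.
FIELD_BY_CODE = {1: "continent", 2: "country", 3: "state", 4: "city", 5: "region"}

# term -> (field code or None, terms the thesaurus generates from it)
thesaurus = {
    "Los Angeles": (4, ["California"]),
    "California": (3, ["United States"]),
    "Arizona": (3, ["United States"]),
    "United States": (2, ["North America"]),
    "North America": (1, []),
    "Grand Canyon": (None, ["Arizona"]),  # not one of the five field types
}


def fill_location_fields(entered):
    """Place each recognized term in its proper field, wherever it was typed."""
    fields = {}
    pending = [spell_checker.get(term, term) for term in entered]
    seen = set()
    while pending:
        term = pending.pop()
        # Simplification: terms the thesaurus does not know are dropped here.
        if term in seen or term not in thesaurus:
            continue
        seen.add(term)
        code, generated = thesaurus[term]
        if code is not None:  # terms without a code never reach a location field
            fields[FIELD_BY_CODE[code]] = term
        pending.extend(generated)
    return fields
```

Entering “Grand Canyon” yields Arizona and everything derived from it, but no field containing “Grand Canyon” itself, which mirrors the disappearing term described above.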
With locations that are obscure or ambiguous, the keyworder may have to enter more
information. For example, entering only “Victoria” will not work. This is an ambiguous
name that has been purposely left out of the thesaurus, so no additional terms will
appear. But “Victoria, BC” will work, because the routine does recognize “BC.” In such a
case the keyworder must be responsible for putting the unrecognized term in the right
field; the thesaurus doesn’t know where it belongs, and if it’s in the wrong field, the
thesaurus probably will overwrite it with something else that belongs there. Few
situations require the entry of more than two terms. (Westlight has given thought to the
idea of attaching codes to ambiguous terms that would specify their meaning, so that
the thesaurus can process them.)
Viewing and editing keywords
Only after all the preceding matters have been addressed is the keyworder ready to
look at the contents of the keywords field.
The keyworder selects the keywords field and sees all the keywords that have been
created so far (scrolling if necessary): those generated by any subject codes that were
entered, the location if it was added (plus the word “locations”), and any other words
that may have been put there by someone else.
The keyworder spends as much time in this screen as necessary, deleting unwanted
words and adding others. It is not necessary to preserve alphabetical order. An
experienced keyworder can enter fewer words, knowing that the program will
automatically generate some.
When the keyworder is satisfied with the list as it stands, he or she exits this screen.
Doing so launches a series of programs that evaluate the keywords.
Spell checking
The keywords are spell checked before anything else is done with them (see Appendix
C). The subsequent programs then will base their actions on the correct spelling. If a
word is one we have decided to forbid (usually a vague term such as “life” or “weather”),
the spell checker will delete it altogether by replacing it with a blank.
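In outline, the spell checker is a two-column lookup in which a forbidden word maps to a blank. The Python sketch below illustrates the idea; the misspelling shown is a made-up entry, while “life” and “weather” are the forbidden words mentioned above.

```python
# found spelling -> replacement; a blank replacement means "forbidden".
spell_checker = {
    "mountian": "mountains",  # hypothetical misspelling entry
    "life": "",
    "weather": "",
}


def spell_check(keywords):
    """Replace known misspellings and drop words that map to a blank."""
    checked = []
    for word in keywords:
        corrected = spell_checker.get(word, word)
        if corrected:  # blanks are ignored
            checked.append(corrected)
    return checked
```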
As with any spell checker, there are some things it cannot do. For example, it does not
know if the word “painting” refers to an art object or to the act of painting a house.
Therefore, it does not change the word to “paintings,” nor will the thesaurus later add
“paintings.” Instead, such words are handled by the program that follows.
Consistency with other fields
The next program looks for any of a group of problem words and compares them with
other fields in Westlight’s QUESTock database structure, flagging any conflicts. This
program is analogous to a grammar checker.
For example, if the keyword list contains the word “men,” the program looks at our “age”
field to see if it is marked “adults,” and at our “gender” field to see if it is marked either
“male” or “both.” If a discrepancy is found, the program asks the keyworder what to do:
delete the keyword or correct the other field. In this case, the keyworder would have no
other options.
The program also makes optional suggestions. For example, we use the keyword
“pairs” for any image containing two people, but we also use it in other senses. Upon
finding this word, the program will check the “number of people” field to see if it says
“two.” If it doesn’t, the program will offer to change it, but will not require any changes.
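One way to picture these checks is as a table of rules, each tying a keyword to the values another field must hold. The sketch below is in Python and assumes one value per field for simplicity; the field names and rule tables are illustrative, not the actual QUESTock structure.

```python
# keyword -> {field name: acceptable values}
MANDATORY_RULES = {"men": {"age": {"adults"}, "gender": {"male", "both"}}}
OPTIONAL_RULES = {"pairs": {"number_of_people": {"two"}}}


def find_conflicts(record, rules):
    """List (keyword, field) pairs where a keyword disagrees with a field."""
    conflicts = []
    for keyword, requirements in rules.items():
        if keyword not in record["keywords"]:
            continue
        for field, allowed in requirements.items():
            if record.get(field) not in allowed:
                conflicts.append((keyword, field))
    return conflicts


record = {"keywords": ["men", "pairs"], "age": "adults",
          "gender": "female", "number_of_people": "three"}
```

A conflict from the mandatory table would force the keyworder to delete the keyword or fix the field; one from the optional table would only prompt an offer to change the field.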
After checking the other fields, the program looks for certain pairs of words that
contradict each other. For example, if an image of Florida or Bermuda has the keyword
“tropics,” the program tells the keyworder that this is not a tropical location and that the
image would more appropriately take our special keyword “generic tropics” that allows
for nonliteral searches. If the shot is not generic in nature, “tropics” can be deleted
altogether.
Finally, the program looks for singular nouns that may or may not take a plural form. For
example, if the image has the keyword “aspen,” the program asks if this is a city in
Colorado or a tree. If it is a city, the program does nothing; if it is a tree, the program
adds an ‘s’ to it.
Any such occurrence that results in a change will cause the program to run through
another cycle. When the program has run through an entire cycle without any changes
being made, it stops.
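The repeat-until-stable behavior can be sketched as a loop that reruns every check until a full cycle changes nothing. In this Python illustration the two check functions are hypothetical and stand in for the interactive prompts; the aspen check assumes the keyworder answered “tree.”

```python
def run_until_stable(keywords, checks):
    """Rerun all checks until one whole cycle makes no change."""
    cycles = 0
    changed = True
    while changed:
        changed = False
        cycles += 1
        for check in checks:
            updated = check(keywords)
            if updated != keywords:
                keywords = updated
                changed = True
    return keywords, cycles


def pluralize_aspen(keywords):
    # Assumes the keyworder said the word refers to the tree, not the city.
    return ["aspens" if word == "aspen" else word for word in keywords]


def tropics_to_generic(keywords):
    # Florida is not literally tropical, so swap in the nonliteral keyword.
    if "tropics" in keywords and "Florida" in keywords:
        return ["generic tropics" if word == "tropics" else word for word in keywords]
    return keywords
```

With the keywords “aspen,” “Florida,” and “tropics,” the first cycle makes both changes and the second cycle confirms that nothing further is needed.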
Trapping new words
The next program compares the keywords to our vocabulary database. This database,
which is updated regularly, lists every keyword we have ever used, plus some (mostly
locations) that have not been used but have been approved for use should they occur.
(See Appendix H.)
If a keyword isn’t found in the vocabulary, it must be either (1) a misspelling that the
spell checker didn’t catch, (2) a weak choice that the keyworder ought to reconsider, or
(3) a legitimate new term that should be added to the vocabulary.
A popup appears showing the unrecognized word and asking the keyworder to
categorize it. One of the options is to correct a typing error. If the word is a simple typo,
the keyworder can call it up and correct it; or if it is decided that the word is no good, it
can be deleted. (An additional option, available only to experienced workers, adds the
typo to the spell checker so that it will be automatically corrected if it ever occurs again.)
If the word still isn’t recognized even after it is corrected, the popup appears again.
If the word is spelled correctly and not recognized, the keyworder can choose one of
about 20 major categories to attach to the word, and possibly a subcategory. This
option is useful mainly for locations. The keyworder can here specify that the new term
describes, say, a city in Italy, which will help the editor later when the term is evaluated.
On the other hand, the keyworder can elect to let the editor decide by choosing “Don’t
know.”
The word and its category are appended to a small database that duplicates the field
structure of the main vocabulary database. This small database acts as a cache, or
trap, for all keywords that have been created since the last time the vocabulary was
updated. The keywording program recognizes words that currently are in the trap, so if
the keyworder plans to use this same word again with the next image, he or she will not
have to go through this step again.
Our vocabulary database is tightly controlled; it would become an ungovernable mess if
everyone were adding words to it directly. The trap allows the keyworder to use the new
keyword, but he or she has merely submitted it for consideration; it will not permanently
become part of the vocabulary database until it passes inspection.
Richard periodically inspects the trap, putting misspellings into the spell checker and
cleaning up the categories for the remaining words. When the trap contains only clean
data, it is appended to the vocabulary and cleared out.
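The trap can be thought of as a small holding table that sits beside the vocabulary. The Python sketch below shows that life cycle under simplifying assumptions: each table is a plain dictionary, a word's record is reduced to its category, and all names and sample words are hypothetical.

```python
vocabulary = {"Rome": "locations: city in Italy"}  # approved words
spell_checker = {}                                 # misspelling -> correction
trap = {}                                          # new words awaiting review


def is_recognized(word):
    """A word is usable if it is approved or already sitting in the trap."""
    return word in vocabulary or word in trap


def submit_new_word(word, category="Don't know"):
    """Keyworder's step: park an unrecognized word for the editor."""
    if not is_recognized(word):
        trap[word] = category


def review_trap(misspellings):
    """Editor's step: divert misspellings, then merge the clean remainder."""
    for wrong, right in misspellings.items():
        if trap.pop(wrong, None) is not None:
            spell_checker[wrong] = right
    vocabulary.update(trap)
    trap.clear()


submit_new_word("Siena", "locations: city in Italy")
submit_new_word("Sienna")  # a typo that slipped through
review_trap({"Sienna": "Siena"})
```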
Note that if the keyworder is found to have entered misspellings, it is not necessary to
hunt down that person’s work and correct it. It is enough to put the misspellings in the
spell checker. All keywords are spell checked once more before being released to
clients, so the misspellings will be corrected no matter where they exist in the system.
Suggesting additional words
Once the program is able to recognize all the keywords, it alphabetizes and redisplays
them in their current, spell-checked form. It then generates further keyword suggestions
based on these existing words.
The program finds each of the current keywords in the vocabulary database, except
those that have been trapped. For each keyword, it pulls the contents of four fields that
contain lists of additional words. (How these lists get written is explained under “Getting
statistics on keyword usage.”) When it has pulled all the lists for all the words, it sorts
them, removes duplicates (including words the image already has), and displays the
resulting compilation as three lists.
The first list contains compulsory keywords. For example, if “apples” was one of the
original keywords, but “fruits” was not, then “fruits” will show up in the compulsory list
(with “apples” next to it to show where it came from). Compulsory words are inseparable
from the word that generated them; they cannot be deleted unless the other word also is
deleted. Normally there should be no problems with any of the compulsory words, but
the keyworder should look at them, to confirm both the legitimacy of the original
keywords and our judgment regarding compulsory status.
The other two lists contain optional keywords. They are divided according to
probability: the words in one list are considered more likely to apply to the image than
the words in the other list. (If one of the original keywords gave a word high probability
and another gave it low probability, it is considered to have high probability.)
The keyworder can go back and forth between these lists as many times as he or she
wants, selecting and deselecting words. When this process is finished, the selected
words are appended to the image.
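A rough Python sketch of how the three lists could be compiled is shown below. The field names and sample vocabulary are assumptions, and the sketch omits showing which original word produced each compulsory word. Letting the compulsory list outrank the optional lists is also an assumption; the text specifies only that high probability outranks low.

```python
# Sample vocabulary records with the four list fields (names assumed).
vocabulary = {
    "apples": {"compulsory": ["fruits"], "synonyms": [],
               "likely": ["orchards"], "possible": ["harvests", "pies"]},
    "orchards": {"compulsory": ["trees"], "synonyms": [],
                 "likely": ["harvests"], "possible": ["fruits"]},
}


def suggest(keywords):
    """Return (compulsory, likely, possible) lists, sorted and de-duplicated."""
    have = set(keywords)
    compulsory, likely, possible = set(), set(), set()
    for word in keywords:
        entry = vocabulary.get(word)
        if entry is None:  # trapped words have no vocabulary record yet
            continue
        compulsory.update(entry["compulsory"] + entry["synonyms"])
        likely.update(entry["likely"])
        possible.update(entry["possible"])
    compulsory -= have
    likely -= have | compulsory
    possible -= have | compulsory | likely  # high probability wins over low
    return sorted(compulsory), sorted(likely), sorted(possible)
```

Here “harvests” is rated low by “apples” and high by “orchards,” so it lands in the high-probability list only.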
At this point the keyworder can move on to the next image. However, the selection
process may have prompted ideas for even more words or revealed errors in the
original words. In that case, the keyworder can open the keywords field again and
repeat the entire procedure.
Appendix B shows sample excerpts from the vocabulary database. For each excerpted
word, the lists of compulsory and optional words are shown. These lists are stored in
the form of FoxPro memo fields.
Other additions to keywords
Words generated by other databases
The output routine that formats our keywords for publication uses the same spell
checker and thesaurus that are used during data entry. The spell checker changes any
misspellings it finds, and the thesaurus then generates additional compulsory words
based on the words that are there.
Because we always spell check the keywords before publishing them, we can use the
spell checker to make global changes. If we decide to change the way we spell a word,
we can simply have the spell checker make the change instead of tracking down every
occurrence of the word and changing it manually. Of course, we also have to make sure
the new spelling is reflected in the thesaurus.
Appendix C shows an excerpt from the spell checker. If the spelling on the left is found,
it is replaced with the spelling on the right. These two fields are all that the spell checker
consists of. Note that forbidden keywords are replaced with blanks, which the output
routine ignores.
The thesaurus makes essentially the same additions that the “compulsory” word field
(Appendix B) makes during data entry. Because the contents of the compulsory field
may change, the thesaurus assures that all images will reflect the current state of the
compulsory field. Also, the thesaurus is more thorough because it goes through two
cycles; after generating words, it then creates second-generation words from the words
it generated the first time.
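The two-cycle expansion can be illustrated with a small Python sketch. The thesaurus entries are sample data (the real ones are in Appendix D), and the function name is hypothetical.

```python
# found keyword -> keywords the thesaurus generates from it (sample data)
thesaurus = {
    "apples": ["fruits"],
    "fruits": ["foods"],
    "cars": ["automobiles"],
    "automobiles": ["cars"],  # synonym pairs appear as twin entries
}


def expand(keywords, cycles=2):
    """Add generated words, then words generated from those, and so on."""
    result = list(keywords)
    frontier = list(keywords)
    for _ in range(cycles):
        generated = []
        for word in frontier:
            for new_word in thesaurus.get(word, []):
                if new_word not in result and new_word not in generated:
                    generated.append(new_word)
        result.extend(generated)
        frontier = generated  # the next cycle works from the new words only
    return result
```

Starting from “apples,” the first cycle adds “fruits” and the second adds “foods,” a word that a single pass would have missed.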
Additionally, the thesaurus fills in the location fields in a manner similar to that described
earlier under “Controlling proper entry of locations.” For example, someone may have
entered “Arizona” in keywords but neglected to enter anything in the location fields. The
thesaurus will automatically fill in the location fields based on “Arizona,” as well as
adding these terms (plus the keyword “locations”) to keywords.
Appendix D contains two views of the thesaurus. The first list is alphabetized on the
found keyword; this list shows which words the thesaurus generates if it finds the
boldfaced word. The second list is alphabetized on the generated keyword; it shows all
the ways the boldfaced word might be generated. If the thesaurus reflects any poor
decisions on our part, a list like the second one helps us diagnose the problem. Getting
rid of a bad keyword may be as simple as deleting a single entry from the thesaurus.
Words generated by other fields
In the last section we discussed comparing certain keywords to other fields, such as
making sure that the keyword “men” agrees with what is in the age and gender fields.
Westlight’s current database structure creates about 40 such situations where keywords
must agree with other fields.
Most such keywords do not have to be manually entered in the keywords field at all.
The data entry program looks for them as a quality control measure, to ensure that they
are correct where they exist; but if they don’t exist, it is not necessary to add them. As
long as the other field that they relate to has been properly filled in, these words will be
automatically added to the keywords field.
For example, if our “season” field specifies “spring,” that word will automatically be
added to the keywords during output if it is not already there. An experienced
keyworder, who knows that this automatic addition will take place, will not bother to
enter the season as a keyword, unless he or she has been specifically instructed to do
so.
Words such as “men” and “women” that describe people are a little different because
the program cannot always be sure they apply. If the age field specifies only adults,
then “men” and/or “women” can be safely added; but if the age field also specifies
children, an ambiguity results. In such cases “men” and “women” would be among the
words offered as optional during keywording (see “Suggesting additional words” and
Appendix B).
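The rules in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not our production logic: the field values ("adults", "children", "male", "female") and the function name `keywords_from_fields` are hypothetical, and an empty age field is assumed to be ambiguous.

```python
def keywords_from_fields(keywords, season=None, ages=(), genders=()):
    """Return (keywords with automatic additions, optional suggestions)."""
    output = list(keywords)
    suggestions = []

    def add(word, target):
        # Never duplicate a word that is already a keyword or already queued.
        if word not in output and word not in target:
            target.append(word)

    # A filled-in season field always yields its keyword.
    if season:
        add(season, output)

    # "men"/"women" are added outright only when the age field lists adults
    # and nothing else; otherwise they are merely offered as options.
    gender_words = {"male": "men", "female": "women"}
    adults_only = set(ages) == {"adults"}
    for gender in genders:
        word = gender_words.get(gender)
        if word is not None:
            add(word, output if adults_only else suggestions)
    return output, suggestions
```

For example, an image with season "spring", adults only, and a female subject would gain "spring" and "women" automatically, whereas the same image with children also present would have "women" offered as an option instead.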
Our keyworders are advised to become familiar with the 40 keywords that relate to other
fields, so that they know not to expend effort on them unless they are told otherwise.
Synonyms and variants
Westlight keeps track of word pairs and groups that essentially should be treated as one
word. For example, we want all our programs to respond the same way regardless of
whether the keyword field contains “automobiles” or “cars.” We have several ways of
controlling such word pairs.
In the thesaurus (Appendix D), each pair is represented by twin entries; for example,
automobiles→cars and cars→automobiles. Regardless of which word is entered, the
other word will be included in the output.
In the vocabulary database, both words contain the same data in their records, so that
the same lists of compulsory and optional words will be pulled regardless of which is
entered.
One of the four list fields in the vocabulary database is exclusively for synonyms (see
Appendix B). Usually this field contains only one word, but some words have two
synonyms, such as “cougars/mountain lions/pumas.” Synonyms are considered a
special kind of compulsory word and are simply put into the compulsory list during
keywording. Giving them their own field helps us to remember that any changes we
make to a word’s data need also to be made to its synonym’s.
An additional field allows us to choose one synonym as a primary spelling and make the
other synonyms subordinate to it. When we print reports, we can choose to suppress
the subordinate synonyms so that only one version of the term is printed.
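A minimal Python sketch of the two behaviors described in this section, synonym expansion on output and suppression of subordinate synonyms on reports, might look like this. The data layout (a list of groups with the primary spelling first) and the function names are assumptions made for illustration; the real systems store this information in database fields.

```python
# Hypothetical synonym groups; the first word in each group is the primary spelling.
SYNONYM_GROUPS = [
    ["automobiles", "cars"],
    ["cougars", "mountain lions", "pumas"],
]

# Map every member of a group to the whole group.
GROUP_OF = {word: group for group in SYNONYM_GROUPS for word in group}


def expand_synonyms(keywords):
    """Add every synonym of each keyword, so either spelling gives the same output."""
    output = []
    for word in keywords:
        for member in GROUP_OF.get(word, [word]):
            if member not in output:
                output.append(member)
    return output


def suppress_subordinates(keywords):
    """Keep only the primary spelling of each synonym group, as for a report."""
    output = []
    for word in keywords:
        primary = GROUP_OF.get(word, [word])[0]
        if primary not in output:
            output.append(primary)
    return output
```

Storing each group once, rather than as separate twin entries, is a design choice of this sketch; it makes it harder for a change to one word to be forgotten for its synonym.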
Adapting the keywords to different styles
Sometimes Westlight’s images are combined with images from other agencies, as when
we submit them for inclusion on the Stock Workbook’s CDs. In such cases we are
expected to have our keywords adhere to editorial guidelines that may differ from those
we use for the QUESTock editions that we publish ourselves. The Picture
Agency Council of America (PACA) did not release an official statement on keywording
style until 1995, a year after Westlight released QUESTock to the market.
As PACA-style keywording has become more common, Westlight has begun including
two search engines on our CDs: one in our classic QUESTock style, and one in PACA
style for clients who are used to that or who want something simpler. We continue to
market our QUESTock interface aggressively (and successfully) because its fragmented
database structure provides a flexibility that cannot be accommodated by PACA
standards.
To conform to PACA standards, we maintain a separate spell checker and thesaurus
that work together to change the appearance of the keywords. When we process the
keywords, we simply specify that the alternate spell checker and thesaurus are to be
used.
The PACA spell checker is much bigger than our normal one because it contains all of
our “correct” spellings in addition to all of our recognized misspellings. Besides
correcting misspellings, it also converts our normally uppercase keywords to upper and
lower case. Appendix E shows an excerpt.
The alternate thesaurus contains many more synonym pairs than our normal one. For
example, normally we restrict the number of singular nouns we include, by having our
spell checker change singulars to plurals. The Stock Workbook is more inclusive and
asks for both singulars and plurals. Our alternate thesaurus contains many twin entries
for such variations. It also has an “expand” field that we use to mark these “extra”
spellings, so that we can suppress them should the occasion arise. Appendix F shows
an excerpt.
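The conversion can be pictured as two table lookups applied in sequence. The Python sketch below is illustrative only; the table contents, the tuple layout of the thesaurus rows, and the function name `to_paca` are hypothetical, and the real tables are those excerpted in Appendices E and F.

```python
# Hypothetical PACA-style spell checker: every recognized spelling, correct or
# not, maps to the mixed-case form wanted in the output.
PACA_SPELLINGS = {
    "MOUNTAINS": "Mountains",
    "MOUNTIANS": "Mountains",  # a recognized misspelling
    "TREES": "Trees",
    "CARS": "Cars",
}

# Hypothetical alternate thesaurus rows: (found, generated, expand flag).
# The flag marks "extra" spellings, such as singulars, that can be suppressed.
PACA_THESAURUS = [
    ("Mountains", "Mountain", True),
    ("Trees", "Tree", True),
    ("Cars", "Automobiles", False),
]


def to_paca(keywords, include_expanded=True):
    """Convert uppercase keywords to PACA style and add thesaurus words."""
    output = []
    for raw in keywords:
        # Words missing from the spell checker pass through unchanged.
        word = PACA_SPELLINGS.get(raw, raw)
        if word not in output:
            output.append(word)
    for found, generated, expand in PACA_THESAURUS:
        if found in output and generated not in output:
            if include_expanded or not expand:
                output.append(generated)
    return output
```

Calling `to_paca` with `include_expanded=False` corresponds to suppressing the spellings marked in the "expand" field.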
Because we regularly add new words to our vocabulary, we must regularly update
our alternate spell checker so that it will convert the new words to upper and lower case.
We periodically run a program that compares the PACA spell checker and thesaurus to
the QUESTock ones, looking for potential new entries and putting them into a temporary
work space. If we overlook anything, with luck the effect will be only aesthetic, as the
Stock Workbook’s browser is not case-sensitive.
Foreign translations
Westlight has taken on the ambitious task of translating its keywords and menu options
into six foreign languages for international distribution. The translations are being
provided by our foreign agents and by our bilingual employees.
Our translation database contains not only all our keywords but also all the terms that
appear in our other QUESTock search fields. Each entry has a field to store a
translation in each of the six languages. If the foreign language has two or more equally
valid synonyms for the same English word, they all can be stored in one field, delimited
with commas, as space permits. Appendix G shows an excerpt.
Various other fields help to classify the terms so that, for example, keywords can be
isolated from other terms, or locations from other keywords. We can give our translators
sorted lists, grouping similar terms together to aid their comprehension.
This database is used by a special output program that formats our keywords in the
usual manner, then looks up their translations and makes all the necessary
substitutions. The result is more accurate than that produced by off-the-shelf
translation programs, which work poorly with mere lists of words that have no
grammatical context.
For example, generic translators tend to translate many plural nouns, such as “controls,”
as if they were third person singular verbs. Because we know that we are not using the
word as a verb, we can guarantee that our translator will never interpret it as one. Our
customized translator has only one option, the one translation that we know is always
correct.
Nevertheless, even among our keywords there are ambiguities. When we have used an
English word in more than one sense, we need a way of handling the multiple
possibilities. Such words have multiple records in our translation database, one for each
meaning we have used. Each record contains the same English word but different
translations of it. A separate “definition” field contains a short description of which usage
is being addressed.
At present our customized translator has no way of knowing which translation to use. It
can, however, identify ambiguous words by looking to see if the definition field has
anything in it. If the word is ambiguous, the translator keeps it in English, but flags it by
putting an asterisk in front of it. When we see the finished output, the ambiguous words
are all at the top of the alphabetized list, and we can translate them manually by
referring to a printout of the translations.
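The handling of ambiguous words can be sketched in a few lines of Python. This is an illustration under stated assumptions: Spanish is used only as an example language, the records in `TRANSLATIONS` are hypothetical, and words with no record are assumed to pass through untranslated.

```python
# Hypothetical translation records: (english, definition, spanish).
# An ambiguous English word has one record per meaning, each with a definition.
TRANSLATIONS = [
    ("controls", "", "controles"),
    ("trees", "", "árboles"),
    ("cranes", "birds", "grullas"),
    ("cranes", "machines", "grúas"),
]


def translate(keywords, records=TRANSLATIONS):
    """Translate keywords, flagging ambiguous ones with a leading asterisk."""
    output = []
    for word in keywords:
        matches = [record for record in records if record[0] == word]
        if any(definition for _, definition, _ in matches):
            # A non-empty definition field marks the word as ambiguous:
            # keep the English word and flag it for manual translation.
            output.append("*" + word)
        elif matches:
            output.append(matches[0][2])
        else:
            output.append(word)  # assumption: unknown words pass through
    # The asterisk sorts before any letter, so flagged words rise to the top.
    return sorted(output)
```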
Westlight is exploring the possibility of attaching codes to ambiguous keywords at the
time they are originally entered, specifying their meaning, so that the translation
program will know from the code which translation to use.
Keyword subsets
Westlight’s QUESTock CDs contain as many as 50 database fields on which clients can
search. Some of these fields are essentially pared-down versions of the keywords field.
For example, the “major subject” field contains a short list of selected keywords, such as
“sports,” that are analogous to the major divisions of a printed catalog. The “concept”
field lists abstract words such as “success” that are especially useful to advertisers. All
of the words in these fields also can be found in the keywords field, just like any other
keyword; but because they are among the most common choices, the special fields
screen out all other keywords, making these words easier to find.
These special fields do not exist in the database in which we store finished keywords,
and our keyworders normally pay no attention to such categories. The fields are created
by the output routine we use to format the keywords for inclusion on our CDs. The
program looks for these specific words among the keywords, and any that it finds are
copied to the new field that it creates.
Several tiny databases contain the keywords that are to be isolated. A “subjects”
database contains the approximately 30 keywords we have selected to be major
subjects. A “concepts” database contains 200 selected concepts, and so on.
The output routine compares an image’s keywords to the words in these databases. If
the image has a keyword that is in the subjects database, that word is copied to the
subjects field in addition to being left in the keywords field. Whenever we run the output
routine, any changes we have made to the keywords will be automatically reflected in
the special fields, which are newly created each time.
The same process works in reverse. Occasionally our workers may use the image
browser to do rudimentary keywording of images that have not yet been added to the
FoxPro database. In so doing, they may use the special fields as an easy way of
selecting major subjects and concepts, rather than entering them in the keywords field.
Then they export the image data out of the image browser and import it into the FoxPro
database. The import routine takes whatever words are in the special fields and puts
them into the keywords field.
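Both directions of this process can be sketched in Python as follows. The sketch is illustrative only; the contents of `SPECIAL_FIELDS` and the function names are hypothetical stand-ins for the small lookup databases and the output and import routines described above.

```python
# Hypothetical excerpts of the small lookup databases.
SPECIAL_FIELDS = {
    "subjects": {"sports", "animals"},
    "concepts": {"success", "teamwork"},
}


def build_special_fields(keywords):
    """Output routine: copy matching keywords into freshly created fields."""
    record = {"keywords": list(keywords)}
    for field, vocabulary in SPECIAL_FIELDS.items():
        # The word is copied, not moved; it also stays in the keywords field.
        record[field] = [word for word in keywords if word in vocabulary]
    return record


def merge_special_fields(record):
    """Import routine: fold words from the special fields back into keywords."""
    keywords = list(record.get("keywords", []))
    for field in SPECIAL_FIELDS:
        for word in record.get(field, []):
            if word not in keywords:
                keywords.append(word)
    return keywords
```

Because the special fields are rebuilt from the keywords on every run, they never need to be edited by hand and cannot drift out of step with the keywords field.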
Getting statistics on keyword usage
Several of our programs involve calculating the frequency with which a given keyword
occurs among the images in our image database.
The most obvious use of this information is simply to list the words together with their
counts. Appendix H is an example of such a report. It lists every keyword in our
vocabulary database except those with a count of zero. The category heads and
subheads come from the category fields in the vocabulary database. The words are
grouped by category and then in descending order of frequency, enabling us to
determine, for example, which species of animal or which American cities are most
represented.
We also can run a similar report that groups all our search terms according to the
QUESTock field in which they appear on our CDs.
For the kind of list shown here, the counts are determined by running the entire keyword
database through our output routine, creating a temporary database from the output,
and then running a program that counts each word and puts the count into the
vocabulary database. The counts thus take into account the effects of spell checking
and generation of additional thesaurus words. The process, which can be accomplished
in a few hours on a fast computer, is repeated periodically. For some uses we do not
require as much accuracy and can get the information directly from the “raw,”
unprocessed keywords in the keyword database.
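The counting step and the grouped report can be sketched in Python as shown below. This is a simplified illustration, not the FoxPro routine: the `process` parameter is a hypothetical stand-in for the output routine (spell checking and thesaurus expansion), and the category table is assumed to be a simple word-to-category mapping.

```python
from collections import Counter


def count_keywords(images, process=lambda keywords: keywords):
    """Count how many images carry each keyword after processing."""
    counts = Counter()
    for keywords in images.values():
        # Use a set so a word repeated within one image is counted once.
        counts.update(set(process(keywords)))
    return counts


def frequency_report(counts, categories):
    """Group words by category, then sort by descending count; skip zero counts."""
    rows = [
        (categories.get(word, "uncategorized"), word, n)
        for word, n in counts.items()
        if n > 0
    ]
    return sorted(rows, key=lambda row: (row[0], -row[2], row[1]))
```

Passing the identity function as `process` corresponds to the quicker, less accurate count taken directly from the raw keywords.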
We can link the keyword data to sales reports, telling us if certain keywords tend to
occur repeatedly among images that are high sellers.
Relationship of one keyword to another
One of our programs generates a list of keywords that tend to coexist with a given
keyword. The program finds all images that have the keyword, pulls all of those images’
other keywords, and calculates how often each of those other keywords appears. The
result is a list that tells us, for example, that 35% of images containing mountains also
contain trees.
We use such lists as a guide in helping us decide what optional keywords should be
displayed during data entry. Words that invariably occur are placed in the compulsory
group if they are clearly related. Other words that occur more than half the time are
placed in the higher-level optional group. Words that occur less than half the time but
more than a quarter of the time are placed in the lower-level optional group. Thus, a
keyworder who enters “mountains” but does not enter “trees” will see “trees” displayed
among the lower-level options.
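The calculation and the thresholds described above can be sketched in Python as follows. The function names and tier labels are hypothetical. Two assumptions are made: a word that always co-occurs is labeled only as a candidate for the compulsory group, since the final decision depends on an editor judging whether it is clearly related, and a word that occurs exactly half the time, a case the rules above leave open, falls into the lower-level group.

```python
from collections import Counter


def cooccurrence(images, keyword):
    """Return {other word: fraction of images with `keyword` that also have it}."""
    matching = [set(kws) for kws in images.values() if keyword in kws]
    if not matching:
        return {}
    counts = Counter(word for kws in matching for word in kws if word != keyword)
    return {word: n / len(matching) for word, n in counts.items()}


def tier(fraction):
    """Map a co-occurrence fraction to a suggestion group, or None."""
    if fraction == 1.0:
        return "compulsory candidate"
    if fraction > 0.5:
        return "higher-level optional"
    if fraction > 0.25:
        return "lower-level optional"
    return None
```

With these thresholds, the 35% figure for "trees" among images of mountains gives `tier(0.35) == "lower-level optional"`, matching the example in the text.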
The same program can filter out certain categories of keywords so that, for example, we
can generate a list of concepts associated with a given keyword.

CORBIS.DOC

  • 1. The commercial advantages of Westlight’s approach to keywording Ever since photo agencies began publishing images on CD-ROM, Westlight has been one of the agencies most intensely involved in keywording. We have invested more than $1 million in keywording since 1992. Our efforts have always focused on maximizing usability for the customer. We recognized early on that in a highly competitive market, just having great images wouldn’t be enough if clients couldn’t find them easily. Having too few keywords would unduly limit the client’s options and neglect many sales opportunities, whereas having too many keywords would confuse the client and imply a lack of discipline. Efficient searching would be as important as content and, for clients overloaded with choices, might become the deciding factor. Westlight was well prepared to address the issues associated with keywording (and with digital imaging in general) because we had been using computers longer than most other agencies. By 1992 we already had nearly a decade’s worth of information about our business—representing tens of thousands of transactions—stored in our computer network. Armed with this information and the insight of an experienced sales staff, we did not have to theorize about what our clients might want, but could make practical, informed decisions based on facts gathered on the live battlefield of day-to-day business. Through analysis of our sales data and discussions with our salespeople, we were able to identify not only which kinds of images sold well, but the ways clients asked for them and the criteria that led to their selection. Westlight owner Craig Aurness came up with the idea of translating these criteria into searchable database fields that clients could use to simulate their live interaction with Westlight’s researchers. The resulting search “language,” marketed under the name QUESTock, was an immediate hit with clients. 
QUESTock allows clients to search for images based on their layout requirements in addition to content. Included among the search options are graphic criteria such as colors, lighting, and camera angles, each placed in a separate database field. This fragmented database structure was designed to be flexible and dynamic because we knew the number of images would grow, increasing the search possibilities. Most other image browsers hide the keywords, so users must use the keyboard to type them in, essentially learning the vocabulary by trial and error. Our proprietary interface removes the mystery, making the search options visible so users can use the mouse to click on words. Our language structure enables us to specify shades of meaning that would be impossible to express concisely if only conventional keywords were used. For example, clients can find “images of people in which you can’t tell their ethnicity,” or “images that weren’t shot in Alaska but look as if they could have been,” or “images that contain both trees and water, but in which the trees are dominant.” Some of these searches are © 1998 Westlight Page 1 of 18 11/12/2016
  • 2. accomplished through the use of special fields, others by spelling the keyword in a special way. Each new edition of QUESTock is thoroughly field-tested through the release of beta versions. More recently we have begun analyzing the various image search engines that have appeared on the World Wide Web. We have learned from our and others’ mistakes and continue to adapt our product to current market demands. Our database structure was so carefully planned from the beginning that its basic appearance has remained virtually unchanged since our first disc. We have continued to make additions to the search options, but have not had to remove any options. Our systems are streamlined to the point that the speed at which we can keyword new images has nearly doubled since we started. Thus, our initial investment has proved to be highly cost-effective. Craig and Daphne Aurness continue to assume the central role of overseeing QUESTock’s evolution, maintaining its dominant position as the most innovative commercial image search system ever devised. Behind the scenes, writer-editor Richard Carson has been involved in all aspects of keywording labor. Richard, whose relationship with Westlight dates back to 1985, conceives and maintains most of the in-house systems we use to enter and process keywords. These systems make use of multiple databases and programs written in Microsoft FoxPro. This use of a full-strength database program, outside the environment of the image browser, is the key to our success in maintaining keyword quality. Richard continues to do most of our keywording but has introduced quality control measures into our programs, so that other workers can enter keywords without deviating from our stylistic standards. © 1998 Westlight Page 2 of 18 11/12/2016
  • 3. Summary of keywording issues This book shows how we have addressed the following issues related to keywording. While they are not all equally important, at least some of them should sound familiar to anyone who has been involved in a keywording project. • Quickly generating a few basic keywords for in-house use, before sending the images on to the keyworders. • Getting data from similar images that have already been keyworded, both to avoid duplication of effort and to maintain consistency. • Calling up a list of suggested words in response to a common photo subject. • Automatically generating words—both mandatory and suggested—based on data the image already has. • Double-checking keywords against related information in other database fields. • Maintaining consistency in spelling. • Changing the appearance of the keywords for use in projects having different editorial standards. • Identifying synonym pairs so the system will respond to either keyword in the same way. • Translating the keywords for distribution to overseas markets. • Controlling the placement of geographical information so that, for example, a park is not identified as a city. • Identifying keyword subsets so that only certain words are exported, as when a single keyword field is to be broken up into multiple fields. • Placing keywords that have never been used before in a temporary file so they can be reviewed by an editor. • Controlling the flow of keywords as they pass through work stages requiring the use of different programs. • Using statistical output to determine how often one word appears with another and to identify cases of over- or under-usage. • Placing words in linguistic categories so they can be sorted in various ways on reports. © 1998 Westlight Page 3 of 18 11/12/2016
  • 4. Preliminary steps before keywording As mentioned earlier, Westlight already had been using computers for many years before stock agencies began scanning and keywording images. Therefore, our images already were passing through work stages that involved data entry. The only difference is that the data entry involved information exclusively for our use and not directly accessible to clients. For example, we used computers to trace an image’s duping status and other facts related to the handling of the physical transparency. When we introduced keywording, we created a new department in our company for it. We did not allow this new work step to intrude on the existing data entry tasks to which the library workers had become accustomed. Transparencies make their way into our library as fast as they always have. Indeed, a new image usually is in our files and accessible to clients some weeks before it has been fully keyworded. We found that our library workers had all along been entering certain information about images that suggested keywords. For example, they were entering the photographer’s name, the name of the physical file in which they planned to put the transparency, and the image’s orientation. We found ways of pulling this information so that images that have not yet been formally keyworded can be combined in the in-house browser with those that have. Thus, our researchers immediately have some way of finding the image electronically while they wait for the keyworder to provide more detailed information. One of the very first facts to be entered about an image is the name of the physical file into which we intend to place the transparency (our library is broken down into more than 2,200 named files). We use this name to generate as many keywords as we can determine from it. 
For example, if the image is being filed under “Monument Valley,” we know that in addition to this term, it also can take the keyword “deserts.” These keywords are temporary and will be overwritten by whatever data the keyworder later provides. The database that contains the above information is separate from the one that contains finished keywords. When we output image data for in-house use by our researchers, the output routine looks in both databases for the image. If it finds the image in the keyword database, it generates finished data; if it doesn’t find the image there, it gets rudimentary keywords from the other database. The workers who enter the file subjects also have certain limited keywording options. They can choose a few commonly used words from two lists of major categories and subcategories. They also can enter the geographical location. At this point, the level of keyword quality need not be high. The person formally in charge of keywording the image (usually Richard Carson) will determine what keywords it should have, and these initial keywords will not be used again. © 1998 Westlight Page 4 of 18 11/12/2016
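The two-database lookup described above can be sketched in a few lines of Python. This is only an illustration: the original system was built in FoxPro, and the names `FILE_KEYWORDS`, `keywords_for_output`, and the record layout are hypothetical stand-ins, not Westlight's actual structures.

```python
# Hypothetical table: physical file name -> temporary keywords it implies.
FILE_KEYWORDS = {
    "Monument Valley": ["Monument Valley", "deserts"],
}

def keywords_for_output(image_id, keyword_db, intake_db):
    """Return (keywords, is_finished) for one image.

    keyword_db holds finished keywords; intake_db holds the library
    workers' data entry (both are plain dicts in this sketch).
    """
    if image_id in keyword_db:              # finished data always wins
        return keyword_db[image_id], True
    record = intake_db.get(image_id)
    if record is None:
        return [], False
    # Rudimentary keywords: whatever the file name implies, plus any
    # category or location words the library worker chose at intake.
    words = list(FILE_KEYWORDS.get(record["file"], [record["file"]]))
    words += record.get("extra", [])
    return words, False
```

Because the finished database is checked first, the temporary words never need to be cleaned up; they simply stop being used once the keyworder's data exists.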
  • 5. Quick keywording of similars
As the number of digital images grows, the chances increase that a new image will closely resemble an older one. Similar images need to have similar keywords. Clients will not trust a search system that reveals great differences in the keywords of obviously similar images; they want to know that a single search will find them all. At the same time, we don’t want to have to go to great pains to compare all such images to be sure we are giving them the same keywords.
Our keyworders don’t have to track down and study the data for an older image every time they encounter a new image that is similar. All they need to know is the number of the older image. If the new image is a close similar of one that has been keyworded before, the keyworder can simply enter the number of the older image that it resembles. The program displays the title of the older image and asks for confirmation. It then copies all of that image’s keywords to the new image. Only visual descriptions are copied; in-house statistics unique to each image are unaffected. The photographer’s name is not copied, so this method can be used with similar images by different photographers.
This “cloning” option is especially useful when many similar images are being worked on in succession. Even when the images are not identical, this option can be used as a starting point. For example, the keyworder may have a series of abstract background patterns that differ only in color. The data from the first such image can be copied to all the others, and then the data relating to color can be changed manually, perhaps by another worker who is in charge of entering colors.
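The cloning step can be pictured with a short Python sketch. The `ImageRecord` fields and the `clone_keywords` function are hypothetical names chosen for illustration; the text only tells us that keywords are copied after confirmation, while the photographer and in-house statistics are left alone.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    number: int
    title: str
    photographer: str
    keywords: list = field(default_factory=list)  # visual description
    times_sold: int = 0                           # example in-house statistic

def clone_keywords(new, images, source_number, confirm):
    """Copy keywords from an older image after the keyworder confirms its title."""
    source = images[source_number]
    if not confirm(source.title):       # keyworder rejected the match
        return False
    new.keywords = list(source.keywords)  # copy, so later edits stay separate
    return True                         # photographer and statistics untouched
```

Copying the list rather than sharing it matters for the color example above: the cloned keywords can be edited per image without altering the original.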
  • 6. Getting keywords based on subject In addition to our physical filing system, we have constructed a separate system of subject codes exclusively for keywording, based on what kinds of images we have tended to select for digital scanning. This system is more visually oriented and makes finer distinctions between certain compositional elements that specifically affect keywording. For example, out on the floor we may have just one physical file for all mountain climbing shots, whereas when keywording we will distinguish between images that contain one and two people, because of the different concepts suggested by each. The subject codes are organized into a hierarchy that can be represented by an outline, rather like the Dewey Decimal System (except that our system doesn’t have to account for every possible subject, only those in which we specialize). These codes act as macros that call up short lists of selected keywords associated with the subject. The use of codes is entirely optional, but helpful. At his or her discretion, the keyworder can determine what subject code (if any) applies to the image and then enter that code to get a list of suggested keywords. He or she then can deselect any unwanted words before appending the words to the image. (Alternatively, the keyworder can simply accept all the words and wait until a later step before examining them.) More than one code can be entered. Though we store the last code entered, it is only a means of copying keywords and generally is never referred to again. Thus, we can make updates to the coding system, possibly making some codes obsolete in the process, without affecting the data for older images. Appendix A shows our subject coding system in detail. The short list is for the keyworder’s reference while working and shows only enough information to allow all codes to be seen at a glance. The long document reveals the entire contents of the code database, showing the list of words that each code calls up. 
This system is entirely flexible and enables us to add, delete, and move codes or words as our knowledge improves. Changes to the codes require that the keyworders be given updated lists, but changes to the words do not.
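A subject code acting as a macro might look like the following Python sketch. The code strings and word lists are invented for illustration (the real ones are in Appendix A), and `append_suggestions` is a hypothetical name.

```python
# Hypothetical codes; the real hierarchy and word lists are in Appendix A.
SUBJECT_CODES = {
    "SP.CLIMB.1": ["mountain climbing", "solitude"],
    "SP.CLIMB.2": ["mountain climbing", "pairs", "teamwork"],
}

def suggest(code):
    """Return the short list of keywords a subject code calls up."""
    return list(SUBJECT_CODES.get(code, []))

def append_suggestions(keywords, code, deselected=()):
    """Append the suggested words, minus any the keyworder deselected."""
    for word in suggest(code):
        if word not in deselected and word not in keywords:
            keywords.append(word)
    return keywords
```

Only the words are written to the image, not a dependency on the code, which is why the code table can be reorganized later without touching older images.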
  • 8. Geographical locations
For reasons that will become clear, we ask our keyworders to look at the location information for the image, if applicable, before proceeding to the keywords field. Locations carry with them a whole group of considerations that other keywords do not have:
• They are numerous (about a third of all keywords).
• They occur in clusters rather than individually: “Hollywood, Los Angeles, California, United States, North America.” Some sort of automation is required to ensure that all terms in the cluster will be included every time.
• Any attempt to keep track of them by categorizing them creates problems with proper placement. For example, creating a separate data field for cities results in terms being put there that are not cities. (Westlight has separate fields for cities, states, countries, and continents, plus a special fifth field we call region, and many location terms are not any of these five.)
• Many terms are ambiguous; for example, Victoria is a city in both Canada and China (Hong Kong) and a state in Australia.
Westlight has vigorously tackled the problem of monitoring the constant creation of new location terms, and we believe we have it largely under control. Part of our success is due to our creating a temporary storage space for newly created terms, a technique we discuss in the next section.
Deciding when to include the location
Even when an image’s location is known, it is not necessarily relevant. For example, on the CDs we send to clients, we generally omit the location information for generic indoor shots of offices or laboratories. Nevertheless, we have allowed ourselves the option of changing our minds by creating a storage area for this information that does not require us to include it. As stated above, our database contains separate fields for city, state, country, continent, and region. If the photographer has provided us with such information, we always enter it in these fields.
However, this information will not be included in anything we release to clients unless we take the additional step of copying it to the keywords field. Copying the data results in the automatic addition of the keyword “locations.” Our output program looks for this word and interprets it to mean that the contents of the five location fields are to be included in the output. If we decide that including the location was a mistake, we can delete the location terms from the keywords field. The location information will no longer be generated, but we still have it stored in the separate location fields for possible later use.
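The role of the “locations” keyword as a switch can be sketched as follows in Python. The record layout and function names are assumptions, and so is the choice to leave the marker word itself in the output; the text does not say whether the real output routine strips it.

```python
LOCATION_FIELDS = ("city", "state", "country", "continent", "region")
MARKER = "locations"

def include_location(record):
    """Copy the filled-in location fields to keywords and add the marker word."""
    for name in LOCATION_FIELDS:
        value = record.get(name)
        if value and value not in record["keywords"]:
            record["keywords"].append(value)
    if MARKER not in record["keywords"]:
        record["keywords"].append(MARKER)

def output_keywords(record):
    """Emit keywords; location fields are included only if the marker is present."""
    words = list(record["keywords"])
    if MARKER in words:
        for name in LOCATION_FIELDS:
            value = record.get(name)
            if value and value not in words:
                words.append(value)
    return words
```

The location fields themselves are never modified here, which mirrors the point made above: the stored location survives even when it is left out of what clients see.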
  • 8. Controlling proper entry of locations
Because Westlight maintains five separate fields for location data, we have to make sure not only that terms are spelled correctly, but also that they are placed in the correct one of the five fields. When clients click on the country field, they expect to see only countries and only one spelling for each country.
Any time we enter location data, our spell checking database is active. If we misspell a name and the spell checker recognizes the misspelling, the spelling will be corrected on the spot. If the misspelling is not in the spell checker, it probably will be flagged as a “new” word and stored in a cache where it later can be evaluated and possibly added to the spell checker. The spell checker recognizes all two-letter postal abbreviations and replaces them with fully spelled-out names, thereby allowing the keyworder to enter U.S. states quickly. It also converts certain other abbreviations, such as “USA,” “NYC,” “L.A.,” and “S.F.”
After checking the spelling, the data entry routine then looks at our thesaurus. This database generates new words in response to existing words; for example, it guarantees that every image that says “Los Angeles” also will say “California.” (See “Other additions to keywords” and Appendix D.) The thesaurus is where we store the information about whether a term is a continent, country, state, city, or region. A number code from 1 to 5 indicates which it is. The data entry routine places the word in the appropriate field based on the number it finds here. The keyworder needs only to enter the minimum that he or she knows the routine will recognize, and it will fill in everything else.
For example, with Los Angeles, the keyworder need only enter “L.A.” The spell checker will change this to “Los Angeles,” and the thesaurus then will fill in all the fields based on “Los Angeles.” The original entry “L.A.” doesn’t even have to be put in the right field; no matter where it is entered, all five fields will be filled in correctly. For that matter, the original entry doesn’t have to belong in any of the five fields at all. For example, “Grand Canyon” does not fall into any of our five categories. With Grand Canyon images, “Arizona” is the most specific term that can be entered in the location fields. Nevertheless, if the keyworder incorrectly enters “Grand Canyon” as, say, a city, the thesaurus not only will add “Arizona” and all other terms that derive from it, but also will delete “Grand Canyon” (because that term does not show a number code from 1 to 5). When the keyworder sees the original term disappear, he or she knows that it can be entered only in the general keywords field and not in one of the special location fields. With locations that are obscure or ambiguous, the keyworder may have to enter more information. For example, entering only “Victoria” will not work. This is an ambiguous name that has been purposely left out of the thesaurus, so no additional terms will appear. But “Victoria, BC” will work, because the routine does recognize “BC.” In such a case the keyworder must be responsible for putting the unrecognized term in the right field; the thesaurus doesn’t know where it belongs, and if it’s in the wrong field, the thesaurus probably will overwrite it with something else that belongs there. Few situations require the entry of more than two terms. (Westlight has given thought to the idea of attaching codes to ambiguous terms that would specify their meaning, so that the thesaurus can process them.) © 1998 Westlight Page 8 of 18 11/12/2016
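The interplay of spell checker and thesaurus during location entry can be illustrated with a small Python sketch. The tables below are tiny invented samples, the assignment of codes 1 to 5 to particular fields is an assumption (the text gives only the range), and the sketch leaves out the region field and the handling of unrecognized terms that the keyworder places by hand.

```python
SPELL = {"L.A.": "Los Angeles", "BC": "British Columbia"}

# term -> (field code or None, terms the thesaurus generates from it)
THESAURUS = {
    "Los Angeles": (4, ["California"]),
    "California": (3, ["United States"]),
    "Arizona": (3, ["United States"]),
    "United States": (2, ["North America"]),
    "North America": (1, []),
    "Grand Canyon": (None, ["Arizona"]),   # real place, but not one of the five kinds
}
FIELD_BY_CODE = {1: "continent", 2: "country", 3: "state", 4: "city", 5: "region"}

def enter_location(term):
    """Fill the location fields from one entered term, wherever it was typed."""
    fields = {}
    pending = [SPELL.get(term, term)]      # spell check first
    seen = set()
    while pending:
        word = pending.pop()
        if word in seen or word not in THESAURUS:
            continue                       # e.g. ambiguous "Victoria": nothing added
        seen.add(word)
        code, generated = THESAURUS[word]
        if code is not None:
            fields[FIELD_BY_CODE[code]] = word
        pending.extend(generated)          # follow the chain upward
    return fields
```

Under these assumptions, `enter_location("Grand Canyon")` fills in the state, country, and continent but returns no entry for “Grand Canyon” itself, which corresponds to the keyworder seeing the term disappear.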
  • 9. Viewing and editing keywords Only after all the preceding matters have been addressed is the keyworder ready to look at the contents of the keywords field. The keyworder selects the keywords field and sees all the keywords that have been created so far (scrolling if necessary): those generated by any subject codes that were entered, the location if it was added (plus the word “locations”), and any other words that may have been put there by someone else. The keyworder spends as much time in this screen as necessary, deleting unwanted words and adding others. It is not necessary to preserve alphabetical order. An experienced keyworder can enter fewer words, knowing that the program will automatically generate some. When the keyworder is satisfied with the list as it stands, he or she exits this screen. Doing so launches a series of programs that evaluate the keywords. Spell checking The keywords are spell checked before anything else is done with them (see Appendix C). The subsequent programs then will base their actions on the correct spelling. If a word is one we have decided to forbid (usually a vague term such as “life” or “weather”), the spell checker will delete it altogether by replacing it with a blank. As with any spell checker, there are some things it cannot do. For example, it does not know if the word “painting” refers to an art object or to the act of painting a house. Therefore, it does not change the word to “paintings,” nor will the thesaurus later add “paintings.” Instead, such words are handled by the program that follows. Consistency with other fields The next program looks for any of a group of problem words and compares them with other fields in Westlight’s QUESTock database structure, flagging any conflicts. This program is analogous to a grammar checker. 
For example, if the keyword list contains the word “men,” the program looks at our “age” field to see if it is marked “adults,” and at our “gender” field to see if it is marked either “male” or “both.” If a discrepancy is found, the program asks the keyworder what to do: delete the keyword or correct the other field. In this case, the keyworder would have no other options. The program also makes optional suggestions. For example, we use the keyword “pairs” for any image containing two people, but we also use it in other senses. Upon finding this word, the program will check the “number of people” field to see if it says “two.” If it doesn’t, the program will offer to change it, but will not require any changes. After checking the other fields, the program looks for certain pairs of words that contradict each other. For example, if an image of Florida or Bermuda has the keyword “tropics,” the program tells the keyworder that this is not a tropical location and that the
  • 10. image would more appropriately take our special keyword “generic tropics” that allows for nonliteral searches. If the shot is not generic in nature, “tropics” can be deleted altogether. Finally, the program looks for singular nouns that may or may not take a plural form. For example, if the image has the keyword “aspen,” the program asks if this is a city in Colorado or a tree. If it is a city, the program does nothing; if it is a tree, the program adds an ‘s’ to it. Any such occurrence that results in a change will cause the program to run through another cycle. When the program has run through an entire cycle without any changes being made, it stops. Trapping new words The next program compares the keywords to our vocabulary database. This database, which is updated regularly, lists every keyword we have ever used, plus some (mostly locations) that have not been used but have been approved for use should they occur. (See Appendix H.) If a keyword isn’t found in the vocabulary, it must be either (1) a misspelling that the spell checker didn’t catch, (2) a weak choice that the keyworder ought to reconsider, or (3) a legitimate new term that should be added to the vocabulary. A popup appears showing the unrecognized word and asking the keyworder to categorize it. One of the options is to correct a typing error. If the word is a simple typo, the keyworder can call it up and correct it; or if it is decided that the word is no good, it can be deleted. (An additional option, available only to experienced workers, adds the typo to the spell checker so that it will be automatically corrected if it ever occurs again.) If the word still isn’t recognized even after it is corrected, the popup appears again. If the word is spelled correctly and not recognized, the keyworder can choose one of about 20 major categories to attach to the image, and possibly a subcategory. This option is useful mainly for locations. 
The keyworder can here specify that the new term describes, say, a city in Italy, which will help the editor later when the term is evaluated. On the other hand, the keyworder can elect to let the editor decide by choosing “Don’t know.” The word and its category are appended to a small database that duplicates the field structure of the main vocabulary database. This small database acts as a cache, or trap, for all keywords that have been created since the last time the vocabulary was updated. The keywording program recognizes words that currently are in the trap, so if the keyworder plans to use this same word again with the next image, he or she will not have to go through this step again. Our vocabulary database is tightly controlled; it would become an ungovernable mess if everyone were adding words to it directly. The trap allows the keyworder to use the new keyword, but he or she has merely submitted it for consideration; it will not permanently become part of the vocabulary database until it passes inspection. © 1998 Westlight Page 10 of 18 11/12/2016
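The trap can be sketched as a second, small dictionary sitting beside the vocabulary. In this Python illustration the names `check_keywords` and `merge_trap` are hypothetical, `categorize` stands in for the popup shown to the keyworder, and the editor's clean-up of the trap is assumed to have happened before `merge_trap` is called.

```python
def check_keywords(keywords, vocabulary, trap, categorize):
    """Send any keyword found in neither the vocabulary nor the trap to the trap."""
    for word in keywords:
        if word in vocabulary or word in trap:
            continue                      # already known or already submitted
        trap[word] = categorize(word)     # e.g. "city in Italy" or "Don't know"

def merge_trap(vocabulary, trap):
    """Editor's step: once the trap holds only clean data, append it and clear it."""
    vocabulary.update(trap)
    trap.clear()
```

Because the keywording check consults the trap as well as the vocabulary, a new word prompts the keyworder only once, yet the vocabulary itself changes only when the editor merges.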
  • 11. Richard periodically inspects the trap, putting misspellings into the spell checker and cleaning up the categories for the remaining words. When the trap contains only clean data, it is appended to the vocabulary and cleared out. Note that if the keyworder is found to have entered misspellings, it is not necessary to hunt down that person’s work and correct it. It is enough to put the misspellings in the spell checker. All keywords are spell checked once more before being released to clients, so the misspellings will be corrected no matter where they exist in the system. Suggesting additional words Once the program is able to recognize all the keywords, it alphabetizes and redisplays them in their current, spell-checked form. It then generates further keyword suggestions based on these existing words. The program finds each of the current keywords in the vocabulary database, except those that have been trapped. For each keyword, it pulls the contents of four fields that contain lists of additional words. (How these lists get written is explained under “Getting statistics on keyword usage.”) When it has pulled all the lists for all the words, it sorts them, removes duplicates (including words the image already has), and displays the resulting compilation as three lists. The first list contains compulsory keywords. For example, if “apples” was one of the original keywords, but “fruits” was not, then “fruits” will show up in the compulsory list (with “apples” next to it to show where it came from). Compulsory words are inseparable from the word that generated them; they cannot be deleted unless the other word also is deleted. Normally there should be no problems with any of the compulsory words, but the keyworder should look at them, to confirm both the legitimacy of the original keywords and our judgment regarding compulsory status. The other two lists contain optional keywords. 
They are divided according to probability: the words in one list are considered more likely to apply to the image than the words in the other list. (If one of the original keywords gave a word high probability and another gave it low probability, it is considered to have high probability.) The keyworder can go back and forth between these lists as many times as he or she wants, selecting and deselecting words. When this process is finished, the selected words are appended to the image. At this point the keyworder can move on to the next image. However, the selection process may have prompted ideas for even more words or revealed errors in the original words. In that case, the keyworder can open the keywords field again and repeat the entire procedure. Appendix B shows sample excerpts from the vocabulary database. For each excerpted word, the lists of compulsory and optional words are shown. These lists are stored in the form of FoxPro memo fields. © 1998 Westlight Page 11 of 18 11/12/2016
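The compilation of the three lists can be sketched in Python as below. The vocabulary layout (four list fields named `synonyms`, `compulsory`, `high`, and `low`) is an assumption based on the description, and the sample data used to exercise it would be invented, not taken from Appendix B.

```python
def suggest_words(keywords, vocab):
    """Return (compulsory, high, low) suggestions for an image's keywords.

    compulsory maps each generated word to the keyword it came from;
    high and low are sorted lists of optional words.
    """
    have = set(keywords)
    compulsory, high, low = {}, set(), set()
    for word in keywords:
        entry = vocab.get(word)
        if entry is None:                 # trapped words have no lists yet
            continue
        for extra in entry.get("synonyms", []) + entry.get("compulsory", []):
            if extra not in have:
                compulsory.setdefault(extra, word)
        high.update(entry.get("high", []))
        low.update(entry.get("low", []))
    high -= have | set(compulsory)
    low -= have | set(compulsory) | high  # high probability wins over low
    return compulsory, sorted(high), sorted(low)
```

Subtracting `high` from `low` implements the rule that a word given high probability by one keyword and low probability by another is treated as high.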
  • 12. Other additions to keywords Words generated by other databases The output routine that formats our keywords for publication uses the same spell checker and thesaurus that are used during data entry. The spell checker changes any misspellings it finds, and the thesaurus then generates additional compulsory words based on the words that are there. Because we always spell check the keywords before publishing them, we can use the spell checker to make global changes. If we decide to change the way we spell a word, we can simply have the spell checker make the change instead of tracking down every occurrence of the word and changing it manually. Of course, we also have to make sure the new spelling is reflected in the thesaurus. Appendix C shows an excerpt from the spell checker. If the spelling on the left is found, it is replaced with the spelling on the right. These two fields are all that the spell checker consists of. Note that forbidden keywords are replaced with blanks, which the output routine ignores. The thesaurus makes essentially the same additions that the “compulsory” word field (Appendix B) makes during data entry. Because the contents of the compulsory field may change, the thesaurus assures that all images will reflect the current state of the compulsory field. Also, the thesaurus is more thorough because it goes through two cycles; after generating words, it then creates second-generation words from the words it generated the first time. Additionally, the thesaurus fills in the location fields in a manner similar to that described earlier under “Controlling proper entry of locations.” For example, someone may have entered “Arizona” in keywords but neglected to enter anything in the location fields. The thesaurus will automatically fill in the location fields based on “Arizona,” as well as adding these terms (plus the keyword “locations”) to keywords. Appendix D contains two views of the thesaurus. 
The first list is alphabetized on the found keyword; this list shows which words the thesaurus generates if it finds the boldfaced word. The second list is alphabetized on the generated keyword; it shows all the ways the boldfaced word might be generated. If the thesaurus reflects any poor decisions on our part, a list like the second one helps us diagnose the problem. Getting rid of a bad keyword may be as simple as deleting a single entry from the thesaurus. Words generated by other fields In the last section we discussed comparing certain keywords to other fields, such as making sure that the keyword “men” agrees with what is in the age and gender fields. Westlight’s current database structure creates about 40 such situations where keywords must agree with other fields. Most such keywords do not have to be manually entered in the keywords field at all. The data entry program looks for them as a quality control measure, to ensure that they
  • 13. are correct where they exist; but if they don’t exist, it is not necessary to add them. As long as the other field that they relate to has been properly filled in, these words will be automatically added to the keywords field. For example, if our “season” field specifies “spring,” that word will automatically be added to the keywords during output if it is not already there. An experienced keyworder, who knows that this automatic addition will take place, will not bother to enter the season as a keyword, unless he or she has been specifically instructed to do seasons. Words such as “men” and “women” that describe people are a little different because the program cannot always be sure they apply. If the age field specifies only adults, then “men” and/or “women” can be safely added; but if the age field also specifies children, an ambiguity results. In such cases “men” and “women” would be among the words offered as optional during keywording (see “Suggesting additional words” and Appendix B). Our keyworders are advised to become familiar with the 40 keywords that relate to other fields, so that they know not to expend effort on them unless they are told otherwise. © 1998 Westlight Page 13 of 18 11/12/2016
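As a rough Python sketch of how output might derive keywords from other fields: the field names follow the text, but the value “female” for the gender field and the function name `add_field_keywords` are assumptions, and only two of the roughly 40 field relationships are shown.

```python
IMPLIED_BY_GENDER = {"male": ["men"], "female": ["women"], "both": ["men", "women"]}

def add_field_keywords(record):
    """Add keywords that other fields already imply, skipping ambiguous cases."""
    words = list(record["keywords"])
    season = record.get("season")
    if season and season not in words:
        words.append(season)              # e.g. "spring"
    # "men"/"women" are safe only when the age field says adults and nothing else.
    if set(record.get("age", [])) == {"adults"}:
        for word in IMPLIED_BY_GENDER.get(record.get("gender"), []):
            if word not in words:
                words.append(word)
    return words
```

When the age field also lists children, the sketch adds nothing automatically; in the process described above, those words would instead be offered as optional suggestions during keywording.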
  • 14. Synonyms and variants Westlight keeps track of word pairs and groups that essentially should be treated as one word. For example, we want all our programs to respond the same way regardless of whether the keyword field contains “automobiles” or “cars.” We have several ways of controlling such word pairs. In the thesaurus (Appendix D), each pair is represented by twin entries; for example, automobiles→cars and cars→automobiles. Regardless of which word is entered, the other word will be included in the output. In the vocabulary database, both words contain the same data in their records, so that the same lists of compulsory and optional words will be pulled regardless of which is entered. One of the four list fields in the vocabulary database is exclusively for synonyms (see Appendix B). Usually this field contains only one word, but some words have two synonyms, such as “cougars/mountain lions/pumas.” Synonyms are considered a special kind of compulsory word and are simply put into the compulsory list during keywording. Giving them their own field helps us to remember that any changes we make to a word’s data need also to be made to its synonym’s. An additional field allows us to choose one synonym as a primary spelling and make the other synonyms subordinate to it. When we print reports, we can choose to suppress the subordinate synonyms so that only one version of the term is printed. © 1998 Westlight Page 14 of 18 11/12/2016
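The twin-entry idea and the primary-spelling field can be sketched in Python as follows. The function names and the in-memory dictionaries are hypothetical; the word groups are the ones mentioned in the text.

```python
def add_synonym_group(thesaurus, words):
    """Store twin entries: every word in the group points to all the others."""
    for word in words:
        thesaurus.setdefault(word, set()).update(w for w in words if w != word)

def expand(keywords, thesaurus):
    """Output every keyword plus its synonyms, whichever one was entered."""
    out = list(keywords)
    for word in keywords:
        for synonym in sorted(thesaurus.get(word, ())):
            if synonym not in out:
                out.append(synonym)
    return out

def for_report(keywords, primary_of):
    """Collapse subordinate synonyms to their primary spelling for printing."""
    return sorted({primary_of.get(word, word) for word in keywords})
```

For example, after `add_synonym_group(t, ["automobiles", "cars"])`, both `expand(["cars"], t)` and `expand(["automobiles"], t)` contain the same two words, only in a different order.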
  • 15. Adapting the keywords to different styles Sometimes Westlight’s images are combined with images from other agencies, as when we submit them for inclusion on the Stock Workbook’s CDs. In such cases we are expected to have our keywords adhere to editorial guidelines that may differ from those we use for our proprietary QUESTock editions that we publish ourselves. The Picture Agency Council of America (PACA) did not release an official statement on keywording style until 1995, a year after Westlight released QUESTock to the market. As PACA-style keywording has become more common, Westlight has begun including two search engines on our CDs: one in our classic QUESTock style, and one in PACA style for clients who are used to that or who want something simpler. We continue to market our QUESTock interface aggressively (and successfully) because its fragmented database structure provides a flexibility that cannot be accommodated by PACA standards. To conform to PACA standards, we maintain a separate spell checker and thesaurus that work together to change the appearance of the keywords. When we process the keywords, we simply specify that the alternate spell checker and thesaurus are to be used. The PACA spell checker is much bigger than our normal one because it contains all of our “correct” spellings in addition to all of our recognized misspellings. Besides correcting misspellings, it also converts our normally uppercase keywords to upper and lower case. Appendix E shows an excerpt. The alternate thesaurus contains many more synonym pairs than our normal one. For example, normally we restrict the number of singular nouns we include, by having our spell checker change singulars to plurals. The Stock Workbook is more inclusive and asks for both singulars and plurals. Our alternate thesaurus contains many twin entries for such variations. 
It also has an “expand” field that we use to mark these “extra” spellings, so that we can suppress them should the occasion arise. Appendix F shows an excerpt. Because we regularly add new words to our vocabulary, we need to do regular updating of our alternate spell checker so it will convert the new words to upper and lower case. We periodically run a program that compares the PACA spell checker and thesaurus to the QUESTock ones, looking for potential new entries and putting them into a temporary work space. If we overlook anything, with luck the effect will be only aesthetic, as the Stock Workbook’s browser is not case-sensitive. © 1998 Westlight Page 15 of 18 11/12/2016
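A style conversion of this kind could look like the Python sketch below. The table contents are invented samples (the real ones are excerpted in Appendices E and F), and representing the “expand” field as a boolean flag on each generated word is an assumption.

```python
# Alternate spell checker: correct AND incorrect uppercase spellings -> mixed case.
PACA_SPELL = {
    "AUTOMOBILES": "Automobiles",
    "AUTOMOBLIES": "Automobiles",     # recognized misspelling
    "LOS ANGELES": "Los Angeles",
}
# Alternate thesaurus: word -> [(generated word, expand flag)].
PACA_THESAURUS = {"Automobiles": [("Automobile", True)]}

def to_paca(keywords, include_expanded=True):
    """Restyle keywords, optionally suppressing the 'extra' expanded spellings."""
    out = []
    for word in keywords:
        styled = PACA_SPELL.get(word, word)   # unknown words pass through as-is
        extras = [g for g, expand in PACA_THESAURUS.get(styled, [])
                  if include_expanded or not expand]
        for candidate in [styled] + extras:
            if candidate not in out:
                out.append(candidate)
    return out
```

A word missing from the alternate spell checker simply stays in uppercase, which matches the remark above that an overlooked entry has only an aesthetic effect.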
  • 16. Foreign translations Westlight has taken on the ambitious task of translating its keywords and menu options into six foreign languages for international distribution. The translations are being provided by our foreign agents and by our bilingual employees. Our translation database contains not only all our keywords but also all the terms that appear in our other QUESTock search fields. Each entry has a field to store a translation in each of the six languages. If the foreign language has two or more equally valid synonyms for the same English word, they all can be stored in one field, delimited with commas, as space permits. Appendix G shows an excerpt. Various other fields help to classify the terms so that, for example, keywords can be isolated from other terms, or locations from other keywords. We can give our translators sorted lists, grouping similar terms together to aid their comprehension. This database is used by a special output program that formats our keywords in the usual manner, then looks up their translations and makes all the necessary substitutions. The result is more accurate than that produced by over-the-counter translation programs, which work poorly with mere lists of words that have no grammatical context. For example, generic translators tend to translate many plural nouns, such as “controls,” as if they were third person singular verbs. Because we know that we are not using the word as a verb, we can guarantee that our translator will never interpret it as one. Our customized translator has only one option, the one translation that we know is always correct. Nevertheless, even among our keywords there are ambiguities. When we have used an English word in more than one sense, we need a way of handling the multiple possibilities. Such words have multiple records in our translation database, one for each meaning we have used. Each record contains the same English word but different translations of it. 
A separate “definition” field contains a short description of which usage is being addressed. At present our customized translator has no way of knowing which translation to use. It can, however, identify ambiguous words by looking to see if the definition field has anything in it. If the word is ambiguous, the translator keeps it in English, but flags it by putting an asterisk in front of it. When we see the finished output, the ambiguous words are all at the top of the alphabetized list, and we can translate them manually by referring to a printout of the translations. Westlight is exploring the possibility of attaching codes to ambiguous keywords at the time they are originally entered, specifying their meaning, so that the translation program will know from the code which translation to use. © 1998 Westlight Page 16 of 18 11/12/2016
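The substitution and flagging behavior can be sketched in Python. The rows, the `fr` column, and the sample words are illustrative assumptions (the text does not name the six languages), and leaving untranslated words in English is also an assumption.

```python
# Hypothetical rows; an ambiguous word has one row per meaning, each with a definition.
ROWS = [
    {"english": "trees", "definition": "", "fr": "arbres"},
    {"english": "mountains", "definition": "", "fr": "montagnes"},
    {"english": "seals", "definition": "marine animals", "fr": "phoques"},
    {"english": "seals", "definition": "official stamps", "fr": "sceaux"},
]

def translate(keywords, rows, lang):
    """Substitute translations; keep ambiguous words in English with a leading '*'."""
    by_word = {}
    for row in rows:
        by_word.setdefault(row["english"], []).append(row)
    out = []
    for word in keywords:
        matches = by_word.get(word, [])
        if any(row["definition"] for row in matches):
            out.append("*" + word)        # flagged for manual translation
        elif matches:
            out.append(matches[0][lang])
        else:
            out.append(word)              # no translation on file
    return sorted(out)
```

Since the asterisk sorts before letters, the flagged words collect at the top of the alphabetized output, where they are easy to find and translate by hand.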
  • 17. Keyword subsets Westlight’s QUESTock CDs contain as many as 50 database fields on which clients can search. Some of these fields are essentially pared-down versions of the keywords field. For example, the “major subject” field contains a short list of selected keywords, such as “sports,” that are analogous to the major divisions of a printed catalog. The “concept” field lists abstract words such as “success” that are especially useful to advertisers. All of the words in these fields also can be found in the keywords field, just like any other keyword; but because they are among the most common choices, the special fields act as a way of screening out all other keywords, making them easier to find. These special fields do not exist in the database in which we store finished keywords, and our keyworders normally pay no attention to such categories. The fields are created by the output routine we use to format the keywords for inclusion on our CDs. The program looks for these specific words among the keywords, and any that it finds are copied to the new field that it creates. Several tiny databases contain the keywords that are to be isolated. A “subjects” database contains the approximately 30 keywords we have selected to be major subjects. A “concepts” database contains 200 selected concepts, and so on. The output routine compares an image’s keywords to the words in these databases. If the image has a keyword that is in the subjects database, that word is copied to the subjects field in addition to being left in the keywords field. Whenever we run the output routine, any changes we have made to the keywords will be automatically reflected in the special fields, which are newly created each time. The same process works in reverse. Occasionally our workers may do rudimentary keywording within the image browser of images that have not yet been added to the FoxPro database. 
In so doing, they may use the special fields as an easy way of selecting major subjects and concepts, rather than entering them in the keywords field. Then they export the image data out of the image browser and import it into the FoxPro database. The import routine takes whatever words are in the special fields and puts them into the keywords field.
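Both directions of this process can be sketched in a few lines of Python. The sketch assumes the small "subjects" and "concepts" databases can be modeled as sets of words; the function names and the sample vocabulary are hypothetical and stand in for the FoxPro routines, which this paper does not reproduce.

```python
# Hypothetical stand-ins for the small "subjects" and "concepts" databases.
SPECIAL_FIELDS = {
    "subjects": {"sports", "animals", "travel"},
    "concepts": {"success", "teamwork", "freedom"},
}


def build_special_fields(keywords):
    """Output direction: rebuild each special field from the keywords.

    Matching words are copied, not moved, so they also stay in the keywords.
    """
    return {
        field: [word for word in keywords if word in vocabulary]
        for field, vocabulary in SPECIAL_FIELDS.items()
    }


def merge_special_fields(keywords, special_fields):
    """Import direction: fold words entered in special fields into the keywords."""
    merged = list(keywords)
    for words in special_fields.values():
        for word in words:
            if word not in merged:  # avoid duplicating an existing keyword
                merged.append(word)
    return merged


fields = build_special_fields(["runner", "sports", "success"])
keywords = merge_special_fields(["runner"], {"subjects": ["sports"], "concepts": ["success"]})
```

Because the special fields are rebuilt from the keywords on every run, they never need to be stored or kept in sync by hand.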
  • 18. Getting statistics on keyword usage
Several of our programs involve calculating the frequency with which a given keyword occurs among the images in our image database. The most obvious use of this information is simply to list the words together with their counts. Appendix H is an example of such a report. It lists every keyword in our vocabulary database except those with a count of zero. The category heads and subheads come from the category fields in the vocabulary database. The words are grouped by category and then in descending order of frequency, enabling us to determine, for example, which species of animal or which American cities are most represented. We also can run a similar report that groups all our search terms according to the QUESTock field in which they appear on our CDs.
For the kind of list shown here, the counts are determined by running the entire keyword database through our output routine, creating a temporary database from the output, and then running a program that counts each word and puts the count into the vocabulary database. The counts thus take into account the effects of spell checking and generation of additional thesaurus words. The process, which can be accomplished in a few hours on a fast computer, is repeated periodically.
For some uses we do not require as much accuracy and can get the information directly from the “raw,” unprocessed keywords in the keyword database. We can link the keyword data to sales reports, telling us if certain keywords tend to occur repeatedly among images that are high sellers.
Relationship of one keyword to another
One of our programs generates a list of keywords that tend to coexist with a given keyword. The program finds all images that have the keyword, pulls all of those images’ other keywords, and calculates how often each of those other keywords appears. The result is a list that tells us, for example, that 35% of images containing mountains also contain trees.
We use such lists as a guide in helping us decide what optional keywords should be displayed during data entry. Words that invariably occur are placed in the compulsory group if they are clearly related. Other words that occur more than half the time are placed in the higher-level optional group. Words that occur less than half the time but more than a quarter of the time are placed in the lower-level optional group. Thus, a keyworder who enters “mountains” but does not enter “trees” will see “trees” displayed among the lower-level options. The same program can filter out certain categories of keywords so that, for example, we can generate a list of concepts associated with a given keyword.
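The counting and co-occurrence calculations described in this section can be sketched in Python as follows. All names and the sample images are hypothetical. The text does not say how a word that occurs in exactly half of the images is treated, so the sketch assumes it falls into the lower-level group. It also only proposes the compulsory group, since the text leaves to human judgment whether a word is "clearly related."

```python
from collections import Counter


def keyword_counts(images):
    """Count how many images carry each keyword."""
    return Counter(word for keywords in images for word in set(keywords))


def cooccurrence(images, keyword):
    """Return {other keyword: share of images with `keyword` that also have it}."""
    matching = [set(keywords) for keywords in images if keyword in keywords]
    if not matching:
        return {}
    counts = Counter(w for keywords in matching for w in keywords if w != keyword)
    return {word: n / len(matching) for word, n in counts.items()}


def tier(share):
    """Map a co-occurrence share to the display group described in the text."""
    if share == 1.0:
        return "compulsory candidate"  # still needs a human "clearly related" check
    if share > 0.5:
        return "higher-level optional"
    if share > 0.25:  # assumption: exactly one half also lands here
        return "lower-level optional"
    return None  # too rare to display


# Hypothetical sample: three mountain images and one beach image.
images = [
    ["mountains", "sky", "snow", "trees"],
    ["mountains", "sky", "snow"],
    ["mountains", "sky"],
    ["beach", "sky"],
]
shares = cooccurrence(images, "mountains")
groups = {word: tier(share) for word, share in shares.items()}
```

In this sample "trees" appears in one of the three mountain images, so it lands in the lower-level optional group, just as the 35% figure in the text would. A category filter, such as restricting the result to concepts, could be applied to `shares` before grouping.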