It bi ibm



H. P. Luhn

A Business Intelligence System

Abstract: An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the “action points” in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. This paper shows the flexibility of such a system in identifying known information, in finding who needs to know it and in disseminating it efficiently either in abstract form or as a complete document.

Introduction

Efficient communication is a key to progress in all fields of human endeavor. It has become evident in recent years that present communication methods are totally inadequate for future requirements. Information is now being generated and utilized at an ever-increasing rate because of the accelerated pace and scope of human activities and the steady rise in the average level of education. At the same time the growth of organizations and increased specialization and divisionalization have created new barriers to the flow of information. There is also a growing need for more prompt decisions at levels of responsibility far below those customary in the past. Undoubtedly the most formidable communications problem is the sheer bulk of information that has to be dealt with.

In view of the present growth trends, automation appears to offer the most efficient methods for retrieval and dissemination of this information.

During the past decade significant progress has been made in applying machines to the processes of information retrieval. Automatic dissemination has so far been given little consideration; however, unless substantial portions of human effort in this area can be replaced by automatic operations, no significant over-all improvement will be achieved. Even the information retrieval processes mechanized so far still require appreciable human effort to organize the information before it is entered into machines.

It is believed that techniques now being developed will greatly contribute to the solution of the problem by extending automatic processes to the preparatory phases of mechanical information-retrieval systems, to the area of dissemination and to associated functions. Ideally, an automatic system is needed which can accept information in its original form, disseminate the data promptly to the proper places and furnish information on demand.

The techniques proposed here to make these things possible are:

1. Auto-abstracting of documents;
2. Auto-encoding of documents;
3. Automatic creation and updating of action-point profiles.

All of these techniques are based on statistical procedures which can be performed on present-day data processing machines. Together with proper communication facilities and input-output equipment a comprehensive system may be assembled to accommodate all information problems of an organization. We call this a Business Intelligence System.

Objectives and principles

Before the system operation is described, the term Business Intelligence System should be defined and the objectives and principles stated.

In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” [1]

The term document is used to designate a block of information confined physically in a medium such as a
letter, report, paper or book. The term may also include the medium itself.

The objective of the system is to supply suitable information to support specific activities carried out by individuals, groups, departments, divisions, or even larger units. These are the action points previously referred to. To this end the system concerns itself with the admission or acquisition of new information, its dissemination, storage, retrieval and transmittal to the action points it serves. More particularly the object of the system is to perform these functions speedily and efficiently, taking advantage of novel procedures which utilize the inherent capabilities of electronic devices.

One of the most crucial problems in communication is that of channeling a given item of information to those who need to know it. Present methods of accomplishing this are inadequate and the general practice is to disseminate information rather broadly to be on the safe side. Since this method tends to swamp the recipients with paper, the probability of not communicating at all becomes great. The Business Intelligence System provides means for selective dissemination to each of its action points in accordance with their current requirements or desires. This is accomplished by the mechanical creation of profiles reflecting the sphere of interest of each point and by updating these profiles as dictated by changes in the attitude of the respective action points and as recorded by the system on the basis of certain transactions.

Another problem in communication is to discover the person or section within an organization whose interests or activities coincide most closely with a given situation. Presently, the difficulty of finding such relationships often results in improper decisions, wrong actions, inaction, or duplication. An objective of the Business Intelligence System is to identify related interests by use of profiles of action points.

The problem of discovering information which has a bearing on a given situation has probably received the most attention in recent years, and various mechanical systems have been developed and put into operation. This phase of communication is commonly referred to as information retrieval or, more broadly, as the library problem. Information retrieval is necessarily a major function of the Business Intelligence System. Means are provided not only to integrate this function with the rest of the system but also to produce additional useful functions, as will be described later.

The achievement of these objectives is governed by principles essential to effective service and convenience of the user. Some of these are listed below:

1. Information admitted to the system includes communications, addressed to action points individually, which contain information of potential interest to other action points.
2. New information which is pertinent or useful to certain action points is selectively disseminated to such points without delay. A function of the system is to present this information to the action point in such a manner that its existence will be readily recognized.
3. Transmittal of information either as a result of dissemination or of retrieval is to be guided by progressive stages of acceptance by an action point. This procedure saves the recipient’s time by reducing the amount of material to be transmitted and eliminating the non-pertinent material.
4. The system is to provide means for quickly discovering similarity of interests and activities that might exist amongst action points so that subjects and problems of common concern may be discussed and advanced through direct interchange of ideas between such points, if so desired.
5. The system is not to impose conditions on its user which require special training to obtain its services. Instead the system is to be operated by experienced library workers. Thus, in the case of an inquiry, the user will be required only to call the librarian, who will accept the query and will ask for any amplification which, in accordance with his experience, will be most helpful in securing the desired information.
6. Similarly, information lingering at an action point but of potential value to other action points is mobilized for efficient communication through inquiries of skilled reporters.

Description of the Business Intelligence System

The following description is given in rather general terms, and references to any specific type of business have been substantially avoided. Furthermore, the fact that certain devices are being referred to as implementation of the system should not be interpreted as implying a specific size of the operation.

The description is given in accordance with main functional sections of the system, each illustrated by the diagram. Our assembly of these functional sections into a complete system is shown in Fig. 1.

Document input

Each document entering the system shown in Fig. 1 is assigned a serial number and is photographically reproduced on some medium such as microfilm. In those cases where the document has been addressed specifically to an action point, the original is promptly transmitted to the addressee. In all other cases the original is stored in a file for a reasonably short time and thereafter destroyed, unless there are reasons for preserving it for longer periods.

The microfilm copy of the document is transcribed onto magnetic tape by a human transcriber or a print-reading device. In those cases where the original document is available in machine-readable form, the transcription is done mechanically. The document is now available both as a microfilm copy and a magnetic tape record.

The microfilm copy is then recopied onto the storage medium of a document microcopy storage device. The microfilm record is stored elsewhere to constitute a microfilm master file which may serve to regenerate records in cases of emergency.
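
The intake steps just described can be sketched as a small routine. This is only an illustration under stated assumptions: the names `Document`, `IntakeDesk` and `admit` are hypothetical, and the microfilm, master-file and magnetic-tape stages are reduced to in-memory dictionaries rather than physical media.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Document:
    title: str
    author: str
    text: str
    addressee: Optional[str] = None  # action point named on the original, if any


@dataclass
class IntakeDesk:
    """Hypothetical stand-in for the document-input section."""
    _serials: count = field(default_factory=lambda: count(1))
    microcopy_store: dict = field(default_factory=dict)  # serial -> working copy
    master_file: dict = field(default_factory=dict)      # serial -> emergency copy
    tape_records: dict = field(default_factory=dict)     # serial -> machine-readable text
    holding_file: dict = field(default_factory=dict)     # serial -> original kept briefly
    forwarded: list = field(default_factory=list)        # (serial, addressee) pairs

    def admit(self, doc: Document) -> int:
        serial = next(self._serials)
        # Photographic reproduction: one working copy and one master copy.
        self.microcopy_store[serial] = doc
        self.master_file[serial] = doc
        # Transcription into a machine-readable record.
        self.tape_records[serial] = doc.text
        # The original goes to its addressee if it has one, else to short-term filing.
        if doc.addressee is not None:
            self.forwarded.append((serial, doc.addressee))
        else:
            self.holding_file[serial] = doc
        return serial
```

Calling `admit` once per incoming document returns the serial number under which every later stage (abstract, pattern, announcement) would refer to it.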
The magnetic tape record is now introduced into the auto-abstracting and encoding device. This device submits the document to a statistical analysis based on the physical properties of the text, and data are derived on word frequency and distribution. From these data the device then selects certain sentences of the document to produce an auto-abstract. [2] This is printed out, together with the title, author, and document serial number. This printout is photographically transferred onto the storage medium of the auto-abstract microcopy storage device.

The process of creating auto-abstracts consists of ascertaining the frequency of word occurrences in a document. A predetermined portion of the words of highest frequency is then given the status of significant words and an analysis is made of all the sentences in the text containing such words. A relative value of sentence significance is then established by a formula which reflects the number of significant words contained in a sentence and the proximity of these words to each other within this sentence. Several sentences which rank highest in value of significance are then extracted from the text to constitute the auto-abstract.

As soon as the auto-abstract has been created, the statistical data are further processed to derive an information pattern which characterizes the document. This process of encoding constitutes a further abstraction and involves procedures such as the categorization of words by means of a thesaurus. [3]

Useful patterns may be derived by listing a given portion of the words of highest frequency together with a selection of specific words. The interrelationship of words may also be indicated and certain frequently occurring combinations of words may be noted. Because of variation of word usage amongst authors the normalization of such words becomes an important function of encoding. Index lookup in a thesaurus-like dictionary will replace words, including those of foreign languages, by a notional family designation. The selection of specific words may also be accomplished by index lookup.

The document pattern derived by the above process is then transferred into a special pattern-storage device together with the title, author, and document serial number. This information is stored in coded form on a medium that may be subjected to serial scanning. As an alternative the resulting pattern may be rearranged and be distributed over a storage array to permit random access according to characteristics.

The tape or film transcript of the document may be stored in a library for reference if it later becomes necessary to change the method or scope of encoding.

Action-point profiles

As indicated earlier, one of the basic requirements of the system is the ability to recognize by mechanical means the sphere of interest and the type of activities that characterize each of the action points the system is to serve. This is accomplished by means of an information pattern similar to that of the documents.

Initially, the creation of these action-point profiles is best accomplished by having each action point create a document describing the various aspects of its activities and enumerating the types of information needed. Such documents are then introduced at the input of the system and are identified by action-point designation. The machine-readable transcripts of these documents are then processed as described in connection with the document input. The resulting patterns are then stored in the Pattern Storage area in a special profile-storage device. Also stored, with each of these profile patterns, is the date of entry.

Selective dissemination of new information

Based on the document-input operation and the creation of profiles, the system is ready to perform the service function of selective dissemination of new information.

As soon as a new document has been entered into the system and its pattern developed, this pattern is set up in a comparison device which has access to all of the action-point profiles. The comparisons are carried out on the basis of degree of similarity, expressed in terms of a fraction, for each of the profile patterns. This fraction is subject to change as time goes on, depending upon conditions to be explained later.

Whenever a profile agrees to a given extent with a given document pattern, the serial number, title, and author of the affected document, together with the action-point profile designation, are transferred and stored in a monitoring device. This procedure is repeated for any subsequent similar occasion. The monitor is substantially a random-access storage device and has the functional capabilities of performing inventory operations. In this capacity it will transmit the serial number, title and author of the document in question to the desk printer at the selected action point and keep a record of this transaction.

Of the various ways in which such an announcement may be transmitted to the affected action points, the most effective one is by means of a printing device at each action-point location. An objective of the system is to command attention of the recipient. The use of individual printing devices is more effective than are centrally located devices serving several action points.

Selective acceptance of disseminated information

The dissemination of information so far has consisted in furnishing the action point with the serial number, title, and author of documents selected for it. This selection, however, is considered to be a provisional one, and the system withholds any further information if the action point can determine, on the basis of information given so far, that certain of the selected subjects are not of sufficient interest. If an announcement is of interest, and more detailed information on the subject is desired, the system will produce such information on demand. This step is initiated when the action point connects itself by telephone to the monitor and dials the serial numbers of the documents affected. Upon receipt of this message the monitor will relay an instruction to the microcopy storage
device to produce photoprints of the auto-abstracts of these documents and to mark them with the action-point designation. The auto-abstracts are then transmitted to the action point either in the form of a paper copy or by speedier means, such as Telefax or TV display.

The action point may now peruse the abstracts to determine which of the documents are desired in their entirety. These decisions are then entered into the system in the form of acceptances. An acceptance is made at an action point by dialing the document number, prefixed by a code symbol, whereupon the monitor will instruct the microcopy storage device to produce a photocopy of the complete document, properly marked with the action-point designation. These photocopies are then delivered to the action point.

The monitor will record the incidence of acceptance by modifying the affected records contained in its storage. At the same time the monitor will also instruct the auto-encoding device to transfer copies of the code patterns of the affected documents to the profile section of pattern storage, together with the identification of the action point involved and the date of transferral.

As a result of these operations the profile of a given action point has been updated to reflect interest in a currently communicated subject. As time goes on there is the probability that an increasing number of new documents will be announced to an action point because of possible shift of interests. In order to avoid such cumulative effects, the system is so arranged that the response to past interests is gradually relaxed. This relaxation is related to the date affixed to each new pattern that is superimposed on an action point’s profile. Depending on the age of each of these patterns, an adjustment is made on the fraction of similarity that must be met in the comparison process of new documents. The older the profile pattern, the closer an agreement is needed for selection for dissemination, and consequently the fewer documents are selected. On the other hand those documents selected are more closely related to the original subject.

Information retrieval

This phase of the system concerns itself with the retrieval of those stored documents which might be relevant to a topic under consideration by an action point. The information to be discovered may vary widely and may consist of anything ranging from factual data to an extensive bibliography on a broad subject. Under the supervision of an experienced librarian the process of information retrieval is performed in the following way.

An action point telephones the librarian and states the information wanted. The librarian will then interpret the inquiry and will solicit sufficient background information from the action point in order to provide a document similar in format to that of documents normally entering the system. This query document is transmitted to the auto-encoding device in machine-readable form. An information pattern is then derived from the query document in a manner similar to that used for normal documents.

The resulting query pattern, together with a serial number and designation of the originating action point, is then sent to the queries section of the pattern-storage device. Subsequently, a copy of this query pattern is set up in the comparison device and is compared with all of the document patterns stored in the document-pattern storage device. This operation is similar to the one described in connection with selective dissemination. In the present case, the query pattern replaces the profile pattern. Whenever similar patterns are detected by this means, the document designation is transmitted to the monitor, where it is registered and then announced to the action point.

Although the service of a librarian is considered a convenience to the action point, in certain cases, means may be provided at the action-point location to permit direct access to the system. This would be justified where many of the inquiries concern lookup-type retrieval of data.

When an action point desires information relative to a given document, the number of the document at hand would be dialed and instructions for search given to the monitor. Thereupon the monitor would select the corresponding pattern from document-pattern storage and provide instruction for use as a query pattern in the ensuing comparison operation.

Selective acceptance of retrieved information

The considerations which prompted the step-by-step acceptance of documents in the dissemination process are also applied to information retrieval. The processes employed, therefore, are identical.

The function of information retrieval, however, differs from that of dissemination in that the choice is not that of accepting or rejecting one document, but rather a selection of one or several from a special group of potentially relevant documents. Although in some cases a first search may have produced satisfactory references, in other cases the material produced may not be satisfactory. The action point must then relay this fact to the librarian and discuss with him how the searching procedure or the query should be modified so as to improve the probability of getting relevant material.

In those cases where pertinent information has been discovered, the acceptance of the complete documents of such information will cause the updating of the action-point profile, as was the case in dissemination. The query pattern will be impressed on the profile as a matter of course, whether or not the inquiry has been satisfied, so that new documents relevant to the subject of the inquiry will be made known subsequently.

Detection of an action point having given characteristics

In the process of transacting business it is often desired to determine who concerns himself with a given subject. The usual type of question asked is: “Who does or knows a certain thing?” A function of the Business Intelligence System is to answer questions of this type.
The manner in which this function is performed by the system is similar to the information retrieval procedure. However, instead of simulating a document pattern, a profile pattern is developed which represents most closely the characteristics of an action point sought. This synthetic profile is then compared with those in the profile storage and when a given degree of similarity is discovered, the identification of the affected action point is transferred to the monitor, together with the identification of the inquirer point. Thereafter the identities are announced by the tape-printing device at the inquiring action point so that personal contacts may be made.

Document output

The functions described so far have concerned themselves with documents admitted or acquired by the system from the outside. The document-output phase deals with internally generated documents. This type of document is essentially the product of action points and may be addressed to other action points within the organization or to external points. An objective of the system is to facilitate selective dissemination and retrieval of such documents in substantially the same way as for outside documents.

When a document has been created at an action point, a copy is produced, preferably in machinable form. This copy is then dispatched for processing to the input point of the system and the original is sent to the addressee.

Since this type of document is an indication of the interest of the originating action point, the information pattern derived by the auto-encoding process is not only stored in document-pattern storage but also is impressed on the profile of its originator, thereby updating it.

In the dissemination process this internally created document is announced to other action points in the same manner as were outside documents.

Miscellaneous functions of the system

The comprehensive system for the various functions so far described is illustrated by Fig. 1. A number of additional useful functions which may be derived from the system are briefly described here.

It might be desirable to check each new document for duplication by comparing it with all of the documents in storage. Similarly a list of related documents may be prepared to serve as references applying to a new document.

When retrieving information it might be found advantageous to compare a query first with all the queries stored, in order to discover whether similar queries have been submitted in the past. If a list of the documents retrieved is available, the process of retrieval may be greatly simplified. This method may also be used to bring together the respective inquirers to furnish an opportunity to discuss the problems which apparently brought about similar inquiries. Periodic analysis of the profiles may also furnish valuable information on trends and possible overlapping of activities or interests.

Since a history of the usage of the system is stored in the monitor, an analysis of its records will disclose the efficiency of system operation. The findings may serve to adjust the system for optimum efficiency.

There are many details which might have to be provided to adjust the general form of the system to specific applications. One such requirement might be classification, by an editor, of documents with regard to security, proprietary interests and proper utilization of information.

A plurality of systems may be organized in hierarchical fashion, in which a first system would serve a number of more specialized systems. In this case the specialized systems would each assume the role of an action point in the mother system.

It also appears quite feasible to share the system equipment among a number of organizations.

Prospects for establishing a Business Intelligence System

The system described here employs rather advanced design techniques and the question arises as to how far away such systems may be from realization. It may therefore be of interest to review the state of system and machine development.

The availability of documents in machine-readable form is a basic requirement of the system. Typewriters with paper-tape punching attachments are already used extensively in information processing and communication operations. Their use as standard equipment in the future would provide machine-readable records of new information. The transcription of old records would pose a problem, since in most cases it would be uneconomical to perform this job by hand. The mechanization of this operation will therefore have to wait until print-reading devices have been perfected.

The type of equipment required for processing information in accordance with the system is presently available as far as the functions are concerned. It is safe to assume that special equipment will eventually be required to optimize the operation.

The auto-abstracting and auto-encoding systems are in their early stage of development and a great deal of research has yet to be done to perfect them. Perhaps the techniques which ultimately find greatest use will bear little resemblance to those now visualized, but some form of automation will ultimately provide an effective answer to business intelligence problems.

References

1. Webster’s New Collegiate Dictionary, G. & C. Merriam Co., Springfield, Mass.
2. H. P. Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal of Research and Development, 2, No. 2, 159 (April 1958).
3. H. P. Luhn, “A Statistical Approach to Mechanized Encoding and Searching of Literary Information,” IBM Journal of Research and Development, 1, No. 4, 309 (October 1957).

Received July 1, 1958

IBM Journal, October 1958