November 2012

E-discovery — Taking Predictive Coding Out of the Black Box

Joseph H. Looby
Senior Managing Director, FTI Technology

IN CASES OF COMMERCIAL LITIGATION, the process of discovery can place a huge burden on defendants. Often, they have to review millions or even tens of millions of documents to find those that are relevant to a case. These costs can quickly spiral into millions of dollars.
Manual reviews are time consuming and very expensive. And the computerized alternative — keyword search — is too imprecise a tool for extracting relevant documents from a huge pool. As the number of documents to be reviewed for a typical lawsuit increases, the legal profession needs a better way. Predictive discovery is that approach.

Predictive discovery is a machine-based method that can rapidly review a large pool of documents to determine which are relevant to a lawsuit. It takes attorneys' judgments about the relevance of a sample set of documents and builds a model that accurately extrapolates that expertise to the entire population of potential documents. Predictive discovery is faster, cheaper and more accurate than traditional discovery approaches, and this year courts in the United States have begun to endorse the use of this method. But according to a survey FTI Consulting recently conducted, a significant obstacle stands in the way of widespread adoption: Most attorneys do not understand how predictive discovery works and are apprehensive about using it.

In this article, we explain how predictive discovery functions and why attorneys should familiarize themselves with it.

Companies defending themselves against litigation frequently must produce documents that may be relevant to a case by searching through archives. The same is true for companies being investigated by a regulatory body. The process of plumbing one's own archives to produce relevant documents is known as discovery. When the documents are in electronic form, the process is called e-discovery. Large companies today often must review millions or even tens of millions of letters, e-mails, records, manuscripts, transcripts and other documents for this purpose.

The discovery process once was entirely manual. In the 1980s, the typical commercial lawsuit might have entailed searching through tens of thousands of documents. A team of attorneys would pore through all the paper, with perhaps one or two subject matter experts to define the search parameters and oversee the procedure. A larger number of more junior lawyers frequently would do the bulk of the work. This process (known in the trade as linear human or manual review) would cost a few thousand dollars (today, it runs about $1 per document).

But now, the large number of documents to be reviewed for a single case can make manual searches a hugely expensive and time-consuming solution. For a case with 10 million documents, the bill could run about $10 million — a staggering sum. Conducting keyword searches of online documents, of course, is much faster. But in certain circumstances, such searches have been shown to produce as little as 20 percent of the relevant documents, along with many that are irrelevant.

With the number of documents in large cases soaring (we currently have a case involving 50 million documents), costs are getting out of hand. Yet the price of failing to produce relevant documents can be extraordinarily high in big-stakes litigation. The legal profession needs a new approach that improves reliability and minimizes costs.
There now is a third approach. Predictive discovery takes expert judgment from a sample set of documents relevant (or responsive) to a case and uses computer modeling to extrapolate that judgment to the remainder of the pool. In a typical discovery process using predictive coding, an expert reviews a sample of the full set of documents, say 10,000-20,000 of them, and ranks them as responsive (R) or non-responsive (NR) to the case. A computer model then builds a set of rules that reflects the attorney's judgment on the sample set. Working on the sample set and comparing its verdict on each document with the expert's, the software continually improves its model until the results meet the standards agreed to by the parties and sometimes a court. Then the software applies its rules to the entire population of documents to identify the full responsive set.

Studies show that predictive coding usually finds a greater proportion of the responsive documents — typically in the range of 75 percent (a measure known as recall) — than other approaches. A seminal 1985 study found keyword searches yielded average recall rates of about 20 percent.¹ A 2011 study showed recall rates for linear human review to be around 60 percent.² And, according to these same studies, the proportion of relevant documents in the pool of documents identified by predictive coding (a measure known as precision) is better, too.

The cost of predictive discovery is a fraction of that for linear manual review or keyword search, especially for large document sets, because most of the expense is incurred upfront to establish the search rules. Afterwards, the cost increases only slightly as the size of the collection grows.

In a recent engagement, we were asked to review 5.4 million documents. Using predictive discovery, we easily met the deadline and identified 480,000 relevant documents — around 9 percent of the total — for about $1 million less than linear manual review or keyword search would have cost.

Definition: Trial and Error. The process of experimenting with various methods of doing something until one finds the most successful.

So how does predictive discovery work? As suggested above, there are five key steps:

1. Experts code a representative sample of the document set as responsive or non-responsive.
2. Predictive coding software reviews the sample and assigns weights to features of the documents.
3. The software refines the weights by testing them against every document in the sample set.
4. The effectiveness of the model is measured statistically to assess whether it is producing acceptable results.
5. The software is run on all the documents to identify the responsive set.

1. "An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System," by David C. Blair of the University of Michigan and M.E. Maron of the University of California at Berkeley, 1985.
2. "Technology-Assisted Review in E-discovery Can Be More Effective and More Efficient than Exhaustive Manual Review," by Maura R. Grossman and Gordon V. Cormack, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, 2011.
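Behind step 1 is a standard piece of sampling statistics: the margin of error of a sample proportion. A minimal check in Python, using the normal-approximation formula (the function name is illustrative; the 17,000-document, 23 percent responsive figures are the worked example the article develops in the next section):

```python
from math import sqrt

def margin_of_error(p, n, z=2.576):
    """Normal-approximation margin of error for a sample proportion p
    measured on a random sample of n documents; z = 2.576 corresponds
    to a 99 percent confidence level."""
    return z * sqrt(p * (1 - p) / n)

# A 17,000-document sample with 23% coded responsive gives roughly
# +/- 0.8%, consistent with the +/- 1% accuracy the article cites.
moe = margin_of_error(0.23, 17_000)
print(f"+/- {moe:.1%}")  # +/- 0.8%
```

Note the margin shrinks only with the square root of the sample size, which is why a sample of about 17,000 suffices whether the collection holds 1 million documents or 50 million.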
Let's look at these steps in more detail.

1. Experts code a representative sample of the document set as responsive or non-responsive.

In a typical case, a plaintiff might ask a company to produce the universe of documents regarding some specifics about Project X. The company prepares a list of its employees involved in the project. From all the places where teams of employees working on Project X stored documents, the firm secures every one of the potentially relevant e-mails and documents. These can easily total 1 million or more items.

These documents are broken into two subsets: the sample or training set and the prediction set (i.e., the remainder). To have a high level of confidence in the sample, the training set must be around 17,000 documents. One or more expert attorneys code the training set's documents as R or NR with respect to the litigation.

The results of this exercise will predict with reasonable accuracy the total number of Rs in the full set. A statistically valid random sample of 17,000 documents generally is sufficient to ensure that our answer will be accurate to +/-1 percent at the 99 percent confidence level. For example, if the number of R documents in the sample set is 3,900 (23 percent of 17,000), it is 99 percent certain that between 22 and 24 percent of the 1 million documents are responsive.

[Figure: From a collection of 1,000,000 documents, a 17,000-document training sample goes to attorney expert review (3,900 coded responsive, 13,100 non-responsive: 23%); the remaining 983,000 documents form the prediction set. Estimate of goal: we are 99% confident that 23% (+/- 1%) of the 1,000,000 documents are responsive.]

2. Predictive coding software reviews the sample and assigns weights to features of the documents.

For the sample set of documents, the software creates a list of all the features of every document in the sample set. Features are single words or strings of consecutive words; for example, up to three words long. The sentence "Skilling's abrupt departure will raise suspicions of accounting improprieties and valuation issues …," for instance, yields 33 features as shown below. The total number of features in a set of 1 million documents is huge, but even large numbers of features easily can be processed today with relatively inexpensive technology.

[Figure: Example e-mail (From: Sherron Watkins; To: Kenneth Lay; Sent: August 2001). Processing: (1) extract machine-readable text; (2) make features by breaking the stream of text into tokens, removing whitespace and punctuation, and normalizing to lowercase; (3) convert the tokens into the feature set. The example sentence yields 12 unigrams (skillings, abrupt, departure, will, raise, suspicions, of, accounting, improprieties, and, valuation, issues), 11 bigrams (skillings abrupt, abrupt departure, …, valuation issues) and 10 trigrams (skillings abrupt departure, …, and valuation issues): 33 features in all.]

Each feature is given a weight. If the feature tends to indicate a document is responsive, the weight will be positive. If the feature tends to indicate a document is non-responsive, the weight will be negative. At the start, all the weights are zero.
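The featurization just described is easy to reproduce. A minimal sketch in Python (the helper name `make_features` and the exact tokenization rules are assumptions; the article specifies only lowercasing, punctuation and whitespace removal, and n-grams up to three words):

```python
import re

def make_features(text, max_n=3):
    """Tokenize (lowercase, strip punctuation) and emit every
    unigram, bigram and trigram as a feature string."""
    tokens = [re.sub(r"[^a-z0-9]", "", w) for w in text.lower().split()]
    tokens = [t for t in tokens if t]          # drop empty tokens
    features = []
    for n in range(1, max_n + 1):              # n-gram lengths 1..3
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features

sentence = ("Skilling's abrupt departure will raise suspicions "
            "of accounting improprieties and valuation issues")
feats = make_features(sentence)
print(len(feats))                  # 33: 12 unigrams + 11 bigrams + 10 trigrams
print(feats[0], "|", feats[12])    # skillings | skillings abrupt
```

Note that "Skilling's" becomes the token "skillings" once punctuation is stripped, matching the article's feature list.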
3. The software refines the weights by testing them against every document in the sample set.

The next step is one of trial and error. The model picks a document from the training set and tries to guess if the document is responsive. To do this, the software sums the importance weights for the words in the document to calculate a score. If the score is positive (+), the software guesses responsive, or R — this is the trial.

If the model guesses wrong (i.e., the model and attorney disagree), the software adjusts the importance weights of the words recorded in the model's weight table. This, of course, is the error. Gradually, through trial and error, the importance weights become better indicators of responsiveness. Pretty soon, the computer will have created a weight table with scores of importance and unimportance for the words and phrases in the training set, an excerpt of which might look like this:

[Weight table excerpt. Feature: weight. will raise: -0.6; suspicions: 0.7; of: -; improprieties and valuation: 1.8]

As the software reviews more documents, the weight table becomes an increasingly better reflection of the attorney expert's judgment. This process is called machine learning. The computer can take several passes through the sample set until the weight table is as good as it can be based on the sample.

Once the weight table is ready, the program is run on the full sample to generate a score for every document according to the final weight table. The machine score for a document is the sum of the weights of all its features. In the illustration, we show two documents: a document with a score of -5.3 (probably non-responsive) and a document with a score of +8.7 (probably responsive).

Then, if predictive coding is to be used to select the highly responsive documents from the collection of 1 million and to discard the highly non-responsive documents, a line has to be drawn. The line is the minimum score for documents that will be considered responsive. Everything with a higher score (i.e., above the line), subject to review, will comprise the responsive set.

In our 17,000-document sample, of which the expert found 3,900 to be R, there might be 7,650 documents that scored more than zero. Of these, 3,519 are documents the expert coded R, and 4,131 are not. Therefore, with the line drawn at 0, we will achieve a recall of 90 percent (3,519 divided by 3,900) and a precision of 46 percent (3,519 divided by 7,650).

[Figure: Attorney expert review (3,900 responsive, 13,100 non-responsive; 17,000 total) vs. the computer model with the line at >0 (7,650 documents found: 3,519 true positives, 4,131 false positives, 381 false negatives): 90% recall, 46% precision.]
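The trial-and-error loop described above resembles a perceptron-style update rule. A minimal sketch under that assumption (all names, the step size, and the toy data are hypothetical; commercial predictive coding systems differ in detail):

```python
def score(doc_features, weights):
    """Machine score = sum of the weights of a document's features."""
    return sum(weights.get(f, 0.0) for f in doc_features)

def train(sample, passes=10, step=0.5):
    """Trial and error: guess R when the score is positive (the trial);
    when the model and the expert disagree (the error), nudge every
    feature weight toward the expert's answer."""
    weights = {}                                   # all weights start at zero
    for _ in range(passes):                        # several passes over the sample
        for features, expert_says_r in sample:
            guess_r = score(features, weights) > 0
            if guess_r != expert_says_r:
                delta = step if expert_says_r else -step
                for f in features:
                    weights[f] = weights.get(f, 0.0) + delta
    return weights

# Toy training set: (document features, expert's R/NR coding).
sample = [
    (["suspicions", "accounting improprieties", "valuation"], True),
    (["lunch", "fantasy football"], False),
    (["accounting improprieties", "raise suspicions"], True),
]
w = train(sample)
print(score(["accounting improprieties", "suspicions"], w) > 0)  # True
```

After training, features that appeared in expert-coded responsive documents carry positive weight, so a new document containing them scores above zero.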
4. The effectiveness of the model is measured statistically to assess whether it is producing acceptable results.

By drawing the line higher, one can trade recall for precision. For example, if the line is drawn at +0.3, the precision may be significantly higher (say 70 percent) but with lower recall (i.e., 80 percent, because some of the Rs between 0 and +0.3 are missed). On the basis of the sample set, the software can determine recall and precision for any line. Typically, there is a trade-off; as precision goes up (and the program delivers fewer NR documents), recall declines (and more of the R documents are left behind). However, not only are recall and precision scores higher than for other techniques, the trade-off can be managed much more explicitly and precisely.

[Figure: Attorney expert review (3,900 responsive, 13,100 non-responsive; 17,000 total) vs. the computer model with the line at >0.3 (4,450 documents found: 3,125 true positives, 1,325 false positives, 775 false negatives): 80% recall, 70% precision.]

Recall measures how well a process retrieves relevant documents. For example, 80% recall means a process is returning 80% of the estimated responsive documents in the collection: Recall = TP / (TP + FN); here, 80% = 3,125 / (3,125 + 775).

Precision measures how well a process retrieves only relevant documents. For example, 70% precision means for every 70 responsive documents found, 30 non-responsive documents also were returned: Precision = TP / (TP + FP); here, 70% = 3,125 / (3,125 + 1,325).
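Because the expert's coding of the sample is known, recall and precision can be computed for any candidate line, which is what makes the trade-off explicitly manageable. A small sketch (function name and toy scores are illustrative, not the article's data):

```python
def metrics_at(line, scored_docs):
    """Recall and precision if every document scoring above `line` is
    treated as responsive. `scored_docs` = (machine score, expert's R?)."""
    tp = sum(1 for s, r in scored_docs if s > line and r)       # true positives
    fp = sum(1 for s, r in scored_docs if s > line and not r)   # false positives
    fn = sum(1 for s, r in scored_docs if s <= line and r)      # false negatives
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy sample: machine scores paired with the expert's R/NR coding.
sample = [(-5.3, False), (-1.2, False), (0.1, True), (0.2, False),
          (0.8, True), (2.4, True), (8.7, True)]

for line in (0.0, 0.3):
    r, p = metrics_at(line, sample)
    print(f"line {line:+.1f}: recall {r:.0%}, precision {p:.0%}")
```

Raising the line from 0 to +0.3 in this toy sample drops one true responsive document below the cutoff (recall falls) while shedding the false positive (precision rises), mirroring the trade-off in the article's figures.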