Recent progress in incorporating word order and semantics into the decades-old, tried-and-tested bag-of-words representation of text meaning has yielded promising results in computational text classification and analysis. This development, and the availability of a large number of legal rulings from the PTAB (Patent Trial and Appeal Board), motivated us to revisit possibilities for practical, computational models of legal relevance -- starting with this narrow and approachable niche of jurisprudence. We present results from our analysis and experiments towards this goal using a corpus of approximately 8000 rulings from the PTAB. This work makes three important contributions towards the development of models for legal relevance semantics: (a) Using state-of-the-art Natural Language Processing (NLP) methods, we characterize the diversity and types of semantic relationships that are implicit in select judgements of legal relevance at the PTAB; (b) We achieve new state-of-the-art results on practical information retrieval tasks using our customized semantic representations on this corpus; (c) We outline promising avenues for future work in the area -- including preliminary evidence from human-in-loop interaction, and new forms of text representation developed using input from over a hundred interviews with practitioners in the field. Using the PTAB data set for testing relevance in patent document retrieval, instead of traditional citations search, also shows a bigger gap between the needs of practitioners and the capabilities of current information retrieval and NLP technologies.
judgment within a specific area of the law. The models are
considered adequate in aggregate under some arbitrarily
reducible measure of prediction accuracy across a corpus
selected from that specific area of the law.
The purpose of this paper is to outline our
approach to the development and testing of several
computational models for legal relevance in the narrow
domain of patent law, specifically as documented through
select proceedings of USPTO PTAB cases. For our tests we use a collection of Ex Parte Reexamination
(EPR) patent case rulings, including the patents explicitly
mentioned in decisions of the Patent Trial and Appeal Board
(PTAB).
We present results from our analysis and experiments towards this goal using a corpus of approximately 8000 rulings from the PTAB. This work makes three important contributions towards the development of models for legal relevance semantics: (a) Using state-of-the-art Natural Language Processing (NLP) methods, we characterize the diversity and types of semantic relationships that are implicit in select judgements of legal relevance at the PTAB; (b) We achieve new state-of-the-art results on practical information retrieval tasks using our customized semantic representations on this corpus; (c) We outline promising avenues for future work in the area, including preliminary evidence from human-in-loop interaction, and new forms of text representation developed using input from over a hundred interviews with practitioners in the field.
Using the PTAB data set for testing relevance in patent document retrieval, instead of traditional citations search, also shows a bigger gap between the needs of practitioners and the capabilities of current information retrieval and NLP technologies. For example, in contrast to recent results [8], we find that documents not in the semantic neighborhood of the query document can still be very relevant for the query. The inadequacies of using citations were also discussed in a different context by researchers studying innovation [14, 17]. Together they point to the need to use other data sets, and not just citations.
The remainder of the paper is organized as follows: In Section 2 we discuss the practical motivations and practitioners' requirements of prior art search. Section 3 introduces the data set. The results are presented in Section 4, of which Subsection 4.1 gives the details of our experiments. Since the experiments reveal limitations of current forms of representing legal relevance, the question is how we go about building better models for this purpose; this is discussed in Section 5. Conclusions (Section 6) summarize our results.
2. PRACTITIONER REQUIREMENTS
Given the nuance and complexity implicit in legal judgement, we are skeptical that a one-size-fits-all "magic-bullet" AI solution will adequately model outcomes in the field. Furthermore, comparing the current state of the art to legal information retrieval over 50 years ago [7], we observe that changes in algorithms and models of text representation have lagged far behind the dramatically improved access to data and growth in computational power. This disappointing state of the art has been noted by others, for example in discussing the inadequacies of leading search engines [8].
We believe this is in part due to the lack of practical
methods for computational modelling and for representing
legal relevance, and in particular the relevance of other
documents (patents) to a particular examined technology.
Towards this end, we see this paper as a small part of a
broader undertaking: the development of practical models
and theories of legal relevance that can be shared, added to
and built upon by practitioners and researchers alike. While
this work focuses on patent law, there are synergies with
work in other areas that bring domain-aware case factors into computer models [16].
While limited scholarly attention has been given to the
requirements of practitioners in patent litigation and related
areas, we were able to use informal interviews and literature
in the area of complex search to identify a few themes of
interest. We seek to explore some of these themes further
in this paper and in future work. In particular, this paper is
focused on the more foundational topic of modelling legal
relevance. These models are likely to be helpful in the practical work of legal professionals in the field, and in the evaluation of legal procedures across the field aimed at improving the quality of patent grant and enforcement. A descriptive model of relevance is also arguably a precondition for a computational theory of semantics in the domain.
Patent cases have substantial uncertainty [12],
primarily due to the challenges implicit in knowing the entire
universe of prior art before litigation commences and
reconciling the case at hand with relevant prior case law:
“difficulty in knowing the relevant facts to the dispute and
difficulty in knowing how a trier of fact will evaluate the
facts… knowing the entire universe of prior art is impossible
before litigation commences”[12].
We note that the typical litigation workflow is accompanied by a diverse set of requirements at different phases of the process: for instance, exploration of case law and the technology landscape at the outset of a case, followed by an analysis of semantically and contextually linked outcomes relevant to the matter at hand, and then assistance in selecting and narrowing in on more specific artifacts (for example, highly relevant patents) to be used in the preparation for potential litigation. Restricting ourselves to the last step of the patent litigation workflow, identifying highly relevant prior art is a particular use case of interest in this paper. After a patent is granted, its validity can be challenged in litigation or in several post-grant proceedings.
In the majority of these challenges, it is necessary to find
and examine a number of documents from a potentially very
large pool of patent and technical literature; that is,
establish the relationship of the invention to the prior art.
For the purpose of this paper, we do not need to get into the legal differences between different types of proceedings (see, for example, http://www.pillsburylaw.com/postgrantproceedings or http://fishpostgrant.com/postgrantreview/; see also https://en.wikipedia.org/wiki/Patent_Trial_and_Appeal_Board and http://www.uspto.gov/patentsapplicationprocess/patenttrialandappealboard0). Also, we do not need to attend to the differences between different patent jurisdictions, because the technical problems of text analytics and information retrieval are the same.

Finding references potentially invalidating a patent is perhaps more challenging than finding (some) relevant prior art. For example, the average number of cited references in a patent is about 40 (see http://patentlyo.com/patent/2015/08/citingreferencesalternative.html), while the number cited in invalidation decisions is usually less than 5. Arguably, any patent search supporting invalidation has to be very precise.
Finding such relevant documents is nontrivial,
because many documents refer to the same concepts that
describe the invention at hand, and these documents can
appear in multiple patent classes and broad scientific and
technical literature. Moreover, similar concepts, relations
and functionalities might be expressed in different words, so
keyword search is not sufficient to find all relevant
documents. Therefore this search process is labor-intensive, costly and possibly error-prone, even with the support of modern information retrieval tools.
Analyzing a collection of patents and related product or scientific literature is also costly, mostly because it takes time and requires a highly trained workforce (lawyers and domain experts). What is important from our perspective is that there are few analytic tools that can support this process. Most of the patent analytics tools analyze metadata (e.g. https://lexmachina.com/legalanalytics/), for example probabilities of finding a patent invalid based on statistics on trial location, examination art unit, etc. Allison et al. [1] provide an in-depth analysis of the "Realities of Modern Patent Litigation", relating "the outcomes (…) to a host of variables, including variables related to the parties, the patents, and the courts".
Our goal as technology developers lies in improving patent analytic tools; our goal as researchers is to understand the obstacles on this path and to find ways of avoiding them.
We note that legal reasoning is abductive, since the models implicit in particular cases are individually neither necessary nor sufficient to explain all cases, but rather are good enough to model outcomes in only some reasonable sample of cases. For instance, our analysis shows that aggregate document-level semantic relatedness is an adequate mode of reasoning in only a small minority of USPTO Ex Parte Reexamination (EPR) cases.
Clearly other abductive reasons (models) for
relevance are needed to explain the remaining instances.
Manual examination of cases with variance demonstrates
that while relevant terms and semantic links are present, a
high frequency of related words and phrase occurrence is
neither a necessary nor a sufficient condition for legal
relevance.
Before we discuss the models, let us say a few
words about the data we use to test them.
3. PATENT TRIAL AND APPEAL BOARD
(PTAB) DATA SETS
Post grant review and Inter Partes Review (IPR) are conducted at the USPTO Patent Trial and Appeal Board (PTAB) and are aimed at reviewing the patentability of one or more claims in a patent. The process begins with a third party petition to which the patent owner may respond. A post grant review is instituted if it is more likely than not that at least one challenged claim is unpatentable. If the petition is not dismissed, the Board issues a final decision within 1-1.5 years (see http://www.uspto.gov/patentsapplicationprocess/appealingpatentdecisions/trials/postgrantreview). Chien and Helmers [4] discuss "Inter Partes Review and the Design of Post-Grant Patent Reviews" processes and key statistics, including the statistics of case dispositions. USPTO notes that 80% of the IPR reviews end with some or all claims invalidated (see http://www.uspto.gov/patentsapplicationprocess/patenttrialandappealboard/statistics).
What is in the PTAB data? The Patent Trial and Appeal Board (PTAB) publicly available dataset, as of January 2017, has about 100 zip files containing 10 GB of data (compressed), available at https://bulkdata.uspto.gov/data2/patent/trial/appeal/board/. These files are either image or text .pdf files with PTAB decisions. Each decision pertains to the validity of claims of one patent.
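A minimal sketch of working with such files, assuming the zip archives have already been downloaded locally and that pdfminer.six is used for text extraction (the paper does not describe its extraction pipeline, and image-only PDFs would additionally need OCR, which is omitted here):

import io
import zipfile
from pathlib import Path
from pdfminer.high_level import extract_text

def extract_decisions(zip_dir):
    # Yield (pdf_name, text) for every PDF inside every zip archive in zip_dir.
    for zip_path in Path(zip_dir).glob("*.zip"):
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".pdf"):
                    pdf_bytes = archive.read(name)
                    # extract_text returns an empty string for image-only PDFs
                    yield name, extract_text(io.BytesIO(pdf_bytes))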
Why care about PTAB data? Because each case has a relatively small collection of highly relevant documents used as evidence. The outcomes are clear and the reasoning can be modeled. There is enough data for statistical inference (although perhaps not enough to train a neural net from scratch). Also, as mentioned earlier, the PTAB data set might represent practitioner needs better than citations do.
In this paper we report on some initial experiments on the
PTAB data sets. Given the relatively structured form of the
data available and the more streamlined process used in
adjudication, we believe that PTAB data represents a
unique training corpus to develop and improve customized
tools used in the areas of patent litigation and licensing, and
as we discuss later, it might also be a better measure of
satisfying practitioner needs than citations retrieval.
4. EXPERIMENTS AND RESULTS FROM
SEMANTIC ANALYSIS OF PTAB RULINGS
Encouraged by recent developments in neural language
model representations [11], and the availability of a rich
corpus of documents capturing relevance judgements in
Patent Law, we sought to explore the extent to which a
computational theory of semantic relevance in this area of
law was possible. As described in the section on practical
motivation, such a theory would be of great utility to
practitioners and policy makers in this area of law. Our
experimental approach is therefore both theoretically and
practically motivated, empirical but with an emphasis on
exploring possibilities and limits of such a theory.
For the experiments, we use a sample of 8000 EPR rulings from the USPTO Final Decisions of the Patent Trial and Appeal Board. Our experiments use subsets of the data to (i) perform an analysis of relationships between the pairs of patents associated in the approximately 8000 rulings, (ii) conduct an assessment of the impact that types of semantic representations have on the practically meaningful task of relevant patent retrieval, and (iii) empirically explore possibilities of alternate forms of text representation to model legal relevance and enable human-in-loop interaction to improve patent retrieval performance.
4.1. Details of the experiments
Our tests consist of using different techniques to retrieve patents cited in PTAB decisions, based on queries built from the patent whose validity is being questioned. Such queries typically consist of combinations of the patent abstract, its title, or its first claim. As baselines for our evaluations we used both bag-of-words (BOW) query representations and semantic search implemented using conceptual expansion of query words. The conceptual expansion was implemented using Wikipedia-derived related concepts, similarly to standard approaches, e.g. [8, 13].
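As an illustration only (the paper does not publish its implementation), a TF-IDF bag-of-words baseline of this kind, together with the Recall @ K measure used throughout the experiments, could be sketched as follows; the Wikipedia-based conceptual expansion step is omitted and the data structures are hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(query_text, candidate_texts):
    # Rank candidate patents by cosine similarity of TF-IDF vectors to the query
    # (the query being the abstract, title and/or first claim of the challenged patent).
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([query_text] + candidate_texts)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(range(len(candidate_texts)), key=lambda i: scores[i], reverse=True)

def recall_at_k(ranked_indices, relevant_indices, k):
    # Fraction of PTAB-cited (relevant) documents retrieved within the top k results.
    return len(set(ranked_indices[:k]) & set(relevant_indices)) / len(relevant_indices)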
Experiment 1. To evaluate the hypothesis that aggregate document-level semantic relatedness is an important factor in a potential model for legal relevance in the patent domain, we attempted to quantify the correlation between semantic relatedness and patent relevance using a sample of 245 semiconductor EPR cases. In this case the recall at 1000 was 30%.

However, the point is that this measure of semantic similarity is inadequate to capture PTAB relevance: only for 30% of the subject patents did the 1000th-ranked patent document returned by our state-of-the-art semantic-relatedness model score lower than the PTAB-relevant patent under a cosine-similarity measure of relatedness. This drops to 15% when the 100th document is considered. This result is consistent with the expectations of lawyers and other practitioners that we have interviewed as part of this project.
However, these results also provide an interesting contrast with the assumptions of other researchers such as Khoury and Bekkerman [8], who suggest that "if a given document is not in the semantic neighborhood of the query document, it simply cannot be relevant for the query document". Our work challenges, with experimental results, the understandable intuition that relevant prior art must necessarily be found in the set of documents that have a high degree of semantic similarity as measured by state-of-the-art text processing methods. Notice that this contrast would have been unlikely to be discovered without the PTAB data set, given that the other works use citations to model relevance.
Overall, this experiment suggests the need for additional
methods for association, a greater variety of semantic
connections, and perhaps more sophisticated interpretation
of the patent claims language.
Experiment 2. To quantify the improvement possible
through the use of better semantic representations, we
benchmarked a word2vec [11] model trained on
preprocessed text, grouped by subsector of patents.
Specifically, we classify each patent into one of 37 industry groupings (e.g. Computer Hardware & Software, Metalworking). The groups correspond to the standard NBER subcategories (identified at http://www.nber.org/patents/subcategories.txt). The claim text for each of the patents was then modified to include references to special words that uniquely identified each patent as a new word in our vocabulary (e.g. _6435262_ for the patent US 6435262 B1). Cross references, including citations, between claims were then tagged at the claim level with the relevant unique patent identifiers to improve locality sensitive mapping between a patent and the various claims that related to it.
The training of the word vector models was then carried out individually for each of the 37 preprocessed text corpora, providing us with a word-to-vector model corresponding to each of the industry groups (skip-gram training, with 200-dimensional vector representations and a minimum word count of 4, was used). Given the trained word2vec [11] models, a simple semantic retrieval task then amounts to finding the closest patent identifier word (treated as a special word in the vocabulary) to the identifier for a patent of interest. Proximity in our case was measured with the commonly used cosine similarity measure. This measure, or relative ranking, could be further improved upon with additional semantically important representations to more closely model the type of relevance desired -- in our case, the relevance of two patent documents based on PTAB guidelines. We make some suggestions along these lines in our experiments on human-in-loop emulation.
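A minimal sketch of this subsector-specific training and retrieval step, assuming the gensim library (which the paper does not name) and claims already tokenized with the special patent-identifier words described above:

from gensim.models import Word2Vec

def train_sector_model(tokenized_claims):
    # tokenized_claims: a list of token lists for one NBER subsector; patent
    # identifiers appear as special tokens such as "_6435262_" inside the claims.
    return Word2Vec(
        sentences=tokenized_claims,
        sg=1,             # skip-gram training, as in the paper
        vector_size=200,  # 200-dimensional vectors (gensim 4.x keyword)
        min_count=4,      # minimum word count of 4
    )

def closest_patents(model, patent_id_token, top_n=100):
    # Nearest patent-identifier tokens by cosine similarity; ordinary vocabulary
    # words are filtered out so that only patents are returned.
    neighbors = model.wv.most_similar(patent_id_token, topn=5000)
    return [(tok, score) for tok, score in neighbors
            if tok.startswith("_") and tok.endswith("_")][:top_n]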
We used 1500 PTAB pairs in this test. Using bag-of-words with conceptual query expansion resulted in a 4.9% sample match for Recall @ 100, and was indistinguishable from using simple bag-of-words (BOW), and thus either could constitute a baseline. However, using the subsector-specific model resulted in a significant improvement: Recall @ 100 of 19%. This increase in performance stems from the more accurate modelling of semantic relationships attuned to industry-sector-specific language use.
Experiment 3. In this experiment we attempted to quantify the relative impact of elementary human-in-loop intervention on retrieval performance. We have observed instances where simple re-ranking of search results, based on user feedback on positive/negative document examples, allows a matching document that was ranked below 5000 to be retrieved in the top 100 in one step of user feedback -- for example, helping the ranking methodology disambiguate the erroneous sense in which the acronym ATM was used (the Asynchronous Transfer Mode telecommunication network technology), in contrast to the intended payment terminal technology, or Automatic Teller Machine, sense of the term.
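The paper does not specify the re-ranking mechanism; one standard way to realize such one-step positive/negative feedback is Rocchio-style query modification over document vectors, sketched below purely as an illustration:

import numpy as np

def rocchio_rerank(query_vec, doc_vecs, positive_ids, negative_ids,
                   alpha=1.0, beta=0.75, gamma=0.25):
    # query_vec: 1-D array; doc_vecs: dict of doc_id -> 1-D array (same dimension).
    pos = np.mean([doc_vecs[d] for d in positive_ids], axis=0) if positive_ids else 0.0
    neg = np.mean([doc_vecs[d] for d in negative_ids], axis=0) if negative_ids else 0.0
    new_query = alpha * query_vec + beta * pos - gamma * neg

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Documents marked positive/negative shift the query, e.g. away from the
    # "Asynchronous Transfer Mode" sense of ATM and towards "Automatic Teller Machine".
    return sorted(doc_vecs, key=lambda d: cosine(new_query, doc_vecs[d]), reverse=True)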
To further test this intuition we attempted to
emulate the action of a user applying simple heuristics to
improve the results, by eliminating groups of retrieved
patents that on simple visual inspection are unlikely to be
relevant matches. We measure the impact of such
intervention as the improvement in Recall performance. For
example, we show that Recall @ 200 without intervention is
approximately 10%, but increases to approximately 15% using a simple human-in-loop-like heuristic intervention.
Specifically, using 90 PTAB patent pairs, we attempted to emulate human-in-loop behavior using a coarse method of additional screening. While the numbers are small, they indicate the potential for improvement in recall performance using comparable human-in-loop feedback that relies on actual user judgement (versus the emulated approach in our experiment).

The specific filters we used: the patent pairs considered are of the same Type (i.e. 'device', 'method', 'system' or 'other', using corresponding keywords in the first claim). In addition, we required that their Aboutness (represented by the first noun, adjective and verb in the same claim) and Verb Signature (the most frequently occurring verbs in the same claim) share at least one word with the patent that is the focus of the PTAB decision.
We dropped the results that did not meet these criteria. The rest of the result frame (the top 100 ranked retrieved patents) was filled with the other top semantically sorted search results. In another test, we also considered Claim 1 length, dropping patents with a first claim longer than 200 words or shorter than 10.
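A sketch of this emulated screening, with hypothetical keyword lists and precomputed Aboutness and Verb Signature sets; the exact lists, and whether both overlap conditions must hold, are our assumptions rather than details taken from the paper:

# Hypothetical keyword lists for the Type filter
TYPE_KEYWORDS = {"device": ["device"], "method": ["method", "process"],
                 "system": ["system", "apparatus"]}

def claim_type(first_claim):
    text = first_claim.lower()
    for label, words in TYPE_KEYWORDS.items():
        if any(w in text for w in words):
            return label
    return "other"

def passes_filters(subject, candidate, min_len=10, max_len=200):
    # subject / candidate: dicts with 'claim1' (str), 'aboutness' (set), 'verbs' (set).
    # The paper leaves open whether Aboutness and Verb Signature must both overlap;
    # here an overlap on either is accepted.
    n_words = len(candidate["claim1"].split())
    return bool(
        claim_type(candidate["claim1"]) == claim_type(subject["claim1"])
        and (candidate["aboutness"] & subject["aboutness"]
             or candidate["verbs"] & subject["verbs"])
        and min_len <= n_words <= max_len
    )

def rerank_with_filters(subject, ranked_candidates):
    # Keep filter-passing results first (in their semantic order), then the rest.
    kept = [c for c in ranked_candidates if passes_filters(subject, c)]
    dropped = [c for c in ranked_candidates if not passes_filters(subject, c)]
    return kept + dropped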
Operating on retrieved results using the Type, Aboutness and Verb Signature features as filters had significant scope, as measured in terms of the retrieved results impacted.
Figure 1: Experiment 3. Results from Human-In-Loop Emulation. The blue line is the recall baseline using semantic search. The red line shows adjustments based on type, 'aboutness' of the claim, and verb pattern. A further series adds a length-of-Claim-1 filter (dropping very short and very long claims). This experiment was performed using 90 patent pairs derived from PTAB data. The convergence of the lines at small and large values suggests that proper calibration of human-in-the-loop tools will be crucial.
It is worth noting that small changes in filters often impact thousands of results at a time. For example, the top 20 most frequent Aboutness and Verb Signature words could, through 'OR' operations, span tens of thousands of results. This is another argument for a human-in-the-loop approach.
Experiment 4. To evaluate other forms of representation that allow a more granular, but human-understandable, control of results, we explored a simple set-of-words model of claim language to augment the human-in-loop methods described above. For this experiment, patent claims were processed into phrase chunks -- unordered word sets (1-10 words in length). Each patent typically has 50-75 such unique word sets; about 50% of these chunks were unigrams. The relatedness of two patents could then be implemented with easier-to-intuit user input, expressed as chunk (set of words) inclusion/exclusion.
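A minimal sketch of the resulting word-set overlap measure, assuming chunking has already produced sets of lowercased words for each patent (the normalization details are our assumption):

def wordset_overlap(subject_chunks, candidate_chunks):
    # subject_chunks / candidate_chunks: iterables of frozensets of lowercased words
    # (the 1-10 word chunks described above). The ratio is the number of shared
    # chunks divided by the number of subject chunks.
    subject, candidate = set(subject_chunks), set(candidate_chunks)
    return len(subject & candidate) / len(subject) if subject else 0.0

# Hypothetical example: a ratio above roughly 0.10 correlated with PTAB relevance
# in about 80% of the cases reported below.
a = [frozenset({"payment", "terminal"}), frozenset({"card", "reader"}), frozenset({"network"})]
b = [frozenset({"payment", "terminal"}), frozenset({"network"}), frozenset({"encryption"})]
print(wordset_overlap(a, b))  # 0.666...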
Our experiments showed that this representation had discriminating power and could be a candidate for further human-in-loop experimentation. Using the representation of abstracts for 100 PTAB pairs, the charts show the count of word-set intersections divided by the size of the subject word set. The LHS chart of Figure 2(a) ("Histograms of extent of word-set overlap, PTAB relevant ...") is for the actual relevant pairs. The RHS chart of Figure 2(a) is for the same subject patent and a randomly selected patent from the list of 300 or so (subject + matching results) in the set. We note that a set-intersect measure > 10% correlates with patent relevance in 80% of the cases. (The remaining 20% could be cases where claim language, the detailed specification or other features drove the relevance match even though the abstract language did not have this set-intersect match.) We note that a 10%-40% set intersect accounts for a large majority of the matching pairs.
Figure 2 (a). Histograms of extent of word-set overlap, PTAB relevant vs. random pairs, showing how degree of overlap is correlated with relevance.
To evaluate the ability of this form of representation to discriminate between semantically related, but not legally relevant, patents, the LHS chart of Figure 2(b) ("PTAB subject patents and Top 50 semantically ranked patents") shows the maximum word-set intersection for 100 PTAB subject patents and the top 50 relevant patents returned by our best-performing subsector-language trained word2vec model. The RHS chart of Figure 2(b) shows the minimum word-set intersection for the same 50 patents as a point of comparison.
We note that a majority of the top 50 semantically similar documents pass the word-set intersect threshold for a "match", as evaluated against the measurement from relevant pairs versus randomly selected pairs. Since semantic similarity generally coincides with bag-of-words similarity, this is an unsurprising result. We expect that multiple candidates will match based on the set overlap representation, but will be readily amenable to human-in-loop feedback given the ease with which a user can examine and filter the underlying word-set representation.
From our interviews with practitioners in the field, we anticipate that this type of representation, which allows intuitive human intervention, could offer a convenient approach to dynamically adjust the search landscape and reveal better quality candidates for human inspection.
Figure 2 (b). Histograms of extent of word-set overlap, PTAB subject patents and top 50 semantically ranked patents, showing the correlation of semantic relatedness and word overlap, in some instances appearing higher than with PTAB relevant pairs.
4.2. Results
We report three main results: the first two concern the viability of a semantic representation in the domain that yields practically useful results, and the third concerns representation and human-in-loop retrieval methodology, which we see as having the greatest promise and as suggesting a path for future development. We observe:
Result 1. Fewer than 15% of patents judged as being relevant according to a PTAB ruling appear to have stronger aggregate semantic relatedness with each other than with other patents that merely share word, topic or other associations. For instance, two patents responsive to the topic "foundry" that both deal with metal forming may not be highly related to each other (in the sense of contributing to patent invalidation), although they may be a strong semantic match. However, one that deals with slurrying and foundry processes, which is related, but not a strong semantic conceptual match, does indeed invalidate one of the patents in another test case.
Result 2. Improving patent text representation through preprocessing steps, using word embeddings of patent-specific keywords, and distinct subsector-specific training improves recall performance from 5% to 20%.

As a performance measurement baseline, we used BOW modified with a concept representation along the lines of [8] and [13]. We focused on the information retrieval (IR) task of finding patents cited in PTAB documents, where 5% of the test sample recalled the relevant document in the top 100 results. Using our customized subsector-specific semantic model representation, we were able to retrieve the relevant document in the top 100 results in 20% of the cases. This result promises future improvement from methods that model semantic relatedness in more granular ways, down from subsector-specific to perhaps patent-type (device, method, apparatus) specific modelling of relevance.
Result 3. Consistent with improvements demonstrated by others, we have preliminary evidence from our experiments with human-in-loop retrieval that points to dramatic performance improvement in particular cases. To improve methods of incorporating user feedback, we have also evaluated simple forms of patent language representation to approximate the heuristics that practitioners rely on in their search strategies, such as term overlap, sector information, 'what the patent is about' and Claim 1 length. We show that a single iteration of such filtering can improve performance by 50% as measured by Recall @ 200. However, at other recall values (smaller and larger) the improvements are lower, so the proper calibration of human-in-the-loop tools will be crucial.
5. DISCUSSION
In this section we comment on our use of distributional approaches vs. traditional NLP techniques, as well as the advantage of the human-in-the-loop approach. We also add some comments on the possibility of computational models of legal relevance.
5.1. Why only use distributional approaches?
One possible objection to the distributional approaches is that they do not capture deeper semantic meanings of patent texts or claims. However, the current state of natural language processing suggests we need new tools to capture finer distinctions in meaning, beyond the standard words-to-syntax-to-semantics pipeline.

Experiments with parsing patent claims seem to show that none of the existing tools can do it. This is not surprising. The average length of Claim 1 is between 150 and 200 words, when measured in a few weekly samples of US patents in 2016. In addition, more than 90% of the first claims are longer than 50 words. Average sentence length in the Wall Street Journal corpus is 19.3 words [15], ranging from 3 to 20. And most natural language parsers are trained
on similar corpora. We also know that parsing accuracy decreases with the length of the sentence. For example, McDonald and Nivre [10, Fig. 2] show that parsing accuracy drops 10 points or more per 40 words, and similar results appear elsewhere [5]. Even worse, Boullier and Sagot [3] entertain the possibility that "(full) parsing of long sentences would be intractable" (long, meaning more than 100 words). This means that an analysis of the structure of an average crucial Claim 1 is likely to be wrong, until we create better sentence analysis tools.
5.2. Additional comments on human in the loop
The results of patent human-in-loop retrieval methods are promising. They are also consistent with results from search query modification experiments [6], where "baseline performance can be doubled if only one relevant document was manually provided by the user". Similarly, the IBM Watson system advocates an interactive approach to symptom classification [9].
As we show, simple human-in-loop intervention by means of filtering of results using heuristically selected word features (e.g. type of patent, earliest appearing noun-adjective-verb sequence) can modify rankings significantly, with preliminary evidence showing over 50X improvement, where a retrieval task that failed at Recall @ 5000 succeeded at Recall @ 100 with user feedback.
Simple methods, such as a three-word summary of a patent's aboutness, prove to be a productive way for the user to include or exclude groups of patents or claim language, closely mimicking the skimming of the detailed text that is performed by the expert human reader in practice. These improvements in practically inspired forms of patent text and semantics representation are likely to require language models tuned to the specific nuances of the text in this narrow domain. We need more of these.
Furthermore, through over a hundred interviews with industry practitioners, conducted to understand their expectations of search tools and common practices, we have come to recognize the importance of alternative forms of representation, essential to support the different points of view implicit in judging relevance. Again, we are planning to experiment with other representations.
5.3. Towards a computational theory of legal
relevance
Incorporating word order and semantics into text representation is a recent win for the field of NLP [11]. Based on the results of our experiments and through interviews with practitioners, we believe that a one-size-fits-all semantic search approach utilizing these advancements is incapable of capturing the nuanced relevance judgements made in the domain of patent litigation. We demonstrated orders-of-magnitude improvement in practically relevant task performance through modification of semantic representation models, using preprocessing of text, customized forms of relevance representation of claims (such as their aboutness), and the use of human-in-loop feedback to better curtail the possibilities from a list of semantically relevant documents.
We observed that, given the abductive nature of legal relevance, more foundational work on representation, as well as a descriptive taxonomy of the patterns of relevance expected by legal practitioners, is likely to help in building better-performing semantic representations. This will require further engagement with the legal community to ensure that the computational work and machine learning protocols are guided by specific intuitions and areas of focus of practitioners. The benefit of this work is likely to be twofold: (i) better performance of patent search and analytic systems that could support the quality and efficiency of the work in the field and (ii) a more in-depth understanding of the implicit criteria embodied in the legal record of decades of judgements and rulings that could serve as a valuable learning and policy tool for the field at large.
6. SUMMARY AND CONCLUSIONS
In this paper we introduced a new data set relevant for
patent retrieval, and more generally, for modeling legal
relevance, namely a collection of rulings from the USPTO
Patent Trial and Appeal Board (PTAB).
We have used eight thousand documents from this data set to perform a collection of experiments. These experiments show the need for new models of relevance. We presented and evaluated a number of such models based on distributional and structural features of patent data. We also argued that we need a new collection of approaches to computational modeling of relevance, and that the most promising avenues of research will have to include an interactive, human-in-the-loop approach.
In addition, using the PTAB data set for testing
relevance in patent document retrieval, instead of traditional
citations search, shows a bigger gap between the needs of
practitioners and the capabilities of current information
retrieval and NLP technologies.
Consequent to our conclusions, we believe future
work in this area would include three major streams: (i) The
hypothesizing and testing of semantic representations that
allow for automated classification of historically observed modes of legal relevance (e.g. does individual claim-level semantic relatedness explain more of the PTAB rulings where patents were found to be relevant than document-level semantic relatedness?); (ii) incorporating novel human-in-loop feedback methods and then back-testing their performance in practically valuable litigation scenarios such as PTAB IPR datasets; (iii) input from legal theorists and practitioners on the accurate classification of modes of relevance judgements (e.g. enumeration of the types and forms of arguments typically used in support of legal relevance judgements).
REFERENCES
[1] Allison, J.R., M.A. Lemley, & D.L. Schwartz (2014). Understanding the Realities of Modern Patent Litigation. Texas Law Review 1769 (2014). Available at SSRN: http://ssrn.com/abstract=2442451
[2] Buchanan, B., & Headrick, T (1970). Some Speculation
About Artificial Intelligence and Legal Reasoning. Stanford
Law Review, Volume 23, No. 1, November 1970
[3] Boullier, P. & B. Sagot (2005). Efficient and robust LFG parsing: SXLFG. Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT). http://www.aclweb.org/anthology/W05-1501
[4] Chien, C.V. & C. Helmers (2015). Inter Partes Review and the Design of Post-Grant Patent Reviews. Santa Clara Univ. Legal Studies Research Paper No. 1015. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2601562
[5] Choi, J.D., J. Tetreault and A. Stent (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. http://www.aclweb.org/anthology/P/P15/P15-1038.pdf
[6] Golestan Far, M. and S. Sanner (2015). On term selection techniques for patent prior art search. Proceedings of the 38th International ACM SIGIR Conference.
[7] Horty, J. (1962). The "Key Words in Combination" Approach. MULL: Modern Uses of Logic in Law, Vol. 3, No. 1 (March 1962), pp. 54-64.
[8] Khoury, A. and R. Bekkerman (2016). Automatic Discovery of Prior Art: Big Data to the Rescue of the Patent System. The John Marshall Review of Intellectual Property Law 16.1: 44.
[9] Lally, A. et al. (2014). WatsonPaths: scenario-based question answering and inference over unstructured information. IBM Research Report RC25489 (WAT1409048), September 17, 2014.
[10] McDonald, R. & J. Nivre (2007). Characterizing the Errors of Data-Driven Dependency Parsing Models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. http://www.aclweb.org/anthology/D07-1013
[11] Mikolov, T. et al. 2013. Distributed representations of
words and phrases and their compositionality. Advances in
neural information processing systems.
[12] Schwartz, D.L. (2012). The Rise of Contingent Fee
Representation in Patent Litigation. Alabama Law Review
335 (2012). Available at SSRN:
http://ssrn.com/abstract=1990651
[13] Shalaby, W. and W. Zadrozny (2015). Measuring
Semantic Relatedness using Mined Semantic Analysis.
arXiv preprint arXiv:1512.03465.
[14] Strumsky, D. and J. Lobo. 2015. Identifying the sources
of technological novelty in the process of invention.
Research Policy (44/8).
[15] Strzalkowski, T. (ed.) (1999). Natural Language
Information Retrieval. Springer.
[16] Wyner, A. & Peters, W. (2010). Lexical Semantics and Expert Legal Knowledge Towards the Identification of Legal Case Factors. Proc. JURIX 2010, 127-136.
[17] Youn, H. et al. (2015). Invention as a combinatorial process: evidence from US patents. Journal of The Royal Society Interface 12(106): 20150272. The Royal Society.