Recent progress in incorporating word order and semantics into the decades-old, tried-and-tested bag-of-words representation of text meaning has yielded promising results in computational text classification and analysis. This development, and the availability of a large number of legal rulings from the PTAB (Patent Trial and Appeal Board), motivated us to revisit possibilities for practical, computational models of legal relevance -- starting with this narrow and approachable niche of jurisprudence. We present results from our analysis and experiments towards this goal using a corpus of approximately 8000 rulings from the PTAB. This work makes three important contributions towards the development of models for legal relevance semantics: (a) Using state-of-the-art Natural Language Processing (NLP) methods, we characterize the diversity and types of semantic relationships that are implicit in select judgements of legal relevance at the PTAB; (b) We achieve new state-of-the-art results on practical information retrieval tasks using our customized semantic representations on this corpus; (c) We outline promising avenues for future work in the area -- including preliminary evidence from human-in-loop interaction, and new forms of text representation developed using input from over a hundred interviews with practitioners in the field. Using the PTAB data set for testing relevance in patent document retrieval, instead of traditional citations search, also shows a bigger gap between the needs of practitioners and the capabilities of current information retrieval and NLP technologies.
judgment within a specific area of the law. The models are
considered adequate in aggregate under some arbitrarily
reducible measure of prediction accuracy across a corpus
selected from that specific area of the law.
The purpose of this paper is to outline our
approach to the development and testing of several
computational models for legal relevance in the narrow
domain of patent law, specifically as documented through
select proceedings of USPTO PTAB cases. For our tests we use a collection of Ex Parte Reexamination
(EPR) patent case rulings, including the patents explicitly
mentioned in decisions of the Patent Trial and Appeal Board
(PTAB).
We present results from our analysis and experiments towards this goal using a corpus of approximately 8000 rulings from the PTAB. This work makes three important contributions towards the development of models for legal relevance semantics: (a) Using state-of-the-art Natural Language Processing (NLP) methods, we characterize the diversity and types of semantic relationships that are implicit in select judgements of legal relevance at the PTAB; (b) We achieve new state-of-the-art results on practical information retrieval tasks using our customized semantic representations on this corpus; (c) We outline promising avenues for future work in the area, including preliminary evidence from human-in-loop interaction, and new forms of text representation developed using input from over a hundred interviews with practitioners in the field.
Using the PTAB data set for testing relevance in patent document retrieval, instead of traditional citations search, also shows a bigger gap between the needs of practitioners and the capabilities of current information retrieval and NLP technologies. For example, in contrast to recent results [8], we find that documents not in the semantic neighborhood of the query document can still be very relevant for the query. The inadequacies of using citations were also discussed in a different context by researchers studying innovation [14, 17]. Together they point to the need to use other data sets, and not just citations.
The remainder of the paper is organized as follows: In Section 2 we discuss the practical motivations and practitioners' requirements of prior art search. Section 3 introduces the data set. The results are presented in Section 4, of which Subsection 4.1 gives the details of our experiments. Since the experiments reveal limitations of current forms of representing legal relevance, the question is how we go about building better models for this purpose; this is discussed in Section 5. Conclusions (Section 6) summarize our results.
2. PRACTITIONER REQUIREMENTS
Given the nuance and complexity implicit in legal judgement, we are skeptical that a one-size-fits-all "magic-bullet" AI solution will adequately model outcomes in the field. Furthermore, comparing the current state of the art to legal information retrieval over 50 years ago [7], we observe that changes in algorithms and models of text representation have lagged far behind the dramatically improved access to data and growth in computational power. This disappointing state of the art has been noted by others, for example in discussing the inadequacies of leading search engines [8].
We believe this is in part due to the lack of practical
methods for computational modelling and for representing
legal relevance, and in particular the relevance of other
documents (patents) to a particular examined technology.
Towards this end, we see this paper as a small part of a
broader undertaking: the development of practical models
and theories of legal relevance that can be shared, added to
and built upon by practitioners and researchers alike. While
this work focuses on patent law, there are synergies with
work in other areas that bring domain-aware case factors into computer models [16].
While limited scholarly attention has been given to the
requirements of practitioners in patent litigation and related
areas, we were able to use informal interviews and literature
in the area of complex search to identify a few themes of
interest. We seek to explore some of these themes further
in this paper and in future work. In particular, this paper is
focused on the more foundational topic of modelling legal
relevance. These models are likely to be helpful in the practical work of legal professionals in the field, and in the evaluation of legal procedures across the field aimed at improving the quality of patent grant and enforcement. A descriptive model of relevance is also arguably a precondition for a computational theory of semantics in the domain.
Patent cases have substantial uncertainty [12],
primarily due to the challenges implicit in knowing the entire
universe of prior art before litigation commences and
reconciling the case at hand with relevant prior case law:
“difficulty in knowing the relevant facts to the dispute and
difficulty in knowing how a trier of fact will evaluate the
facts… knowing the entire universe of prior art is impossible
before litigation commences”[12].
We note that the typical litigation workflow is accompanied by a diverse set of requirements at different phases of the process: for instance, exploration of case law and the technology landscape at the outset of a case, followed by an analysis of semantically and contextually linked outcomes relevant to the matter at hand, and then assistance in selecting and narrowing in on more specific artifacts (for example, highly relevant patents) to be used in the preparation for potential litigation. Restricting ourselves to the last step of the patent litigation workflow, identifying highly relevant prior art is a particular use case of interest in this paper. After a patent is granted, its validity can be challenged in litigation or in several post-grant proceedings.
In the majority of these challenges, it is necessary to find
and examine a number of documents from a potentially very
large pool of patent and technical literature; that is,
establish the relationship of the invention to the prior art.
For the purpose of this paper, we do not need to get into the legal differences between different types of proceedings (see, for example, http://www.pillsburylaw.com/postgrantproceedings or http://fishpostgrant.com/postgrantreview/; see also https://en.wikipedia.org/wiki/Patent_Trial_and_Appeal_Board and http://www.uspto.gov/patentsapplicationprocess/patenttrialandappealboard0). Also, we do not need to attend to the differences between different patent jurisdictions, because the technical problems of text analytics and information retrieval are the same.

Finding references potentially invalidating a patent is perhaps more challenging than finding (some) relevant prior art. For example, the average number of cited references in a patent is about 40 (see http://patentlyo.com/patent/2015/08/citingreferencesalternative.html), while the number cited in invalidation decisions is usually less than 5. Arguably, any patent search supporting invalidation has to be very precise.
Finding such relevant documents is nontrivial,
because many documents refer to the same concepts that
describe the invention at hand, and these documents can
appear in multiple patent classes and broad scientific and
technical literature. Moreover, similar concepts, relations
and functionalities might be expressed in different words, so
keyword search is not sufficient to find all relevant
documents. Therefore this search process is labor-intensive, costly and possibly error-prone, even with the support of modern information retrieval tools.
Analyzing a collection of patents and related product or scientific literature is also costly, mostly because it takes time and requires a highly trained workforce (lawyers and domain experts). What is important from our perspective is that there are few analytic tools that can support this process. Most of the patent analytics tools analyze metadata (e.g. https://lexmachina.com/legalanalytics/), for example probabilities of finding a patent invalid based on statistics on trial location, examination art unit, etc. Allison et al. [1] provide an in-depth analysis of the "Realities of Modern Patent Litigation", relating "the outcomes (…) to a host of variables, including variables related to the parties, the patents, and the courts".
Our goal as technology developers lies in improving patent analytic tools; our goal as researchers is to understand the obstacles on this path and to find ways of avoiding them.
We note that legal reasoning is abductive, since the models implicit in particular cases are individually neither necessary nor sufficient to explain all cases, but rather are good enough to model outcomes in only some reasonable sample of cases. For instance, our analysis shows that aggregate document-level semantic relatedness is an adequate mode of reasoning in only a small minority of USPTO Ex Parte Reexamination (EPR) cases.
Clearly other abductive reasons (models) for
relevance are needed to explain the remaining instances.
Manual examination of cases with variance demonstrates
that while relevant terms and semantic links are present, a
high frequency of related words and phrase occurrence is
neither a necessary nor a sufficient condition for legal
relevance.
Before we discuss the models, let us say a few
words about the data we use to test them.
3. PATENT TRIAL AND APPEAL BOARD
(PTAB) DATA SETS
Post grant review and Inter Partes Review (IPR) are conducted at the USPTO Patent Trial and Appeal Board (PTAB) and are aimed at reviewing the patentability of one or more claims in a patent. The process begins with a third party petition to which the patent owner may respond. A post grant review is instituted if it is more likely than not that at least one challenged claim is unpatentable. If the petition is not dismissed, the Board issues a final decision within 1-1.5 years (see http://www.uspto.gov/patentsapplicationprocess/appealingpatentdecisions/trials/postgrantreview). Chien and Helmers [4] discuss "Inter Partes Review and the Design of Post-Grant Patent Reviews" processes and key statistics, including the statistics of case dispositions. USPTO notes that 80% of the IPR reviews end with some or all claims invalidated (see http://www.uspto.gov/patentsapplicationprocess/patenttrialandappealboard/statistics).
What is in the PTAB data? The Patent Trial and Appeal Board (PTAB) publicly available dataset, as of January 2017, has about 100 zip files containing 10 GB of data (compressed), available at https://bulkdata.uspto.gov/data2/patent/trial/appeal/board/. These files are either image or text .pdf files with PTAB decisions. Each decision pertains to the validity of claims of one patent.
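A minimal sketch of working with such files, assuming the zip archives have already been downloaded locally and that pdfminer.six is used for text extraction (the paper does not describe its extraction pipeline, and image-only PDFs would additionally need OCR, which is omitted here):

import io
import zipfile
from pathlib import Path
from pdfminer.high_level import extract_text

def extract_decisions(zip_dir):
    # Yield (pdf_name, text) for every PDF inside every zip archive in zip_dir.
    for zip_path in Path(zip_dir).glob("*.zip"):
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".pdf"):
                    pdf_bytes = archive.read(name)
                    # extract_text returns an empty string for image-only PDFs
                    yield name, extract_text(io.BytesIO(pdf_bytes))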
Why care about PTAB data? Because each case has a relatively small collection of highly relevant documents used as evidence. The outcomes are clear and the reasoning can be modeled. There is enough data for statistical inference (although perhaps not enough to train a neural net from scratch). Also, as mentioned earlier, the PTAB data set might represent practitioner needs better than citations do.
In this paper we report on some initial experiments on the
PTAB data sets. Given the relatively structured form of the
data available and the more streamlined process used in
adjudication, we believe that PTAB data represents a
unique training corpus to develop and improve customized
tools used in the areas of patent litigation and licensing, and
as we discuss later, it might also be a better measure of
satisfying practitioner needs than citations retrieval.
4. EXPERIMENTS AND RESULTS FROM
SEMANTIC ANALYSIS OF PTAB RULINGS
Encouraged by recent developments in neural language
model representations [11], and the availability of a rich
corpus of documents capturing relevance judgements in
Patent Law, we sought to explore the extent to which a
computational theory of semantic relevance in this area of
law was possible. As described in the section on practical
motivation, such a theory would be of great utility to
practitioners and policy makers in this area of law. Our
experimental approach is therefore both theoretically and
practically motivated, empirical but with an emphasis on
exploring possibilities and limits of such a theory.
For the experiments, we use a sample of 8000 EPR rulings from the USPTO Final Decisions of the Patent Trial and Appeal Board. Our experiments use subsets of the data to (i) perform an analysis of relationships between the pairs of patents associated in the approximately 8000 rulings, (ii) conduct an assessment of the impact that types of semantic representations have on the practically meaningful task of relevant patent retrieval, and (iii) empirically explore possibilities of alternate forms of text representation to model legal relevance and enable human-in-loop interaction to improve patent retrieval performance.
4.1. Details of the experiments
Our tests consist of using different techniques to retrieve patents cited in PTAB decisions, based on queries built from the patent whose validity is being questioned. Such queries typically consist of combinations of the patent abstract, its title, or its first claim. As baselines for our evaluations we used both bag-of-words (BOW) query representations and semantic search implemented using conceptual expansion of query words. The conceptual expansion was implemented using Wikipedia-derived related concepts, similarly to standard approaches, e.g. [8, 13].
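As an illustration only (the paper does not publish its implementation), a TF-IDF bag-of-words baseline of this kind, together with the Recall @ K measure used throughout the experiments, could be sketched as follows; the Wikipedia-based conceptual expansion step is omitted and the data structures are hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(query_text, candidate_texts):
    # Rank candidate patents by cosine similarity of TF-IDF vectors to the query
    # (the query being the abstract, title and/or first claim of the challenged patent).
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([query_text] + candidate_texts)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(range(len(candidate_texts)), key=lambda i: scores[i], reverse=True)

def recall_at_k(ranked_indices, relevant_indices, k):
    # Fraction of PTAB-cited (relevant) documents retrieved within the top k results.
    return len(set(ranked_indices[:k]) & set(relevant_indices)) / len(relevant_indices)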
Experiment 1. To evaluate the hypothesis that aggregate document-level semantic relatedness is an important factor in a potential model for legal relevance in the patent domain, we attempted to quantify the correlation between semantic relatedness and patent relevance using a sample of 245 semiconductor EPR cases. In this case the recall at 1000 was 30%.

However, the point is that this measure of semantic similarity is inadequate to capture PTAB relevance: only for 30% of the subject patents did the 1000th-ranked patent document returned by our state-of-the-art semantic-relatedness model score lower than the PTAB-relevant patent under a cosine-similarity measure of relatedness. This drops to 15% when the 100th document is considered. This result is consistent with the expectations of lawyers and other practitioners that we have interviewed as part of this project.
However, these results also provide an interesting contrast with the assumptions of other researchers such as Khoury and Bekkerman [8], who suggest that "if a given document is not in the semantic neighborhood of the query document, it simply cannot be relevant for the query document". Our work challenges, with experimental results, the understandable intuition that relevant prior art must necessarily be found in the set of documents that have a high degree of semantic similarity as measured by state-of-the-art text processing methods. Notice that this contrast would have been unlikely to be discovered without the PTAB data set, given that the other works use citations to model relevance.
Overall, this experiment suggests the need for additional
methods for association, a greater variety of semantic
connections, and perhaps more sophisticated interpretation
of the patent claims language.
Experiment 2. To quantify the improvement possible
through the use of better semantic representations, we
benchmarked a word2vec [11] model trained on
preprocessed text, grouped by subsector of patents.
Specifically, we classify each patent into one of 37 industry groupings (e.g. Computer Hardware & Software, Metalworking). The groups correspond to the standard NBER subcategories (identified at http://www.nber.org/patents/subcategories.txt). The claim text for each of the patents was then modified to include references to special words that uniquely identified each patent as a new word in our vocabulary (e.g. _6435262_ for the patent US 6435262 B1). Cross references, including citations, between claims were then tagged at the claim level with the relevant unique patent identifiers to improve locality sensitive mapping between a patent and the various claims that related to it.
The training of the word vector models was then carried out individually for each of the 37 preprocessed text corpora, providing us with a word-to-vector model corresponding to each of the industry groups (skip-gram training, with 200-dimensional vector representations and a minimum word count of 4, was used). Given the trained word2vec [11] models, a simple semantic retrieval task then amounts to finding the closest patent identifier word (treated as a special word in the vocabulary) to the identifier for a patent of interest. Proximity in our case was measured with the commonly used cosine similarity measure. This measure, or relative ranking, could be further improved upon with additional semantically important representations to more closely model the type of relevance desired -- in our case, the relevance of two patent documents based on PTAB guidelines. We make some suggestions along these lines in our experiments on human-in-loop emulation.
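A minimal sketch of this subsector-specific training and retrieval step, assuming the gensim library (which the paper does not name) and claims already tokenized with the special patent-identifier words described above:

from gensim.models import Word2Vec

def train_sector_model(tokenized_claims):
    # tokenized_claims: a list of token lists for one NBER subsector; patent
    # identifiers appear as special tokens such as "_6435262_" inside the claims.
    return Word2Vec(
        sentences=tokenized_claims,
        sg=1,             # skip-gram training, as in the paper
        vector_size=200,  # 200-dimensional vectors (gensim 4.x keyword)
        min_count=4,      # minimum word count of 4
    )

def closest_patents(model, patent_id_token, top_n=100):
    # Nearest patent-identifier tokens by cosine similarity; ordinary vocabulary
    # words are filtered out so that only patents are returned.
    neighbors = model.wv.most_similar(patent_id_token, topn=5000)
    return [(tok, score) for tok, score in neighbors
            if tok.startswith("_") and tok.endswith("_")][:top_n]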
We used 1500 PTAB pairs in this test. Using bag-of-words with conceptual query expansion resulted in a 4.9% sample match for Recall @ 100, and was indistinguishable from using simple bag-of-words (BOW), and thus either could constitute a baseline. However, using the subsector-specific model resulted in a significant improvement: Recall @ 100 of 19%. This increase in performance stems from the more accurate modelling of semantic relationships attuned to industry-sector-specific language use.
Experiment 3. In this experiment we attempted to quantify the relative impact of elementary human-in-loop intervention on retrieval performance. We have observed instances where simple re-ranking of search results, based on user feedback on positive/negative document examples, allows a matching document that was ranked below 5000 to be retrieved in the top 100 in one step of user feedback -- for example, helping the ranking methodology disambiguate the erroneous sense in which the acronym ATM was used (the Asynchronous Transfer Mode telecommunication network technology), in contrast to the intended payment terminal technology, or Automatic Teller Machine, sense of the term.
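The paper does not specify the re-ranking mechanism; one standard way to realize such one-step positive/negative feedback is Rocchio-style query modification over document vectors, sketched below purely as an illustration:

import numpy as np

def rocchio_rerank(query_vec, doc_vecs, positive_ids, negative_ids,
                   alpha=1.0, beta=0.75, gamma=0.25):
    # query_vec: 1-D array; doc_vecs: dict of doc_id -> 1-D array (same dimension).
    pos = np.mean([doc_vecs[d] for d in positive_ids], axis=0) if positive_ids else 0.0
    neg = np.mean([doc_vecs[d] for d in negative_ids], axis=0) if negative_ids else 0.0
    new_query = alpha * query_vec + beta * pos - gamma * neg

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Documents marked positive/negative shift the query, e.g. away from the
    # "Asynchronous Transfer Mode" sense of ATM and towards "Automatic Teller Machine".
    return sorted(doc_vecs, key=lambda d: cosine(new_query, doc_vecs[d]), reverse=True)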
To further test this intuition we attempted to
emulate the action of a user applying simple heuristics to
improve the results, by eliminating groups of retrieved
patents that on simple visual inspection are unlikely to be
relevant matches. We measure the impact of such
intervention as the improvement in Recall performance. For
example, we show that Recall @ 200 without intervention is
approximately 10%, but increases to approximately 15% using a simple human-in-loop-like heuristic intervention.
Specifically, using 90 PTAB patent pairs, we attempted to emulate human-in-loop behavior using a coarse method of additional screening. While the numbers are small, they indicate the potential for improvement in recall performance using comparable human-in-loop feedback that relies on actual user judgement (versus the emulated approach in our experiment).

The specific filters we used: the patent pairs considered are of the same Type (i.e. 'device', 'method', 'system' or 'other', using corresponding keywords in the first claim). In addition, we required that their Aboutness (represented by the first noun, adjective and verb in the same claim) and Verb Signature (the most frequently occurring verbs in the same claim) share at least one word with the patent that is the focus of the PTAB decision.
We dropped the results that did not meet these criteria. The rest of the result frame (the top 100 ranked retrieved patents) was filled with the other top semantically sorted search results. In another test, we also considered Claim 1 length, dropping patents with a first claim longer than 200 words or shorter than 10.
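A sketch of this emulated screening, with hypothetical keyword lists and precomputed Aboutness and Verb Signature sets; the exact lists, and whether both overlap conditions must hold, are our assumptions rather than details taken from the paper:

# Hypothetical keyword lists for the Type filter
TYPE_KEYWORDS = {"device": ["device"], "method": ["method", "process"],
                 "system": ["system", "apparatus"]}

def claim_type(first_claim):
    text = first_claim.lower()
    for label, words in TYPE_KEYWORDS.items():
        if any(w in text for w in words):
            return label
    return "other"

def passes_filters(subject, candidate, min_len=10, max_len=200):
    # subject / candidate: dicts with 'claim1' (str), 'aboutness' (set), 'verbs' (set).
    # The paper leaves open whether Aboutness and Verb Signature must both overlap;
    # here an overlap on either is accepted.
    n_words = len(candidate["claim1"].split())
    return bool(
        claim_type(candidate["claim1"]) == claim_type(subject["claim1"])
        and (candidate["aboutness"] & subject["aboutness"]
             or candidate["verbs"] & subject["verbs"])
        and min_len <= n_words <= max_len
    )

def rerank_with_filters(subject, ranked_candidates):
    # Keep filter-passing results first (in their semantic order), then the rest.
    kept = [c for c in ranked_candidates if passes_filters(subject, c)]
    dropped = [c for c in ranked_candidates if not passes_filters(subject, c)]
    return kept + dropped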
Operating on retrieved results using the Type, Aboutness and Verb Signature features as filters had significant scope, as measured in terms of the retrieved results impacted.
Figure 1: Experiment 3. Results from Human-In-Loop Emulation. The blue line is the recall baseline using semantic search. The red line shows adjustments based on type, 'aboutness' of the claim, and verb pattern. A further series adds a length-of-Claim-1 filter (dropping very short and very long claims). This experiment was performed using 90 patent pairs derived from PTAB data. The convergence of the lines at small and large values suggests that proper calibration of human-in-the-loop tools will be crucial.
It is worth noting that small changes in filters often impact thousands of results at a time. For example, the top 20 most frequent Aboutness and Verb Signature words could, through 'OR' operations, span tens of thousands of results. This is another argument for a human-in-the-loop approach.
Experiment 4. To evaluate other forms of representation that allow a more granular, but human-understandable, control of results, we explored a simple set-of-words model of claim language to augment the human-in-loop methods described above. For this experiment, patent claims were processed into phrase chunks -- unordered word sets (1-10 words in length). Each patent typically has 50-75 such unique word sets; about 50% of these chunks were unigrams. The relatedness of two patents could then be implemented with easier-to-intuit user input, expressed as chunk (set of words) inclusion/exclusion.
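A minimal sketch of the resulting word-set overlap measure, assuming chunking has already produced sets of lowercased words for each patent (the normalization details are our assumption):

def wordset_overlap(subject_chunks, candidate_chunks):
    # subject_chunks / candidate_chunks: iterables of frozensets of lowercased words
    # (the 1-10 word chunks described above). The ratio is the number of shared
    # chunks divided by the number of subject chunks.
    subject, candidate = set(subject_chunks), set(candidate_chunks)
    return len(subject & candidate) / len(subject) if subject else 0.0

# Hypothetical example: a ratio above roughly 0.10 correlated with PTAB relevance
# in about 80% of the cases reported below.
a = [frozenset({"payment", "terminal"}), frozenset({"card", "reader"}), frozenset({"network"})]
b = [frozenset({"payment", "terminal"}), frozenset({"network"}), frozenset({"encryption"})]
print(wordset_overlap(a, b))  # 0.666...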
Our experiments showed that this representation had discriminating power and could be a candidate for further human-in-loop experimentation. Using the representation of abstracts for 100 PTAB pairs, the charts show the count of word-set intersections divided by the size of the subject word set. The LHS chart of Figure 2(a) ("Histograms of extent of word-set overlap, PTAB relevant ...") is for the actual relevant pairs. The RHS chart of Figure 2(a) is for the same subject patent and a randomly selected patent from the list of 300 or so (subject + matching results) in the set. We note that a set-intersect measure > 10% correlates with patent relevance in 80% of the cases. (The remaining 20% could be cases where claim language, the detailed specification or other features drove the relevance match even though the abstract language did not have this set-intersect match.) We note that a 10%-40% set intersect accounts for a large majority of the matching pairs.
Figure 2 (a). Histograms of extent of word-set overlap, PTAB relevant vs. random pairs, showing how degree of overlap is correlated with relevance.
To evaluate the ability of this form of representation to discriminate between semantically related, but not legally relevant, patents, the LHS chart of Figure 2(b) ("PTAB subject patents and Top 50 semantically ranked patents") shows the maximum word-set intersection for 100 PTAB subject patents and the top 50 relevant patents returned by our best-performing subsector-language trained word2vec model. The RHS chart of Figure 2(b) shows the minimum word-set intersection for the same 50 patents as a point of comparison.
We note that a majority of the top 50 semantically similar documents pass the word-set intersect threshold for a "match", as evaluated against the measurement from relevant pairs versus randomly selected pairs. Since semantic similarity generally coincides with bag-of-words similarity, this is an unsurprising result. We expect that multiple candidates will match based on the set overlap representation, but will be readily amenable to human-in-loop feedback given the ease with which a user can examine and filter the underlying word-set representation.
From our interviews with practitioners in the field, we anticipate that this type of representation, which allows intuitive human intervention, could offer a convenient approach to dynamically adjust the search landscape and reveal better quality candidates for human inspection.
Figure 2 (b). Histograms of extent of word-set overlap, PTAB subject patents and top 50 semantically ranked patents, showing the correlation of semantic relatedness and word overlap, in some instances appearing higher than with PTAB relevant pairs.
4.2. Results
We report three main results: the first two concern the viability of a semantic representation in the domain that yields practically useful results, and the third concerns representation and human-in-loop retrieval methodology, which we see as having the greatest promise and as suggesting a path for future development. We observe:
Result 1. Fewer than 15% of patents judged as being relevant according to a PTAB ruling appear to have stronger aggregate semantic relatedness with each other than with other patents that merely share word, topic or other associations. For instance, two patents responsive to the topic "foundry" that both deal with metal forming may not be highly related to each other (in the sense of contributing to patent invalidation), although they may be a strong semantic match. However, one that deals with slurrying and foundry processes, which is related, but not a strong semantic conceptual match, does indeed invalidate one of the patents in another test case.
Result 2. Improving patent text representation through preprocessing steps, using word embeddings of patent-specific keywords, and distinct subsector-specific training improves recall performance from 5% to 20%.

As a performance measurement baseline, we used BOW modified with a concept representation along the lines of [8] and [13]. We focused on the information retrieval (IR) task of finding patents cited in PTAB documents, where 5% of the test sample recalled the relevant document in the top 100 results. Using our customized subsector-specific semantic model representation, we were able to retrieve the relevant document in the top 100 results in 20% of the cases. This result promises future improvement from methods that model semantic relatedness in more granular ways, down from subsector-specific to perhaps patent-type (device, method, apparatus) specific modelling of relevance.
Result 3. Consistent with improvements demonstrated by others, we have preliminary evidence from our experiments with human-in-loop retrieval that points to dramatic performance improvement in particular cases. To improve methods of incorporating user feedback, we have also evaluated simple forms of patent language representation to approximate the heuristics that practitioners rely on in their search strategies, such as term overlap, sector information, 'what the patent is about' and Claim 1 length. We show that a single iteration of such filtering can improve performance by 50% as measured by Recall @ 200. However, at other recall values (smaller and larger) the improvements are lower, so the proper calibration of human-in-the-loop tools will be crucial.
5. DISCUSSION
In this section we comment on our use of distributional approaches vs. traditional NLP techniques, as well as the advantage of the human-in-the-loop approach. We also add some comments on the possibility of computational models of legal relevance.
5.1. Why only use distributional approaches?
One possible objection to the distributional approaches is that they do not capture deeper semantic meanings of patent texts or claims. However, the current state of natural language processing suggests we need new tools to capture finer distinctions in meaning, beyond the standard words-to-syntax-to-semantics pipeline.

Experiments with parsing patent claims seem to show that none of the existing tools can do it. This is not surprising. The average length of Claim 1 is between 150 and 200 words, when measured in a few weekly samples of US patents in 2016. In addition, more than 90% of the first claims are longer than 50 words. Average sentence length in the Wall Street Journal corpus is 19.3 words [15], ranging from 3 to 20. And most natural language parsers are trained
on similar corpora. We also know that parsing accuracy decreases with the length of the sentence. For example, McDonald and Nivre [10, Fig. 2] show that parsing accuracy drops 10 points or more per 40 words, and similar results appear elsewhere [5]. Even worse, Boullier and Sagot [3] entertain the possibility that "(full) parsing of long sentences would be intractable" (long, meaning more than 100 words). This means that an analysis of the structure of an average crucial Claim 1 is likely to be wrong, until we create better sentence analysis tools.
5.2. Additional comments on human in the loop
The results of patent human-in-loop retrieval methods are promising. They are also consistent with results from search query modification experiments [6], where "baseline performance can be doubled if only one relevant document was manually provided by the user". Similarly, the IBM Watson system advocates an interactive approach to symptom classification [9].
As we show, simple human-in-loop intervention by means of filtering of results using heuristically selected word features (e.g. type of patent, earliest appearing noun-adjective-verb sequence) can modify rankings significantly, with preliminary evidence showing over 50X improvement, where a retrieval task that failed at Recall @ 5000 succeeded at Recall @ 100 with user feedback.
Simple methods, such as a three-word summary of a patent's aboutness, prove to be a productive way for the user to include or exclude groups of patents or claim language, closely mimicking the skimming of the detailed text that is performed by the expert human reader in practice. These improvements in practically inspired forms of patent text and semantics representation are likely to require language models tuned to the specific nuances of the text in this narrow domain. We need more of these.
Furthermore, through over a hundred interviews with industry practitioners, conducted to understand their expectations of search tools and common practices, we have come to recognize the importance of alternative forms of representation, essential to support the different points of view implicit in judging relevance. Again, we are planning to experiment with other representations.
5.3. Towards a computational theory of legal
relevance
Incorporating word order and semantics into text representation is a recent win for the field of NLP [11]. Based on the results of our experiments and through interviews with practitioners, we believe that a one-size-fits-all semantic search approach utilizing these advancements is incapable of capturing the nuanced relevance judgements made in the domain of patent litigation. We demonstrated orders-of-magnitude improvement in practically relevant task performance through modification of semantic representation models, using preprocessing of text, customized forms of relevance representation of claims (such as their aboutness), and the use of human-in-loop feedback to better curtail the possibilities from a list of semantically relevant documents.
We observed that, given the abductive nature of legal relevance, more foundational work on representation, as well as a descriptive taxonomy of the patterns of relevance expected by legal practitioners, is likely to help in building better-performing semantic representations. This will require further engagement with the legal community to ensure that the computational work and machine learning protocols are guided by specific intuitions and areas of focus of practitioners. The benefit of this work is likely to be twofold: (i) better performance of patent search and analytic systems that could support the quality and efficiency of the work in the field and (ii) a more in-depth understanding of the implicit criteria embodied in the legal record of decades of judgements and rulings that could serve as a valuable learning and policy tool for the field at large.
6. SUMMARY AND CONCLUSIONS
In this paper we introduced a new data set relevant for
patent retrieval, and more generally, for modeling legal
relevance, namely a collection of rulings from the USPTO
Patent Trial and Appeal Board (PTAB).
We have used eight thousand documents from this data set to perform a collection of experiments. These experiments show the need for new models of relevance. We presented and evaluated a number of such models based on distributional and structural features of patent data. We also argued that we need a new collection of approaches to computational modeling of relevance, and that the most promising avenues of research will have to include an interactive, human-in-the-loop approach.
In addition, using the PTAB data set for testing
relevance in patent document retrieval, instead of traditional
citations search, shows a bigger gap between the needs of
practitioners and the capabilities of current information
retrieval and NLP technologies.
Consequent to our conclusions, we believe future
work in this area would include three major streams: (i) The
hypothesizing and testing of semantic representations that
allow for automated classification of historically observed modes of legal relevance (e.g. does individual claim-level semantic relatedness explain more of the PTAB rulings where patents were found to be relevant than document-level semantic relatedness?); (ii) incorporating novel human-in-loop feedback methods and then back-testing their performance in practically valuable litigation scenarios such as PTAB IPR datasets; (iii) input from legal theorists and practitioners on the accurate classification of modes of relevance judgements (e.g. enumeration of the types and forms of arguments typically used in support of legal relevance judgements).
REFERENCES
[1] Allison, J.R., M.A. Lemley, & D.L. Schwartz (2014). Understanding the Realities of Modern Patent Litigation. Texas Law Review 1769 (2014). Available at SSRN: http://ssrn.com/abstract=2442451
[2] Buchanan, B., & Headrick, T (1970). Some Speculation
About Artificial Intelligence and Legal Reasoning. Stanford
Law Review, Volume 23, No. 1, November 1970
[3] Boullier, P. & B. Sagot (2005). Efficient and robust LFG parsing: SXLFG. Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT). http://www.aclweb.org/anthology/W05-1501
[4] Chien, C.V. & C. Helmers (2015). Inter Partes Review and the Design of Post-Grant Patent Reviews. Santa Clara Univ. Legal Studies Research Paper No. 1015. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2601562
[5] Choi, J.D., J. Tetreault and A. Stent (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. http://www.aclweb.org/anthology/P/P15/P15-1038.pdf
[6] Golestan Far, M. and S. Sanner (2015). On term selection techniques for patent prior art search. Proceedings of the 38th International ACM SIGIR Conference.
[7] Horty, J. (1962). The "Key Words in Combination" Approach. MULL: Modern Uses of Logic in Law, Vol. 3, No. 1 (March 1962), pp. 54-64.
[8] Khoury, A. and R. Bekkerman (2016). Automatic Discovery of Prior Art: Big Data to the Rescue of the Patent System. The John Marshall Review of Intellectual Property Law 16.1: 44.
[9] Lally, A. et al. (2014). WatsonPaths: scenario-based question answering and inference over unstructured information. IBM Research Report RC25489 (WAT1409048), September 17, 2014.
[10] McDonald, R. & J. Nivre (2007). Characterizing the Errors of Data-Driven Dependency Parsing Models. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. http://www.aclweb.org/anthology/D07-1013
[11] Mikolov, T. et al. 2013. Distributed representations of
words and phrases and their compositionality. Advances in
neural information processing systems.
[12] Schwartz, D.L. (2012). The Rise of Contingent Fee
Representation in Patent Litigation. Alabama Law Review
335 (2012). Available at SSRN:
http://ssrn.com/abstract=1990651
[13] Shalaby, W. and W. Zadrozny (2015). Measuring
Semantic Relatedness using Mined Semantic Analysis.
arXiv preprint arXiv:1512.03465.
[14] Strumsky, D. and J. Lobo. 2015. Identifying the sources
of technological novelty in the process of invention.
Research Policy (44/8).
[15] Strzalkowski, T. (ed.) (1999). Natural Language
Information Retrieval. Springer.
[16] Wyner, A. & Peters, W. (2010). Lexical Semantics and Expert Legal Knowledge Towards the Identification of Legal Case Factors. Proc. JURIX 2010, 127-136.
[17] Youn, H. et al. (2015). Invention as a combinatorial process: evidence from US patents. Journal of The Royal Society Interface 12(106): 20150272. The Royal Society.