Enhancing Legal Discovery with Linguistic Processing                                               1
D.G. Bobrow, T.H. King, and L.C. Lee
PARC



                      Enhancing Legal Discovery with Linguistic Processing

                       Daniel G. Bobrow, Tracy H. King, and Lawrence C. Lee
                                   Palo Alto Research Center Inc.
                                        www.parc.com/nlp

Introduction
The U.S. Federal Rules of Civil Procedure have greatly increased the importance of
understanding the content within very large collections of electronically stored information.
Traditional search methods using full-text indexing and Boolean keyword queries are often
inadequate for e-discovery. They typically return too many results (low precision) or require
tightly defined queries that miss critical documents (low recall). Linguistic processing offers a
solution to increase both the precision and recall of e-discovery applications. We discuss four
issues in legal discovery that can be enhanced with linguistic processing: improving recall for
characterization, improving precision for search, protection of sensitive information, and
scalability.

Characterization: Recall
Especially in the initial stages of trial preparation, attorneys need to retrieve all of the
information in a collection that is relevant to some characterization of interest. These
characterizations depend on the legal strategy, so attorneys must be able to formulate them
quickly and flexibly. The most natural way to describe such content is in natural language, not in
heavily formalized regular expression languages. Linguistic processing of the query can help
generate rules in a higher-level language much closer to natural language.

Two basic linguistic tools to aid in query generation for characterization are morphological
analysis and ontological information. For example, morphological analysis of the term 'buy' in a
query will produce 'buy', 'buying', and 'bought'. The more abbreviated and elliptical texts found
in email documents can be treated similarly. For example, common email abbreviations like
'mtg' can be run through a type of morphological analysis to match against 'mtg', 'meeting', and
'meetings'. Using a disjunction of all these forms in the search increases recall, which returns
both more relevant documents and more passages with examples from which to produce novel
queries. Ontologies, both domain-specific and general, automatically produce synonyms
('buy' = 'purchase') and hypernyms (a boy is a type of child, which is a type of human) that can be
used to expand the sample query into alternatives, again allowing for greater recall at the initial
stages of the characterization task. During this initial step, where recall is important and the
entire information collection is being culled, linguistic processing is applied only to the queries,
while the search over the information is done with more standard search techniques. This allows
massive information collections to be processed rapidly and thoroughly.
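
The query expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables and the names `INFLECTIONS`, `ABBREVIATIONS`, `SYNONYMS`, `expand_term`, and `to_boolean_query` are hypothetical stand-ins for a real morphological analyzer and ontology, and are not taken from the system discussed here.

```python
# Hypothetical, hand-written lexicons. A real system would consult a
# morphological analyzer and an ontology instead of these toy tables.
INFLECTIONS = {
    "buy": ["buy", "buying", "bought"],
    "purchase": ["purchase", "purchasing", "purchased"],
    "meeting": ["meeting", "meetings"],
}
ABBREVIATIONS = {"mtg": "meeting"}
SYNONYMS = {"buy": ["purchase"]}


def expand_term(term):
    """Return the surface forms to search for, given one query term."""
    lemma = ABBREVIATIONS.get(term, term)
    lemmas = [lemma] + SYNONYMS.get(lemma, [])
    forms = [term]  # always keep the form the user typed
    for candidate in lemmas:
        for form in INFLECTIONS.get(candidate, [candidate]):
            if form not in forms:
                forms.append(form)
    return forms


def to_boolean_query(term):
    """Join the expanded forms into a disjunction for a keyword engine."""
    return " OR ".join(expand_term(term))


print(to_boolean_query("mtg"))  # mtg OR meeting OR meetings
```

Only the query is expanded; the resulting disjunction can be handed to an ordinary full-text index, which is what keeps this step fast on the full collection.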

Search: Precision
An important aspect of legal discovery is finding information that answers specific questions or
that says specific things. By automatically processing the texts into more normalized, deep
semantic structures and then indexing these structures into a large database optimized for
semantic search, queries over the information collection can be made in natural language. These
linguistic structures normalize away from the vagaries of natural language sentences, encoding
the underlying meaning. At the simplest level, surface forms of words are stemmed to their
dictionary entry and synonyms and hypernyms are inserted. However, the linguistic processing
can go much deeper, normalizing different syntactic constructions so that expressions which
mean the same thing have the same linguistic structure. As a simple example, 'Mr. Smith bought
4000 shares of common stock.' and '4000 shares of common stock were bought by Mr. Smith'
will be mapped to the same structure and indexed identically. Thus the creation of this
semantically based index of information stores a normalized but highly detailed version of the
content in the information and includes links back to the original passages in the information.
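
As a rough illustration of mapping active and passive sentences to one structure, the sketch below uses two regular expressions and a one-entry verb lexicon. It is a toy under stated assumptions, not the deep linguistic analysis described here: `VERB_LEMMAS`, `normalize`, and the (relation, agent, patient) tuple layout are hypothetical, and the patterns cover only the two example sentences' shapes.

```python
import re

# Hypothetical toy lexicon mapping an inflected verb to its dictionary form.
VERB_LEMMAS = {"bought": "buy"}
_VERBS = "|".join(map(re.escape, VERB_LEMMAS))

# "<patient> was/were <verb> by <agent>"
PASSIVE = re.compile(
    rf"^(?P<patient>.+?) (?:was|were) (?P<verb>{_VERBS}) by (?P<agent>.+)$"
)
# "<agent> <verb> <patient>"
ACTIVE = re.compile(rf"^(?P<agent>.+?) (?P<verb>{_VERBS}) (?P<patient>.+)$")


def normalize(sentence):
    """Return (relation, agent, patient), or None if no pattern applies."""
    text = sentence.strip().rstrip(".")
    match = PASSIVE.match(text) or ACTIVE.match(text)
    if match is None:
        return None
    return (VERB_LEMMAS[match["verb"]], match["agent"], match["patient"])


active = normalize("Mr. Smith bought 4000 shares of common stock.")
passive = normalize("4000 shares of common stock were bought by Mr. Smith")
print(active == passive)  # True
```

Because both sentences yield the same tuple, an index keyed on that tuple would store them under a single entry, each with a link back to its original passage.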

The queries against the information collection are similarly automatically processed into
semantic representations at query time, and these semantic representations are used to query the
database for relevant documents. Unlike more standard search techniques, using the deeper
semantic structures allows for greater precision and hence fewer irrelevant documents to review.
The linguistic structures encode the relations between entities and actions (e.g., who did what
when) so that only documents describing entities in the desired relations are retrieved. For
example, standard search techniques would retrieve both 'X hit Y' and 'Y hit X' from a search on
the entities X and Y and the 'hit' relation since all of the relevant items are mentioned. However,
when searching for evidence in a massive information collection, it is important to return only
the text passages which refer to the intended relationship among the entities.
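
The difference between the two kinds of search can be shown with a minimal sketch. It assumes, hypothetically, that each passage has already been reduced to a (relation, agent, patient) tuple; `PASSAGES`, `keyword_search`, and `relation_search` are illustrative names only.

```python
# Hypothetical index: passage id -> (relation, agent, patient).
PASSAGES = {
    1: ("hit", "X", "Y"),  # "X hit Y"
    2: ("hit", "Y", "X"),  # "Y hit X"
}


def keyword_search(passages, terms):
    """Match any passage that mentions every term, ignoring roles."""
    wanted = set(terms)
    return sorted(pid for pid, triple in passages.items() if wanted <= set(triple))


def relation_search(passages, relation, agent, patient):
    """Match only passages where the entities fill the requested roles."""
    target = (relation, agent, patient)
    return sorted(pid for pid, triple in passages.items() if triple == target)


print(keyword_search(PASSAGES, ["X", "Y", "hit"]))  # [1, 2]
print(relation_search(PASSAGES, "hit", "X", "Y"))   # [1]
```

The keyword search returns both passages because all three items are mentioned in each; the role-aware search returns only the passage in which X is the one doing the hitting.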

Redaction
E-discovery increases in complexity when issues of confidentiality are considered. Over the past
several years we have been researching intelligent document security solutions, initially focusing
on redaction. This line of research involves building better tools to detect sensitive material in
documents, especially entities and sensitive relations between entities, determining whether
inferences can be made even when sensitive passages have been redacted, and providing efficient
encryption techniques to allow content-driven access control.

The detection of sensitive material works on the same underlying technology described above for
enhancing recall and precision. The use of stemming, synonyms and hypernyms, and automatic
alias production increases recall, allowing a single search to retrieve entities in many surface
forms. The structural normalization provided by the deep processing similarly allows for better
relation and context detection. As an additional part of the content discovery for redaction, our
current research examines ways to allow for collaborative work on the same document collection
so that knowledge discovery workers can benefit from each other's work and so that experts can
help hone the skills of novices. Another component of the project involves using the Web and
other large information collections to determine whether the identity of entities can be detected
even when they have been redacted. For example, removing someone's name but leaving their
birthdate, sex, and zip code may still allow them to be uniquely identified, suggesting that
further material needs to be redacted.
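
One simple way to picture the re-identification check is to count how many records in some outside reference collection are consistent with the fields left visible. The sketch below assumes this simplified setting; the records in `POPULATION` are invented placeholders, and `matching_records` and `needs_more_redaction` are hypothetical names, not part of the research described here.

```python
# Invented placeholder records standing in for an outside information source.
POPULATION = [
    {"name": "Person A", "birthdate": "1970-01-01", "sex": "F", "zip": "00001"},
    {"name": "Person B", "birthdate": "1970-01-01", "sex": "M", "zip": "00001"},
    {"name": "Person C", "birthdate": "1982-05-17", "sex": "F", "zip": "00002"},
]


def matching_records(redacted, population):
    """Return population records consistent with every unredacted field."""
    visible = {key: value for key, value in redacted.items() if value is not None}
    return [
        person
        for person in population
        if all(person.get(key) == value for key, value in visible.items())
    ]


def needs_more_redaction(redacted, population):
    """True when the visible fields single out exactly one record."""
    return len(matching_records(redacted, population)) == 1


# Name removed (None), but birthdate, sex, and zip code left in place.
record = {"name": None, "birthdate": "1970-01-01", "sex": "F", "zip": "00001"}
print(needs_more_redaction(record, POPULATION))  # True
```

Redacting one more field (here, `sex`) leaves two consistent records, so the person is no longer singled out in this toy collection.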




Once the sensitive text passages have been identified, we provide tools for encrypting document
passages and assigning keys so that different users can have access to different types of redacted
material. This makes it possible for the document to be viewed in different ways by different
people: some may have access to the whole document, some may not be able to see anything
related to entity X, and some may only be able to see publicly available material. This
encryption capability can either be used actively on the electronic versions of the documents or
can be used to prepare specially redacted versions for printing and shipping to different parties.
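
The idea of different views for different readers can be sketched as follows. This models only the access decision and the rendered result; the encryption and key management themselves are deliberately left out, and the segment labels, `SEGMENTS`, `REDACTION_MARK`, and `render` are hypothetical names chosen for illustration.

```python
REDACTION_MARK = "[REDACTED]"

# Hypothetical document: (text, label) pairs. A label of None means public
# text; any other label names the key a reader would need to see the passage.
SEGMENTS = [
    ("The contract with ", None),
    ("entity X", "entity_x"),
    (" was signed; terms: ", None),
    ("confidential pricing", "internal"),
    (".", None),
]


def render(segments, granted_labels):
    """Build the view for a reader who holds keys for `granted_labels`."""
    parts = []
    for text, label in segments:
        visible = label is None or label in granted_labels
        parts.append(text if visible else REDACTION_MARK)
    return "".join(parts)


print(render(SEGMENTS, {"entity_x", "internal"}))  # full document
print(render(SEGMENTS, {"internal"}))              # nothing about entity X
print(render(SEGMENTS, set()))                     # public material only
```

The same `render` call could feed either an on-screen viewer or a print job that prepares a specially redacted copy for a given party.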

Scalability
As the average number of documents involved in each legal discovery process increases,
scalability is an important issue for any technology used in the process. The linguistic
processing that we advocate here is more computationally intensive than shallower methods such
as keyword search or basic regular expression pattern matching over plain text. To surmount this
issue, we use faster processes to go from, for example, 100 million documents to a few million
documents; these faster processes may be facilitated by some linguistic processing, e.g.
stemming of words so that more matches on basic keyword searches are found. Once the
original information collection is reduced to a more manageable load, then the slower but more
accurate linguistically-enhanced processes can be used to prune to a few hundred thousand. We
have evidence that this deeper linguistic processing will scale to hundreds of thousands of
documents, with processing time approaching one second per sentence. Once this initial
linguistic processing is done, then the resulting indexed documents can be used repeatedly in the
applications described above, thereby creating a resource to be shared across the discovery
processes.
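
A back-of-the-envelope estimate shows why the collection is narrowed before deep processing. The calculation below takes the figure of roughly one second per sentence from the text; the document count, the average of 20 sentences per document, and the number of parallel workers are assumptions chosen only for illustration, and `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(documents, sentences_per_document, seconds_per_sentence, workers):
    """Wall-clock hours for deep processing, assuming perfect parallelism."""
    total_seconds = documents * sentences_per_document * seconds_per_sentence
    return total_seconds / workers / 3600


# Assumed workload: 300,000 documents of 20 sentences each at 1 s per sentence.
single = estimate_hours(300_000, 20, 1.0, workers=1)
parallel = estimate_hours(300_000, 20, 1.0, workers=100)
print(f"{single:.0f} hours on one worker, {parallel:.1f} hours on 100 workers")
```

Under these assumptions the same per-sentence cost applied to 100 million documents would be several hundred times larger, which is why the faster, shallower filters run first.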

Conclusion
There are a number of benefits from using linguistic processing in e-discovery applications.
Linguistic processing can provide fast and flexible characterization of large information
collections in pre-trial preparation, as well as enable high precision search and confidential
information access in discovery. While linguistic processing is more computationally intensive
than keyword search, the technology does scale well to large information collections and can
also be used in combination with standard search approaches to improve the management and
discovery of electronically stored information.




May 2007

More Related Content

Viewers also liked

Page design 2
Page design 2Page design 2
Page design 2zsmith
 
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...legalservices
 
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003LiquidHub
 
Office Share Point Server2007 Functionaland Architectural Overview
Office Share Point Server2007 Functionaland Architectural OverviewOffice Share Point Server2007 Functionaland Architectural Overview
Office Share Point Server2007 Functionaland Architectural OverviewLiquidHub
 
Kickstart Tutorial Xml
Kickstart Tutorial XmlKickstart Tutorial Xml
Kickstart Tutorial XmlLiquidHub
 

Viewers also liked (6)

Page design 2
Page design 2Page design 2
Page design 2
 
Diables
DiablesDiables
Diables
 
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...
Abortion Decisions And The Duty To Screen Clinical, Ethical, And Legal Implic...
 
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003
How To Configure Email Enabled Lists In Moss2007 Rtm Using Exchange 2003
 
Office Share Point Server2007 Functionaland Architectural Overview
Office Share Point Server2007 Functionaland Architectural OverviewOffice Share Point Server2007 Functionaland Architectural Overview
Office Share Point Server2007 Functionaland Architectural Overview
 
Kickstart Tutorial Xml
Kickstart Tutorial XmlKickstart Tutorial Xml
Kickstart Tutorial Xml
 

Similar to Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organization Representative

Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperContent Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperJohn Felahi
 
14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for TranslationRIILP
 
Technical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search EngineTechnical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search Engine
s0P5a41b
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
IOSR Journals
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Web classification of Digital Libraries using GATE Machine Learning  
Web classification of Digital Libraries using GATE Machine Learning  	Web classification of Digital Libraries using GATE Machine Learning  
Web classification of Digital Libraries using GATE Machine Learning   sstose
 
Use text mining method to support criminal case judgment
Use text mining method to support criminal case judgmentUse text mining method to support criminal case judgment
Use text mining method to support criminal case judgment
ZhongLI28
 
Dictionary based concept mining an application for turkish
Dictionary based concept mining  an application for turkishDictionary based concept mining  an application for turkish
Dictionary based concept mining an application for turkish
csandit
 
Chapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and RetrievalChapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and Retrieval
captainmactavish1996
 
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
Sarah Morrow
 
A Simple Information Retrieval Technique
A Simple Information Retrieval TechniqueA Simple Information Retrieval Technique
A Simple Information Retrieval Technique
idescitation
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
El Habib NFAOUI
 
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISHDICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
cscpconf
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
Waqas Tariq
 
Legal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalLegal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalAubrey Owens
 
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
kevig
 
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
ijnlc
 
Semantic tagging for documents using 'short text' information
Semantic tagging for documents using 'short text' informationSemantic tagging for documents using 'short text' information
Semantic tagging for documents using 'short text' information
csandit
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital Libraries
Fulvio Rotella
 

Similar to Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organization Representative (20)

Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperContent Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
 
14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation
 
Technical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search EngineTechnical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search Engine
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
Web classification of Digital Libraries using GATE Machine Learning  
Web classification of Digital Libraries using GATE Machine Learning  	Web classification of Digital Libraries using GATE Machine Learning  
Web classification of Digital Libraries using GATE Machine Learning  
 
Use text mining method to support criminal case judgment
Use text mining method to support criminal case judgmentUse text mining method to support criminal case judgment
Use text mining method to support criminal case judgment
 
Dictionary based concept mining an application for turkish
Dictionary based concept mining  an application for turkishDictionary based concept mining  an application for turkish
Dictionary based concept mining an application for turkish
 
Chapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and RetrievalChapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and Retrieval
 
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
 
A Simple Information Retrieval Technique
A Simple Information Retrieval TechniqueA Simple Information Retrieval Technique
A Simple Information Retrieval Technique
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISHDICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
 
Legal Technology 2011 and the Paralegal
Legal Technology 2011 and the ParalegalLegal Technology 2011 and the Paralegal
Legal Technology 2011 and the Paralegal
 
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
 
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
 
Semantic tagging for documents using 'short text' information
Semantic tagging for documents using 'short text' informationSemantic tagging for documents using 'short text' information
Semantic tagging for documents using 'short text' information
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital Libraries
 

More from legalservices

Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...
Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...
Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...legalservices
 
Information Inflation Can The Legal System Adapt
Information Inflation Can The Legal System AdaptInformation Inflation Can The Legal System Adapt
Information Inflation Can The Legal System Adaptlegalservices
 
Copy Of Acme Legal Contract
Copy Of Acme Legal ContractCopy Of Acme Legal Contract
Copy Of Acme Legal Contractlegalservices
 
Adams The Legalized Crime Of Banking And A Constitutional Remedy (1958)
Adams   The Legalized Crime Of Banking And A Constitutional Remedy (1958)Adams   The Legalized Crime Of Banking And A Constitutional Remedy (1958)
Adams The Legalized Crime Of Banking And A Constitutional Remedy (1958)legalservices
 
Activating Legal Protections For Archaeological Remains
Activating Legal Protections For Archaeological RemainsActivating Legal Protections For Archaeological Remains
Activating Legal Protections For Archaeological Remainslegalservices
 
537 Legal Questions About The Income Tax Code
537 Legal Questions About The Income Tax Code537 Legal Questions About The Income Tax Code
537 Legal Questions About The Income Tax Codelegalservices
 
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory ActivitiesScientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory Activitieslegalservices
 
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...legalservices
 
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory ActivitiesScientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory Activitieslegalservices
 
Submission By Vusi Pikolis Legal Team To The Office Of The President
Submission By Vusi Pikolis Legal Team To The Office Of The PresidentSubmission By Vusi Pikolis Legal Team To The Office Of The President
Submission By Vusi Pikolis Legal Team To The Office Of The Presidentlegalservices
 
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...legalservices
 
Rule Legal Assistance Eligibility; Maximum Income Guidelines
Rule Legal Assistance Eligibility; Maximum Income GuidelinesRule Legal Assistance Eligibility; Maximum Income Guidelines
Rule Legal Assistance Eligibility; Maximum Income Guidelineslegalservices
 
Rule Aliens; Legal Assistance Restrictions Legal Assistance To Citizens Of Mi...
Rule Aliens; Legal Assistance Restrictions Legal Assistance To Citizens Of Mi...Rule Aliens; Legal Assistance Restrictions Legal Assistance To Citizens Of Mi...
Rule Aliens; Legal Assistance Restrictions Legal Assistance To Citizens Of Mi...legalservices
 
PublicacióN De Formatos Legales
PublicacióN De Formatos LegalesPublicacióN De Formatos Legales
PublicacióN De Formatos Legaleslegalservices
 
Proposed Rule Legal Assistance Eligibility; Maximum Income Guidelines
Proposed Rule Legal Assistance Eligibility; Maximum Income GuidelinesProposed Rule Legal Assistance Eligibility; Maximum Income Guidelines
Proposed Rule Legal Assistance Eligibility; Maximum Income Guidelineslegalservices
 
Proposed Rule Legal And Related Services Intercountry Adoption; Hague Convent...
Proposed Rule Legal And Related Services Intercountry Adoption; Hague Convent...Proposed Rule Legal And Related Services Intercountry Adoption; Hague Convent...
Proposed Rule Legal And Related Services Intercountry Adoption; Hague Convent...legalservices
 
Proposed Rule Aliens; Legal Assistance Restrictions Negotiated Rulemaking Wor...
Proposed Rule Aliens; Legal Assistance Restrictions Negotiated Rulemaking Wor...Proposed Rule Aliens; Legal Assistance Restrictions Negotiated Rulemaking Wor...
Proposed Rule Aliens; Legal Assistance Restrictions Negotiated Rulemaking Wor...legalservices
 
Podcasting Legal Guide For Canada
Podcasting Legal Guide For CanadaPodcasting Legal Guide For Canada
Podcasting Legal Guide For Canadalegalservices
 
Notice Grants And Cooperative Agreements; Availability, Etc. Civil Legal Serv...
Notice Grants And Cooperative Agreements; Availability, Etc. Civil Legal Serv...Notice Grants And Cooperative Agreements; Availability, Etc. Civil Legal Serv...
Notice Grants And Cooperative Agreements; Availability, Etc. Civil Legal Serv...legalservices
 
Legal Pluralism, Alemayehu Fentaw
Legal Pluralism, Alemayehu FentawLegal Pluralism, Alemayehu Fentaw
Legal Pluralism, Alemayehu Fentawlegalservices
 

More from legalservices (20)

Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...
Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...
Notice Grant And Cooperative Agreement Awards Civil Legal Services To Eligibl...
 
Information Inflation Can The Legal System Adapt
Information Inflation Can The Legal System AdaptInformation Inflation Can The Legal System Adapt
Information Inflation Can The Legal System Adapt
 
Copy Of Acme Legal Contract
Copy Of Acme Legal ContractCopy Of Acme Legal Contract
Copy Of Acme Legal Contract
 
Adams The Legalized Crime Of Banking And A Constitutional Remedy (1958)
Adams   The Legalized Crime Of Banking And A Constitutional Remedy (1958)Adams   The Legalized Crime Of Banking And A Constitutional Remedy (1958)
Adams The Legalized Crime Of Banking And A Constitutional Remedy (1958)
 
Activating Legal Protections For Archaeological Remains
Activating Legal Protections For Archaeological RemainsActivating Legal Protections For Archaeological Remains
Activating Legal Protections For Archaeological Remains
 
537 Legal Questions About The Income Tax Code
537 Legal Questions About The Income Tax Code537 Legal Questions About The Income Tax Code
537 Legal Questions About The Income Tax Code
 
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory ActivitiesScientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
 
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
 
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory ActivitiesScientific And Legal Perspectives On Science Generated For Regulatory Activities
Scientific And Legal Perspectives On Science Generated For Regulatory Activities
 
Submission By Vusi Pikolis Legal Team To The Office Of The President
Submission By Vusi Pikolis Legal Team To The Office Of The PresidentSubmission By Vusi Pikolis Legal Team To The Office Of The President
Submission By Vusi Pikolis Legal Team To The Office Of The President
 
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
Rule Legal Services, General Counsel, And Miscellaneous Claims Service Organi...
 
Rule Legal Assistance Eligibility; Maximum Income Guidelines
Rule Legal Assistance Eligibility; Maximum Income GuidelinesRule Legal Assistance Eligibility; Maximum Income Guidelines
Enhancing Legal Discovery with Linguistic Processing

Daniel G. Bobrow, Tracy H. King, and Lawrence C. Lee
Palo Alto Research Center Inc.
www.parc.com/nlp
May 2007

Introduction
The U.S. Federal Rules of Civil Procedure have greatly increased the importance of understanding the content within very large collections of electronically stored information. Traditional search methods using full-text indexing and Boolean keyword queries are often inadequate for e-discovery. They typically return too many results (low precision) or require tightly defined queries that miss critical documents (low recall). Linguistic processing offers a solution to increase both the precision and recall of e-discovery applications. We discuss four issues in legal discovery that can be enhanced with linguistic processing: improving recall for characterization, improving precision for search, protection of sensitive information, and scalability.

Characterization: Recall
Especially in the initial stages of trial preparation, attorneys need to be able to retrieve all of the information in a collection that is relevant to some characterization of interest. These characterizations depend on the legal strategy and so must be quick and flexible to formulate. The most natural way to describe such content is in natural language and not in heavily formalized regular expression languages. Linguistic processing on the query can help generate rules in a higher level language much closer to natural language.

Two basic linguistic tools to aid in query generation for characterization are morphological analysis and ontological information. For example, morphological analysis of the term 'buy' in a query will produce 'buy', 'buying', and 'bought'. The more abbreviated and elliptical texts found in email documents can be treated similarly. For example, common email abbreviations like 'mtg' can be run through a type of morphological analysis to match against 'mtg', 'meeting', and 'meetings'. Using a disjunction of all these forms in the search increases recall, which returns both more relevant documents and more passages with examples from which to produce novel queries. Ontologies, both domain specific and general, automatically produce synonyms ('buy' = 'purchase') and hypernyms (a boy is a type of child, which is a type of human) which can be used to expand the sample query into alternatives, again allowing for greater recall at the initial stages of the characterization task.

During this initial step, where recall is important and the entire information collection is being culled, linguistic processing is only being done on the queries, while the search over the information is done with more standard search techniques. This allows massive information collections to be processed rapidly and thoroughly.

Search: Precision
An important aspect of legal discovery is finding information that answers specific questions or that says specific things. By automatically processing the texts into more normalized, deep semantic structures and then indexing these structures into a large database optimized for semantic search, queries over the information collection can be made in natural language. These linguistic structures normalize away from the vagaries of natural language sentences, encoding the underlying meaning. At the simplest level, surface forms of words are stemmed to their dictionary entry and synonyms and hypernyms are inserted. However, the linguistic processing can go much deeper, normalizing different syntactic constructions so that expressions which mean the same thing have the same linguistic structure. As a simple example, 'Mr. Smith bought 4000 shares of common stock.' and '4000 shares of common stock were bought by Mr. Smith' will be mapped to the same structure and indexed identically. Thus the creation of this semantically based index of information stores a normalized but highly detailed version of the content in the information and includes links back to the original passages in the information.

The queries against the information collection are similarly automatically processed into semantic representations at query time, and these semantic representations are used to query the database for relevant documents. Unlike more standard search techniques, using the deeper semantic structures allows for greater precision and hence fewer irrelevant documents to review. The linguistic structures encode the relations between entities and actions (e.g., who did what when) so that only documents describing entities in the desired relations are retrieved. For example, standard search techniques would retrieve both 'X hit Y' and 'Y hit X' from a search on the entities X and Y and the 'hit' relation since all of the relevant items are mentioned. However, when searching for evidence in a massive information collection, it is important to return only the text passages which refer to the intended relationship among the entities.
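The difference between matching on mentioned terms and matching on relations can be illustrated with a small sketch. It is not the system described here: the `Fact` triple, the `RelationIndex` class, and the passage numbers are hypothetical, and the sketch assumes that some upstream linguistic analysis (not shown) has already reduced each passage to a normalized agent, relation, and patient, so that active and passive sentences yield the same triple.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """A normalized 'who did what to whom' triple (hypothetical structure)."""
    agent: str
    relation: str
    patient: str
    passage_id: int  # link back to the original passage


class RelationIndex:
    """Toy index that supports both bag-of-terms and role-sensitive lookup."""

    def __init__(self):
        self._by_triple = defaultdict(set)
        self._by_term = defaultdict(set)

    def add(self, fact):
        self._by_triple[(fact.agent, fact.relation, fact.patient)].add(fact.passage_id)
        for term in (fact.agent, fact.relation, fact.patient):
            self._by_term[term].add(fact.passage_id)

    def keyword_search(self, *terms):
        """Passages that mention all terms, ignoring who did what."""
        sets = [self._by_term.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def relation_search(self, agent, relation, patient):
        """Passages in which the entities stand in exactly this relation."""
        return set(self._by_triple.get((agent, relation, patient), set()))


index = RelationIndex()
index.add(Fact("X", "hit", "Y", 1))           # 'X hit Y'
index.add(Fact("Y", "hit", "X", 2))           # 'Y hit X'
index.add(Fact("Smith", "buy", "shares", 3))  # assumed output for the active sentence
index.add(Fact("Smith", "buy", "shares", 4))  # assumed output for the passive sentence

print(index.keyword_search("X", "hit", "Y"))   # both passages 1 and 2
print(index.relation_search("X", "hit", "Y"))  # only passage 1
```

In this sketch the keyword lookup returns both 'hit' passages, while the relation lookup returns only the one with X as the agent; the two 'buy' passages share one triple and are retrieved together.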
Redaction
E-discovery increases in complexity when issues of confidentiality are considered. Over the past several years we have been researching intelligent document security solutions, initially focusing on redaction. This line of research involves building better tools to detect sensitive material in documents, especially entities and sensitive relations between entities, determining whether inferences can be made even when sensitive passages have been redacted, and providing efficient encryption techniques to allow content-driven access control.

The detection of sensitive material works on the same underlying technology described above for enhancing recall and precision. The use of stemming, synonyms and hypernyms, and automatic alias production increases recall, allowing for a single search to retrieve entities in many surface forms. The structural normalization provided by the deep processing similarly allows for better relation and context detection. As an additional part of the content discovery for redaction, our current research examines ways to allow for collaborative work on the same document collection so that knowledge discovery workers can benefit from each other's work and so that experts can help hone the skills of novices.

Another component of the project involves using the Web and other large information collections to determine whether the identity of entities can be detected even when they have been redacted. For example, removing someone's name but leaving their birthdate, sex, and zip code may uniquely identify them, thereby suggesting that further material needs to be redacted.
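The birthdate, sex, and zip code example can be made concrete with a minimal sketch. It assumes a small in-memory reference table standing in for the Web or another large collection; the records, field names, and the `unique_after_redaction` helper are all hypothetical and are not the project's actual tooling.

```python
from collections import Counter


def unique_after_redaction(records, kept_fields):
    """Return records that are still uniquely identified by the fields left unredacted."""
    def key(record):
        return tuple(record[f] for f in kept_fields)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]


# Hypothetical reference population; real data would come from external sources.
records = [
    {"name": "A. Example", "birthdate": "1970-01-01", "sex": "F", "zip": "94304"},
    {"name": "B. Example", "birthdate": "1970-01-01", "sex": "F", "zip": "94305"},
    {"name": "C. Example", "birthdate": "1982-06-15", "sex": "M", "zip": "94304"},
    {"name": "D. Example", "birthdate": "1982-06-15", "sex": "M", "zip": "94304"},
]

# Name removed, but birthdate, sex, and zip code left in place.
at_risk = unique_after_redaction(records, ["birthdate", "sex", "zip"])
print([r["name"] for r in at_risk])
```

Any record returned here shares its remaining field values with no one else in the reference table, which suggests that at least one more field would need to be redacted for that person.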
Once the sensitive text passages have been identified, we provide tools for encrypting document passages and assigning keys so that different users can have access to different types of redacted material. This makes it possible for the document to be viewed in different ways by different people: some may have access to the whole document, some may not be able to see anything related to entity X, and some may only be able to see publicly available material. This encryption capability can either be used actively on the electronic versions of the documents or can be used to prepare specially redacted versions for printing and shipping to different parties.

Scalability
As the average number of documents involved in each legal discovery process increases, scalability is an important issue for any technology used in the process. The linguistic processing that we advocate here is more computationally intensive than shallower methods such as keyword search or basic regular expression pattern matching over plain text. To surmount this issue, we use faster processes to go from, for example, 100 million documents to a few million documents; these faster processes may be facilitated by some linguistic processing, e.g. stemming of words so that more matches on basic keyword searches are found. Once the original information collection is reduced to a more manageable load, then the slower but more accurate linguistically-enhanced processes can be used to prune to a few hundred thousand. We have evidence that this deeper linguistic processing will scale to hundreds of thousands of documents, with processing time approaching one second per sentence. Once this initial linguistic processing is done, then the resulting indexed documents can be used repeatedly in the applications described above, thereby creating a resource to be shared across the discovery processes.
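The staged approach can be sketched as a two-step funnel: a cheap keyword pass over the whole collection using an expanded query, followed by a slower check applied only to the survivors. Everything in the sketch is an assumption made for illustration: the hand-written `VARIANTS` table stands in for morphological analysis and ontology lookup, and `deep_match` is a placeholder for the deep linguistic analysis, which is not implemented here.

```python
import re

# Hypothetical, hand-written expansion table (stand-in for morphology and ontologies).
VARIANTS = {
    "buy": ["buying", "bought", "purchase", "purchased"],
    "mtg": ["meeting", "meetings"],
}


def expand_query(terms, variants):
    """Return the query terms together with their known surface variants."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded.update(variants.get(term, ()))
    return expanded


def cheap_filter(documents, expanded_terms):
    """Fast first pass: keep documents that contain any expanded term."""
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        if tokens & expanded_terms:
            yield doc_id


def funnel(documents, terms, variants, deep_match):
    """Run the expensive check only on documents that survive the cheap pass."""
    candidates = list(cheap_filter(documents, expand_query(terms, variants)))
    hits = [doc_id for doc_id in candidates if deep_match(documents[doc_id])]
    return candidates, hits


documents = {
    1: "Mr. Smith bought 4000 shares of common stock.",
    2: "Lunch mtg moved to Friday.",
    3: "Jones is buying a house.",
    4: "Quarterly report attached.",
}


def deep_match(text):
    # Placeholder for the slower analysis: here it only checks that Smith is mentioned.
    return "smith" in re.findall(r"[a-z0-9']+", text.lower())


candidates, hits = funnel(documents, ["buy"], VARIANTS, deep_match)
print(candidates, hits)
```

The design point is that the expensive step is called once per surviving document rather than once per document in the collection, so its cost depends on how aggressively the first pass prunes.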
Conclusion
There are a number of benefits from using linguistic processing in e-discovery applications. Linguistic processing can provide fast and flexible characterization of large information collections in pre-trial preparation, as well as enable high precision search and confidential information access in discovery. While linguistic processing is more computationally intensive than keyword search, the technology does scale well to large information collections and can also be used in combination with standard search approaches to improve the management and discovery of electronically stored information.