Adventures in Real-World Data Science
Automated Patent Classification
Rollie D. Goodman
TechLink Center
• Established as a technology transfer center in 1996
• Facilitates ~60% of DoD’s license agreements with industry
• Helps small companies secure R&D contracts
training set: ~9,000 labeled patents
Patent Data
Document Number: US 9,832,220
Assignee: US Air Force
Inventors: Kwiat, Luke; Kamhoua, Charles; Kwiat, Kevin
Attorneys: Mancini, Joseph A.
Title: Security Method for Allocation of Virtual Machines in a Cloud Computing Network
Abstract: A method for enhancing security in a cloud computing system by allocating virtual machines over hypervisors, in a cloud computing environment, in a security-aware fashion. The invention solves the cloud user risk problem by inducing a state such that, unless there is a change in the conditions under which the present invention operates, the cloud users do not gain by deviating from the allocation induced by the present invention. The invention’s methods include grouping virtual machines of similar loss potential on the same hypervisor, creating hypervisor environments of similar total loss, and implementing a risk tiered system of hypervisors based on expense factors.
Publication Date: 11-28-2017
CPC Classes: H04L63/1441, G06F9/45558, H04L63/1408, H04L63/20
CPC Terms
Example: A01B33/028
A: Human necessities
A01: Agriculture
A01B: Machines for soil working in agriculture or industry
A01B33: Tilling implements with rotary driven tools
A01B33/02: …with tools on horizontal shaft transverse to direction of travel
A01B33/028: …of the walk-behind type
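The hierarchy above can be partly read off the symbol itself. The following is a minimal sketch, assuming symbols shaped like the ones on these slides (one section letter, two class digits, one subclass letter, a main-group number, a slash, and a subgroup number). The function name `cpc_levels` and the regular expression are illustrative assumptions, not part of the talk. Intermediate subgroup levels such as A01B33/02 are defined by the CPC scheme itself and cannot be derived from the string alone, so they are not produced here.

```python
import re

# Assumed symbol shape: section, class digits, subclass letter, main group, "/", subgroup.
_CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


def cpc_levels(symbol):
    """Return the levels of a CPC symbol that are recoverable from its text."""
    match = _CPC_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main_group, _subgroup = match.groups()
    cpc_class = section + class_digits          # e.g. "A01"
    subclass = cpc_class + subclass_letter      # e.g. "A01B"
    group = subclass + main_group               # e.g. "A01B33"
    return [section, cpc_class, subclass, group, symbol]


levels = cpc_levels("A01B33/028")
print(levels)
```

Truncating at one of these levels is what lets many fine-grained symbols collapse into a smaller vocabulary of terms.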
CPC Vectorization
vocabulary: {A61K036, A61K038, A61K039, A61K041, A61K045}
instance: {A61K038/00, A61K038/005, A61K039/00}
instance truncated to vocabulary terms: {A61K038, A61K038, A61K039}
count vector: [ 0, 2, 1, 0, 0 ]
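The vectorization step on this slide can be sketched in a few lines. This is a minimal illustration, assuming each CPC symbol is truncated at the slash and then counted against a fixed, ordered vocabulary; the function names `truncate_cpc` and `vectorize_cpc` are hypothetical and not taken from the talk.

```python
from collections import Counter


def truncate_cpc(symbol):
    """Keep the part of a CPC symbol before the slash, e.g. 'A61K038/00' -> 'A61K038'."""
    return symbol.split("/", 1)[0]


def vectorize_cpc(instance, vocabulary):
    """Count how often each vocabulary term occurs among the truncated symbols."""
    counts = Counter(truncate_cpc(symbol) for symbol in instance)
    # Counter returns 0 for terms that never occur, so absent terms become zeros.
    return [counts[term] for term in vocabulary]


vocabulary = ["A61K036", "A61K038", "A61K039", "A61K041", "A61K045"]
instance = ["A61K038/00", "A61K038/005", "A61K039/00"]
vector = vectorize_cpc(instance, vocabulary)
print(vector)  # [0, 2, 1, 0, 0]
```

The vocabulary is kept as an ordered list rather than a set so that every patent maps to a vector with the same column order.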
Support Vector Machines
[Figure sequence: plots with axes x and y (a later plot adds z) and elements labeled a and b]
Cross-Validation
[Figure: randomized training data divided into fold 1 through fold 5; experiments 1 through 5; overall accuracy]
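A rough sketch of the procedure the figure appears to depict, assuming the standard k-fold scheme: shuffle the data, split it into five folds, hold out one fold per experiment, and average the per-fold accuracies into an overall figure. The majority-class "model" is a stand-in used only to keep the example self-contained, and all names here are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds (assumes n_samples >= k)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(train_x, train_y):
    """Stand-in learner: always predicts the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda sample: majority


def cross_validate(samples, labels, train_fn, k=5, seed=0):
    """Run one experiment per fold and return the mean held-out accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


samples = list(range(20))
labels = ["a"] * 15 + ["b"] * 5
overall = cross_validate(samples, labels, train_majority)
print(overall)
```

Every sample is held out exactly once, so the overall figure reflects performance on data the model did not see during that experiment.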
Ensemble Learners
• Train and combine multiple learners to solve a single problem
• also: “multiple classifier systems”
• Often outperform single classifiers
• e.g. Netflix Competition, KDD 2009, and Kaggle
Text Processing
• Stopwords: remove words that appear frequently but do not
give any information about content
• a, an, and, for, from, is, it, the, to, with…
• Stemming: reduce derived words to root (“stemmed”) form
• different, differently, differ, differing, differed → differ
• Weighting: term frequency – inverse document frequency
$$\text{tf-idf}_{t} = \text{term frequency}_{t} \times \log \frac{\text{number of documents}}{\text{number of documents where term } t \text{ occurs}}$$
original text: the results are computed from the resulting generated text
after stopword removal: results computed resulting generated text
after stemming: result comput result gener text
after weighting: 3.03, 1.24, 0.68, 4.79. . .
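The three steps above (stopword removal, stemming, TF-IDF weighting) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the talk: the stopword set extends the slide's list with "are", the `toy_stem` suffix stripper is a crude stand-in for a real stemmer (it yields "generat" where the slide shows "gener"), and the second document is invented so that document frequencies are not all equal.

```python
import math
from collections import Counter

# Slide's stopword list plus "are" (assumed, since the slide's example drops it).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "is", "it", "the", "to", "with"}


def toy_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    return [toy_stem(w) for w in text.lower().split() if w not in STOPWORDS]


def tfidf(docs):
    """Weight each term by term frequency * log(N / documents containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weighted = []
    for doc in docs:
        term_freq = Counter(doc)
        weighted.append({t: term_freq[t] * math.log(n_docs / doc_freq[t])
                         for t in term_freq})
    return weighted


sentence = "the results are computed from the resulting generated text"
docs = [preprocess(sentence), preprocess("a text classifier for patents")]
weights = tfidf(docs)
print(docs[0])
print(weights[0])
```

A term that occurs in every document, such as "text" here, gets weight zero, which is the point of the inverse document frequency factor.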
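[Diagram: combining two classifiers]
inputs: text (“The results are computed from the resulting generated text…”); CPC terms ({A61K036, A61K038, A61K039, A61K041, A61K045})
base classifiers: CPC classifier (SVM); text classifier (SVM)
base outputs: class 1; class 2; [class 1, class 2]
combiner: ?
output: final classification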
Decision Trees
[Figure: decision tree with root node outlook (branches sunny, overcast, rainy), internal nodes humidity and wind (branches high, low), and leaves Y / N]
outlook: {sunny, overcast, rainy}
humidity: {high, low}
wind: {high, low}
hiking: {Yes, No}
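A decision tree over these attributes can be represented as nested dictionaries and walked from the root to a leaf. The sketch below is illustrative only: the slide's figure did not survive extraction, so the placement of humidity under sunny, wind under rainy, and every Yes/No leaf is an assumption, and `TREE` and `classify` are hypothetical names.

```python
# ASSUMED tree: node placement and leaf labels are guesses, not read from the slide.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "No", "low": "Yes"},
        },
        "overcast": "Yes",
        "rainy": {
            "attribute": "wind",
            "branches": {"high": "No", "low": "Yes"},
        },
    },
}


def classify(tree, example):
    """Follow the branch matching each tested attribute until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node


decision = classify(TREE, {"outlook": "sunny", "humidity": "high", "wind": "low"})
print(decision)
```

Only the attributes on the path from root to leaf are consulted, so an overcast day is classified without looking at humidity or wind.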
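[Diagram: combining two classifiers with a decision tree]
inputs: text (“The results are computed from the resulting generated text…”); CPC terms ({A61K036, A61K038, A61K039, A61K041, A61K045})
base classifiers: CPC classifier (SVM); text classifier (SVM)
base outputs: class 1; class 2; [class 1, class 2]
combiner: decision tree
output: final classification
accuracy figures shown on the slide: 87%, 76%, 98%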
Questions?

Adventures in Real-World Data Science

  • 1. Adventures in Real-World Data Science Automated Patent Classification Rollie D. Goodman TechLink Center
  • 2. • Established as a technology transfer center in 1996 • Facilitates ~60% of DoD’s license agreements with industry • Helps small companies secure R&D contracts
  • 3. • Established as a technology transfer center in 1996 • Facilitates ~60% of DoD’s license agreements with industry • Helps small companies secure R&D contracts
  • 4. • Established as a technology transfer center in 1996 • Facilitates ~60% of DoD’s license agreements with industry • Helps small companies secure R&D contracts
  • 6. training set: ~9,000 labeled patents
  • 7. Patent Data Document Number US 9,832,220 Assignee US Air Force Inventors Kwiat, Luke; Kamhoua, Charles; Kwiat, Kevin Attorneys Mancini, Joseph A. Title Security Method for Allocation of Virtual Machines in a Cloud Computing Network Abstract A method for enhancing security in a cloud computing system by allocating virtual machines over hypervisors, in a cloud computing environment, in a security-aware fashion. The invention solves the cloud user risk problem by inducing a state such that, unless there is a change in the conditions under which the present invention operates, the cloud users do not gain by deviating from the allocation induced by the present invention. The invention’s methods include grouping virtual machines of similar loss potential on the same hypervisor, creating hypervisor environments of similar total loss, and implementing a risk tiered system of hypervisors based on expense factors. Publication Date 11-28-2017 CPC Classes H04L63/1441, G06F9/45558, H04L63/1408, H04L63/20
  • 10. CPC Terms: A01B33/028
      • A: Human necessities
      • A01: Agriculture
      • A01B: Machines for soil working in agriculture or industry
      • A01B33: Tilling implements with rotary driven tools
      • A01B33/02: …with tools on horizontal shaft transverse to direction of travel
      • A01B33/028: …of the walk-behind type
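
The levels on the CPC Terms slide can be read off the symbol itself, at least down to the main group. The following is a minimal sketch, not the presenter's code: the regular expression and the `cpc_levels` helper are assumptions made for illustration. Intermediate subgroups such as A01B33/02 come from the CPC scheme's indentation hierarchy and cannot be derived from the string alone, so the sketch does not try.

```python
import re

# Assumed pattern for a CPC symbol such as "A01B33/028":
# section letter, two-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def cpc_levels(symbol):
    """Split a CPC symbol into the levels that are visible in its text."""
    match = CPC_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, group, _subgroup = match.groups()
    return {
        "section": section,                               # A
        "class": section + cls,                           # A01
        "subclass": section + cls + subclass,             # A01B
        "main_group": section + cls + subclass + group,   # A01B33
        "subgroup": symbol,                               # A01B33/028
    }


levels = cpc_levels("A01B33/028")
```
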
  • 14. CPC Vectorization
      • vocabulary: {A61K036, A61K038, A61K039, A61K041, A61K045}
      • instance: {A61K038/00, A61K038/005, A61K039/00}
      • instance truncated at the slash: {A61K038, A61K038, A61K039}
      • counts per vocabulary entry: [ 0, 2, 1, 0, 0 ]
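
The CPC Vectorization steps can be sketched in a few lines of Python. This is an illustrative reconstruction under the assumption that each symbol is cut at the slash and then counted against the vocabulary; the function names `main_group` and `vectorize` are hypothetical, not taken from the talk.

```python
def main_group(symbol):
    """Truncate a CPC symbol at the slash: 'A61K038/005' -> 'A61K038'."""
    return symbol.split("/", 1)[0]


def vectorize(symbols, vocabulary):
    """Count how many of a patent's CPC symbols fall under each vocabulary entry."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for symbol in symbols:
        position = index.get(main_group(symbol))
        if position is not None:  # symbols outside the vocabulary are ignored
            vector[position] += 1
    return vector


vocabulary = ["A61K036", "A61K038", "A61K039", "A61K041", "A61K045"]
instance = ["A61K038/00", "A61K038/005", "A61K039/00"]
vector = vectorize(instance, vocabulary)
print(vector)  # [0, 2, 1, 0, 0]
```
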
  • 20. Cross-Validation (diagram): the randomized training data is split into five folds (fold 1 to fold 5); five experiments (experiment 1 to experiment 5) are run over the folds and combined into an overall accuracy
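
As a rough sketch of the cross-validation diagram, assuming the standard k-fold scheme in which each experiment holds out a different fold for testing: the helper names and the majority-class toy model below are hypothetical and only serve to make the example runnable.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=5, seed=0):
    """Run k experiments, each testing on one held-out fold.

    Returns (per-experiment accuracies, overall accuracy).
    """
    fold_accuracies, correct = [], 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        hits = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        fold_accuracies.append(hits / len(held_out))
        correct += hits
    return fold_accuracies, correct / len(samples)


# Toy model (assumption): always predict the most common training label.
def train_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


samples = list(range(20))
labels = ["a"] * 15 + ["b"] * 5
fold_acc, overall = cross_validate(samples, labels, train_majority, predict_majority)
```

Every sample is tested exactly once, so the overall accuracy is the total number of correct held-out predictions divided by the number of samples.
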
  • 21. Ensemble Learners
      • Train and combine multiple learners to solve a single problem
          • also: “multiple classifier systems”
      • Often outperform single classifiers
          • e.g. Netflix Competition, KDD 2009, and Kaggle
  • 25. Text Processing
      • Stopwords: remove words that appear frequently but do not give any information about content
          • a, an, and, for, from, is, it, the, to, with…
      • Stemming: reduce derived words to root (“stemmed”) form
          • different, differently, differ, differing, differed → differ
      • Weighting: term frequency – inverse document frequency
        $$\mathrm{tfidf}_i = \mathrm{term\ frequency}_i \times \log \frac{\text{number of documents}}{\text{number of documents where term } i \text{ occurs}}$$
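
The weighting formula can be illustrated with a small, self-contained sketch. Several details are assumptions because the slide does not state them: term frequency is taken as a raw count, the logarithm is natural, and the second and third documents are invented for the example. The resulting numbers are therefore not expected to match any figures from the talk.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        term_freq = Counter(doc)  # raw count of each term in this document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


docs = [
    ["result", "comput", "result", "gener", "text"],
    ["text", "classifi", "patent"],   # hypothetical document
    ["patent", "claim", "method"],    # hypothetical document
]
weights = tf_idf(docs)
```

A term that occurs in every document gets weight zero, since log(1) = 0; rare terms that occur often within one document get the largest weights.
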
  • 29. Text Processing (worked example)
      • input: the results are computed from the resulting generated text
      • stopwords removed: results computed resulting generated text
      • stemmed: result comput result gener text
      • weighted: 3.03, 1.24, 0.68, 4.79, …
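
The first two steps of the worked example can be reproduced with a toy pipeline. This is a sketch under stated assumptions: the stopword set is the list from the Text Processing slide plus "are" (which the example sentence drops), and `toy_stem` is a deliberately crude suffix stripper written for this sentence, not the Porter stemmer or whatever stemmer the talk actually used.

```python
# Stopwords from the slide, plus "are" (assumed, since the example removes it).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "is", "it", "the", "to", "with"}

# Toy suffix rules, tried in this order; a real stemmer has many more rules.
SUFFIXES = ("ated", "ing", "ed", "s")


def remove_stopwords(tokens):
    """Drop tokens that carry no information about content."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


sentence = "the results are computed from the resulting generated text"
tokens = sentence.lower().split()
filtered = remove_stopwords(tokens)
stemmed = [toy_stem(t) for t in filtered]

print(" ".join(filtered))  # results computed resulting generated text
print(" ".join(stemmed))   # result comput result gener text
```
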
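  • 30. Ensemble diagram: the abstract text (“The results are computed from the resulting generated text…”) goes to the text classifier (SVM) and the CPC set ({A61K036, A61K038, A61K039, A61K041, A61K045}) goes to the CPC classifier (SVM); each outputs a class (class 1, class 2), and the pair [class 1, class 2] is passed to a combiner still marked “?” that produces the final classification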
  • 31. Decision Trees (example diagram): a tree that decides hiking: {Yes, No}
      • attributes: outlook: {sunny, overcast, rainy}, humidity: {high, low}, wind: {high, low}
      • nodes: outlook, humidity, wind; branches labeled with the attribute values; leaves labeled Y or N
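
A decision tree like the hiking example can be written as nested data and walked from root to leaf. The sketch below is illustrative only: the transcript does not preserve which branch leads to which leaf, so the structure and the Y/N assignments in `TREE` are assumptions, and `decide` is a hypothetical helper.

```python
# Internal node: (attribute, {value: subtree}); leaf: "Y" or "N".
# ASSUMPTION: branch-to-leaf assignments are invented for illustration.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "N", "low": "Y"}),
    "overcast": "Y",
    "rainy": ("wind", {"high": "N", "low": "Y"}),
})


def decide(tree, example):
    """Follow the branch matching each attribute value until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


decision = decide(TREE, {"outlook": "sunny", "humidity": "low", "wind": "high"})
```

Only the attributes on the path taken are consulted: for a sunny day the tree above never looks at wind.
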
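  • 32. Ensemble diagram (complete): the abstract text (“The results are computed from the resulting generated text…”) goes to the text classifier (SVM) and the CPC set ({A61K036, A61K038, A61K039, A61K041, A61K045}) goes to the CPC classifier (SVM); a decision tree takes their outputs [class 1, class 2] and produces the final classification. Accuracy figures shown on the slide: 87%, 76%, 98%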
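
The final architecture, two base classifiers whose outputs feed a combiner, is a form of stacking. The sketch below shows only the combining step and makes several assumptions: a lookup table learned by majority vote stands in for the decision tree, the base predictions and labels are invented, and `train_combiner` and `combine` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_combiner(base_predictions, true_labels):
    """For each (CPC prediction, text prediction) pair, learn the most common true label."""
    votes = defaultdict(Counter)
    for pair, label in zip(base_predictions, true_labels):
        votes[tuple(pair)][label] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def combine(combiner, pair):
    """Final classification; unseen pairs fall back to the first classifier's output."""
    return combiner.get(tuple(pair), pair[0])


# Hypothetical held-out predictions [class 1, class 2] with their true labels.
base_predictions = [
    ("chem", "chem"), ("chem", "bio"), ("chem", "bio"),
    ("bio", "bio"), ("bio", "chem"),
]
true_labels = ["chem", "bio", "bio", "bio", "bio"]

combiner = train_combiner(base_predictions, true_labels)
final = combine(combiner, ("chem", "bio"))
```

The point of the second stage is that it can learn when to trust which base classifier: in the invented data above, a disagreement between the two is resolved in favor of "bio" because that is what the labeled examples show.
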