Visually Exploring Patent Collections for Events and Patterns
Derek X. Wang
Associate Director of the Charlotte Visualization Center

Together with:
Wenwen Dou, Wlodek Zadrozny, Suraj Ankam, Debbie Strumsky, Terry Rabinowitz
Value

Businesses

• 800 patents: $1 billion, AOL to Microsoft
• 1,100 patents from Kodak: $525 million for a group license
• 17,000 patents: $12.5 billion, Motorola Mobility to Google
Value

Technology
Dataset: 123 publications from the VAST proceedings, 2006-2010.

Cyan topic: variable, uncertainty, trend, correlation, linear, multivariate, sensitivity

FODAVA

Blue topic: dimension, quality, cluster, measure, lda, attribute, reduction, projection

(Figure: topic timeline, axis 2006-2010)

**X. Wang et al., ParallelTopics: A probabilistic approach to exploring document collections, IEEE VAST 2011
Value

Goal
• Can we spot an emerging new technology?
• Text mining and visualization
• Can we spot novelty within a patent?
• How much do claims differ from class descriptions?
• How much do claims differ from claims in other similar patents?
• Can we list “all” patents relevant for some technology? (And what does that mean?)

A Robust and Scalable Patent Analysis Infrastructure Is Needed

Balanced Analytics Technology = Human + Computer

Visual Analytics Will Play a Key Role
Value Goal

Challenge
• Unstructured or semi-structured
• Highly heterogeneous, leading to highly heterogeneous models
• Incomplete or with holes
• With intrinsic uncertainty (and in some cases deception)
• Inside and outside the enterprise
• Containing detailed time and space information
Value Goal Challenge

Research
Structuring the Unstructured: Topic Modeling
• Latent Dirichlet Allocation (LDA)
• Reveals latent topics in a large text corpus
• Describes each topic with a coherent set of its most likely words
• Topics are defined by keyword groups
• Topics in text collections can be inferred effectively
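
A minimal sketch of how LDA produces the keyword groups listed above, using collapsed Gibbs sampling in plain Python. This is not the implementation used in the talk; the function names, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word probabilities. alpha and beta are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        {w: (topic_word_counts[k][w] + beta) / (topic_totals[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """The n most likely words of each topic, i.e. its keyword group."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]

# Toy documents (hypothetical, already tokenized).
docs = [
    ["wireless", "signal", "antenna", "signal"],
    ["antenna", "wireless", "receiver"],
    ["storage", "software", "memory"],
    ["software", "storage", "application", "memory"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
keyword_groups = top_words(topic_word)
```

Each row of `doc_topic` is a document's mixture over topics, and each entry of `topic_word` is a topic's distribution over words, which matches the "each topic a distribution on words, each abstract a combination of topics" description used later in the deck.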
Structuring the Unstructured: Investigative Element Extraction
• Recognition of entities including people, locations, buildings, and organizations
• Recognition of times and dates
• Construction of a near-real-time analysis pipeline for entity association
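
A rough sketch of the two recognition steps above, using only regular expressions. The deck later names OpenNLP for entity extraction; the patterns below are a much cruder stand-in, and the pattern names, helper functions, and sample sentence are assumptions for illustration.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Two date shapes only: ISO "2013-03-07" and "March 5, 2013".
DATE_PATTERN = re.compile(
    r"\b(?:(\d{4})-(\d{2})-(\d{2})"
    r"|(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4}))\b"
)

# Naive entity candidates: runs of two or more capitalized words.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

def extract_dates(text):
    """Return the dates found in text, in order of appearance."""
    found = []
    for m in DATE_PATTERN.finditer(text):
        if m.group(1):
            year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
        else:
            year, month, day = int(m.group(6)), MONTHS[m.group(4)], int(m.group(5))
        try:
            found.append(date(year, month, day))
        except ValueError:
            continue  # skip impossible dates such as 2013-02-31
    return found

def extract_entity_candidates(text):
    """Return capitalized multi-word phrases as candidate entities."""
    return ENTITY_PATTERN.findall(text)

# Hypothetical sample sentence.
sample = ("Acme Corp filed the application on March 5, 2013. "
          "Jane Doe signed it in New York on 2013-03-07.")
dates = extract_dates(sample)
entities = extract_entity_candidates(sample)
```

A real pipeline would use a trained recognizer, since capitalization alone cannot separate people, locations, buildings, and organizations.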
Structuring the Unstructured: Event Structuring
Events: Meaningful occurrences in space and time
Motivating Event
Particular Topic Stream

Narrative: a series of clustered (event-based) stories, temporally linked based on content similarity.
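
One way to read "temporally linked based on content similarity" is sketched below: each event is linked to the most similar earlier event, provided their bag-of-words cosine similarity reaches a threshold. The threshold value, the function names, and the toy events are assumptions; the talk does not specify the similarity measure.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def link_events(events, threshold=0.3):
    """Link each event to its most similar earlier event.

    events: list of (time, text). Returns (earlier_index, later_index) pairs.
    On ties, the more recent earlier event wins.
    """
    ordered = sorted(range(len(events)), key=lambda i: events[i][0])
    vectors = [Counter(text.lower().split()) for _, text in events]
    links = []
    for pos, later in enumerate(ordered):
        best, best_sim = None, threshold
        for earlier in ordered[:pos]:
            sim = cosine(vectors[earlier], vectors[later])
            if sim >= best_sim:
                best, best_sim = earlier, sim
        if best is not None:
            links.append((best, later))
    return links

# Hypothetical events: (year, summary text).
events = [
    (2006, "touch screen phone interface"),
    (2007, "phone touch screen software storage"),
    (2008, "bee colony queen rearing"),
    (2009, "software storage phone application"),
]
narrative_links = link_events(events)
```

Following the links from the latest event backwards yields one narrative chain; events with no link start a new narrative.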
Value Reality Challenge Research

Results
Can we spot an emerging new technology?
Data:
• 50,000 telecommunication patents from the past 10 years
• Abstract text and patent meta-information
• 1.5 GB of raw patent documents
Methods: Topic modeling and visualization
Results:
We can see a significant change in the topic of “software and storage” in communication around 2007 (corresponding to the Apple iPhone?)
**W. Dou et al., HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies, IEEE VAST 2013
Model:
• 100 topics
• Each topic a distribution on words
• Each abstract a combination of topics

Note: Width of the graph is proportional to the number of patents and the number of words from a particular topic (topic signal strength).
The number of class 455 patents grew from 2,234 in 2005 to 7,647 in 2012.
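
The note on topic signal strength can be sketched as a per-year aggregation. The formula below (topic share times word count, summed over the patents of a year) is only one plausible reading of "proportional to the number of patents and the number of words from a particular topic"; the function name and sample values are assumptions.

```python
from collections import defaultdict

def topic_strength_by_year(patents):
    """Sum, per year and topic, the number of words attributed to the topic.

    patents: iterable of (year, num_words, topic_proportions), where
    topic_proportions maps a topic label to its share of the abstract.
    """
    strength = defaultdict(lambda: defaultdict(float))
    for year, num_words, proportions in patents:
        for topic, share in proportions.items():
            strength[year][topic] += share * num_words
    return {year: dict(topics) for year, topics in sorted(strength.items())}

# Hypothetical patents: (year, abstract length in words, topic shares).
patents = [
    (2006, 100, {"transistor": 0.75, "software": 0.25}),
    (2007, 100, {"transistor": 0.5, "software": 0.5}),
    (2007, 200, {"transistor": 0.25, "software": 0.75}),
]
strength = topic_strength_by_year(patents)
```

In a stream-style graph, `strength[year][topic]` would set the width of that topic's band in that year, so both more patents and longer topic-heavy abstracts widen it.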
Typical Keyword: “transistor”
Emergent: “storage, software, …”
Can we spot novelty within an existing patent?
Data
Initially: A random sample of 40 patents in several classes with focus on 455 (telecom).
Recently: Confirmed through automated analysis of several subclasses of 455.

Method: Compare words in claims with words in class plus subclass definition

Results: Large symmetric differences
Words(Claims) ÷ Words(Definition)
Words(Abstracts) ÷ Words(Definition)
Example

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&p=1&u=%2Fnetahtml%2Fsearch-bool.html&r=2&f=G&l=50&d=pall&s1=449%2F8.CCLS.&OS=CCL/449/8&RS=CCL/449/8

Patent Title: Process for rearing bumblebee queens and process for rearing bumblebees

Main Classification: 449/1; 449/2; 449/8

Class 449 – Bee Culture / Subclass 1
Class 449 – Bee Culture / Subclass 8
We claim:

1. A process for rearing bumblebee queens (genus Bombus) comprising generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow until bumblebee queens are produced, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae.

2. The process according to claim 1, wherein the workers that originate from said at least one different colony are brought together with a young colony in the eusocial phase, consisting of a fertilized queen, brood and the first born workers.

3. The process according to claim 1, wherein more than 100 workers are brought together.

4. The process according to claim 1, wherein rearing is carried out using a workers: fertilized eggs ratio of 0.5-4.

5. The process according to claim 1, wherein the workers originating from said at least one different colony are first kept in a room without any queen and without brood for one day.

6. The process according to claim 1, wherein brood and workers from different bumblebee species are brought together.

7. A process for rearing bumblebees (genus Bombus), comprising rearing bumblebee queens by generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae, and using said bumblebee queens for rearing bumblebees.
Subclass Nesting
Class 449
1 -> Class Definition
8 -> 7 -> 3 -> Class Definition
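
The nesting shown above can be held as a parent map and walked upward, which is useful when the definitions of all enclosing subclasses have to be collected for comparison. The map below encodes only the two chains on this slide; the `PARENT` name and helper functions are assumptions.

```python
# Parent of each subclass in class 449; None means "indented under the class definition".
# Only the chains shown on the slide are encoded here.
PARENT = {1: None, 3: None, 7: 3, 8: 7}

def nesting_chain(subclass, parent):
    """Return the subclass followed by its ancestors, innermost first."""
    chain = [subclass]
    seen = {subclass}
    while parent[chain[-1]] is not None:
        nxt = parent[chain[-1]]
        if nxt in seen:
            raise ValueError("cycle in subclass nesting")
        chain.append(nxt)
        seen.add(nxt)
    return chain

def format_chain(chain):
    """Render a chain in the slide's arrow notation."""
    return " -> ".join(str(s) for s in chain) + " -> Class Definition"

chain_8 = nesting_chain(8, PARENT)
chain_1 = nesting_chain(1, PARENT)
```

For the example patent, classified in 449/1 and 449/8, the definitions to compare against would then be those of subclasses 1, 8, 7, and 3 plus the class definition.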

Class Name: Bee Culture
Class Definition: This class includes the methods of and structures for propagating, raising and caring for bees; as well as certain ancillary methods and structures.

Class 449 Subclass 1
Subclass Name: Method
Subclass Definition: This subclass is indented under the class definition. Process.

Class 449 Subclass 8
Subclass Name: Queen Raising
Subclass Definition: This subclass is indented under subclass 7. Structure with provision to encourage and care for the production of a bee larvae into a queen bee.
Words in class / subclass definitions found in patent claim

Word        Count
method      0
colony      11
process     7
culture     0
queen       6
propagate   0
raise       0
encourage   0
care        0
larvae      4
production  1
bee         7
multi       0
swarm       0
capture     0
house       0
hive        0
structure   0
Words in claim that were not in definitions

Word        Count
rearing     5
worker      10
egg         5
fertilize   6
climate     2
food        2
different   5
control     2
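
The two word tables above come from comparing claim words with definition words. A minimal sketch of that comparison follows; the slide's counts appear to use stemmed words, whereas this sketch only strips a trailing plural "s", so it will not reproduce those numbers. The stopword list, the `normalize` rule, and the short text excerpts are assumptions.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"a", "and", "for", "into", "of", "the", "to", "with"})

def normalize(word):
    """Crude stand-in for stemming: drop a plural 's' (but keep 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def terms(text):
    """Lowercased, stopword-filtered, normalized words of a text."""
    return [normalize(w) for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]

def compare(claim_text, definition_text):
    """Return (shared, novel) word counts.

    shared: each definition word with its count in the claim (0 if absent).
    novel: claim words that never occur in the definition, with counts.
    """
    claim_counts = Counter(terms(claim_text))
    definition_words = set(terms(definition_text))
    shared = {w: claim_counts[w] for w in sorted(definition_words)}
    novel = {w: c for w, c in claim_counts.items() if w not in definition_words}
    return shared, novel

# Excerpts from the example patent and the subclass 8 definition.
claim = "A process for rearing bumblebee queens comprising generating a colony with workers"
definition = ("Structure with provision to encourage and care for the "
              "production of a bee larvae into a queen bee.")
shared, novel = compare(claim, definition)
```

The `novel` dictionary is the candidate location of novelty in the sense of the observations that follow: words in the claim that the class and subclass definitions leave unsaid.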
Can we spot novelty within an existing patent?

Observations
• Novelty is in words/relations that are not part of the definition (but appear in the patent's claims or abstract)
• Some things can be left unsaid. Is there a boundary?
• This happens in all patents (but the degree varies)

Next
• Opportunity to text mine these differences
– Are they random on a time scale?
– Would descriptions of emerging technologies emerge from these patterns?
– Do combination patents have more of these?
Can we list “all” patents relevant for some technology?
– Data: Patents, Wikipedia
– Potential Data: Cell phone manuals or other descriptions
– Method: Text mining of patents in certain classes, text mining of filings by certain market/technology players, and text mining of other patents, using Wikipedia and manuals as guidance on what to look for.
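
As a sketch of "using Wikipedia and manuals as guidance", the example below builds a vocabulary from guidance text and ranks patent abstracts by the fraction of their words found in it. The scoring rule, the minimum word length, the function names, and the patent IDs and texts are all hypothetical; the talk does not describe a specific ranking method.

```python
import re

def vocabulary(text, min_len=4):
    """Set of lowercased words with at least min_len letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}

def rank_patents(patents, guidance_texts):
    """Rank patent IDs by the share of abstract words found in the guidance vocabulary.

    patents: dict mapping a patent ID to its abstract text.
    Patents with no overlap are dropped.
    """
    guide = set().union(*(vocabulary(t) for t in guidance_texts))
    scored = []
    for pid, abstract in patents.items():
        words = vocabulary(abstract)
        score = len(words & guide) / len(words) if words else 0.0
        scored.append((score, pid))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for score, pid in scored if score > 0]

# Hypothetical guidance text and patent abstracts.
guidance = ["A mobile phone uses a touchscreen and wireless antenna."]
patents = {
    "P1": "Wireless antenna for a mobile handset",
    "P2": "Process for rearing bumblebee queens",
    "P3": "Touchscreen phone with storage",
}
ranking = rank_patents(patents, guidance)
```

Such a ranking can reach patents outside the expected classes, which is one reason the question of what “all” means stays open.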
Value Reality Challenge Research Results

Scale
Scalable Computing Architecture for Extracting Latent Topics and Events*

Distributed Data Storage and Pre-Processing Environment
• MapReduce procedures for data cleaning and pre-processing
• Distributed storage solution (MongoDB) used for data storage, analysis and retrieval
• MapReduce-based social media crawlers for Twitter, blogs and news articles:
– Unstructured contents: textual information, images, comments
– Structured contents: user graph, geo-tags, hashtags

Parallel Data Analytics Cluster
• MPI-based Parallel-LDA implementation for topic modeling with memory sharing optimization
• OpenNLP-based parallel implementation for entity extraction
• Customized PBS to schedule jobs for the parallel computing environment

**X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
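
The MapReduce cleaning and pre-processing step can be illustrated with an in-process imitation of the map, shuffle, and reduce phases. This runs on a single machine and is not the deployed system; the function names and sample records are assumptions.

```python
import re
from collections import defaultdict
from itertools import chain

def map_clean(record):
    """Map phase: clean one document and emit (token, 1) pairs."""
    _doc_id, text = record
    for token in re.findall(r"[a-z]+", text.lower()):
        yield token, 1

def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(key, values):
    """Reduce phase: total the counts for one token."""
    return key, sum(values)

def run(records):
    """Run map, shuffle, and reduce over all records in one process."""
    pairs = chain.from_iterable(map_clean(r) for r in records)
    return dict(reduce_count(k, vs) for k, vs in shuffle(pairs).items())

# Hypothetical records: (document ID, raw text).
records = [
    ("p1", "Wireless signal storage."),
    ("p2", "Signal storage, STORAGE!"),
]
token_counts = run(records)
```

In a real cluster the map and reduce functions keep this shape while the framework distributes the records and performs the shuffle across machines.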

News Briefing App
Resources we’d be happy to share
• Complete US patents and applications (until 1Q2013) with a search engine (Lucene) interface
• Patent classes
• Other text resources (Wikipedia, Wiktionary, etc.)

We’d be happy to prepare specialized extracts or combinations for those who need them.
Thank you!
Derek Xiaoyu Wang
xiaoyu.wang@uncc.edu

News Briefing App
@News_Briefing
Now FREE at App Store

Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Visually Exploring Patent Collections for Events and Patterns

  • 1. Visually Exploring Patent Collections for Events and Patterns Derek X. Wang Associate Director of the Charlotte Visualization Center Together with: Wenwen Dou, Wlodek Zadrozny, Suraj Ankam, Debbie Strumsky, Terry Rabinowitz
  • 5. Value Businesses • 800 patents: • $1 billion worth of patents from AOL to Microsoft
  • 6. Value Businesses • 800 patents: • $1 billion worth of patents from AOL to Microsoft • 1,100 patents from Kodak • $525 million to group license
  • 7. Value Businesses • 800 patents: • $1 billion worth of patents from AOL to Microsoft • 1,100 patents from Kodak • $525 million to group license • 17,000 patents • $12.5 billion Motorola Mobility to Google
  • 8. Value Dataset: 123 Publications from VAST proceedings from 2006-2010. 2006 2007 2008 2009 2010
  • 9. Value Technology Dataset: 123 Publications from VAST proceedings from 2006-2010. 2006 2007 2008 2009 2010
  • 10. Value Technology Dataset: 123 Publications from VAST proceedings from 2006-2010. Cyan topic: variable uncertainty trend correlation linear multivariate sensitivity 2006 2007 2008 2009 2010
  • 11. Value Technology Dataset: 123 Publications from VAST proceedings from 2006-2010. Cyan topic: variable uncertainty trend correlation linear multivariate sensitivity Blue topic: dimension quality cluster measure lda attribute reduction projection 2006 2007 2008 2009 2010
  • 12. Value Technology Dataset: 123 Publications from VAST proceedings from 2006-2010. Cyan topic: variable uncertainty trend correlation linear multivariate sensitivity FODAVA Blue topic: dimension quality cluster measure lda attribute reduction projection 2006 2007 2008 2009 2010
  • 13. Value Technology Dataset: 123 Publications from VAST proceedings from 2006-2010. Cyan topic: variable uncertainty trend correlation linear multivariate sensitivity FODAVA Blue topic: dimension quality cluster measure lda attribute reduction projection 2006 2007 2008 2009 2010 **X. Wang et al., ParallelTopics: A probabilistic approach to exploring document collections, IEEE VAST 2011
  • 15. Value Goal • Can we spot an emerging new technology?
  • 16. Value Goal • Can we spot an emerging new technology? • Text mining and visualization
  • 17. Value Goal • Can we spot an emerging new technology? • Text mining and visualization • Can we spot novelty within a patent?
  • 18. Value Goal • Can we spot an emerging new technology? • Text mining and visualization • Can we spot novelty within a patent? • How much do claims differ from class descriptions?
  • 19. Value Goal • Can we spot an emerging new technology? • Text mining and visualization • Can we spot novelty within a patent? • How much do claims differ from class descriptions? • How much do claims differ from claims in other similar patents?
  • 20. Value Goal • Can we spot an emerging new technology? • Text mining and visualization • Can we spot novelty within a patent? • How much do claims differ from class descriptions? • How much do claims differ from claims in other similar patents? • Can we list “all” patents relevant for some technology? (And what does that mean?)
  • 21. Value Goal A Robust and Scalable Patent Analysis Infrastructure Is Needed
  • 22. Value Goal A Robust and Scalable Patent Analysis Infrastructure Is Needed Balanced Analytics Technology Visual Analytics Will Play a Key Role
  • 23. Value Goal A Robust and Scalable Patent Analysis Infrastructure Is Needed Balanced Analytics Technology Human = + Computer Visual Analytics Will Play a Key Role
  • 26. Value Goal Challenge • Unstructured or semi-structured • Highly heterogeneous • Leading to highly heterogeneous models • Incomplete or with holes • With intrinsic uncertainty (and in some cases deception) • Inside and outside the enterprise • Containing detailed time and space information
  • 28. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling
  • 29. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling • Latent Dirichlet Allocation (LDA)
  • 30. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling • Latent Dirichlet Allocation (LDA) • Reveals latent topics in a large text corpus
  • 31. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling • Latent Dirichlet Allocation (LDA) • Reveals latent topics in a large text corpus • Coherent sets of the most likely words describe each topic
  • 32. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling • Latent Dirichlet Allocation (LDA) • Reveals latent topics in a large text corpus • Coherent sets of the most likely words describe each topic • Topics defined by keyword groups
  • 33. Value Goal Challenge Research Structuring the Unstructured: Topic Modeling • Latent Dirichlet Allocation (LDA) • Reveals latent topics in a large text corpus • Coherent sets of the most likely words describe each topic • Topics defined by keyword groups • Topics in text collections can effectively be inferred
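The topic-modeling bullets above can be illustrated with a minimal sketch of LDA inference by collapsed Gibbs sampling. This is not the implementation used in the talk; the function name `lda_gibbs`, its parameters, and the default hyperparameters are assumptions chosen for illustration, and the sampler is written for clarity rather than speed.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenized documents.

    Returns (theta, top_words): theta[d] is the topic mixture of document d,
    top_words[k] lists the most frequent words assigned to topic k.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]           # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]    # n(k, w)
    topic_total = [0] * num_topics                         # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [w for w, c in counts.most_common(5) if c > 0] for counts in topic_word
    ]
    return theta, top_words


if __name__ == "__main__":
    toy_docs = [
        ["bee", "queen", "colony"],
        ["antenna", "signal", "radio"],
        ["bee", "colony", "hive"],
        ["signal", "radio", "antenna"],
    ]
    mixtures, keywords = lda_gibbs(toy_docs, num_topics=2, iterations=50)
    for k, words in enumerate(keywords):
        print("topic", k, words)
```

Each document comes out as a mixture over topics and each topic as a ranked keyword group, which matches the "topics defined by keyword groups" description above. A production system would use an optimized or parallel implementation.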
  • 35. Value Goal Challenge Research Structuring the Unstructured: Investigative Element Extraction
  • 36. Value Goal Challenge Research Structuring the Unstructured: Investigative Element Extraction • Recognition of entities including people, locations, buildings, organizations.
  • 37. Value Goal Challenge Research Structuring the Unstructured: Investigative Element Extraction • Recognition of entities including people, locations, buildings, organizations. • Recognition of times and dates.
  • 38. Value Goal Challenge Research Structuring the Unstructured: Investigative Element Extraction • Recognition of entities including people, locations, buildings, organizations. • Recognition of times and dates. • Construct near-real-time analysis pipeline for entity association
  • 40. Value Reality Challenge Research Structuring the Unstructured: Event Structuring
  • 41. Value Reality Challenge Research Structuring the Unstructured: Event Structuring Events: Meaningful occurrences in space and time
  • 42. Value Reality Challenge Research Structuring the Unstructured: Event Structuring Events: Meaningful occurrences in space and time Motivating Event Particular Topic Stream
  • 43. Value Reality Challenge Research Structuring the Unstructured: Event Structuring Events: Meaningful occurrences in space and time Motivating Event Particular Topic Stream Narrative: a series of clustered (event-based) stories temporally-linked based on content similarity.
  • 44. Value Reality Challenge Research Results
  • 45. Value Reality Challenge Research Results Can we spot an emerging new technology?
  • 46. Value Reality Challenge Research Results Can we spot an emerging new technology? Data: 50,000 telecommunication patents from the past 10 years; abstract text and patent meta-information; 1.5 GB of raw patent documents
  • 47. Value Reality Challenge Research Results Can we spot an emerging new technology? Data: 50,000 telecommunication patents from the past 10 years; abstract text and patent meta-information; 1.5 GB of raw patent documents Methods: Topic modeling and visualization
  • 48. Value Reality Challenge Research Results Can we spot an emerging new technology? Data: 50,000 telecommunication patents from the past 10 years; abstract text and patent meta-information; 1.5 GB of raw patent documents Methods: Topic modeling and visualization Results: We can see a significant change in the topic of “software and storage” in communication around 2007 (corresponding to the Apple iPhone?)
  • 49. Value Reality Challenge Research Results Can we spot an emerging new technology? **W. Dou et al., HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies, IEEE VAST 2013
  • 50. Value Reality Challenge Research Results Can we spot an emerging new technology? Model: • 100 topics • Each topic a distribution on words • Each abstract a combination of topics. Note: Width of the graph is proportional to the number of patents and the number of words from a particular topic (topic signal strength). The number of class 455 patents grew from 2234 in 2005 to 7647 in 2012. **W. Dou et al., HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies, IEEE VAST 2013
  • 51. Value Reality Challenge Research Results Can we spot an emerging new technology? Model: • 100 topics • Each topic a distribution on words • Each abstract a combination of topics. Note: Width of the graph is proportional to the number of patents and the number of words from a particular topic (topic signal strength). The number of class 455 patents grew from 2234 in 2005 to 7647 in 2012. **W. Dou et al., HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies, IEEE VAST 2013
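One possible reading of "topic signal strength" in the note above is the number of words attributed to a topic, summed over all patents filed in a given year. The sketch below computes that quantity; the input layout (year, abstract length, topic weights) and the function name `topic_strength_by_year` are assumptions for illustration, not the format used in the talk.

```python
from collections import defaultdict


def topic_strength_by_year(docs):
    """Aggregate per-topic strength per year.

    docs: iterable of (year, num_words, topic_weights), where topic_weights
    is the topic mixture of one abstract (hypothetical input layout).
    Strength = sum over abstracts of topic weight * abstract length.
    """
    strength = defaultdict(lambda: defaultdict(float))
    for year, num_words, weights in docs:
        for topic, weight in enumerate(weights):
            strength[year][topic] += weight * num_words
    return {year: dict(topics) for year, topics in sorted(strength.items())}


if __name__ == "__main__":
    sample = [
        (2006, 100, [0.8, 0.2]),
        (2006, 50, [0.5, 0.5]),
        (2007, 200, [0.1, 0.9]),
    ]
    for year, topics in topic_strength_by_year(sample).items():
        print(year, topics)
```

Plotting each topic's yearly value as a band width would give a stream-style view in which a topic that suddenly widens, such as "software and storage" around 2007, stands out.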
  • 52. Value Reality Challenge Research Results
  • 53. Value Reality Challenge Research Results Typical Keyword: “transistor”
  • 54. Value Reality Challenge Research Results Typical Keyword: “transistor” • Emergent: “storage, software, …”
  • 55. Value Reality Challenge Research Results
  • 56. Value Reality Challenge Research Results Can we spot novelty within an existing patent?
  • 57. Value Reality Challenge Research Results Can we spot novelty within an existing patent? Data: Initially: A random sample of 40 patents in several classes with focus on 455 (telecom). Recently: Confirmed through automated analysis of several subclasses of 455. Method: Compare words in claims with words in class plus subclass definition. Results: Large symmetric differences: words(Claims) ÷ words(Definition), words(Abstracts) ÷ words(Definition)
  • 58. Value Reality Challenge Research Results Example http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&p=1&u=%2Fnetahtml%2Fsearch-bool.html&r=2&f=G&l=50&d=pall&s1=449%2F8.CCLS.&OS=CCL/449/8&RS=CCL/449/8 Patent Title: Process for rearing bumblebee queens and process for rearing bumblebees. Main Classification: 449/1; 449/2; 449/8. Class 449 – Bee Culture / Subclass 1. Class 449 – Bee Culture / Subclass 8
  • 59. Value Reality Challenge Research Results We claim:
 
 1. A process for rearing bumblebee queens (genus Bombus) comprising generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow until bumblebee queens are produced, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae.
 2. The process according to claim 1, wherein the workers that originate from said at least one different colony are brought together with a young colony in the eusocial phase, consisting of a fertilized queen, brood and the first born workers.
 3. The process according to claim 1, wherein more than 100 workers are brought together.
 4. The process according to claim 1, wherein rearing is carried out using a workers: fertilized eggs ratio of 0.5-4.
 5. The process according to claim 1, wherein the workers originating from said at least one different colony are first kept in a room without any queen and without brood for one day.
 6. The process according to claim 1, wherein brood and workers from different bumblebee species are brought together.
 7. A process for rearing bumblebees (genus Bombus), comprising rearing bumblebee queens by generating a colony with workers in the presence of fertilized eggs and/or larvae from at least one colony, in a room with a controlled climate provided with food, and allowing the colony to grow, wherein subadult and/or adult workers that originate from at least one different colony are brought together with said fertilized eggs and/or larvae, and using said bumblebee queens for rearing bumblebees.
  • 60. Value Reality Challenge Research Results Subclass Nesting Class 449 1 -> Class Definition 8 -> 7 -> 3 -> Class Definition
  • 61. Value Reality Challenge Research Results Subclass Nesting Class 449 1 -> Class Definition 8 -> 7 -> 3 -> Class Definition Class Name: Bee Culture Class Definition: This class includes the methods of and structures for propagating, raising and caring for bees; as well as certain ancillary methods and structures.
  • 62. Value Reality Challenge Research Results Class 449 Subclass 1 Subclass Name: Method Subclass Definition: This subclass is indented under the class definition. Process.
  • 63. Value Reality Challenge Research Results Class 449 Subclass 8 Subclass Name: Queen Raising Subclass Definition: This subclass is indented under subclass 7. Structure with provision to encourage and care for the production of a bee larvae into a queen bee.
  • 64. Value Reality Challenge Research Results Words in class / subclass definitions found in patent claim (word, count): method 0, colony 11, process 7, culture 0, queen 6, propagate 0, raise 0, encourage 0, care 0, larvae 4, production 1, bee 7, multi 0, swarm 0, capture 0, house 0, hive 0, structure 0
  • 65. Value Reality Challenge Research Results Words in claim that were not in definitions (word, count): rearing 5, worker 10, egg 5, fertilize 6, climate 2, food 2, different 5, control 2
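The comparison behind the two word lists above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the stopword list, the tokenizer, and the function name `compare_claims_to_definitions` are assumptions, and no stemming is applied (the slide counts appear to use stemmed forms such as "fertilize" and "worker").

```python
import re
from collections import Counter

# A tiny illustrative stopword list (assumption, not from the talk).
STOPWORDS = {"a", "an", "and", "as", "for", "in", "of", "or", "the", "this", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def compare_claims_to_definitions(claims, definitions):
    """Return (shared, claim_only, symmetric_difference).

    shared: each definition word with its count in the claims (may be 0).
    claim_only: claim words absent from the definitions, with counts.
    symmetric_difference: words that occur in exactly one of the two texts.
    """
    claim_counts = Counter(tokenize(claims))
    definition_words = set(tokenize(definitions))
    shared = {w: claim_counts[w] for w in definition_words}
    claim_only = {w: c for w, c in claim_counts.items() if w not in definition_words}
    symmetric_difference = set(claim_counts) ^ definition_words
    return shared, claim_only, symmetric_difference


if __name__ == "__main__":
    claim_text = (
        "A process for rearing bumblebee queens comprising generating "
        "a colony with workers"
    )
    definition_text = (
        "This class includes the methods of and structures for propagating, "
        "raising and caring for bees. Process."
    )
    shared, claim_only, sym_diff = compare_claims_to_definitions(
        claim_text, definition_text
    )
    print("definition words found in claim:", shared)
    print("claim words not in definitions:", claim_only)
    print("size of symmetric difference:", len(sym_diff))
```

The claim-only words are the candidates for novelty: they appear in what the patent claims but not in how the class is defined.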
  • 66. Value Reality Challenge Research Results
  • 67. Value Reality Challenge Research Results Can we spot novelty within an existing patent? Observations • Novelty is in words/relations that are not part of the definition (but appear in patent claims or its abstract) • Some things can be left unsaid. Is there a boundary? • Happens in all patents (but degree varies)
  • 68. Value Reality Challenge Research Results Can we spot novelty within an existing patent? Next • Opportunity to text mine these differences – Are they random on a time scale? – Would descriptions of emerging technologies emerge from these patterns? – Do combination patents have more of these?
  • 69. Value Reality Challenge Research Results
  • 70. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology?
  • 71. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology? – Data: Patents, Wikipedia
  • 72. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology? – Data: Patents, Wikipedia – Potential Data: Cell phone manuals or other descriptions
  • 73. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology? – Data: Patents, Wikipedia – Potential Data: Cell phone manuals or other descriptions
  • 74. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology? – Data: Patents, Wikipedia – Potential Data: Cell phone manuals or other descriptions – Method: Text mining of patents in certain classes, text mining of filings by certain market/technology players, text mining of other patents, using Wikipedia and manuals as guidance on what to look for.
  • 75. Value Reality Challenge Research Results Can we list “all” patents relevant for some technology? – Data: Patents, Wikipedia – Potential Data: Cell phone manuals or other descriptions – Method: Text mining of patents in certain classes, text mining of filings by certain market/technology players, text mining of other patents, using Wikipedia and manuals as guidance on what to look for.
  • 76. Value Reality Challenge Research Results Scale
  • 77. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 

  • 78. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 

  • 79. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Distributed Data Storage and Pre-Processing Environment
  • 80. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Distributed Data Storage and Pre-Processing Environment **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
  • 81. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Distributed Data Storage and Pre-Processing Environment • MapReduce procedures for data cleaning and pre-processing • A distributed storage solution (MongoDB) is used for data storage, analysis, and retrieval **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
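The map and reduce steps of a cleaning and pre-processing job can be illustrated in a single process. The sketch below is only a stand-in for the pattern: it does not use a MapReduce framework or MongoDB, and the function names and the markup-stripping rule are assumptions.

```python
import re
from collections import Counter
from functools import reduce


def map_clean(document):
    """Map step: strip markup-like tags, lowercase, and count the tokens."""
    text = re.sub(r"<[^>]+>", " ", document)
    return Counter(re.findall(r"[a-z]+", text.lower()))


def reduce_counts(left, right):
    """Reduce step: merge two partial term counts."""
    return left + right


def term_frequencies(documents):
    """Run map over every document, then fold the partial results together."""
    return reduce(reduce_counts, map(map_clean, documents), Counter())


if __name__ == "__main__":
    raw = ["<p>Wireless signal</p>", "Signal storage"]
    print(term_frequencies(raw))
```

Because the reduce step is associative, the partial counts could be merged in any order, which is the property that lets the same logic be distributed across many workers.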
  • 82. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Distributed Data Storage and Pre-Processing Environment • MapReduce-based social media crawlers for Twitter, blogs and news articles • Unstructured content: text, images, comments • Structured content: user graph, geo-tags, hashtags **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
  • 83. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
  • 84. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Parallel Data Analytics Cluster MPI-based Parallel-LDA implementation for Topic modeling with Memory Sharing Optimization **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
  • 85. Value Reality Challenge Research Results Scale Scalable Computing Architecture for Extracting Latent Topics and Events* 
 Parallel Data Analytics Cluster OpenNLP-based Parallel Implementation for Entity-Extraction Customized PBS to schedule jobs for parallel computing environment **X. Wang et al., I-SI: Scalable Visual Analytics Architecture for Analyzing Latent Topical-Level Information From Social Media Data, Journal of Computer Graphics Forum, 2012
  • 86. Value Reality Challenge Research Results Scale News Briefing App
  • 87. Value Reality Challenge Research Results Scale Resources we’d be happy to share • Complete US patents and applications (until Q1 2013) with a search engine (Lucene) interface • Patent classes • Other text resources (Wikipedia, Wiktionary, etc.) We’d be happy to prepare specialized extracts or combinations for those who need them.
  • 88. Value Reality Challenge Research Results Scale Thank you! Derek Xiaoyu Wang xiaoyu.wang@uncc.edu News Briefing App @News_Briefing Now FREE at App Store