Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data donderdag
1. Treparel
Delftechpark 26
2628 XH Delft
The Netherlands
www.treparel.com
Turn Big Content into Business Insights
Jeroen Kleinhoven
CEO
September 4, 2014
2. Gartner Hype Cycle, Emerging Technologies, July 2014: Where are Content Analytics and Big Data?
Mainstream adoption
• Content Analytics is 2 to 5 years away.
• Big Data is 5 to 10 years away.
Treparel KMX – All Rights Reserved 2014 www.treparel.com 2
3. About Treparel
• Company
– HQ in Delft (The Netherlands)
– R&D in ecosystem: Delft University of Technology, Univ. of Paris and Sao Paulo
– Founded by a Data Scientist, a Visualization Prof and Search/Machine Learning engineers. Managed by Gartner VP since 2013
• Treparel is a solution provider:
– Rooted in Patent Analytics & Visualization, evolved into Big Content Solutions
– Big Content and KMX: content type agnostic Search, Text Analytics & Visualization
– KMX (Knowledge Mapping eXploration) provides fast and accurate insights in Big Content (email, patents, literature, web, social) for making better informed decisions
• 3 types of clients:
– End Users: Client/Server application (Download, Install, Run)
– Partners: Client/Server + Developer API (Download, Install, Run + Integrate)
– Independent: Developers, Researchers: Developer API + C/S… OpenSource (tbd)
4. KMX - extract, analyze & visualize patterns in large content collections
1. Landscaping / Clustering:
Examine a content cluster and extract entities or
references to people, products, locations, and other
concepts
2. Categorization/Classification:
Group similar information together
5. Value from Big Content in Publishing
• Examples of added value for Publishers:
1. Content dashboarding: offering Business Intelligence style Search, Reporting, Analytics and Visualization
3. Exploration of content that will not show up in a standard search query
4. Interactive Content Navigation
As well as:
4. Article recommendations, Smart collections, Group tagging
6. 1. Content Dashboard: ease-of-use navigation in large sets of content (Report – Search – Analyse – Visualize)
Ease of Use access to Research, Patents, Business News, Legislation
Recorded Demo: http://treparel.com/next-gen-ip-rd-dashboard/
7. 2. Enhance users' ability to visually explore relevant (hidden) content - 2
Interactive taxonomy with multiple coupled views incl. integrated
visualizations and search in large sets of documents
8. 2. Enhance users' ability to visually explore content (example: search in research on Ebola)
[Figure: cluster landscape shown at Zoomlevel 1, Zoomlevel 2 and Zoomlevel 3]
Clustering: Automatic annotation and zooming on large sets of documents
9. 3. Exploration through classification of content (that will not show up in a standard search query)
[Diagram: Publishing Database, Queries, then repeated Ranking and Filtering steps narrowing 10.000 documents to 1.000 documents to 10 documents; Content Dashboard: Present Final Results]
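The narrowing shown in the diagram can be sketched as a chain of rank-and-filter stages. This is a minimal illustration only: the toy documents, the two scoring functions and the stage sizes are assumptions made for the example and do not describe how KMX scores documents.

```python
def rank_and_filter(documents, score, keep):
    """Rank documents by score (highest first) and keep only the top `keep`."""
    return sorted(documents, key=score, reverse=True)[:keep]


def funnel(documents, stages):
    """Apply successive (score, keep) stages, each narrowing the previous result."""
    for score, keep in stages:
        documents = rank_and_filter(documents, score, keep)
    return documents


# Toy stand-ins for documents: (id, keyword_hits, classifier_score).
# These values are synthetic and exist only to make the sketch runnable.
documents = [(i, i % 7, (i * 37 % 101) / 100) for i in range(10_000)]

stages = [
    (lambda d: d[1], 1_000),  # coarse pass: rank by keyword hits, keep 1.000
    (lambda d: d[2], 10),     # fine pass: rank by a classifier-style score, keep 10
]
final = funnel(documents, stages)
print(len(final))
```

Each stage only sees what the previous stage kept, so a cheap score can be used on the full set and a more expensive one on the shortlist.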
10. Key Takeaways
Treparel is interested in partnering to empower Content Rich Search-Driven solutions.
• Mail me your details at jeroen@treparel.com if you’re interested in:
1. Getting a 30-day free trial
2. Test driving the KMX API in your content application
or
3. Being part of the pre-launch group for… KMX OpenSource.
12. How to position KMX in Big Content Analytics
KMX & Developer API
Content Dashboard
Developer Partnerships
Key Solutions:
1. Intellectual Property
2. eDiscovery
3. Publishing: Law, IP & Science
4. Risk & Compliance
5. Fraud & Forensics
Today’s topic
13. KMX Text Analytics Application overview
[Diagram labels: Acquire documents; Text Preprocessing and Indexing; Clustering; Classification; Visualization; Present Results; Taxonomies, Ontologies; Semantic Analysis; Query & Search Tools]
KMX unique functions:
• Extract concepts in context using clustering and classification of documents
• Use classification to create ranked lists and to tag subsets
• Support of binary and multi-class Classification
• Enterprise edition (server/cloud) & Professional edition (desktop)
• Integration with other applications through KMX API
14. Clustering: User Unsupervised Analytics
Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
1. Analyze the clusters and the relationships in the data
2. Explore outliers in the data
3. Find documents of interest
What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.
What KMX delivers:
Use KMX to do:
1. Perform text preprocessing (stemming/tokenization etc.)
2. Calculate a similarity measure between all documents
3. Calculate visualization (landscape) with automatic annotation
4. Create the visualization
– As a static image
– Or provide interaction where the user can zoom in/out with support for adaptive annotation
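Steps 1 and 2 above can be sketched with the Python standard library. The slide does not say which preprocessing or similarity measure KMX uses, so the stop-word list, the TF-IDF weighting and cosine similarity below are assumptions chosen because they are common defaults; stemming and the landscape layout (steps 3 and 4) are left out.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "of", "in", "and", "for", "to", "is"}


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(docs):
    """Pairwise similarity between all documents (step 2)."""
    vectors = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vectors] for u in vectors]


docs = [
    "Patent for a battery electrode material",
    "Battery electrode coating patent",
    "Ebola virus outbreak research",
    "Research on Ebola virus transmission",
]
sim = similarity_matrix(docs)
for row in sim:
    print([round(value, 2) for value in row])
```

A landscape view would then place documents so that the distance between two points grows as their similarity falls, for example using 1 minus the similarity as the distance.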
15. Classification: User Supervised Analytics
Benefits: Finding fast, accurate and precise small result sets and enabling trend reporting and alerting by reusing predefined categorization models.
1. Obtain a ranked list of the most relevant documents
2. Separate the important documents from the irrelevant documents (noise)
How it works: A list of the relevant documents defined from a user's perspective.
What KMX delivers:
Use KMX to do:
1. Tag (label) a small number of relevant and irrelevant documents
– Use search to identify documents that need to be tagged
– Perform manual tagging
– Select documents interactively from the visualization
2. Create a Classifier (categorizer) using the tagged documents
3. Automatically perform the classification on all documents
4. Obtain the important documents ranked high and the irrelevant documents ranked low
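The four steps above can be sketched as a learn-by-example ranking. The slide does not name the classifier KMX builds, so this example swaps in a simple centroid (Rocchio-style) scorer over word counts; the tagged examples and the collection are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def centroid(texts):
    """Average word-count vector of a set of tagged documents."""
    total = Counter()
    for text in texts:
        total.update(bag_of_words(text))
    return {term: count / len(texts) for term, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(documents, relevant, irrelevant):
    """Score each document by closeness to the relevant examples minus
    closeness to the irrelevant ones, and return (score, document) pairs
    sorted from most to least relevant."""
    pos, neg = centroid(relevant), centroid(irrelevant)
    scored = []
    for doc in documents:
        vec = bag_of_words(doc)
        scored.append((cosine(vec, pos) - cosine(vec, neg), doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Step 1: a small number of tagged examples (hypothetical data).
tagged_relevant = ["ebola virus vaccine trial", "ebola outbreak treatment"]
tagged_irrelevant = ["football match results", "stock market report"]

# Steps 2 to 4: build the scorer, apply it to all documents, read the ranking.
collection = [
    "quarterly stock market results",
    "new vaccine for ebola virus",
    "treatment options during the outbreak",
    "football transfer news",
]
ranking = rank_documents(collection, tagged_relevant, tagged_irrelevant)
for score, doc in ranking:
    print(f"{score:+.2f}  {doc}")
```

Documents with a positive score sit closer to the relevant examples, and those with a negative score sit closer to the noise, which gives both the ranked list and the relevant/irrelevant separation in one pass.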
16. KMX API: Embed Advanced Text Analytics functions
Clustering
Provides users
unsupervised analytics and
automatically identifies
inherent themes or
information clusters.
Through a dynamic
hierarchical topic view into
search results it enables
users to quickly focus on
annotated subjects rather
than scrolling through long
results lists.
Classification
Supervised analytics to help
users automatically
categorize large sets of
documents.
The Classification process
can use a small number of
document sets for learn-by-example
categorization.
By sorting the content of
documents by topic,
relevancy and keywords
users can apply their own
models or rules for
classification.
Visualization
Advanced visual knowledge
discovery for displaying,
exporting and sharing data
results, ranked document lists,
labeled and enriched data or
interactive visualizations.
Terms can be extracted to use
in building thesauri or
taxonomies.
KMX API
XML-RPC and REST (JSON)
Python Pickle protocol
Server: User / Tenant mgt
User objects mgt (datasets,
work spaces, classifiers, stop
lists, ...)
Databases: Oracle,
PostgreSQL
Client Application:
Native Windows (for creating
Analysis pipelines)
Using QT for GUI
Using OpenGL for
visualizations
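To make the two transport options listed above concrete, the sketch below encodes one and the same call as an XML-RPC request body and as a JSON body, using only the Python standard library. The slide gives the protocols but no method names or parameters, so `example.classify` and the parameter names are hypothetical placeholders and not part of the KMX API; no network request is made.

```python
import json
import xmlrpc.client

# Hypothetical method name and parameters, invented for this illustration.
method = "example.classify"
params = {"dataset": "demo", "classifier": "relevant-vs-noise"}

# XML-RPC: the call is serialized as an XML <methodCall> document.
xmlrpc_body = xmlrpc.client.dumps((params,), methodname=method)

# REST (JSON): the same parameters carried as a JSON request body.
json_body = json.dumps(params, sort_keys=True)

# Decode both again to confirm they carry the same parameters.
decoded_params, decoded_method = xmlrpc.client.loads(xmlrpc_body)
decoded_json = json.loads(json_body)

print(xmlrpc_body)
print(json_body)
```

An integrating application would normally pick one of the two encodings depending on what its own stack already supports.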
17. Industry Thought Leaders about KMX
“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.”
David Schubmehl, IDC
“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.”
Sue Feldman, Synthexis.com (author: The Answer Machine)