Lumify is a relatively new open source platform for big data analysis and visualization, designed to help organizations derive actionable insights from the large volumes of diverse data flowing through their enterprise. Utilizing popular big data tools like Hadoop, Accumulo, and Storm, it ingests and integrates many kinds of data, from unstructured text documents and structured datasets, to images and video. Several open source analytic tools (including Tika, OpenNLP, CLAVIN, OpenCV, and ElasticSearch) are used to enrich the data, increase its discoverability, and automatically uncover hidden connections. All information is stored in a secure graph database implemented on top of Accumulo to support cell-level security of all data and metadata elements. A modern, browser-based user interface enables analysts to explore and manipulate their data, discovering subtle relationships and drawing critical new insights. In addition to full-text search, geospatial mapping, and multimedia processing, Lumify features a powerful graph visualization supporting sophisticated link analysis and complex knowledge representation.
9. Key Concepts
• Ontology: structure for organizing information (i.e., your data model)
• Entities: any “thing” you want to represent (e.g., person, place, event)
• Relationships: a link between two entities (e.g., leader of, works for, sibling of)
• Properties: data about an entity (e.g., first name, last name, date of birth)
• Graph: collection of entities and the relationships between them
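The five concepts fit together as a small data model. The following is a minimal sketch, not Lumify's actual API: the class names, the `ONTOLOGY` dictionary, and the sample entities are all hypothetical and exist only to show how an ontology constrains entities, relationships, and properties inside a graph.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: the concept types and relationship labels the model allows.
ONTOLOGY = {
    "concepts": {"person", "company", "place"},
    "relationships": {"leader of", "works for", "headquartered in"},
}


@dataclass
class Entity:
    id: str
    concept: str  # must be one of the ontology's concepts
    properties: dict = field(default_factory=dict)  # data about the entity


@dataclass
class Relationship:
    id: str
    label: str   # must be one of the ontology's relationship labels
    source: str  # id of the entity the link starts from
    target: str  # id of the entity the link points to


@dataclass
class Graph:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        if entity.concept not in ONTOLOGY["concepts"]:
            raise ValueError(f"unknown concept: {entity.concept}")
        self.entities[entity.id] = entity

    def add_relationship(self, rel):
        if rel.label not in ONTOLOGY["relationships"]:
            raise ValueError(f"unknown relationship: {rel.label}")
        if rel.source not in self.entities or rel.target not in self.entities:
            raise KeyError("both ends of a relationship must already be in the graph")
        self.relationships.append(rel)

    def related(self, entity_id):
        """Return (label, other entity id) pairs for links touching entity_id."""
        pairs = []
        for rel in self.relationships:
            if rel.source == entity_id:
                pairs.append((rel.label, rel.target))
            elif rel.target == entity_id:
                pairs.append((rel.label, rel.source))
        return pairs


# Illustrative data only.
graph = Graph()
graph.add_entity(Entity("p1", "person", {"first name": "Ada", "last name": "Example"}))
graph.add_entity(Entity("c1", "company", {"name": "Example Corp"}))
graph.add_relationship(Relationship("r1", "works for", "p1", "c1"))
```

Keeping the ontology separate from the data means the allowed concepts and link types can change without touching the graph code.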
11. Walmart (vertex V3)
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      PROP           name1             U           Walmart
V3      PROP           founded1          S           1962-01-01
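The table can be read as one vertex flattened into key/value cells. The sketch below reproduces that layout under stated assumptions: it treats `EIN`/`EOUT` as incoming and outgoing edge IDs, `VIN`/`VOUT` as the vertices on the other end of those edges, and `PROP` as properties, which is inferred from the table rather than confirmed by the source. The `Cell` tuple and `vertex_to_cells` function are hypothetical names, and `-` stands in for the qualifier shown as a dash in the table.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row_id: str
    family: str
    qualifier: str
    visibility: str
    value: str


def vertex_to_cells(vertex_id, vertex_visibility, in_edges, out_edges, properties):
    """Flatten one vertex into cells that all share the vertex ID as row ID.

    in_edges / out_edges: lists of (edge_id, other_vertex_id, label, visibility)
    properties: list of (name, value, visibility)
    """
    cells = [Cell(vertex_id, "V", "-", vertex_visibility, "")]
    for edge_id, other_vertex, label, visibility in in_edges:
        cells.append(Cell(vertex_id, "EIN", edge_id, visibility, label))
        cells.append(Cell(vertex_id, "VIN", other_vertex, visibility, label))
    for edge_id, other_vertex, label, visibility in out_edges:
        cells.append(Cell(vertex_id, "EOUT", edge_id, visibility, label))
        cells.append(Cell(vertex_id, "VOUT", other_vertex, visibility, label))
    for name, value, visibility in properties:
        cells.append(Cell(vertex_id, "PROP", name, visibility, value))
    return cells


# The vertex from the table above.
walmart_cells = vertex_to_cells(
    "V3",
    "U",
    in_edges=[("E1", "V1", "Is leader", "TS"), ("E3", "V4", "works for", "S")],
    out_edges=[("E2", "V2", "headquartered in", "U")],
    properties=[("name1", "Walmart", "U"), ("founded1", "1962-01-01", "S")],
)

for cell in walmart_cells:
    print(*cell, sep="\t")
```

Because every cell carries its own visibility label, an edge or a single property can be classified differently from the vertex it belongs to.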
12. User with U, S, and TS visibility
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      PROP           name1             U           Walmart
V3      PROP           founded1          S           1962-01-01
With authorizations U, S, and TS, a scan returns every row.
13. User with U and S visibility
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      PROP           name1             U           Walmart
V3      PROP           founded1          S           1962-01-01
With authorizations U and S, a scan returns the U and S rows; the two TS rows (E1 and V1) are filtered out.
14. User with U visibility
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      PROP           name1             U           Walmart
V3      PROP           founded1          S           1962-01-01
With authorization U only, a scan returns just the four U rows: the vertex itself, the "headquartered in" edge (E2, V2), and the name property.
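The three user views differ only in which cells survive the visibility check. As a rough sketch of that filtering, assuming each cell carries a single label and that a user's authorizations are a plain set of labels (the `visible_cells` helper is hypothetical, and in Accumulo the check happens server-side during the scan rather than in client code):

```python
# (row ID, column family, column qualifier, visibility, value)
CELLS = [
    ("V3", "V", "-", "U", ""),
    ("V3", "EIN", "E1", "TS", "Is leader"),
    ("V3", "VIN", "V1", "TS", "Is leader"),
    ("V3", "EIN", "E3", "S", "works for"),
    ("V3", "VIN", "V4", "S", "works for"),
    ("V3", "EOUT", "E2", "U", "headquartered in"),
    ("V3", "VOUT", "V2", "U", "headquartered in"),
    ("V3", "PROP", "name1", "U", "Walmart"),
    ("V3", "PROP", "founded1", "S", "1962-01-01"),
]


def visible_cells(cells, authorizations):
    """Keep only the cells whose visibility label the user is authorized for."""
    return [cell for cell in cells if cell[3] in authorizations]


all_access = visible_cells(CELLS, {"U", "S", "TS"})
secret_access = visible_cells(CELLS, {"U", "S"})
unclassified_access = visible_cells(CELLS, {"U"})

for label, cells in [
    ("U, S, TS", all_access),
    ("U, S", secret_access),
    ("U", unclassified_access),
]:
    print(f"{label}: {len(cells)} of {len(CELLS)} cells visible")
```

A user without the TS authorization never learns that the "Is leader" edge exists, since both its `EIN` and `VIN` cells are dropped.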
16. Zarka de Mexico Vertex (V3)
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      EIN            E8                S&WS1       works for
V3      VIN            V8                S&WS1       works for
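The `S&WS1` label combines two terms, so a reader needs both authorizations to see the E8/V8 rows. The sketch below evaluates such expressions with `&` (and), `|` (or), and parentheses. It is a simplified stand-in for Accumulo's own visibility evaluation, not that code: the `is_visible` function is hypothetical, and unlike Accumulo it accepts mixed `&` and `|` without parentheses, giving `&` the higher precedence. Reading `WS1` as a workspace label is an assumption.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")
_OPERATORS = {"&", "|", "(", ")"}


def _tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = _TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression."""
    tokens = _tokenize(expression)
    if not tokens:
        return True  # an empty label is readable by everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in _OPERATORS:
            raise ValueError(f"unexpected token: {token}")
        return token in authorizations

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result
```

For example, `is_visible("S&WS1", {"U", "S"})` is false while `is_visible("S&WS1", {"U", "S", "WS1"})` is true, so the E8/V8 rows stay hidden from users who hold S but not WS1.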
18. Implemented in ElasticSearch
• Use parent/child document indexing. One document per property.
• Store visibility with indexed docs.
• Custom-developed ES filter uses Accumulo’s visibility evaluation code to filter out documents prior to query evaluation.
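The order of operations in these bullets (filter by visibility first, evaluate the query second) can be illustrated in memory. This is only a sketch of the idea, not the custom filter or any Elasticsearch API: the document fields (`parent`, `name`, `value`, `visibility`), the `property_docs` and `search` helpers, and the substring match are all hypothetical, and a simple set-membership check stands in for Accumulo's visibility evaluation.

```python
def property_docs(vertex_id, properties):
    """Build one child document per property, each storing its own visibility.

    properties: list of (name, value, visibility)
    """
    return [
        {"parent": vertex_id, "name": name, "value": value, "visibility": visibility}
        for name, value, visibility in properties
    ]


def search(docs, term, authorizations):
    """Return IDs of parent vertices with a readable property matching term."""
    # Step 1: drop documents the user may not read, before any matching happens.
    readable = [doc for doc in docs if doc["visibility"] in authorizations]
    # Step 2: evaluate the query against the surviving documents only.
    hits = {doc["parent"] for doc in readable if term.lower() in doc["value"].lower()}
    return sorted(hits)


docs = property_docs(
    "V3",
    [("name1", "Walmart", "U"), ("founded1", "1962-01-01", "S")],
)

print(search(docs, "walmart", {"U"}))
print(search(docs, "1962", {"U"}))
print(search(docs, "1962", {"U", "S"}))
```

Filtering before query evaluation matters because a hidden property then cannot cause its parent to match, so search results do not reveal the existence of data the user is not cleared to see.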