Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by Paul Nelson, Search Technologies

October 11-14, 2016 • Boston, MA


Searching the Enterprise Data Lake with Solr - Watch us do it!
Paul Nelson – pnelson@searchtechnologies.com
Chief Architect, Search Technologies

THERE WILL BE A DEMO
Stay Tuned!

205+ Search Consultants Worldwide

San Diego; San Jose, CR; Cincinnati; Manila, PH; Washington (HQ); London, UK; Frankfurt, DE; Prague, CZ

•  Founded 2005
•  Deep search expertise
•  900+ customers worldwide
•  Consistent profitability
•  Search engines & Big Data
•  Vendor independent

Agenda

•  The Enterprise Data Lake (EDL)
•  Why Search the EDL?
•  The Process
•  How To: Step By Step
•  And then what?

In The Beginning

[Diagram: Computer Users; Application; Database; Dashboards; Reports; Search & Troubleshooting; Alerts]

This Evolved to Data Warehouses

[Diagram: Many Computer Users; Dozens of Applications (label repeated six times); Extract, Transform, Load; Enterprise Data Warehouse; Dashboards; Reports; Search & Troubleshooting; Alerts]

And Now the Enterprise Data Lake

[Diagram: Many, many, many Computer Users; Hundreds of Applications; Raw Data And Processed Data; Enterprise Data Lake; Dashboards; Reports; Search & Troubleshooting; Alerts; Analyze]

What’s new about the Data Lake?

•  Ingest RAW DATA
•  Keep it FOREVER
•  Make it ALL AVAILABLE
•  Analyze it ONLY WHEN NEEDED
•  Do it at MASSIVE SCALE

Why the Data Lake?

•  You never know what’s important up front
–  New data mining techniques invented daily
–  Therefore, keep everything
•  There is too much data variety
–  Therefore, only process what you need
•  Save money by not ETL’ing useless stuff
•  There are many different use cases
–  Shared re-use of data by anyone
–  Data is power! Power to the people!

But Now There’s a Problem:

•  10’s of thousands of databases
•  Billions of records

How to find the data you need?

SO LET’S SEARCH THE DATA LAKE

“People today think search and big data are separate but in two or three years, everyone will wonder why we ever thought that.”

Doug Cutting
Chief Architect, Cloudera
Creator of Lucene & Hadoop

The Process

1. Ingest
2. Research the Data
3. Configure Solr
4. Parse & Index
5. Search & Analyze
6. Production

1. Ingest

[Diagram: Load Data; HDFS; Hadoop]
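
As a rough illustration of the ingest step, the sketch below builds the HDFS shell commands that would copy one raw file into the lake. The `-mkdir -p`, `-put` and `-ls` subcommands come from the HDFS File System Shell; the function name and the example paths are hypothetical, and the commands actually used in the demo are not shown in the slides.

```python
import shlex


def hdfs_ingest_commands(local_path, hdfs_dir):
    """Return the HDFS shell commands that copy one local file into HDFS.

    Both arguments are illustrative; nothing here is taken from the talk.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],      # create the landing directory
        ["hdfs", "dfs", "-put", local_path, hdfs_dir],  # copy the raw file unchanged
        ["hdfs", "dfs", "-ls", hdfs_dir],               # list the directory to confirm
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so no cluster is required.
    for command in hdfs_ingest_commands("/tmp/orders.csv", "/data/raw/orders"):
        print(shlex.join(command))
```

On a machine with a Hadoop client installed, each list could be passed to `subprocess.run(command, check=True)` instead of being printed.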

2. Research the Data

[Diagram: Research; HDFS; Hadoop]
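
The slides do not say how the data is researched. One plausible way to start, sketched here under that assumption, is to profile a sample of the raw records and see which types each column appears to hold before designing a schema. The function names and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def guess_type(value):
    """Classify one raw string as 'empty', 'int', 'float' or 'string'."""
    if value == "":
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"


def profile_csv(text, max_rows=1000):
    """Count, per column, how many sampled values fall into each guessed type."""
    profile = defaultdict(Counter)
    reader = csv.DictReader(io.StringIO(text))
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        for column, value in row.items():
            if column is None:  # extra cells beyond the header row
                continue
            profile[column][guess_type((value or "").strip())] += 1
    return {column: dict(counts) for column, counts in profile.items()}


if __name__ == "__main__":
    sample = "id,price,city\n1,9.99,Boston\n2,,Prague\n"
    for column, counts in profile_csv(sample).items():
        print(column, counts)
```

A column that is mostly `int` with a few `string` values is a hint that it needs cleaning, or a text field type, before indexing.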

3. Configure Solr

[Diagram: solrconfig.xml; schema.xml; HDFS; Hadoop]
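
To make the schema.xml part of this step concrete, here is a minimal sketch that renders `<field>` declarations from a list of column descriptions. The `name`, `type`, `indexed`, `stored` and `multiValued` attributes are standard Solr schema attributes; the field names and types in the example are assumptions and would have to match `fieldType` definitions in the real schema.

```python
import xml.etree.ElementTree as ET


def schema_fields(fields):
    """Render (name, type, multi_valued) tuples as schema.xml <field> elements."""
    lines = []
    for name, field_type, multi_valued in fields:
        element = ET.Element("field", {
            "name": name,
            "type": field_type,  # must name a fieldType defined in the schema
            "indexed": "true",
            "stored": "true",
            "multiValued": "true" if multi_valued else "false",
        })
        lines.append(ET.tostring(element, encoding="unicode"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical fields for illustration only.
    print(schema_fields([
        ("id", "string", False),
        ("city", "string", False),
        ("tags", "string", True),
    ]))
```

The generated lines would be pasted into the fields section of schema.xml before the configuration is uploaded with solrctl (see the solrctl Reference Guide under Resources).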

4. Parse & Index

[Diagram: Morphlines; Index; HDFS; Hadoop]
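
A morphline is a HOCON configuration file listing the commands that turn raw records into Solr documents. The sketch below renders a small one for CSV input, using the `readCSV`, `sanitizeUnknownSolrFields` and `loadSolr` commands described in the Morphlines Reference Guide. The collection name, ZooKeeper address, column names and morphline id are placeholders, not values from the demo.

```python
MORPHLINE_TEMPLATE = """\
SOLR_LOCATOR : {{
  collection : {collection}
  zkHost : "{zk_host}"
}}

morphlines : [
  {{
    id : {morphline_id}
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {{
        readCSV {{
          separator : ","
          columns : [{columns}]
          ignoreFirstLine : true
          charset : UTF-8
        }}
      }}
      {{ sanitizeUnknownSolrFields {{ solrLocator : ${{SOLR_LOCATOR}} }} }}
      {{ loadSolr {{ solrLocator : ${{SOLR_LOCATOR}} }} }}
    ]
  }}
]
"""


def render_morphline(collection, zk_host, columns, morphline_id="csvToSolr"):
    """Fill the template; doubled braces in the template become literal braces."""
    quoted = ", ".join('"{}"'.format(column) for column in columns)
    return MORPHLINE_TEMPLATE.format(
        collection=collection,
        zk_host=zk_host,
        morphline_id=morphline_id,
        columns=quoted,
    )


if __name__ == "__main__":
    print(render_morphline("orders", "zk1.example.com:2181/solr", ["id", "price", "city"]))
```

The resulting file would then be handed to one of the indexers listed under Resources, such as the MapReduce Indexer Tool.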

5. Search & Analyze

[Diagram: Hue; Index; Morphlines; HDFS; Hadoop]
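
Hue provides the interactive interface in the slides; underneath, the same analysis can be expressed as a plain Solr query. As a sketch, the function below builds a `/select` URL with facet counts using the standard `q`, `rows`, `wt`, `facet` and `facet.field` parameters. The host, collection and field names are hypothetical.

```python
from urllib.parse import urlencode


def solr_facet_url(base_url, collection, query, facet_fields, rows=10):
    """Build a Solr /select URL that returns matches plus facet counts."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # facet.field may be repeated, once per field to count.
    params += [("facet.field", field) for field in facet_fields]
    return "{}/{}/select?{}".format(base_url.rstrip("/"), collection, urlencode(params))


if __name__ == "__main__":
    print(solr_facet_url(
        "http://localhost:8983/solr", "orders", "city:Boston", ["city", "status"]
    ))
```

Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `facet_counts` section holds the per-value counts that a dashboard would chart.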

6. Move to Production

•  Testing, Quality Control
–  Field processing
–  Search Features
–  Analytics
•  Incremental Processing
–  Flume, Spark Streaming, Incremental Batches
•  Workflow / Scheduled Jobs (Oozie)
•  Security Controls
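
The slides name incremental batches without describing a mechanism. One common approach, shown here purely as an assumption, is to keep a watermark (the newest modification time already indexed) and have each scheduled run pick up only the files that arrived after it. All names and timestamps below are illustrative.

```python
def select_new_files(files, last_watermark):
    """Pick files modified after the watermark and compute the next watermark.

    `files` is an iterable of (path, modification_time) pairs, for example
    parsed from a directory listing. Returns (paths_to_index, new_watermark).
    """
    fresh = sorted(
        ((mtime, path) for path, mtime in files if mtime > last_watermark)
    )
    paths = [path for _, path in fresh]
    # If nothing is new, the watermark stays where it was.
    new_watermark = fresh[-1][0] if fresh else last_watermark
    return paths, new_watermark


if __name__ == "__main__":
    listing = [
        ("/data/raw/orders/day1.csv", 100),
        ("/data/raw/orders/day2.csv", 200),
        ("/data/raw/orders/day3.csv", 300),
    ]
    batch, watermark = select_new_files(listing, last_watermark=100)
    print(batch, watermark)
```

A scheduled job (Oozie in the slides) would persist the returned watermark between runs and pass the selected paths to the indexer.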

Resources

•  HDFS File System Commands
–  https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html
•  solrctl Reference Guide
–  https://www.cloudera.com/documentation/enterprise/5-7-x/topics/search_solrctl_ref.html
•  Morphlines Reference Guide
–  http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html
–  https://github.com/typesafehub/config/blob/master/HOCON.md
•  MapReduce Indexer Tool
–  https://github.com/cloudera/search/tree/cdh5-1.0.0_5.2.1/search-mr
•  Crunch Indexer
–  https://github.com/cloudera/search/tree/cdh5-1.0.0_5.2.1/search-crunch
•  Lily HBase Indexer
–  http://www.cloudera.com/documentation/enterprise/latest/topics/search_hbase_batch_indexer.html

What’s Next

•  Explore other analytic interfaces
–  Banana, Zoom Data
•  Spark
–  Streaming Data
–  Complex Analytics → Store results in Solr → More analytics!
•  Index Many More Collections
–  Create a Process: Data research → Data Model Design → Implement
•  Self-Service Ingestion
–  Document processes for others to use
–  Templates for ingestion
•  Hire Search Technologies!

QUESTIONS? ANSWERS!

Thank you!
