The document discusses searching enterprise data lakes with Apache Solr. It begins with an overview of how data storage has evolved from single databases to data warehouses to modern data lakes that store vast amounts of raw and processed data. The challenge is finding the data you need in this environment. The document then covers the process for indexing data lake contents with Solr: ingesting data, configuring Solr, parsing and indexing the data, and searching and analyzing it. It concludes with a demonstration of these steps and resources for further information.
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by Paul Nelson, Search Technologies
1. OCTOBER 11-14, 2016 • BOSTON, MA
2. Searching the Enterprise Data Lake with Solr - Watch us do it!
Paul Nelson – pnelson@searchtechnologies.com
Chief Architect, Search Technologies
4. 205+ Search Consultants Worldwide
San Diego; San Jose, CR; Cincinnati; Manila, PH; Washington (HQ); London, UK; Frankfurt, DE; Prague, CZ
• Founded 2005
• Deep search expertise
• 900+ customers worldwide
• Consistent profitability
• Search engines & Big Data
• Vendor independent
5. Agenda
• The Enterprise Data Lake (EDL)
• Why Search the EDL?
• The Process
• How To: Step By Step
• And then what?
6. In The Beginning
[Diagram labels: Computer Users; Application; Database; Dashboards; Reports; Search & Troubleshooting; Alerts]
7. This Evolved to Data Warehouses
[Diagram labels: Many Computer Users; Dozens of Applications (repeated six times); Extract, Transform, Load; Enterprise Data Warehouse; Dashboards; Reports; Search & Troubleshooting; Alerts]
8. And Now the Enterprise Data Lake
[Diagram labels: Many, many, many Computer Users; Hundreds of Applications; Enterprise Data Lake; Raw Data And Processed Data; Analyze; Dashboards; Reports; Search & Troubleshooting; Alerts]
9. What’s new about the Data Lake?
• Ingest RAW DATA
• Keep it FOREVER
• Make it ALL AVAILABLE
• Analyze it ONLY WHEN NEEDED
• Do it at MASSIVE SCALE
10. Why the Data Lake?
• You never know what’s important up front
  – New data mining techniques invented daily
  – Therefore, keep everything
• There is too much data variety
  – Therefore, only process what you need
• Save money by not ETL’ing useless stuff
• There are many different use cases
  – Shared re-use of data by anyone
  – Data is power! Power to the people!
11. But Now There’s a Problem:
• 10’s of thousands of databases
• Billions of records
How to find the data you need?
13. “People today think search and big data are separate but in two or three years, everyone will wonder why we ever thought that.”
Doug Cutting
Chief Architect, Cloudera
Creator of Lucene & Hadoop
14. The Process
1 Ingest
2 Research the Data
3 Configure Solr
4 Parse & Index
5 Search & Analyze
6 Production
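The "Parse & Index" and "Search & Analyze" steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the talk. It assumes a Solr instance at http://localhost:8983, a hypothetical collection named edl_demo, a made-up three-column CSV layout, and the `_s` / `_t` dynamic-field suffixes that ship with Solr's default configset. The sketch only builds the requests; the `send` helper that actually contacts Solr is separate, so the rest can be tried without a running server.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr instance
COLLECTION = "edl_demo"                   # hypothetical collection name


def parse_records(raw_csv):
    """Parse raw CSV text from the lake into Solr-ready documents.

    The column names (id, source, text) are invented for this sketch.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        docs.append({
            "id": row["id"],
            "source_s": row["source"],  # string field, usable as a facet
            "body_t": row["text"],      # tokenized text field for full-text search
        })
    return docs


def build_update_request(docs):
    """Build a POST to Solr's JSON update handler, committing immediately."""
    url = f"{SOLR_BASE}/{COLLECTION}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def build_search_url(query, facet_field=None, rows=10):
    """Build a /select URL, optionally asking for facet counts on one field."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def send(request_or_url):
    """Send a request to Solr and decode the JSON response (needs a live server)."""
    with urlopen(request_or_url) as resp:
        return json.load(resp)
```

With a running Solr and an existing collection, `send(build_update_request(parse_records(raw)))` would index the rows, and `send(build_search_url("body_t:invoice", facet_field="source_s"))` would return matching documents along with per-source facet counts, which is the kind of breakdown the "Analyze" step relies on.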
23. What’s Next
• Explore other analytic interfaces
  – Banana, Zoom Data
• Spark
  – Streaming Data
  – Complex Analytics → Store results in Solr → More analytics!
• Index Many More Collections
  – Create a Process: Data research → Data Model Design → Implement
• Self-Service Ingestion
  – Document processes for others to use
  – Templates for ingestion
• Hire Search Technologies!