Charlie Hull - Managing Director
20th
October 2015
Enterprise Search Europe
charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
The Future of Search
Fishing The Streams of Big Data
Building open source search applications since 2001
Independent, honest advice and analysis
Expert design & development, Apache Solr committers
Test-driven relevancy and performance tuning
Custom training & mentoring for your staff
Flexible support up to 24/7/365 with SLAs
Come and join the open source search community (tonight?)
Fishing the streams... [10]
Where are we now?
Analytics based on search
Trends & predictions
Two new ideas
But what's this all got to do with search?
The future of search
What sort of enterprise projects do we see at Flax?
– Migrations
– Greenfield
– Speculative
Unsurprisingly most are using open source
...unless they're SharePoint
Where are we now?
Two main stacks
– Apache Lucene/Solr
• Lucidworks (Fusion, SiLK)
– Elasticsearch
• Elastic (ELK)
Lots of other players (fewer for Elasticsearch)
Focus on:
– Log file analysis
– Stability & scalability
– Visualisation
What's happening with open source?
Analytics based on search
Log files,
messages,
transactions
Analytics & visualisations [11]
Big Data
Internet of Things
Cloud
Semantic Search
Mobile / Wearables
Trends
Charlie's All Purpose Big Data Graph
Data
Time
“The IoT has the potential to connect 10X as many (28 billion)
“things” to the Internet by 2020, ranging from bracelets to cars.”
Goldman Sachs [1]
“IoT threatens to generate massive amounts of input data from
sources that are globally distributed. Transferring the entirety of
that data to a single location for processing will not be
technically and economically viable” Gartner [2]
“Streaming analytics is anything but a sleepy, rearview mirror
analysis of data. No, it is about knowing and acting on what’s
happening in your business at this very moment — now.”
Forrester [3]
Scary scale
“We’re going to go from information in computers, like your
supercomputer in your pocket, to us being surrounded by
information. … The next information age will be an era of
unbounded malignant complexity. Because there’s a lot of
stuff. We have to get ready for that.”
Autodesk research fellow Mickey McManus [13]
Even more scary scale
It will become increasingly difficult, if not impossible, to store
data for later processing
It must therefore be processed as it appears, in real-time
(Real Time Analytics)
Much of this data will be unstructured, noisy and badly
formatted
There are many exciting applications - if this is done right
Some predictions
Unified Log [4] [5]
New ideas
“The Log: What every software engineer
should know about real-time data's unifying
abstraction” – Jay Kreps, LinkedIn (now
Confluent) [6]
Everything is a stream
More new ideas
[8]
“...think of a database as an always-growing collection of
immutable facts. You can query it at some point in time —
but that’s still old, imperative style thinking. A more fruitful
approach is to take the streams of facts as they come in, and
functionally process them in real-time.” – Martin Kleppmann,
LinkedIn [7]
Streaming data platforms
New technologies
Hadoop MapReduce is for batch processing, not stream
processing
...although Storm etc. can run on HDFS
No elephant in the room?
How can you currently process data in a stream?
– SQL like
– Regular Expressions
– Machine Learning
Why not use full-text search?
– Easier to create queries
– Great with unstructured data
– Handles noisy data
But you can do search already!
– But only near real time
So what's this got to do
with Search?
What's a stream?
1 2 3 4 5 6 7 8 9 10 11….. …..
append
– “..an append-only, totally ordered sequence of
records (also called events or messages).”
Martin Kleppmann, LinkedIn [9]
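To make the definition concrete, here is a minimal sketch of an append-only, totally ordered sequence in Python. It is an illustration only, not code from the talk or from any streaming platform; the names `Stream`, `append` and `read_from` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Stream:
    """A toy append-only, totally ordered sequence of records."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset (its position)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after the given offset, in order."""
        return self._records[offset:]

    def __len__(self) -> int:
        return len(self._records)


stream = Stream()
for n in range(1, 12):            # records 1..11, as drawn on the slide
    stream.append(n)

offset_of_12 = stream.append(12)  # new records only ever go on the end
```

A reader keeps track of the last offset it has seen and asks for everything after it, so earlier records are never rewritten or reordered.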
How do we search it?
1 2 3 4 5 6 7 8 9 10 11 12….. …..
append
?
Result
Oops!
Search for Media Monitoring
– Many complex stored profiles (searches)
– High volume of news stories every day
The solution:
– Build an index of the stored profiles
– Turn each news story into a query
– Search the stored profiles to find which might match
– Use these candidates to search the news story
We call it 'search turned upside down'
– But it's effectively searching a stream
Here's something we're
doing already
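The four steps of the solution can be sketched in a few lines of Python. This is a simplified illustration, not Luwak's API: `ProfileIndex`, `candidates` and `match` are hypothetical names, and each stored profile is assumed to be a plain AND of terms, far simpler than real profiles.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ProfileIndex:
    """Toy index of stored profiles; each profile is an AND of terms."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Set[str]] = {}
        self.by_term: Dict[str, Set[str]] = defaultdict(set)

    def add(self, profile_id: str, query: str) -> None:
        """Build the index of stored profiles, one posting per term."""
        terms = set(tokenize(query))
        self.profiles[profile_id] = terms
        for term in terms:
            self.by_term[term].add(profile_id)

    def candidates(self, story: str) -> Set[str]:
        """Turn the story into a query and find profiles that might match."""
        found: Set[str] = set()
        for term in set(tokenize(story)):
            found |= self.by_term.get(term, set())
        return found

    def match(self, story: str) -> List[str]:
        """Run only the candidate profiles against the story itself."""
        story_terms = set(tokenize(story))
        return sorted(
            pid for pid in self.candidates(story)
            if self.profiles[pid] <= story_terms
        )


index = ProfileIndex()
index.add("telecoms", "vodafone merger")
index.add("retail", "supermarket prices")
index.add("telecoms-tax", "vodafone tax")

story = "Vodafone confirms merger talks with rival"
matches = index.match(story)
```

The point of the first step is that the expensive check in `match` only runs for the handful of profiles sharing a term with the story, not for every stored profile.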
Like this...
((("!MOBILE PHONE*" OR "PHONE MAST*" OR "HANDSET*" OR "CELL* PHONE*" OR "3G" OR "GPRS" OR "G.P.R.S"
OR "!GENERAL !RADIO PACKET SERVICE*" OR "GSM" OR "G.S.M" OR "!GLOBAL SYSTEM FOR !MOBILE COMM*" OR
"HSDPA" OR "H.S.D.P.A" OR "HIGH SPEED DOWNLINK !PACKET ACCESS" OR "HSUPA" OR "H.S.U.P.A" OR "HIGH
SPEED !UPLINK !PACKET ACCESS" OR "UMTS" OR "U.M.T.S" OR "MVNO" OR "M.V.N.O" OR "SMS" OR "SHORT
MESSAGE !SERVICE*" OR "MMS" OR "!MULTIMEDIA MESSAGE !SERVICE*" OR "!MOBILES" OR "!CELLPHONE*" OR "!
TELECOM*" OR "!LANDLINE*" OR "!TELEPHONE*" OR "PHONE*" OR "!TELEKOM*" OR "TELCO*" OR "VODAFONE" OR
"T-MOBILE" OR "TMOBILE" OR "!TELEFONICA" OR "BT" OR "!MOBILE USER*" OR "TEXT MESSAG*" OR
"SMARTPHONE*" OR "!VIRGIN !MEDIA*" OR "CABLE & !WIRELESS" OR "CABLE AND !WIRELESS") W/48
(("PROFIT*" OR "LOSS*" OR "BAN" OR "BANNED" OR "PREMIUM RATE*" OR "FINANC*" OR "!REFINANC*" OR
"OFFICE OF FAIR TRADING" OR "MERGER*" OR "!ACQUISIT*" OR "ACQUIR*" OR "TAKEOVER*" OR "BUYOUT*" OR
"BUY-OUT*" OR "NEW PRODUCT*" OR "INVEST*" OR "SHARES" OR "MARKET*" OR "ACCOUNT*" OR "MONEY" OR
"CASH*" OR "SECURIT*" OR "!ENTERPRIS*" OR "!BUSINESS*" OR "PRICE*" OR "JOINT*" OR "NEW VENTURE*" OR
"PRICING" OR "COST*" OR "CHAIRM?N" OR "APPOINT*" OR "!EXECUTIVE" OR "SALE*" OR "SELL*" OR "FULL
YEAR" OR "REGULAT*" OR "!DIRECTIVE*" OR "LAW" OR "LAWS" OR "!LEGISLAT*" OR "GREEN PAPER" OR "WHITE
PAPER*" OR "!MEDIAWATCH" OR "MORAL*" OR "ETHIC*" OR "ADVERT*" OR "AD" OR "ADS" OR "MARKETING" OR
"!COMPLAIN*" OR "MIS-SOLD" OR "MIS-SELL*" OR "SPONSOR" OR "COSTCUT*" OR "COST CUT*" OR "CUT* COST*"
OR "FIBRE OPTIC*" OR "TAX" OR "TAXES" OR "TAXED" OR "EXPAND*" OR "!EXPANSION" OR "EMPLOY*" OR
"STAFF" OR "WORKER*" OR "SPOKESM?N" OR "DEBUT" OR "BRAND*" OR "DIRECTOR*") OR (("FAIR" OR
"UNFAIR" OR "%UNSCRUPULOUS" OR "NOT FAIR" OR "UNJUST*" OR "!PENALISE*") W/12 ("CHARG*" OR "TARIFF*"
OR "PRICE PLAN*" OR "GLOBAL")))) AND NOT ("EXPRESS OFFER" OR "TIMES OFFER" OR "READER OFFER" OR
(("CALLS COST") W/6 ("FROM A LANDLINE" OR "FROM LANDLINE*" OR "BT LANDLINE*"))))
…and that's an easy one!
Turning search upside down..
[Diagram, built up over three slides]
Stored Queries: 1 million profiles, some 250k long, complex rules
Doc: 1 million news items a day
1. Pre: Doc -> Query -> Stored Queries -> Query Subset (~200)
2. Search: Query Subset -> Doc -> Result: “this news item is of interest to my client”
The final build adds three “Stream” labels.
Flax solution (Luwak) scales to 1m queries over 1m items/day
We need to be faster:
– Network monitoring – up to 1m items/second
– Restaurant reservations/reviews – 840m messages/day
– IoT
So we built a proof of concept...
Searching streaming data at scale
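To put these figures on one scale, the short calculation below converts the per-day numbers from the slide into per-second rates. It assumes traffic is spread evenly over 24 hours, which real workloads are not, so treat the results as rough averages only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(items_per_day: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return items_per_day / SECONDS_PER_DAY


current = per_second(1_000_000)         # media monitoring: 1m items/day
reservations = per_second(840_000_000)  # reservations/reviews: 840m messages/day
network = 1_000_000                     # network monitoring: already per second

print(f"Media monitoring:     {current:,.1f} items/second")
print(f"Reservations/reviews: {reservations:,.0f} messages/second")
print(f"Network monitoring:   {network:,} items/second "
      f"({network / current:,.0f}x the media monitoring rate)")
```

On these averages the media monitoring load is about 12 items a second, the reservations example is roughly 9,700 a second, and network monitoring is 86,400 times the media monitoring rate.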
Real-time scalable search for streams
[Diagram]
Documents: 1 2 3 4 5 6 7 ... -> Parse -> In-memory 1-doc index
Queries: A B C D E F G ... -> Index of queries
Which queries match? N N N Y N ... -> Matches
https://github.com/romseygeek/samza-luwak
Build queries using a standard search box, facets etc.
Set them to run on a stream of data - matches appear as
another stream
Use cases
– Monitor a network
– Watch social media
– Check IoT for faults, errors, trends
– Look for patterns in customer interactions
– Distributed web search
– Combine with analytics & visualisations
What could you do with this?
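The idea that matches "appear as another stream" can be sketched with a Python generator. This is an illustration under simplifying assumptions, not the samza-luwak implementation: `match_stream` is a hypothetical name, queries are plain sets of required terms, and documents are split on whitespace only.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def match_stream(
    documents: Iterable[str],
    queries: Dict[str, Set[str]],
) -> Iterator[Tuple[int, str]]:
    """Consume a stream of documents and yield a stream of (offset, query id)."""
    for offset, text in enumerate(documents):
        # A tiny in-memory "index" holding just this one document.
        terms = set(text.lower().split())
        for query_id, required in queries.items():
            if required <= terms:  # every required term is present
                yield offset, query_id


queries = {
    "errors": {"error"},
    "disk-full": {"disk", "full"},
}
events = [
    "disk usage normal",
    "error disk full on node7",
    "login ok",
]

matches = list(match_stream(events, queries))
```

Because the function is a generator, nothing is stored: each document is checked as it arrives and then dropped, and the output can feed alerting, analytics or another matching stage.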
Big Data....is about to get a lot bigger
We'll need to react to it in real time
Search can help!
In conclusion....
Thank you!
Any questions?
charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
1. Internet of Things - HorizonWatch 2015 Trend Report, Bill Chamberlin, IBM
2. Gartner Says the Internet of Things Will Transform the Data Center, http://www.gartner.com/newsroom/id/2684915
3. Big Data Forrester Wave 2014
4. Unified Logging Layer: Turning Data into Action, Kiyoto Tamura, Fluentd
5. Unified Log Processing, Alexander Dean
6. The Log: What every software engineer should know about real-time data's unifying abstraction, Jay Kreps, LinkedIn/Confluent
7. Turning the database inside-out with Apache Samza, Martin Kleppmann
8. Designing Data-Intensive Applications, Martin Kleppmann
9. Realtime Full-text Search with Luwak and Samza, Martin Kleppmann
10. Copyright Richard Masoner under a Creative Commons license
11.
12. Demonstrations by Mark Harwood of Elastic ()
13. Trillions: Thriving in the Emerging Information Ecology (Wiley 2012), Mickey McManus
References

AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 

Enterprise Search Europe 2015: Fishing the big data streams - the future of search

  • 1. Charlie Hull - Managing Director 20th October 2015 Enterprise Search Europe charlie@flax.co.uk www.flax.co.uk/blog +44 (0) 8700 118334 Twitter: @FlaxSearch The Future of Search Fishing The Streams of Big Data
  • 2. Building open source search applications since 2001 Independent, honest advice and analysis Expert design & development, Apache Solr committers Test-driven relevancy and performance tuning Custom training & mentoring for your staff Flexible support up to 24/7/365 with SLAs
  • 3. Building open source search applications since 2001 Independent, honest advice and analysis Expert design & development, Apache Solr committers Test-driven relevancy and performance tuning Custom training & mentoring for your staff Flexible support up to 24/7/365 with SLAs Come and join the open source search community (tonight?)
  • 6. Where are we now? Analytics based on search Trends & predictions Two new ideas But what's this all got to do with search? The future of search @FlaxSearch
  • 7. What sort of enterprise projects do we see at Flax? – Migrations – Greenfield – Speculative Where are we now? @FlaxSearch
  • 8. What sort of enterprise projects do we see at Flax? – Migrations – Greenfield – Speculative Unsurprisingly most are using open source Where are we now? @FlaxSearch
  • 9. What sort of enterprise projects do we see at Flax? – Migrations – Greenfield – Speculative Unsurprisingly most are using open source ...unless they're SharePoint Where are we now? @FlaxSearch
  • 10. Two main stacks – Apache Lucene/Solr • Lucidworks (Fusion, SiLK) – Elasticsearch • Elastic (ELK) What's happening with open source? @FlaxSearch
  • 11. Two main stacks – Apache Lucene/Solr • Lucidworks (Fusion, SiLK) – Elasticsearch • Elastic (ELK) Lots of other players (fewer for Elasticsearch) What's happening with open source? @FlaxSearch
  • 12. Two main stacks – Apache Lucene/Solr • Lucidworks (Fusion, SiLK) – Elasticsearch • Elastic (ELK) Lots of other players (fewer for Elasticsearch) Focus on: – Log file analysis – Stability & scalability – Visualisation What's happening with open source? @FlaxSearch
  • 13. Analytics based on search Log files, messages, transactions @FlaxSearch
  • 20. Big Data Internet of Things Cloud Semantic Search Mobile / Wearables Trends @FlaxSearch
  • 22. Charlie's All Purpose Big Data Graph [Chart axes: Data, Time] @FlaxSearch
  • 24. “The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs [1] Scary scale @FlaxSearch
  • 25. “The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs [1] “IoT threatens to generate massive amounts of input data from sources that are globally distributed. Transferring the entirety of that data to a single location for processing will not be technically and economically viable” Gartner [2] Scary scale @FlaxSearch
  • 26. “The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs [1] “IoT threatens to generate massive amounts of input data from sources that are globally distributed. Transferring the entirety of that data to a single location for processing will not be technically and economically viable” Gartner [2] “Streaming analytics is anything but a sleepy, rearview mirror analysis of data. No, it is about knowing and acting on what’s happening in your business at this very moment — now.” Forrester [3] Scary scale @FlaxSearch
  • 27. “We’re going to go from information in computers, like your supercomputer in your pocket, to us being surrounded by information. … The next information age will be an era of unbounded malignant complexity. Because there’s a lot of stuff. We have to get ready for that.” Autodesk research fellow Mickey McManus [13] Even more scary scale @FlaxSearch
  • 28. It will become increasingly difficult, if not impossible, to store data for later processing Some predictions @FlaxSearch
  • 29. It will become increasingly difficult, if not impossible, to store data for later processing It must therefore be processed as it appears, in real-time (Real Time Analytics) Some predictions @FlaxSearch
  • 30. It will become increasingly difficult, if not impossible, to store data for later processing It must therefore be processed as it appears, in real-time (Real Time Analytics) Much of this data will be unstructured, noisy and badly formatted Some predictions @FlaxSearch
  • 31. It will become increasingly difficult, if not impossible, to store data for later processing It must therefore be processed as it appears, in real-time (Real Time Analytics) Much of this data will be unstructured, noisy and badly formatted There are many exciting applications - if this is done right Some predictions @FlaxSearch
  • 33. Unified Log [4, 5] New ideas “The Log: What every software engineer should know about real-time data's unifying abstraction” – Jay Kreps, LinkedIn (now Confluent) [6] @FlaxSearch
  • 34. Everything is a stream More new ideas “...think of a database as an always-growing collection of immutable facts. You can query it at some point in time — but that’s still old, imperative style thinking. A more fruitful approach is to take the streams of facts as they come in, and functionally process them in real-time.” – Martin Kleppmann, LinkedIn [7] @FlaxSearch
  • 35. Everything is a stream More new ideas [8] “...think of a database as an always-growing collection of immutable facts. You can query it at some point in time — but that’s still old, imperative style thinking. A more fruitful approach is to take the streams of facts as they come in, and functionally process them in real-time.” – Martin Kleppmann, LinkedIn [7] @FlaxSearch
  • 36. Streaming data platforms New technologies @FlaxSearch
  • 40. Hadoop MapReduce is for batch processing, not stream processing ...although Storm etc. can run on HDFS No elephant in the room? @FlaxSearch
  • 41. How can you currently process data in a stream? – SQL like – Regular Expressions – Machine Learning So what's this got to do with Search? @FlaxSearch
  • 42. How can you currently process data in a stream? – SQL like – Regular Expressions – Machine Learning Why not use full-text search? – Easier to create queries – Great with unstructured data – Handles noisy data So what's this got to do with Search? @FlaxSearch
  • 43. How can you currently process data in a stream? – SQL like – Regular Expressions – Machine Learning Why not use full-text search? – Easier to create queries – Great with unstructured data – Handles noisy data But you can do search already! – But only near real time So what's this got to do with Search? @FlaxSearch
  • 44. What's a stream? 1 2 3 4 5 6 7 8 9 10 11 ….. append – “..an append-only, totally ordered sequence of records (also called events or messages).” Martin Kleppmann, LinkedIn [9] @FlaxSearch
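The definition on this slide can be illustrated with a toy, in-memory log. This is only a sketch: the `AppendOnlyLog` class and its method names are hypothetical and are not taken from Kafka, Samza or any other platform mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class AppendOnlyLog:
    """Toy in-memory stream: records can only be appended, never changed."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset in the total order."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs in order, starting at the given offset."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = AppendOnlyLog()
offsets = [log.append(event) for event in ("login", "click", "logout")]
tail = list(log.read_from(1))  # a consumer resuming from offset 1
```

Because every record has a fixed offset, a consumer only needs to remember the last offset it processed in order to resume.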
  • 45. How do we search it? 1 2 3 4 5 6 7 8 9 10 11 12….. ….. append ? Result @FlaxSearch
  • 46. How do we search it? 1 2 3 4 5 6 7 8 9 10 11 12….. ….. append ? Result Oops! @FlaxSearch
  • 47. Search for Media Monitoring – Many complex stored profiles (searches) – High volume of news stories every day Here's something we're doing already @FlaxSearch
  • 48. Search for Media Monitoring – Many complex stored profiles (searches) – High volume of news stories every day The solution: – Build an index of the stored profiles – Turn each news story into a query – Search the stored profiles to find which might match – Use these candidates to search the news story Here's something we're doing already @FlaxSearch
  • 49. Search for Media Monitoring – Many complex stored profiles (searches) – High volume of news stories every day The solution: – Build an index of the stored profiles – Turn each news story into a query – Search the stored profiles to find which might match – Use these candidates to search the news story We call it 'search turned upside down' – But it's effectively searching a stream Here's something we're doing already @FlaxSearch
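The four steps of the solution above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how Luwak is implemented: each hypothetical profile is just a set of terms that must all appear, and the candidate-selection index stores each profile under a single one of its terms.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical stored profiles: every listed term must appear in a story.
PROFILES: Dict[str, Set[str]] = {
    "telecom-mergers": {"vodafone", "merger"},
    "phone-tax": {"handset", "tax"},
    "brand-news": {"brand"},
}


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_profile_index(profiles: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Step 1: index each profile under one of its required terms (the longest)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name, terms in profiles.items():
        key = max(sorted(terms), key=len)  # sorted() keeps the choice deterministic
        index[key].append(name)
    return index


def match_story(story: str, profiles: Dict[str, Set[str]],
                index: Dict[str, List[str]]) -> List[str]:
    """Steps 2-4: return the names of the profiles that match one news story."""
    tokens = tokenize(story)  # step 2: the story becomes the query
    # Step 3: profiles whose indexed term occurs in the story might match.
    candidates = [name for tok in tokens for name in index.get(tok, [])]
    # Step 4: only those candidates are checked in full against the story.
    return sorted(n for n in candidates if profiles[n] <= tokens)


INDEX = build_profile_index(PROFILES)
story = "Vodafone confirms merger talks as handset sales fall"
matches = match_story(story, PROFILES, INDEX)
```

The point of the candidate step is that the expensive full check runs on a small subset of profiles rather than on all of them for every story.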
  • 50. Like this... ((("!MOBILE PHONE*" OR "PHONE MAST*" OR "HANDSET*" OR "CELL* PHONE*" OR "3G" OR "GPRS" OR "G.P.R.S" OR "!GENERAL !RADIO PACKET SERVICE*" OR "GSM" OR "G.S.M" OR "!GLOBAL SYSTEM FOR !MOBILE COMM*" OR "HSDPA" OR "H.S.D.P.A" OR "HIGH SPEED DOWNLINK !PACKET ACCESS" OR "HSUPA" OR "H.S.U.P.A" OR "HIGH SPEED !UPLINK !PACKET ACCESS" OR "UMTS" OR "U.M.T.S" OR "MVNO" OR "M.V.N.O" OR "SMS" OR "SHORT MESSAGE !SERVICE*" OR "MMS" OR "!MULTIMEDIA MESSAGE !SERVICE*" OR "!MOBILES" OR "!CELLPHONE*" OR "!TELECOM*" OR "!LANDLINE*" OR "!TELEPHONE*" OR "PHONE*" OR "!TELEKOM*" OR "TELCO*" OR "VODAFONE" OR "T-MOBILE" OR "TMOBILE" OR "!TELEFONICA" OR "BT" OR "!MOBILE USER*" OR "TEXT MESSAG*" OR "SMARTPHONE*" OR "!VIRGIN !MEDIA*" OR "CABLE & !WIRELESS" OR "CABLE AND !WIRELESS") W/48 (("PROFIT*" OR "LOSS*" OR "BAN" OR "BANNED" OR "PREMIUM RATE*" OR "FINANC*" OR "!REFINANC*" OR "OFFICE OF FAIR TRADING" OR "MERGER*" OR "!ACQUISIT*" OR "ACQUIR*" OR "TAKEOVER*" OR "BUYOUT*" OR "BUY-OUT*" OR "NEW PRODUCT*" OR "INVEST*" OR "SHARES" OR "MARKET*" OR "ACCOUNT*" OR "MONEY" OR "CASH*" OR "SECURIT*" OR "!ENTERPRIS*" OR "!BUSINESS*" OR "PRICE*" OR "JOINT*" OR "NEW VENTURE*" OR "PRICING" OR "COST*" OR "CHAIRM?N" OR "APPOINT*" OR "!EXECUTIVE" OR "SALE*" OR "SELL*" OR "FULL YEAR" OR "REGULAT*" OR "!DIRECTIVE*" OR "LAW" OR "LAWS" OR "!LEGISLAT*" OR "GREEN PAPER" OR "WHITE PAPER*" OR "!MEDIAWATCH" OR "MORAL*" OR "ETHIC*" OR "ADVERT*" OR "AD" OR "ADS" OR "MARKETING" OR "!COMPLAIN*" OR "MIS-SOLD" OR "MIS-SELL*" OR "SPONSOR" OR "COSTCUT*" OR "COST CUT*" OR "CUT* COST*" OR "FIBRE OPTIC*" OR "TAX" OR "TAXES" OR "TAXED" OR "EXPAND*" OR "!EXPANSION" OR "EMPLOY*" OR "STAFF" OR "WORKER*" OR "SPOKESM?N" OR "DEBUT" OR "BRAND*" OR "DIRECTOR*") OR (("FAIR" OR "UNFAIR" OR "%UNSCRUPULOUS" OR "NOT FAIR" OR "UNJUST*" OR "!PENALISE*") W/12 ("CHARG*" OR "TARIFF*" OR "PRICE PLAN*" OR "GLOBAL")))) AND NOT ("EXPRESS OFFER" OR "TIMES OFFER" OR "READER OFFER" OR (("CALLS COST") W/6 ("FROM A LANDLINE" OR "FROM LANDLINE*" OR "BT LANDLINE*")))) …and that's an easy one! @FlaxSearch
  • 51. Turning search upside down.. [Diagram labels: Docs, Query, Stored Queries; 1. Pre → Query Subset (~200); Stored Queries: 1 million profiles, some 250k long, complex rules; Doc: 1 million news items a day] @FlaxSearch
  • 52. Turning search upside down.. [Diagram labels: Docs, Query, Stored Queries; 1. Pre → Query Subset (~200); 2. Search → Result: “this news item is of interest to my client”; Stored Queries: 1 million profiles, some 250k long, complex rules; Doc: 1 million news items a day] @FlaxSearch
  • 53. Turning search upside down.. [Diagram labels: Docs, Query, Stored Queries; 1. Pre → Query Subset (~200); 2. Search → Result: “this news item is of interest to my client”; Stored Queries: 1 million profiles, some 250k long, complex rules; Doc: 1 million news items a day; Stream, Stream, Stream] @FlaxSearch
  • 54. Flax solution (Luwak) scales to 1m queries over 1m items/day We need to be faster: – Network monitoring – up to 1m items/second – Restaurant reservations/reviews – 840m messages/day – IoT So we built a proof of concept... Searching streaming data at scale @FlaxSearch
  • 55. Real-time scalable search for streams [Diagram labels: Documents 1 2 3 4 5 6 7 …; Queries A B C D E F G …; Parse; In-memory 1-doc index; Index of queries; Which queries match? N N N Y N …; Matches] https://github.com/romseygeek/samza-luwak @FlaxSearch
  • 56. Build queries using a standard search box, facets etc. Set them to run on a stream of data - matches appear as another stream What could you do with this? @FlaxSearch
  • 57. Build queries using a standard search box, facets etc. Set them to run on a stream of data - matches appear as another stream Use cases – Monitor a network – Watch social media – Check IoT for faults, errors, trends – Look for patterns in customer interactions – Distributed web search – Combine with analytics & visualisations What could you do with this? @FlaxSearch
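The idea that matches "appear as another stream" can be sketched with a Python generator. This is a hedged illustration only: `matches_stream`, the query names and the sample feed are all hypothetical, and a real deployment would sit on a streaming platform rather than an in-process iterator.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def matches_stream(
    records: Iterable[str], queries: Dict[str, Set[str]]
) -> Iterator[Tuple[str, str]]:
    """Lazily yield (query_name, record) for every stored query a record satisfies."""
    for record in records:
        tokens = set(record.lower().split())
        for name, required in queries.items():
            if required <= tokens:  # all required terms are present
                yield name, record


# Hypothetical monitoring queries and a tiny stand-in for a live feed.
QUERIES = {"disk-fault": {"disk", "error"}, "overheat": {"temperature", "high"}}
feed = iter([
    "sensor 7 temperature high",
    "node 3 disk error detected",
    "node 4 heartbeat ok",
])

alerts = list(matches_stream(feed, QUERIES))
```

Since the output is itself an iterator of records, it could be fed into further queries, analytics or visualisations in the same way as the input.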
  • 58. Big Data....is about to get a lot bigger In conclusion.... @FlaxSearch
  • 59. Big Data....is about to get a lot bigger We'll need to react to it in real time In conclusion.... @FlaxSearch
  • 60. Big Data....is about to get a lot bigger We'll need to react to it in real time Search can help! In conclusion.... @FlaxSearch
  • 62. References
    1. Internet of Things - HorizonWatch 2015 Trend Report, Bill Chamberlin, IBM
    2. Gartner Says the Internet of Things Will Transform the Data Center, http://www.gartner.com/newsroom/id/2684915
    3. Big Data Forrester Wave 2014
    4. Unified Logging Layer: Turning Data into Action, Kiyoto Tamura, Fluentd
    5. Unified Log Processing, Alexander Dean
    6. The Log: What every software engineer should know about real-time data's unifying abstraction, Jay Kreps, LinkedIn/Confluent
    7. Turning the database inside-out with Apache Samza, Martin Kleppmann
    8. Designing Data-Intensive Applications, Martin Kleppmann
    9. Realtime Full-text Search with Luwak and Samza, Martin Kleppmann
    10. Copyright Richard Masoner under a Creative Commons license
    11.
    12. Demonstrations by Mark Harwood of Elastic
    13. Trillions: Thriving in the Emerging Information Ecology (Wiley 2012), Mickey McManus