Enhanced Site Search with Cognitive APIs
Glynn Bird
Developer Advocate @ IBM Cloud Data Services
glynn.bird@uk.ibm.com
@glynn_bird
Agenda
● What is search?
● Simple Search
● Adding some "cognitive"
Primary search
In-site search
Elasticsearch
• Stores JSON Documents
• Search based on Apache Lucene
• Provides HTTP search API
• Pay per-GB on compose.com
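As a rough sketch of what a call to the HTTP search API might look like, the snippet below builds a request for Elasticsearch's `_search` endpoint using a `multi_match` query. The host, the index name `news`, and the field names `title` and `description` are assumptions chosen to match the example data used later in the talk.

```javascript
// Build the URL and fetch options for a keyword search against
// Elasticsearch's _search endpoint. Host and index name are placeholders.
function buildSearchRequest(host, indexName, keywords) {
  return {
    url: `${host}/${encodeURIComponent(indexName)}/_search`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        // Match the keywords against both text fields (assumed field names)
        query: { multi_match: { query: keywords, fields: ['title', 'description'] } }
      })
    }
  };
}

const request = buildSearchRequest('http://localhost:9200', 'news', 'Time Warner');
console.log(request.url);
// To send it: fetch(request.url, request.options).then(r => r.json())
```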
Cloudant
• Stores JSON Documents
• Based on Apache CouchDB
• Search based on Apache Lucene
• Provides HTTP search API
• PAYG/Dedicated-as-a-service or Local
Get started - Simple Search Service
https://developer.ibm.com/clouddataservices/simple-search-service/
Game of Thrones search demo
http://sss-got-theme.mybluemix.net/
Structured vs Unstructured Data
Structured Data
● known schema
● predictable
● indexable
Unstructured Data
● unknown schema
● difficult to parse and index
Example data
{
"url": "http://www.bbc.co.uk/news/business-37742991",
"title": "AT&T announces it will buy Time Warner",
"description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
"date": "2016-10-22T23:44:03.000Z",
"image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Structured data
{
"url": "http://www.bbc.co.uk/news/business-37742991",
"title": "AT&T announces it will buy Time Warner",
"description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
"date": "2016-10-22T23:44:03.000Z",
"image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Unstructured data
{
"url": "http://www.bbc.co.uk/news/business-37742991",
"title": "AT&T announces it will buy Time Warner",
"description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
"date": "2016-10-22T23:44:03.000Z",
"image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Let's build a news website
● take RSS feeds
● put the data into a database
● index it
○ newest articles first
○ keyword search
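The "put the data into a database" step could be sketched as a small conversion function that turns one parsed RSS item into a JSON document like the example data. The input field names (`link`, `title`, `description`, `pubDate`) follow RSS 2.0; the function name `toArticleDoc` and the `imageUrl` argument are hypothetical, and this is not necessarily how the Node-RED flow in the talk does it.

```javascript
// Convert one parsed RSS item into the JSON document we want to store.
// toArticleDoc and imageUrl are illustrative names, not part of any library.
function toArticleDoc(item, imageUrl) {
  return {
    url: item.link,
    title: item.title,
    description: item.description,
    // Store dates as ISO 8601 strings so they sort chronologically
    date: new Date(item.pubDate).toISOString(),
    image_url: imageUrl
  };
}

const doc = toArticleDoc(
  {
    link: 'http://www.bbc.co.uk/news/business-37742991',
    title: 'AT&T announces it will buy Time Warner',
    description: 'US telecoms giant AT&T announces it will buy entertainment group Time Warner',
    pubDate: 'Sat, 22 Oct 2016 23:44:03 GMT'
  },
  'http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png'
);
console.log(JSON.stringify(doc, null, 2));
```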
Node-RED
● visual programming tool
● https://nodered.org/
Indexing data in Cloudant - MapReduce
// Emit one row per article, keyed by date so the view is sorted by date
function(doc) {
  emit(doc.date, doc.title);
}
● Build an index to sort articles by date
● Create custom 'map' function
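To see what the map function produces, here is a minimal local simulation: it runs the same `emit(doc.date, doc.title)` logic over two documents and orders the rows by key. The `emit` helper and the "Older article" document are stand-ins invented for this sketch; on Cloudant the view is stored sorted by key, and querying it with `descending=true` returns the newest articles first.

```javascript
// Local stand-in for the emit() function that Cloudant provides to map functions
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function from the slide
function map(doc) {
  emit(doc.date, doc.title);
}

// Sample documents; the first one is made up for illustration
const docs = [
  { date: '2016-10-21T09:00:00.000Z', title: 'Older article' },
  { date: '2016-10-22T23:44:03.000Z', title: 'AT&T announces it will buy Time Warner' }
];
docs.forEach(doc => map(doc));

// ISO 8601 date strings compare correctly as plain strings;
// sort descending to put the newest article first
rows.sort((a, b) => (a.key < b.key ? 1 : a.key > b.key ? -1 : 0));
console.log(rows.map(row => row.value));
```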
Front end
Indexing data in Cloudant - Search
// Index title and description into the 'default' field for full-text search
function(doc) {
  index('default', doc.title);
  index('default', doc.description);
}
● Build full-text index
● Create custom 'index' function
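A sketch of how the index function might be packaged and queried: Cloudant Search indexes live in a design document under `indexes`, with the function stored as a string, and are queried through the `_search` endpoint. The design document name `search`, the index name `articles`, the database name `news`, and the account URL are all assumptions made for this example.

```javascript
// The search index function from the slide. index() is supplied by Cloudant
// when the function runs server-side; it is never called locally here.
const indexFn = function (doc) {
  index('default', doc.title);
  index('default', doc.description);
};

// Design document holding the search index (names are placeholders)
const designDoc = {
  _id: '_design/search',
  indexes: {
    articles: {
      analyzer: 'standard',
      index: indexFn.toString()
    }
  }
};

// Build the query URL for a keyword search against that index
function searchUrl(baseUrl, db, query) {
  return `${baseUrl}/${db}/_design/search/_search/articles?q=${encodeURIComponent(query)}`;
}

console.log(JSON.stringify(designDoc, null, 2));
console.log(searchUrl('https://example.cloudant.com', 'news', 'time warner'));
```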
Cloudant Search
● Punctuation removal
● Word splitting/stemming
● Stop-word removal
● Full-text indexing using Apache Lucene
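The text-processing steps listed above can be illustrated with a toy analyzer. This is only a sketch of the idea, not Lucene's actual implementation: the stop-word list is a small hand-picked subset and the "stemmer" merely strips a trailing "s".

```javascript
// A tiny, hand-picked stop-word list (real analyzers ship longer ones)
const STOP_WORDS = new Set(['a', 'an', 'and', 'at', 'it', 'of', 'the', 'to', 'will']);

// Deliberately crude stemmer: drop a trailing "s" from longer words
function stem(word) {
  return word.length > 3 ? word.replace(/s$/, '') : word;
}

function analyze(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')          // punctuation removal
    .split(/\s+/)                           // word splitting
    .filter(Boolean)                        // drop empty strings
    .filter(word => !STOP_WORDS.has(word))  // stop-word removal
    .map(stem);                             // stemming
}

const tokens = analyze('AT&T announces it will buy Time Warner');
console.log(tokens); // [ 't', 'announce', 'buy', 'time', 'warner' ]
```

The resulting tokens are what a full-text index would store, which is why a search for "announce" can still find a title containing "announces".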
Front end
Summary so far...
But can we do better?
Watson Alchemy Language API
● Feed it text or a URL
● Returns:
○ entities - people/places/companies
○ taxonomy
Watson Alchemy Language API
Entities
Country: US
Company: AT&T
Company: Time Warner
JobTitle: Telecoms
Taxonomy
/art and entertainment
/technology and computing/internet technology/isps
/business and industrial/company/merger and acquisition
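One way to fold results like these into the stored article is to group entities by type and keep the taxonomy labels as a list. The response shape used below (`entities` with `type` and `text`, `taxonomy` with `label`) is an assumption made for this sketch, and `enrich` is a hypothetical helper rather than part of any Watson SDK.

```javascript
// Merge an analysis result into an article document.
// The shape of `analysis` is assumed for illustration.
function enrich(doc, analysis) {
  const entities = {};
  for (const entity of analysis.entities) {
    // Group entity text by type, e.g. { Company: ['AT&T', 'Time Warner'] }
    (entities[entity.type] = entities[entity.type] || []).push(entity.text);
  }
  const taxonomy = analysis.taxonomy.map(item => item.label);
  return { ...doc, entities, taxonomy };
}

const enriched = enrich(
  { title: 'AT&T announces it will buy Time Warner' },
  {
    entities: [
      { type: 'Country', text: 'US' },
      { type: 'Company', text: 'AT&T' },
      { type: 'Company', text: 'Time Warner' },
      { type: 'JobTitle', text: 'Telecoms' }
    ],
    taxonomy: [
      { label: '/art and entertainment' },
      { label: '/technology and computing/internet technology/isps' },
      { label: '/business and industrial/company/merger and acquisition' }
    ]
  }
);
console.log(JSON.stringify(enriched, null, 2));
```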
How can we use Alchemy in our workflow?
More indexing
● Index the Alchemy entities
○ e.g. Country:US
● Index the Alchemy taxonomy
○ e.g. ["Finance","Investing"]
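A possible extension of the search index function that also indexes entities and taxonomy as facets is sketched below. It assumes each document carries an `entities` object keyed by entity type and a `taxonomy` array, which are field names chosen for this example. On Cloudant, `index` is provided globally; here it is passed in as a parameter so the function can be exercised locally.

```javascript
// Index free text plus Alchemy-derived fields.
// On Cloudant, index() is a global; it is a parameter here for local testing.
function indexArticle(doc, index) {
  index('default', doc.title);
  index('default', doc.description);

  // One field per entity type, e.g. Country:US, marked as a facet
  Object.keys(doc.entities || {}).forEach(type => {
    doc.entities[type].forEach(text => index(type, text, { facet: true }));
  });

  // Each taxonomy term goes into a single faceted 'taxonomy' field
  (doc.taxonomy || []).forEach(term => index('taxonomy', term, { facet: true }));
}

// Record what would be indexed for a sample document
const calls = [];
indexArticle(
  {
    title: 'AT&T announces it will buy Time Warner',
    description: 'US telecoms giant AT&T announces it will buy entertainment group Time Warner',
    entities: { Country: ['US'] },
    taxonomy: ['Finance', 'Investing']
  },
  (name, value, options) => calls.push([name, value, options])
);
console.log(calls.map(call => `${call[0]}:${call[1]}`));
```

With fields indexed as facets, a search request can ask for counts per value (Cloudant Search accepts a `counts` parameter listing the faceted fields), which is what a front end would need to offer drill-down filters.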
Front end
Demo
https://glynnbird.github.io/alchemy-news/
It's not just language...
Watson saw...
Just one more
Watson saw...
Summary
● Node-RED
● Cloudant
● Alchemy Language API
Bluemix: https://www.ibm.com/cloud-computing/bluemix/
Simple Search Service: https://developer.ibm.com/clouddataservices/simple-search-service/
News Demo: https://glynnbird.github.io/alchemy-news/
Developer Advocate
glynn.bird@uk.ibm.com
Thanks
Glynn Bird
Blog: www.glynnbird.com
Twitter: @glynn_bird


Editor's Notes

  • #6 Elasticsearch is built for search. It is a distributed data store, intended not as your primary database but as an indexed copy of your primary database for search purposes. It runs on a cluster for scalability and failover. Once your data is uploaded (as JSON documents), you can use the HTTP API to get fast, reliable search in your application.
  • #7 Cloudant is a primary data store built to store your data in multiple copies in a highly-available cluster. It presents an HTTP API and you can create Cloudant Search indexes which provide free-text search including faceting.
  • #42 All available in Bluemix