Enhanced site search with cognitive APIs - Glynn Bird

Enhanced Site Search with
Cognitive APIs
Glynn Bird
Developer Advocate @ IBM Cloud Data Services
glynn.bird@uk.ibm.com
@glynn_bird

Agenda
● What is search?
● Simple Search
● Adding some "cognitive"

Elasticsearch
• Stores JSON Documents
• Search based on Apache Lucene
• Provides HTTP search API
• Pay per-GB on compose.com

Cloudant
• Stores JSON Documents
• Based on Apache CouchDB
• Search based on Apache Lucene
• Provides HTTP search API
• PAYG/Dedicated-as-a-service or Local

Get started - Simple Search Service
https://developer.ibm.com/clouddataservices/simple-search-service/

Game of Thrones search demo
http://sss-got-theme.mybluemix.net/

Structured vs Unstructured Data
Structured Data
● known schema
● predictable
● indexable
Unstructured Data
● unknown schema
● difficult to parse and index

Example data
{
  "url": "http://www.bbc.co.uk/news/business-37742991",
  "title": "AT&T announces it will buy Time Warner",
  "description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
  "date": "2016-10-22T23:44:03.000Z",
  "image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}

Structured data
{
  "date": "2016-10-22T23:44:03.000Z"
}

Unstructured data
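{
  "title": "AT&T announces it will buy Time Warner",
  "description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner"
}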

Let's build a news website
● take RSS feeds
● put the data into a database
● index it
○ newest articles first
○ keyword search
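
The first two steps above can be sketched in plain JavaScript. This is a minimal sketch, not the flow used in the talk: `rssItemToDoc` is a hypothetical helper, and the input field names (`link`, `title`, `description`, `pubDate`, `imageUrl`) are assumptions about what an RSS parser might hand over.

```javascript
// Hypothetical helper: turn one parsed RSS item into a JSON document
// shaped like the example data. The input field names are assumptions.
function rssItemToDoc(item) {
  return {
    url: item.link,
    title: item.title,
    description: item.description,
    // Normalise the RSS date to ISO 8601 so string order matches time order
    date: new Date(item.pubDate).toISOString(),
    image_url: item.imageUrl
  };
}

const doc = rssItemToDoc({
  link: 'http://www.bbc.co.uk/news/business-37742991',
  title: 'AT&T announces it will buy Time Warner',
  description: 'US telecoms giant AT&T announces it will buy entertainment group Time Warner',
  pubDate: 'Sat, 22 Oct 2016 23:44:03 GMT',
  imageUrl: 'http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png'
});
console.log(doc.date);
```

Storing the date as an ISO 8601 string is what makes a "newest articles first" index straightforward, because sorting the strings also sorts the articles chronologically.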

Node-RED
● visual programming tool
● https://nodered.org/

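Indexing data in Cloudant - MapReduce
function(doc) {
  // Skip documents without a date; ISO 8601 strings sort chronologically
  if (doc.date) {
    emit(doc.date, doc.title);
  }
}
● Build an index that sorts articles by date
● Create a custom 'map' function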
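
To see what the map function above produces, it can be simulated locally. This is only a sketch: `buildView` is a hypothetical stand-in for the database's view engine, the two "Sample article" documents are made-up data, and a real Cloudant view is queried over HTTP rather than built in memory.

```javascript
// The map function from the slide, with emit passed in explicitly so it
// can run outside the database (inside Cloudant, emit is provided for you).
function map(doc, emit) {
  emit(doc.date, doc.title);
}

// Hypothetical stand-in for the view engine: collect the emitted rows and
// keep them sorted by key, as a view index does.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const rows = buildView([
  { date: '2016-10-22T23:44:03.000Z', title: 'AT&T announces it will buy Time Warner' },
  { date: '2016-10-23T08:00:00.000Z', title: 'Sample article B' },
  { date: '2016-10-21T12:30:00.000Z', title: 'Sample article A' }
]);

// Newest first: read the sorted index backwards
const newestFirst = rows.slice().reverse();
console.log(newestFirst.map(r => r.value));
```

Against a real view, the same "read it backwards" effect comes from the `descending=true` query parameter.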

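Indexing data in Cloudant - Search
function(doc) {
  // Index both text fields into 'default'; skip missing values
  if (doc.title) index('default', doc.title);
  if (doc.description) index('default', doc.description);
}
● Build a full-text index
● Create a custom index function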
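
Once the index exists, it is queried over HTTP. The sketch below only builds the request URL; `searchUrl` is a hypothetical helper, and the account host, database name (`news`), design document name (`search`) and index name (`articles`) are placeholders rather than names from the talk.

```javascript
// Build the URL for a Cloudant search query.
// All names passed in below are placeholders chosen for illustration.
function searchUrl(base, db, ddoc, index, query, limit) {
  const path = `/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}` +
               `/_search/${encodeURIComponent(index)}`;
  // q carries the Lucene query; terms without a field name
  // are matched against the 'default' field
  return `${base}${path}?q=${encodeURIComponent(query)}&limit=${limit}`;
}

const url = searchUrl('https://example.cloudant.com', 'news', 'search', 'articles', 'time warner', 10);
console.log(url);
```

Because both `title` and `description` are indexed into `default`, a bare query such as `time warner` searches both fields at once.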

Cloudant Search
● Punctuation removal
● Word splitting/stemming
● Stop-word removal
● Full-text indexing using Apache Lucene
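
The first three steps above can be illustrated with a toy analyzer. This is a deliberately simplified sketch and not how Lucene is implemented: `analyze` is a hypothetical function, the stop-word list is a tiny made-up sample, and stemming is left out.

```javascript
// Tiny sample stop-word list (an assumption; real analyzers ship longer lists)
const STOP_WORDS = new Set(['a', 'an', 'and', 'it', 'the', 'will']);

function analyze(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')            // punctuation removal
    .split(/\s+/)                            // word splitting
    .filter(w => w && !STOP_WORDS.has(w));   // stop-word removal
}

console.log(analyze('AT&T announces it will buy Time Warner'));
```

Note that this toy version splits "AT&T" into two separate tokens, so the company name is no longer a single searchable term.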

But can we do better?

Watson Alchemy Language API
● Feed it text or a URL
● Returns:
○ entities - people/places/companies
○ taxonomy

Watson Alchemy Language API
Entities
Country: US
Company: AT&T
Company: Time Warner
JobTitle: Telecoms
Taxonomy
/art and entertainment
/technology and computing/internet technology/isps
/business and industrial/company/merger and acquisition

How can we use Alchemy in our workflow?

More indexing
● Index the Alchemy entities
○ e.g. Country:US
● Index the Alchemy taxonomy
○ e.g. ["Finance","Investing"]
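
One way to picture this step is to flatten the Alchemy output into extra fields on the stored document, which an index can then use. This is a minimal sketch: `enrich` is a hypothetical helper, and the shape of the `alchemy` object (`entities` with `type` and `text`, `taxonomy` with `label`) is an assumption made for illustration, not a documented response format.

```javascript
// Hypothetical helper: merge entity and taxonomy data into a document.
function enrich(doc, alchemy) {
  const entities = {};
  for (const e of alchemy.entities) {
    // Group entity text by type, e.g. { Company: ['AT&T', 'Time Warner'] }
    (entities[e.type] = entities[e.type] || []).push(e.text);
  }
  // Split each taxonomy label into its path segments
  const taxonomy = alchemy.taxonomy.map(t => t.label.split('/').filter(Boolean));
  return Object.assign({}, doc, { entities, taxonomy });
}

const enriched = enrich(
  { title: 'AT&T announces it will buy Time Warner' },
  {
    entities: [
      { type: 'Country', text: 'US' },
      { type: 'Company', text: 'AT&T' },
      { type: 'Company', text: 'Time Warner' }
    ],
    taxonomy: [{ label: '/business and industrial/company/merger and acquisition' }]
  }
);
console.log(JSON.stringify(enriched, null, 2));
```

With the document in this shape, an index function could index each entity type under its own field name (for example `Country`), so that a query can target that field directly.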

Demo
https://glynnbird.github.io/alchemy-news/

It's not just language...

Summary
● Node-RED
● Cloudant
● Alchemy Language API
Bluemix: https://www.ibm.com/cloud-computing/bluemix/
Simple Search Service: https://developer.ibm.com/clouddataservices/simple-search-service/
News Demo: https://glynnbird.github.io/alchemy-news/

Thanks
Glynn Bird
Developer Advocate
glynn.bird@uk.ibm.com
Blog: www.glynnbird.com
Twitter: @glynn_bird
