I am going to show how you can use Kibana 4 to create some cool visualizations. The visualizations are built on top of open data from the Norwegian alcohol monopoly, Vinmonopolet ("the Wine Monopoly"), focusing on the beer part of their catalogue.
The invention of bread and beer has been argued to be responsible for humanity's ability to develop technology and build civilization (Wikipedia).
Agenda -
Short intro to Elasticsearch and Vinmonopolet, and how I got the data into Kibana.
Demo use cases: showing how you can use Kibana to answer questions about the data.
Comperio: a search consultancy company.
2004 - FAST -> 2008 -> SharePoint, Norch, FAST, Elasticsearch, Solr, Neo4j, machine learning...
What’s so fun about search engines?
Difference between a search engine and a database:
a search engine has a human being as its end user
a database is a technical component
Creating good search solutions involves both deeply technical issues and human issues: What is a good search result?
How it all fits together -
Elastic is the company behind the open source projects Logstash, Elasticsearch, Kibana and more.
Elasticsearch is the main product
Grew out of Compass (2004), which aimed to make Lucene easier for developers to use. Lucene dates from 1999 (who used Google in 1999?). Demand for scalability led to Elasticsearch in 2010.
Logstash: log processing tool with general-purpose inputs, filters and outputs
Kibana 4: the latest generation of Kibana, with support for aggregations; built on D3.js and AngularJS
All beverages with an alcohol content higher than 4.75% are sold by Vinmonopolet (maximum 60%).
Regulated opening hours
High taxes, levied by alcohol content
Queues at 15:00 on Saturdays, etc.
Pre-party/after-party (vorspiel/nachspiel) culture
Beer below 4.8% is sold in grocery stores (until 20:00 on weekdays, 18:00 on Saturdays).
Restaurants and pubs may have other products not sold at Vinmonopolet (so the list does not include all alcohol available in Norway).
Vinmonopolet product listing: look at all the nice metadata
color
freshness
bitterness
fullness depth
Elasticsearch is a search engine, period. No crawler, no connectors: you put data into it through a JSON REST API.
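As a minimal sketch of "put data into it with JSON", assuming a local node at http://localhost:9200 and a hypothetical index `vinmonopolet` with type `product`, indexing one document is a single HTTP PUT with a JSON body. The helper below only builds the request with Python's standard library, so it can be inspected without a running cluster; `build_index_request` and the field values are illustrative. The `/{index}/{type}/{id}` path matches the Elasticsearch 1.x API used with Kibana 4; newer versions use `/{index}/_doc/{id}` instead.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical product document; field names and values are illustrative only.
req = build_index_request(
    "http://localhost:9200", "vinmonopolet", "product", "1",
    {"Varenavn": "Example Imperial Stout", "Varetype": "Øl", "Alkohol": 9.0},
)
print(req.get_method(), req.full_url)
# Actually sending it requires a running node: urllib.request.urlopen(req)
```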
iconv: fix the encoding of the file
csv: define the columns
drop the first (header) line
fix the decimal separator (comma to period)
convert fields to float
output to Elasticsearch
index template
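The pipeline steps above can be mirrored in a few lines of Python, as a sketch of what each step does to a row. Assumptions, none taken from the original pipeline: the export is Latin-1 encoded and semicolon-separated, the columns are the hypothetical `Varenavn;Varetype;Alkohol;Bitterhet`, and the output is newline-delimited JSON for the Elasticsearch bulk API rather than Logstash's own elasticsearch output. Here the header line is consumed as column names instead of being dropped, and the index template step is not shown.

```python
import csv
import io
import json

# Hypothetical sample in the assumed export format: Latin-1 bytes,
# semicolon-separated, header on the first line, decimal commas.
SAMPLE = (
    "Varenavn;Varetype;Alkohol;Bitterhet\n"
    "Example Imperial Stout;Øl;9,00;10\n"
).encode("latin-1")

FLOAT_FIELDS = ("Alkohol", "Bitterhet")  # assumed numeric columns


def to_float(value):
    """Fix the decimal comma ("9,00" becomes "9.00") and convert; empty gives None."""
    return float(value.replace(",", ".")) if value else None


def csv_to_bulk(raw, index="vinmonopolet", doc_type="product", encoding="latin-1"):
    """Turn the raw export into newline-delimited JSON for the bulk API."""
    text = raw.decode(encoding)  # the iconv step: decode the bytes to Unicode
    reader = csv.reader(io.StringIO(text), delimiter=";")
    columns = next(reader)  # header line: used as column names, not indexed
    lines = []
    for row in reader:
        doc = dict(zip(columns, row))
        for field in FLOAT_FIELDS:
            if field in doc:
                doc[field] = to_float(doc[field])
        # Each document is preceded by an action line saying where it goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


print(csv_to_bulk(SAMPLE), end="")
```

The result can be POSTed to the `_bulk` endpoint; on recent Elasticsearch versions the `_type` key in the action line should be dropped.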
simple search listing
Vagrant: ELK box on GitHub
Use of the Discover tab
Questions on the next slide
Discover tab: search and filter
Select fields
Sort by fields
Save searches
URL?
Select Varetype: Øl (beer) and add it as a filter
Search for Stout
Select Bitterhet (bitterness), show the field stats, then click Visualize
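Roughly speaking, the Discover steps above translate into an Elasticsearch search body: the search bar becomes a `query_string` query and the Varetype filter becomes a phrase match used as a filter. This is a sketch in the current `bool`/`filter` syntax, not the exact JSON Kibana 4 sends (Kibana 4 on Elasticsearch 1.x wrapped it in the older `filtered` query); `discover_query` is a hypothetical helper.

```python
import json


def discover_query(search_text, field, value):
    """Hypothetical helper: free-text search plus a phrase filter on one field."""
    return {
        "query": {
            "bool": {
                # What is typed into the search bar
                "must": [{"query_string": {"query": search_text}}],
                # The "Varetype: Øl" filter; filters do not affect scoring
                "filter": [{"match_phrase": {field: value}}],
            }
        }
    }


body = discover_query("Stout", "Varetype", "Øl")
print(json.dumps(body, ensure_ascii=False, indent=2))
```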
TF-IDF
How can Lucene be so fast and effective at looking up search results?
Documents are converted into an inverted index: terms and their frequencies.
Lucene
Term dictionary:
A dictionary containing all of the terms used in all of the indexed fields of all of the documents.
The dictionary also contains the number of documents which contain the term,
and pointers to the term's frequency and proximity data.
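A toy version of the idea, as a sketch only: map each term to its postings (document id and term frequency). The number of documents containing a term is then simply the length of its postings. Lucene's real term dictionary is a sorted on-disk structure that also stores positions; the naive lowercase-and-split "analyzer" and the three documents here are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive analyzer: lowercase and split on non-word characters.
        counts = Counter(re.findall(r"\w+", text.lower()))
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index


docs = {
    1: "Stout with roasted malt",
    2: "Imperial Stout, imperial strength",
    3: "Pale ale",
}
index = build_inverted_index(docs)
print(index["stout"])          # postings: {1: 1, 2: 1}
print(len(index["imperial"]))  # number of documents containing "imperial": 1
```

Looking up a query term is a single dictionary access instead of a scan over every document, which is the core of why the lookup is fast.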
How can we create a scoring algorithm?
We have a query and documents: what’s the best way to rank them?
Use term frequency: count the number of occurrences of each query term in a document, and add them up.
Documents with lots of matching terms come out on top (this favors long documents).
#1 has “stout” 7 times
#2 has “imperial” 2 times, “stout” 4 times
#3 has “russian” 3 times, “imperial” once, “stout” once
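Ranking the three documents above by summed raw term frequency, as a small sketch. The query is assumed to be "Russian Imperial Stout" (the notes list the counts but not the query itself), and the counts are the ones from the slide.

```python
# Term counts per document, from the slide; the query is an assumption.
term_counts = {
    "#1": {"stout": 7},
    "#2": {"imperial": 2, "stout": 4},
    "#3": {"russian": 3, "imperial": 1, "stout": 1},
}
query = ["russian", "imperial", "stout"]


def tf_score(counts, query_terms):
    """Sum of raw term frequencies for the query terms."""
    return sum(counts.get(term, 0) for term in query_terms)


ranking = sorted(term_counts, key=lambda d: tf_score(term_counts[d], query), reverse=True)
for doc in ranking:
    print(doc, tf_score(term_counts[doc], query))
```

Document #1 wins with a score of 7, even though it only mentions "stout" and says nothing about Russian imperial stout.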
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html
term frequency (tf ) = count of term in document
document frequency (df) = number of documents containing the term
inverse doc frequency (idf) = log(count of docs/df)
tf-idf = tf * idf
The illustration is simplified!
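Applying the simplified formulas above to the same three documents, as a sketch. Assumptions: the whole collection is just these three documents, the query is "Russian Imperial Stout", and log is the natural logarithm. Lucene's practical scoring function, linked above, adds refinements such as square-root tf and field-length normalization.

```python
import math

# Term counts from the slide; collection and query are assumptions.
term_counts = {
    "#1": {"stout": 7},
    "#2": {"imperial": 2, "stout": 4},
    "#3": {"russian": 3, "imperial": 1, "stout": 1},
}
query = ["russian", "imperial", "stout"]
n_docs = len(term_counts)


def df(term):
    """Document frequency: number of documents containing the term."""
    return sum(1 for counts in term_counts.values() if term in counts)


def idf(term):
    """Inverse document frequency: log(count of docs / df)."""
    return math.log(n_docs / df(term))


def tfidf_score(counts, query_terms):
    """Sum of tf * idf over the query terms present in the document."""
    return sum(counts[t] * idf(t) for t in query_terms if t in counts)


ranking = sorted(term_counts, key=lambda d: tfidf_score(term_counts[d], query), reverse=True)
for doc in ranking:
    print(doc, f"{tfidf_score(term_counts[doc], query):.3f}")
```

Because "stout" appears in every document, its idf is log(3/3) = 0, so document #1 drops from first to last and #3, the one that actually mentions "russian", comes out on top.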
Top 20 bitterness values
X axis: bitterness value
Y axis: count of products with this bitterness
Query: Stout
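The chart above is essentially a terms aggregation: one bucket per bitterness value, with the number of matching products in each bucket. A sketch of the request body, assuming the field is called `Bitterhet` as in the Discover step; `top_values_body` and the aggregation name `top_values` are hypothetical, and Kibana 4 generates a similar but not identical body.

```python
import json


def top_values_body(search_text, field, size=20):
    """Search body: a query plus a terms aggregation over one field."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"query_string": {"query": search_text}},
        "aggs": {
            # By default, buckets are ordered by document count, descending
            "top_values": {"terms": {"field": field, "size": size}}
        },
    }


body = top_values_body("Stout", "Bitterhet")
print(json.dumps(body, indent=2))
```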
Add significant terms?
Add number of countries
Add Varetype
Add alcohol range
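The additions listed above map onto standard aggregation types: `significant_terms` for unusual terms, `cardinality` for the number of distinct countries (an approximate count), `terms` on Varetype, and `range` for alcohol bands. A sketch of one combined body; the field names `lukt_smak`, `Land`, `Varetype` and `Alkohol` and the range boundaries are assumptions for illustration. On recent Elasticsearch versions, significant terms over a full-text field would need fielddata enabled or the `significant_text` aggregation instead.

```python
import json

# Field names and range boundaries are assumptions for illustration.
body = {
    "size": 0,
    "query": {"query_string": {"query": "Stout"}},
    "aggs": {
        "unusual_terms": {"significant_terms": {"field": "lukt_smak", "size": 8}},
        "countries": {"cardinality": {"field": "Land"}},  # approximate distinct count
        "by_type": {"terms": {"field": "Varetype"}},
        "alcohol_ranges": {
            "range": {
                "field": "Alkohol",
                # "from" is inclusive, "to" is exclusive
                "ranges": [{"to": 6.0}, {"from": 6.0, "to": 9.0}, {"from": 9.0}],
            }
        },
    },
}
print(json.dumps(body, indent=2))
```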
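Top 8 unusual terms in lukt_smak (smell/taste):
brødbakst (baked bread)
syrlig (tart)
balsamico (balsamic)
gjær (yeast)
rosin (raisin)
anslag (attack, i.e. the first impression on the palate)
kirsebær (cherry)
eik (oak)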
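"Unusual" here presumably means significant terms: terms much more common in the matching products (the foreground) than in the whole catalogue (the background). As a sketch, this is the JLH heuristic that the Elasticsearch documentation describes as the default score for `significant_terms`; the counts below are made up for illustration.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH significance: absolute change in popularity times relative change."""
    fg = subset_freq / subset_size      # share of foreground docs with the term
    bg = superset_freq / superset_size  # share of background docs with the term
    if bg == 0 or fg <= bg:
        return 0.0  # only terms more common in the foreground score above zero
    return (fg - bg) * (fg / bg)


# Hypothetical counts: 200 stouts out of 10,000 products.
print(jlh_score(40, 200, 100, 10_000))    # term in 20% of stouts, 1% overall
print(jlh_score(30, 200, 1_500, 10_000))  # term in 15% of stouts, 15% overall
```

A common word like "stout" itself scores zero by this measure, while a term such as "rosin" that is rare overall but frequent among stouts rises to the top.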