How elasticsearch powers the Guardian's newsroom

http://qconlondon.com/london-2014/presentation/How%20Elasticsearch%20Powers%20the%20Guardian's%20Newsroom:

theguardian.com is one of the world's most popular news websites, visited by over 80 million unique browsers every month. Yet in the past, their journalists and editors found it difficult to get meaningful, timely data on what people were reading.

In response to these issues, Graham and colleagues at the Guardian built "ophan", an in-house real-time analytics system based on Elasticsearch. By working closely with journalists and editors, they have focused on data they can act on, both to give the Guardian's existing readers a better experience and to help more people discover its unique content.

In this talk, Graham will dive into the details of ophan: the obstacles faced by the newsroom that prompted them to build the system, how it works for alerting, and how the tool has made life better for the Guardian's readers and staffers. While Graham explores this real-world use case, Shay will cover the technical underpinnings of ophan with a deep dive into the Elasticsearch features and functionality that power the system.

Attendees will leave with a solid understanding of Elasticsearch's features and architecture, all gained through the lens of a real-world and hyperlocal use case.

Published in: Technology

  • Produce a newspaper since 1821 (193 years ago) - today publish the guardian mon-sat and observer sun.
    Regularly break new ground with quality of our investigative journalism
  • Web site since 1995 as guardian unlimited
    now get 4.5m unique browsers a day (2nd most popular uk newspaper... after the mail)
    in addition a whole set of other platforms - iphone, android, kindle, ipad ...
  • demo!
    graph - top20 - zoom in on article - adhoc filtering
    live filtering through around 25 million events a day - 150 GB of data a week
    browser performance (RUM)
    live google search terms
    now over 400 active monthly users inside the guardian - CEO of GMG uses it to demonstrate our “open” strategy to other companies
  • we had the logs. i wanted to process. how?
  • my trade secret
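
To put the volumes quoted in the demo note in perspective, here is a back-of-envelope sketch. It assumes the 25 million events a day and 150 GB a week figures describe the same stream, that traffic is spread evenly across the day, and that GB means decimal gigabytes; none of this is stated in the talk, and the function names are made up for illustration.

```python
# Rough averages derived from the figures in the speaker notes.
# Assumptions (not from the talk): even traffic, decimal gigabytes,
# and both figures describing the same event stream.

EVENTS_PER_DAY = 25_000_000
BYTES_PER_WEEK = 150 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day: int) -> float:
    """Average event rate if traffic were spread evenly over a day."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_event(bytes_per_week: int, events_per_day: int) -> float:
    """Average stored size per event over a seven-day week."""
    return bytes_per_week / (events_per_day * 7)


if __name__ == "__main__":
    print(f"{events_per_second(EVENTS_PER_DAY):.0f} events/s on average")
    print(f"{bytes_per_event(BYTES_PER_WEEK, EVENTS_PER_DAY):.0f} bytes per event on average")
```

Under those assumptions the averages come out at roughly 290 events per second and a little under 1 kB per event. Real traffic peaks will be well above the average.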
  • Transcript

    • 1. How Elasticsearch powers the Guardian’s newsroom graham tackley ■ @tackers director of architecture guardian news and media shay banon ■ @kimchy creator, co-founder and cto elasticsearch
    • 2. “created in 1936 ... to secure the financial and editorial independence of the Guardian in perpetuity”
    • 3. our in-house real-time traffic tool
    • 4. my desktop workstation production apaches something htmly ?
    • 5. ssh $SERVER "nice tail -f /apache2/logs/guardian-access_log"
    • 6. my desktop workstation 2 x production apaches publisher ssh “tail” zeromq x SEO dashboard
    • 7. my desktop workstation x
    • 8. Javascript in browser SNS SQS hidden pixel Dashboard Tracker
    • 9. Javascript in browser Tracker SNS SQS hidden pixel SQS Dashboard Serf elasticsearch Dashboard
    • 10. 12 * m3.xlarge in an autoscaling group (with manual scaling) instance store (SSD) https://github.com/guardian/status-app
    • 11. { "dt": "2014-03-03T02:01:48.026Z", "url": "http://www.theguardian.com/film/2014/mar/03/oscars-2014-winners-list", "queryString": "", "host": "www.theguardian.com", "path": "/film/2014/mar/03/oscars-2014-winners-list", "section": "film", "platform": "r2", "userAgent": { "type": "Browser", "family": "Safari 5.1.9", "os": "OS X 10.6.8", "device": "Personal computer" }, "documentReferrer": "http://www.theguardian.com/world", "browser": { "id": "gA6RUFLhWNQvWdt0rW4r78Fg", "isNew": false }, "referringHost": "theguardian.com", "referringPath": "/world", "isContent": true, "contentPublicationDate": "2014-03-03", "countryCode": "US", "countryName": "United States", "location": { "lonlat": [-73.4409, 41.2094] } } ⇠filter ⇠filter ⇠count per minute
    • 12. { "query" : { "filtered" : { "query" : { "match_all" : { } }, "filter" : { "term" : { "path" : "/film/2014/mar/03/oscars-2014-winners-list" } } } }, …
    • 13. … "facets": { "Reddit": { "date_histogram": { "field": "dt", "interval": "1m" }, "facet_filter": { "term": { "referringHost": "reddit.com" } } }, "Facebook": { "date_histogram": { "field": "dt", "interval": "1m" }, "facet_filter": { "term": { "referringHost": "facebook.com" } } }, "Google": { "date_histogram": { "field": "dt", "interval": "1m" }, "facet_filter": { "or": { "filters": [ { "prefix": { "referringHost": "www.google." } }, { "prefix": { "referringHost": "news.google." } } ] } } } } }
    • 14. /graph/breakdown?section=commentisfree
    • 15. ?section=commentisfree ophan.StandardFilters ophan.StandardFiltersToElasticsearch org.elasticsearch.index. query.FilterBuilder
    • 16. { "query" : { "filtered" : { "query" : { "match_all" : { } }, "filter" : { "term" : { "path" : "/film/2014/mar/03/oscars-2014-winners-list" } } } }, …
    • 17. "filter": { "and": { "filters": [ { "range": { "dt": { "from": "2014-03-03T00:00:00.000Z", "to": "2014-03-03T22:30:59.999Z", "include_lower": true, "include_upper": false } } }, { "not": { "filter": { "term": { "countryCode": "GNM" } } } }, { "not": { "filter": { "term": { "userAgent.type": "Robot" } } } }, { "filter": { "terms": { "section": [ "commentisfree" ] }} } ] } }
    • 18. thank you graham tackley ■ @tackers director of architecture guardian news and media shay banon ■ @kimchy creator, co-founder and cto elasticsearch
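
Slides 12 to 17 show a dashboard URL such as `?section=commentisfree` being turned into an Elasticsearch filter and combined with one per-minute `date_histogram` facet per referrer. The following is a minimal Python sketch of that translation, building only the JSON request body. It is not ophan's code: the slides name `ophan.StandardFilters` and `ophan.StandardFiltersToElasticsearch` but do not show them, so the function names here are hypothetical. One deliberate difference: the section clause is emitted as a bare `terms` filter, whereas slide 17 shows it wrapped in an extra `filter` object. The `filtered` query, `facets`, and the `and`/`or`/`not` filters belong to the Elasticsearch releases current in 2014; later releases replaced them with the `bool` query and aggregations.

```python
import json
from urllib.parse import parse_qs


def standard_filters(query_string, start, end):
    """Translate dashboard query-string parameters into an 'and' filter.

    Hypothetical stand-in for the StandardFilters step named on slide 15.
    """
    params = parse_qs(query_string.lstrip("?"))
    filters = [
        # Time window, as on slide 17.
        {"range": {"dt": {"from": start, "to": end,
                          "include_lower": True, "include_upper": False}}},
        # Exclusions shown on slide 17: the "GNM" country code and robots.
        {"not": {"filter": {"term": {"countryCode": "GNM"}}}},
        {"not": {"filter": {"term": {"userAgent.type": "Robot"}}}},
    ]
    if "section" in params:
        filters.append({"terms": {"section": params["section"]}})
    return {"and": {"filters": filters}}


def referrer_facet(facet_filter, interval="1m"):
    """One per-minute histogram over 'dt', restricted to one referrer."""
    return {
        "date_histogram": {"field": "dt", "interval": interval},
        "facet_filter": facet_filter,
    }


def build_search_body(query_string, start, end):
    """Assemble the filtered query plus the referrer facets from slide 13."""
    google = {"or": {"filters": [
        {"prefix": {"referringHost": "www.google."}},
        {"prefix": {"referringHost": "news.google."}},
    ]}}
    return {
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": standard_filters(query_string, start, end),
        }},
        "facets": {
            "Reddit": referrer_facet({"term": {"referringHost": "reddit.com"}}),
            "Facebook": referrer_facet({"term": {"referringHost": "facebook.com"}}),
            "Google": referrer_facet(google),
        },
    }


if __name__ == "__main__":
    body = build_search_body("?section=commentisfree",
                             "2014-03-03T00:00:00.000Z",
                             "2014-03-03T22:30:59.999Z")
    print(json.dumps(body, indent=2))
```

Keeping the URL-to-filter step separate from the facet definitions mirrors the split suggested by slide 15: the same filter can then be reused across different graphs and breakdowns.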
