NoSQL and Big Data Analytics at NOSQL NOW! 2013

This presentation by Tim Moreton at NoSQL NOW! 2013 traces the history of doing analytics in NoSQL databases. We compare the relative strengths of normalized and denormalized approaches, and examine how Twitter and Facebook have built custom denormalized systems over NoSQL to support real-time analytics. We then cover the lambda architecture, and show how Acunu Analytics provides OLAP cubes over NoSQL, combining denormalization with expressive SQL-like queries.

You can see the full talk here:
http://www.slideshare.net/Dataversity/nosql-and-big-data-analytics

NoSQL and Big Data Analytics at NOSQL NOW! 2013: Presentation Transcript

  • NOSQL and Big Data Analytics
    Tim Moreton, Founder and CTO
  • In the beginning, NOSQL was about storage
  • Google Personalized Search, 2006
    Serve customised search results using user profiles (read only, low latency)
    Collect user queries, clickstream (write only, high throughput)
    Out-of-band batch analysis to produce user profiles
    Diagram labels: profiles; user_id, searches, clicks; BigTable; MapReduce via GFS
  • When NOSQL, when Hadoop?
    Discovery Analytics (unstructured warehouses, data mining, machine learning): complex, long-running; total lack of structure
    Operational Intelligence (dashboards, real-time decisions, alerting): low latency, fresh data; some structure to exploit
  • Normalization and its limits
    For each update: a few random writes
    For each query: many random reads
  • Denormalization
    For each query: one sequential read
    For each update: many writes, sequential IO
  • Building block: Distributed counters
    Diagram: +1 increments feeding counters. Labels: Total tweets; @timmoreton; 2013-08-12; By date; By user; 752
    CASSANDRA: UPDATE table SET col = col + 1 WHERE id = 2;
    HBASE: table.incrementColumnValue(row, cf, col, 1);
    RIAK: curl -i http://host:8098/buckets/x/counters/count2 -X POST -d "1"
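As a rough illustration of the counter building block on this slide, here is a minimal Python sketch. It is not taken from the talk: `CounterStore` is a hypothetical in-memory stand-in for a NoSQL counter table, and the key layout (`total`, `by_user`, `by_date`) is an assumption chosen to mirror the slide's "by date" and "by user" labels. It shows the denormalization trade-off: each event fans out into several increments, and each query is a single lookup.

```python
from collections import Counter
from datetime import date


class CounterStore:
    """Hypothetical in-memory stand-in for a distributed counter table."""

    def __init__(self):
        self._counts = Counter()

    def increment(self, key, delta=1):
        # Equivalent in spirit to "UPDATE table SET col = col + 1".
        self._counts[key] += delta

    def get(self, key):
        # A missing counter reads as 0.
        return self._counts[key]


def record_tweet(store, user, day):
    # One incoming event becomes several writes (assumed key layout).
    store.increment(("total",))
    store.increment(("by_user", user))
    store.increment(("by_date", day.isoformat()))


store = CounterStore()
record_tweet(store, "@timmoreton", date(2013, 8, 12))
record_tweet(store, "@timmoreton", date(2013, 8, 12))
record_tweet(store, "@acunu", date(2013, 8, 13))

# Each query is answered by one read of a pre-materialized value.
print(store.get(("by_user", "@timmoreton")))
print(store.get(("by_date", "2013-08-12")))
```

The cost of this design is that only the questions anticipated in `record_tweet` can be answered without going back to the raw events.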
  • Twitter’s Rainbird (Source: Twitter)
  • Facebook’s Puma, ODS, Claspin (Source: Facebook)
  • "I believe firmly that ... you should "denormalize" only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements." C J Date 2005
  • Denormalization and agility
  • ‘Lambda Architecture’ http://www.josemalvarez.es/web/wp-content/uploads/2013/03/toy-lambda-arch.png
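The lambda architecture named on this slide can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the speaker's implementation: the event shape (`ts`, `hashtag`), the `CUTOFF` timestamp, and the names `batch_view`, `SpeedLayer`, and `query` are all hypothetical. A batch layer recomputes a view over older events, a speed layer counts newer events incrementally, and a query merges the two.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute counts from scratch for events before the cutoff."""
    return Counter(e["hashtag"] for e in events if e["ts"] < cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events at or after the cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.view = Counter()

    def on_event(self, event):
        if event["ts"] >= self.cutoff:
            self.view[event["hashtag"]] += 1


def query(hashtag, batch, speed):
    # Serving step: merge the batch view with the real-time view.
    return batch[hashtag] + speed.view[hashtag]


events = [
    {"ts": 1, "hashtag": "nosql"},
    {"ts": 2, "hashtag": "hadoop"},
    {"ts": 5, "hashtag": "nosql"},
    {"ts": 7, "hashtag": "nosql"},
]
CUTOFF = 4  # assumed time of the last completed batch run

batch = batch_view(events, CUTOFF)
speed = SpeedLayer(CUTOFF)
for e in events:
    speed.on_event(e)

print(query("nosql", batch, speed))
```

Keeping two code paths that must agree on the same answer is the added complexity the talk's conclusions mention.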
  • Acunu Analytics
    1 Define aggregate cubes: CREATE CUBE APPROX TOP(hashtag) WHERE browser, time GROUP BY time
    2 New events update cubes
    3 Rich instant queries over cubes: SELECT TOP(x) FROM t WHERE .. GROUP BY d1, d2, ... JOIN ... HAVING.. ORDER BY ..
    4 Drilldown to raw events
    5 Backfill new cubes using historic data
    Diagram labels: raw events; count by day; count by hour of day; uniques by hashtag
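The cube workflow on this slide (define a cube, update it on each event, query it, backfill a new cube from raw events) can be illustrated with a small Python sketch. It makes no claim about how Acunu Analytics works internally: the `Cube` class, its methods, and the sample events are hypothetical, and it keeps exact counts rather than the approximate top-k aggregate in the slide's CREATE CUBE example.

```python
from collections import defaultdict


class Cube:
    """Hypothetical roll-up cube: exact counts keyed by a tuple of dimensions."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = defaultdict(int)

    def update(self, event):
        # Step 2: each new event increments exactly one cell.
        key = tuple(event[d] for d in self.dimensions)
        self.cells[key] += 1

    def group_by(self, dimension, **where):
        # Step 3: answer a query from pre-aggregated cells only.
        idx = self.dimensions.index(dimension)
        result = defaultdict(int)
        for key, count in self.cells.items():
            if all(key[self.dimensions.index(d)] == v for d, v in where.items()):
                result[key[idx]] += count
        return dict(result)


# Raw events are kept so new cubes can be backfilled later (step 5).
raw_events = [
    {"day": "2013-08-12", "browser": "firefox", "hashtag": "nosql"},
    {"day": "2013-08-12", "browser": "chrome", "hashtag": "nosql"},
    {"day": "2013-08-13", "browser": "chrome", "hashtag": "cassandra"},
]

by_day_browser = Cube(["day", "browser"])  # step 1: define a cube
for e in raw_events:
    by_day_browser.update(e)

by_hashtag = Cube(["hashtag"])  # a cube defined later, backfilled from history
for e in raw_events:
    by_hashtag.update(e)

print(by_day_browser.group_by("day", browser="chrome"))
print(by_hashtag.group_by("hashtag"))
```

Queries touch only the cube cells, never the raw events, which is what makes them fast; the raw events remain available for drilldown and for backfilling cubes that were not planned up front.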
  • Acunu Analytics
    Diagram components: API, event stream, ingest, processing, event store, roll-up cubes, dashboard, queries, programmatic interface
    Cassandra stores raw events and aggregates
    Acunu Analytics manages cubes and maps inserts and SQL-like queries to Cassandra reads and writes
    Processing at ingest: JSON, CSV, log ingest via RESTful HTTP API, Flume, Storm, AMQP
    Acunu Dashboards provides rich, real-time, embeddable visualizations
    AQL: SELECT AVG(r) FROM metrics GROUP BY host;
    Cubes: millisecond queries
    API for rich queries, threshold alerting
  • Conclusions
    NoSQL is a great fit for collecting or serving datasets with some structure at high scale, performance, availability
    Real-time Big Data apps can’t use unplanned rich queries
    Use atomic counters to pre-materialize quantitative results in real-time, but think carefully about flexibility
    Do analytics out-of-band if timeliness is unimportant
    A lambda architecture combines real-time with richer processing, but adds complexity
    Acunu Analytics offers real-time OLAP-style queries
  • Thanks! @timmoreton @acunu