Ruby, Rails, NoSQL and Big Data

Transcript of "Ruby, Rails, NoSQL and Big Data"

  1. Pictures at an Exhibition: Ruby, Rails, NoSQL and Big Data. John Repko -- Pikasoft LLC
  2. Agenda. The Goal: Exploring Big Data with NoSQL and Ruby on Rails. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak / MongoDB, through Elastic MapReduce)
  3. So How Did We Get to Big Data Anyway? Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html) Image sources: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg and http://www.startribune.com/sports/164830346.html
  4. Why Is Everyone Diving into Big Data? There Are Big Data Breakthroughs Everywhere… • Google Wins the Search Market: massively parallel web searches with results back in a tenth of a second • Progressive’s Instant “Overnight” Rate Quotes: Progressive creates an insurance quote for every car and truck in the US, every night • “Watson” Wins on Jeopardy: beat the best Jeopardy players of all time. Image source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg
  5. Exploring Big Data. Big Data frequently provides solutions to a common set of problems. Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616 These appear to be “10 Problems” but are really only “2 Problems”.
  6. Exploring Big Data. The variety of Big Data wins in the press falls into just two solution patterns: • Foresight: we are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past? • Hindsight: we are presented with an outcome. What pattern of events anticipated that outcome in the past? You Don’t Need Dozens Of Solution Approaches For Big Data – Just Two.
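
The two patterns on slide 6 can be read as two queries over the same history of observations. The following is a minimal sketch in plain Ruby, not code from the talk: the `HISTORY` records and the `foresight` / `hindsight` helper names are invented here purely for illustration.

```ruby
# Hypothetical history of (pattern, outcome) observations. The data is
# invented for illustration and does not come from the slides.
HISTORY = [
  { pattern: "late_payments", outcome: :default },
  { pattern: "late_payments", outcome: :default },
  { pattern: "late_payments", outcome: :repaid },
  { pattern: "steady_income", outcome: :repaid },
  { pattern: "steady_income", outcome: :repaid }
].freeze

# Foresight: given a pattern, tally the outcomes that followed it in the past.
def foresight(history, pattern)
  history.select { |row| row[:pattern] == pattern }
         .each_with_object(Hash.new(0)) { |row, counts| counts[row[:outcome]] += 1 }
end

# Hindsight: given an outcome, tally the patterns that preceded it in the past.
def hindsight(history, outcome)
  history.select { |row| row[:outcome] == outcome }
         .each_with_object(Hash.new(0)) { |row, counts| counts[row[:pattern]] += 1 }
end
```

Calling `foresight(HISTORY, "late_payments")` tallies past outcomes for that pattern, while `hindsight(HISTORY, :repaid)` tallies the patterns that came before that outcome. At Big Data scale the same two questions would be asked of a data store rather than an in-memory array.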
  7. Exploring Big Data. In this light, let’s take a look at the “10 Hadoop-able Problems” of Big Data. Summary – 10 Common Hadoop-able Problems: 1. Modeling True Risk: What past patterns led to success or default? 2. Customer Churn Analysis: What do customer churn patterns predict about our products and markets? 3. Recommendation Engine: We have search terms. What have the results been from similar searches in the past? 4. Ad Targeting: We have profile information. What offers have led to sales for similar profiles in the past? 5. PoS Transaction Analysis: We have your purchase history. What deals might we offer in the future? Legend: Foresight, Hindsight.
  8. Exploring Big Data. These two solution types apply generally to the Hadoop-able problems. Summary – 10 Common Hadoop-able Problems: 6. Analyzing Data Logs to Forecast Events: We have your logs. What pattern of events has anticipated failures before? 7. Threat Analysis: We have a specific event. What results have we seen from similar threats in the past? 8. Trade Surveillance: Does this parcel raise any alarms, based on our history of past parcel-tracking? 9. Search Quality: We have a set of search terms. What have similar searches succeeded in finding in the past? 10. Data “Sandbox”: We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now? Legend: Foresight, Hindsight.
  9. The Big Data Platform Provides Us with Rich Analytics Tools. Key Big Data Analytics Solution Patterns: 1. Predictive Modeling 2. Data Visualization 3. Cluster Partitioning 4. Collaborative Filtering 5. Outlier Analysis 6. A/B Testing 7. Markov Chains 8. Bloom Filters
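
Of the tools listed on slide 9, the Bloom filter is compact enough to sketch in a few lines. The version below is an illustrative assumption, not the talk's code: the `BloomFilter` class name, the default sizes, and the choice of salted MD5 digests as the hash family are all picked here for brevity.

```ruby
require "digest/md5"

# A minimal Bloom filter: a fixed-size bit array plus k hash functions.
# Lookups can return false positives but never false negatives.
class BloomFilter
  def initialize(size: 1024, hashes: 3)
    @size = size
    @hashes = hashes
    @bits = Array.new(size, false)
  end

  # Set every bit position derived from the item.
  def add(item)
    positions(item).each { |i| @bits[i] = true }
    self
  end

  # True only if every derived bit is set ("possibly present").
  def include?(item)
    positions(item).all? { |i| @bits[i] }
  end

  private

  # Derive k bit positions by salting one digest with the hash index.
  def positions(item)
    (0...@hashes).map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{item}").to_i(16) % @size
    end
  end
end
```

A typical use is `filter = BloomFilter.new.add("redis")` followed by `filter.include?("redis")`. A `false` answer is definitive; a `true` answer only means "possibly seen", which is why Bloom filters are usually placed in front of a slower exact lookup.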
  10. Exploring Big Data. With Just Two Standard Solution Models We Can Solve Most Big Data Problems. The Key Is To Shape Big Data Into A Standard Platform Onto Which We Can Apply These Analytics Tools… “It is not the technology that creates a competitive edge, but the management process that exploits technology.” ~ Peter Keen, Shaping the Future (1991)
  11. Agenda. The Goal: Exploring Big Data. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak / MongoDB, through Elastic MapReduce)
  12. The Core Development Platform. Core Platform: Ubuntu 12.04 + AWS
     • Clean install of Ubuntu 12.04 and all latest updates
     • sudo apt-get update
     • sudo apt-get upgrade
     • sudo apt-get dist-upgrade
     • sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
     • sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev
     • bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
     • source ~/.bashrc
     • gem update --system (Latest version currently installed)
     • rvm ruby-1.9.2-p290@rails31 --create --default
     • sudo apt-get install nodejs
     • gem install rake
     • gem install rails -v=3.1.3
  13. Agenda. The Goal: Exploring Big Data. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak, through Elastic MapReduce)
  14. Redis. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis • Example: – http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html • Backing Articles: – http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/ • Code: – http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html The good news is, we’ve already got our base image, and adding a new Redis data store and example app to it only took about an hour. As before, you can play with the URL shortener at Redis URL Shortener, and you can download and play with the code for the application at: Redis URL Shortener Source Code. Play with this online at: http://jkr-blog.dyndns.org:3001/mini_urls
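
The key-value idea behind a URL shortener like the one on slide 14 can be sketched without a running Redis server. This is a minimal sketch under stated assumptions, not the linked application's code: `FakeRedis` is an in-memory stand-in written here that mimics only the `incr`, `set` and `get` calls, and the `mini_url:` key names and the `UrlShortener` class are hypothetical.

```ruby
# In-memory stand-in for a Redis client, exposing only incr/set/get.
# (Assumption for illustration; a real app would talk to a Redis server.)
class FakeRedis
  def initialize
    @data = {}
  end

  # Increment an integer counter, starting from 0 if the key is missing.
  def incr(key)
    @data[key] = @data.fetch(key, 0).to_i + 1
  end

  def set(key, value)
    @data[key] = value
    "OK"
  end

  # Return the stored value as a string, or nil for a missing key.
  def get(key)
    value = @data[key]
    value.nil? ? nil : value.to_s
  end
end

class UrlShortener
  def initialize(store)
    @store = store
  end

  # Allocate the next counter value and use its base-36 form as the short code.
  def shorten(url)
    code = @store.incr("mini_url:next_id").to_s(36)
    @store.set("mini_url:#{code}", url)
    code
  end

  # Look the original URL up by its short code; nil if unknown.
  def expand(code)
    @store.get("mini_url:#{code}")
  end
end
```

Usage is `shortener = UrlShortener.new(FakeRedis.new)`, then `code = shortener.shorten("http://www.pikasoft.com/")` and `shortener.expand(code)`. Because the shortener only relies on three calls, any client object that responds to `incr`, `set` and `get` in the same way could be passed in instead of the stand-in.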
  15. Riak. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis • Example: – http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html • Backing Articles: – http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html – http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/ • Code:
  16. Agenda. The Goal: Exploring Big Data. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak / MongoDB, through Elastic MapReduce)
  17. MongoDB. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis • Example: – http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html • Backing Articles: – http://www.mongodb.org/display/DOCS/Building+for+Linux • Code: – http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html So let’s sum up -- after a handful of posts and a small but still sorrowful amount of command-line and Rails code, we’ve managed to accomplish the following "Hello World" tasks in NoSQL on the cloud: • Created a cloud account • Got our first app created, and saw it in a browser on the web • Loaded up real development environments (Ruby/Rails we added, Java we got for free) • Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything) • Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL • Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address: Rails Mongo Notes Example. Just to wrap the little app up: I updated John Nunemaker’s MongoMapper demo app to work with Rails 3 and the cloud, and if you like you can take a look at the code for it here: Rails Mongo Code. Play with this online at: http://jkr-code.dyndns.org:3000/notes
  18. Cassandra. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis • Example: – http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html • Backing Articles: – http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx • Code: Here’s what the code for that broadcast might look like:

         # A tweeter has many followers
         class Tweeter < ActiveRecord::Base
           has_many :followers
         end

         # Each follower record belongs to one tweeter
         class Follower < ActiveRecord::Base
           belongs_to :tweeter
         end

     All fine so far -- that’s the twittery world we all live in. I can send out my breathless message of what I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the messages from the other tweeters):

         # Load every tweeter, then walk each tweeter's followers
         @tweeters = Tweeter.all
         @tweeters.each do |tweeter|
           @followers = tweeter.followers
           @followers.each do |follower|
             # broadcast_to is the slide's illustrative method, not a Rails API
             tweeter.broadcast_to :recipient => follower
           end
         end

     So here we’re going to do a query for each of the X tweeters, and for them we’ll do another query for each of their Y followers. Code smell! Fail Whale!!!
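
One common way around the nested read-time loop on slide 18 is to fan out on write: copy each message into a per-follower timeline row when it is sent, so that reading a timeline is a single key lookup. The sketch below illustrates that idea in plain Ruby and is not taken from the linked post. `TimelineStore` is a hash-of-arrays stand-in for a wide-row store, and the `FOLLOWERS` data and `broadcast` helper are hypothetical.

```ruby
# Hash-of-arrays stand-in for a wide-row store: one timeline row per follower.
# (Assumption for illustration; no Cassandra client is used here.)
class TimelineStore
  attr_reader :reads

  def initialize
    @rows = Hash.new { |hash, key| hash[key] = [] }
    @reads = 0
  end

  # Append a message to one follower's timeline row.
  def append(follower, message)
    @rows[follower] << message
  end

  # Reading a timeline is a single lookup by key, counted in @reads.
  def timeline(follower)
    @reads += 1
    @rows[follower].dup
  end
end

# Hypothetical follower lists, keyed by tweeter.
FOLLOWERS = { "alice" => %w[bob carol], "dave" => %w[bob] }.freeze

# Fan out on write: copy the message into every follower's row at send time.
def broadcast(store, followers_of, tweeter, text)
  followers_of.fetch(tweeter, []).each do |follower|
    store.append(follower, "#{tweeter}: #{text}")
  end
end
```

After `broadcast(store, FOLLOWERS, "alice", "eggs")`, a call to `store.timeline("bob")` costs one read no matter how many tweeters Bob follows. The trade-off is extra writes and duplicated data, which is the kind of workload wide-row stores are generally designed to absorb.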
  19. Agenda. Exploring Big Data. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak / MongoDB, through Elastic MapReduce)
  20. Neo4J. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis • Example: – http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html • Backing Articles: – http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/ • Code: Play with this online at: Six Degrees of Kevin Bacon, http://jkr-blog.dyndns.org:9292/
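
The "Six Degrees of Kevin Bacon" demo on slide 20 is at heart a shortest-path query over a graph of co-stars. The sketch below shows that query as a breadth-first search in plain Ruby, without Neo4J; the `GRAPH` adjacency list uses placeholder names rather than real filmography data, and the `shortest_path` and `degrees` helpers are written here for illustration.

```ruby
# Undirected co-star graph as an adjacency list. The names and edges are
# placeholders, not real filmography data.
GRAPH = {
  "Actor A" => ["Actor B"],
  "Actor B" => ["Actor A", "Actor C"],
  "Actor C" => ["Actor B", "Kevin Bacon"],
  "Kevin Bacon" => ["Actor C"],
  "Actor D" => []
}.freeze

# Breadth-first search: returns the shortest path as an array of names,
# or nil when the two nodes are not connected.
def shortest_path(graph, from, to)
  return [from] if from == to

  previous = { from => nil }
  queue = [from]
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbor|
      next if previous.key?(neighbor)

      previous[neighbor] = node
      if neighbor == to
        # Walk the predecessor links back to the start node.
        path = [to]
        path.unshift(previous[path.first]) while previous[path.first]
        return path
      end
      queue << neighbor
    end
  end
  nil
end

# Degrees of separation = number of edges on the shortest path.
def degrees(graph, from, to)
  path = shortest_path(graph, from, to)
  path && path.length - 1
end
```

For example, `degrees(GRAPH, "Actor A", "Kevin Bacon")` walks A, B, C and then Kevin Bacon. A graph database earns its keep when the graph is too large or too frequently updated to hold in a Ruby hash like this one.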
  21. Agenda. Exploring Big Data. Just Two Solutions – Here’s How We Get There: • Key-Value Data Stores (Redis, Riak) • Document Data Stores (MongoDB, Cassandra) • Graph Data Stores (Neo4J) • MapReduce (through Hadoop, through Riak, through Elastic MapReduce)
  22. MapReduce via Hadoop, Thrift and AWS • Example: – http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html • Backing Articles: – http://www.joelonsoftware.com/items/2006/08/01.html • Code: (Diagram labels: Map, Reduce)
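
The Map and Reduce stages named on slide 22 can be illustrated with the classic word count. This is a single-process sketch in plain Ruby that does not use Hadoop, Thrift or AWS; the function names (`map_line`, `shuffle`, `reduce_counts`, `word_count`) are chosen here for illustration, and the shuffle step stands in for the grouping a MapReduce framework performs between the two phases.

```ruby
# Map phase: emit a [word, 1] pair for every word in one input line.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Shuffle phase: group emitted values by key, as the framework would
# do between the map and reduce phases.
def shuffle(pairs)
  pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(key, value), groups|
    groups[key] << value
  end
end

# Reduce phase: collapse each key's list of values into a single count.
def reduce_counts(groups)
  groups.each_with_object({}) do |(key, values), totals|
    totals[key] = values.reduce(0, :+)
  end
end

# Run all three phases over an array of input lines.
def word_count(lines)
  reduce_counts(shuffle(lines.flat_map { |line| map_line(line) }))
end
```

Calling `word_count(["Big data is fast data", "Fast data wins"])` returns a hash of word totals. In a real cluster each `map_line` call and each per-key reduction can run on a different machine, which is where the parallel speed-up comes from.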
  23. MapReduce via Riak / MongoDB • Example: – http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/ – http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out • Backing Articles: – MapReduce on Riak • http://wiki.basho.com/MapReduce.html • http://stackoverflow.com/questions/2123004/mapreduce-with-riak • http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php • http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-store – MapReduce on MongoDB • http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/ • http://www.mongodb.org/display/DOCS/MapReduce • http://jonathanhui.com/mongodb-mapreduce • http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/ Image source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
  24. Elastic MapReduce • Example: – http://www.commoncrawl.org/mapreduce-for-the-masses/ • Backing Articles: – http://www.commoncrawl.org/mapreduce-for-the-masses/ • Code:
  25. Summary. This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine. The Solution Tools (Slide 9) Become Straightforward if We Run Them on a Standard Architecture. “One man’s noise is another man’s data.” ~ Bill Stensrud, InstantEncore
  26. Contacts • John Repko: john.repko@pikasoft.com http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx