A Walk Down NOSQL
  Lane in the Cloud
    Part 2: Riak
  NYC Cloud Computing Group, March 2011

                       Alexander Sicular
                              @siculars
Who is this blowhard?
Columbia University pays my mortgage

For the better part of a decade in Medical
Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast
particularly in the area of Informatics
Riak, eh?
Dynamo inspired

Homogeneous

Single key-space

Distributed

Replicated

Predictable
scalability
Origins
Show me your friends...


Amazon’s Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html



Akamai
http://www.basho.com/bios.html




                                           Paramount Home Video
CAP Theorem
                       http://en.wikipedia.org/wiki/CAP_theorem




        Consistency

        Availability

        Partition tolerance


        Pick two?

                                                                  http://guide.couchdb.org/draft/consistency.html

Riak says: pick two at a time.
Homogeneous

Every node is the
same

Any node can service
any request

Nodes gossip on their
own port
One Ring to Rule Them
Single 160-bit key space

Huh?

No Sharding!
Distributed            (!= replicated)

riak is not sharded
vnodes = units of distribution

vnodes != physical nodes (pnodes)

vnodes map to pnodes

data is distributed at the vnode level

★ Considerations:
  - must plan maximum ring size
  - think about number of vnodes per pnode
  - generally no less than 10 vnodes per pnode
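
A minimal sketch of how a key could land on a partition of the 160-bit ring. It assumes SHA-1 as the hash (it produces 160-bit values), a ring of 64 equal partitions, and a plain "bucket/key" string as the hash input. The string encoding, the round-robin ownership and the names `ringPosition`, `partitionFor` and `ownerOf` are illustrative assumptions, not Riak's actual internals.

```javascript
const crypto = require("crypto");

const RING_SIZE = 64;          // partition (vnode) count, assumed for illustration
const KEY_SPACE = 2n ** 160n;  // SHA-1 yields 160-bit values

// Hash a bucket/key pair onto the ring.
// Assumption: the "bucket/key" string encoding is illustrative only.
function ringPosition(bucket, key) {
  const hex = crypto.createHash("sha1").update(bucket + "/" + key).digest("hex");
  return BigInt("0x" + hex);
}

// Map a ring position to one of ringSize equal-sized partitions.
function partitionFor(bucket, key, ringSize = RING_SIZE) {
  const width = KEY_SPACE / BigInt(ringSize);
  return Number(ringPosition(bucket, key) / width);
}

// Hand partitions to physical nodes round-robin (hypothetical ownership rule).
function ownerOf(partition, pnodes) {
  return pnodes[partition % pnodes.length];
}

const partition = partitionFor("myBucket", "myKey");
console.log(partition, ownerOf(partition, ["node1", "node2", "node3"]));
```

Because the partition count is fixed while the list of pnodes can change, adding a machine only reassigns whole partitions; keys never need to be rehashed.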
Conflict Resolution
    Vector Clocks

    ancestry / divergence maintained

    automatic or manual resolution
★   Considerations:

      X-Riak-ClientId,

      X-Riak-Vclock

      allow_mult
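
To make "ancestry / divergence" concrete, here is a small sketch of vector clock comparison. It models a clock as a plain `{ actor: counter }` object, which is an assumption for illustration; the value Riak returns in X-Riak-Vclock is an opaque encoded string, and `increment`, `descends` and `compare` are hypothetical helper names.

```javascript
// A vector clock is modelled as a plain object: { actorId: counter }.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True when clock `a` has seen every event recorded in clock `b`.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Classify the relationship between two clocks.
function compare(a, b) {
  const ab = descends(a, b);
  const ba = descends(b, a);
  if (ab && ba) return "equal";
  if (ab) return "after";       // a is a descendant of b
  if (ba) return "before";      // a is an ancestor of b
  return "concurrent";          // siblings: needs resolution
}

const base = increment({}, "clientA");       // { clientA: 1 }
const left = increment(base, "clientA");     // { clientA: 2 }
const right = increment(base, "clientB");    // { clientA: 1, clientB: 1 }

console.log(compare(left, base));   // after
console.log(compare(left, right));  // concurrent
```

The "concurrent" case is the one where automatic or manual resolution has to pick or merge a winner.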
Replicated          (!= distributed)



configurable replication values (“N”)

configurable consistency and availability
values at read and write time

- read

- write

- durable write
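
The trade-off between the replication value N and the per-request read and write values can be sketched with a little arithmetic. The helper names below are hypothetical; the only facts assumed are the usual Dynamo-style rules that a majority quorum is floor(N/2) + 1 and that a read is guaranteed to overlap the latest successful write when R + W > N.

```javascript
// Majority quorum for N replicas.
function quorum(n) {
  return Math.floor(n / 2) + 1;
}

// A read of r replicas must intersect a write to w replicas when r + w > n.
function readSeesLatestWrite(n, r, w) {
  return r + w > n;
}

// How many replicas can be unreachable while a request needing `needed` replies still succeeds.
function tolerableFailures(n, needed) {
  return n - needed;
}

const N = 3;
console.log(quorum(N));                     // 2
console.log(readSeesLatestWrite(N, 2, 2));  // true
console.log(readSeesLatestWrite(N, 1, 1));  // false
console.log(tolerableFailures(N, 1));       // 2
```

Lower values favour availability and latency; higher values favour consistency, and the choice can differ per request.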
Predictable Scalability

    How much performance per node?

    Scale in both directions
>   bin/riak-admin
>   Usage: riak-admin { join | leave | backup | restore | test | status |
    reip | js_reload | wait-for-service | ringready | transfers }
Data Agnostic
      schemaless

      data objects may be of any type

      binary, text (json, xml)

      use content types

>curl -v -d 'this is a test' -H "Content-Type: text/plain" \
   http://127.0.0.1:8098/riak/testBucket/testKey
Extra Goodies

Erlang
http://www.pragprog.com/titles/jaerlang/programming-erlang



Code Architecture

basho_bench

Multiple backends

       bitcask, innodb, mem
Code architecture
Highly modularized

  riak_core

  riak_kv

  bitcask

  erlang_js

                     http://bitbucket.org/basho
basho_bench

Performance profiling

highly customizable

pretty pictures

key/value store generalized
https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench

http://pics.livejournal.com/demmonoid/pic/00001sa7
Bitcask
Riak’s default disk backend

Append-only (write-once) log

Heavy updates will grow your footprint
  -   Look into compaction/merging settings

Keys are cached in memory with disk offsets
https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
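
The Bitcask points above (log-structured writes, an in-memory key index with offsets, growth under heavy updates, merging) can be modelled in a few lines. This is a toy in-memory stand-in, not Bitcask's real file format or API; the class `TinyLog` and its methods are hypothetical, and an array index plays the role of a disk offset.

```javascript
// Toy model of a log-structured store with an in-memory key directory.
class TinyLog {
  constructor() {
    this.log = [];            // stands in for the on-disk data file
    this.keydir = new Map();  // key -> offset of the latest entry
  }

  // Every write is appended; older entries for the key become garbage.
  put(key, value) {
    this.keydir.set(key, this.log.length);
    this.log.push({ key, value });
  }

  // One index lookup, then one "seek" into the log.
  get(key) {
    const offset = this.keydir.get(key);
    return offset === undefined ? undefined : this.log[offset].value;
  }

  // Merge/compaction: rewrite the log keeping only live entries.
  merge() {
    const live = [];
    const dir = new Map();
    for (const [key, offset] of this.keydir) {
      dir.set(key, live.length);
      live.push(this.log[offset]);
    }
    this.log = live;
    this.keydir = dir;
  }
}

const store = new TinyLog();
store.put("a", 1);
store.put("a", 2);
store.put("a", 3);
store.put("b", 10);
console.log(store.log.length);  // 4 entries for 2 live keys
store.merge();
console.log(store.log.length);  // 2
```

The sketch also shows why every key has to fit in memory: the key directory holds one entry per live key regardless of value size.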
Speak my language?
   HTTP
http://wiki.basho.com/display/RIAK/REST+API



   Protocol Buffers
http://wiki.basho.com/display/RIAK/PBC+API



   Native Erlang
http://wiki.basho.com/display/RIAK/Erlang+Client+PBC


                          http://www.zazzle.com/speak_to_me_in_tagalog_tshirt-235376204895796392
Ok sounds good.
    How do I get it?
>git|hg clone http://bitbucket.org/basho/riak
>cd riak
>make all && make rel

           OR if you’re on a mac:
>brew install riak
Ok sounds good.
        How do I get it?
>git|hg clone http://bitbucket.org/basho/riak_search
>cd riak_search
>make all && make rel

            OR if you’re on a mac:
>brew install riak-search
What does that get me?


 Fully functional

 Self contained (<3)

 Default configuration

-64 vnodes, “riak” cookie, N = 3
Work... like so.

      Config files
http://wiki.basho.com/display/RIAK/Configuration+Files




app.config
-ring_creation_size


vm.args
-name, -setcookie
Fire it up


>   bin/riak

>   Usage: riak {start|stop|restart|reboot|ping|console|attach}

>   bin/riak start
Do Stuff!
    GET:

>   curl -v http://127.0.0.1:8098/ping

>   curl -v http://127.0.0.1:8098/stats

>   curl -v http://127.0.0.1:8098/riak/myBucket

>   curl -v http://127.0.0.1:8098/riak/myBucket/myKey

    PUT:

>   curl -v -X PUT -H "Content-Type: application/json" \
      -d '{"backend": "ets"}' http://127.0.0.1:8098/riak/myBucket

>   curl -v -X PUT -d 'test key' \
      http://127.0.0.1:8098/riak/myBucket/myKey

>   curl -v -X POST -d 'autogen key' \
      http://127.0.0.1:8098/riak/myBucket
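
For anyone driving the same endpoints from Node rather than curl, a rough equivalent using only the built-in `http` module might look like this. The host, port and `/riak` prefix mirror the curl examples; the helper names `riakOptions` and `riakRequest` are made up for this sketch and are not part of any Riak client.

```javascript
const http = require("http");

// Build http.request options for a bucket (POST, key is generated) or a bucket/key.
function riakOptions(method, bucket, key, contentType) {
  const parts = ["riak", bucket].concat(key ? [key] : []);
  const options = {
    host: "127.0.0.1",
    port: 8098,
    method: method,
    path: "/" + parts.map((p) => encodeURIComponent(p)).join("/"),
    headers: {},
  };
  if (contentType) options.headers["Content-Type"] = contentType;
  return options;
}

// Send the request; the callback receives (err, statusCode, body).
function riakRequest(options, body, callback) {
  const req = http.request(options, (res) => {
    let data = "";
    res.setEncoding("utf8");
    res.on("data", (chunk) => { data += chunk; });
    res.on("end", () => callback(null, res.statusCode, data));
  });
  req.on("error", callback);
  if (body !== undefined) req.write(body);
  req.end();
}

// Example (needs a running node on 127.0.0.1:8098, so it is left commented out):
// riakRequest(riakOptions("PUT", "myBucket", "myKey", "text/plain"), "test key",
//   (err, status) => console.log(err || status));
```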
Links
Lightweight Graphing

Practical limitations re. number of links per
object

Unidirectional object linking

relationship modeling (one to one, one to many)

Returns “Content-Type: multipart/mixed;”

  - Library needs to be multipart aware
  - nodejs, formidable
Link Walking
First level depth
>curl http://localhost:8098/riak/myBucket/myKey/_,_,_

Via Map/Reduce
>$ curl -X POST -H "content-type:application/json" \
   http://localhost:8098/mapred --data @-
{"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
^D

N level depth
>curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_


More Info:
http://blog.basho.com/2010/02/24/link-walking-by-example/
http://wiki.basho.com/display/RIAK/Links
http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
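
The link-walking URLs above follow a regular pattern: one `bucket,tag,keep` triple per level, with `_` as the wildcard. A small builder, sketched under the assumption that `keep` is written as 1 or 0 when set explicitly, makes the pattern easier to see. The function name and the step object shape are hypothetical.

```javascript
// Each step is { bucket, tag, keep }; any missing field becomes the "_" wildcard.
function linkWalkPath(bucket, key, steps) {
  const specs = steps.map((step) => {
    const keep = step.keep === undefined ? "_" : (step.keep ? "1" : "0");
    return [step.bucket || "_", step.tag || "_", keep].join(",");
  });
  return ["", "riak", bucket, key].concat(specs).join("/");
}

// First level depth, everything wildcarded:
console.log(linkWalkPath("myBucket", "myKey", [{}]));
// Two levels, following only "friend" links into the "people" bucket at the second level:
console.log(linkWalkPath("myBucket", "myKey", [{}, { bucket: "people", tag: "friend", keep: true }]));
```

The bucket name "people" and the tag "friend" are placeholders chosen for the example.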
Map/Reduce
Functions written in either Erlang or
JavaScript

Map is distributed to where the data lives

Reduce is run on the node coordinating the
M/R

Erlang > JavaScript

Tweak JavaScript settings in app.config
M/R in Riak
  An input to start from

      bucket

      list of keys / keyfilter

  ★   keys > bucket

  possible link phase

  one or more map phases

  (many) possible reduce phase(s)

    function(v, keydata, args) {
      if (v.values) {
        var ret = [], o = {};
        o = Riak.mapValuesJson(v)[0];
        o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
        o.key = v["key"];
        ret.push(o);
        return ret;
      } else {
        return [];
      }
    };

Map = SQL Select/Where clause
Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
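
The SQL analogy can be tried out locally without a cluster. The sketch below runs a map phase and a reduce phase over an in-memory array; `runMapReduce`, `postsWithTag`, `countAll` and the sample data are all hypothetical, and only the function shapes (map takes value, key data and an argument; reduce takes a list and an argument, and both return lists) follow the examples in this deck.

```javascript
// Run one map phase and one reduce phase over an in-memory array (no Riak involved).
function runMapReduce(inputs, mapFn, reduceFn, arg) {
  // Map: each input yields a list of results; the lists are concatenated.
  const mapped = inputs.reduce((acc, v) => acc.concat(mapFn(v, undefined, arg)), []);
  // Reduce: takes a list and returns a list.
  return reduceFn(mapped, arg);
}

// Map ~ SELECT ... WHERE: emit a partial count only for posts carrying the tag.
function postsWithTag(post, keydata, arg) {
  return post.tags.indexOf(arg.tag) !== -1 ? [{ tag: arg.tag, count: 1 }] : [];
}

// Reduce ~ COUNT(*): sum the partial counts into a single row.
function countAll(values, arg) {
  const total = values.reduce((sum, v) => sum + v.count, 0);
  return [{ tag: arg.tag, count: total }];
}

const posts = [
  { title: "one", tags: ["riak", "nosql"] },
  { title: "two", tags: ["redis"] },
  { title: "three", tags: ["riak"] },
];

const result = runMapReduce(posts, postsWithTag, countAll, { tag: "riak" });
console.log(result);  // [ { tag: 'riak', count: 2 } ]
```

The reduce output has the same shape as its input on purpose, so the function still gives the right answer if it is applied again to a mix of partial results.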
Pre/Post Commit Hooks
 Pre Commit

   JavaScript or Erlang

   Validation

   Modify data

   Kill writes

 Post Commit

   Erlang

   Indexing

   Messaging
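
As an illustration of the "Validation" and "Kill writes" roles of a pre-commit hook, here is a sketch of a JavaScript validator. It assumes the hook receives the object in the same shape the map functions in this deck see (`object.values[0].data`), and that returning the object accepts the write while returning `{ fail: reason }` rejects it. Both the return convention and the function name `requireTitle` should be checked against the Riak documentation before use.

```javascript
// Hypothetical pre-commit hook: only accept JSON bodies that carry a non-empty "title".
function requireTitle(object) {
  try {
    var doc = JSON.parse(object.values[0].data);
    if (doc && typeof doc.title === "string" && doc.title.length > 0) {
      return object;                       // accept the write unchanged
    }
    return { fail: "missing title" };      // kill the write
  } catch (e) {
    return { fail: "body is not valid JSON" };
  }
}

// Local check with a hand-built object (shape assumed, see above).
var good = { key: "post1", values: [{ data: '{"title": "hello"}' }] };
var bad = { key: "post2", values: [{ data: "not json" }] };
console.log(requireTitle(good) === good);
console.log(requireTitle(bad));
```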
Chief complaints
No index

No native sort

No increment

No native data
structures
Riak Search
Betalicious

Superset of Riak

Full text search

http://wiki.basho.com/display/RIAK/Riak+Search
http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010
                                         http://www.seowebworx.co.uk/
Riak Search... more

uses a modified bitcask backend called
merge_index

enabled on a per bucket basis

access via http and command line
Riak-JS
NodeJS Riak module

Written in CoffeeScript

HTTP and Protobuf

Customizable via “meta” options

http://riakjs.org
Code demo
nodejs

riak-js

redis

simple post site

tags

json data passing
Javascript Map
var map = function(v, keydata, args) {
  if (v.values) {
    var ret = [], o = {};
    o = Riak.mapValuesJson(v)[0];
    o.key = v["key"]; // put the key in the returned data object
    // parse the date header into a millisecond timestamp so it can be sorted numerically
    o.lastModified = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
    ret.push(o);
    return ret;
  } else {
    return [];
  }
};
Javascript Reduce
var sortInt = function (data, args) {
  var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
  var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
  data.sort(function (a, b) {
    if (desc) {
      // swap the operands to reverse the sort order
      var _ref = [b, a];
      a = _ref[0];
      b = _ref[1];
    }
    return a[sortBy] - b[sortBy];
  });
  return data;
};
Putting it all together
riak
  .add("bucket")
  // map function
  .map(map)
  // reduce function
  .reduce(sortInt, { field: "lastModified", order: "desc" })
  .run(function(err, response) {
    // send out an error if there is one, and stop here
    if (err) return res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
    // otherwise send the data back...
    res.simpleJSON(200, { response: response });
  });
Hybrid architectures are
           the future!
Use tools like Redis to augment shortcomings!
1,456,023 Or “A Lot”

At scale, precision
does not matter in
practice.

   Google

   Twitter


                      http://photography.nationalgeographic.com/photography/enlarge/okavango-cape-buffalo_pod_image.html
Google
         Look Ma!

         No exact counts!
Twitter


No Totals!



                       No Pagination!
Questions?


NYC Cloud Computing Group, March 2011

                     Alexander Sicular
                            @siculars

More Related Content

What's hot

Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
Roll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaRoll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaJon Moore
 
HTML5 JavaScript Interfaces
HTML5 JavaScript InterfacesHTML5 JavaScript Interfaces
HTML5 JavaScript InterfacesAaron Gustafson
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talkLocaweb
 
Forget the Web
Forget the WebForget the Web
Forget the WebRemy Sharp
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1medcl
 
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...PROIDEA
 
No more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyNo more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyMathias Herberts
 
4069180 Caching Performance Lessons From Facebook
4069180 Caching Performance Lessons From Facebook4069180 Caching Performance Lessons From Facebook
4069180 Caching Performance Lessons From Facebookguoqing75
 
Testing Web Applications with GEB
Testing Web Applications with GEBTesting Web Applications with GEB
Testing Web Applications with GEBHoward Lewis Ship
 
Facebook的缓存系统
Facebook的缓存系统Facebook的缓存系统
Facebook的缓存系统yiditushe
 
BlockChain implementation by python
BlockChain implementation by pythonBlockChain implementation by python
BlockChain implementation by pythonwonyong hwang
 
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Mathias Herberts
 
Using ngx_lua in UPYUN
Using ngx_lua in UPYUNUsing ngx_lua in UPYUN
Using ngx_lua in UPYUNCong Zhang
 

What's hot (20)

Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
Whats new in iOS5
Whats new in iOS5Whats new in iOS5
Whats new in iOS5
 
Nodejs - A quick tour (v6)
Nodejs - A quick tour (v6)Nodejs - A quick tour (v6)
Nodejs - A quick tour (v6)
 
Roll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and LuaRoll Your Own API Management Platform with nginx and Lua
Roll Your Own API Management Platform with nginx and Lua
 
Da APK al Golden Ticket
Da APK al Golden TicketDa APK al Golden Ticket
Da APK al Golden Ticket
 
HTML5 JavaScript Interfaces
HTML5 JavaScript InterfacesHTML5 JavaScript Interfaces
HTML5 JavaScript Interfaces
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talk
 
Forget the Web
Forget the WebForget the Web
Forget the Web
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
dotCloud and go
dotCloud and godotCloud and go
dotCloud and go
 
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
 
No more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyNo more (unsecure) secrets, Marty
No more (unsecure) secrets, Marty
 
4069180 Caching Performance Lessons From Facebook
4069180 Caching Performance Lessons From Facebook4069180 Caching Performance Lessons From Facebook
4069180 Caching Performance Lessons From Facebook
 
Testing Web Applications with GEB
Testing Web Applications with GEBTesting Web Applications with GEB
Testing Web Applications with GEB
 
Facebook的缓存系统
Facebook的缓存系统Facebook的缓存系统
Facebook的缓存系统
 
BlockChain implementation by python
BlockChain implementation by pythonBlockChain implementation by python
BlockChain implementation by python
 
Groovy.pptx
Groovy.pptxGroovy.pptx
Groovy.pptx
 
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
 
Geth important commands
Geth important commandsGeth important commands
Geth important commands
 
Using ngx_lua in UPYUN
Using ngx_lua in UPYUNUsing ngx_lua in UPYUN
Using ngx_lua in UPYUN
 

Similar to I don't have an exact count of the number of search queries Google handles each day. Google processes extremely large volumes of queries, but keeps many statistics private

Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Trickssiculars
 
Apache Spark for Library Developers with William Benton and Erik Erlandson
 Apache Spark for Library Developers with William Benton and Erik Erlandson Apache Spark for Library Developers with William Benton and Erik Erlandson
Apache Spark for Library Developers with William Benton and Erik ErlandsonDatabricks
 
PUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPhilipp Fehre
 
Real World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationReal World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationBen Hall
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011JRuby + Rails = Awesome Java Web Framework at Jfokus 2011
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011Nick Sieger
 
Design Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonDesign Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonManageIQ
 
Couchdb: No SQL? No driver? No problem
Couchdb: No SQL? No driver? No problemCouchdb: No SQL? No driver? No problem
Couchdb: No SQL? No driver? No problemdelagoya
 
Behavior driven oop
Behavior driven oopBehavior driven oop
Behavior driven oopPiyush Verma
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Chris Alfano
 
Ruby MVC from scratch with Rack
Ruby MVC from scratch with RackRuby MVC from scratch with Rack
Ruby MVC from scratch with RackDonSchado
 
Integrating React.js Into a PHP Application
Integrating React.js Into a PHP ApplicationIntegrating React.js Into a PHP Application
Integrating React.js Into a PHP ApplicationAndrew Rota
 
Node.js - async for the rest of us.
Node.js - async for the rest of us.Node.js - async for the rest of us.
Node.js - async for the rest of us.Mike Brevoort
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applicationsTom Croucher
 
Reactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz KhambatiReactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz KhambatiAziz Khambati
 
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQuery
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQueryRemedie: Building a desktop app with HTTP::Engine, SQLite and jQuery
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQueryTatsuhiko Miyagawa
 

Similar to I don't have an exact count of the number of search queries Google handles each day. Google processes extremely large volumes of queries, but keeps many statistics private (20)

Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Tricks
 
Play vs Rails
Play vs RailsPlay vs Rails
Play vs Rails
 
JS everywhere 2011
JS everywhere 2011JS everywhere 2011
JS everywhere 2011
 
Apache Spark for Library Developers with William Benton and Erik Erlandson
 Apache Spark for Library Developers with William Benton and Erik Erlandson Apache Spark for Library Developers with William Benton and Erik Erlandson
Apache Spark for Library Developers with William Benton and Erik Erlandson
 
Rack
RackRack
Rack
 
PUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY Riak
 
Real World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS ApplicationReal World Lessons on the Pain Points of Node.JS Application
Real World Lessons on the Pain Points of Node.JS Application
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011JRuby + Rails = Awesome Java Web Framework at Jfokus 2011
JRuby + Rails = Awesome Java Web Framework at Jfokus 2011
 
Design Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonDesign Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron Patterson
 
Couchdb: No SQL? No driver? No problem
Couchdb: No SQL? No driver? No problemCouchdb: No SQL? No driver? No problem
Couchdb: No SQL? No driver? No problem
 
Behavior driven oop
Behavior driven oopBehavior driven oop
Behavior driven oop
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011
 
Ruby MVC from scratch with Rack
Ruby MVC from scratch with RackRuby MVC from scratch with Rack
Ruby MVC from scratch with Rack
 
Integrating React.js Into a PHP Application
Integrating React.js Into a PHP ApplicationIntegrating React.js Into a PHP Application
Integrating React.js Into a PHP Application
 
Node.js - async for the rest of us.
Node.js - async for the rest of us.Node.js - async for the rest of us.
Node.js - async for the rest of us.
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applications
 
Reactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz KhambatiReactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz Khambati
 
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQuery
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQueryRemedie: Building a desktop app with HTTP::Engine, SQLite and jQuery
Remedie: Building a desktop app with HTTP::Engine, SQLite and jQuery
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

I don't have an exact count of the number of search queries Google handles each day. Google processes extremely large volumes of queries, but keeps many statistics private

  • 1. A Walk Down NOSQL Lane in the Cloud Part 2: Riak NYC Cloud Computing Group, March 2011 Alexander Sicular @siculars
  • 2. Who is this blowhard? Columbia University pays my mortgage For the better part of a decade in Medical Informatics Am not shilling for any of these companies Am not a computer scientist Am a computer science enthusiast particularly in the area of Informatics
  • 3. Riak, eh? Dynamo inspired Homogeneous Single key-space Distributed Replicated Predictable scaleability
  • 4. Origins Show me your friends... Amazon’s Dynamo http://www.allthingsdistributed.com/ 2007/10/amazons_dynamo.html Akamai http://www.basho.com/bios.html Paramount Home Video
  • 5. CAP Theorem http://en.wikipedia.org/wiki/CAP_theorem Consistency Availability Partition tolerance Pick two? http://guide.couchdb.org/draft/consistency.html Riak says: pick two at a time.
  • 6. Homogeneous Every node is the same Any node can service any request Nodes gossip on their own port
  • 7. One Ring to Rule Them Single 160 bit key space Huh? No Sharding!
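The "single 160 bit key space" can be pictured as the output range of SHA-1, cut into equal partitions. The following is a minimal sketch, assuming the key is hashed as the plain string `bucket/key` (Riak hashes its own encoding of the bucket/key pair, so real ring positions will differ). `ringPosition` and `partitionIndex` are hypothetical helper names, not Riak APIs.

```javascript
const crypto = require('crypto');

// 2^160: the size of the SHA-1 output space that forms the ring.
const RING_TOP = 2n ** 160n;

// Hash a bucket/key pair to a 160-bit integer position on the ring.
// Assumption: plain "bucket/key" string input, for illustration only.
function ringPosition(bucket, key) {
  const hex = crypto.createHash('sha1').update(bucket + '/' + key).digest('hex');
  return BigInt('0x' + hex);
}

// Map a ring position to one of ringSize equal partitions.
// Assumption: ringSize is a power of two, so it divides 2^160 evenly.
function partitionIndex(position, ringSize) {
  const partitionWidth = RING_TOP / BigInt(ringSize);
  return Number(position / partitionWidth);
}

const pos = ringPosition('myBucket', 'myKey');
console.log(partitionIndex(pos, 64)); // an integer from 0 to 63
```

Because every key lands somewhere on the same ring, there is no application-level shard map to maintain, which is the point of "No Sharding!".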
  • 8. Distributed (!= replicated) riak is not sharded vnodes = units of distribution vnodes != physical nodes (pnodes) vnodes map to pnodes data is distributed at the vnode level ★ Considerations: -must plan maximum ring size -think about number of vnodes per pnode -generally no less than 10 vnodes per pnode
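The vnode-to-pnode mapping and the "no less than 10 vnodes per pnode" rule of thumb can be sketched as below. This assumes simple round-robin assignment as a stand-in for Riak's actual claim algorithm, and the function names are hypothetical.

```javascript
// Assign each vnode (ring partition) to a physical node round-robin.
// Assumption: round-robin stands in for Riak's real claim algorithm.
function assignVnodes(ringSize, pnodes) {
  const owners = [];
  for (let vnode = 0; vnode < ringSize; vnode++) {
    owners.push(pnodes[vnode % pnodes.length]);
  }
  return owners;
}

// Count vnodes per pnode to check the "no less than 10" rule of thumb.
function vnodesPerPnode(owners) {
  const counts = {};
  for (const pnode of owners) {
    counts[pnode] = (counts[pnode] || 0) + 1;
  }
  return counts;
}

const owners = assignVnodes(64, ['node1', 'node2', 'node3', 'node4']);
console.log(vnodesPerPnode(owners)); // 16 vnodes on each of the 4 nodes
```

With a ring of 64 vnodes, the rule of thumb is satisfied up to about six pnodes (64 / 10 = 6.4), which is why the maximum ring size has to be planned before the cluster grows.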
  • 9. Conflict Resolution Vector Clocks ancestry / divergency maintained automatic or manual resolution ★ Considerations: X-Riak-ClientId, X-Riak-Vclock allow_mult
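How a vector clock tells ancestry from divergence can be sketched with a simplified model. This assumes a clock is a plain map of client id to counter. The clock Riak actually passes in `X-Riak-Vclock` is an opaque encoded value, and `descends` and `compare` are hypothetical names.

```javascript
// A vector clock here is a plain object: { clientId: counter }.
// Returns true if clock a has seen everything clock b has seen.
function descends(a, b) {
  return Object.keys(b).every((id) => (a[id] || 0) >= b[id]);
}

// Classify the relationship between two versions of an object.
function compare(a, b) {
  const aDescendsB = descends(a, b);
  const bDescendsA = descends(b, a);
  if (aDescendsB && bDescendsA) return 'equal';
  if (aDescendsB) return 'a is newer';
  if (bDescendsA) return 'b is newer';
  return 'siblings'; // divergent histories: needs resolution
}

console.log(compare({ alice: 2 }, { alice: 1 }));         // a is newer
console.log(compare({ alice: 1, bob: 1 }, { alice: 2 })); // siblings
```

In this model the "siblings" case is the one that calls for automatic or manual resolution. With `allow_mult` enabled, the client receives both versions and has to decide.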
  • 10. Replicated (!= distributed) configurable replication values (“N”) configurable consistency and availability values at read and write time - read - write - durable write
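The trade-off between the replication value N and the per-request read and write values can be illustrated with the usual quorum arithmetic. This sketch assumes a strict-quorum model. Riak's sloppy quorums and hinted handoff relax it in practice, and the function names are hypothetical.

```javascript
// Strict-quorum model: with N replicas, a read of R replicas must overlap
// the latest successful write of W replicas whenever R + W > N.
function readSeesLatestWrite(n, r, w) {
  return r + w > n;
}

// How many replicas can be unavailable while a request needing
// `quorum` responses can still succeed.
function toleratedFailures(n, quorum) {
  return n - quorum;
}

console.log(readSeesLatestWrite(3, 2, 2)); // true
console.log(readSeesLatestWrite(3, 1, 1)); // false
console.log(toleratedFailures(3, 2));      // 1
```

Lower R and W values favor availability and latency, while higher values favor consistency. Choosing them per request is what "pick two at a time" looks like in code.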
  • 11. Predictable Scalability How much performance per node? Scale in both directions > bin/riak-admin > Usage: riak-admin { join | leave | backup | restore | test | status | reip | js_reload | wait-for-service | ringready | transfers }
  • 12. Data Agnostic schemaless data objects may be of any type binary, text (json, xml) use content types >curl -v -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/riak/testBucket/testKey
  • 14. Code architecture Highly modularized riak_core riak_kv bitcask erlang_js http://bitbucket.org/basho
  • 15. basho_bench Performance profiling highly customizable pretty pictures key/value store generalized https://wiki.basho.com/display/RIAK/Benchmarking+with+Basho+Bench http://pics.livejournal.com/demmonoid/pic/00001sa7
  • 16. Bitcask Riak’s default disk backend Write Only Log Heavy updates will grow your footprint - Look into compaction/merging settings Keys are cached in memory with disk offsets https://spreadsheets.google.com/ccc?key=0Ak4OBkABJPsxdEowYXc2akxnYU9xNkJmbmZscnhaTFE&hl=en&authkey=CMHw8tYO
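Why heavy updates grow the footprint, and what merging buys back, can be shown with a toy model. This is an in-memory sketch of the idea only (an append-only log plus an in-memory key directory), not Bitcask's file format. `TinyBitcask` is a hypothetical name.

```javascript
// In-memory stand-in for an append-only data file plus a key directory.
class TinyBitcask {
  constructor() {
    this.log = [];           // append-only list of { key, value } entries
    this.keydir = new Map(); // key -> offset of the latest entry in the log
  }

  put(key, value) {
    this.log.push({ key, value });             // never overwrite in place
    this.keydir.set(key, this.log.length - 1); // point at the newest entry
  }

  get(key) {
    const offset = this.keydir.get(key);
    return offset === undefined ? undefined : this.log[offset].value;
  }

  // Merge: rewrite the log keeping only the live entry for each key.
  merge() {
    const live = [];
    for (const [key, offset] of this.keydir) {
      live.push(this.log[offset]);
      this.keydir.set(key, live.length - 1);
    }
    this.log = live;
  }
}

const store = new TinyBitcask();
store.put('k', 'v1');
store.put('k', 'v2');
console.log(store.log.length); // 2: the stale 'v1' entry still takes space
store.merge();
console.log(store.log.length); // 1
console.log(store.get('k'));   // v2
```

Every key lives in the in-memory directory, so in this model memory use grows with the number of keys rather than with the size of the values.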
  • 17. Speak my language? HTTP http://wiki.basho.com/display/RIAK/REST+API Protocol Buffers http://wiki.basho.com/display/RIAK/PBC+API Native Erlang http://wiki.basho.com/display/RIAK/Erlang+Client+PBC http://www.zazzle.com/speak_to_me_in_tagalog_tshirt-235376204895796392
  • 18. Ok sounds good. How do I get it? >git|hg clone http://bitbucket.org/basho/riak >cd riak >make all && make rel OR if you’re on a mac: >brew install riak
  • 19. Ok sounds good. How do I get it? >git|hg clone http://bitbucket.org/basho/riak_search >cd riak_search >make all && make rel OR if you’re on a mac: >brew install riak-search
  • 20. What does that get me? Fully functional Self contained (<3) Default configuration -64 vnodes, “riak” cookie, N = 3
  • 21. Work... like so. Config files http://wiki.basho.com/display/RIAK/Configuration+Files app.config -ring_creation_size vm.args -name, -settings
  • 22. Fire it up > bin/riak > Usage: riak {start|stop|restart|reboot|ping|console|attach} > bin/riak start
  • 23. Do Stuff!
      GET:
      > curl -v http://127.0.0.1:8098/ping
      > curl -v http://127.0.0.1:8098/stats
      > curl -v http://127.0.0.1:8098/riak/myBucket
      > curl -v http://127.0.0.1:8098/riak/myBucket/myKey
      PUT:
      > curl -v -X PUT -H "Content-Type: application/json" -d '{"backend": "ets"}' http://127.0.0.1:8098/riak/myBucket
      > curl -v -X PUT -d 'test key' http://127.0.0.1:8098/riak/myBucket/myKey
      > curl -v -X POST -d 'autogen key' http://127.0.0.1:8098/riak/myBucket
  • 24. Links Lightweight Graphing Practical limitations re. number of links per object Unidirectional object linking relationship modeling (one to one, one to many) Returns “Content-Type: multipart/mixed;” - Library needs to be multipart aware - nodejs, formidable
  • 25. Link Walking
      First level depth
      > curl http://localhost:8098/riak/myBucket/myKey/_,_,_
      Via Map/Reduce
      > curl -X POST -H "content-type:application/json" http://localhost:8098/mapred --data @-
      {"inputs":[["myBucket","myKey"]],"query":[{"link":{}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
      ^D
      N level depth
      > curl http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_
      More Info:
      http://blog.basho.com/2010/02/24/link-walking-by-example/
      http://wiki.basho.com/display/RIAK/Links
      http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Linkwalking
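Each `_,_,_` segment in the link-walking URLs above is one step, read here as `bucket,tag,keep` with `_` as the wildcard. A small sketch of building such URLs follows. It assumes `keep` is sent as `1` or `0`, and `linkWalkUrl` is a hypothetical helper, not part of any Riak client.

```javascript
// Build a link-walking URL from a list of steps.
// Each step is { bucket, tag, keep }; a missing field becomes the "_" wildcard.
function linkWalkUrl(base, bucket, key, steps) {
  const segments = steps.map((step) => {
    const b = step.bucket === undefined ? '_' : encodeURIComponent(step.bucket);
    const t = step.tag === undefined ? '_' : encodeURIComponent(step.tag);
    const k = step.keep === undefined ? '_' : (step.keep ? '1' : '0');
    return `${b},${t},${k}`;
  });
  return [base, 'riak', encodeURIComponent(bucket), encodeURIComponent(key), ...segments].join('/');
}

console.log(linkWalkUrl('http://localhost:8098', 'myBucket', 'myKey', [{}, {}]));
// http://localhost:8098/riak/myBucket/myKey/_,_,_/_,_,_
```

Passing one empty step reproduces the first-level walk, and each extra step adds one more level of depth.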
  • 26. Map/Reduce Functions written in either Erlang or JavaScript Map is distributed to where the data lives Reduce is run on the node coordinating the M/R Erlang > JavaScript Tweak JavaScript settings in app.config
  • 27. M/R in Riak An input to start from: bucket, list of keys / keyfilter (★ keys > bucket) possible link phase one or more map phases (many) possible reduce phase(s) Map = SQL Select/Where clause Reduce = SQL Aggregates (SUM, COUNT, GROUP BY)
      function(v, keydata, args) {
        if (v.values) {
          var ret = [], o = {};
          o = Riak.mapValuesJson(v)[0];
          o.lastModifiedParsed = Date.parse(v["values"][0]["metadata"]["X-Riak-Last-Modified"]);
          o.key = v["key"];
          ret.push(o);
          return ret;
        } else {
          return [];
        }
      };
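The "Map = Select/Where, Reduce = Aggregates" analogy can be tried locally without a cluster. This is a minimal single-process sketch that only mimics the phase order. `runMapReduce`, `posts`, `mapViews` and `sumViews` are hypothetical names, and nothing here is distributed.

```javascript
// Run a map function over every input object, then pass the combined
// results through a reduce function, mimicking the phase order in Riak.
function runMapReduce(objects, mapFn, reduceFn) {
  const mapped = [];
  for (const obj of objects) {
    mapped.push(...mapFn(obj)); // each map call returns a list
  }
  return reduceFn(mapped);
}

// Hypothetical in-memory "bucket" of JSON documents.
const posts = [
  { key: 'a', views: 3 },
  { key: 'b', views: 0 },
  { key: 'c', views: 5 },
];

// Map ~ SELECT views ... WHERE views > 0
const mapViews = (post) => (post.views > 0 ? [post.views] : []);
// Reduce ~ SUM(views)
const sumViews = (values) => [values.reduce((total, v) => total + v, 0)];

console.log(runMapReduce(posts, mapViews, sumViews)); // [ 8 ]
```

Both phases take and return lists, which lets a reduce function be applied again to its own partial results. A sum is safe under that kind of re-reduction.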
  • 28. Pre/Post Commit Hooks Pre Commit: JavaScript or Erlang; Validation, Modify data, Kill writes. Post Commit: Erlang; Indexing, Messaging.
  • 29. Chief complaints No index No native sort No increment No native data structures
  • 30. Riak Search Betalicious Superset of Riak Full text search http://wiki.basho.com/display/RIAK/Riak+Search http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010 http://www.seowebworx.co.uk/
  • 31. Riak Search... more uses a modified bitcask backend called merge_index enabled on a per bucket basis access via http and command line
  • 32. Riak-JS NodeJS Riak module Written in Coffeescript HTTP and Protobuf Customizable via “meta” options http://riakjs.org
  • 33. Code demo nodejs riak-js redis simple post site tags json data passing
  • 34. Javascript Map
      var map = function(v, keydata, args) {
        if (v.values) {
          var ret = [], o = {};
          o = Riak.mapValuesJson(v)[0];
          o.key = v["key"]; // put the key in the returned data object
          o.lastModified = v["values"][0]["metadata"]["X-Riak-Last-Modified"];
          ret.push(o);
          return ret;
        } else {
          return [];
        }
      };
  • 35. Javascript Reduce
      var sortInt = function(data, args) {
        var sortBy = (typeof args === "undefined" || args === null) ? undefined : args.field;
        var desc = ((typeof args === "undefined" || args === null) ? undefined : args.order) === 'desc';
        data.sort(function(a, b) {
          if (desc) {
            // swap the operands to reverse the sort order
            var _ref = [b, a];
            a = _ref[0];
            b = _ref[1];
          }
          return a[sortBy] - b[sortBy];
        });
        return data;
      };
  • 36. Putting it all together
      riak
        .add("bucket")
        // map function
        .map(map)
        // reduce function
        .reduce(sortInt, { field: "lastModified", order: "desc" })
        .run(function(err, response) {
          if (err) {
            // send out an error if there is one
            res.simpleJSON(400, { errortxt: 'mapreduce gone bad :(' });
          } else {
            // otherwise send the data back...
            res.simpleJSON(200, { response: response });
          }
        });
  • 37. Hybrid architectures are the future! Use tools like Redis to augment shortcomings!
  • 38. 1,456,023 Or “A Lot” At scale, precision does not matter in practice. Google Twitter http://photography.nationalgeographic.com/photography/enlarge/okavango-cape-buffalo_pod_image.html
  • 39. Google Look Ma! No exact counts!
  • 40. Twitter No Totals! No Pagination!
  • 41. Questions? NYC Cloud Computing Group, March 2011 Alexander Sicular @siculars