SlideShare a Scribd company logo
1 of 48
Achieving 100,000
Transactions Per Second
 with a NoSQL Database



         Eric David Bloch
           @eedeebee

          23 May 2012
A bit about me

ā€¢ Iā€™ve written software used by millions of people.

    Apps, libraries, compilers, device drivers, operating systems

ā€¢ This is my third Gluecon and my first talk

ā€¢ Iā€™m the Community Director at MarkLogic, last 2 years.

ā€¢ I survived having 3 kids in less than 2 years.
Like me, but not me
Actually, me
Storyline

ā€¢ Why?
ā€¢ How?
  ā€¢ Whirl-wind tour of database architecture
  ā€¢ Techniques to get to 100K
ļ‚§ Itā€™s about money.




ļ‚§ Top 5 bank needed to manage trades
ļ‚§ Trades look more like documents than tables
ļ‚§ Schemas for trades change all the time
ļ‚§ Transactions
ļ‚§ Scale and velocity (ā€œBig Dataā€)
Trades and Positions


ļ‚§ 1 million trades per day
ļ‚§ Followed by 1 million position reports at end of day
   ļ‚§ Roll up trades of current date for each ā€œbook, instrumentā€ pair
   ļ‚§ Group-by, with key = ā€œdate, book, instrumentā€

                  1M positions
         1M trades




               Day         Day   Day    Day    Day     ...
                1           2     3      4      5
Trades and Positions

<trade>
   <quantity>8540882</quantity>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>


<position>
  <instrument>EAAFX</instrument>
  <book>679</book>
  <quantity>3</quantity>
  <business-date>2011-03-25Z</business-date>
  <position-date>2011-03-24Z</position-date>
</position>
Requirements

   NoSQL flexibility,
 performance & scale

    Enterprise-grade
transactional guarantees
Now show us

ā€¢ 15K inserts per second
ā€¢ Linear scalability
What NoSQL Database out there can we use?




                     A quote from the bank
ā€œWe threw everything we had at MarkLogic and it didnā€™t break a sweatā€
We werenā€™t content with 15K


So we showed themā€¦




Slide 13   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
What is MarkLogic?


ļ‚§ Non-relational, document-oriented, distributed
  database
           ļ‚§ Shared nothing clustering, linear scale out
           ļ‚§ Multi-Version Concurrency Control (MVCC)
           ļ‚§ Transactions (ACID)


ļ‚§ Search Engine
           ļ‚§    Web scale (Big Data)
           ļ‚§    Inverted indexes (term lists)
           ļ‚§    Real-time updates
           ļ‚§    Compose-able queries

Slide 14       Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Architecture



ļ‚§ Data model
ļ‚§ Indexing
ļ‚§ Clustering
ļ‚§ Query execution
ļ‚§ Transactions



Slide 15   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Key (URI)                    Value (Document)

/trade/153748994             <trade>
                              <id>8</id>
                              <time>2012-02-20T14:00:00</time>
                              <instrument>BYME AAA</instrument>
                              <price cur=ā€œusdā€>600.27</price>
                             </trade>



/user/eedeebee               {
                                 ā€œnameā€ : ā€œEric Blochā€,
                                 ā€œageā€ : 47,
                                 ā€œhairā€ : ā€œgrayā€,
                                 ā€œkidsā€ : [ ā€œGraceā€, ā€œRyanā€, ā€œOwenā€ ]
                             }

/book5293                    It was the best of times, it was the worst of times, it
                             was the age of wisdom, ...


/2012-02-20T14:47:53/01445   .mp3
                             .avi
                             [your favorite binary format]
Inverted Index



ā€œwhichā€                                          123, 127, 129, 152, 344, 791 . . .

ā€œuniquelyā€                                       122, 125, 126, 129, 130, 167 . . .

ā€œidentifyā€                                       123, 126, 130, 142, 143, 167 . . .

ā€œeachā€                                           123, 130, 131, 135, 162, 177 . . .   Document
ā€œuniquely identifyā€
                                                 126, 130, 167, 212, 219, 377 . . .   References
<article>                                        ...
                                                                                      126, 130, 167, 212, 219, 377 . . .
article/abstract/@author                         ...

<product>IMS</product>




  Slide 17   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Range Index

                  <trade>
                   <trader_id>8</trader_id>                                       Value     Docid
                   <time>2012-02-20T14:00:00</time>
                   <instrument>IBM</instrument>                                   0         287
                   ā€¦
                  </trade>                                                        8         1129
                                                                                  13        531    Docid Value
                  <trade>
                   <trader_id>13</trader_id>                                      ā€¦         ā€¦      287    0
                   <time>2012-02-20T14:30:00</time>
Rows               <instrument>AAPL</instrument>                                  ā€¦         ā€¦      531    13
                   ā€¦                                                                               1129   8
                  </trade>
                                                                                                   ā€¦      ā€¦
                  <trade>
                   <trader_id>0</trader_id>                                                        ā€¦      ā€¦
                   <time>2012-02-20T15:30:00</time>
                   <instrument>GOOG</instrument>                                      ā€¢ Column Oriented
                   ā€¦                                                                  ā€¢ Memory Mapped
                  </trade>


       Slide 18   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Shared-Nothing Clustering




     Host D1                                        Host D2                Host D3   ā€¦   Host Dj




Slide 19   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Query Evaluation ā€“ ā€œMapā€


                                         Q
      Host E1                                         Host E2                      Host E3   ā€¦       Host Ei




         Host D1                                        Host D2                    Host D3            Host Dj
                                                                                             ā€¦

Q                                        Q                                     Q                 Q
      F1 ā€¦ Fn                                       F1 ā€¦ Fn                        F1 ā€¦ Fn           F1 ā€¦ Fn
    Slide 21   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Query Evaluation ā€“ ā€œReduceā€

                                           Top
                                           10
        Host E1                                         Host E2                    Host E3   ā€¦     Host Ei




           Host D1                                        Host D2                  Host D3             Host Dj
                                                                                             ā€¦
Top                                             Top                              Top             Top
10                                              10                               10              10
        F1 ā€¦ Fn                                       F1 ā€¦ Fn                      F1 ā€¦ Fn             F1 ā€¦ Fn
      Slide 22   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Queries/Updates with MVCC


ļ‚§ Every query has a timestamp
ļ‚§ Documents do not change

ļ‚§    Reads are lock-free
ļ‚§    Inserts ā€“ see next slide
ļ‚§    Deletes ā€“ mark as deleted
ļ‚§    Edits ā€“
           ļ‚§ copy
           ļ‚§ edit
           ļ‚§ insert the copy
           ļ‚§ mark the original as deleted

Slide 23    Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Insert Mechanics


1)         New URI+Document arrive at E-node
2)         URI Probe ā€“ determine whether URI exists in any forest
3)         URI Lock ā€“ write locks is taken on D node(s)
4)         Forest Assignment ā€“ URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit ā€“ transaction complete
8)         Release URI Locks ā€“ D node(s) are notified to release lock




Slide 24   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Save-and-merge
       (Log Stuctured Tree Merge)

                                                                                  Database
                          Insert/
                          Update



                                                            Forest 1                              Forest 2
                                                 S0
Memory                         Save

Disk
       Journaled
                                                 S1                 S2                       S1     S2
                                                                                    S3                       S3

                                          Merge
                  J

       Slide 25   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Back to
the money
Trades and Positions

ļ‚§ 1 million trades per day
ļ‚§ followed by 1 million position reports at end of day
       ļ‚§ Roll up trades of the current ā€œdateā€ for each ā€œbook:instrumentā€ pair
       ļ‚§ Group-by, with key = ā€œbook:date:instrumentā€


                           1M positions
                  1M trades




                                Day                    Day                 Day   Day   Day   ...
                                 1                      2                   3     4     5


Slide 27   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Naive Query Pseudocode



for each book
for each instrument in that book
  position = position(yesterday, book, instrument)
  for each trade of that instrument in this book
   position += trade(today, book, instrument).quantity
  insert(today, book, instrument, position)




    Slide 28   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Initial results




Single node ā€“ 19,000 inserts per second




Slide 29   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Initial results with cluster


           70000


           60000


           50000


           40000
 doc/sec




           30000                                                                                 Report Query 2


           20000


           10000


               0
                                      1DE                                        2DE       3DE
                                                                              # of nodes




Slide 30      Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Techniques to get to 100K


ļ‚§ Query for Computing New Positions
           ļ‚§ Materialized compound key, Co-Occurrence Query and Aggregation
ļ‚§ Insert of New Positions
           ļ‚§ Batching
           ļ‚§ In-Forest Eval




Slide 31    Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Materializing a compound key

<trade>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>




  Slide 32   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Materializing a compound key

<trade>
   <roll-up book-date-instrument=ā€œ151333445566782303ā€/>
   <quantity2>1193.71</quantity2>
   <instrument>WASAX</instrument>
   <book>679</book>
   <trade-date>2011-03-13-07:00</trade-date>
   <settle-date>2011-03-17-07:00</settle-date>
</trade>




 Slide 33   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Co-Occurrence and
       Distributed Aggregation
<trade>
 <roll-up
    book-date-instrument=ā€œ151373445566703ā€/>                                       V        D
                                                                                       D         V
 <quantity>8540882</quantity>                                                      ā€¦        ā€¦
 <instrument>WASAX</instrument>                                                        ā€¦        ā€¦
                                                                                   ā€¦        1129
 <book>679</book>                                                                      ā€¦        ā€¦
 <trade-date>2011-03-13-07:00</trade-date>                                         ā€¦         ā€¦
                                                                                       15137ā€¦    ā€¦
 <settle-date>2011-03-17-07:00</settle-date>
                                                                                   ā€¦        ā€¦
</trade>                                                                               ā€¦         ā€¦
                                                                                   ā€¦        ā€¦
                                                                                       ā€¦         ā€¦

ā€¢ Co-occurrences:
                                                                                                 V         D
     Find pairings of range indexed values                                                            D        V
                                                                                                 ā€¦         ā€¦
ā€¢ Aggregate on the D nodes (Map/Reduce):
                                                                                                 8540882
                                                                                                       ā€¦   1129 ā€¦
     Sum up the quantities above
                                                                                                      ā€¦        ā€¦
                                                                                                 ā€¦         ā€¦
                                                                                                      ā€¦        8540882
ā€¢ Similar to a Group-by,                                                                         ā€¦         ā€¦
     ā€¢ in a column-oriented, in-memory                                                           ā€¦    ā€¦    ā€¦   ā€¦
     database                                                                                         ā€¦        ā€¦


        Slide 34   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Initial results with new query




      > 30K inserts/second on a single node




Slide 35   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Co-Occurrence + Aggregate versus
NaĆÆve approach

           70000


           60000


           50000


           40000
 doc/sec




                                                                                                 Report Query 3
           30000
                                                                                                 Report Query 2

           20000


           10000


               0
                                      1DE                                        2DE       3DE
                                                                              # of nodes




Slide 36      Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Techniques


ļ‚§ Computing Positions
           ļ‚§ Materialized compound key, Co-Occurrence Query and Aggregation
ļ‚§ Updates
           ļ‚§ Batching
           ļ‚§ In-Forest Eval




Slide 37    Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Transaction Size and
Throughput

                  35000


                  30000


                  25000
 #docs per sec.




                  20000


                  15000                                                                                                   insert throughput


                  10000


                   5000


                      0
                                1           2            4            8              16   32   100   500   1000   10000
                                                                   # doc inserts per transaction




Slide 38             Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Techniques


ļ‚§ Computing Positions (Data Warehouse)
           ļ‚§ Hash Key Decoration
           ļ‚§ Co-Occurrence
ļ‚§ Updates (Transaction)
           ļ‚§ Batching
           ļ‚§ In-Forest Eval




Slide 39    Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Insert Mechanics


1)         New URI+Document arrive at E-node
2)         *URI Probe ā€“ determine whether URI exists in any forest
3)         *URI Lock ā€“ write locks are created
4)         Forest Assignment ā€“ URI is deterministically placed in Forest
5)         Indexing
6)         Journaling
7)         Commit ā€“ transaction complete
8)         *Release URI Locks ā€“ D nodes are notified to release lock



* Overhead of these operations increases with cluster size


Slide 40   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Deterministic Placement


4) Forest Assignment ā€“ URI is deterministically placed in Forest

                             Hash                                          64-bit   Bucketed        Fi
  URI
                             function                                      number   into a Forest




  ā€¢ Done in C++ within server
  ā€¢ Butā€¦
       ā€¢ Can also be done in the client
       ā€¢ Server allows queries to be evaluated against only one
       forestā€¦




Slide 41   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
In-Forest Eval


                                                        Q
                                                                           E




                                        D1                                              D2


                                                                               Q
                         F1                                F2                      F3        F4



Slide 42   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
In-Forest Eval Insert Mechanics


1) New URI+Document arrive at E-node
           a)      Compute Fi using
           b)      Ask server to evaluated the insert query only against Fi
2)         URI Probe ā€“ Fi Only
3)         URI Lock ā€“ Fi Only
4)         Forest Assignment ā€“ Fi Only
5)         Indexing
6)         Journaling
7)         Commit ā€“ transaction complete
8)         Lock Release - Fi Only



Slide 43    Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Regular Insert Vs. In-Forest Eval


               70000


               60000


               50000
 docs/second




               40000


                                                                                                     regular insert
               30000
                                                                                                     in-forest-eval
                                                                                                     In-Forest Eval

               20000


               10000


                   0
                                           1DE                                       2DE       3DE
                                                                                  # of nodes




Slide 44          Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
Tada! Achieving 100K Updates




                                                                           In-Forest Eval




Slide 45   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
NoSQL document-oriented database
with real-time full-text search and transactions
    Doing 100K transactions per second




                                     (Yes thatā€™s marker felt)
Thank You!

@eedeebee
eric.bloch@marklogic.com




Slide 49   Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.

More Related Content

Viewers also liked

MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesIoan Toma
Ā 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itnathanmarz
Ā 
MiFID II: The Top 5 Data Challenges
MiFID II: The Top 5 Data ChallengesMiFID II: The Top 5 Data Challenges
MiFID II: The Top 5 Data ChallengesMarkLogic
Ā 
36518008 hdfc-bank-3
36518008 hdfc-bank-336518008 hdfc-bank-3
36518008 hdfc-bank-3Dhiraj Ahuja
Ā 
Junli Gu at AI Frontiers: Autonomous Driving Revolution
Junli Gu at AI Frontiers: Autonomous Driving RevolutionJunli Gu at AI Frontiers: Autonomous Driving Revolution
Junli Gu at AI Frontiers: Autonomous Driving RevolutionAI Frontiers
Ā 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...AI Frontiers
Ā 
Speed up your Symfony2 application and build awesome features with Redis
Speed up your Symfony2 application and build awesome features with RedisSpeed up your Symfony2 application and build awesome features with Redis
Speed up your Symfony2 application and build awesome features with RedisRicard Clau
Ā 
hdfc Bank project
hdfc Bank projecthdfc Bank project
hdfc Bank projectDevendra Jadon
Ā 
Organization study on hdfc bank
Organization study on hdfc bankOrganization study on hdfc bank
Organization study on hdfc bankStudsPlanet.com
Ā 
HDFC Persentation
HDFC PersentationHDFC Persentation
HDFC PersentationSanjiv Yadav
Ā 
GDPR: A Step-By-Step Guide To Compliance
GDPR: A Step-By-Step Guide To ComplianceGDPR: A Step-By-Step Guide To Compliance
GDPR: A Step-By-Step Guide To ComplianceMarkLogic
Ā 
Project report format
Project report formatProject report format
Project report formatrohtb2010
Ā 
Preparing Detailed Project Report and Presenting Business Plan to Investors
Preparing Detailed Project Report  and Presenting Business Plan to InvestorsPreparing Detailed Project Report  and Presenting Business Plan to Investors
Preparing Detailed Project Report and Presenting Business Plan to InvestorsRahul Sharma
Ā 
Preparation of project report for bank finance
Preparation of project report for bank financePreparation of project report for bank finance
Preparation of project report for bank financeRevanth Rao
Ā 
A study on financial analysis of hdfc bank
A study on financial analysis of hdfc bankA study on financial analysis of hdfc bank
A study on financial analysis of hdfc bankMehul Rasadiya
Ā 
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning ResearchJeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning ResearchAI Frontiers
Ā 
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning WorldSoumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning WorldAI Frontiers
Ā 
Jisheng Wang at AI Frontiers: Deep Learning in Security
Jisheng Wang at AI Frontiers: Deep Learning in SecurityJisheng Wang at AI Frontiers: Deep Learning in Security
Jisheng Wang at AI Frontiers: Deep Learning in SecurityAI Frontiers
Ā 
Charles Fan at AI Frontiers: The New Era of AI Plus
Charles Fan at AI Frontiers: The New Era of AI PlusCharles Fan at AI Frontiers: The New Era of AI Plus
Charles Fan at AI Frontiers: The New Era of AI PlusAI Frontiers
Ā 

Viewers also liked (20)

MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
Ā 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop it
Ā 
MiFID II: The Top 5 Data Challenges
MiFID II: The Top 5 Data ChallengesMiFID II: The Top 5 Data Challenges
MiFID II: The Top 5 Data Challenges
Ā 
36518008 hdfc-bank-3
36518008 hdfc-bank-336518008 hdfc-bank-3
36518008 hdfc-bank-3
Ā 
Junli Gu at AI Frontiers: Autonomous Driving Revolution
Junli Gu at AI Frontiers: Autonomous Driving RevolutionJunli Gu at AI Frontiers: Autonomous Driving Revolution
Junli Gu at AI Frontiers: Autonomous Driving Revolution
Ā 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Ā 
Speed up your Symfony2 application and build awesome features with Redis
Speed up your Symfony2 application and build awesome features with RedisSpeed up your Symfony2 application and build awesome features with Redis
Speed up your Symfony2 application and build awesome features with Redis
Ā 
hdfc Bank project
hdfc Bank projecthdfc Bank project
hdfc Bank project
Ā 
Organization study on hdfc bank
Organization study on hdfc bankOrganization study on hdfc bank
Organization study on hdfc bank
Ā 
HDFC Persentation
HDFC PersentationHDFC Persentation
HDFC Persentation
Ā 
GDPR: A Step-By-Step Guide To Compliance
GDPR: A Step-By-Step Guide To ComplianceGDPR: A Step-By-Step Guide To Compliance
GDPR: A Step-By-Step Guide To Compliance
Ā 
Project report format
Project report formatProject report format
Project report format
Ā 
Preparing Detailed Project Report and Presenting Business Plan to Investors
Preparing Detailed Project Report  and Presenting Business Plan to InvestorsPreparing Detailed Project Report  and Presenting Business Plan to Investors
Preparing Detailed Project Report and Presenting Business Plan to Investors
Ā 
Hdfc bank ppt
Hdfc bank pptHdfc bank ppt
Hdfc bank ppt
Ā 
Preparation of project report for bank finance
Preparation of project report for bank financePreparation of project report for bank finance
Preparation of project report for bank finance
Ā 
A study on financial analysis of hdfc bank
A study on financial analysis of hdfc bankA study on financial analysis of hdfc bank
A study on financial analysis of hdfc bank
Ā 
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning ResearchJeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Ā 
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning WorldSoumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Soumith Chintala at AI Frontiers: A Dynamic View of the Deep Learning World
Ā 
Jisheng Wang at AI Frontiers: Deep Learning in Security
Jisheng Wang at AI Frontiers: Deep Learning in SecurityJisheng Wang at AI Frontiers: Deep Learning in Security
Jisheng Wang at AI Frontiers: Deep Learning in Security
Ā 
Charles Fan at AI Frontiers: The New Era of AI Plus
Charles Fan at AI Frontiers: The New Era of AI PlusCharles Fan at AI Frontiers: The New Era of AI Plus
Charles Fan at AI Frontiers: The New Era of AI Plus
Ā 

Similar to Real World Big Data, 100,000 transactions a second in a NoSQL Database

UN/EDIFACT Interchange Processing with Smooks v1.4
UN/EDIFACT Interchange Processing  with Smooks v1.4UN/EDIFACT Interchange Processing  with Smooks v1.4
UN/EDIFACT Interchange Processing with Smooks v1.4tfennelly
Ā 
Intro To Starling Framework for ActionScript 3.0
Intro To Starling Framework for ActionScript 3.0Intro To Starling Framework for ActionScript 3.0
Intro To Starling Framework for ActionScript 3.0Rivello Multimedia Consulting
Ā 
Mock Objects from Concept to Code
Mock Objects from Concept to CodeMock Objects from Concept to Code
Mock Objects from Concept to CodeRob Myers
Ā 
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...Cambridge Consultants
Ā 
Community based harversting for USDL
Community based harversting for USDLCommunity based harversting for USDL
Community based harversting for USDLJorge Cardoso
Ā 
Migrating a Cognos Instance between Authentication Sources with Motio
Migrating a Cognos Instance between Authentication Sources with MotioMigrating a Cognos Instance between Authentication Sources with Motio
Migrating a Cognos Instance between Authentication Sources with MotioSenturus
Ā 
SmartRM -- key technologies
SmartRM -- key technologiesSmartRM -- key technologies
SmartRM -- key technologiesSmartRM
Ā 
Sap edi idoc
Sap edi idocSap edi idoc
Sap edi idocLokesh Modem
Ā 
DIADEM WWW 2012
DIADEM WWW 2012DIADEM WWW 2012
DIADEM WWW 2012Giorgio Orsi
Ā 
O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07Dalia Saeed
Ā 
Seed capital
Seed capitalSeed capital
Seed capitalTom Howard
Ā 
Bending The Curve
Bending The CurveBending The Curve
Bending The Curvefinteligent
Ā 
Domain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring PortfolioDomain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring PortfolioSrini Penchikala
Ā 
Building Windows 8 Apps with Windows Azure
Building Windows 8 Apps with Windows AzureBuilding Windows 8 Apps with Windows Azure
Building Windows 8 Apps with Windows AzureSupote Phunsakul
Ā 
Asian Eye for the Slovene Guy
Asian Eye for the Slovene GuyAsian Eye for the Slovene Guy
Asian Eye for the Slovene GuyBenjamin Joffe
Ā 
Penerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia PendidikanPenerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia Pendidikaniwita_1
Ā 
SeCold - A Linked Data Platform for Mining Software Repositories
SeCold - A Linked Data Platform for  Mining Software RepositoriesSeCold - A Linked Data Platform for  Mining Software Repositories
SeCold - A Linked Data Platform for Mining Software Repositoriesimanmahsa
Ā 
Scaling Crashlytics: Building Analytics on Redis 2.6
Scaling Crashlytics: Building Analytics on Redis 2.6Scaling Crashlytics: Building Analytics on Redis 2.6
Scaling Crashlytics: Building Analytics on Redis 2.6Crashlytics
Ā 

Similar to Real World Big Data, 100,000 transactions a second in a NoSQL Database (20)

UN/EDIFACT Interchange Processing with Smooks v1.4
UN/EDIFACT Interchange Processing  with Smooks v1.4UN/EDIFACT Interchange Processing  with Smooks v1.4
UN/EDIFACT Interchange Processing with Smooks v1.4
Ā 
Intro To Starling Framework for ActionScript 3.0
Intro To Starling Framework for ActionScript 3.0Intro To Starling Framework for ActionScript 3.0
Intro To Starling Framework for ActionScript 3.0
Ā 
Mock Objects from Concept to Code
Mock Objects from Concept to CodeMock Objects from Concept to Code
Mock Objects from Concept to Code
Ā 
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Cambridge Consultants Innovation Day 2012: Minimising risk and delay for comp...
Ā 
Community based harversting for USDL
Community based harversting for USDLCommunity based harversting for USDL
Community based harversting for USDL
Ā 
Migrating a Cognos Instance between Authentication Sources with Motio
Migrating a Cognos Instance between Authentication Sources with MotioMigrating a Cognos Instance between Authentication Sources with Motio
Migrating a Cognos Instance between Authentication Sources with Motio
Ā 
SmartRM -- key technologies
SmartRM -- key technologiesSmartRM -- key technologies
SmartRM -- key technologies
Ā 
Elveego circuits
Elveego circuitsElveego circuits
Elveego circuits
Ā 
Sap edi idoc
Sap edi idocSap edi idoc
Sap edi idoc
Ā 
SimD
SimDSimD
SimD
Ā 
DIADEM WWW 2012
DIADEM WWW 2012DIADEM WWW 2012
DIADEM WWW 2012
Ā 
O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07O leary2012 comp_ppt_ch07
O leary2012 comp_ppt_ch07
Ā 
Seed capital
Seed capitalSeed capital
Seed capital
Ā 
Bending The Curve
Bending The CurveBending The Curve
Bending The Curve
Ā 
Domain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring PortfolioDomain Driven Design Development Spring Portfolio
Domain Driven Design Development Spring Portfolio
Ā 
Building Windows 8 Apps with Windows Azure
Building Windows 8 Apps with Windows AzureBuilding Windows 8 Apps with Windows Azure
Building Windows 8 Apps with Windows Azure
Ā 
Asian Eye for the Slovene Guy
Asian Eye for the Slovene GuyAsian Eye for the Slovene Guy
Asian Eye for the Slovene Guy
Ā 
Penerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia PendidikanPenerapan TIK Bagi Dunia Pendidikan
Penerapan TIK Bagi Dunia Pendidikan
Ā 
SeCold - A Linked Data Platform for Mining Software Repositories
SeCold - A Linked Data Platform for  Mining Software RepositoriesSeCold - A Linked Data Platform for  Mining Software Repositories
SeCold - A Linked Data Platform for Mining Software Repositories
Ā 
Scaling Crashlytics: Building Analytics on Redis 2.6
Scaling Crashlytics: Building Analytics on Redis 2.6Scaling Crashlytics: Building Analytics on Redis 2.6
Scaling Crashlytics: Building Analytics on Redis 2.6
Ā 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
Ā 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
Ā 
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelDeepika Singh
Ā 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
Ā 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
Ā 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
Ā 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
Ā 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
Ā 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
Ā 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vƔzquez
Ā 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
Ā 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
Ā 

Recently uploaded (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Ā 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
Ā 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Ā 
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Ā 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
Ā 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Ā 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Ā 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Ā 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Ā 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Ā 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Ā 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Ā 

Real World Big Data, 100,000 transactions a second in a NoSQL Database

  • 1. Achieving 100,000 Transactions Per Second with a NoSQL Database Eric David Bloch @eedeebee 23 May 2012
  • 2. A bit about me ā€¢ Iā€™ve written software used by millions of people. Apps, libraries, compilers, device drivers, operating systems ā€¢ This is my third Gluecon and my first talk ā€¢ Iā€™m the Community Director at MarkLogic, last 2 years. ā€¢ I survived having 3 kids in less than 2 years.
  • 3. Like me, but not me
  • 5. Storyline ā€¢ Why? ā€¢ How? ā€¢ Whirl-wind tour of database architecture ā€¢ Techniques to get to 100K
  • 6.
  • 7. ļ‚§ Itā€™s about money. ļ‚§ Top 5 bank needed to manage trades ļ‚§ Trades look more like documents than tables ļ‚§ Schemas for trades change all the time ļ‚§ Transactions ļ‚§ Scale and velocity (ā€œBig Dataā€)
  • 8. Trades and Positions ļ‚§ 1 million trades per day ļ‚§ Followed by 1 million position reports at end of day ļ‚§ Roll up trades of current date for each ā€œbook, instrumentā€ pair ļ‚§ Group-by, with key = ā€œdate, book, instrumentā€ 1M positions 1M trades Day Day Day Day Day ... 1 2 3 4 5
  • 9. Trades and Positions <trade> <quantity>8540882</quantity> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> <position> <instrument>EAAFX</instrument> <book>679</book> <quantity>3</quantity> <business-date>2011-03-25Z</business-date> <position-date>2011-03-24Z</position-date> </position>
  • 10. Requirements NoSQL flexibility, performance & scale Enterprise-grade transactional guarantees
  • 11. Now show us ā€¢ 15K inserts per second ā€¢ Linear scalability
  • 12. What NoSQL Database out there can we use? A quote from the bank ā€œWe threw everything we had at MarkLogic and it didnā€™t break a sweatā€
  • 13. We werenā€™t content with 15K So we showed themā€¦ Slide 13 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 14. What is MarkLogic? ļ‚§ Non-relational, document-oriented, distributed database ļ‚§ Shared nothing clustering, linear scale out ļ‚§ Multi-Version Concurrency Control (MVCC) ļ‚§ Transactions (ACID) ļ‚§ Search Engine ļ‚§ Web scale (Big Data) ļ‚§ Inverted indexes (term lists) ļ‚§ Real-time updates ļ‚§ Compose-able queries Slide 14 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 15. Architecture ļ‚§ Data model ļ‚§ Indexing ļ‚§ Clustering ļ‚§ Query execution ļ‚§ Transactions Slide 15 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 16. Key (URI) Value (Document) /trade/153748994 <trade> <id>8</id> <time>2012-02-20T14:00:00</time> <instrument>BYME AAA</instrument> <price cur=ā€œusdā€>600.27</price> </trade> /user/eedeebee { ā€œnameā€ : ā€œEric Blochā€, ā€œageā€ : 47, ā€œhairā€ : ā€œgrayā€, ā€œkidsā€ : [ ā€œGraceā€, ā€œRyanā€, ā€œOwenā€ ] } /book5293 It was the best of times, it was the worst of times, it was the age of wisdom, ... /2012-02-20T14:47:53/01445 .mp3 .avi [your favorite binary format]
  • 17. Inverted Index ā€œwhichā€ 123, 127, 129, 152, 344, 791 . . . ā€œuniquelyā€ 122, 125, 126, 129, 130, 167 . . . ā€œidentifyā€ 123, 126, 130, 142, 143, 167 . . . ā€œeachā€ 123, 130, 131, 135, 162, 177 . . . Document ā€œuniquely identifyā€ 126, 130, 167, 212, 219, 377 . . . References <article> ... 126, 130, 167, 212, 219, 377 . . . article/abstract/@author ... <product>IMS</product> Slide 17 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 18. Range Index <trade> <trader_id>8</trader_id> Value Docid <time>2012-02-20T14:00:00</time> <instrument>IBM</instrument> 0 287 ā€¦ </trade> 8 1129 13 531 Docid Value <trade> <trader_id>13</trader_id> ā€¦ ā€¦ 287 0 <time>2012-02-20T14:30:00</time> Rows <instrument>AAPL</instrument> ā€¦ ā€¦ 531 13 ā€¦ 1129 8 </trade> ā€¦ ā€¦ <trade> <trader_id>0</trader_id> ā€¦ ā€¦ <time>2012-02-20T15:30:00</time> <instrument>GOOG</instrument> ā€¢ Column Oriented ā€¦ ā€¢ Memory Mapped </trade> Slide 18 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 19. Shared-Nothing Clustering Host D1 Host D2 Host D3 ā€¦ Host Dj Slide 19 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 20. Query Evaluation ā€“ ā€œMapā€ Q Host E1 Host E2 Host E3 ā€¦ Host Ei Host D1 Host D2 Host D3 Host Dj ā€¦ Q Q Q Q F1 ā€¦ Fn F1 ā€¦ Fn F1 ā€¦ Fn F1 ā€¦ Fn Slide 21 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 21. Query Evaluation ā€“ ā€œReduceā€ Top 10 Host E1 Host E2 Host E3 ā€¦ Host Ei Host D1 Host D2 Host D3 Host Dj ā€¦ Top Top Top Top 10 10 10 10 F1 ā€¦ Fn F1 ā€¦ Fn F1 ā€¦ Fn F1 ā€¦ Fn Slide 22 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 22. Queries/Updates with MVCC ļ‚§ Every query has a timestamp ļ‚§ Documents do not change ļ‚§ Reads are lock-free ļ‚§ Inserts ā€“ see next slide ļ‚§ Deletes ā€“ mark as deleted ļ‚§ Edits ā€“ ļ‚§ copy ļ‚§ edit ļ‚§ insert the copy ļ‚§ mark the original as deleted Slide 23 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 23. Insert Mechanics 1) New URI+Document arrive at E-node 2) URI Probe ā€“ determine whether URI exists in any forest 3) URI Lock ā€“ write locks is taken on D node(s) 4) Forest Assignment ā€“ URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit ā€“ transaction complete 8) Release URI Locks ā€“ D node(s) are notified to release lock Slide 24 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 24. Save-and-merge (Log Stuctured Tree Merge) Database Insert/ Update Forest 1 Forest 2 S0 Memory Save Disk Journaled S1 S2 S1 S2 S3 S3 Merge J Slide 25 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 26. Trades and Positions ļ‚§ 1 million trades per day ļ‚§ followed by 1 million position reports at end of day ļ‚§ Roll up trades of the current ā€œdateā€ for each ā€œbook:instrumentā€ pair ļ‚§ Group-by, with key = ā€œbook:date:instrumentā€ 1M positions 1M trades Day Day Day Day Day ... 1 2 3 4 5 Slide 27 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 27. Naive Query Pseudocode for each book for each instrument in that book position = position(yesterday, book, instrument) for each trade of that instrument in this book position += trade(today, book, instrument).quantity insert(today, book, instrument, position) Slide 28 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 28. Initial results Single node ā€“ 19,000 inserts per second Slide 29 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 29. Initial results with cluster 70000 60000 50000 40000 doc/sec 30000 Report Query 2 20000 10000 0 1DE 2DE 3DE # of nodes Slide 30 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 30. Techniques to get to 100K ļ‚§ Query for Computing New Positions ļ‚§ Materialized compound key, Co-Occurrence Query and Aggregation ļ‚§ Insert of New Positions ļ‚§ Batching ļ‚§ In-Forest Eval Slide 31 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 31. Materializing a compound key <trade> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> Slide 32 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 32. Materializing a compound key <trade> <roll-up book-date-instrument=ā€œ151333445566782303ā€/> <quantity2>1193.71</quantity2> <instrument>WASAX</instrument> <book>679</book> <trade-date>2011-03-13-07:00</trade-date> <settle-date>2011-03-17-07:00</settle-date> </trade> Slide 33 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 33. Co-Occurrence and Distributed Aggregation <trade> <roll-up book-date-instrument=ā€œ151373445566703ā€/> V D D V <quantity>8540882</quantity> ā€¦ ā€¦ <instrument>WASAX</instrument> ā€¦ ā€¦ ā€¦ 1129 <book>679</book> ā€¦ ā€¦ <trade-date>2011-03-13-07:00</trade-date> ā€¦ ā€¦ 15137ā€¦ ā€¦ <settle-date>2011-03-17-07:00</settle-date> ā€¦ ā€¦ </trade> ā€¦ ā€¦ ā€¦ ā€¦ ā€¦ ā€¦ ā€¢ Co-occurrences: V D Find pairings of range indexed values D V ā€¦ ā€¦ ā€¢ Aggregate on the D nodes (Map/Reduce): 8540882 ā€¦ 1129 ā€¦ Sum up the quantities above ā€¦ ā€¦ ā€¦ ā€¦ ā€¦ 8540882 ā€¢ Similar to a Group-by, ā€¦ ā€¦ ā€¢ in a column-oriented, in-memory ā€¦ ā€¦ ā€¦ ā€¦ database ā€¦ ā€¦ Slide 34 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 34. Initial results with new query > 30K inserts/second on a single node Slide 35 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 35. Co-Occurrence + Aggregate versus NaĆÆve approach 70000 60000 50000 40000 doc/sec Report Query 3 30000 Report Query 2 20000 10000 0 1DE 2DE 3DE # of nodes Slide 36 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 36. Techniques ļ‚§ Computing Positions ļ‚§ Materialized compound key, Co-Occurrence Query and Aggregation ļ‚§ Updates ļ‚§ Batching ļ‚§ In-Forest Eval Slide 37 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 37. Transaction Size and Throughput 35000 30000 25000 #docs per sec. 20000 15000 insert throughput 10000 5000 0 1 2 4 8 16 32 100 500 1000 10000 # doc inserts per transaction Slide 38 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 38. Techniques ļ‚§ Computing Positions (Data Warehouse) ļ‚§ Hash Key Decoration ļ‚§ Co-Occurrence ļ‚§ Updates (Transaction) ļ‚§ Batching ļ‚§ In-Forest Eval Slide 39 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 39. Insert Mechanics 1) New URI+Document arrive at E-node 2) *URI Probe ā€“ determine whether URI exists in any forest 3) *URI Lock ā€“ write locks are created 4) Forest Assignment ā€“ URI is deterministically placed in Forest 5) Indexing 6) Journaling 7) Commit ā€“ transaction complete 8) *Release URI Locks ā€“ D nodes are notified to release lock * Overhead of these operations increases with cluster size Slide 40 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 40. Deterministic Placement 4) Forest Assignment ā€“ URI is deterministically placed in Forest Hash 64-bit Bucketed Fi URI function number into a Forest ā€¢ Done in C++ within server ā€¢ Butā€¦ ā€¢ Can also be done in the client ā€¢ Server allows queries to be evaluated against only one forestā€¦ Slide 41 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 41. In-Forest Eval Q E D1 D2 Q F1 F2 F3 F4 Slide 42 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 42. In-Forest Eval Insert Mechanics 1) New URI+Document arrive at E-node a) Compute Fi using b) Ask server to evaluated the insert query only against Fi 2) URI Probe ā€“ Fi Only 3) URI Lock ā€“ Fi Only 4) Forest Assignment ā€“ Fi Only 5) Indexing 6) Journaling 7) Commit ā€“ transaction complete 8) Lock Release - Fi Only Slide 43 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 43. Regular Insert Vs. In-Forest Eval 70000 60000 50000 docs/second 40000 regular insert 30000 in-forest-eval In-Forest Eval 20000 10000 0 1DE 2DE 3DE # of nodes Slide 44 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 44. Tada! Achieving 100K Updates In-Forest Eval Slide 45 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.
  • 45. NoSQL document-oriented database with real-time full-text search and transactions Doing 100K transactions per second (Yes thatā€™s marker felt)
  • 46.
  • 47.
  • 48. Thank You! @eedeebee eric.bloch@marklogic.com Slide 49 Copyright Ā© 2012 MarkLogicĀ® Corporation. All rights reserved.

Editor's Notes

  1. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  2. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  3. Pain pointsThree tiered architectureHouse keeping &amp; AvailabilityStraw in a swimming pool
  4. word and phrase search, booleansearch, proximity, wildcarding, stemming, tokenization, decompounding, case-sensitivity options, punctuation-sensitivity options, diacritic-sensitivity options, document quality settings, numerous relevance algorithms, individual term weighting, topic clustering, faceted navigation, custom-indexed fields, and more.