Your SlideShare is downloading. ×
Leveraging NoSQL Database Technology to Implement Real-time Data Architectures- Webcast
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Leveraging NoSQL Database Technology to Implement Real-time Data Architectures- Webcast


Published on

Impetus webcast "Leveraging NoSQL Database Technology to Implement Real-time Data Architectures” available at …

Impetus webcast "Leveraging NoSQL Database Technology to Implement Real-time Data Architectures” available at

This webcast:
• Presents trade-offs of using different approaches to achieve a real-time architecture
• Closely examines an implementation of a NoSQL based real-time architecture
• Shares specific capabilities offered by NoSQL Databases that enable cost and reliability advantages over other techniques

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • Introduction to challenges of building real time solutions with cachingCache Miss performance, URL ReWrites ( alternative - redirects can be slow ), Complex logic ( applied to same request ), debugging difficulty, the distributed lock manager, when data locality is fleeting, time to restart on node failure, many caches – one database process makes a bottleneck – so need partitioning anyway, cache itself becomes the concurrency bottleneck, why are you adding cache to scale requests or to reduce the load on your expensive database, 30-70% database offload to save money, but what about a different database, wasting memory caching at both cache and database space, cache consistency in a distributed scenario, which algorithm is actually useful for your data access patterns ( FIFO, LRU, FFU, MRU, ARC, Second Chance, Time based ) 2.   Need to look at cache hit ratio and cache miss and their latency differenceOpen source caches - Java Caching System (JCS)· Ehcache, OSCache, Cache4J, ShiftOne, WhirlyCache, SwarmCache, JBoss Cache    Physical nesting / storage under hierarcical keys .. As opposed to simple key look-up. How do you ensure data localization in cachingonly solution that does not have rich key semantics?
  • Bottom line: NoSQL is a disruptive technology that is all about “data scalability at cost” first and foremost. That is what is interesting to customers. Sure, there are specific technical features that are also important to the NoSQL technology story, but they are secondary. In fact, as we’ll see in a few slides, each NoSQL database implementation is a little different from the others. Just like RBDMS implementations were in the early to mid-eighties. However, the main reason that NoSQL databasetechnologywas createdwas to provide more efficient and more cost effective data management for fast, flexible, distributed, highly available applications that what is provided today in general purpose RDBMS systems. -- Not Part of the Script –Additional notes: This is not new technology. Non-relational databases, and in particular, key-value databases have been around for a long time. Oracle Berkeley DB is a good example of a key-value database that has been around for 20 years. What is innovative and disruptive about this technology is the fact that scalable, horizontal distribution of data and high availability via replication are built-in to the architecture of the product. This combination of simple get/put, non-relational operations is along with the distributed, scalable, highly available qualities of Big Data is what makes it disruptive.
  • The solution for these challenges is Oracle’s NoSQL Database. It’s a distributed key-value database. Think of key-value as a two column table – a key and a value. Give key-value example: give customer ID as the key and customer profile (and other information) as the value/record. NoSQL Database is good for app’s where just need fast, simple db requests (key/value lookup, no join’s), use a schema defined dynamically at runtime by the application itself, and have extreme scalability requirements.The k-v technology isn’t new: mainframe ISAM, Berkeley DB and other products have implemented this kind of storage in the past. What’s new is: Distributed, high volume, high velocity nature of the storage and the applications that use them,Ability to create multiple, distributed indexes and hash to/look up the appropriate one, instead of just creating a single or limited set of indexes.
  • You can think of a transaction as a single auto-commit API call. That API call can be for a single record, multiple records or multiple operations AS LONG AS all of the records are for the same Major Key. However many records/operations are in that API call, they are all committed atomically (all or nothing). Because they all share the same Major Key, all of the data being affected resides on a single storage node, so we can guarantee the transactional semantics of the transaction commit. We will replicate that transaction to the replicas (copies of the data) as part of the transaction. Of course, not all operations are created equal. In some cases you may want operations that are not completely ACID. One of the benefits of NoSQL is that it relaxes transactional guarantees in order to provide faster throughput. The Oracle NoSQL Database allows you to override the default and relax the ACID properties on a per-operation basis, allowing the application to specify the transactional behavior that is most appropriate. ---------------------------Oracle NoSQL Database allows you to relax/configure the Consistencyand Durability policies for a given operation. Durability is controlled by defining the Write Policy and the HA Acknowledgement Policy. You can increase write transactions performance by relaxing the Durability constraints. The default is Write-to-memory, Majority Ack. Consistency is controlled by defining the Read Guarantees that you require from the system. You can increase read transaction performance by relaxing the Consistency constraints. The default is None.
  • Elasticity refers to dynamic/online expansion changes in a deployed store configuration.  New storage nodes are added to a store to increase performance, reliability, or both.Increase Data Capacity - A Company’s Oracle NoSQL Database application is now obtaining it’s data from several unplanned new sources.  The utilization of the existing configuration as more than adequate  to meet requirements, with one exception, they anticipate running out of disk space later this year.  The company would  like to add the needed disks to the existing servers in existing slots, establish mount points, ask NoSQL Database to fully utilize the new disks along with the disks already in place while the system is up and running Oracle NoSQL Database.  The Administrator after installing the new disks, defines a new topology using the Administrator with the new mount points and capacity value such that new replication nodes can be created on the existing storage nodes.  The administrator can review the plan for errors and then when ready the new topology is deployed while the Oracle NoSQL Database is online and continues to serve the running application with CRUD operations.Increase Throughput-  As a result of an unplanned corporate merger, the live Oracle NoSQL Database will see a substantial increase in write operations.  The read write mix of transactions will go from 50/50 to 85/15.  The need workload will exceeds the I/O capacity available of the available storage nodes.  The company would like to add new hardware and have it be utilized by the existing Oracle NoSQL Database (kvstore) currently in place.  Oh, and of course the Application needs to continue to be available while this upgrade is occurring.With the new elasticity capabilities and topology planning, the administrator can add the new hardware and define a new topology with the new Storage Nodes.  The administrator can then look at the resulting topology (storage nodes, replication nodes, shards, etc)  to confirm it meets their requirements.  Once they are satisfied with the new topolgy they can also determine when they want to deploy the new topology in the background and while the existing application continues to operate.   As partitions/chunks of data are moved they are made available to the live system.  Increase Replication Factor-  A new requirement has been placed on an existing Oracle NoSQL Database to increase the overall availability of the Oracle NoSQL Database by increasing the replication factor by utilizing new storage nodes added in a second geographic location.  This is accomplished by adding at least 1 replication node for every existing shard.  The current configuration has a replication factor of 3.While the system is live, the administrator changes the topology to define the new storage nodes and define the replication factor.  Again the administrator can validate the topology and review it before deploying.  As a side point, the administrator could validate several changes to evaluate alternatives and then decide which topology to deploy.  Just like the other scenarios described the data is automatically moved and partitions are made available as they are moved as part of a background activity.  Meanwhile the KVStore continues to service the existing workload starting to use the new replicas as they become available.   Once the topology is deployed a new replication node has been created and populated for each shard.  We have increased availability by increasing the replication factor where the new storage nodes are in another geographic location. We have increased read throughput capability with the new Replication nodes for each shard and the Replication Factor is now 4.  
  • Notes: [1] Hardware for all runs 15 servers: 30 335 GB Fusion-IO SSDs, 2 per server 12 Xeon E5-2690, 2.9 GHz, 2 sockets, 32 threads, 193 GB RAM 3 Xeon E5-2680, 2.7 GHz, 2 sockets, 32 threads, 193 GB RAM 15 clients: Xeon X5670, 2.93 GHz, 2 sockets, 24 threads, 96 GB RAM Network: 10 gigabit ethernet[2] Configuration for scalability tests: Cluster configuration: 2 RNs per SN Replication factor: 3 JE cache size: 10 GB Java heap size: 15 GB JE nLockTables: 13 YCSB configuration: 200M entries per shard 20 clients on 10 client hosts, 2 processes per client 60 insert threads per SN 90 update threads per SN 95% read, 5% write in mixed workload Update mode: Read/modify/write
  • Notes: [1] Hardware for all runs 15 servers: 30 335 GB Fusion-IO SSDs, 2 per server 12 Xeon E5-2690, 2.9 GHz, 2 sockets, 32 threads, 193 GB RAM 3 Xeon E5-2680, 2.7 GHz, 2 sockets, 32 threads, 193 GB RAM 15 clients: Xeon X5670, 2.93 GHz, 2 sockets, 24 threads, 96 GB RAM Network: 10 gigabit ethernet[2] Configuration for scalability tests: Cluster configuration: 2 RNs per SN Replication factor: 3 JE cache size: 10 GB Java heap size: 15 GB JE nLockTables: 13 YCSB configuration: 200M entries per shard 20 clients on 10 client hosts, 2 processes per client 60 insert threads per SN 90 update threads per SN 95% read, 5% write in mixed workload Update mode: Read/modify/write
  • 1) Have you done any latency comparisons between Oracle NoSQL and other popular caching technologies? 2) What is the difference between MySQL NoSQL and Oracle NoSQL? 3) What development languages does Oracle's NoSQL database support?
  • Transcript

    • 1. 1
    • 2. Leveraging NoSQL to Implement Real-time Data Architectures Robert Greene, Senior Principal Product Manager, Oracle Championing the Oracle NoSQL Database Initiative Vineet Tyagi, VP Technology, Impetus Technologies Heads the Innovation Labs 2 Recorded version available at
    • 3. • The Evolving Real-time Data Caching Architecture • Oracle NoSQL Database, features for Real-time • An implementation: Real-time trading with NoSQL Program Agenda 3 Recorded version available at
    • 4. • Move from process space to networked • Need cache server and API • Bottlenecks: data load, network Data Caching Architectures 4 All data in memory Recorded version available at
    • 5. Data Caching Architectures 5 Partial data in memory • Bottlenecks in network and disk I/O • Algorithm dependencies • Cache hit/miss (FIFO, LFU, LRU, MRU, ARC, etc.) • Data composition, many seeks Recorded version available at
    • 6. • Bottleneck • Lock Manager • Data Load (composition) • Network • Complexity • APIs (App, Cache, Disk) • Expensive to grow Data Caching Architectures 6 All data in memory - Distributed Recorded version available at
    • 7. Data Caching Architectures 7 All data in partitioned cache space • Bottleneck • Data load, Network • Complexity • APIs (App, Cache, Disk) • Expensive to grow Recorded version available at
    • 8. Data Caching Architectures 8 All data in NoSQL backed storage • Bottleneck • Network • Reduced Complexity • API (App, Cache) • Reduced Cost • Disk instead of memory Recorded version available at
    • 9. What is Oracle NoSQL Database? 9 Non-relational key-value database designed for cost effective simple queries of high volume, velocity & variety data. Provides high performance & availability data storage and retrieval of simple data using a scale-out of servers design. Recorded version available at
    • 10. Architecture 10 Recorded version available at
    • 11. Oracle NoSQL Database 11 Scalable, Highly Available, Key-Value DatabaseFeatures • Flexible Key-Value Data Model • ACID transactions • Horizontally Scalable • Highly Available • Elastic Configuration • Simple administration • Intelligent Driver • Commercial grade software and support Application Storage Nodes Datacenter B Storage Nodes Datacenter A Application NoSQL DB Driver Application NoSQL DB Driver Application Java SE 6 (JDK 1.6.0 u25)+; Solaris or Linux Recorded version available at
    • 12. Oracle NoSQL Database 12 Real-time enabling features • Simple key-value based • No data composition lag • Integrated Caching • Single layer in architecture • WAL transactions • No disk seek time on write • Optimized seek on read • SSD optimized • Smart topology driver • Single RPC to dataRecorded version available at
    • 13. Features – Configurable CAP 13 Greater Flexibility • Configurable Durability per operation • Configurable Consistency per operation • ACID by default • Transaction scope is single API call • Records share same major key • Multiple operations supported Recorded version available at
    • 14. Features – Smart Topology Management 14 Automated Resource Planning • Storage nodes have indication of “capacity” • System allocates replicas per storage node • Intelligent Master/Replica load balancing • Ensures distribution of replicas • Efficient use of system resources • Reduces operator-caused configuration errors Application Smart Topology Driver Recorded version available at
    • 15. Features – Elastic Cluster Expansion 15 Smart Topology • Increase Data Capacity • Add more storage nodes • New shards automatically created • Increase Data Throughput • More shards = better write throughput • More replicas/shard = better read throughput NoSQL DB Driver Application Master Replica Replica StorageNode StorageNode StorageNode Shard-1 Master Replica Replica Shard-2 Recorded version available at
    • 16. YCSB – Benchmark Commodity Servers 16 What‟s the Big Deal • 1.6 billion records • 94K insert/sec • 25K read/ update/sec • Low latency • Linear scalability Recorded version available at
    • 17. YCSB - Benchmarking with SSD’s 17 What‟s the Big Deal • Twitter sees ~500M tweets/day • This is 750M a minute • Capture twitter activity with 3 commodity servers • 1.25M ops/sec • 2 billion records • 2 TB of data • 95% read, 5% update • Low latency, High Scalability Recorded version available at
    • 18. Global Data At Scale Recorded version available at
    • 19. • Lack of understanding • Too many silos • Operational decisions made manually, big mindset change • Where to look for use cases ? • Business decisions (rules) hard coded in systems • Operational Planning, Real Time Service Assurance and Product Optimization Challenges in Real Time Analytics 19 Recorded version available at
    • 20. • Real time and Offline Analysis over Stock Tick data received from 10 Global Stock Exchanges • Real-time data ingestion at scale • Real-time Analysis • Support Offline Analysis Real Time Use Case 20 Recorded version available at
    • 21. • Stock tick data Analytics at „Global Market‟ scale • Around 10+ stock exchanges around the world that provide stock tick data • Easy installation, configuration and monitoring of Big data cluster • Real-time analytics at high volume • Real-time dashboard, key stats, high ingestion rates Scope and Solution 21 Recorded version available at
    • 22. Architecture 22
    • 23. • Oracle NoSQL Database • Ankush (Impetus open source tool) • StreamAnalytix (Impetus pre-built tech stack) • Kafka • Intellicus • D3.js • Tomcat Technology 23 Recorded version available at
    • 24. Deployment 24
    • 25. Throughput 25 RF Storag e Nodes Shards Replicatio n Nodes Partition s Memory Cache Throughput 2 4 4 8 40 Default Default 220k ticks per sec 3 6 4 12 30 Default Default 200k ticks per sec Recorded version available at
    • 26. Functional View 26 Recorded version available at
    • 27. World Indexes View 27 Recorded version available at
    • 28. Regional Stock Exchange View 28 Recorded version available at
    • 29. Functional View (Exchange View) 29 Recorded version available at
    • 30. Functional View (Stock View) 30 Recorded version available at
    • 31. Functional Views (Industry View) 31 Recorded version available at
    • 32. Functional Views (Historical Report) Intellicus view 32 Recorded version available at
    • 33. Infrastructure View 33 Recorded version available at
    • 34. Infrastructure Views - Ankush34 Recorded version available at
    • 35. Cluster Dashboard 35 Recorded version available at
    • 36. Oracle NoSQL Database Monitoring36 Recorded version available at
    • 37. Cluster Metrics 37
    • 38. Node Monitoring38 Recorded version available at
    • 39. Replica Node Monitoring39 Recorded version available at
    • 40. Operations Available40 Recorded version available at
    • 41. For more info and/or a demo - Gartner BI and Analytics Summit 2014, Las Vegas – Meet us and attend our session @impetustech Contact Impetus to Accelerate Your Big Data Journey 41 Recorded version available at
    • 42. Join NoSQL Database Community 42 Twitter!/OracleNoSQL LinkedIn Oracle’s NoSQL DB blog Oracle Technology Network Developer Webcast Series Recorded version available at
    • 43. Q&A 43 Recorded version available at
    • 44. Thank You 44