Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Impetus webcast ‘Build and Manage: Hadoop and Oracle NoSQL Database Solutions’ available at http://lf1.me/6W/

Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Taking the examples from the previous slide: healthcare and retail are mostly batch-oriented processes, while location-based services are mostly real time. Each has specific requirements for how the data is used and processed, and those requirements determine which technology you should choose to acquire and store that data.
  • Given those scenarios, here is how the data might be stored and managed. HDFS is a great distributed file system: parallel and highly scalable. However, it is tuned primarily for bulk sequential reads and writes of file blocks. It has no indices for fast access to specific data records, and it is not well suited to lots of small files or to updating files that have already been written. It is primarily a batch system: write lots of data, then read it all in parallel, over and over. NoSQL DB is a distributed key-value database. It has indices and is designed for high-volume reads and writes of simple data. It is not tuned for reading or writing huge files; use a file system for that.
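The access-pattern difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, `scan_for`, and `lookup` are hypothetical names, and neither function uses HDFS or Oracle NoSQL Database itself.

```python
# Hypothetical illustration: bulk sequential scan vs. indexed key lookup.
records = [(f"user{i}", {"visits": i}) for i in range(1000)]


def scan_for(key, rows):
    """File-system style access: read records in order until the key is found."""
    reads = 0
    for k, v in rows:
        reads += 1
        if k == key:
            return v, reads
    return None, reads


# Key-value style access: an index (here a hash map) is built once up front.
index = dict(records)


def lookup(key, idx):
    """Database style access: one indexed read, regardless of data size."""
    return idx.get(key), 1


value_scan, scan_reads = scan_for("user750", records)
value_idx, idx_reads = lookup("user750", index)
print(scan_reads, idx_reads)
```

The scan touches every record ahead of the target, which is fine when the job reads everything anyway (batch) and wasteful when the application needs one specific record (real time).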
  • Bottom line: NoSQL is first and foremost about “data management scalability at cost.” Some technical features are also important, but they are secondary. With enough effort (hardware and software) you can solve most of the technical problems with RDBMS systems. However, NoSQL was invented because managing Big Data with general-purpose RDBMS systems is too expensive.
    Regarding CAP (http://en.wikipedia.org/wiki/CAP_theorem): the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
    Consistency (all nodes see the same data at the same time);
    Availability (a guarantee that every request receives a response about whether it was successful or failed);
    Partition tolerance (the system continues to operate despite arbitrary message loss).
    According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three. RDBMS products typically focus on CA, whereas NoSQL products typically focus on AP.
  • Cox Communications. 128-node Hadoop cluster. Home-grown distributed key-value storage using Berkeley DB. Would have used NoSQL DB if it had been available 2-3 yrs ago.
  • This slide shows the master-slave architecture of Oracle NoSQL DB. The master receives each write and asynchronously replicates the data to the other replication nodes.
  • Oracle NoSQL DB uses simple, understandable key-value pairs, simple get/insert/update/delete operations, and ACID transactions. This differs from SQL in an RDBMS, but the model and behavior are very familiar to application developers. Think of keys as a directory structure: multiple parts that let you traverse the hierarchy. The Major Key determines where the data is stored (which shard). The full key (Major + minor) is unique, with only one value per unique key. The Minor Key lets you have multiple records for a given Major Key. Keys are simple strings. The value is a byte string; it can be anything you want it to be, and the application knows its structure and content. Support for a flexible data serialization format (Apache Avro, http://en.wikipedia.org/wiki/Apache_Avro) will be available in future releases.
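A minimal sketch of the major/minor key idea, assuming a plain Python dictionary as the store. `key_path` and `shard_for` are hypothetical helpers, and the MD5-based routing is an assumption chosen for illustration; it is not the hashing scheme Oracle NoSQL Database uses. The path notation follows the `/major/-/minor` form shown on the slides.

```python
import hashlib


def key_path(major, minor=()):
    """Render a key as /major/parts/-/minor/parts."""
    path = "/" + "/".join(major)
    if minor:
        path += "/-/" + "/".join(minor)
    return path


def shard_for(major, num_shards):
    """Choose a shard from the major key only (hypothetical hashing scheme)."""
    digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


store = {}  # full key path -> opaque bytes; only the application knows the layout
user = ["555.22.1111"]
store[key_path(user, ["profile"])] = b'{"name": "Bob Smith"}'
store[key_path(user, ["image"])] = b"\x89PNG..."

# Both records share a major key, so they are routed to the same shard.
print(key_path(user, ["profile"]), shard_for(user, 3))
```

Because routing ignores the minor key, every record under one major key lands together, which is what makes the single-major-key transaction scope described in these notes possible.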
  • This is basically a summary slide, highlighting the features of Oracle NoSQL Database, especially those that we think set us apart from other products on the market.
    General Purpose: Oracle NoSQL DB is built as a general-purpose, scalable, highly reliable NoSQL database. Several of the well-known NoSQL systems were built specifically to solve the technical problems of a given company (Voldemort was built by LinkedIn, Dynamo by Amazon, Bigtable by Google), which can tend to affect the technical direction and design decisions for those products. That is not the case with Oracle NoSQL Database.
    Reliable: Unlike most NoSQL databases, which are inventing both storage and distributed data management, Oracle NoSQL Database uses Berkeley DB Java Edition for key-value storage and replication on the storage nodes. BDB has been running large production applications for many years and is a proven, reliable, scalable storage system.
  • Keep the cluster investment at work; most bang for your buck; training needed; multiple management tools; rapid, automatic, or rule-based single-click provisioning of Big Data clusters; measure the boost that clusters/grids provide to your business data processing capabilities; the need to change your choice of cluster software at any point when you feel it is not delivering sufficiently for your needs; manage the big data solution from a single cluster management software umbrella.
    IT & System Administrators want: consistent and easy-to-use provisioning, management, and monitoring tools; less disruption in the stack and reuse of technology investments; extensibility (keep the same tooling when adding new big data technologies to the stack); reduced outage times; reduced time to scale and to production.
  • Cluster Analytics: cross-cluster analytics; optimizations; self-healing capabilities; fail-safe for false negatives/positives. Advanced Profiling: capability to “certify” cluster performance; job profiling, which weeds out badly written code. Value-Added Features: testing framework for MapReduce jobs (certify build to production).
  • Experienced Advisors: Accelerated consulting and services leader for Big Data. Headquartered in San Jose, with offices in India. Expertise through Architects: Pioneers in distributed software engineering with both vertical and functional expertise. Dedicated Innovation Labs. Excellence delivered through technology Advances: Open source and an innovation product portfolio. Founded 1991; 1,300 strong. Leading Big Data since 2008. Locations: Chicago, NYC, Atlanta, Indore, Noida, Bangalore. Impetus provides Big Data thought leadership and services, creating new ways of analyzing data to gain key business insights across enterprises. Impetus' experience extends across the big data ecosystem, including Hadoop, NoSQL, NewSQL, MPP databases, machine learning, and visualization. Impetus offers a Quick Start program, Architecture Advisory Services, Proof of Concept, and Implementation.
  • Oracle NoSQL Database allows you to relax/configure the Consistency and Durability policies for a given operation. Durability is controlled by defining the Write Policy and the HA Acknowledgement Policy. You can increase write transaction performance by relaxing the Durability constraints. The default is Write-to-memory, Majority Ack. Consistency is controlled by defining the Read Guarantees that you require from the system. You can increase read transaction performance by relaxing the Consistency constraints. The default is None.
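To make the acknowledgement side of durability concrete, here is a small model of how many replica acknowledgements a master would wait for under different policies. It is a sketch under assumptions: `AckPolicy`, `replica_acks_needed`, and `write_is_durable` are hypothetical names, and the model assumes the master itself counts toward a majority. It is not the product's API.

```python
from enum import Enum


class AckPolicy(Enum):
    NONE = "none"                        # do not wait for any replica
    SIMPLE_MAJORITY = "simple_majority"  # wait until a majority of nodes has the write
    ALL = "all"                          # wait for every replica


def replica_acks_needed(policy, replication_factor):
    """Replica acknowledgements (excluding the master) required before commit returns."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if policy is AckPolicy.NONE:
        return 0
    if policy is AckPolicy.ALL:
        return replication_factor - 1
    # Assumption: the master counts as one member of the majority, so
    # (replication_factor // 2 + 1) nodes minus the master itself.
    return replication_factor // 2


def write_is_durable(policy, replication_factor, acks_received):
    """True when enough replicas acknowledged the write for the chosen policy."""
    return acks_received >= replica_acks_needed(policy, replication_factor)


for policy in AckPolicy:
    print(policy.name, replica_acks_needed(policy, 3))
```

Relaxing the policy from ALL toward NONE reduces the number of network round trips a write waits for, which is the performance trade-off the note describes.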
  • We heard you – we have ACID transactions in Oracle NoSQL Database. You can think of a transaction as a single auto-commit API call. That API call can be for a single record, multiple records or multiple operations AS LONG AS all of the records are for the same Major Key. However many records/operations are in that API call, they are all committed atomically (all or nothing). Because they all share the same Major Key, all of the data being affected resides on a single storage node, so we can guarantee the transactional semantics of the transaction commit. We will replicate that transaction to the replicas (copies of the data) as part of the transaction. Of course, not all operations are created equal. In some cases you may want operations that are not completely ACID. One of the benefits of NoSQL is that it relaxes transactional guarantees in order to provide faster throughput. The Oracle NoSQL Database allows you to override the default and relax the ACID properties on a per-operation basis, allowing the application to specify the transactional behavior that is most appropriate.
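The transaction scope described above (one call, several operations, one major key, all or nothing) can be sketched as follows. This is a toy in-memory model: `execute` and the `(kind, key, value)` operation tuples are hypothetical and do not mirror the real client API.

```python
def execute(store, operations):
    """Apply all operations atomically; every key must share the same major key.

    Each operation is (kind, (major, minor), value), where kind is "put" or "delete".
    """
    majors = {key[0] for _, key, _ in operations}
    if len(majors) > 1:
        raise ValueError("all operations in one call must share a major key")

    staged = dict(store)  # work on a copy so a failure leaves the store untouched
    for kind, key, value in operations:
        if kind == "put":
            staged[key] = value
        elif kind == "delete":
            if key not in staged:
                raise KeyError(key)  # abort: nothing has been applied to the store
            del staged[key]
        else:
            raise ValueError(f"unknown operation {kind!r}")

    # Commit point: swap in the staged state in one step.
    store.clear()
    store.update(staged)


store = {("user1", "profile"): b"old"}
execute(store, [
    ("put", ("user1", "profile"), b"new"),
    ("put", ("user1", "email"), b"bob@example.com"),
])
print(sorted(store))
```

Restricting a call to one major key keeps all affected records on a single shard, so the commit needs no cross-shard coordination.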
  • Elasticity refers to dynamic, online expansion of a deployed store configuration. New storage nodes are added to a store to increase performance, reliability, or both.
    Increase Data Capacity: A company's Oracle NoSQL Database application is now obtaining its data from several unplanned new sources. The existing configuration is more than adequate to meet requirements, with one exception: the company anticipates running out of disk space later this year. It would like to add the needed disks to existing slots in the existing servers, establish mount points, and ask Oracle NoSQL Database to fully utilize the new disks along with the disks already in place, all while the system is up and running. After installing the new disks, the administrator uses the administration tool to define a new topology with the new mount points and capacity values, so that new replication nodes can be created on the existing storage nodes. The administrator can review the plan for errors and, when ready, deploy the new topology while Oracle NoSQL Database stays online and continues to serve the running application's CRUD operations.
    Increase Throughput: As a result of an unplanned corporate merger, the live Oracle NoSQL Database will see a substantial increase in write operations. The read/write mix of transactions will go from 50/50 to 85/15. The new workload will exceed the I/O capacity of the available storage nodes. The company would like to add new hardware and have it utilized by the existing Oracle NoSQL Database (KVStore). And, of course, the application needs to remain available while this upgrade is occurring. With the new elasticity capabilities and topology planning, the administrator can add the new hardware and define a new topology that includes the new storage nodes. The administrator can then inspect the resulting topology (storage nodes, replication nodes, shards, etc.) to confirm that it meets the requirements. Once satisfied with the new topology, the administrator chooses when to deploy it; deployment runs in the background while the existing application continues to operate. As partitions (chunks of data) are moved, they are made available to the live system.
    Increase Replication Factor: A new requirement has been placed on an existing Oracle NoSQL Database: increase overall availability by increasing the replication factor, using new storage nodes added in a second geographic location. This is accomplished by adding at least one replication node for every existing shard. The current configuration has a replication factor of 3. While the system is live, the administrator changes the topology to define the new storage nodes and the new replication factor. Again, the administrator can validate and review the topology before deploying it. As a side point, the administrator could validate several changes to evaluate alternatives and then decide which topology to deploy. As in the other scenarios, the data is moved automatically as a background activity, and partitions are made available as they are moved. Meanwhile, the KVStore continues to service the existing workload and starts to use the new replicas as they become available. Once the topology is deployed, a new replication node has been created and populated for each shard. Availability has increased because the replication factor is higher and the new storage nodes are in another geographic location. Read throughput capability has also increased with the new replication node in each shard, and the replication factor is now 4.
  • Rebalance a configuration: A storage node has failed and must be replaced (the KVStore continues to run). The new hardware is a much more powerful machine (9 cores, 64 GB of real memory compared to 8 GB, multiple 400 GB solid-state drives), so the store now runs on a heterogeneous hardware mix. The system administrator adds the new storage node to the pool of available storage nodes and then migrates the old (failed) storage node to the new one. After a successful migration (the KVStore continues to run), the failed storage node is deleted and all storage nodes are active again.
    Continuing to monitor the performance of the system and the existing topology, the administrator notices that some of the older storage nodes host 2 replication nodes each and show high CPU/IO utilization and high latency, while the new, much faster storage node is underutilized. By using the new physical topology planning support available in this release, Oracle NoSQL Database will rebalance the configuration and redistribute the data. In other words, Oracle NoSQL Database will make optimal use of heterogeneous storage nodes. The new storage nodes will likely have multiple replication nodes running on them, while many of the older systems may go from 2 to 1. The replication nodes are moved automatically. Again, this can all happen while the system is online and at the convenience of the company.
    Data Movement:
    • Idempotent: Can be run multiple times with the same result.
    • Interruptible: You can interrupt at any time and the KVStore will continue running. The company may have a daily peak workload period and may want to interrupt the data movement (as part of the new topology) and restart it after the peak period.
    • Restartable:
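The data movement properties listed above can be illustrated with a toy relocation routine. This is a sketch, not the product's mechanism: `migrate`, the `assignment` and `plan` dictionaries, and the `max_moves` cut-off (standing in for an operator interrupting the move) are all hypothetical, and reading "restartable" as "re-running the same plan resumes where it stopped" is an assumption.

```python
def migrate(assignment, plan, max_moves=None):
    """Move partitions to their planned storage nodes; return how many were moved.

    assignment: {partition_id: current_node}, updated in place
    plan:       {partition_id: target_node}
    max_moves:  stop early after this many moves (simulates an interruption)
    """
    moved = 0
    for partition, target in sorted(plan.items()):
        if assignment.get(partition) == target:
            continue  # already in place, so re-running the plan changes nothing
        if max_moves is not None and moved >= max_moves:
            break  # interrupted: partitions not yet moved keep serving from the old node
        assignment[partition] = target
        moved += 1
    return moved


assignment = {0: "sn1", 1: "sn1", 2: "sn2", 3: "sn2"}
plan = {1: "sn3", 3: "sn3"}

print(migrate(assignment, plan, max_moves=1))  # interrupted before the peak period
print(migrate(assignment, plan))               # same plan resumed after the peak
print(migrate(assignment, plan))               # running it again is a no-op
```

Because each step checks the current placement before acting, the same plan can be submitted repeatedly without tracking how far a previous run got.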
  • Why Avro? Avro is used by multiple products, such as Hadoop, and is supported in multiple programming languages. Having a schema and serialization framework is advantageous when working with multiple programmers and with other products such as Hadoop.
    Schema: With Avro, each value is associated with an Avro schema (written in JSON format), typically created by the application programmer. An advantage of using Avro is that the serialized values can be stored in a space-efficient manner. Avro has a number of primitive data types, including boolean, int, long, float, and string.
    Bindings: Oracle NoSQL Database supports multiple binding types. Generic: schemas are treated dynamically (not fixed at build time). Specific: the specific binding (named SpecificAvroBinding) has the advantage of creating a POJO (Plain Old Java Object) class with getter and setter methods for each field in the schema. JSON: the JSON binding (JsonAvroBinding) is easy to read or create and can also interoperate with other programs that use JSON objects. Raw: low level, serialization not performed.
    Schema Evolution is important with large databases, where you can't simply update every key/value pair in the store. Within the constraints defined in the Avro specification, the schema used to read data does not need to be exactly the same as the schema used to write it. For example, imagine a key/value record representing profile information for a user, and a new requirement to add an alternate email address. The field is added to the schema and a default value is established. From then on, when a new key/value pair is added, it includes the alternate email address, and when existing profile information is updated, the alternate email address is added. On reads (for example, displaying the profile information), the alternate email address may not have been written yet, and that is fine: a default value can be displayed. This allows complete flexibility in terms of providing the updated field over time.
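The alternate-email example can be sketched without any Avro library. The snippet below only mimics the Avro resolution rule that a field present in the reader's schema but absent from the written data takes the reader's default; `resolve`, `NO_DEFAULT`, and the field names are hypothetical, and real applications would use the Avro bindings named in the notes.

```python
NO_DEFAULT = object()  # marker for fields that have no default value

# Reader schema, version 2: "alt_email" was added later, with a default.
READER_V2 = [
    ("name", NO_DEFAULT),
    ("email", NO_DEFAULT),
    ("alt_email", ""),
]


def resolve(record, reader_fields):
    """Project a stored record onto the reader's schema, filling in defaults."""
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            resolved[name] = record[name]
        elif default is not NO_DEFAULT:
            resolved[name] = default  # old record: field not written yet
        else:
            raise ValueError(f"record is missing required field {name!r}")
    return resolved  # fields unknown to the reader are dropped


old_record = {"name": "Bob", "email": "bob@example.com"}  # written with version 1
new_record = {"name": "Ann", "email": "ann@example.com", "alt_email": "ann@work.example"}

print(resolve(old_record, READER_V2))
print(resolve(new_record, READER_V2))
```

Old records never need a bulk rewrite: they pick up the new field whenever the application next updates them.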
  • New streaming API for Large Objects (recommended for sizes from greater than 1 MB up to hundreds of GB). Examples would be audio files, video files, and medical imaging. New methods were added to the KVStore handle (getLOB, putLOB, deleteLOB, putLOBIfAbsent, putLOBIfPresent). The major difference is that an input stream is used to chunk the Large Object. As a result, the smaller chunks can be stored across the KVStore (multiple shards), depending on size. In addition, the chunks are stored in parallel, so write/read operations are much faster.
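The chunking idea can be sketched with an in-memory stream and a dictionary. `put_lob` and `get_lob` here are hypothetical Python functions written for illustration, not the KVStore methods named above; the tiny chunk size and the `(key, index)` layout are assumptions.

```python
import io


def put_lob(store, key, stream, chunk_size=4):
    """Read the stream piece by piece and store each piece under its own key."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)  # only one chunk is held in memory at a time
        if not chunk:
            break
        store[(key, index)] = chunk
        index += 1
    store[(key, "chunks")] = index  # metadata record: how many chunks exist
    return index


def get_lob(store, key):
    """Yield the chunks back in order so the caller can stream them out."""
    for index in range(store[(key, "chunks")]):
        yield store[(key, index)]


store = {}
data = b"0123456789"
chunk_count = put_lob(store, "video/intro", io.BytesIO(data))
print(chunk_count, b"".join(get_lob(store, "video/intro")) == data)
```

Since each chunk has its own key, a real store is free to place chunks on different shards and to read or write them in parallel.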
  • External Table support: allows you to access data in external sources as if it were a table in the Oracle relational database. Through Oracle's external table support, you can access Oracle NoSQL Database key/value pairs as if they were rows in Oracle Database. This lets you issue SQL read statements, such as SELECT and SELECT COUNT(*), whose results are obtained from Oracle NoSQL Database. Since SELECT statements can refer to multiple tables, a query can look at both Oracle NoSQL Database information and data that resides directly in Oracle Database. It also means that the data can be accessed via JDBC. Sample programs and javadoc are available. Event Processing: the cartridge works with Oracle Event Processing.
  • From http://www.slideshare.net/jmusser/j-musser-apishotnotgluecon2012, slide 23
  • There's a web-based Admin GUI, which is a great way to get started. Most production sites with lots of nodes will probably use the CLI (command line interface) to start/stop the system and use the GUI to check on status. The system keeps track of the status of the system and of the various storage nodes, as well as the performance statistics and throughput for each node. In a future release of NoSQL Database, the administration functionality will also be available via Oracle Enterprise Manager.

Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar Presentation Transcript

  • 1. Deploy and Manage: Oracle NoSQL Database and Hadoop Cluster using Ankush. © 2013 Impetus Technologies - Confidential
  • 2. Agenda: Overview of big data; Introduction to NoSQL Database; Use Cases for Oracle NoSQL Database; Oracle NoSQL Database Overview; Introducing Ankush; Ankush: Demo; Q&A
  • 3. Definition: You have a Big Data situation when traditional information systems cannot store, process, or analyze the volume, variety, or velocity of data in a cost-effective and timely manner. (Diagram labels: Store, Process, Analyze; Volume, Velocity, Variety; Cost, Time.)
  • 4. Where to look for the value of Big Data? • If you could test all of your decisions, how would that change the way you compete? • How would your business change if you used data for widespread in-time customization? • Could you create a new business model based on data?
  • 5. Agenda: Overview of big data; Introduction to NoSQL Database; Use Cases for Oracle NoSQL Database; Oracle NoSQL Database Overview; Introducing Ankush; Ankush: Demo; Q&A
  • 6. Big Data Acquisition Characteristics: Where should we put all that data?
    Batch-Oriented: process data to use; bulk storage; write once, read all.
    Real-Time: deliver a service; fast access to specific record; read, write, delete, update.
  • 7. Big Data Storage Choices
    Hadoop Distributed File System (HDFS): file system; parallel scanning; no inherent structure; high volume writes; batch oriented.
    Oracle NoSQL Database: database; indexed storage; simple data structure; high volume random reads and writes; real-time.
  • 8. Challenges NoSQL Databases address • Performance – High rate of data capture – High volume of simple queries • Flexible schema – Diverse, changing data sets • Horizontal Scalability – Scale out, don’t scale up • Availability – Low cost highly available, distributed data store
  • 9. Agenda: Overview of big data; Introduction to NoSQL Database; Use Cases for Oracle NoSQL Database; Oracle NoSQL Database Overview; Introducing Ankush; Ankush: Demo; Q&A
  • 10. Sample of Big Data Use Cases Today
    AUTOMOTIVE: Auto sensors reporting location, problems
    COMMUNICATIONS: Location-based advertising
    CONSUMER PACKAGED GOODS: Sentiment analysis of what's hot, problems
    FINANCIAL SERVICES: Risk & portfolio analysis; new products
    EDUCATION & RESEARCH: Experiment sensor analysis
    HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis
    LIFE SCIENCES: Clinical trials; genomics
    MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
    ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; web-site optimization
    HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
    OIL & GAS: Drilling exploration sensor analysis
    RETAIL: Consumer sentiment; optimized marketing
    TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
    UTILITIES: Smart Meter analysis for network capacity
    LAW ENFORCEMENT & DEFENSE: Threat analysis (social media monitoring, photo analysis)
    Challenged by: Data Volume, Velocity, Variety. Oracle NoSQL Database is typically a component of a Big Data Solution.
  • 11. Use Case – Online Display Advertising
    Problem:
    – Very low latency requirements: publishers require < 75 ms response time from the ad serving platform
    – Extreme data volume: multi-millions of requests per second
    – Highly available: 24/7 sites
    – Revenue maximization: deliver the most relevant ad to maximize revenue
    Solution (where to use a NoSQL Database?):
    – Cookie store: NoSQL database used to store cookies and associated behavioral segments
    – Track behavioral data: beacons utilized during browsing to store timestamp, frequency, and behavioral segments by cookie
    – Optimize ad delivery: recency, frequency, and behavioral segments used to determine optimal ad to deliver to user
  • 12. Use Case – Online Display Advertising Architecture. (Diagram components: Ad Server Application with NoSQL DB Driver, RDBMS, Multi-Dimensional Reporting, Hadoop Cluster.)
  • 13. Use Case – Remote Patient Monitoring
    Scenario: • Patient uses multiple devices at home • Medical data periodically sent to database • App monitors and alerts patient state • Appropriate alerts sent to medical or emergency personnel, recorded in profile
    Important Attributes: • High performance and high availability • High throughput event capture • Huge volumes of data • Simple data, flexible data model • Connectivity to Analytics and Discovery
    (Diagram labels: Capture Patient Monitoring Data, NoSQL DB, Alerting System.)
    Goal: Better Patient Care at Lower Costs
  • 14. Agenda: Overview of big data; Introduction to NoSQL Database; Use Cases for Oracle NoSQL Database; Oracle NoSQL Database Overview; Introducing Ankush; Ankush: Demo; Q&A
  • 15. Oracle NoSQL Database: Scalable, Highly Available, Key-Value Database
    Features: • Simple Key-Value Data Model • Horizontally Scalable • Highly Available • ACID Transactions • Elastic Configuration • Simple administration • Transparent load balancing • Commercial grade software and support
    (Diagram: applications with the NoSQL DB Driver connect to storage nodes in Datacenter A and Datacenter B.)
  • 16. Architecture: Application’s Perspective. (Diagram: applications use the NoSQL DB Driver to reach Shard 1, Shard 2, ... Shard N; each shard has a Master, Replica 1, and Replica 2.)
  • 17. Simple Data Model: Key-value pairs
    • Simple data model – key-value pair (major+minor-key paradigm)
    • Simple operations – read/insert/update/delete, RMW support
    • Scope of transaction – records within a major key, single API call
    • Unordered scan of all data (non-transactional)
    (Diagram labels: Major key: userid; Minor key; keys are Strings; Value: Byte Array. Example entries: subscriptions, expiration date, address, phone #, email id.)
  • 18. Latest YCSB Benchmark Results: Mixed Throughput • 2 billion records • 2 TB of data • 95% read, 5% update • 1.25M ops/sec • Low latency • High Scalability
    (Chart: throughput in ops/sec and average read and write latency in ms versus cluster size: 6 (2x3), 12 (4x3), 24 (8x3), 30 (10x3).)
  • 19. Oracle NoSQL Database Differentiation: Integrates seamlessly with Oracle Stack (Database, OEP, RDF Graph)
    Commercial Grade Software and Support: • General Purpose • Reliable – Based on proven Berkeley DB JE HA • Easy to Install & Configure
    Scalability and Availability: • Intelligent Oracle NoSQL DB Driver • Evenly distributes data • Ops go to fastest node • Bounded network hops for all operations • Automatic replication and failover • 1M+ Operations/second
    Simple Data Model: • Simple Major + Minor Key-Value data structure • JSON schemas • ACID transactions • Configurable consistency and durability
    Simple Administration: • Web-based Console and CLI commands • Smart Topology • Manages and Monitors: Topology, Load & Performance, Events & Alerts • JMX & SNMP Integration
  • 20. Agenda: Overview of big data; Introduction to NoSQL Database; Use Cases for Oracle NoSQL Database; Oracle NoSQL Database Overview; Introducing Ankush; Ankush: Demo; Q&A
  • 21. Challenges for System and IT Administrators • Enterprises are evolving from Hadoop only architectures to Big Data solution architectures • Impedance Mismatch : Is your IT organization geared up to transition Big Data technologies into the Enterprise? • Resolve Challenges • IT Administrator Desired features
  • 22. Introducing Ankush: Big Data Cluster Management • Ankush – Rapid, easy & productive way to provision big data clusters – Reduces the overall time, cost & effort required for cluster setup – Manage multiple clusters and cluster activities from a common dashboard – Support for On-Premise and Cloud Clusters – Proactive Monitoring & Analytics – Technology and Vendor Neutral
  • 23. Ankush Key Features
    • Automated setup for Big Data Ecosystem & its pre-dependencies
    • Centralized cluster management & monitoring
    • Create, Manage and Monitor multiple clusters
    • Supports multiple vendors, versions, and bundles for Hadoop Ecosystem Components
    • Web based Job management, Event alerts and notification mails
    • Support setup for local as well as cloud based clusters
    • i-FMR aims to offer generic Map-Reduce independent of cloud
    • Cloud cluster termination modes & pre termination activities
    • Analytics – Cluster, Advance Profiling, Value Add
  • 24. Why Ankush ? • Multi Technology + Multi Vendor support – Manage single relationship, easier pricing/contract – Replace or migrate – protection from technology churn – Encourage Experimentation with centralized control and standardization • Analytics and Value Added Services – Cluster, Cross Cluster, Network, Logs, Jobs, Nodes – Analytics powered proactive monitoring – Profiling – Test Framework Integration
  • 25. Sample Ankush Use Case • Test Beds – Testing applications across different vendors, distributions & versions – Benchmarking on different permutations of configuration, load & environments – Analyzing the role of cluster size by varying the volume of load patterns – Launching & resizing on the fly. DEMO
  • 26. Single Instance Database (1x1): Good for Development Environment. (Diagram: Application with NoSQL DB Driver; Shard 1 with a Master.)
  • 27. Increased Data Capacity (2x1): Adding Shards to the cluster. (Diagram: two Applications with NoSQL DB Drivers; Shard 1 and Shard 2, each with a Master.)
  • 28. Increased Cluster Availability (2x3): Adding replication-nodes to each shard. (Diagram: two Applications with NoSQL DB Drivers; Shard 1 and Shard 2, each with a Master, Replica 1, and Replica 2.)
  • 29. Q & A? • Impetus Big Data Group – bigdata@impetus.com – Bigdata.impetus.com • Oracle NoSQL Database OTN Forum: http://forums.oracle.com/forums/forum.jspa?forumID=1388
  • 30. Appendix
  • 31. Advisors: • Experience • Thought Leadership. Architects: • Expertise • Data Scientists. Advances: • Open Source • Tools
  • 32. Oracle NoSQL Database Resources (External)
    • NoSQL DB Use Cases, White Papers, Data Sheets, Benchmarks: http://www.oracle.com/technetwork/products/nosqldb/overview/index.html
    • NoSQL DB Documentation: http://www.oracle.com/technetwork/products/nosqldb/documentation/index.html
    • NoSQL DB Downloads: http://www.oracle.com/technetwork/products/nosqldb/downloads/index.html
    • NoSQL DB OTN Forum: http://forums.oracle.com/forums/forum.jspa?forumID=1388
    • NoSQL DB version 2.0 Features: http://bit.ly/UKn5Sc
    • OU Training Classes: http://bit.ly/V5qbmY
  • 33. Simple Data Model: Major-Minor Key Paradigm
    Key format: /major/key/components/ - /minor/key/components
    (Diagram: the Oracle NoSQL Driver routes keys to Shard-1, Shard-2, and Shard-3, each with replication nodes RN1, RN2, and RN3.)
    Example keys:
    /555.22.1111/-/profile, /555.22.1111/-/image, /555.22.1111/-/friends, /Smith/Bob/-/555.22.1111
    /666.22.3333/-/profile, /666.22.3333/-/image, /666.22.3333/-/friends, /Smith/Richard/-/666.22.3333
    /444.22.1212/-/profile, /444.22.1212/-/image, /444.22.1212/-/friends, /Wong/Bill/-/444.22.1212
  • 34. Simple Data Model: ACID Transactions – Configurability • Configurable Durability Policy • Configurable Consistency Policy
  • 35. Simple Data Model: ACID Transactions • ACID transactions by default • Transaction Scope – Single API call – All records must have the same major key – Support for multiple operations within a transaction • Can be relaxed for increased performance on a per-operation basis
  • 36. Elasticity: On-Demand Cluster Expansion
    • Increase Data Capacity – Add more storage nodes – New shards automatically created
    • Increase Data Throughput – More shards = better write throughput – More replicas/shard = better read throughput
    (Diagram: Application with NoSQL DB Driver; Shard-1 and Shard-2, each with a Master and Replicas, spread across StorageNodes.)
  • 37. Rebalance an Unbalanced Store: Improve Performance • Replication nodes move from over-utilized to under-utilized storage nodes • Number of shards and replication factor remain unchanged. (Diagram: Application with NoSQL DB Driver; Master1, Master2, Master3; legend marks what represents a partition.)
  • 38. JSON Data Format: Avro based Serialization/Deserialization • Why Avro? – Compact, highly efficient serialization – Synergy with Hadoop – Multiple binding options (JSON, Generic, POJO) • Schema – DDL allows schema creation through Avro JSON definition – Supports serialization from/to JSON strings • Schema evolution – Easy to use mechanism for schema evolution – Schema versions can be opaque to readers
  • 39. Support for Large Objects • Efficient storage and retrieval of large objects • Client side streaming interface for low memory consumption • Server side splitting and distribution of object chunks across nodes for better read/write latency
  • 40. Integration with Oracle Products • Database External Tables – Access NoSQL data directly from Oracle – Available in the Enterprise Edition • Oracle Event Processing (OEP) – NoSQL cartridge for Oracle Event Processing – Java serialization utilized for values • Oracle Semantic Graph – RDF Jena adapter
  • 41. How much throughput do you need? NoSQL DB has throughput even for the largest players
  • 42. What’s New? Release 2 Feature Summary (R2 Features)
    Scalability & Manageability: Elasticity; Rebalancing; Smart Topology
    New APIs: JSON schemas; C-API; Large Object Support
    Integration & Monitoring: External Tables; Oracle Event Processing; RDF Graph; SNMP/JMX
  • 43. Simple Administration
    • Web-based console and CLI commands
    • Manages and Monitors
    – Configuration changes
    – Load: Number of operations, data size
    – Performance: Latency, throughput. Min, max, average, trailing, …
    – Events: Failover, recovery, load distribution
    – Alerts: Failure, poor performance, …