1. Oracle Database 12c
Features for Big Data
Disclaimer: The information presented here is based on my own views and on information gathered from online sources.
This presentation is intended only to create awareness of the features and does not describe a real solution.
Presented by Abishek V S
2. Agenda
• What is Big Data
• Big Data Versus RDBMS
• Oracle In-Memory Column Store
• JSON support in Oracle Database
• Oracle Database And Hadoop
4. What is Big Data
Big data is simply data that breaks traditional
architectures due to its sheer volume, speed and
variety.
• Structured
• Unstructured
• Semi-structured
• Multiple sources
• Large volumes
5. Characterization of Big Data
• Volume
• Variety
• Velocity
(From “Understanding Big Data” by IBM)
• Additional Vs: Veracity, Validity, Volatility
6. Characterization of Big Data
From the dawn of civilization until
2003, humankind generated five
exabytes of data. Now we produce
five exabytes every two days…and
the pace is accelerating.
Eric Schmidt,
Executive Chairman, Google
9. Big Data: Driving Factors & Motivation
• Exponential growth of the internet
• Widespread acceptance of E-Commerce
• Growth of social networks
• Commoditization of computing resources
• The per-GB cost of storage is lower than it was 10 years ago.
• Commodity computers have become more powerful.
• Popularity of clusters based on commodity computers
• IoT (Internet of Things)
– Day by day the devices we own are getting smarter
and are learning about us.
10. Big Data: Technologies and Tools
• Distributed computing
– Distributed servers and storage (cloud based)
– Distributed processing, e.g. MapReduce with Hadoop
• Schema-free databases
– NoSQL databases
• In-memory processing
• Semi-structured data
– JSON
– Key-value pairs
• Columnar databases
• Big Data operations
• Analytic / semantic processing (e.g. R, OWLIM)
12. Big Data versus RDBMS
• RDBMS
– Data is stored in defined structures (tables)
– Transactional in nature
– Data consistency is a primary consideration
– Drives operational systems
– Response time is crucial
• Big Data
– Data comes in all shapes and sizes
– Behavioral Data
– Prone to rapid change
– Useful for value-added services (VAS) and for identifying patterns not exposed by operational systems
– The value derived is of prime importance.
13. Big Data versus RDBMS
• RDBMS
– Captures business transactions
– Ensures operational efficiency
– Operational decision support
– Analytics is very limited
– Integrating external data is expensive
– Typical systems: ERP, BI, ETL, data warehouse
• Big Data
– Captures user behavioral data: system logs, social data
– Acts as feedback to the business
– New opportunity exploration
– Analytics is the key focus
– Technology aims at integration
– Typical sources and tools: user activity logs, web analytics, social media streaming APIs, Hadoop MapReduce, NoSQL data stores optimized for analytics
16. Oracle In-Memory Column Store
• A column format database stores each of the attributes
about a transaction or record in a separate column
structure
• A column format is ideal for analytics, as it allows for
faster data retrieval when only a few columns are
selected but the query accesses a large portion of the
data set.
• A column format is not as efficient at processing row-wise
DML: to insert or delete a single record, every columnar
structure in the table must be changed.
• Until now you were forced to pick just one format and accept
either suboptimal OLTP or suboptimal analytics performance.
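To make the trade-off concrete, here is a toy Python sketch using a hypothetical three-column `sales` table. It does not reflect Oracle's actual storage format. The row layout keeps each record together, while the column layout keeps one list per attribute. Summing one attribute touches a single list in the column layout, but inserting one record has to modify every column list.

```python
# Hypothetical three-column table, used only to contrast the two layouts.
rows = [
    {"prod_id": 1, "region": "EU", "amount": 100},
    {"prod_id": 2, "region": "US", "amount": 250},
    {"prod_id": 3, "region": "EU", "amount": 75},
]

# Column layout: one list per attribute, aligned by position.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_format(table):
    # Row format: every record is visited, although only one attribute is needed.
    return sum(record["amount"] for record in table)


def sum_amount_column_format(cols):
    # Column format: only the "amount" list is read.
    return sum(cols["amount"])


def insert_row_format(table, record):
    # Row format: a single append to one structure.
    table.append(record)
    return 1  # number of structures modified


def insert_column_format(cols, record):
    # Column format: every column structure has to change.
    for name, values in cols.items():
        values.append(record[name])
    return len(cols)  # number of structures modified


print(sum_amount_row_format(rows), sum_amount_column_format(columns))  # 425 425
new = {"prod_id": 4, "region": "US", "amount": 30}
print(insert_row_format(rows, new), insert_column_format(columns, new))  # 1 3
```

Both sums return 425 before the insert. The insert modifies one structure in the row layout and three in the column layout.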
17. Oracle In-Memory Column Store
Oracle Database In-Memory provides the best of both worlds: data is kept in the traditional row format for OLTP and, at the same time, in a column format in memory for analytics.
The IM column store should be sized to fit the objects that must be held in memory.
The dual-format architecture is expected to add less than 20% overhead in terms of total memory requirements.
Database In-Memory uses an In-Memory column store (IM column store), a new component of the Oracle Database System Global Area (SGA) called the In-Memory Area, whose size is set by the INMEMORY_SIZE initialization parameter.
18. Oracle In-Memory Column Store
• Tablespace level (default for objects created in the tablespace)
– ALTER TABLESPACE ts_data DEFAULT INMEMORY;
• Table level (this example excludes the prod_id column)
– ALTER TABLE sales INMEMORY NO INMEMORY(prod_id);
• Partition level
– ALTER TABLE sales MODIFY PARTITION SALES_Q1_1998 NO INMEMORY;
• Objects are populated into the IM column store either in a prioritized list immediately after the
database is opened or after they are scanned (queried) for the first time.
– ALTER TABLE customers INMEMORY PRIORITY CRITICAL;
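The population rule above can be modelled as a simple ordering. The model assumes the priority levels CRITICAL, HIGH, MEDIUM, LOW and NONE, with NONE as the default. This Python sketch only captures which objects are populated at startup and in what order. Real population is carried out by background processes, and the object names here are hypothetical.

```python
# Priority levels of the INMEMORY PRIORITY clause, highest first (NONE handled separately).
PRIORITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def population_plan(objects):
    """Split objects into (populated after database open, in priority order)
    and (populated on first scan, i.e. PRIORITY NONE).

    `objects` maps a hypothetical object name to its priority string.
    """
    at_startup = sorted(
        (name for name, prio in objects.items() if prio != "NONE"),
        key=lambda name: PRIORITY_ORDER.index(objects[name]),
    )
    on_first_scan = sorted(name for name, prio in objects.items() if prio == "NONE")
    return at_startup, on_first_scan


plan = population_plan(
    {"sales": "NONE", "customers": "CRITICAL", "products": "LOW", "times": "HIGH"}
)
print(plan)  # (['customers', 'times', 'products'], ['sales'])
```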
19. Oracle In-Memory Column Store
• In-Memory Compression
• Compression is typically considered only a space-saving mechanism.
However, data populated into the IM column store is compressed using a
new set of compression algorithms that not only save space but also
improve query performance, because filter predicates can be evaluated on the compressed data.
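The slide does not describe Oracle's compression algorithms. As an assumption, the sketch below swaps in plain dictionary encoding, a common columnar technique, to show how compression can speed up queries. The predicate value is translated once into a dictionary code, and the scan then compares small integer codes without decompressing the column. The column data is hypothetical.

```python
def dictionary_encode(values):
    """Encode a column as (sorted dictionary, list of integer codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def count_equal(encoded, target):
    """Count rows equal to `target` by comparing codes, never the original strings."""
    dictionary, codes = encoded
    if target not in dictionary:
        return 0  # value absent from the dictionary: no row can match
    target_code = dictionary.index(target)  # translate the predicate once
    return sum(1 for c in codes if c == target_code)


region = ["EU", "US", "EU", "APAC", "EU", "US"]  # hypothetical column values
encoded = dictionary_encode(region)
print(encoded)                     # (['APAC', 'EU', 'US'], [1, 2, 1, 0, 1, 2])
print(count_equal(encoded, "EU"))  # 3
```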
20. Oracle In-Memory Column Store
• In-Memory Scans
– Analytic queries typically reference only a small subset of the columns in a table.
– Oracle Database In-Memory scans only the columns needed by a SQL statement and applies any
WHERE clause filter predicates directly to those columns without decompressing them.
• In-Memory Storage Index
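– Further reduces the amount of data accessed.
– Automatically created and maintained on each of the columns in the IM column store; it keeps the minimum and maximum value of each column within each In-Memory Compression Unit (IMCU).
– Storage indexes allow data pruning based on the filter predicates in a SQL statement: units whose min/max range cannot satisfy a predicate are skipped.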
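Here is a minimal Python model of storage-index pruning. It assumes the column is split into chunks that stand in for IMCUs; the `Unit` class and the data are hypothetical. Each chunk records its minimum and maximum. A scan for `value > threshold` skips any chunk whose maximum is not above the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical chunk of one column, standing in for an IMCU."""
    values: list
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self):
        # The "storage index": min and max kept per unit, maintained automatically.
        self.lo = min(self.values)
        self.hi = max(self.values)


def scan_greater_than(units, threshold):
    """Return matching values and the number of units skipped via min/max pruning."""
    matches, skipped = [], 0
    for unit in units:
        if unit.hi <= threshold:
            skipped += 1  # no value in this unit can satisfy "> threshold"
            continue
        matches.extend(v for v in unit.values if v > threshold)
    return matches, skipped


units = [Unit([5, 12, 8]), Unit([40, 55, 43]), Unit([1, 3, 2])]
print(scan_greater_than(units, 30))  # ([40, 55, 43], 2)
```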
21. Oracle In-Memory Column Store
• SIMD Vector Processing
– Database In-Memory uses SIMD (Single Instruction, Multiple Data) vector processing.
– SIMD vector processing allows a set of column values to be evaluated together in a single
CPU instruction.
• In-Memory Joins
– SQL statements that join multiple tables can also be processed very efficiently in the IM
column store, because they can take advantage of Bloom filters.
• A Bloom filter transforms a join into a filter that can be applied as part of the scan of the larger table.
• In-Memory Aggregation
– Analytic-style queries often require complex aggregations and summaries.
– A new optimizer transformation, called Vector Group By, was introduced in Oracle
Database 12.1.0.2 so that complex analytic queries can be processed using new
CPU-efficient algorithms.
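The Bloom-filter join can be sketched in Python as follows. This is a textbook Bloom filter inside a hash join, not Oracle's internal implementation. The tables, column names and filter sizes are hypothetical. The smaller table builds both a hash table and a Bloom filter. During the scan of the larger table, rows whose key is definitely absent are discarded immediately, and the exact hash lookup then removes any false positives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_join(small_table, large_table, key):
    """Hash join in which the small side's Bloom filter prunes the large-side scan."""
    bloom, build = BloomFilter(), {}
    for row in small_table:
        bloom.add(row[key])
        build.setdefault(row[key], []).append(row)
    result = []
    for row in large_table:  # scan of the larger table
        if not bloom.might_contain(row[key]):
            continue  # filtered out during the scan itself
        for match in build.get(row[key], []):  # exact check drops false positives
            result.append({**match, **row})
    return result


customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 3, "name": "C"}]
sales = [{"cust_id": c, "amount": 10 * c} for c in range(1, 6)]
print(bloom_join(customers, sales, "cust_id"))
# [{'cust_id': 1, 'name': 'A', 'amount': 10}, {'cust_id': 3, 'name': 'C', 'amount': 30}]
```

Only `cust_id` values 1 and 3 survive. The bit-array size and the number of hash functions trade memory and CPU against the false-positive rate.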
23. JSON support in Oracle Database
• JSON (JavaScript Object Notation) is a rapidly growing data format
often used in web and mobile applications.
• JSON is also used as a data interchange format because it is
– lightweight
– bandwidth-efficient
• JSON integrates easily into web pages, because JavaScript can
parse a JSON document directly into native objects.
24. JSON support in Oracle Database
• JSON is gaining popularity
– APIs (application programming interfaces)
• Most social network providers offer JSON-based data service
APIs.
• Web services: RESTful (Representational State Transfer)
– Big Data
• Many NoSQL databases use JSON or JSON-like formats for storage
– MongoDB, CouchDB, and Riak
– Internet of Things (IoT)
• As more personal devices and appliances become smart and
connect to the internet, JSON is becoming the format of choice because it
is lightweight and well suited to these devices.
25. JSON support in Oracle Database
• JSON in Oracle Database 12c R1 (12.1.0.2)
– Creating Tables to Hold JSON
– Querying JSON Data
• Dot Notation
• IS JSON
• JSON_EXISTS
• JSON_VALUE
• JSON_QUERY
• JSON_TABLE
• JSON_TEXTCONTAINS
– Identifying Columns Containing JSON
– Loading JSON Files Using External Tables
26. JSON support in Oracle Database
• Creating Tables to Hold JSON
– No new data type has been added to support JSON. Instead, it is stored
in regular VARCHAR2 or CLOB columns.
– The IS JSON constraint indicates the column contains valid JSON data.
CREATE TABLE json_documents (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
);
– Checking is lax by default; strict checking can be requested with CHECK (data IS JSON (STRICT)).
– The [USER|ALL|DBA]_JSON_COLUMNS views can be used to identify
tables and columns containing JSON data.
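The effect of the `CHECK (data IS JSON)` constraint can be approximated in Python with strict parsing. This sketch rests on several assumptions. Python's `json` module is close to Oracle's STRICT mode but not identical to it. Oracle's default lax mode also accepts some inputs, such as unquoted field names, that this check rejects. The `JsonDocuments` class is hypothetical.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard constant {name}")


def is_json(text):
    """Rough analogue of `data IS JSON` using strict parsing."""
    try:
        json.loads(text, parse_constant=_reject_constant)
        return True
    except (ValueError, TypeError):  # JSONDecodeError is a subclass of ValueError
        return False


class JsonDocuments:
    """Hypothetical table wrapper enforcing an IS JSON-style check on insert."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_id, data):
        if not is_json(data):
            raise ValueError(f"check constraint violated for id {row_id}")
        self.rows[row_id] = data


table = JsonDocuments()
table.insert(1, '{"FirstName": "John", "LastName": "Doe"}')
print(is_json("{FirstName: 'John'}"))  # False: unquoted name, single quotes
```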
28. JSON support in Oracle Database
• Dot Notation
– Simple dot notation requires a table alias and an IS JSON check constraint on the column.
COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Postcode FORMAT A10
COLUMN Email FORMAT A25
SELECT a.data.FirstName,
       a.data.LastName,
       a.data.Address.Postcode AS Postcode,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
ORDER BY a.data.FirstName,
         a.data.LastName;
FIRSTNAME       LASTNAME        POSTCODE   EMAIL
--------------- --------------- ---------- -------------------------
Jayne           Doe             A12 34B    jayne.doe@example.com
John            Doe             A12 34B    john.doe@example.com
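For readers more familiar with application code, the dot-notation query above corresponds roughly to walking nested objects and then sorting. The Python sketch below assumes two documents shaped to produce the output shown. Only the queried fields are included; the rest of the real document structure is unknown. As with Oracle's simple dot notation, a missing field yields a null (`None`) instead of an error.

```python
import json

# Hypothetical documents containing only the fields used by the query.
documents = [
    '{"FirstName": "John", "LastName": "Doe",'
    ' "Address": {"Postcode": "A12 34B"},'
    ' "ContactDetails": {"Email": "john.doe@example.com"}}',
    '{"FirstName": "Jayne", "LastName": "Doe",'
    ' "Address": {"Postcode": "A12 34B"},'
    ' "ContactDetails": {"Email": "jayne.doe@example.com"}}',
]


def dot(doc, path):
    """Follow a dotted path such as "Address.Postcode"; missing steps yield None."""
    value = doc
    for step in path.split("."):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value


parsed = [json.loads(d) for d in documents]
# Tuples sort by FirstName, then LastName, mirroring the ORDER BY clause.
rows = sorted(
    (dot(d, "FirstName"), dot(d, "LastName"),
     dot(d, "Address.Postcode"), dot(d, "ContactDetails.Email"))
    for d in parsed
)
for row in rows:
    print(row)
```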
29. JSON support in Oracle Database
• IS JSON
– The IS JSON condition tests whether a column contains valid JSON data, which is useful for columns without an IS JSON check constraint.
SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name
FROM   json_documents_no_constraint a
WHERE  a.data IS JSON;
• JSON_EXISTS
– Checks whether a specified element (JSON path) exists in the document
• JSON_VALUE
– Returns an element from the JSON document, based on the specified JSON
path.
• JSON_QUERY
– The JSON_QUERY function returns a JSON fragment representing one or more
values.
• JSON_TABLE
– The JSON_TABLE function incorporates all the functionality of JSON_VALUE,
JSON_EXISTS and JSON_QUERY.
– JSON_TABLE is used to make JSON data look like relational data, which is
especially useful when creating relational views over JSON data.
• JSON_TEXTCONTAINS
– Works with JSON indexes and enables faster text searching through the JSON
data.
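The differences between JSON_EXISTS, JSON_VALUE, JSON_QUERY and JSON_TABLE can be illustrated with a small Python sketch. It supports only simple paths such as `$.a.b`, with no array steps, wildcards or filters. It also mimics the default NULL ON ERROR behaviour. The sample purchase-order document and its field names are made up. In real SQL the JSON_TABLE row path would typically include an array step such as `$.LineItems[*]`.

```python
import json

_MISSING = object()  # sentinel distinguishing "absent" from a JSON null


def _walk(doc, path):
    """Resolve a simple path like '$.Address.Postcode' (no arrays, wildcards or filters)."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    value = doc
    for step in filter(None, path[1:].split(".")):
        if not isinstance(value, dict) or step not in value:
            return _MISSING
        value = value[step]
    return value


def json_exists(doc, path):
    return _walk(doc, path) is not _MISSING


def json_value(doc, path):
    """Scalars only: objects, arrays and missing paths yield None (NULL ON ERROR)."""
    value = _walk(doc, path)
    if value is _MISSING or isinstance(value, (dict, list)):
        return None
    return value


def json_query(doc, path):
    """Fragments only: returns a serialized object/array; scalars yield None."""
    value = _walk(doc, path)
    return json.dumps(value) if isinstance(value, (dict, list)) else None


def json_table(doc, row_path, columns):
    """One output row per element of the array at row_path; columns map name -> path."""
    items = _walk(doc, row_path)
    if not isinstance(items, list):
        return []
    return [{name: json_value(item, rel) for name, rel in columns.items()} for item in items]


doc = json.loads('{"PONumber": 1, "LineItems": [{"Part": "X", "Qty": 2}, {"Part": "Y", "Qty": 5}]}')
print(json_exists(doc, "$.PONumber"))  # True
print(json_value(doc, "$.PONumber"))   # 1
print(json_query(doc, "$.LineItems"))  # [{"Part": "X", "Qty": 2}, {"Part": "Y", "Qty": 5}]
print(json_table(doc, "$.LineItems", {"part": "$.Part", "qty": "$.Qty"}))
# [{'part': 'X', 'qty': 2}, {'part': 'Y', 'qty': 5}]
```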
30. JSON support in Oracle Database
Loading JSON Files Using External Tables
• Create the directory objects for use with the external table.
CREATE OR REPLACE DIRECTORY order_entry_dir
AS '/u01/app/oracle/product/12.1.0.2/db_1/demo/schema/order_entry';
GRANT READ, WRITE ON DIRECTORY order_entry_dir TO test;
CREATE OR REPLACE DIRECTORY loader_output_dir AS '/tmp';
GRANT READ, WRITE ON DIRECTORY loader_output_dir TO test;
• Create the external table and query it to check if it is working.
CREATE TABLE json_dump_file_contents (json_document CLOB)
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY order_entry_dir
    ACCESS PARAMETERS (
      RECORDS DELIMITED BY 0x'0A'
      DISABLE_DIRECTORY_LINK_CHECK
      BADFILE loader_output_dir: 'JSONDumpFile.bad'
      LOGFILE order_entry_dir: 'JSONDumpFile.log'
      FIELDS (json_document CHAR(5000))
    )
    LOCATION (order_entry_dir:'PurchaseOrders.dmp')
  )
  PARALLEL
  REJECT LIMIT UNLIMITED;
31. JSON support in Oracle Database
SELECT COUNT(*) FROM json_dump_file_contents;
COUNT(*)
----------
10000
• You can now load the database table with the contents of the external table.
TRUNCATE TABLE json_documents;
INSERT /*+ APPEND */ INTO json_documents
SELECT SYS_GUID(), json_document
FROM json_dump_file_contents
WHERE json_document IS JSON;
COMMIT;
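The load step above reads one JSON document per line, keeps only the rows that pass IS JSON, and generates a 16-byte key for each. That step can be mirrored in Python. This is an illustrative sketch only: `uuid.uuid4()` stands in for `SYS_GUID()`, a dictionary stands in for the `json_documents` table, and the sample lines are hypothetical.

```python
import json
import uuid


def is_json(text):
    """Simplified validity test standing in for the IS JSON condition."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def load_json_lines(lines):
    """Mimic INSERT ... SELECT SYS_GUID(), json_document ... WHERE json_document IS JSON.

    Each line is one document (records delimited by newline); invalid lines are skipped.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if line and is_json(line):
            table[uuid.uuid4().bytes] = line  # 16-byte key, like the RAW(16) id column
    return table


dump = ['{"PONumber": 1}\n', "not json\n", '{"PONumber": 2}\n']  # hypothetical dump file lines
loaded = load_json_lines(dump)
print(len(loaded))  # 2
```

With a real file, the same function can be called as `load_json_lines(open(path))`, because a file object iterates line by line.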
33. Oracle Database And Hadoop
• No Big Data discussion is complete without Hadoop
• Hadoop is a distributed computing framework
• Runs batch operations (MapReduce) on distributed clusters made of
commodity computers.
• Stores data in a distributed, clustered file system (HDFS)
• Hadoop clusters follow a shared-nothing architecture
35. Oracle Database And Hadoop
• In-Database MapReduce
• Avoids shipping data that resides in the RDBMS to external
infrastructure
• Database security can be applied to the processed data.
• Shorter learning curve for both developers and DBAs
• Mix SQL with MapReduce processing for flexibility and
efficiency
• Uses PL/SQL or Java pipelined table functions
INSERT INTO OUTTABLE
SELECT * FROM TABLE(
  Word_Count_Reduce(:ConfKey,
    CURSOR(SELECT * FROM TABLE(
      Word_Cursor_Map(:ConfKey,
        CURSOR(SELECT * FROM InTable))))));
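The nested SQL above pipes the rows of `InTable` through a mapper and then a reducer. Python generators give a rough analogy for pipelined table functions, because each stage consumes rows as they are produced. The functions below are hypothetical stand-ins named after `Word_Cursor_Map` and `Word_Count_Reduce`. The `:ConfKey` argument is omitted, and lower-casing plus whitespace splitting is an assumed tokenization.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def word_cursor_map(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: streams (word, 1) pairs, like a pipelined function piping rows."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def word_count_reduce(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reducer: sums counts per word, then streams (word, total) rows in word order."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    yield from sorted(totals.items())


in_table = ["big data big", "data in the database"]  # hypothetical InTable rows
out_table = list(word_count_reduce(word_cursor_map(in_table)))
print(out_table)
# [('big', 2), ('data', 2), ('database', 1), ('in', 1), ('the', 1)]
```

This simple reducer can emit its totals only after it has consumed all of the mapper output, whereas the mapper streams each pair as soon as it is produced.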
36. Oracle Database And Hadoop
• Pipelined table functions return a stream of rows and can also take a stream of
rows (a cursor) as input.
• They can be parallelized by partitioning their input on a key.
• They can be implemented in PL/SQL, Java or C.
• The MapReduce implementation uses two pipelined functions: one for the mapper and
one for the reducer.
• The mapper's input source could be an external table, and the
reducer's output may be placed in a database table or written out to a
file system file.
• Can leverage external tables and DBFS, or use Java or C to write to files.
• Many more possibilities open up when this is combined with other database features
and options.
• Oracle Scheduler (DBMS_SCHEDULER) can be used to schedule the MapReduce jobs.
• Processing can be distributed across databases using database links.
• Add fault tolerance and scalability with RAC.
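Parallelizing with a partition key means that every row for a given key is routed to the same reducer instance, so partial results never overlap. In Oracle this partitioning is declared through the `PARALLEL_ENABLE` clause of the pipelined function. The Python sketch below only models the idea. It uses a stable CRC-32 hash to assign words to partitions and a thread pool to run the reducers. The threads illustrate the structure rather than deliver a real speed-up, and all names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(pairs, num_partitions):
    """Route each (key, value) pair to a partition chosen by a stable hash of the key,
    so all values for a given key land in the same reducer."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[zlib.crc32(key.encode()) % num_partitions].append((key, value))
    return partitions


def reduce_partition(pairs):
    """Reducer for one partition: sum the values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def parallel_word_count(lines, num_partitions=4):
    mapped = ((word, 1) for line in lines for word in line.split())  # mapper stage
    partitions = partition_by_key(mapped, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(reduce_partition, partitions))
    result = Counter()
    for partial in partials:
        result.update(partial)  # keys never overlap across partitions
    return dict(result)


print(sorted(parallel_word_count(["a b a", "c a b"]).items()))
# [('a', 3), ('b', 2), ('c', 1)]
```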
37. Oracle Database And Hadoop
• Oracle In-Database Hadoop
• We will look at this in a future discussion …
39. The Road Ahead
• Big Data/NoSQL databases WILL NOT replace
RDBMS databases.
• Oracle’s roadmap has focused on single-vendor
solutions.
• Reusing available resources: both technology
and people.
• Oracle is building more appliance-based
solutions.
40. The Road Ahead
• Oracle Big Data Products.
– Oracle Big Data Management
• Oracle Big Data Appliance
• Oracle Big Data SQL
• Oracle NoSQL Database
– Oracle Big Data Integration
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Event Processing
– Big Data Analytics
• Oracle Big Data Discovery
• Oracle Advanced Analytics
• Oracle Business Intelligence Foundation