BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - MapR - Evolving from RDBMS to SQL + NoSQL

© 2015 MapR Technologies
Why Does This Matter?
• 90%+ of the use cases do not deal with “relational” data
• RDBMS data models are more complex than a single table
– One-to-many relationships require multiple tables
– Creating code to persist data takes time and QA
• Inferred (or removed) keys used without actual foreign keys
– Difficult for others to understand relationships
• Transactional tables never look the same as analytics tables
– OLTP -> ETL -> OLAP
– This takes significant time to build

Empowering “as it happens” businesses by speeding up the data-to-action cycle

Changing Data Models

180 tables NOT SHOWN!

236 tables to describe 7 kinds of things

artist
id
gid
name
sort_name
begin_date
end_date
ended
type
gender
area
begin_area
end_area
comment
list<ipi>
list<isni>
list<alias>
list<release_id>
list<recording_id>
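
As a rough illustration of the nested model above, the sketch below holds the one-to-many relationships (aliases, releases, recordings) as list fields on a single artist object instead of separate child tables. The class name, the reduced field set, and the sample values are all placeholders chosen for this example; they are not taken from the actual dataset.

```java
import java.util.List;

// Illustrative sketch only: the class name and sample values are made up.
class ArtistDocumentSketch {

    /** A trimmed-down artist document; list-valued fields replace child tables. */
    static final class Artist {
        final long id;
        final String name;
        final List<String> aliases;     // list<alias>
        final List<Long> releaseIds;    // list<release_id>
        final List<Long> recordingIds;  // list<recording_id>

        Artist(long id, String name, List<String> aliases,
               List<Long> releaseIds, List<Long> recordingIds) {
            this.id = id;
            this.name = name;
            this.aliases = aliases;
            this.releaseIds = releaseIds;
            this.recordingIds = recordingIds;
        }
    }

    /** Builds one artist with placeholder values. */
    static Artist sample() {
        return new Artist(1L, "Example Artist",
                List.of("Alias One", "Alias Two"),
                List.of(10L, 11L),
                List.of(100L));
    }

    public static void main(String[] args) {
        Artist a = sample();
        System.out.println(a.name + ": " + a.aliases.size() + " aliases, "
                + a.releaseIds.size() + " releases, "
                + a.recordingIds.size() + " recordings");
    }
}
```

Reading an artist together with its aliases then needs no join, which is the simplification the slide is pointing at.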

Searching for Elvis
-- Find discs where Elvis was credited
SELECT DISTINCT album_id, name FROM
  (SELECT id album_id, artist_id, name, FLATTEN(credit) FROM release) albums
JOIN
  (SELECT DISTINCT artist_id FROM
    (SELECT id artist_id, FLATTEN(alias) FROM artist
     WHERE name LIKE 'Elvis%Presley')
  ) artists
USING (artist_id);
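
The query leans on FLATTEN, which turns a repeated field into one output row per element. The sketch below mimics that behavior in plain Java so the row expansion is easy to see. It illustrates the semantics only and says nothing about how Drill executes the function; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: mimics the row expansion performed by FLATTEN.
class FlattenSketch {

    /** Emits one {id, element} row for each element of the repeated field. */
    static List<String[]> flatten(String id, List<String> repeated) {
        List<String[]> rows = new ArrayList<>();
        for (String element : repeated) {
            rows.add(new String[] {id, element});
        }
        return rows;
    }

    public static void main(String[] args) {
        // One input record with a two-element list becomes two output rows.
        for (String[] row : flatten("artist-1", List.of("alias-a", "alias-b"))) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```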

Benefits
• Extended relational model allows massive simplification
– On a real example, we see >20x reduction in number of tables
• Simplification drives improved introspection
– This is good
• Apache Drill gives very high performance execution for extended relational problems
• You can try this out today

A New Database for Data

Basics of the API
• http://ojai.github.io/
• Entry point to a table - DocumentStore
– insert()
– insertOrReplace()
– find()
– delete()
– replace()
– update()
– increment()

Working with JSON in Java
• Step 1 – Create instance of JSON Serializer
Gson gson = new Gson();
• Step 2 – Serialize POJO to JSON
String json = gson.toJson(myObject);
• Step 3 – Deserialize JSON into POJO
MyObject myObject = gson.fromJson(json, MyObject.class);

Creating Documents in Java OJAI
• Use static methods on class org.ojai.json.Json
Document doc = Json.newDocument(myObject);
Document doc = Json.newDocument(jsonString);
• Alternatively
– Use builders
– Stream from disk
– Use InputStream

Creating New Documents
• DocumentStore.insert(doc)
Done!
• DocumentStore.insertOrReplace(doc)
Done!
Easy, right?

Updating Existing Documents
• DocumentStore.update(_id, DocumentMutation)
• Mutation methods
– mutation.append(FieldPath, "user visited URL");
– mutation.set("field.name", "What a great example");
– mutation.increment("field", 1);
– mutation.merge("field", Map<String, Object>);
– mutation.setOrReplace(…);
– mutation.delete(field);
Yes, these are atomic.

Deleting Documents
• DocumentStore.delete(doc);
Done!
• DocumentStore.delete(_id);
Done!
This is easy too, right?

Finding Documents
• DocumentStore.find(QueryCondition);
• Query condition setup:
– qc.is("field", EQUAL, "blue")
.and().notExists("other.field")
.or().like("field", "%purple")
.or().matches("another.field", "regular expression")

Querying JSON Data and More

How To Bring SQL to Non-Relational Data Stores?
Familiarity of SQL
• ANSI SQL semantics
• BI (Tableau, MicroStrategy, etc.)
• Low latency
Agility of NoSQL
• No schema management
– HDFS (Parquet, JSON, etc.)
– HBase
– …
• No transformation
– No silos of data
• Ease of use

Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
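
To make "schema discovered on the fly" concrete, the sketch below scans self-describing records and collects the field names and value types it finds, with no schema declared up front. This is a simplified illustration in plain Java and not a description of Drill's internals; the class name, method names, and sample records are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: infers field names and types while reading records.
class SchemaDiscoverySketch {

    /** Returns each field name mapped to the set of value types observed for it. */
    static Map<String, Set<String>> discover(List<Map<String, Object>> records) {
        Map<String, Set<String>> schema = new TreeMap<>();
        for (Map<String, Object> record : records) {
            for (Map.Entry<String, Object> field : record.entrySet()) {
                Object value = field.getValue();
                String type = (value == null) ? "null" : value.getClass().getSimpleName();
                schema.computeIfAbsent(field.getKey(), k -> new TreeSet<>()).add(type);
            }
        }
        return schema;
    }

    /** Two placeholder records; the second one introduces a new field. */
    static List<Map<String, Object>> sample() {
        Map<String, Object> first = new HashMap<>();
        first.put("name", "Dave");
        Map<String, Object> second = new HashMap<>();
        second.put("name", "John");
        second.put("age", 30);
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(first);
        records.add(second);
        return records;
    }

    public static void main(String[] args) {
        System.out.println(discover(sample()));
    }
}
```

The second record adds a field the first one lacks, and the discovered schema simply grows to include it; that is the "evolving schema" case from the slide.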

Enabling “As-It-Happens” Business with Instant Analytics
Traditional approach
Hadoop data -> Data modeling -> Transformation -> Data movement (optional) -> Users
Total time to insight: weeks to months
Exploratory approach
Hadoop data -> Users
Total time to insight: minutes
Drivers: new business questions, source data evolution

Common Use Cases
• Raw data exploration
• JSON analytics
• DWH offload
Sources: Hive, HBase, files, directories, …
Formats: {JSON}, Parquet, text files, …

Reuse Existing SQL Tools and Skills
• Leverage SQL-compatible tools (BI, query builders, etc.) via Drill’s standard ODBC, JDBC and ANSI SQL support
• Enable business analysts, technical analysts and data scientists to explore and analyze large volumes of real-time data

Security Controls

Granular Security via Drill Views
Raw File (/raw/cards.csv)
Owner: Admins; Permission: Admins
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-7914-3865-4817
John  Boulder   CO     1374-9735-1794-9711

Data Scientist View (/views/maskedcards.csv)
Owner: Admins; Permission: Data Scientists
Not a physical data copy
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-1111-1111-1111
John  Boulder   CO     1374-1111-1111-1111

Business Analyst View
Owner: Admins; Permission: Business Analysts
Name  City      State
Dave  San Jose  CA
John  Boulder   CO
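
The two views can be thought of as functions applied to raw rows at read time, which is why no physical copy is needed. The sketch below reproduces the masking shown on the slide (first group of digits kept, remaining digits replaced with 1) and the column drop for analysts. It is a plain-Java illustration with hypothetical names, not Drill view syntax.

```java
import java.util.Arrays;

// Illustrative sketch only: views modeled as functions over raw rows.
class MaskedViewSketch {

    /** Keeps the first digit group and replaces every later digit with '1'. */
    static String maskCard(String card) {
        int firstDash = card.indexOf('-');
        if (firstDash < 0) {
            return card.replaceAll("\\d", "1"); // no groups: mask everything
        }
        String head = card.substring(0, firstDash);
        String tail = card.substring(firstDash).replaceAll("\\d", "1");
        return head + tail;
    }

    /** Data scientist view: all four columns, card number masked. */
    static String[] scientistRow(String[] raw) {
        String[] row = Arrays.copyOf(raw, raw.length);
        row[3] = maskCard(raw[3]);
        return row;
    }

    /** Business analyst view: name, city and state only. */
    static String[] analystRow(String[] raw) {
        return Arrays.copyOf(raw, 3);
    }

    public static void main(String[] args) {
        String[] raw = {"Dave", "San Jose", "CA", "1374-7914-3865-4817"};
        System.out.println(String.join(", ", scientistRow(raw)));
        System.out.println(String.join(", ", analystRow(raw)));
    }
}
```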

Ownership Chaining
Combine Self Service Exploration with Data Governance
Raw File (/raw/cards.csv)
Owner: John
Dave  San Jose  CA  1374-7914-3865-4817
John  Boulder   CO  1374-9735-1794-9711

Data Scientist view (/views/V_Scientist)
Owner: John; Read: Jane
Dave  San Jose  CA  1374-1111-1111-1111
John  Boulder   CO  1374-1111-1111-1111

Analyst view (/views/V_Analyst)
Owner: Jane; Read: Jack
Name  City      State
Dave  San Jose  CA
John  Boulder   CO

Access path: V_Analyst -> V_Scientist -> raw file

Jack queries the view V_Analyst
• Does Jack have access to V_Analyst? -> YES
• Who is the owner of V_Analyst? -> Jane
• Drill accesses V_Analyst as Jane (impersonation hop 1)
• Does Jane have access to V_Scientist? -> YES
• Who is the owner of V_Scientist? -> John
• Drill accesses V_Scientist as John (impersonation hop 2)
• Does John have permissions on raw file? -> YES
• Who is the owner of raw file? -> John
• Drill accesses source file as John (no impersonation here)
*Ownership chain length (# hops) is configurable
Resources

Drill is Top-Ranked SQL-on-Hadoop
Source: Gigaom Research, 2015
Key:
• Number indicates company’s relative strength across all vectors
• Size of ball indicates company’s relative strength along individual vector
“Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-pretty-much-anything, immediately, and without formality.”

OJAI and MapR-DB
Where to find it…
– The source: https://github.com/ojai/ojai
– The site: http://ojai.github.io/
– Python bindings: https://github.com/mapr-demos/python-bindings
– Javascript bindings: https://github.com/mapr-demos/js-bindings
Ready to play with your data?
– Download the sandbox: http://maprdb.io
– Examples:
• Java: https://github.com/mapr-demos/maprdb-ojai-101
• Python: https://github.com/mapr-demos/maprdb_python_examples

Drill Walkthrough
• Example queries
• Conversion from relational model to flat JSON model
https://www.mapr.com/blog/drilling-healthy-choices
https://www.mapr.com/blog/evolution-database-schemas-using-sql-nosql

Recommendations for Getting Started with Drill
New to Drill?
– Get started with Free MapR On Demand training
– Test Drive Drill on cloud with AWS
– Learn how to use Drill with Hadoop using MapR sandbox
Ready to play with your data?
– Try out Apache Drill in 10 mins guide on your desktop
– Download Drill for your cluster and start exploration
– Comprehensive tutorials and documentation available
Ask questions
– user@drill.apache.org

Q&A
@kingmesal
jscott@mapr.com
Engage with us!
kingmesal
