A PEEK INTO THE FUTURE OF
DATA
ORM
NoSQL
Big Data

Presented by:
PRATEEK CHAUHAN
10ESKCS738
BEFORE STARTING….
• Are relational tables the most efficient way to manage data?
• Do companies like Facebook and Twitter really use a traditional relational DBMS to manage data?
ORM
O: OBJECT
R: RELATIONAL
M: MAPPING
WAYS TO ACCESS A DATABASE
• Using a GUI-based DBMS
• Using a console-based DBMS
• Using a database embedded in an application (most important).
THE BRIDGE ?

APPLICATION PROGRAMMING INTERFACE (API)

DATABASE
THE BRIDGE
THE BRIDGE: JDBC
• Standard Java API for database-independent connectivity between the Java programming language and a wide range of databases.
• JDBC provides a flexible architecture for writing database-independent applications that can run on different platforms and interact with different DBMSs without modification.
• JDBC includes APIs for each of the tasks commonly associated with database usage:
– Making a connection to a database.
– Creating SQL statements.
– Executing SQL queries in the database.
– Viewing and modifying the resulting records.
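The four tasks above can be sketched in a few lines. This sketch is not JDBC: it uses Python's built-in sqlite3 module as a stand-in because it needs no driver or server, and the table, columns and rows are made up for illustration. The overall shape (connect, build statements, execute, read and modify rows) is the same one a JDBC program follows.

```python
import sqlite3

# 1. Make a connection (an in-memory database, so nothing is written to disk).
conn = sqlite3.connect(":memory:")

# 2. Create SQL statements. The "?" placeholders keep values out of the SQL text.
create_sql = "CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT)"
insert_sql = "INSERT INTO student (roll_no, name) VALUES (?, ?)"
select_sql = "SELECT roll_no, name FROM student ORDER BY roll_no"
update_sql = "UPDATE student SET name = ? WHERE roll_no = ?"

# 3. Execute the statements in the database (hypothetical demo rows).
conn.execute(create_sql)
conn.executemany(insert_sql, [("S1", "Asha"), ("S2", "Ravi")])

# 4. View and modify the resulting records.
rows_before = conn.execute(select_sql).fetchall()
conn.execute(update_sql, ("Ravi K.", "S2"))
conn.commit()
rows_after = conn.execute(select_sql).fetchall()

conn.close()
```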
JDBC
Pros of JDBC
• Clean and simple SQL processing
• Good performance with small data
• Very good for small applications
• Simple syntax, so easy to learn

Cons of JDBC
• Complex when used in large projects
• Large programming overhead
• No encapsulation
• Hard to implement the MVC concept
• Queries are DBMS-specific
The Problem
The Problem
• Mapping member variables to columns
• Mapping relationships
• Handling data types (esp. Boolean)
• Managing changes to object state
The Problem

Relational
Object

Mapping!
Saving without ORM
• Database configuration
• The model object
• Service method to create the model object
• Database design
• DAO method to save the object using SQL queries
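The five pieces listed above might look like this when written by hand. This is a minimal sketch using Python's built-in sqlite3 module; the User class, the users table and all function names are hypothetical. Note that the table layout, the INSERT and the Boolean conversion each repeat knowledge of the class's fields.

```python
import sqlite3
from dataclasses import dataclass


# The model object (hypothetical example class).
@dataclass
class User:
    user_id: int
    name: str
    active: bool


# Database design: the table is written by hand and must match the class.
SCHEMA = "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"


# Database configuration: here just an in-memory SQLite database.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


# DAO methods: every field is mapped to its column by hand.
def save_user(conn, user):
    conn.execute(
        "INSERT INTO users (user_id, name, active) VALUES (?, ?, ?)",
        (user.user_id, user.name, 1 if user.active else 0),  # bool -> 0/1
    )
    conn.commit()


def load_user(conn, user_id):
    row = conn.execute(
        "SELECT user_id, name, active FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return User(row[0], row[1], bool(row[2]))  # 0/1 -> bool


# Service method: creates the model object and hands it to the DAO.
def register_user(conn, user_id, name):
    user = User(user_id, name, active=True)
    save_user(conn, user)
    return user
```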
The ORM Way
• JDBC database configuration – ORM-specific configuration
• The model object – Annotations
• Service method to create the model object – Use the ORM framework API
• Database design – Not needed!
• DAO method to save the objects using SQL queries – Not needed!
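To show why the last two items drop out, here is a toy mapper in Python. It is a sketch for illustration, not a real ORM framework: the helper names (create_table, save, find_all) and the User class are hypothetical, and dataclass field annotations stand in for ORM annotations. The schema and the SQL are both derived from the class, so neither is written by hand.

```python
import sqlite3
from dataclasses import dataclass, fields

# Toy mapping from Python field types to SQLite column types (an assumption
# of this sketch; real ORM frameworks support far more types).
SQL_TYPES = {int: "INTEGER", str: "TEXT", bool: "INTEGER", float: "REAL"}


def table_name(cls):
    return cls.__name__.lower()


def create_table(conn, cls):
    # The schema is derived from the class, so there is no hand-written design.
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {table_name(cls)} ({columns})")


def save(conn, obj):
    # The INSERT is generated from the object's fields, so there is no per-class DAO.
    names = [f.name for f in fields(obj)]
    columns = ", ".join(names)
    placeholders = ", ".join("?" for _ in names)
    values = [getattr(obj, name) for name in names]
    conn.execute(
        f"INSERT INTO {table_name(type(obj))} ({columns}) VALUES ({placeholders})",
        values,
    )


def find_all(conn, cls):
    columns = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(
        f"SELECT {columns} FROM {table_name(cls)} ORDER BY rowid"
    ).fetchall()
    # Convert each column back to the field's declared type (e.g. 0/1 -> bool).
    return [cls(*(f.type(v) for f, v in zip(fields(cls), row))) for row in rows]


@dataclass
class User:  # the model object; its annotations drive the mapping
    user_id: int
    name: str
    active: bool
```

Usage: after `conn = sqlite3.connect(":memory:")` and `create_table(conn, User)`, calling `save(conn, User(1, "Asha", True))` stores the object without any SQL in the calling code.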
THE ONLY DISADVANTAGE
• Boilerplate code
=> XML configuration files
=> XML system files
=> Extra classes such as POJOs
NoSQL: THE NAME
• SQL: In general, “Traditional Relational DBMS”.
• Past decade: an RDBMS isn’t always the best solution.
• NoSQL: “No SQL” => Not using a traditional RDBMS
ISSUES WITH RDBMS
• Primary issue: an RDBMS is a big package with all the features, but sometimes we don’t need all of them:
COMPROMISES
• Convenient
• Multi-user

SIMILAR
• Safety
• Persistent

BOOSTS
• Reliable
• MASSIVE (big data)
• Efficient
NoSQL SYSTEMS
Alternative to a traditional RDBMS
Pros
• Flexible schema
• Quicker/cheaper to set up
• Massive scalability: handles big data
• Relaxed consistency: higher performance & availability

Cons
• No declarative query language: more programming
• Relaxed consistency: fewer guarantees
Example: Social-Network Graph
Each record: User ID1, User ID2 …
Separate records: User Id, name, age, gender …
[Figure: friendship graph with nodes A through L]
Example: Social-Network Graph
• TASK: Find all friends of a given user.
• TASK: Find all friends of friends of a given user.
• TASK: Find all women friends of men friends of a given user.
• TASK: Find all friends of friends of … friends of a given user.
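The four tasks can be sketched over the two record shapes (friend pairs and separate user records). The data below is invented for illustration, friendship is assumed to be symmetric, and "friends of friends" here means anyone two hops away other than the user. In a relational table each hop is a self-join, and the last task needs an unbounded number of hops, which is awkward without recursive queries.

```python
from collections import defaultdict

# Hypothetical data in the two record shapes described for this example.
friendships = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "F")]
users = {
    "A": {"gender": "M"}, "B": {"gender": "M"}, "C": {"gender": "F"},
    "D": {"gender": "F"}, "E": {"gender": "M"}, "F": {"gender": "F"},
}

# Build an adjacency map; each friendship is stored in both directions.
adjacent = defaultdict(set)
for u, v in friendships:
    adjacent[u].add(v)
    adjacent[v].add(u)


def friends(user):
    return set(adjacent[user])


def friends_of_friends(user):
    # Everyone exactly two hops away, excluding the user.
    result = set()
    for friend in adjacent[user]:
        result |= adjacent[friend]
    result.discard(user)
    return result


def women_friends_of_men_friends(user):
    result = set()
    for friend in adjacent[user]:
        if users[friend]["gender"] == "M":
            result |= {w for w in adjacent[friend] if users[w]["gender"] == "F"}
    result.discard(user)
    return result


def reachable(user):
    # Friends of friends of ... friends: breadth-first search to any depth.
    seen = {user}
    frontier = [user]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adjacent[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    seen.discard(user)
    return seen
```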
INCARNATIONS OF NoSQL
• MapReduce Framework: OLAP (big operations)
• Key-Value Store: OLTP (small operations)

• Document Stores
• Graph database systems
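As a rough picture of the key-value and document styles, here is a toy in-memory store. The KeyValueStore class and the sample keys and documents are hypothetical and exist only for this sketch; real systems add persistence, distribution and replication on top of the same basic operations.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# Small OLTP-style operations: each one touches a single key.
store.put("user:1", json.dumps({"name": "Asha", "age": 21}))

# Document-style value with a flexible schema: this record has a field
# that the first one lacks, and no table definition had to change.
store.put("user:2", json.dumps({"name": "Ravi", "languages": ["Java", "SQL"]}))

user2 = json.loads(store.get("user:2"))
```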
MapReduce Framework
• Originally from Google; open-source implementation: Hadoop.
• Two main functions:
1. Map: divides the problem into sub-problems.
2. Reduce: operates on the sub-problems and combines their outputs into the final result.
• Higher-level languages on top of Hadoop:
1. Hive: SQL-like language
2. Pig: statement-based (data-flow) language
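The classic illustration of the two functions is word counting. The sketch below runs in a single Python process and does not use Hadoop; the function names and sample documents are made up. A real framework would run many map and reduce calls in parallel on different machines and handle the grouping step (often called the shuffle) itself.

```python
from collections import defaultdict


def map_phase(document):
    # Map: turn one input record into (key, value) pairs.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all values by key so each key can be reduced independently.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
```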
Graph Database Systems
• Data model: nodes and edges.
• Nodes may have properties.
• Edges may have labels or roles.
• Examples: Neo4j, FlockDB, Pregel
[Figure: nodes ID: 1, ID: 2 and ID: 3 connected by edges labeled “Friends” and “Likes”]
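The node-and-edge model can be sketched with two small record types. This is an illustrative in-memory structure, not the API of any graph database; the class names, the node properties, and the choice of which nodes each "Friends" and "Likes" edge connects are assumptions of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # nodes may have properties


@dataclass
class Edge:
    source: int
    target: int
    label: str  # edges may have labels or roles


class Graph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label):
        # Follow outgoing edges that carry the given label.
        return [e.target for e in self.edges
                if e.source == node_id and e.label == label]


g = Graph()
g.add_node(1, name="Asha")           # hypothetical properties
g.add_node(2, title="Some page")
g.add_node(3, name="Ravi")
g.add_edge(1, 3, "Friends")          # assumed wiring of the edges
g.add_edge(3, 1, "Friends")
g.add_edge(1, 2, "Likes")
g.add_edge(3, 2, "Likes")
```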
AGAIN, SOME QUESTIONS…
• What is the maximum file size you’ve dealt with so far?
• What is the maximum download speed you get?
• How much time is required just to transfer that data?
What is Big Data?
• Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
• From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
• In 2011, the same amount was created every two days.
• In 2013, the same amount of data is created every 10 minutes.
THIS IS “BIG DATA”
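A quick check of the arithmetic behind these figures, assuming decimal units (1 gigabyte = 10^9 bytes, 1 exabyte = 10^18 bytes). The variable names are just for this sketch. The numbers line up: 5 exabytes every two days is 2.5 exabytes, or 2.5 quintillion bytes, per day.

```python
GIGABYTE = 10**9   # bytes, decimal definition (an assumption of this sketch)
EXABYTE = 10**18   # bytes

# 5 billion gigabytes created up to 2003.
total_until_2003 = 5 * 10**9 * GIGABYTE

# 2011: the same amount every two days.
per_day_2011 = total_until_2003 // 2

# 2013: the same amount every 10 minutes, i.e. 144 times per day.
periods_per_day = 24 * 60 // 10
per_day_2013 = total_until_2003 * periods_per_day

total_in_exabytes = total_until_2003 / EXABYTE   # 5.0
per_day_2011_in_exabytes = per_day_2011 / EXABYTE  # 2.5
per_day_2013_in_exabytes = per_day_2013 / EXABYTE  # 720.0
```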
What is Big Data? FINALLY...
• ‘Big data’ is similar to ‘small data’, but bigger
• But bigger data requires different approaches:
– Techniques, tools, architecture
• With an aim to solve new problems
– Or old problems in a better way
Types of Data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …

• Streaming Data
– You can only scan the data once
What to do with these data?
• Aggregation and Statistics
– Data warehouse and OLAP

• Indexing, Searching, and Querying
– Keyword-based search
– Pattern matching (XML/RDF)

• Knowledge discovery
– Data Mining
– Statistical Modeling
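As one concrete case of indexing and searching, keyword search is commonly built on an inverted index that maps each word to the documents containing it. The sketch below is a minimal single-machine version; the function names and sample documents are hypothetical, and a query matches only documents that contain every query word.

```python
from collections import defaultdict


def build_index(documents):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    # Return the ids of documents that contain every word in the query.
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "graph data and social networks",
    2: "streaming data can be scanned once",
    3: "social network graph analysis",
}
index = build_index(docs)
```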
MARKET SIZE
Big Data Analytics Technologies
• NoSQL: non-relational database solutions such as HBase, Cassandra, MongoDB, Riak, CouchDB, and many others.

• Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and many others.
Summarizing…
• Key enablers for the appearance and growth
of ‘Big-Data’ are:
+ Increase in storage capabilities
+ Increase in processing power
+ Availability of data
THANK YOU
