SQL Server
Big Data and Polyglot Solutions
Polyglot solution in Big Data:
• ‘Polyglot’ describes a software application that uses multiple resources (here, data stores) to perform its operations and achieve its results.
• It refers to creating a hybrid data store that combines the functionality of RDBMS and NoSQL data tools to achieve a flexible and efficient data model.
[Diagram: a web, mobile, or client application makes calls for relational and transactional data and calls for unstructured and scalable data. Labels: DynamoDB, Cassandra, NoSQL Server Cluster.]
Implementing Polyglot solution
• Analyzing the requirements for business continuity is the key.
• Select the database management solution that fits each need.
• Some approaches to implementing a polyglot solution in an organization: multiple lanes, polyglot mapper, nested database, omnipotent database.
Example – eCommerce Application
[Diagram]
Data stores shown: MongoDB, Redis, DynamoDB, Cassandra, MySQL, HBase, Neo4j
Application functions shown: product catalog, shopping cart, social profile, social graph, email and feed messages, payment process, audit and activity log
Polyglot solution – Multiple lanes:
Dispatches the workflow into multiple separate persistence lanes.
Advantages:
• Easy to implement.
• Works for all databases.
Disadvantages:
• Data needs to be pre-partitioned by domain.
• Cross-lane queries are hard to implement.
[Diagram: MySQL, MongoDB]
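A minimal sketch of the multiple-lanes idea, under stated assumptions: an in-memory SQLite database stands in for the relational lane (MySQL on the slide) and a plain dict stands in for the document lane (MongoDB on the slide). The names `save_order`, `save_product`, `LANES`, and `dispatch` are hypothetical and chosen only for illustration.

```python
import sqlite3

# Stand-ins (assumptions): in-memory SQLite plays the relational lane,
# a plain dict plays the document lane.
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
documents = {}


def save_order(order_id, total):
    """Relational lane: transactional data."""
    with relational:  # commits on success, rolls back on error
        relational.execute(
            "INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total)
        )


def save_product(product_id, attributes):
    """Document lane: schema-free data."""
    documents[product_id] = dict(attributes)


# Data is pre-partitioned by domain: each kind of record has exactly one lane.
LANES = {"order": save_order, "product": save_product}


def dispatch(kind, key, payload):
    """Send a write to the lane that owns this domain."""
    LANES[kind](key, payload)


dispatch("order", 1, 49.90)
dispatch("product", "sku-1", {"name": "Kettle", "colour": "red"})
```

The dispatcher is trivial, which reflects the "easy to implement" advantage. A query that needs both orders and products would have to read from each lane separately and combine the results in application code, which is the cross-lane difficulty listed above.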
Polyglot solution – Polyglot Mapper:
A dedicated component of the application handles multiple different databases in parallel by mapping a single object model onto them.
Advantages:
• Single object model.
• Cross-lane queries are possible.
Disadvantages:
• Only a few mapper products support both NoSQL and RDBMS.
[Diagram: MySQL, MongoDB]
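The mapper approach could look roughly like the sketch below. It is not modeled on any particular mapper product: `Customer` and `PolyglotMapper` are hypothetical names, SQLite stands in for MySQL, and a dict stands in for MongoDB.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Customer:
    """Single object model seen by the application."""
    customer_id: int
    name: str                                        # kept in the relational store
    preferences: dict = field(default_factory=dict)  # kept in the document store


class PolyglotMapper:
    """Hypothetical mapper: splits one object across two stores and reassembles it."""

    def __init__(self):
        # Stand-ins (assumptions): SQLite for MySQL, a dict for MongoDB.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        self.docs = {}

    def save(self, customer):
        with self.sql:
            self.sql.execute(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                (customer.customer_id, customer.name),
            )
        self.docs[customer.customer_id] = dict(customer.preferences)

    def load(self, customer_id):
        row = self.sql.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        return Customer(customer_id, row[0], dict(self.docs.get(customer_id, {})))

    def find_by_preference(self, key, value):
        """Cross-lane query: filter on document data, return relational names."""
        ids = [cid for cid, prefs in self.docs.items() if prefs.get(key) == value]
        return [self.load(cid).name for cid in sorted(ids)]


mapper = PolyglotMapper()
mapper.save(Customer(1, "Ada", {"newsletter": True}))
mapper.save(Customer(2, "Linus", {"newsletter": False}))
```

The application only ever handles `Customer` objects. Calling `mapper.find_by_preference("newsletter", True)` filters in the document store and resolves names from the relational store, which is the kind of cross-lane query the mapper makes possible.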
Polyglot solution – Nested Database:
The primary database backend maps to a secondary database.
Advantages:
• Polyglot persistence is invisible to the application.
• Cross-database queries are possible.
Disadvantages:
• Not many databases support this model.
[Diagram: MySQL, MongoDB]
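As a rough illustration of the nested model, the sketch below lets the application talk only to a primary store, which forwards selected tables to a secondary store. Both classes are hypothetical in-memory stand-ins, and the choice of which tables are mapped (`mapped_tables`) is an assumption made for the example.

```python
class SecondaryStore:
    """Stand-in for the secondary database (MongoDB on the slide)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = dict(doc)

    def get(self, key):
        return self._docs.get(key)


class PrimaryStore:
    """Stand-in for the primary database (MySQL on the slide).

    The application talks only to this class; selected tables are
    mapped onto the secondary store internally.
    """

    def __init__(self, secondary, mapped_tables):
        self._rows = {}
        self._secondary = secondary
        self._mapped = set(mapped_tables)

    def insert(self, table, key, row):
        if table in self._mapped:
            self._secondary.put((table, key), row)  # forwarded to the secondary
        else:
            self._rows[(table, key)] = dict(row)    # stored locally

    def select(self, table, key):
        if table in self._mapped:
            return self._secondary.get((table, key))
        return self._rows.get((table, key))


secondary = SecondaryStore()
primary = PrimaryStore(secondary, mapped_tables=["reviews"])
primary.insert("orders", 1, {"total": 49.9})
primary.insert("reviews", 7, {"stars": 5, "text": "Great"})
```

The caller uses the same `insert` and `select` calls for both tables and never sees where a row is stored, which is what "polyglot persistence is invisible" refers to.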
Polyglot solution – Omnipotent database:
A single database uses multiple relational and non-relational storage engines in parallel in its backend.
Advantages:
• Polyglot persistence is invisible.
• Backup and restore are possible.
Disadvantages:
• Only a few products support this model.
[Diagram labels: MySQL, MongoDB, FE, BE]
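One way to picture the omnipotent model is a single database object that owns several storage engines and offers one backup and restore path for all of them. The sketch below is a toy illustration under that assumption; `RowEngine`, `DocumentEngine`, and `OmnipotentDatabase` are hypothetical names and do not correspond to any real product.

```python
import copy


class RowEngine:
    """Hypothetical relational-style engine: every row has the same columns."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self.data = {}

    def put(self, key, value):
        if set(value) != set(self.columns):
            raise ValueError(f"expected columns {self.columns}")
        self.data[key] = dict(value)

    def get(self, key):
        return self.data.get(key)


class DocumentEngine:
    """Hypothetical document-style engine: arbitrary nested values."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = copy.deepcopy(value)

    def get(self, key):
        return self.data.get(key)


class OmnipotentDatabase:
    """One front end, several storage engines behind it."""

    def __init__(self):
        self.engines = {}

    def create(self, name, engine):
        self.engines[name] = engine

    def put(self, name, key, value):
        self.engines[name].put(key, value)

    def get(self, name, key):
        return self.engines[name].get(key)

    def backup(self):
        """Single snapshot covering every engine."""
        return copy.deepcopy({n: e.data for n, e in self.engines.items()})

    def restore(self, snapshot):
        for name, data in snapshot.items():
            self.engines[name].data = copy.deepcopy(data)


db = OmnipotentDatabase()
db.create("payments", RowEngine(["amount", "currency"]))
db.create("catalog", DocumentEngine())
db.put("payments", 1, {"amount": 10, "currency": "EUR"})
db.put("catalog", "sku-1", {"name": "Kettle", "tags": ["kitchen"]})

snapshot = db.backup()
db.put("catalog", "sku-1", {"name": "changed"})
db.restore(snapshot)
```

Because every engine sits behind the same database object, one `backup()` call captures relational and document data together, which a set of independent stores cannot offer as easily.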
Advantages of Polyglot solution
• Leverages the strengths of multiple data stores.
• A synchronization engine integrates multiple data stores for a single application.
• Uses highly optimized data storage formats.
• Each store can deliver high query performance for the workload it is suited to.
• Data scales more easily.
• Most NoSQL solutions are open source.
Disadvantages of Polyglot solution
• Architectural complexity and setup effort increase significantly.
• Impedance mismatch between databases due to different data models and query languages.
• Different backup, restore, and maintenance approaches make the implemented systems harder to operate.
Recommendations
• Select each data store based on requirements and environment.
• Minimize the number of stores; this significantly reduces maintenance and implementation costs.
