Case Study: Polyglot Persistence in Pharmaceutical Industry
Case study polyglot persistence in pharmaceutical industry Presentation Transcript

  • 1. Big Data Innovation Conference
    Case Study: Polyglot Persistence in Pharmaceutical Industry
    Ashutosh Bijoor, Director, Reach1to1 Technologies Pvt. Ltd.
    Copyrights: Reach1to1 Technologies Pvt. Ltd.
  • 2. Contents
    ● Customer Requirements
    ● Existing Architecture & Limitations
    ● Approach - Polyglot Persistence
    ● Challenges & Addressing Them
    ● Proposed Architecture
    ● Performance Results
    ● Similar Cases from Different Industries
  • 3. Customer Requirements
    [Diagram: I.P. Research Repository between Information Sources and User Applications]
    ● Information Sources: Web Content, Data Files, Documents, Databases
    ● User Applications: Intranet, Customer Portals, Analytical Dashboards, Admin Control
  • 4. Customer Requirements
    ● Information Sources
      – Integrate a wide range of IPR-related information sources
      – Different document formats, sizes and frequencies of update
      – Both structured and unstructured information
      – Single repository to handle a wide variety and large volume of data
    ● User Applications
      – Unified API to access and manipulate all data sources
      – High performance of search and analytics as well as batch operations
      – Flexibility to add new data sources with minimal or no code change
      – Extensible, high-performance data processing architecture
  • 5. Existing Architecture
    [Diagram: Information Sources connected to User Applications through a Files Archive and an RDBMS]
    ● Information Sources: Documents, Web Content, Data Files, Databases
    ● Components shown in between: Loading Scripts, Parsing Scripts, Files Archive, RDBMS, File API, SQL
    ● User Applications: Dashboards, Intranet, Customer Portals, Admin Control
  • 6. Existing Architecture Limitations
    ● Information Sources
      – Structured data in RDBMS: fixed schema
      – Unstructured data in File Archive: no analytics
      – Database unable to handle large volume of data
      – Limits on volume and variety of data sources
    ● User Applications
      – Search and analytics slowing down to the point of being unusable
      – Inability to add new search & analytics features
      – Batch ingestion of new data very cumbersome
      – Stagnation of performance and capabilities
  • 7. Existing Architecture Performance
    Performance Benchmarks
    ● Batch: 4 secs / 100 docs
    ● Batch + Search: 15 secs
    ● Search: 5 secs
    Estimated time to add new data source: 3 months
  • 8. Approach
    ● Single repository to handle a wide variety and large volume of data
    ● Extensible, high-performance data processing architecture
    Which database do we choose?
  • 9. Which database do we choose?
    Currently about 150 NoSQL databases listed!
  • 10. Factors affecting database choice
    ● Data Models
      – What type of data sources do we want to integrate?
      – How do we want to manipulate / analyze the data?
      – What is the volume, variety and velocity of data?
    ● Consistency, Availability, Partitioning (CAP)
      – Consistency: Only one value of an object to each client (Atomicity)
      – Availability: All objects are always available (Low Latency)
      – Partition Tolerance: Data split into multiple network partitions (Clustering)
      – CAP Theorem: Choose any two. Which two should we choose?
  • 11. Databases - Models and CAPability
    ● Data Models (over 10 different models!)
      – Relational
      – Key-Value
      – Column Oriented
      – Document Oriented
      – Graph
    ● CAP ability: pick any two!
      – Consistency
      – Availability
      – Partition Tolerance
    [Diagram: CAP triangle with databases placed along its edges]
    ● CA: RDBMSs, Aster Data, Greenplum, Vertica
    ● AP: Cassandra, SimpleDB, CouchDB, Riak, Dynamo, Voldemort
    ● CP: BigTable, Hypertable, HBase, MongoDB, Terrastore, Scalaris, MemcacheDB, Redis
    ● Also shown: Neo4j
    Source: Visual Guide to NoSQL Systems by Nathan Hurst
  • 12. Polyglot Persistence
    Any one database does not fit all needs!
    ● Documents: MongoDB
      – Document-oriented
      – Flexible schema
      – Replication & High Availability
      – Auto-sharding
      – Rich, document-based queries
      – Fast In-Place Updates
      – GridFS
      – Aggregation Framework
    ● Search: Apache Solr
      – Advanced text search
      – Flexible schema
      – Support for highlighting, pivoted faceting, spell check, clustering
      – Support for replication & sharding
    ● Relationships: Neo4j
      – High-performance graph database
      – Nodes and edges can have indexed metadata
      – Graphs of several billion nodes on a single machine
      – Powerful traversal framework
    ● Analytics: RDBMS
      – Legacy data and apps
      – Structured data
      – Support for legacy applications
    Solution: Polyglot Persistence – use more than one database!
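    The split of responsibilities on this slide can be illustrated with a minimal sketch. It is not the platform described in the talk: `InMemoryStore`, `PolyglotRepository`, the field names and the sample record are hypothetical, and plain dictionaries stand in for MongoDB, Apache Solr, Neo4j and the RDBMS. The point is only that one `save` call writes a different view of the same document to each store.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for one engine; a real system would call a database driver."""
    name: str
    records: dict = field(default_factory=dict)

    def save(self, doc_id, payload):
        self.records[doc_id] = payload


class PolyglotRepository:
    """One API in front of several stores, each holding the view it serves best."""

    def __init__(self):
        self.documents = InMemoryStore("documents")          # role of MongoDB on the slide
        self.search = InMemoryStore("search")                # role of Apache Solr
        self.relationships = InMemoryStore("relationships")  # role of Neo4j
        self.analytics = InMemoryStore("analytics")          # role of the RDBMS

    def save(self, doc_id, doc):
        # Full document goes to the document store.
        self.documents.save(doc_id, doc)
        # Only searchable text goes to the search index.
        self.search.save(doc_id, {"text": f"{doc['title']} {doc['abstract']}"})
        # Only links to other documents go to the graph store.
        self.relationships.save(doc_id, {"cites": list(doc.get("cites", []))})
        # Only structured columns go to the analytics store.
        self.analytics.save(doc_id, {"year": doc["year"], "assignee": doc["assignee"]})

    def find_text(self, term):
        """Answer a text query from the search store alone."""
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, rec in self.search.records.items()
            if term in rec["text"].lower()
        )


repo = PolyglotRepository()
repo.save("P1", {
    "title": "Polymer coating",
    "abstract": "A coating for tablets.",
    "year": 2012,
    "assignee": "ExampleCo",
    "cites": ["P0"],
})
repo.save("P2", {
    "title": "Capsule shell",
    "abstract": "A gelatin-free shell.",
    "year": 2013,
    "assignee": "ExampleCo",
})
matches = repo.find_text("polymer")
```

    Each query type can then be served by the store that holds the matching view, which is the motivation the slide gives for using more than one database.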
  • 13. Challenges
    ● Synchronization
      – How to manage consistency between multiple engines?
      – How to maintain low latency of CRUD operations?
    ● Scalability
      – How to ensure high throughput of batch operations?
      – How to handle a large number of concurrent operations?
    ● Extensibility
      – How to allow new engines to be added with minimal architecture change?
  • 14. Challenges – Addressing them
    ● High Performance Synchronization Engine
      – Logical Locking: flexible synchronization models
      – Event-driven: distributed control logic
      – Kanban Queues: balanced resource utilization
    ● Horizontally Scalable
      – Distributed processing: automatic
      – Asynchronous I/O: high concurrency
    ● Component-based extensions
      – Application-specific Controller modules
      – Re-usable Synchronization patterns
      – Re-usable plugins for various databases
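    The slide does not say how the Kanban queues are implemented. One common reading of the term is a queue with a work-in-progress limit: producers are held back once the queue is full, so workers are never flooded. The sketch below shows that idea with Python's `asyncio`; the function name `run_batch`, the limits and the document ids are assumptions made for illustration, and `asyncio.sleep(0)` stands in for real asynchronous I/O.

```python
import asyncio


async def run_batch(doc_ids, wip_limit=2, workers=2):
    """Process doc_ids through a bounded queue with a fixed pool of workers."""
    queue = asyncio.Queue(maxsize=wip_limit)  # bounded: put() waits when the queue is full
    in_flight = 0
    peak = 0
    done = []

    async def worker():
        nonlocal in_flight, peak
        while True:
            doc_id = await queue.get()
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for an asynchronous database write
            done.append(doc_id)
            in_flight -= 1
            queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    for doc_id in doc_ids:
        await queue.put(doc_id)  # back-pressure: the producer pauses here when full
    await queue.join()           # wait until every queued item has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return done, peak


done, peak = asyncio.run(run_batch([f"doc-{i}" for i in range(10)]))
```

    The number of items being processed at once never exceeds the worker count, and the number waiting never exceeds `wip_limit`, which is one way to get the balanced resource utilization the slide mentions.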
  • 15. Polyglot Persistence Platform
    ● Reusable customizable platform
      – Open source license
      – Modular, extensible architecture
      – Commercial plugins for various databases and indexing engines
    ● Proven performance
      – Based on NodeJS
      – High performance in high load conditions
      – Developed and supported by strongly invested team
    http://oodebe.org
  • 16. Proposed Architecture
    [Diagram: Information Sources connected to User Applications through a Synchronization Engine in front of four databases]
    ● Information Sources: Web Content, Data Files, Documents, Databases
    ● Databases: MongoDB, Apache Solr, RDBMS, Neo4j
    ● Components shown in between: Loading APIs, Synchronization Engine, DB-specific APIs, Custom-built Web Services
    ● User Applications: Intranet, Customer Portals, Dashboards, Admin Control
  • 17. Sample Operation
    [Diagram: flow of one operation through the platform]
    ● Entry points: User Application (two shown), REST API, Batch API
    ● Processing: Controller, Source Processor, Doc Processors, Kanban Queue
    ● Storage: DB Handler (four shown), DB Engine 1, DB Engine 2, DB Engine 3
    ● Input: Data Source
    ● Connector labels: Asynchronous I/O, Messages / Events, Locks
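    The diagram names a Controller, DB Handlers, Locks and Messages / Events but gives no detail on how they interact. A minimal sketch of one plausible interaction follows, assuming a per-document logical lock held while the write fans out to every handler. The classes `DBHandler` and `Controller`, the `upsert` method and the event tuple are hypothetical, and dictionaries stand in for the database engines.

```python
import asyncio
from collections import defaultdict


class DBHandler:
    """Hypothetical handler wrapping one engine; a dict stands in for the database."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    async def write(self, doc_id, doc):
        await asyncio.sleep(0)  # stand-in for asynchronous I/O to the engine
        self.rows[doc_id] = doc


class Controller:
    """Fans each write out to every handler under a per-document logical lock."""

    def __init__(self, handlers):
        self.handlers = handlers
        self.locks = defaultdict(asyncio.Lock)  # one lock per document id
        self.events = []

    async def upsert(self, doc_id, doc):
        async with self.locks[doc_id]:
            # All engines receive the same version before the lock is released,
            # so a concurrent update of the same document cannot interleave.
            await asyncio.gather(*(h.write(doc_id, doc) for h in self.handlers))
            self.events.append(("synced", doc_id))


async def main():
    handlers = [DBHandler("engine-1"), DBHandler("engine-2"), DBHandler("engine-3")]
    controller = Controller(handlers)
    # Two concurrent updates to the same document.
    await asyncio.gather(
        controller.upsert("D1", {"version": 1}),
        controller.upsert("D1", {"version": 2}),
    )
    return controller, handlers


controller, handlers = asyncio.run(main())
versions = [h.rows["D1"]["version"] for h in handlers]
```

    Updates to different documents use different locks and proceed concurrently; only writes to the same document are serialized.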
  • 18. Deployment Architecture
    [Diagram: Controllers Cluster, Data Processing Cluster, Database Cluster]
  • 19. Customer Requirements
    ● Information Sources
      – Integrate a wide range of IPR-related information sources
      – Different document formats, sizes and frequencies of update
      – Both structured and unstructured information
      – Single repository to handle a wide variety and large volume of data
    ● User Applications
      – Unified API to access and manipulate all data sources
      – High performance of search and analytics as well as batch operations
      – Flexibility to add new data sources with minimal or no code change
      – Extensible, high-performance data processing architecture
  • 20. New Architecture Performance
    Performance Benchmarks (existing → new)
    ● Batch: 4 secs / 100 docs → 1.5 secs / 100 docs
    ● Batch + Search: 15 secs → <1 sec
    ● Search: 5 secs → <1 sec
    Time to add new data source: 3 months → 1 day
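    Reading the batch figures on this slide as 4 secs per 100 docs before and 1.5 secs per 100 docs after, the improvement can be worked out directly. The batch size of 1,000,000 documents below is a hypothetical figure chosen for illustration; it does not appear in the talk.

```python
OLD_SECS_PER_100 = 4.0   # existing architecture, from the slide
NEW_SECS_PER_100 = 1.5   # new architecture, from the slide


def docs_per_second(secs_per_100):
    """Convert a 'seconds per 100 docs' figure into throughput."""
    return 100 / secs_per_100


def batch_minutes(n_docs, secs_per_100):
    """Time in minutes to ingest n_docs at the given rate."""
    return n_docs / 100 * secs_per_100 / 60


old_throughput = docs_per_second(OLD_SECS_PER_100)  # 25 docs per second
new_throughput = docs_per_second(NEW_SECS_PER_100)  # about 66.7 docs per second
speedup = OLD_SECS_PER_100 / NEW_SECS_PER_100       # about 2.67 times faster

# Hypothetical batch size, for illustration only.
N_DOCS = 1_000_000
old_minutes = batch_minutes(N_DOCS, OLD_SECS_PER_100)  # about 667 minutes
new_minutes = batch_minutes(N_DOCS, NEW_SECS_PER_100)  # 250 minutes
```

    On these figures batch ingestion is a little under three times faster, while the search figures move from several seconds to under one second.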
  • 21. Similar Cases from Other Industries
    ● Airlines: Customer Loyalty
      – Integration of flight schedules, ancillary services, bookings and payments into a single point interface for customers
    ● Insurance: Claims Analysis
      – Integration of claims, feedback forms, customer info and call center logs into a central repository for search and analytics
    ● Telecom: CRM Analytics
      – Call center logs, IVR logs, email and social media feeds archived for analysis and preventive fault alerts
    ● BFSI: Investment Advisor
      – Integration of social media feeds, analyst opinions, web content and trading data with search and sentiment analysis
    ● Publishing: Content Repository
      – Aggregated and original content processed with text mining, automatic and assisted classification and annotation
    ● Media: Online TV
      – Broadcast schedules, ratings, social media feeds and user recordings for a TV Anywhere platform
  • 22. About Reach1to1
    ● Over 10 years' experience with NoSQL and Big Data
      – Implemented solutions in various industries
    ● Wide skill sets spanning emerging technologies
      – Big data, cloud and mobile applications
    ● Variety of engagement models
      – Projects, Consulting, Extended Delivery Centers
    ● Strong investor backing
      – Basil Partners, Singapore
    ● Low operating costs and high reach
      – Sales team in US, delivery team in Mumbai and Bangalore
  • 23. Thank you!
    Ashutosh Bijoor
    bijoor@reach1to1.com
    http://bijoor.me
    Big Data Innovation Conference