Storing and processing data with the wso2 platform
© All Rights Reserved

    Presentation Transcript

    • Storing and processing data with the WSO2 Platform
      Deependra Ariyadewa, Wathsala Vithanage
    • WSO2
      • Founded in 2005 by acknowledged leaders in XML, Web Services technologies & standards, and open source
      • Produces an entire middleware platform, 100% open source under the Apache license
      • Business model is to sell comprehensive support & maintenance for our products
      • Venture funded by Intel Capital and Quest Software
      • Global corporation with offices in the USA, UK & Sri Lanka
      • 150+ employees and growing
    • Introduction to the Data Problem
      • Information explosion
        o Rapid growth of published data
        o Managing large amounts of data is difficult (this leads to information overload)
        o Difficulties include capture, storage, search, sharing, analytics and visualization
        o We need new tools to deal with BIG DATA
    • The Well-Known Data Solution: RDBMS
      • For many years this has been the default choice
      • Scaling up an RDBMS
        o Put it on a bigger computer
        o Replicate the database over 2-3 nodes; this does not work well with more than 2-3 nodes
        o Partition data over several nodes; JOIN queries across many nodes are hard and may require custom code and configuration, and transactions may not scale well
    • CAP Theorem and RDBMS
      • An RDBMS has two key features
        o The relational model with SQL
        o ACID transactions (Atomicity, Consistency, Isolation & Durability)
      • The CAP theorem states that a distributed system can guarantee at most two of the three properties Consistency, Availability and Partition tolerance at the same time
        o Once you have picked two properties, you lose the remaining one
      • But some applications do not need all the properties of an RDBMS; once these are dropped, the system scales (e.g. Google Bigtable)
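The "A" in ACID is easiest to see on a single-node RDBMS. Below is a minimal sketch that assumes Python's built-in sqlite3 module; the slides name no particular database. The `accounts` table and the `transfer` helper are hypothetical. A transfer either applies both of its updates or neither: when the debit would break the CHECK constraint, the credit that already ran is rolled back too.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical "accounts" table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Fails the CHECK constraint if src would go negative.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Usage: `transfer(conn, "alice", "bob", 30)` commits. A follow-up `transfer(conn, "alice", "bob", 500)` returns False and leaves both balances untouched, even though bob's credit had already executed.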
    • Rise of NoSQL
      • Large internet companies hit the problem first; they built systems specific to their problem, and those systems did scale
        o Google Bigtable
        o Amazon Dynamo
      • Soon many others followed, and most of them are free and open source
      • Advantages of NoSQL include
        o Scalability
        o Flexible schema
        o Designed to scale and support fault tolerance out of the box
    • Finding the Right Data Solution
      • Data types
        o Unstructured data: files
        o Semi-structured data: XML databases, queues, graphs and lists
        o Structured data: DBMS
    • Handling Unstructured Data
      • Storage options
        o Key-value stores for small data items
        o Distributed file systems for other cases
        o Metadata registries (Nirvana, SDSC Resource Broker)
      • Scalability
        o Key-value stores are highly scalable (e.g. Amazon Dynamo)
        o Distributed file systems are generally scalable (HDFS, Lustre)
        o Metadata registries are also highly scalable
      • Search
        o Each of the above provides key-based retrieval
        o Metadata registries provide property-based search
        o It is possible to build an index over the content using tools like Lucene and use it for search
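The idea behind a Lucene-style content index is an inverted index. It maps each term to the keys of the stored items that contain it, so a search does not have to scan every value. Below is a minimal sketch in plain Python; it is not Lucene's API. The naive tokenizer (lowercase, split on non-alphanumerics) and the AND semantics for multi-term queries are assumptions.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps terms to the set of document keys containing them."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Naive tokenizer: lowercase, split on anything that is not a letter or digit.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def add(self, key, text):
        for term in self.tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return keys of documents containing every query term (AND semantics)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

In a real deployment the values would stay in the key-value store or file system, and only the index would be consulted to find the keys.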
    • Handling Semi-Structured Data
      • Storage options
        o The answer depends on the type of structure (e.g. XML = XML databases, graphs = graph databases, lists = data structure servers, work items = queues)
        o If there is a server optimized for a given type, it is often much more efficient than using a DB (e.g. graph databases can support fast relationship search)
      • Scalability
        o XML databases can shard data across nodes, so they are usually scalable, but the others are not as scalable
      • Search
        o Very much custom, e.g. XML or any tree = XPath
        o Graphs can support very fast relationship search
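XPath-style search over tree-structured data can be sketched with Python's standard `xml.etree.ElementTree`. That module supports only a subset of XPath (element paths and simple `[@attr='value']` predicates), but the subset is enough to show the idea. The catalog document and the `names_of_type` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document used only for illustration.
CATALOG = """<catalog>
  <product id="p1" type="database"><name>MySQL</name></product>
  <product id="p2" type="column-store"><name>Cassandra</name></product>
  <product id="p3" type="file-system"><name>HDFS</name></product>
</catalog>"""


def names_of_type(xml_text, product_type):
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset: product elements with a matching
    # "type" attribute, then their <name> children.
    path = f".//product[@type='{product_type}']/name"
    return [el.text for el in root.findall(path)]
```

A dedicated XML database would evaluate full XPath or XQuery over indexed documents. This sketch only shows the query shape.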
    • Handling Structured Data (1-3 nodes)
      • In general, using a DB here for every case might work
      • Reasons for using options other than a DB
        o When there is a potential need to scale later
        o High write throughput
      • KV is 1-D, whereas the other two are 2-D
      Small (1-3 nodes)
        Operation   | Loose Consistency | Consistency | Transactions
        Primary Key | DB/KV/CF          | DB/KV/CF    | DB
        Where       | DB/CF/Doc         | DB/CF/Doc   | DB
        JOIN        | DB                | DB          | DB
        Offline     | DB/CF/Doc         | DB/CF/Doc   | DB/CF/Doc
      *KV: key-value systems, CF: column families, Doc: document-based systems
    • Handling Structured Data (10 nodes)
      • KV, CF and Doc can easily handle this case
      • If DBs are used, with data sharded across many nodes:
        o Transactions might work, provided the participants in one transaction are not too many
        o JOINs might need to transfer too much data between nodes
      • Also consider in-memory DBs like VoltDB
      • Offline mode will work
      • Most systems let users choose the consistency level, and loose consistency can scale more (e.g. Cassandra)
      Scalable (10 nodes)
        Operation   | Loose Consistency | Consistency | Transactions
        Primary Key | KV/CF             | KV/CF       | Partitioned DB?
        Where       | CF/Doc            | CF/Doc      | Partitioned DB?
        JOIN        | ??                | ??          | Partitioned DB??
        Offline     | CF/Doc            | CF/Doc      | No
      *KV: key-value systems, CF: column families, Doc: document-based systems
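Tunable consistency, as in Cassandra, is commonly explained with quorums. With N replicas, a write acknowledged by W replicas and a read that consults R replicas must share at least one replica whenever R + W > N, so the read sees the latest write. Below is a toy model of that rule, assuming hypothetical in-memory replicas that hold (version, value) pairs; it is not Cassandra's client API. `read_sees_latest` checks the overlap exhaustively, and `simulate` shows a stale read when the condition fails.

```python
from itertools import combinations


def read_sees_latest(n, w, r):
    """True if every possible R-replica read overlaps every possible W-replica write."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # this read could miss the latest write
    return True


def simulate(n, w, r):
    """Write to the first W replicas, read from the last R, return the value read."""
    replicas = [(0, "old")] * n       # each replica holds (version, value)
    for i in range(w):                # the new write reaches only W replicas
        replicas[i] = (1, "new")
    read_set = range(n - r, n)        # the read contacts R replicas
    # The reader keeps the value with the highest version it saw.
    return max(replicas[i] for i in read_set)[1]
```

With N = 3, W = R = 2 ("quorum") the read returns "new". With W = R = 1 it can return "old", which is the loose-consistency trade-off the slide mentions.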
    • Highly Scalable Systems
      • Transactions do not work at this scale (CAP theorem)
      • The same holds for JOIN; the problem is that sometimes too much data needs to be transferred between nodes to perform the JOIN
      • The offline case is handled through MapReduce; even the JOIN case is OK there, since there is time
      Highly Scalable (1000s of nodes)
        Operation   | Loose Consistency | Consistency | Transactions
        Primary Key | KV/CF             | KV/CF       | No
        Where       | CF/Doc            | CF/Doc      | No
        JOIN        | No                | No          | No
        Offline     | CF/Doc            | CF/Doc      | No
      *KV: key-value systems, CF: column families, Doc: document-based systems
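An offline JOIN can be done with MapReduce using a reduce-side join, a standard MapReduce pattern that the slides do not spell out. Mappers tag each record with its source table and emit it under the join key. The shuffle groups records by key, and each reducer pairs the records from the two sides. Below is a single-process sketch with hypothetical `users` and `orders` records; it stands in for a distributed Hadoop job.

```python
from collections import defaultdict


def map_phase(users, orders):
    # Emit (join_key, (tag, record)); the tag remembers the source table.
    for user_id, name in users:
        yield user_id, ("user", name)
    for order_id, user_id, amount in orders:
        yield user_id, ("order", (order_id, amount))


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Inner join: pair every user record with every order record for the key.
    for key in sorted(groups):
        values = groups[key]
        names = [v for tag, v in values if tag == "user"]
        orders = [v for tag, v in values if tag == "order"]
        for name in names:
            for order_id, amount in orders:
                yield key, name, order_id, amount


def reduce_side_join(users, orders):
    return list(reduce_phase(shuffle(map_phase(users, orders))))
```

All records for one key must travel to the same reducer. That data movement is why JOINs are acceptable offline but not interactively at this scale.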
    • Highly Scalable Systems + Primary Key Retrieval
      • This is (comparatively) the easy one
      • It can be solved through DHT (Distributed Hash Table) based solutions or architectures like OceanStore
      • Both key-value stores (KV) and column families (CF) can be used, but the key-value model is preferred as it is more scalable
      Highly Scalable (1000s of nodes)
        Operation   | Loose Consistency | Consistency | Transactions
        Primary Key | KV/CF             | KV/CF       | No
        Where       | CF/Doc(?)         | CF/Doc(?)   | No
        JOIN        | No                | No          | No
        Offline     | CF/Doc            | CF/Doc      | No
      *KV: key-value systems, CF: column families, Doc: document-based systems
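DHT-style primary-key lookup usually rests on consistent hashing. Nodes and keys are hashed onto the same ring, and each key belongs to the first node clockwise from it. Adding or removing a node therefore moves only the keys in one arc of the ring. Below is a minimal sketch; MD5 as the ring hash and the node names are assumptions. Production systems such as Dynamo also add virtual nodes and replication, which are omitted here.

```python
import bisect
import hashlib


def ring_hash(value):
    # Map a string to a position on a 2**128 ring using MD5.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of the nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node):
        pos = ring_hash(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def node_for(self, key):
        # First node at or after the key's position, wrapping around the ring.
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        if idx == len(self._positions):
            idx = 0
        return self._owners[self._positions[idx]]
```

When a node joins, the only keys that change owner are the ones that now belong to it. This is what lets KV stores grow to thousands of nodes without global reshuffling.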
    • Highly Scalable Systems + WHERE
      • This is generally OK, but tricky
      • CF works through a secondary index that does scatter-gather (e.g. Cassandra)
      • Doc works through MapReduce views (e.g. CouchDB)
      • There is Bissa, which builds an index for all possible queries (no range queries)
      • If you are doing this, you should do pilot runs and make sure things work
      Highly Scalable (1000s of nodes)
        Operation   | Loose Consistency | Consistency | Transactions
        Primary Key | KV/CF             | KV/CF       | No
        Where       | CF/Doc(?)         | CF/Doc(?)   | No
        JOIN        | No                | No          | No
        Offline     | CF/Doc            | CF/Doc      | No
      *KV: key-value systems, CF: column families, Doc: document-based systems
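A node-local secondary index is queried by scatter-gather. Each node indexes only its own rows, so a WHERE query on the indexed column is sent to every node (scatter) and the partial answers are merged (gather). Below is a sketch with hypothetical in-process `Partition` objects and a hash-modulo placement rule, both assumptions. In a real cluster the per-node lookups travel over the network; the thread pool here only imitates that fan-out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """One node's slice of the data plus a local secondary index on one column."""

    def __init__(self, indexed_column):
        self.rows = {}                        # primary key -> row dict
        self.indexed_column = indexed_column
        self.index = defaultdict(set)         # column value -> primary keys on this node

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:                   # keep the index correct on updates
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)

    def local_query(self, value):
        return {k: self.rows[k] for k in self.index.get(value, ())}


def insert(partitions, key, row):
    # Hypothetical placement rule: hash of the primary key modulo the node count.
    partitions[hash(key) % len(partitions)].put(key, row)


def scatter_gather(partitions, value):
    # Scatter: query every partition; gather: merge the partial results.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        merged = {}
        for partial in pool.map(lambda p: p.local_query(value), partitions):
            merged.update(partial)
    return merged
```

Every WHERE query touches every node. This is why the slide calls the approach "generally OK, but tricky" and recommends pilot runs at large scale.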
    • Hybrid Approaches
      • Some solutions have many types of data and hence need more than one data solution (hybrid architectures)
      • For example
        o Using a DB for transactional data and CF for other data
        o Keeping metadata and actual data separate for large data archives
        o Using a graph DB to store relationship data while other data is in column-family storage
      • However, if transactions are needed, they have to be handled outside the stores (e.g. using Atomikos, ZooKeeper)
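When a transaction spans several stores, a coordinator outside them typically runs a protocol such as two-phase commit. The slide names Atomikos and ZooKeeper as tools here, but this sketch does not use or model their APIs. The `Participant` class and `two_phase_commit` function are hypothetical. Each participant stages its change in a prepare phase, and changes are applied only if every participant voted yes. A real transaction manager must also handle coordinator crashes, timeouts and durable logs, which this toy omits.

```python
class Participant:
    """A hypothetical resource (e.g. one data store) that can stage a change."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.data = {}
        self._staged = None

    def prepare(self, change):
        # Phase 1: stage the change and vote.
        if not self.will_vote_yes:
            return False
        self._staged = change
        return True

    def commit(self):
        # Phase 2 (all voted yes): apply the staged change.
        self.data.update(self._staged)
        self._staged = None

    def abort(self):
        # Phase 2 (some vote was no): discard anything staged.
        self._staged = None


def two_phase_commit(changes):
    """changes: list of (participant, dict_of_updates). Returns True if committed."""
    votes = [p.prepare(change) for p, change in changes]
    if all(votes):
        for p, _ in changes:
            p.commit()
        return True
    for p, _ in changes:
        p.abort()
    return False
```

Usage: a DB participant and a CF participant both commit only when both prepare successfully. One "no" vote leaves every store unchanged.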
    • Other Parameters
      • The above list is not exhaustive; there are other parameters
        o Read/write ratio: when it is high, the system is easy to scale
        o High write throughput
        o Very large data products: you will need a file system; maybe keep metadata in a data registry and store the data in a file system
        o Flexible schema
        o Archival use cases
        o Analytical use cases
        o Others ...
    • WSO2 Data Solutions
      • Data Service Server - DSS
      • Relational Storage Service - RSS
      • Column Store Service - CSS
      • File System as a Service (FSaaS) - HDFS
      • DSS and RSS
      • DSS and CSS
    • WSO2 Data Service Server (DSS)
    • WSO2 Data Service Server (DSS) Support for large XML outputs Content Filtering based on Users role Support for named parameters Ability to configure schema type for output elements Mixing multiple data sources in nested queries Distributed transaction support Oracle Ref Cursor support Support for multiple data source types Clustering support for High Availability and High Scalability Full support for WS-Security, WS-Trust, WS-Policy and WS-Secure Conversation and XKMS JMX and Web interface based monitoring and management WS-* and REST support Data validations UDT (User Defined Type) Support Complex Results Auto Generated Keys Support Boxcarring Support Batch Request Support Scheduled Tasks Registry Integration for Excel,CSV,XSLT Web Scraping Support Multiple SQL Dialect Support DB -> DS Generation Service Group/Hierarchy Support Database Explorer Data as a Service Features - DSS Stratos Service o Cassandra Integration o RDS Provisioning
    • Data Services Description Language - DSDL
    • DSS Management Console
    • WSO2 Stratos Support for Relational Data: WSO2 Relational Storage Service
      • Offers "database as a service" for tenants
      • Users create a database and receive a JDBC URL
      • The database is allocated from an Amazon RDS (MySQL) horizontal cluster
      • Tenants are isolated from each other, and the service is integrated with the platform security model
    • WSO2 Relational Storage Service
      • Use your own database server (anywhere)
      • Register the database connection as a data source
      • Use RSS to allocate a database
    • Stratos RSS
    • RSS Sample
    • WSO2 Column Store Service - CSS
      Users can log in to the Web Console and create Cassandra keyspaces.
    • Column Store Service (Contd.)
      • Keyspaces are allocated from a Cassandra cluster
      • Users can manage and share their keyspaces through the Stratos Web Console and use them through Hector (a Java client for Cassandra)
      • In essence, we provide Cassandra as a service within Stratos, with multi-tenancy support and security integrated with the WSO2 security model
    • WSO2 CSS Admin Console: Left Menu, Keyspace View
    • WSO2 CSS Admin Console: Keyspace Connection Details
    • WSO2 CSS Sample
    • File System as a Service - FSaaS
    • File System as a Service - FSaaS
      The volume is allocated from an HDFS cluster; it is isolated from other tenants in Stratos and integrated with the WSO2 security model.
      Users can manage and share their file system through the Stratos Web Console and use it like any other file system.
    • FSaaS Sample
    • Data Processing - MapReduce
      • MapReduce is inspired by the map and reduce functions used in functional programming
        o Initially introduced by Google, with some parts patented
      • Hadoop is a MapReduce implementation released under the Apache License
      • WSO2 provides MapReduce as a service
      • WSO2 Business Activity Monitor (BAM2) is an example use case for WSO2's MapReduce as a service
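The map and reduce functions the slide refers to can be shown with the classic word-count example, run here in a single process. Map emits (word, 1) pairs, a shuffle groups them by word, and reduce folds each group into a total. Hadoop distributes these steps across a cluster; this sketch only mirrors the data flow. The whitespace tokenization is an assumption.

```python
from collections import defaultdict
from functools import reduce


def mapper(line):
    # Map: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    # Reduce: fold the list of counts into a single total.
    return key, reduce(lambda a, b: a + b, values, 0)


def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(pairs).items())
```

Because each reducer call sees only one key's values, the reduce step can run on many machines in parallel. That independence is what makes MapReduce suitable for the offline cases above.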
    • WSO2 MapReduce
      • WSO2 MapReduce is secure
      • WSO2 MapReduce can use both FSaaS and DSS
        o HDFS (FSaaS)
        o Cassandra (DSS)
    • WSO2 MapReduce
    • Q&A
    • Selected Customers
    • WSO2 Engagement Model
      • QuickStart
      • Development Support
      • Development Services
      • Production Support
      • Turnkey Solutions
        o WSO2 Mobile Services Solution
        o WSO2 FIX Gateway Solution
        o WSO2 SAP Gateway Solution