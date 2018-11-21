
Data integration

With systems composed of multiple, collaborating services, there is a vast quantity of enterprise data scattered across heterogeneous data sources. Data integration allows businesses to combine data residing in different sources to provide users a consolidated view. How you choose to use, integrate with, and analyze enterprise data may be different than what you're used to. This session will discuss how Ballerina can be used to solve this problem. The following key aspects will also be discussed during the talk.

Connecting to different data sources using endpoints
First class support for SQL result-sets, JSON data, XML data, etc.
Dealing with the problem of consistent state using transactions
Streaming a large quantity of data

Published in: Software

  1. Anupama Pathirage, November 2018
  2. Data: “A re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” - Reference Model for an Open Archival Information System (OAIS)
  3. Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race. Source: https://en.wikipedia.org/wiki/Data
  4. Source: https://whatsthebigdata.com/2015/10/28/predictive-analytics-determine-how-much-you-will-pay-next-time-you-will-fly-the-big-data-friendly-skies/
  5. ○ Heterogeneous data sources - Databases, cloud systems, legacy systems, files, web content, etc.
     ○ Lack of a standard data format - Relational data, JSON, XML, CSV, etc.
     ○ Bad data - Legacy data must be cleaned up prior to conversion and integration.
     ○ Lack of data security & integrity - All the data being brought together needs an appropriate level of protection.
     ○ Poor performance of data integration - Both the richness of the data and the total processing time need to be considered.
     ○ Lack of data management expertise - Transferring data from its independent sources to the integrated system requires expert knowledge.
  6. [Architecture diagram: a Ballerina application/service (.bal) runs on the Ballerina runtime (BVM) and reaches the data sources through endpoints (EP): JDBC, MongoDB, Cassandra, Redis, File, Spreadsheet.]
  7. Connecting with Different Data Sources
  8. ○ MySQL Endpoint - SQL endpoint customized for MySQL DB
     ○ H2 Endpoint - SQL endpoint customized for H2 DB
     ○ JDBC Endpoint - Connects to SQL-based tabular data sources via JDBC drivers
     ○ MongoDB Endpoint - Connects to MongoDB and supports find and data manipulation operations
     ○ Cassandra Endpoint - Connects to a Cassandra data source and supports update and select operations
     ○ Redis Endpoint - Connects Ballerina to a Redis data source
     ○ FTP Endpoint - Connects to an FTP server and performs I/O operations
     ○ Google Spreadsheet Endpoint - Accesses the Google Spreadsheet API version v4 through Ballerina
  9. MySQL Client Endpoint

         endpoint mysql:Client mysqlDB {
             host: "localhost",
             port: 3306,
             name: "testdb",
             username: "demouser",
             password: "password@123"
         };

     H2 Client Endpoint

         endpoint h2:Client h2DB {
             path: "./h2/database",
             name: "testdb",
             username: "demouser",
             password: "password@123"
         };

     JDBC Client Endpoint

         endpoint jdbc:Client oracleDB {
             url: "jdbc:oracle:thin:@localhost:1521:testdb",
             username: "demouser",
             password: "password@123"
         };

  10. MongoDB Client Endpoint

          endpoint mongodb:Client conn {
              host: "localhost",
              dbName: "testdb",
              username: "demouser",
              password: "password@123"
          };

      Cassandra Client Endpoint

          endpoint cassandra:Client conn {
              host: "localhost",
              port: 9042,
              username: "demouser",
              password: "password@123"
          };

      Redis Client Endpoint

          endpoint redis:Client conn {
              host: "localhost",
              password: "password@123"
          };
  11. Type System Support
  12. Ballerina is designed with a sophisticated type system with first-class support for different data types and formats, so users can generate, manipulate, and convert values from one type to another easily.
      ○ Simple types: (), boolean, int, float, decimal, string
      ○ Structured types: tuple, array, map, record, table, xml, json
      ○ Behavioral types: error, function, future, object, stream, typedesc
  13. //Iterating table
      table<Employee> t1 = tableEmployee;
      foreach row in t1 {
          io:println(row);
      }

      //Converting a table to json
      json jsonReturned = check <json>t1;

      //Converting a table to xml
      xml xmlReturned = check <xml>t1;
  14. Ballerina Tables
  15. A table is a data type that organizes information in rows and columns.
      ○ Cursor-based tables - Returned from Ballerina database connector operations such as “select” or “call”.
      ○ In-memory tables - Allow Ballerina developers to create tables that adhere to a defined set of columns and to manipulate the data in them.
  16. //Cursor based table (select can also return an error, hence `check`, as on slide 24)
      table t1 = check mysqlDB->select("SELECT id, age, name from employee", ());

      //In memory table
      table<Employee> tableEmployee = table {
          { key id, age, name },
          [
              { 1, 20, "Mary" },
              { 2, 30, "John" },
              { 3, 23, "Jim" }
          ]
      };
  17. Data Streaming
  18. ○ Table to JSON and table to XML type conversions result in streamed data.
      ○ With data streaming, when a service client makes a request, the result is streamed to the client as it is produced rather than being built in full on the server and then returned.
        - Allows virtually unlimited payload sizes in the result
        - The client starts receiving the response immediately
  19. Transactions
  20. ○ Ballerina supports
        - Local transactions
        - XA transactions
        - Distributed transactions
      ○ For distributed transactions, a coordinator-based protocol is used to reach a joint outcome across the participants.
      ○ Syntax support makes it easy to define transaction boundaries and to handle transaction failures and retries.
  21. transaction with retries = 2 {
          //Update first table
          int count = check mysqlDB->update(
              "INSERT INTO employee (id, name, age) VALUES (?,?,?)", id, name, age);
          //Update second table
          count = check mysqlDB->update(
              "INSERT INTO salary (id, value) VALUES (?, ?)", id, salary);
      } onretry {
          log:printError("Transaction failed, retrying ...");
      }
  22. Data Security
  23. ○ Data must be handled securely:
        - Support for login authentication
        - Support for data encryption
        - Prevention of SQL injection
  24. //Use Config API
      endpoint mysql:Client mysqlDB {
          host: config:getAsString("DATABASE_HOST"),
          port: config:getAsInt("DATABASE_PORT"),
          name: config:getAsString("DATABASE_NAME"),
          username: config:getAsString("DATABASE_USER"),
          password: config:getAsString("DATABASE_PASSWORD"),
          dbOptions: { useSSL: false }
      };

      //Check for tainted values
      table<Employee> t1 = check mysqlDB->select(
          "SELECT id, age, name from employee where name = " + untaint s1, Employee);
  25. Visualizing Data Service
  26. ○ How to:
        - Connect with different data sources
        - Handle data in different formats
        - Handle data efficiently using data streaming
        - Handle transactions
        - Handle data securely
        - Visualize the data service
      Demo Code: https://github.com/anupama-pathirage/BallerinaDemo/tree/master/BallerinaCon/samples
  27. THANK YOU
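
The transaction pattern on slide 21 and the SQL injection concern on slide 23 can be tried out without a Ballerina runtime. The following is a minimal sketch in Python, not Ballerina: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL, and the helper names `insert_employee` and `find_by_name` are hypothetical. It assumes the same `employee` and `salary` tables as the slides. Both inserts either commit together or roll back together, a failed attempt is retried, and every value is passed as a bound parameter rather than concatenated into the SQL text.

```python
import sqlite3


def insert_employee(conn, emp_id, name, age, salary, retries=2):
    """Insert into both tables atomically, retrying a failed attempt up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            # The connection context manager commits if the block succeeds
            # and rolls back if any statement raises.
            with conn:
                conn.execute(
                    "INSERT INTO employee (id, name, age) VALUES (?, ?, ?)",
                    (emp_id, name, age),
                )
                conn.execute(
                    "INSERT INTO salary (id, value) VALUES (?, ?)",
                    (emp_id, salary),
                )
            return True
        except sqlite3.Error as exc:
            print(f"Transaction failed on attempt {attempt + 1}: {exc}")
    return False


def find_by_name(conn, name):
    """Look up employees by name with a bound parameter, so input is never parsed as SQL."""
    cursor = conn.execute(
        "SELECT id, age, name FROM employee WHERE name = ?", (name,)
    )
    return cursor.fetchall()


# Hypothetical demo schema mirroring the tables used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE salary (id INTEGER PRIMARY KEY, value REAL)")
conn.commit()
```

Calling `insert_employee(conn, 1, "Mary", 20, 1000.0)` adds one row to each table. If the second insert fails, the first one is rolled back as well, which is the consistency guarantee the slides describe.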

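The streaming behaviour described on slide 18 can be illustrated the same way. This is again a Python sketch rather than Ballerina's implementation: the hypothetical generator `stream_rows_as_json` reads rows from a database cursor in small batches and yields the JSON array piece by piece, so the full result is never held in memory at once. The `demo_db` connection and its contents are assumptions made for the example.

```python
import json
import sqlite3
from typing import Iterator


def stream_rows_as_json(cursor: sqlite3.Cursor, batch_size: int = 100) -> Iterator[str]:
    """Yield a JSON array in pieces, fetching rows from the cursor in batches."""
    columns = [col[0] for col in cursor.description]
    yield "["
    first = True
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            # Every element except the first is preceded by a comma.
            prefix = "" if first else ","
            first = False
            yield prefix + json.dumps(dict(zip(columns, row)))
    yield "]"


# Hypothetical demo data matching the in-memory table on slide 16.
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, age INTEGER, name TEXT)")
demo_db.executemany(
    "INSERT INTO employee (id, age, name) VALUES (?, ?, ?)",
    [(1, 20, "Mary"), (2, 30, "John"), (3, 23, "Jim")],
)
demo_db.commit()
```

A server would write each yielded piece to the response as soon as it is produced. Joining the pieces gives the same document that building the whole result first would give.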