BalanceLine4j Framework Overview

This presentation is an overview of the BalanceLine4j project, an implementation of the Balance Line algorithm for Java applications.

Published in: Technology, Business

BalanceLine4j Framework Overview

  1. Gilberto Augusto Holms, gibaholms85@gmail.com, @gibaholms, http://gibaholms.wordpress.com/ (Revision: 01)
  2. About me: Gilberto Augusto Holms, Java and SOA Architect. Expertise: Java, EAI, SOA, BPEL, BPM, Oracle Fusion Middleware. Interests: open source, artificial intelligence, innovation. Twitter: @gibaholms. Blog: http://gibaholms.wordpress.com/. Certifications: SCJA, SCJP, SCWCD, SCBCD, SCDJWS, OCE WLP 10g.
  3. Balance Line Algorithm: What is "Balance Line"? Balance Line is an algorithm, a computational technique for coordinating the processing of massive sequential data sets.
  4. Balance Line Algorithm: What is "sequential data"? Sequential data are large data sets, from one or more data sources, that share a common key and arrive ordered by that key.
  5. Balance Line Algorithm: Why use it? It improves processing performance and saves computational resources.
  6. Balance Line Algorithm: When to use it? Data synchronization (like an iPod), data loading (full or partial), and data reconciliation.
  7. Case Study: Company "X" keeps in its database a large table with the main information about every bank and agency in the country (number, address, contacts). Every day the company receives from the Central Bank a huge text file with the latest agency data, in which the following conditions may occur: data update (changes to number, address, contacts and so on); agency no longer exists; new agency added. Our job is to develop software that keeps this table up to date by processing the file and synchronizing the record changes.
  8. Dummy Solution (flowchart): Begin. For each line of the text file, check whether the agency exists. If it does not exist, INSERT it. If it exists, check whether its data changed and, if so, UPDATE it. Repeat until the end of the file. Then, for each record that no longer exists, DELETE it. End.
  9. Balance Line Algorithm Concepts: Master File: the main data set; it represents the final view of the data, the persistent reference, the origin. Transaction File: the secondary data set; it represents the transactions made and contains the data that must be synchronized with the origin. Key: a unique identifier for one single record (it can be a single field, a combination of fields, a SHA-1 hash and so on). (Diagram labels: Master, Transaction, BalanceLine.)
  10. Balance Line Algorithm Concepts: The big secret is SORTING BY KEY!
  11. Balance Line Algorithm, Step by Step. Step 1: Identify one unique key. Master keys: 10, 5, 20, 17. Transaction keys: 3, 10, 18, 17.
  12. Balance Line Algorithm, Step by Step. Step 2: Sort the data sources (ascending). Master keys: 5, 10, 17, 20. Transaction keys: 3, 10, 17, 18.
  13. Balance Line Algorithm, Step by Step. Step 3: Prepare two "pointers", one at the first record of each file. Master keys: 5, 10, 17, 20. Transaction keys: 3, 10, 17, 18.
  14. Balance Line Algorithm, Step by Step. Step 4: Begin key comparison, where KM is the current master key and KT is the current transaction key. KM > KT (5 > 3) → INSERT, moves T.
  15. Balance Line Algorithm, Step by Step. Step 4: Key comparison. KM < KT (5 < 10) → DELETE, moves M.
  16. Balance Line Algorithm, Step by Step. Step 4: Key comparison. KM = KT (10 = 10) → UPDATE, moves M and T.
  17. Balance Line Algorithm, Step by Step. Step 4: Key comparison. KM = KT (17 = 17) → UPDATE, moves M and T.
  18. Balance Line Algorithm, Step by Step. Step 4: Key comparison. KM > KT (20 > 18) → INSERT, moves T.
  19. Balance Line Algorithm, Step by Step. Step 4: Key comparison. KM with no KT (20, transaction file exhausted) → DELETE, moves M.
  20. Balance Line Algorithm, Step by Step. Step 5: Final master file. Master keys: 3, 10, 17, 18.
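
The steps above can be sketched as a single pass over two sorted key lists. This is only an illustration of the comparison loop, not the BalanceLine4j API: the class name `BalanceLineSketch`, the method `sync`, and the use of plain integer keys are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the key comparison loop; not the BalanceLine4j API.
class BalanceLineSketch {

    // Walks two key lists sorted in ascending order and returns the operations
    // needed to make the master match the transaction data (full sync).
    static List<String> sync(List<Integer> master, List<Integer> transaction) {
        List<String> ops = new ArrayList<>();
        int m = 0; // pointer into the master file
        int t = 0; // pointer into the transaction file
        while (m < master.size() || t < transaction.size()) {
            if (m >= master.size()) {
                // Master exhausted: every remaining transaction record is new.
                ops.add("INSERT " + transaction.get(t++));
            } else if (t >= transaction.size()) {
                // Transaction exhausted: remaining master records no longer exist.
                ops.add("DELETE " + master.get(m++));
            } else {
                int km = master.get(m);
                int kt = transaction.get(t);
                if (km > kt) {          // key only in the transaction file
                    ops.add("INSERT " + kt);
                    t++;
                } else if (km < kt) {   // key only in the master file
                    ops.add("DELETE " + km);
                    m++;
                } else {                // key in both files
                    ops.add("UPDATE " + km);
                    m++;
                    t++;
                }
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        // Keys from the slides: master 5, 10, 17, 20 and transaction 3, 10, 17, 18.
        System.out.println(sync(List.of(5, 10, 17, 20), List.of(3, 10, 17, 18)));
    }
}
```

With the keys from the slides, the sketch prints `[INSERT 3, DELETE 5, UPDATE 10, UPDATE 17, INSERT 18, DELETE 20]`, which matches slides 14 to 19. Each record of each file is visited exactly once, which is why both inputs must be sorted by the same key beforehand.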
  21. BalanceLine4j Framework: A Java implementation of the Balance Line algorithm. Focus on business rules and let the framework handle the algorithm. It provides an abstraction of sequential data sources, which can be any sortable data set (Comparable<T>): object collections, sets and maps; text files (with a built-in text file sorter); database result sets; custom sources (interface provided). The algorithm runs by streaming data, so memory consumption is low. It is easy to use, with a simple API, and requires no knowledge of the algorithm. It is easier to maintain and evolve because it keeps business rules isolated from the algorithm code.
  22. BalanceLine4j Framework, Additional Features: FileSorter.java. The framework provides a file sorter class that can safely sort large amounts of text data without running out of memory, because it writes temporary chunks of data to the file system and then merge-sorts all the chunks.
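
The chunk-and-merge idea described for FileSorter can be sketched as follows. This is not the FileSorter source code or its API; the class `ExternalSortSketch`, the method `sort`, and the `maxLinesInMemory` parameter are hypothetical names, and lines are compared as plain strings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a chunked external merge sort; not the FileSorter source.
class ExternalSortSketch {

    // One open chunk file plus the line currently at its head.
    private static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: read a bounded number of lines, sort them, spill them to a temp file.
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLinesInMemory) {
                    chunks.add(writeChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(writeChunk(buffer));
            }
        }
        // Phase 2: k-way merge, always emitting the smallest head line.
        PriorityQueue<Cursor> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path chunk : chunks) {
                Cursor c = new Cursor(Files.newBufferedReader(chunk));
                if (c.line != null) heads.add(c); else c.reader.close();
            }
            while (!heads.isEmpty()) {
                Cursor c = heads.poll();
                out.write(c.line);
                out.newLine();
                c.line = c.reader.readLine();
                if (c.line != null) heads.add(c); else c.reader.close();
            }
        } finally {
            for (Cursor c : heads) c.reader.close(); // non-empty only after a failure
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    // Sorts the buffered lines in memory and writes them to a new temp file.
    private static Path writeChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, lines);
        return chunk;
    }
}
```

Only `maxLinesInMemory` lines are held in memory during the first phase, and only one line per chunk during the merge, so memory use stays bounded regardless of the input size.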
  23. Back to Case Study: Master File: bank agencies database table (select * order by). Transaction File: positional text file with the latest agency information (if it is not sorted, use the FileSorter class). Key: string concatenation of bank number + agency number. Sync Mode: full (if an agency no longer exists, delete it). Benchmark: Dummy Solution vs. Balance Line Solution.
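
One detail worth illustrating about a concatenated string key: both files must sort it the same way, and unpadded concatenation can break that. For example, bank 2 with agency 5 gives "25", which sorts after "101" (bank 10, agency 1) when compared as text. The sketch below assumes fixed-width, zero-padded fields; the class `AgencyKey` and the widths (3 digits for the bank, 5 for the agency) are hypothetical and not taken from the case study.

```java
import java.util.Locale;

// Hypothetical helper; the field widths are assumptions, not part of the case study.
class AgencyKey {

    // Builds a fixed-width key so that text order equals numeric order
    // (bank first, then agency) and two different pairs never collide.
    static String of(int bankNumber, int agencyNumber) {
        return String.format(Locale.ROOT, "%03d%05d", bankNumber, agencyNumber);
    }
}
```

With this layout, `AgencyKey.of(2, 5)` sorts before `AgencyKey.of(10, 1)`, as the numeric values suggest. The database "order by" clause would need to produce the same ordering for the comparison loop to work.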
  24. Back to Case Study: Dummy Solution: 1 random access for each transaction record; 33,218 lines x 1 query with a "where" clause = 33,218 queries with a "where" clause; the same slow processing time in every sync. Balance Line Solution: 1 single sequential access; 1 query with an "order by" clause; faster processing time in the first sync (70% up) and much faster in subsequent syncs (fewer changes = less processing time, because the keys advance faster).
  25. BalanceLine4j Framework, Complementary Strategies: To further increase the performance of Balance Line processing, some complementary techniques can be used. Dump data from the database to text, work at the file system I/O level and then update the database (file system I/O is faster than network I/O). Sometimes using a hash code (MD5, SHA-1) to check whether a record has changed is faster than comparing field by field. Use a transaction code (insert, update, delete) to identify the type of transaction made for each record in the transaction file. Buffer some records in memory to optimize the data streaming.
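
The hash-based change check mentioned above could look like the sketch below, using the standard `java.security.MessageDigest` class. The class `RecordHash` and its methods are hypothetical names for illustration, and the sketch assumes the master side stores a SHA-1 digest of each record's fields.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for detecting changed records by digest; not part of BalanceLine4j.
class RecordHash {

    // Computes a SHA-1 digest over all fields of a record.
    static byte[] sha1(String... fields) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            for (String field : fields) {
                digest.update(field.getBytes(StandardCharsets.UTF_8));
                // Separator byte so that ("ab", "c") and ("a", "bc") hash differently.
                digest.update((byte) 0);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Returns true when the current fields no longer match the stored digest.
    static boolean changed(byte[] storedHash, String... currentFields) {
        return !MessageDigest.isEqual(storedHash, sha1(currentFields));
    }
}
```

When keys match (the UPDATE case), a single digest comparison then decides whether the record needs to be written at all, instead of comparing each field.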
  26. More Information and Samples: Project site: https://github.com/gibaholms/balanceline4j/. Author's blog: http://gibaholms.wordpress.com/. Author's Twitter: @gibaholms. Thanks! gibaholms85@gmail.com
