BalanceLine4j Framework Overview

Gilberto Augusto Holms
gibaholms85@gmail.com
@gibaholms
http://gibaholms.wordpress.com/

http://gibaholms.wordpress.com/ Revision: 01

About me...

Gilberto Augusto Holms

- Java and SOA Architect
- Expertise: Java, EAI, SOA, BPEL, BPM, Oracle Fusion Middleware
- Interests: Open Source, Artificial Intelligence, Innovation
- Twitter: @gibaholms
- Blog: http://gibaholms.wordpress.com/
- SCJA, SCJP, SCWCD, SCBCD, SCDJWS, OCE WLP 10g


Balance Line Algorithm

What is “Balance Line”?

Balance Line is an algorithm: a computational
technique for coordinating the processing of
massive sequential data.



What is “Sequential Data”?

Sequential data are big data sets, from one or
more data sources, that share a common key and
are ordered by that key.



Why use it?

- Improves processing performance

- Saves computational resources



When to use it?

- Data synchronization (like syncing an iPod)

- Data loading (full or partial)

- Data reconciliation


Case Study

Company “X” has a big table in its database containing the main
information about all the banks and agencies in the country
(number, address, contacts). Every day, the company receives a huge
text file from the Central Bank with the newest data about the
agencies, in which the following conditions may occur:
- Data update (changes to number, address, contacts and so on)
- Agency no longer exists
- New agency added

Our job is to develop software that keeps this table up to date by
processing the file and synchronizing the record changes.


Dummy Solution

Begin
For each text file line:
    Check if the agency exists
    Exists?
        N: INSERT
        Y: Check if the agency data changed
           Data changed?
               Y: UPDATE
               N: (nothing to do)
    End of file?
        N: go to the next line
        Y: For each record that no longer exists: DELETE
End
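
The naive flow above can be sketched in Java as follows. This is only an illustration: the class name NaiveSync is hypothetical, a Map stands in for the database table, and each get() call stands in for one query with a "where" clause.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the naive flow. The Map stands in for the database
// table; every get() represents one "select ... where key = ?" query.
class NaiveSync {
    /** Applies the file lines to the table and returns a log of the operations. */
    static List<String> sync(Map<String, String> table, List<String[]> fileLines) {
        List<String> log = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String[] line : fileLines) {       // line[0] = key, line[1] = data
            String key = line[0];
            String data = line[1];
            seen.add(key);
            String current = table.get(key);    // one random access per line
            if (current == null) {
                table.put(key, data);
                log.add("INSERT " + key);
            } else if (!current.equals(data)) {
                table.put(key, data);
                log.add("UPDATE " + key);
            }
        }
        // After the end of the file: delete every record that was not in it.
        List<String> toDelete = new ArrayList<>();
        for (String key : table.keySet()) {
            if (!seen.contains(key)) {
                toDelete.add(key);
            }
        }
        for (String key : toDelete) {
            table.remove(key);
            log.add("DELETE " + key);
        }
        return log;
    }
}
```

Note that this version needs one lookup per file line plus an extra pass to find the deleted records.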

Balance Line Algorithm Concepts
- Master File
Is the main data set; it represents the final view of the data: the
persistent copy, the reference, the origin.
- Transaction File
Is the secondary data set; it represents the transactions
made and contains the data that must be synchronized with the origin.
- Key
Is a unique identifier for one single record (it can be
a single field, a combination of fields, a SHA-1 hash and so on).
[Diagram: Master and Transaction data sets as inputs to BalanceLine]

Balance Line Algorithm Concepts

The big secret ...

SORTING BY KEY !


Balance Line Algorithm – Step by Step

1 – Identify one unique key

Master      Transaction
10 .....    3 .....
5 .....     10 .....
20 .....    18 .....
17 .....    17 .....



2 – Sort the data sources (ascending)

Master      Transaction
5 .....     3 .....
10 .....    10 .....
17 .....    17 .....
20 .....    18 .....



3 – Prepare two “pointers”

Master      Transaction
5 .....     3 .....
10 .....    10 .....
17 .....    17 .....
20 .....    18 .....



4 – Begin key comparison
(KM = key at the Master pointer, KT = key at the Transaction pointer)

KM > KT -> INSERT, moves T
KM < KT -> DELETE, moves M
KM = KT -> UPDATE, moves M and T
KM (no KT) -> DELETE, moves M

Master      Transaction
5 .....     3 .....
10 .....    10 .....
17 .....    17 .....
20 .....    18 .....



5 – Final master file

Master
3 .....
10 .....
17 .....
18 .....
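
The steps above can be sketched in a few lines of Java. This is a minimal illustration, not the framework's code: the class name BalanceLineSketch is hypothetical, keys are plain integers, and both lists are assumed to be already sorted in ascending order. The "KT (no KM)" branch is the mirror of the last rule and is an assumption, since the slides do not show it.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the two-pointer comparison loop. Assumes both lists are
// sorted ascending by key; integer keys are an illustration-only choice.
class BalanceLineSketch {
    /** Returns the operations in the order the two pointers produce them. */
    static List<String> run(List<Integer> master, List<Integer> transaction) {
        List<String> ops = new ArrayList<>();
        int m = 0; // pointer into the master list
        int t = 0; // pointer into the transaction list
        while (m < master.size() || t < transaction.size()) {
            if (t >= transaction.size()) {
                // KM (no KT): the master record has no counterpart left
                ops.add("DELETE " + master.get(m++));
            } else if (m >= master.size()) {
                // KT (no KM): mirror case, assumed here, not shown on the slides
                ops.add("INSERT " + transaction.get(t++));
            } else {
                int cmp = master.get(m).compareTo(transaction.get(t));
                if (cmp > 0) {
                    ops.add("INSERT " + transaction.get(t++)); // KM > KT
                } else if (cmp < 0) {
                    ops.add("DELETE " + master.get(m++));      // KM < KT
                } else {
                    ops.add("UPDATE " + master.get(m));        // KM = KT
                    m++;
                    t++;
                }
            }
        }
        return ops;
    }
}
```

With the sorted data from step 2 (master 5, 10, 17, 20 and transaction 3, 10, 17, 18), the loop yields INSERT 3, DELETE 5, UPDATE 10, UPDATE 17, INSERT 18, DELETE 20, which leaves the final master file shown in step 5.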


BalanceLine4j Framework
- Java implementation of the Balance Line algorithm
- Lets you focus on business rules while the framework handles the
algorithm
- Provides an abstraction of Sequential Data Sources, which can be any
sortable data set (Comparable<T>):
  - Object Collections, Sets, Maps
  - Text files (with a built-in text file sorter)
  - Database ResultSets
  - Custom (interface provided)
- The algorithm runs by streaming the data, so memory consumption is low
- Easy to use, simple API, no knowledge of the algorithm required
- Easier to maintain and evolve because it isolates
business rules from the algorithm code
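
To illustrate the idea of keeping business rules out of the algorithm while streaming the data, here is a rough sketch. It is NOT the BalanceLine4j API: the names SyncHandler and StreamingBalanceLine are hypothetical, and records are assumed to be non-null and to compare by their key through Comparable<T>.

```java
import java.util.Iterator;

// Hypothetical callback interface (not the BalanceLine4j API): the business
// rules live in the handler, the algorithm lives in the driver below.
interface SyncHandler<T> {
    void onInsert(T transactionRecord);
    void onUpdate(T masterRecord, T transactionRecord);
    void onDelete(T masterRecord);
}

class StreamingBalanceLine {
    /** Walks two key-sorted iterators, holding one record from each in memory. */
    static <T extends Comparable<T>> void process(
            Iterator<T> master, Iterator<T> transaction, SyncHandler<T> handler) {
        T m = master.hasNext() ? master.next() : null;
        T t = transaction.hasNext() ? transaction.next() : null;
        while (m != null || t != null) {
            // An exhausted side is treated as "greater than everything".
            int cmp = (m == null) ? 1 : (t == null) ? -1 : m.compareTo(t);
            if (cmp > 0) {
                handler.onInsert(t);
                t = transaction.hasNext() ? transaction.next() : null;
            } else if (cmp < 0) {
                handler.onDelete(m);
                m = master.hasNext() ? master.next() : null;
            } else {
                handler.onUpdate(m, t);
                m = master.hasNext() ? master.next() : null;
                t = transaction.hasNext() ? transaction.next() : null;
            }
        }
    }
}
```

Because the driver only asks for the next record from each iterator, the sources can be a collection, a file reader, or a database cursor without changing the handler.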


BalanceLine4j Framework – Additional Features
- FileSorter.java

The framework provides a file sorter class that can safely
sort large amounts of text data without running out of memory, because
it writes temporary chunks of data to the file system and then
merge-sorts all the chunks.
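
The chunk-and-merge technique described above is commonly known as an external merge sort. The sketch below shows the general idea only; it is not the code of FileSorter.java, and the class name ExternalSortSketch and the chunkSize parameter (assumed to be at least 1) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative external merge sort; this is not the code of FileSorter.java.
class ExternalSortSketch {
    /** Sorts the lines of input into output, buffering at most chunkSize lines. */
    static void sort(Path input, Path output, int chunkSize) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == chunkSize) chunks.add(writeChunk(buffer));
            }
            if (!buffer.isEmpty()) chunks.add(writeChunk(buffer));
        }
        merge(chunks, output);
        for (Path chunk : chunks) Files.deleteIfExists(chunk);
    }

    // Sorts one in-memory chunk, writes it to a temp file, and empties the buffer.
    private static Path writeChunk(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, buffer, StandardCharsets.UTF_8);
        buffer.clear();
        return chunk;
    }

    // One open reader per chunk plus the line it is currently positioned on.
    private static final class Cursor {
        final BufferedReader reader;
        String line;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    // K-way merge: repeatedly emit the smallest current line among all chunks.
    private static void merge(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(reader);
                Cursor cursor = new Cursor(reader);
                if (cursor.line != null) queue.add(cursor);
            }
            while (!queue.isEmpty()) {
                Cursor cursor = queue.poll();
                out.write(cursor.line);
                out.newLine();
                cursor.line = cursor.reader.readLine();
                if (cursor.line != null) queue.add(cursor);
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
        }
    }
}
```

Only one chunk of lines, plus one line per chunk during the merge, is held in memory at a time. The sketch compares whole lines; sorting by a key field would need a comparator that extracts that key.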


Back to Case Study
- Master File: bank agencies database table (select * order by)
- Transaction File: positional text file with the newest agency
information (if not sorted, use the FileSorter class)
- Key: string concatenation of bank number + agency number
- Sync Mode: full (if an agency no longer exists, delete it)
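
One detail worth illustrating is that the concatenated key has to sort the same way on both sides. The sketch below assumes a hypothetical layout (a 3-digit bank number followed by a 5-digit agency number, both zero-padded); the real widths and positions are not given in the slides, and the class name AgencyKey is made up.

```java
import java.util.Locale;

// Hypothetical layout: the first 3 characters of a line hold the bank number
// and the next 5 hold the agency number. The real layout is not in the slides.
class AgencyKey {
    /** Key taken from a positional text line of the transaction file. */
    static String fromLine(String line) {
        String bank = line.substring(0, 3);
        String agency = line.substring(3, 8);
        return bank + agency;
    }

    /** Key built from numeric database columns, padded to the same widths. */
    static String fromColumns(int bank, int agency) {
        return String.format(Locale.ROOT, "%03d%05d", bank, agency);
    }
}
```

Zero-padding to fixed widths keeps the string order of the keys consistent with the numeric order of the (bank, agency) pair, so the "order by" on the database side and the file sort produce the same sequence.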

Benchmark: Dummy Solution vs. Balance Line Solution


Back to Case Study
- Dummy Solution
  - 1 random access for each transaction record
  - 33,218 lines x 1 query with a “where” clause = 33,218
queries with a “where” clause
  - Same slow processing time in every sync

- Balance Line Solution
  - 1 single sequential access
  - 1 query with an “order by” clause
  - Faster processing in the first sync (70% up) and much
faster in later syncs (fewer changes = less processing
time because the keys move faster)


BalanceLine4j Framework – Complementary Strategies
To further increase the performance of Balance Line processing,
some complementary techniques can be used:

- Dump data from the database to text, work at the filesystem I/O level and
then update the database (filesystem I/O is faster than
network I/O)
- Sometimes using a hash code (MD5, SHA-1) to check whether a record
has changed is faster than comparing field by field
- Use a transaction code (insert, update, delete) to identify the
type of transaction made per record in the transaction file
- Buffer some records in memory to optimize the data streaming
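
As an illustration of the hash-code strategy above, the following sketch uses the standard java.security.MessageDigest class. The class name RecordDigest and the choice of a separator byte between fields are assumptions made for this example, and it presumes the digest of each master record is stored alongside it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper: one digest per record replaces a field-by-field compare.
class RecordDigest {
    /** SHA-1 digest over the record's fields, each followed by a separator byte. */
    static byte[] sha1(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (String field : fields) {
                md.update(field.getBytes(StandardCharsets.UTF_8));
                // Separator so that ("ab", "c") and ("a", "bc") hash differently.
                md.update((byte) 0x1F);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** True when the new field values no longer match the stored digest. */
    static boolean changed(byte[] storedDigest, String... newFields) {
        return !MessageDigest.isEqual(storedDigest, sha1(newFields));
    }
}
```

For example, with `byte[] stored = RecordDigest.sha1("001", "Main St")`, calling `RecordDigest.changed(stored, "001", "Main Street")` reports a change, while passing the original values does not.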


More Information and Samples

- Project Site: https://github.com/gibaholms/balanceline4j/

- Author's Blog: http://gibaholms.wordpress.com/

- Author's Twitter: @gibaholms

Thanks!

gibaholms85@gmail.com
