BalanceLine4j Framework Overview
Gilberto Augusto Holms
                                gibaholms85@gmail.com
                                       @gibaholms
                            http://gibaholms.wordpress.com/




Revision: 01
About me...

Gilberto Augusto Holms

   Java and SOA Architect
   Expertise: Java, EAI, SOA, BPEL, BPM, Oracle Fusion Middleware
   Interests: OpenSource, Artificial Intelligence, Innovation
   Twitter: @gibaholms
   Blog: http://gibaholms.wordpress.com/
   SCJA, SCJP, SCWCD, SCBCD, SCDJWS, OCE WLP 10g




Balance Line Algorithm




  What is “Balance Line”?

        Balance Line is an algorithm, a computational
        technique for coordinating the processing of
        massive sequential data.

Balance Line Algorithm




  What is “Sequential Data”?

        Sequential data are large data sets, from one or
        more data sources, that share a common key and
        are ordered by that key.

Balance Line Algorithm




  Why use it?

         Improves processing performance

         Saves computational resources

Balance Line Algorithm




  When to use it?

         Data synchronization (like syncing an iPod)

         Data loading (full or partial)

         Data reconciliation

Case Study


  Company “X” has a large table in its database containing the main
  information about all the banks and bank agencies (branches) in the
  country (number, address, contacts). Every day, the company receives
  from the Central Bank a huge text file with the latest agency data, in
  which the following conditions may occur:
       Data update (changes to number, address, contacts and so on)
       Agency no longer exists
       New agency added

  Our job is to develop software that keeps this table up to date by
  processing the file and synchronizing the record changes.

Dummy Solution
Begin

    For each line of the text file:
        Check whether the agency exists in the table
        Exists?
            N → INSERT
            Y → Check whether the agency data changed
                Data changed?
                    Y → UPDATE
                    N → nothing to do
    End of file?
        N → continue with the next line
        Y → For each table record that no longer exists in the file: DELETE

End

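The flowchart above can be sketched in Java as follows. This is an illustrative sketch, not code from the presentation: a `HashMap` is a hypothetical stand-in for the database table, so each `table.get(key)` models one query with a “where” clause, and the class name `DummySync` is invented for the example.

```java
import java.util.*;

/**
 * Sketch of the "dummy" synchronization from the flowchart. A HashMap stands in
 * for the database table (hypothetical): every table.get(key) models one query
 * with a "where" clause.
 */
class DummySync {
    /** Synchronizes the table with the file lines and returns the number of keyed lookups. */
    static int sync(Map<String, String> table, List<Map.Entry<String, String>> fileLines) {
        int lookups = 0;
        Set<String> keysInFile = new HashSet<>();
        for (Map.Entry<String, String> line : fileLines) {
            String key = line.getKey();
            String newData = line.getValue();
            keysInFile.add(key);
            lookups++;                                // one random access per file line
            String oldData = table.get(key);
            if (oldData == null) {
                table.put(key, newData);              // agency does not exist: INSERT
            } else if (!oldData.equals(newData)) {
                table.put(key, newData);              // data changed: UPDATE
            }                                         // otherwise nothing to do
        }
        // End of file: every record that no longer exists in the file is deleted
        table.keySet().retainAll(keysInFile);
        return lookups;
    }
}
```

The number of lookups always equals the number of file lines, no matter how few records actually changed, which is why the processing time is the same in every sync.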
Balance Line Algorithm Concepts
  Master File
    The main data set. It represents the final view of the data: the
    persistent data, the reference, the origin.
  Transaction File
    The secondary data set. It represents the transactions made and
    contains the data that must be synchronized with the origin.
  Key
    A unique identifier for one single record (it can be a single field,
    a combination of fields, a SHA-1 hash and so on).

  [Diagram: a Master file balanced by BalanceLine against one or more
  Transaction files ...]

Balance Line Algorithm Concepts




  The big secret ...


                                   SORTING BY KEY!



Balance Line Algorithm – Step by Step

  1 – Identify one unique key

        Master               Transaction

        10   .....           3    .....
        5    .....           10   .....
        20   .....           18   .....
        17   .....           17   .....

Balance Line Algorithm – Step by Step

  2 – Sort the data sources (ascending)

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

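Step 2 is ordinary sorting by key. A minimal sketch for data already held in memory, using a hypothetical `Row` class with an integer key (files too large for memory would need an external sort instead):

```java
import java.util.*;

/** Sketch of step 2: sort a data source ascending by key before balancing. */
class SortStep {
    /** Hypothetical record shape: an integer key plus an opaque payload. */
    static final class Row {
        final int key;
        final String data;
        Row(int key, String data) { this.key = key; this.data = data; }
    }

    /** Returns a new list sorted ascending by key, leaving the input untouched. */
    static List<Row> sortByKey(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Row r) -> r.key));
        return sorted;
    }

    /** Extracts the keys in order, handy for checking the result. */
    static List<Integer> keys(List<Row> rows) {
        List<Integer> out = new ArrayList<>();
        for (Row r : rows) out.add(r.key);
        return out;
    }
}
```

Applied to the master keys 10, 5, 20, 17 and the transaction keys 3, 10, 18, 17, this yields the two ascending sequences shown in the table.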
Balance Line Algorithm – Step by Step

  3 – Prepare two “pointers”, one per data source, each starting at the first record

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 5, KT = 3: KM > KT → INSERT 3, move T

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 5, KT = 10: KM < KT → DELETE 5, move M

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 10, KT = 10: KM = KT → UPDATE 10, move M and T

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 17, KT = 17: KM = KT → UPDATE 17, move M and T

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 20, KT = 18: KM > KT → INSERT 18, move T

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  4 – Begin key comparison
      KM = 20, no KT left → DELETE 20, move M

        Master               Transaction

        5    .....           3    .....
        10   .....           10   .....
        17   .....           17   .....
        20   .....           18   .....

Balance Line Algorithm – Step by Step

  5 – Final master file

        Master

        3    .....
        10   .....
        17   .....
        18   .....

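The comparison rules of step 4 fit in a single loop. The sketch below is a minimal illustration, not the BalanceLine4j API: `BalanceLineSketch` and `Row` are hypothetical names, and it assumes both inputs fit in lists, are already sorted ascending and have unique integer keys. When KM = KT it only reports an UPDATE if the payload differs, mirroring the dummy solution's “Data changed?” check.

```java
import java.util.*;

/**
 * Minimal sketch of the Balance Line comparison loop (not the BalanceLine4j API).
 * Assumes both lists are sorted ascending by a unique integer key.
 */
class BalanceLineSketch {
    /** Hypothetical record shape: a key plus an opaque payload. */
    static final class Row {
        final int key;
        final String data;
        Row(int key, String data) { this.key = key; this.data = data; }
    }

    /**
     * Walks both lists with one index ("pointer") each, appends the resulting
     * records to newMaster and returns the action taken at every step.
     */
    static List<String> balance(List<Row> master, List<Row> transaction, List<Row> newMaster) {
        List<String> actions = new ArrayList<>();
        int m = 0;  // master pointer
        int t = 0;  // transaction pointer
        while (m < master.size() || t < transaction.size()) {
            Row rm = m < master.size() ? master.get(m) : null;
            Row rt = t < transaction.size() ? transaction.get(t) : null;
            if (rm == null || (rt != null && rm.key > rt.key)) {
                // KM > KT (or master exhausted): new key -> INSERT, move T
                actions.add("INSERT " + rt.key);
                newMaster.add(rt);
                t++;
            } else if (rt == null || rm.key < rt.key) {
                // KM < KT (or no KT left): key no longer exists -> DELETE, move M
                actions.add("DELETE " + rm.key);
                m++;
            } else {
                // KM = KT: UPDATE only if the payload changed, then move M and T
                actions.add((rm.data.equals(rt.data) ? "UNCHANGED " : "UPDATE ") + rm.key);
                newMaster.add(rt);
                m++;
                t++;
            }
        }
        return actions;
    }
}
```

With master keys 5, 10, 17, 20 and transaction keys 3, 10, 17, 18 (payloads differing for 10 and 17), the actions are INSERT 3, DELETE 5, UPDATE 10, UPDATE 17, INSERT 18, DELETE 20, and the new master holds 3, 10, 17, 18, matching the final master file. Each record is read exactly once, so after sorting the work is linear in the total number of records.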
BalanceLine4j Framework
  Java implementation of the Balance Line algorithm
  Focus on the business rules and let the framework handle the
    algorithm
  Provides an abstraction of sequential data sources, which can be any
    sortable data set (Comparable<T>):
       Object collections, sets, maps
       Text files (with a built-in text file sorter)
       Database ResultSets
       Custom sources (an interface is provided)
  The algorithm runs by streaming the data, so memory consumption is low
  Easy to use, with a simple API; no knowledge of the algorithm is required
  Easier to maintain and evolve, because business rules are kept isolated
    from the algorithm code

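To illustrate what streaming means here, the following sketch defines a hypothetical sequential-source contract. `SequentialSource` and `PositionalTextSource` are invented for this example and are NOT BalanceLine4j's actual interfaces. A positional text file is read one line at a time, so only the current record of each source is held in memory; the key is assumed to be the first `keyLength` characters of each line, and the source fails fast if keys go backwards, since the algorithm relies on sorted input.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical sequential data source contract (NOT BalanceLine4j's real API).
 * Only the current record is exposed, and the source advances one record at a time.
 */
interface SequentialSource<K extends Comparable<K>, R> {
    boolean hasCurrent();
    R current();
    K currentKey();
    void advance() throws IOException;
}

/** Streams a positional text file; the first keyLength characters of a line are its key. */
class PositionalTextSource implements SequentialSource<String, String> {
    private final BufferedReader reader;
    private final int keyLength;
    private String line;

    PositionalTextSource(Reader in, int keyLength) throws IOException {
        this.reader = new BufferedReader(in);
        this.keyLength = keyLength;
        this.line = reader.readLine();  // load the first record
    }

    public boolean hasCurrent() { return line != null; }

    public String current() { return line; }

    public String currentKey() { return line.substring(0, keyLength); }

    public void advance() throws IOException {
        if (line == null) throw new NoSuchElementException("source exhausted");
        String previousKey = currentKey();
        line = reader.readLine();
        // Balance Line relies on ascending keys, so reject unsorted input early
        if (line != null && currentKey().compareTo(previousKey) < 0) {
            throw new IllegalStateException("input not sorted by key at: " + line);
        }
    }
}
```

A comparison loop then only needs `currentKey()` and `advance()` from each source, whatever the underlying storage is.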
BalanceLine4j Framework – Additional Features
  FileSorter.java

     The framework provides a file sorter class that can safely sort
     large amounts of text data without running out of memory: it
     writes temporary chunks of data to the file system and then
     merge-sorts all the chunks.

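A minimal external merge sort in the same spirit is sketched below. It is not the FileSorter source code: the class `ExternalSorter`, the `maxLines` chunk size and the use of natural `String` ordering (which matches key order when a fixed-width key starts each line) are assumptions for illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

/**
 * Sketch of an external merge sort of text lines (not the FileSorter code).
 * Lines are read in chunks of at most maxLines, each chunk is sorted in memory
 * and written to a temporary file, then all chunks are k-way merged.
 */
class ExternalSorter {
    static void sort(Path input, Path output, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLines) {   // chunk full: sort and spill to disk
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeSortedChunk(buffer));
        }
        mergeChunks(chunks, output);
        for (Path chunk : chunks) Files.deleteIfExists(chunk);
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    /** Keeps one open reader per chunk; the queue always yields the smallest current line. */
    private static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Each queue entry is (line, index of the chunk it came from)
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByKey());
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (int i = 0; i < chunks.size(); i++) {
                BufferedReader r = Files.newBufferedReader(chunks.get(i), StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new AbstractMap.SimpleEntry<>(first, i));
            }
            while (!heap.isEmpty()) {
                Map.Entry<String, Integer> smallest = heap.poll();
                out.write(smallest.getKey());
                out.newLine();
                String next = readers.get(smallest.getValue()).readLine();
                if (next != null) heap.add(new AbstractMap.SimpleEntry<>(next, smallest.getValue()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

Memory use is bounded by one chunk during the split phase and by one line per chunk during the merge phase, which is what allows files larger than the heap to be sorted.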
Back to Case Study
  Master File: the bank agencies database table (SELECT * with ORDER BY
    the key)
  Transaction File: positional text file with the latest agency
    information (if it is not sorted, use the FileSorter class)
  Key: string concatenation of bank number + agency number
  Sync Mode: full (agencies that no longer exist are deleted)

     Benchmark: Dummy Solution vs. Balance Line Solution

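Concatenating the two numbers only works as a sort key if both parts have a fixed width: otherwise bank 1 + agency 23 and bank 12 + agency 3 both give “123”, and the text “21” sorts after “101” even though bank 2 comes before bank 10. The sketch below zero-pads both parts; the widths (3 digits for the bank, 5 for the agency) and the class name `AgencyKey` are hypothetical.

```java
/**
 * Builds the composite key "bank number + agency number" as a fixed-width string.
 * Assumption (hypothetical widths): bank numbers fit in 3 digits, agency numbers in 5.
 * Zero-padding keeps keys unambiguous and makes String order match numeric order.
 */
class AgencyKey {
    static final int BANK_WIDTH = 3;
    static final int AGENCY_WIDTH = 5;

    static String of(int bank, int agency) {
        if (bank < 0 || agency < 0) throw new IllegalArgumentException("negative number");
        // e.g. "%03d%05d": bank 1, agency 23 -> "00100023"
        String key = String.format("%0" + BANK_WIDTH + "d%0" + AGENCY_WIDTH + "d", bank, agency);
        if (key.length() != BANK_WIDTH + AGENCY_WIDTH) {
            throw new IllegalArgumentException("number too wide for key: " + bank + "/" + agency);
        }
        return key;
    }
}
```

Whatever format is chosen, the database ORDER BY and the Java comparison must order the keys the same way; with all-digit, fixed-width keys this usually holds, but it is worth checking the collation of the database in use.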
Back to Case Study
  Dummy Solution
       1 random access per transaction record
       33,218 lines x 1 query with a “where” clause = 33,218
         queries with a “where” clause
       The same slow processing time in every sync

  Balance Line Solution
       1 single sequential access
       1 query with an “order by” clause
       70% faster processing in the first sync, and even faster in
         subsequent syncs (fewer changes mean less processing time,
         because the keys advance faster)

BalanceLine4j Framework – Complementary Strategies
To further increase the performance of Balance Line processing, some
    complementary techniques can be used:

  Dump the data from the database to text files, work at the file system
   I/O level and then update the database (file system I/O is typically
   faster than network I/O)
  Checking whether a record has changed by comparing hash codes (MD5,
   SHA-1) is sometimes faster than comparing it field by field
  Use a transaction code (insert, update, delete) in each transaction
   file record to identify the type of change made to it
  Buffer some records in memory to optimize the data streaming

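The hash-code strategy can be sketched with the JDK's `MessageDigest` (SHA-1 is one of the digest algorithms every Java SE platform must provide). Assumptions for illustration: the class `RecordHash` is hypothetical, fields are joined with the ASCII unit separator (0x1F) so that (“ab”, “c”) and (“a”, “bc”) hash differently, and the master stores the hash of each record so that the KM = KT case needs a single string comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of change detection by hash instead of field-by-field comparison. */
class RecordHash {
    private static final byte SEPARATOR = 0x1F;  // ASCII unit separator, assumed absent from the data

    /** Returns the SHA-1 digest of the fields as 40 lowercase hex characters. */
    static String sha1(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (String f : fields) {
                md.update(f.getBytes(StandardCharsets.UTF_8));
                md.update(SEPARATOR);  // field boundary, so ("ab","c") != ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** True when the new field values no longer match the stored hash. */
    static boolean changed(String storedHash, String... newFields) {
        return !storedHash.equals(sha1(newFields));
    }
}
```

MD5 and SHA-1 are no longer considered safe against deliberately crafted collisions, but for detecting accidental changes in records that is generally not a concern.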
More Information and Samples


  Project Site: https://github.com/gibaholms/balanceline4j/

  Author's Blog: http://gibaholms.wordpress.com/

  Author's Twitter: @gibaholms



                                                                Thanks!

                                                             gibaholms85@gmail.com
