16. BLOBSYNC AWESOMESAUCE
• DETECTS CHANGES
• DOES NOT NEED ORIGINAL FILE TO DETECT CHANGES
• UPLOADS/DOWNLOADS CHANGES ONLY
• A TRANSPARENT BLACK BOX: OPEN SOURCE, BUT CAN BE TREATED AS A BLACK BOX
32. SUCCESS!
• CAN NOW FIND BLOCKS EVEN WHEN MOVED
• IF WE CAN FIND A BLOCK WE CAN DETERMINE IF WE CAN REUSE IT
• BUT…
• MD5/SHA ETC. ARE TOO SLOW TO RUN AT EVERY OFFSET IN THE FILE
33. TOO SLOW? NO WAY!
• E.G.
• 100 MB FILE/BLOB
• BLOCK OF 100 KB
• > 104M HASH CALCULATIONS (ONE PER BYTE OFFSET), JUST TO FIND THAT ONE BLOCK
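
The figure above can be checked with a few lines of arithmetic. This is only a sketch, and it assumes binary units (1 MB = 1024 × 1024 bytes, 1 KB = 1024 bytes), which the slide does not state.

```python
FILE_SIZE = 100 * 1024 * 1024   # 100 MB blob (assumed binary units)
BLOCK_SIZE = 100 * 1024         # 100 KB block (assumed binary units)

# A block could start at any byte offset that still leaves room for a full block,
# so a naive search computes one strong hash per candidate offset.
window_positions = FILE_SIZE - BLOCK_SIZE + 1
print(window_positions)  # 104755201

# Each of those hashes reads BLOCK_SIZE bytes, so the total work is far larger
# than the file itself.
bytes_hashed = window_positions * BLOCK_SIZE
```

Under these assumptions, the naive approach hashes on the order of 10^13 bytes to locate a single block.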
35. YOU HAVE TO ROLL WITH IT.
• ROLLING SIGNATURE
• EXTREMELY QUICK.
• DUE TO FALSE POSITIVES USE MD5/SHA AS CONFIRMATION STEP
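
A small illustration of why the confirmation step is needed. The weak signature here is a plain sum of byte values, which is an assumption made for the example and not necessarily the function BlobSync uses. Any cheap signature of this kind can match blocks that are actually different.

```python
import hashlib

def weak_sig(data: bytes) -> int:
    """Hypothetical weak signature: the sum of all byte values."""
    return sum(data)

a = b"HELLO"
b = b"OLLEH"  # same bytes in a different order

# The weak signature cannot tell these two blocks apart (a false positive)...
same_weak = weak_sig(a) == weak_sig(b)

# ...but a strong hash can, so it is only computed when the weak one matches.
same_strong = hashlib.md5(a).digest() == hashlib.md5(b).digest()

print(same_weak, same_strong)  # True False
```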
37. YOU HAVE TO ROLL WITH IT.
• SIG = FUNC( 0 .. 4 )
• CALCULATE SIG OF 1..5 FROM THE OLD SIG
• NEWSIG = OLDSIG - ARRAY[0] + ARRAY[5]
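
The update rule on this slide can be tried on a tiny array. The sketch assumes FUNC is a simple sum over the window; the slide leaves FUNC unspecified, and the sample values are made up.

```python
array = [3, 1, 4, 1, 5, 9]  # made-up sample data

# SIG = FUNC(0..4): signature of the first five elements (FUNC assumed to be a sum).
old_sig = sum(array[0:5])

# Slide the window one step: drop the element that left, add the one that entered.
new_sig = old_sig - array[0] + array[5]

# The rolled value equals a full recalculation over elements 1..5.
assert new_sig == sum(array[1:6])
print(old_sig, new_sig)  # 14 20
```

The rolled update costs one subtraction and one addition regardless of window size, whereas a full recalculation touches every element in the window.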
38. YOU HAVE TO ROLL WITH IT.
• CAN SEARCH ENTIRE FILE WITH MINIMAL CALCULATIONS, I.E. FAST!
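
Putting the pieces together, a search over a whole file might look like the sketch below. It assumes a sum-of-bytes rolling signature and MD5 as the confirmation hash; `find_block` is a hypothetical helper, not part of the BlobSync API.

```python
import hashlib

def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset where block occurs in data, or -1 if absent."""
    n = len(block)
    if n == 0 or n > len(data):
        return -1

    target_sig = sum(block)                   # weak signature (assumed: byte sum)
    target_md5 = hashlib.md5(block).digest()  # strong hash for confirmation

    sig = sum(data[:n])                       # signature of the first window
    offset = 0
    while True:
        # Only pay for the strong hash when the cheap signature matches.
        if sig == target_sig and hashlib.md5(data[offset:offset + n]).digest() == target_md5:
            return offset
        if offset + n >= len(data):
            return -1
        # Roll the window forward by one byte.
        sig = sig - data[offset] + data[offset + n]
        offset += 1

print(find_block(b"xxxxABCDEFyyyy", b"ABCDEF"))  # 4
print(find_block(b"BAxAB", b"AB"))               # 3 (offset 0 is a false positive)
```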
39. SO WHAT NOW?
• CAN NOW SEARCH FILES QUICKLY FOR SIGNATURE MATCHES
• MEANS WE CAN FIGURE OUT WHAT IS COMMON BETWEEN CLOUD AND LOCAL
• CAN DOWNLOAD/UPLOAD ONLY THE DIFFERENCES.
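
One way to turn signature matches into an upload plan is sketched below. Everything here is illustrative: the block size, the `signatures` and `plan` helpers, and the sum-plus-MD5 signature pair are assumptions, not BlobSync's actual format. Only the signatures of the cloud copy are needed, not the cloud data itself.

```python
import hashlib

BLOCK = 4  # tiny hypothetical block size, for readability

def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Map weak sum -> {md5 digest: offset} for each full block of the old copy."""
    table = {}
    for off in range(0, len(old) - block + 1, block):
        chunk = old[off:off + block]
        table.setdefault(sum(chunk), {})[hashlib.md5(chunk).digest()] = off
    return table

def plan(new: bytes, table: dict, block: int = BLOCK) -> list:
    """Return steps: ("reuse", old_offset) or ("upload", literal_bytes)."""
    steps, pending, i = [], bytearray(), 0
    sig = sum(new[:block]) if len(new) >= block else None
    while i + block <= len(new):
        hit = None
        if sig in table:  # cheap check first, strong hash only on a weak match
            hit = table[sig].get(hashlib.md5(new[i:i + block]).digest())
        if hit is not None:
            if pending:
                steps.append(("upload", bytes(pending)))
                pending = bytearray()
            steps.append(("reuse", hit))
            i += block
            sig = sum(new[i:i + block]) if i + block <= len(new) else None
        else:
            pending.append(new[i])  # this byte is not part of any known block
            if i + block < len(new):
                sig = sig - new[i] + new[i + block]  # roll forward one byte
            i += 1
    pending.extend(new[i:])
    if pending:
        steps.append(("upload", bytes(pending)))
    return steps

cloud = b"AAAABBBBCCCC"
local = b"AAAAxxBBBBCCCC"  # two bytes inserted; later blocks have moved
print(plan(local, signatures(cloud)))
# [('reuse', 0), ('upload', b'xx'), ('reuse', 4), ('reuse', 8)]
```

In this toy case only the two inserted bytes would be uploaded; the three existing blocks are reused even though two of them shifted position.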
43. LIES, MORE LIES AND STATISTICS
• SMALL DB (14M): CLEARED A SMALL TABLE. UPDATE: 340K
• LARGE DB (555M): CLEARED A SMALL TABLE. UPDATE: 720K
• VM (8G): DELETED SOME FILES. UPDATE: 800M
44. UPCOMING CHANGES
• DEFRAG
• DYNAMICALLY DETERMINE BLOCK SIZE
• BETTER PARALLEL UPLOAD/DOWNLOAD
• 32 BIT VERSION
45. LINKS
• BLOG ON BLOBSYNC: HTTPS://KPFAULKNER.WORDPRESS.COM/CATEGORY/BLOBSYNC/
• NUGET PACKAGE: HTTPS://WWW.NUGET.ORG/PACKAGES/BLOBSYNC/
• GITHUB WITH SOURCE: HTTPS://GITHUB.COM/KPFAULKNER/BLOBSYNC/