Tiny Google Projects

Presentation about 3 Google projects.

Published in: Technology

  1. 1. :)
  2. 2. tiny :projects
  3. 3. Tesseract OCR 1985 2006 HP Google
  4. 4. Tesseract OCR 2006 2011 TIFF *
  5. 5. Tesseract OCR 2009 2010 Text layout
  6. 6. Tesseract OCR 2007 2011 6 33
  7. 7. Tesseract OCR: Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified and Traditional), Danish (standard and Fraktur script), German, Greek, Finnish, French, Hebrew, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak (standard and Fraktur script), Slovenian, Spanish, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian and Vietnamese
  8. 8. Tesseract OCR Officially supported: Probably runs on:
  9. 9. Image processing
  10. 10. Google Refine
  11. 11. Runs on:
  12. 12. Runs in:
  13. 13. Major features: Import from anywhere; Faceting; Clustering; Split / create custom columns; GREL transformations; Export / etc.
  14. 14. google protocol buffers. Definition: message Person { required int32 id = 1; required string name = 2; optional string email = 3; } Usage: Person person; person.set_id(123); person.set_name("Bob"); person.set_email("bob@example.com"); fstream out("person.pb", ios::out ... person.SerializeToOstream(&out); out.close();
  15. 15. 512 bytes / tweet; 340,000,000 tweets / day (2012); 7,253,333,333 bytes / hour; 2,014,814 bytes / second; 1.921 Mbytes / second; 15.371 Mbits / second. 8 Tbytes / day (2011). Google: ~377M searches / day
  16. 16. + =
  17. 17. + =
  18. 18. + =
  19. 19. > + =
  20. 20. > + =
  21. 21. > + =? MapReduce
  22. 22. snappy http://code.google.com/p/snappy/
  23. 23. snappy: Fast, Stable, Robust, Free and BSD
  24. 24. Size (less is better). [Chart: compression ratio (%) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
  25. 25. Data types. [Chart: compression ratio of snappy vs. zlib for plain text, html, jpeg]
  26. 26. Size: from 20% to 100% bigger :( ...not for Amazon Glacier
  27. 27. Speed (more is better). [Chart: compression (MB/s) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
  28. 28. Speed (more is better). [Chart: decompression (MB/s) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2]
  29. 29. On 1 core of 64-bit Core i7 processor: • Compression: 250MB/s • Decompression: 500MB/s :P
  30. 30. Portable, but...
  31. 31. Portable, but primarily optimized for 64-bit x86-compatible processors
  32. 32. Used: BigTable, MapReduce, Google RPC, Hadoop
  33. 33. Bindings:
  34. 34. @TarasRoshko HTTP headers here: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
  35. 35. Q&A? Ostap Andrusiv, Software Engineer, Eleks Software, @p1f
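
Note on slide 14 (protocol buffers): the slide shows a `Person` message and the C++ calls that fill and serialize it, but not what ends up in `person.pb`. The sketch below hand-encodes the same three fields following the publicly documented protobuf wire format (a varint tag per field, then a varint value or a length-prefixed string). It is an illustration only, not the protobuf library: the helper names (`encode_varint`, `encode_person`, and so on) are hypothetical, and it assumes non-negative integers and always-valid input.

```python
WIRE_VARINT = 0  # wire type used for int32
WIRE_LEN = 2     # wire type used for strings (length-delimited)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_tag(field_number, WIRE_VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_tag(field_number, WIRE_LEN) + encode_varint(len(data)) + data


def encode_person(person_id: int, name: str, email=None) -> bytes:
    """Hand-encode the Person message from the slide (id=1, name=2, email=3)."""
    out = encode_int_field(1, person_id) + encode_string_field(2, name)
    if email is not None:  # optional field: simply omitted when unset
        out += encode_string_field(3, email)
    return out


if __name__ == "__main__":
    payload = encode_person(123, "Bob", "bob@example.com")
    print(len(payload), payload.hex())
```

With the values from the slide, the encoded message is 24 bytes: 2 bytes for `id`, 5 for `name`, and 17 for `email`, which is the kind of compactness the slide is pointing at.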

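
Note on slide 15 (tweet volume): the per-hour and per-second figures follow from the two inputs on the slide, 512 bytes per tweet and 340,000,000 tweets per day. A minimal sketch of that arithmetic is below. The function name `tweet_throughput` is hypothetical, and treating 1 Mbyte as 2^20 bytes is an assumption; it is the interpretation under which the slide's numbers come out.

```python
BYTES_PER_TWEET = 512          # from the slide
TWEETS_PER_DAY = 340_000_000   # from the slide (2012)
BYTES_PER_MBYTE = 2 ** 20      # assumption: binary megabytes


def tweet_throughput(bytes_per_tweet=BYTES_PER_TWEET,
                     tweets_per_day=TWEETS_PER_DAY):
    """Convert a daily tweet volume into hourly and per-second data rates."""
    bytes_per_day = bytes_per_tweet * tweets_per_day
    bytes_per_hour = bytes_per_day / 24
    bytes_per_second = bytes_per_hour / 3600
    mbytes_per_second = bytes_per_second / BYTES_PER_MBYTE
    mbits_per_second = mbytes_per_second * 8
    return {
        "bytes_per_day": bytes_per_day,
        "bytes_per_hour": bytes_per_hour,
        "bytes_per_second": bytes_per_second,
        "mbytes_per_second": mbytes_per_second,
        "mbits_per_second": mbits_per_second,
    }


if __name__ == "__main__":
    for name, value in tweet_throughput().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions the rate is a little under 2 Mbytes per second, so the last two figures on the slide are decimal values (about 1.921 and 15.371), not thousands.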