Tiny Google Projects


Presentation about 3 Google projects.

Published in: Technology
Slide notes:
  • http://www.cloudera.com/blog/2011/09/snappy-and-hadoop/
  • In-memory test (compression and decompression) with ENWIK8 using 1 core of an Intel Xeon X5355 @ 2.66GHz (64-bit compilation under gcc 4.1.1 (Linux) with -O3 -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math --param inline-unit-growth=999 -DNDEBUG)
  • Compression ratio by data type, snappy vs. zlib: plain text 1.5-1.7 vs. 2.7; html 2-4 vs. 3-7; jpeg 1 vs. 1
  • http://aws.amazon.com/glacier/
  • http://pastebin.com/SFaNzRuf
  • http://encode.ru/threads/1255-Google-released-Snappy-compression-decompression-library
  • Tiny Google Projects

    1. :)
    2. tiny :projects
    3. Tesseract OCR. 1985: HP; 2006: Google
    4. Tesseract OCR. 2006: TIFF; 2011: *
    5. Tesseract OCR. 2009: text; 2010: layout
    6. Tesseract OCR. 2007: 6; 2011: 33
    7. Tesseract OCR: Arabic, English, Bulgarian, Catalan, Czech, Chinese (Simplified and Traditional), Danish (standard and Fraktur script), German, Greek, Finnish, French, Hebrew, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak (standard and Fraktur script), Slovenian, Spanish, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian and Vietnamese
    8. Tesseract OCR. Officially supported: Probably runs on:
    9. Image processing
    10. Google Refine
    11. Runs on:
    12. Runs in:
    13. Major features: import from anywhere; faceting; clustering; split/create custom columns; GREL transformations; export, etc.
    14. google protocol buffers. Message definition: message Person { required int32 id = 1; required string name = 2; optional string email = 3; } C++ usage: Person person; person.set_id(123); person.set_name("Bob"); person.set_email("bob@example.com"); fstream out("person.pb", ios::out ...); person.SerializeToOstream(&out); out.close();
    15. 512 bytes / tweet; 340,000,000 tweets / day (2012); 7,253,333,333 bytes / hour; 2,014,814 bytes / second; 1.921 Mbytes / second; 15.371 Mbits / second; 8 Tbytes / day (2011). Google: ~377M searches/day
    16. + =
    17. + =
    18. + =
    19. > + =
    20. > + =
    21. > + =? MapReduce
    22. snappy: http://code.google.com/p/snappy/
    23. snappy: Fast, Stable, Robust, Free and BSD
    24. Size (less is better). Chart: compression ratio (%) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2
    25. Data types. Chart: compression ratio of snappy vs. zlib for plain text, html, jpeg
    26. Size: from 20% to 100% bigger :( ...not for amazon glacier
    27. Speed (more is better). Chart: compression (MB/s) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2
    28. Speed (more is better). Chart: decompression (MB/s) for lzjb 2010, lzo 2.04 1x, fastlz 0.1 -1, fastlz 0.1 -2, lzf 3.6 vf, lzf 3.6 uf, lzrw1, lzrw1-a, lzrw2, lzrw3, lzrw3-a, snappy 1.0, quicklz 1.5.0 -1, quicklz 1.5.0 -2
    29. On 1 core of a 64-bit Core i7 processor: compression 250 MB/s; decompression 500 MB/s :P
    30. Portable, but...
    31. Portable, but primarily optimized for 64-bit x86-compatible processors
    32. Used: BigTable, MapReduce, Google RPC, Hadoop
    33. Bindings:
    34. @TarasRoshko HTTP headers here: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
    35. QA? Ostap Andrusiv, Software Engineer, Eleks software, @p1f
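
The protocol buffers slide shows a `Person` message and the generated C++ calls that fill and serialize it. As a rough sketch of what ends up in `person.pb`, the code below hand-encodes the same three fields using the documented protobuf wire format (varint keys, varint integers, length-delimited strings). The helper names `put_varint`, `put_key`, `put_int32`, `put_string` and `encode_person` are made up for this illustration and are not part of the protobuf library; treating an empty email as "unset" is also a simplification.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last one.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// A field key is (field_number << 3) | wire_type, itself written as a varint.
void put_key(std::vector<std::uint8_t>& out, std::uint32_t field,
             std::uint32_t wire_type) {
    put_varint(out, (static_cast<std::uint64_t>(field) << 3) | wire_type);
}

// int32 fields use wire type 0 (varint); negative values are sign-extended
// to 64 bits before encoding.
void put_int32(std::vector<std::uint8_t>& out, std::uint32_t field,
               std::int32_t value) {
    put_key(out, field, 0);
    put_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(value)));
}

// string fields use wire type 2: key, byte length as a varint, then the bytes.
void put_string(std::vector<std::uint8_t>& out, std::uint32_t field,
                const std::string& s) {
    put_key(out, field, 2);
    put_varint(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

// Hypothetical hand-rolled encoder for the Person message from the slide.
std::vector<std::uint8_t> encode_person(std::int32_t id,
                                        const std::string& name,
                                        const std::string& email) {
    std::vector<std::uint8_t> out;
    put_int32(out, 1, id);       // required int32 id = 1
    put_string(out, 2, name);    // required string name = 2
    if (!email.empty()) {        // optional string email = 3 (empty = unset here)
        put_string(out, 3, email);
    }
    return out;
}
```

Calling `encode_person(123, "Bob", "bob@example.com")` yields 24 bytes, starting with `08 7B` (field 1, value 123) and `12 03` followed by `Bob`. Real code should use the classes generated by `protoc`, as on the slide.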
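
The tweet-volume slide chains a few divisions together, and it can help to see them spelled out. The sketch below redoes the arithmetic from the slide's two inputs (512 bytes per tweet, 340,000,000 tweets per day). The `Throughput` struct and `tweet_throughput` function are illustrative names; the code assumes integer truncation at each step and 1 Mbyte = 2^20 bytes, which is the reading under which the slide's Mbytes and Mbits figures (about 1.92 and 15.37) come out.

```cpp
#include <cstdint>

// Derived rates for a stream of fixed-size messages.
struct Throughput {
    std::uint64_t bytes_per_day;
    std::uint64_t bytes_per_hour;
    std::uint64_t bytes_per_second;
    double mbytes_per_second;  // assumption: 1 Mbyte = 2^20 bytes
    double mbits_per_second;   // 8 bits per byte
};

// Recompute the slide's numbers; integer division truncates, as on the slide.
Throughput tweet_throughput(std::uint64_t bytes_per_tweet,
                            std::uint64_t tweets_per_day) {
    Throughput t{};
    t.bytes_per_day = bytes_per_tweet * tweets_per_day;
    t.bytes_per_hour = t.bytes_per_day / 24;
    t.bytes_per_second = t.bytes_per_hour / 3600;
    t.mbytes_per_second =
        static_cast<double>(t.bytes_per_second) / (1024.0 * 1024.0);
    t.mbits_per_second = t.mbytes_per_second * 8.0;
    return t;
}
```

With the slide's inputs, `tweet_throughput(512, 340000000)` gives 174,080,000,000 bytes per day, 7,253,333,333 bytes per hour and 2,014,814 bytes per second.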
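
To get a feel for the snappy speed figures (250 MB/s compression, 500 MB/s decompression on one core), one can estimate how long a single core would need for a given volume. The slides do not make this calculation themselves; the sketch below is only a back-of-the-envelope illustration, and `seconds_to_process` is a made-up helper that assumes 1 MB = 1,000,000 bytes and a constant throughput.

```cpp
#include <cstdint>

// Estimated wall-clock seconds to push `bytes` through a codec running at
// `mb_per_second`. Assumption: 1 MB = 1,000,000 bytes, constant throughput.
double seconds_to_process(std::uint64_t bytes, double mb_per_second) {
    return static_cast<double>(bytes) / (mb_per_second * 1e6);
}
```

Under these assumptions, the 8 Tbytes per day mentioned on the tweet slide (taken as 8,000,000,000,000 bytes) would take about 32,000 seconds to compress and about 16,000 seconds to decompress on one core.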
