Three cool innovations from Zarafa
Steve Hardy
48h feature run update: Twitter
•   Renamed to ‘twidget’
•   Will parse the email body to find
    Twitter references and show them in
    a widget
48h feature run: PDF preview
•   An existing PDF previewer library seems quite good
•   PDF previewer can work like JPG/PNG viewer
•   No screenshots yet
48h feature run: SugarCRM archive
•   Was already available for WebAccess
•   Send entire email to SugarCRM
•   Reuse of some WebAccess code
•   Looks promising
48h feature run: commandline permissions
   •      Delegates most interesting
   •      Python script
setpermissions.py [OPTIONS] mailbox

Options:
 Permission folders:
                --calendar [none|readonly|secretary|owner]         Set the permissions for the Calendar folder. Default <none>
                --tasks [none|readonly|secretary|owner]            Set the permissions for the Tasks folder. Default <none>
                --inbox [none|readonly|secretary|owner]            Set the permissions for the Inbox folder. Default <none>
                --contacts [none|readonly|secretary|owner]         Set the permissions for the Contacts folder. Default <none>
                --notes [none|readonly|secretary|owner]            Set the permissions for the Notes folder. Default <none>
                --journal [none|readonly|secretary|owner]          Set the permissions for the Journal folder. Default <none>
                --store [none|readonly|secretary|owner]            Set the permissions for the Store folder. Default <readonly>
 Delegate options:
                --seeprivate                                       Delegator can see private items
                --sendcopy                                         Delegator receives copies of the meeting-related messages sent to mailbox owner.
                --send-only-to-delegators                          Send meeting requests and response only to the delegator, not to the mailbox owner.

 Users or group:
                --users usernames                                  User which get the permissions for the given mailbox.
                --groups groupnames                                Group which get the permissions for the given mailbox.

Global options: [-h|--host path]
    -h path             Connect through <path>, e.g. file:///var/run/socket
    -V                Print version info.
    --help             Show this help text.
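
The option layout above could be mirrored with Python's standard argparse module. This is only a sketch of how such a command line might be parsed, not the actual setpermissions.py source: the function name build_parser and the decision to keep --users and --groups as plain strings are assumptions, and no permissions are applied.

```python
import argparse

# Permission levels and folder options as listed in the help text above.
LEVELS = ["none", "readonly", "secretary", "owner"]
FOLDERS = ["calendar", "tasks", "inbox", "contacts", "notes", "journal"]


def build_parser():
    # add_help=False because the help text uses -h for the host path.
    parser = argparse.ArgumentParser(prog="setpermissions.py", add_help=False)
    parser.add_argument("mailbox")

    # Folder permissions default to "none", except the store ("readonly").
    for folder in FOLDERS:
        parser.add_argument("--" + folder, choices=LEVELS, default="none")
    parser.add_argument("--store", choices=LEVELS, default="readonly")

    # Delegate options are simple on/off switches.
    parser.add_argument("--seeprivate", action="store_true")
    parser.add_argument("--sendcopy", action="store_true")
    parser.add_argument("--send-only-to-delegators", action="store_true")

    # Kept as raw strings: the separator for several names is not documented.
    parser.add_argument("--users", default="")
    parser.add_argument("--groups", default="")

    # Global options.
    parser.add_argument("-h", "--host", metavar="path")
    parser.add_argument("-V", action="version", version="sketch")
    parser.add_argument("--help", action="help")
    return parser
```

With this sketch, a call such as build_parser().parse_args(["--calendar", "secretary", "--seeprivate", "--users", "john", "boss"]) yields calendar set to "secretary" and store left at its "readonly" default; the user "john" and mailbox "boss" are made-up names.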
48h feature run: Sizes in GB, MB, KB
•   It’s done!

•   Will be in 7.1

•   Any byte size in any application can be specified as

     – “cache_cell_size = 16M” instead of
     – “cache_cell_size = 16777216”
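
A suffix like this is typically expanded with powers of 1024, which matches the slide's own numbers (16M = 16777216). The following is a minimal sketch of such a parser, not Zarafa's implementation; the function name parse_size and the acceptance of lower-case suffixes are assumptions.

```python
import re

# Binary multipliers: 16M must equal 16777216, as on the slide.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a value such as '16M' or '16777216' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError("not a valid size: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


print(parse_size("16M"))       # 16777216
print(parse_size("16777216"))  # 16777216
```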
48h feature run: Exchange / Zarafa replication
•   Pros
    – ICS is the same (synchronization mechanism)
    – Generic synchronization should work
•   Cons
    – Address conversion needed in many places
    – ID conversion needed in some cases
48h feature run: dropbox
•   BONUS FEATURE

•   Apparently someone made some kind of dropbox integration

•   Actual details are a surprise
zarafa-search
• Performance of zarafa-indexer could be improved
• CLucene storage removed
• Switched to kyotocabinet-based storage system
• Renamed from zarafa-indexer to zarafa-search
• Included in Zarafa 7.1
[Diagram (stacked boxes): Lucene text analysis, Lucene text indexer, Lucene database file]
Why not CLucene?
•   One-size-does-not-fit-all
     – CLucene optimized for web-like searches
     – Top-N results shown, others ignored
•   MAPI requires
     – All results
     – Often many results
•   I/O access patterns not optimized for MAPI-like searches
I/O solution: re-engineer it yourself
•   Generic database
     –   Kyoto cabinet
     –   Key/value database
     –   High performance
     –   Compression
     –   Low overhead
     –   Crash-safe
•   Lucene still used for text analysis
[Diagram (stacked boxes): Lucene text analysis, Zarafa-search text indexer, Kyoto cabinet database file]
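
To illustrate the idea of storing a search index in a key/value database, here is a small sketch. It is not the zarafa-search code: a plain Python dict stands in for the Kyoto cabinet file, a regular expression stands in for Lucene text analysis, and the class name TermIndex and the comma-separated, zlib-compressed value format are all assumptions made for the example.

```python
import re
import zlib


def tokenize(text):
    # Stand-in for the text analysis step (done by Lucene in zarafa-search).
    return re.findall(r"\w+", text.lower())


class TermIndex:
    """Term -> document IDs, kept as compressed values in a key/value store."""

    def __init__(self):
        self.db = {}  # stand-in for the key/value database file

    def _load(self, term):
        raw = self.db.get(term.encode("utf-8"))
        if raw is None:
            return set()
        return {int(x) for x in zlib.decompress(raw).decode("ascii").split(",")}

    def add(self, doc_id, text):
        for term in set(tokenize(text)):
            ids = self._load(term)
            ids.add(doc_id)
            payload = ",".join(str(i) for i in sorted(ids)).encode("ascii")
            self.db[term.encode("utf-8")] = zlib.compress(payload)

    def search(self, term):
        # One key lookup returns every matching document, not only a top-N.
        return sorted(self._load(term.lower()))


index = TermIndex()
index.add(1, "Meeting about budget")
index.add(2, "Budget review")
print(index.search("budget"))  # [1, 2]
```

The point of the sketch is the access pattern: a search for one term is a single key read that returns the complete result list, which is what a MAPI-style search needs.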
Improved search protocol
•   Search is local to server
•   Multiple servers using single indexer no longer possible
•   Document IDs are local
•   No more mapping needed between document ID and database ID
    – Saves IOPS during search




[Diagram: node1 and node2 each run their own zarafa-server paired with a local zarafa-search]
IOPS comparison
IOPS                     Old     New
Search with 10 hits      11      1
Search with 100 hits     101     5
Search with 10000 hits   10001   5
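
The old figures in the table fit a simple pattern: one I/O operation per hit plus one more, which would be consistent with a per-hit mapping lookup. That reading is an interpretation of the numbers, not something the slides state. A short sketch that checks the pattern and works out the reduction factors (the names TABLE, old_iops_model and reduction are made up for the example):

```python
# (hits, old IOPS, new IOPS) copied from the table above.
TABLE = [(10, 11, 1), (100, 101, 5), (10000, 10001, 5)]


def old_iops_model(hits):
    # Assumed model: one operation per hit, plus one.
    return hits + 1


def reduction(old, new):
    # How many times fewer I/O operations the new method needs.
    return round(old / new, 1)


for hits, old, new in TABLE:
    assert old == old_iops_model(hits)
    print("%6d hits: %6d -> %d IOPS, %.1fx fewer" % (hits, old, new, reduction(old, new)))
```

The reduction grows with the number of hits: 11x for 10 hits, about 20x for 100 hits and about 2000x for 10000 hits.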
Indexing many users

Old method
- Scan all users
- Scan all folders
- Update index
- Latency: > 30 minutes

New method
- Wait for signal from zarafa-server
- Index only items received
- Latency: < 10 seconds
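
The difference between the two methods is polling versus reacting to events. A minimal sketch of the event-driven side follows; the real signalling between zarafa-server and zarafa-search is not described in the slides, so the in-process queue, the index_worker function and the sample messages are assumptions used purely for illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to finish


def index_worker(changes, index):
    """Block until a change arrives, then index only that item."""
    while True:
        item = changes.get()  # waits for a signal; no scanning of users or folders
        if item is STOP:
            break
        doc_id, text = item
        index[doc_id] = text.lower().split()


changes = queue.Queue()
index = {}
worker = threading.Thread(target=index_worker, args=(changes, index))
worker.start()

# The server side would push each newly received item as it arrives.
changes.put((101, "Quarterly report attached"))
changes.put((102, "Lunch on Friday"))
changes.put(STOP)
worker.join()

print(sorted(index))  # [101, 102]
```

Because the worker only touches items that were announced to it, the work per new message stays constant no matter how many users and folders exist, which is where the latency gain comes from.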
Upgrade path
•   Old indexes are not compatible with new indexes
•   New indexes must be generated
•   Just start zarafa-search and let it index your data
Other indexer optimizations
•   Zarafa-server now optimized for streaming output
    –   Uses MySQL stored procedures
    –   Parallel processing between MySQL and zarafa-server
    –   Increases message throughput
    –   > 150 messages (emails) per second
•   New protocol feature: streaming de-stub
    – For indexing archived messages
    – Streams data from archive server just like ‘normal’ indexing
    – Only possible in zarafa-server 7.1 (will fail if your archive server is
      running 7.0)
•   Compressed indexes
    – Indexes are now about half the size of the previous Lucene indexes
Zarafa-import
•   Uses libpff (open source library)
•   Able to read .PST files from the Linux command line
•   Currently imports entire content into MAPI store
•   Supports both non-Unicode (97-2002) and Unicode (2003) PSTs

Things it does not do:
- Email address conversion
- Folder mapping

Future nice-to-have:
- Import directly into archive

Zarafa SummerCamp 2012 - Keynote Steve Hardy - 3 Cool innovations
