Apache POI Recipes
Paolo Mottadelli - ApacheCon Oakland 2009




Apache POI Recipes, presented at ApacheCon US 2009 in Oakland, gives a general description of the Apache POI project and describes three use cases where POI functionality is used in real applications.


  1. Apache POI Recipes. Paolo Mottadelli - ApacheCon Oakland 2009. http://chromasia.com. Thursday, November 5, 2009
  2. paolo@apache.org my to-do list - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  3. POI @ Content Tech ✴ Document to application (and back) ✴ Publish data ✴ Build a doc from your content ✴ Know your documents ✴ Extract text ✴ Extract content
  4. Section 1: A-B-C
  5. POI modules (1): OLE2 ✴ POIFS: reading/writing Office Documents ✴ HSSF: r/w Excel Spreadsheets ✴ HWPF: r/w Word Docs ✴ HSLF: r/w PowerPoint Docs ✴ HPSF: r/w property sets
  6. POI modules (2): OOXML ✴ XSSF: r/w OOXML Excel ✴ XWPF: r/w OOXML Word ✴ XSLF: r/w OOXML PowerPoint
  7. POI 3.5 http://chromasia.com
  8. OOXML dev status ✴ XSSF: Final in POI-3.5 ✴ XWPF: Draft (basic features) ✴ XSLF: Not covered (only text extraction)
  9. HSSF & XSSF ✴ Common user model interface ✴ User model based on existing HSSF ✴ Using OpenXML4J and SAX
  10. Section 2: Same recipe, different flavours
  11. Common H/XSSF access ✴ org.apache.poi.ss.usermodel
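Slide 11 names the shared package `org.apache.poi.ss.usermodel`. A minimal sketch of what that common user model allows, assuming POI 3.5 or later with the poi and poi-ooxml jars on the classpath: one method written against the `Workbook` interface runs unchanged on both the .xls (HSSF) and .xlsx (XSSF) implementations. The class and method names are illustrative, not from the talk.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical class name; only the POI types are real.
class CommonUserModelSketch {

    // Written once against the ss.usermodel interfaces, independent of file format.
    static String fillAndRead(Workbook wb) {
        Sheet sheet = wb.createSheet("Data");
        Row row = sheet.createRow(0);
        Cell cell = row.createCell(0);
        cell.setCellValue("hello");
        return sheet.getRow(0).getCell(0).getStringCellValue();
    }

    public static void main(String[] args) {
        // Same recipe, two flavours: binary .xls and OOXML .xlsx.
        System.out.println(fillAndRead(new HSSFWorkbook()));
        System.out.println(fillAndRead(new XSSFWorkbook()));
    }
}
```

Only the constructor call decides the file format; everything after it is format-neutral.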
  12. Upgrading to POI-3.5 ✴ HSSFFormulaEvaluator.CellValue: convert from .hssf. to .ss. ✴ HSSFRow.MissingCellPolicy: convert from .hssf. to .ss. ✴ RecordFormatException in DDF (Dreadful Drawing Format): convert from .hssf. to .util.
  13. Section 3: Meet Office Open XML
  14. Open XML made (very) simple ✴ XML based ✴ WordprocessingML ✴ SpreadsheetML ✴ PresentationML ✴ Stored as a package ✴ Open Packaging Conventions
  15. Package concepts ✴ Package (the container) ✴ Part (xml file) ✴ Relationship ✴ package-relationship ✴ part-relationship
  16. Expanded package, Excel
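The image on slide 16 is not in the transcript. The idea behind it can be sketched with the JDK alone: an OOXML package is a ZIP archive, and each entry is a part. The sketch below builds a toy archive in memory and lists its entries. It uses `java.util.zip` rather than openxml4j, the placeholder part contents are not valid OOXML, and the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class name; uses only java.util.zip, not POI.
class PackageListingSketch {

    // Builds a toy ZIP whose entry names mimic the layout of an .xlsx package.
    static byte[] buildToyPackage(String... partNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : partNames) {
                zip.putNextEntry(new ZipEntry(name));
                // Placeholder content only; a real part holds schema-valid XML.
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Lists entry names in archive order, i.e. the "expanded" view of the package.
    static List<String> listParts(byte[] packageBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] pkg = buildToyPackage(
                "[Content_Types].xml",          // content types of all parts
                "_rels/.rels",                  // package-level relationships
                "xl/workbook.xml",              // the workbook part
                "xl/_rels/workbook.xml.rels",   // part-level relationships
                "xl/worksheets/sheet1.xml");    // one worksheet part
        for (String name : listParts(pkg)) {
            System.out.println(name);
        }
    }
}
```

Running `listParts` on the bytes of a real .xlsx file shows the same kind of layout, with additional parts such as styles and shared strings.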
  17. WordprocessingML ✴ body ✴ paragraphs ✴ runs ✴ properties (for runs and paragraphs) ✴ styles ✴ headers/footers ...
  18. SpreadsheetML ✴ workbook ✴ worksheets ✴ rows ✴ cells ✴ styles ✴ formulas ✴ images ...
  19. PresentationML ✴ presentation ✴ slides ✴ slides-masters ✴ notes-masters ✴ layout, animation, audio, video, transitions ...
  20. Section 4: openxml4j
  21. openXML4J ✴ Package, parts, rels "/xl/worksheets/sheet1.xml"
  22. Section 5: Text Extraction
  23. Extractors ✴ POITextExtractor (getText()) ✴ POIOLE2TextExtractor ✴ POIXMLTextExtractor ✴ XSSFExcelExtractor ✴ XWPFWordExtractor ✴ XSLFPowerPointExtractor ✴ If text is all you need
  24. Text extraction ✴ made simple
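The code shown on slide 24 is not in the transcript. As a hedged illustration of the extractor classes listed on slide 23, the sketch below builds a small workbook in memory and pulls its text out with `XSSFExcelExtractor.getText()`. It assumes poi-ooxml is on the classpath; the class name, sheet name, and cell content are made up for the example.

```java
import org.apache.poi.xssf.extractor.XSSFExcelExtractor;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical class name; the extractor and workbook types are POI's.
class TextExtractionSketch {

    static String extractText() {
        // Build a one-cell workbook so the example needs no input file.
        XSSFWorkbook wb = new XSSFWorkbook();
        wb.createSheet("Recipes").createRow(0).createCell(0).setCellValue("Risotto");

        // Every POITextExtractor subclass exposes getText().
        XSSFExcelExtractor extractor = new XSSFExcelExtractor(wb);
        return extractor.getText();
    }

    public static void main(String[] args) {
        System.out.println(extractText());
    }
}
```

The other extractors on slide 23 are used the same way: construct one around the document and call `getText()`.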
  25. Section 6: EXCEL Simple Tasks
  26. New Workbook
  27. New Sheet
  28. Creating Cells
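Slides 26 to 28 showed code that the transcript does not preserve. A minimal sketch of the three steps (new workbook, new sheet, creating cells), assuming POI 3.5 or later with poi-ooxml available. It writes the workbook to a byte array and reads it back through `WorkbookFactory`; the class name, method names, and cell values are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical class name; not the code from the slides.
class NewWorkbookSketch {

    static byte[] buildWorkbook() {
        Workbook wb = new XSSFWorkbook();          // new workbook (.xlsx flavour)
        Sheet sheet = wb.createSheet("Menu");      // new sheet
        Row header = sheet.createRow(0);           // row and column indexes are 0-based
        header.createCell(0).setCellValue("Dish");
        header.createCell(1).setCellValue("Price");
        Row first = sheet.createRow(1);
        first.createCell(0).setCellValue("Risotto");
        first.createCell(1).setCellValue(12.5);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            wb.write(out);                         // a FileOutputStream works the same way
            return out.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Reads one string cell back; WorkbookFactory detects .xls versus .xlsx itself.
    static String readStringCell(byte[] workbookBytes, int rowIndex, int columnIndex) {
        try {
            Workbook wb = WorkbookFactory.create(new ByteArrayInputStream(workbookBytes));
            return wb.getSheetAt(0).getRow(rowIndex).getCell(columnIndex).getStringCellValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readStringCell(buildWorkbook(), 1, 0));
    }
}
```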
  29. Cell types
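The code on slide 29 is likewise missing. As a sketch of the idea, a cell takes its type from the value stored in it: numeric, string, boolean, or formula. The example assumes poi-ooxml on the classpath and avoids `getCellType()` on purpose, because its return type differs between POI releases. Names and values are illustrative.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical class name; not the code from the slide.
class CellTypesSketch {

    static Row buildRow() {
        Workbook wb = new XSSFWorkbook();
        Row row = wb.createSheet("Types").createRow(0);
        row.createCell(0).setCellValue(1.5);       // numeric cell
        row.createCell(1).setCellValue("text");    // string cell
        row.createCell(2).setCellValue(true);      // boolean cell
        row.createCell(3).setCellFormula("A1*2");  // formula cell, written without the leading '='
        return row;
    }

    public static void main(String[] args) {
        Row row = buildRow();
        // Each typed getter matches the kind of value that was stored.
        System.out.println(row.getCell(0).getNumericCellValue());
        System.out.println(row.getCell(1).getStringCellValue());
        System.out.println(row.getCell(2).getBooleanCellValue());
        System.out.println(row.getCell(3).getCellFormula());
    }
}
```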
  30. Fills and colors
  31. Section 7: EXCEL Imp/Exp to XML
  32. Export to XML
  33. xmlMaps.xml
  34. XML Import/Export
  35. Section 8: WORD Simple Doc
  36. A simple doc
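The document code on slide 36 is not in the transcript. A minimal sketch of building a simple .docx with XWPF, following the body, paragraph, and run structure from slide 17. It assumes a POI release in which `XWPFDocument.getParagraphs()` returns a `List` (XWPF was still a draft in 3.5, so the API may differ there). The class name and text are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical class name; not the code from the slide.
class SimpleDocSketch {

    static XWPFDocument buildDocument() {
        XWPFDocument doc = new XWPFDocument();

        // A paragraph holds runs; a run carries text plus its formatting.
        XWPFParagraph title = doc.createParagraph();
        XWPFRun titleRun = title.createRun();
        titleRun.setBold(true);
        titleRun.setText("Apache POI Recipes");

        XWPFParagraph body = doc.createParagraph();
        body.createRun().setText("A document built from code.");
        return doc;
    }

    // Serializes the document; the result is an OOXML (ZIP) package.
    static byte[] toBytes(XWPFDocument doc) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        XWPFDocument doc = buildDocument();
        System.out.println(doc.getParagraphs().get(0).getText());
        System.out.println(toBytes(doc).length + " bytes written");
    }
}
```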
  38. Section 9: Use Case 1, Alfresco Search
  39. Use Case ✴ Upload a document ✴ Detect document mimetype ✴ Extract text and metadata ✴ Create search index ✴ Search (and find) the document
  40. Without Tika ✴ Detect the document mimetype ✴ (source/target mimetype) ✴ Get the proper ContentTransformer ✴ (ContentTransformerRegistry) ✴ Transform Doc Content to Text ✴ (PoiHssfContentTransformer: POI here) ✴ Create Lucene index
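The first step on slide 40 is detecting the document type. The sketch below is not Alfresco's mechanism (the slide only mentions source/target mimetypes and a transformer registry); it illustrates one way to tell the two container families apart by their leading bytes, using the JDK only. OLE2 files start with the signature `D0 CF 11 E0 A1 B1 1A E1`, and ZIP archives, which include OOXML packages, start with `PK\x03\x04`. The class name and return labels are made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of POI, Tika, or Alfresco.
class ContainerSniffSketch {

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    // Returns "ole2", "zip", or "unknown" based on the first bytes of a file.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "ole2";   // .doc, .xls, .ppt: candidates for HWPF, HSSF, HSLF
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return "zip";    // possibly OOXML; any ZIP archive matches, so the parts still need checking
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHead = "PK\u0003\u0004".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(zipHead));
    }
}
```

A result of "zip" is only a hint: telling Word, Excel, and PowerPoint packages apart requires looking at the parts inside.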
  41. With Tika
  42. Extension use case ✴ Adding support for Office Open XML documents (Office 2007+) ✴ Word 2007+ ✴ Excel 2007+ ✴ PowerPoint 2007+
  43. POI text extractors ✴ Remember?
  44. Apache Tika (Excel)
  45. Apache Tika
  46. Apache Tika (Word)
  47. Apache Tika (Word)
  48. Section 10: Use Case 2, JM Lafferty Financial Forecasting
  49. Make your wb look pro- ✴ Rich text ✴ Graphics ✴ Formulas & Named Ranges ✴ Data validations ✴ Conditional formatting ✴ Cell comments
  52. Formula evaluation ✴ The evaluation engine enables you to calculate formula results from within a POI application ✴ Formulas may be added to your workbook by POI ✴ Evaluation is available for .xls and .xlsx
  53. Formula evaluation (continued) ✴ All arithmetic operators are implemented ✴ Over 280 Excel built-in functions are supported
  54. Formula evaluation (code)
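The code on slide 54 is not preserved in the transcript. A minimal sketch of formula evaluation through the common user model, assuming POI 3.5 or later with poi-ooxml available: a formula is added to the workbook by POI, then evaluated with a `FormulaEvaluator` obtained from the workbook's `CreationHelper`. The class name, sheet name, and values are illustrative.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellValue;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical class name; not the code from the slide.
class FormulaEvaluationSketch {

    static double evaluateSum() {
        Workbook wb = new XSSFWorkbook();
        Row row = wb.createSheet("Forecast").createRow(0);
        row.createCell(0).setCellValue(2);   // A1
        row.createCell(1).setCellValue(3);   // B1
        Cell total = row.createCell(2);      // C1
        total.setCellFormula("SUM(A1:B1)");

        // The evaluator computes the result in memory; the cell keeps its formula.
        FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
        CellValue value = evaluator.evaluate(total);
        return value.getNumberValue();
    }

    public static void main(String[] args) {
        System.out.println(evaluateSum());   // prints 5.0
    }
}
```

Replacing `XSSFWorkbook` with `HSSFWorkbook` evaluates the same formula for an .xls workbook.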
  55. Section 11: Use Case 3: CQ5 Import
  58. importDocument()
  59. getParagraphs(...) ✴ Makes use of ✴ org.apache.poi.hwpf.usermodel.Range
  60. importDocument()
  61. getTitle(...) ✴ Gets the first paragraph’s text
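The bodies of `getParagraphs(...)` and `getTitle(...)` are not in the transcript, so the following is a guess at their shape, not the CQ5 code. It assumes the poi-scratchpad jar (which contains HWPF) is on the classpath; the signatures, the blank-paragraph filtering, and the class name are assumptions. Only the use of `org.apache.poi.hwpf.usermodel.Range` and "first paragraph's text" come from the slides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical class; method names echo the slides, signatures are assumed.
class WordImportSketch {

    // Collects the non-blank paragraph texts of a binary .doc file.
    static List<String> getParagraphs(InputStream docStream) throws IOException {
        HWPFDocument doc = new HWPFDocument(docStream);
        Range range = doc.getRange();          // the whole document body
        List<String> paragraphs = new ArrayList<>();
        for (int i = 0; i < range.numParagraphs(); i++) {
            // trim() drops the paragraph mark that the raw text may end with
            String text = range.getParagraph(i).text().trim();
            if (!text.isEmpty()) {
                paragraphs.add(text);
            }
        }
        return paragraphs;
    }

    // Takes the first paragraph's text as the title, or "" for an empty document.
    static String getTitle(List<String> paragraphs) {
        return paragraphs.isEmpty() ? "" : paragraphs.get(0);
    }
}
```

A caller would pass a `FileInputStream` for the .doc file to `getParagraphs` and hand the result to `getTitle`.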
  62. importDocument()
  66. Section 12: Want more?
  67. More Examples ✴ http://poi.apache.org/spreadsheet/examples.html
  68. Even more ✴ Get in touch ✴ http://poi.apache.org/ ✴ Get informed ✴ dev@poi.apache.org ✴ Get involved ✴ http://svn.apache.org/repos/asf/poi/trunk/
  69. ✴ Get slides ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes Thanks