Apache POI
           Recipes
           Paolo Mottadelli - ApacheCon Oakland 2009




  http://chromasia.com
Thursday, November 5, 2009
paolo@apache.org



   my to-do list




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -




   POI @ Content Tech
      ✴ Document to application (and back)
               ✴ Publish data

               ✴ Build a doc from your content

      ✴ Know your documents
               ✴ Extract text

               ✴ Extract content



                             1
                             A-B-C




   POI modules (1): OLE2
      ✴ POIFS: reading/writing Office
               Documents
      ✴ HSSF r/w Excel Spreadsheets
      ✴ HWPF r/w Word Docs
      ✴ HSLF r/w PowerPoint Docs
      ✴ HPSF r/w property sets





   POI modules (2): OOXML
      ✴ XSSF: r/w OOXML Excel
      ✴ XWPF: r/w OOXML Word
      ✴ XSLF: r/w OOXML PowerPoint




POI 3.5




   OOXML dev status
      ✴ XSSF: Final in POI-3.5
      ✴ XWPF: Draft (basic features)
      ✴ XSLF: Not covered (text extraction only)








   HSSF & XSSF
      ✴ Common user model interface
      ✴ User model based on existing HSSF
      ✴ Using OpenXML4J and SAX




                             2
                             Same recipe,
                             different flavours




   Common H/XSSF access
      ✴ org.apache.poi.ss.usermodel
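
A minimal sketch of what the common user model buys you: one method written only against `org.apache.poi.ss.usermodel` interfaces, fed first an `HSSFWorkbook` and then an `XSSFWorkbook`. The class name `CommonUserModel`, the method `buildReport` and the sheet contents are illustrative, not taken from the talk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class CommonUserModel {

    // Uses only the shared interfaces, so the same code serves .xls and .xlsx.
    static byte[] buildReport(Workbook wb) throws IOException {
        Sheet sheet = wb.createSheet("Report");
        Row row = sheet.createRow(0);
        row.createCell(0).setCellValue("Total");
        row.createCell(1).setCellValue(42.0);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        wb.write(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xls = buildReport(new HSSFWorkbook());   // OLE2 binary format
        byte[] xlsx = buildReport(new XSSFWorkbook());  // OOXML zip package
        System.out.println(xls.length + " bytes of .xls, " + xlsx.length + " bytes of .xlsx");
    }
}
```

Only the constructor call decides the file format; everything after it is format-neutral.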








   Upgrading to POI-3.5
      ✴ HSSFFormulaEvaluator.CellValue
               ✴ convert from .hssf. to .ss.

      ✴ HSSFRow.MissingCellPolicy
               ✴ convert from .hssf. to .ss.

      ✴ RecordFormatException in DDF
               ✴ convert from .hssf. to .util.                           (DDF: Dreadful Drawing Format)


                             3
                             Meet
                             Office Open XML



   Open XML made (very) simple
      ✴ XML based
               ✴ WordprocessingML

               ✴ SpreadsheetML

               ✴ PresentationML

      ✴ Stored as a package
               ✴ Open Packaging Conventions







   Package concepts
      ✴ Package (the container)
      ✴ Part (xml file)
      ✴ Relationship
               ✴ package-relationship

               ✴ part-relationship








   Expanded package, Excel
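
An OOXML file is an ordinary zip archive, so the expanded package can be inspected without POI at all. The sketch below uses only the JDK; the class name `ExpandedPackage` and the command-line argument are illustrative. For an .xlsx file the entries typically include `[Content_Types].xml`, `_rels/.rels`, `xl/workbook.xml` and `xl/worksheets/sheet1.xml`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ExpandedPackage {

    // Returns the name of every zip entry: parts, relationship files, content types.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to any .xlsx file (hypothetical input)
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```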








   WordprocessingML
      ✴ body
               ✴ paragraphs
                      ✴ runs


      ✴ properties (for runs and pars)
      ✴ styles
      ✴ headers/footers ...





   SpreadsheetML
      ✴ workbook
               ✴ worksheets
                      ✴ rows

                             ✴ cells



      ✴ styles
      ✴ formulas
      ✴ images ...




   PresentationML
      ✴ presentation
               ✴ slides

               ✴ slides-masters

               ✴ notes-masters

      ✴ layout, animation, audio, video,
               transitions ...

                             4
                             openxml4j




   openXML4J
      ✴ Package, parts, rels


                                                                          "/xl/worksheets/sheet1.xml"

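
A sketch of walking a package with openxml4j, assuming a POI release in which the package class is named `OPCPackage` (`org.apache.poi.openxml4j.opc`). It builds a one-sheet workbook in memory, reopens it as a package and lists each part name with its content type. `PackageParts` and `describeParts` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class PackageParts {

    // One line per part: "<part name> -> <content type>".
    static List<String> describeParts(OPCPackage pkg) throws Exception {
        List<String> lines = new ArrayList<>();
        for (PackagePart part : pkg.getParts()) {
            lines.add(part.getPartName().getName() + " -> " + part.getContentType());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Build a workbook in memory so the example needs no input file.
        XSSFWorkbook wb = new XSSFWorkbook();
        wb.createSheet("Sheet1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        wb.write(out);

        OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(out.toByteArray()));
        for (String line : describeParts(pkg)) {
            System.out.println(line);
        }
        pkg.revert(); // close the package without saving anything
    }
}
```

Part names are absolute within the package, which is why the worksheet shows up as `/xl/worksheets/sheet1.xml`.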



                             5
                             Text Extraction




   Extractors
      ✴ POITextExtractor
               ✴ POIOLE2TextExtractor
                                                                    getText()
               ✴ POIXMLTextExtractor
                      ✴ XSSFExcelExtractor

                      ✴ XWPFWordExtractor

                      ✴ XSLFPowerPointExtractor


      ✴ If text is all you need





   Text extraction
      ✴ made simple
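
A sketch of how simple extraction can be, with one extractor from each family: `ExcelExtractor` (OLE2, .xls) and `XSSFExcelExtractor` (OOXML, .xlsx). Both workbooks are built in memory to keep the example self-contained; the class `TextExtraction` and the cell contents are illustrative.

```java
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.xssf.extractor.XSSFExcelExtractor;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class TextExtraction {

    // OLE2 side: ExcelExtractor reads binary .xls workbooks.
    static String xlsText() {
        HSSFWorkbook wb = new HSSFWorkbook();
        wb.createSheet("Notes").createRow(0).createCell(0).setCellValue("hello from xls");
        return new ExcelExtractor(wb).getText();
    }

    // OOXML side: XSSFExcelExtractor reads .xlsx workbooks.
    static String xlsxText() {
        XSSFWorkbook wb = new XSSFWorkbook();
        wb.createSheet("Notes").createRow(0).createCell(0).setCellValue("hello from xlsx");
        return new XSSFExcelExtractor(wb).getText();
    }

    public static void main(String[] args) {
        System.out.println(xlsText());
        System.out.println(xlsxText());
    }
}
```

In both cases the whole job is a constructor call followed by `getText()`.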




                             6
                             EXCEL
                             Simple Tasks




   New Workbook
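
The code shown on this slide is not preserved in the transcript. A minimal sketch of the recipe: pick `HSSFWorkbook` or `XSSFWorkbook`, then write it to a stream. The class `NewWorkbook`, its `create` helper and the output file name are illustrative.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class NewWorkbook {

    // true -> .xlsx (XSSF), false -> .xls (HSSF)
    static Workbook create(boolean ooxml) {
        if (ooxml) {
            return new XSSFWorkbook();
        }
        return new HSSFWorkbook();
    }

    public static void main(String[] args) throws IOException {
        Workbook wb = create(true);
        wb.createSheet("Sheet1"); // Excel expects at least one sheet in a file
        try (OutputStream out = new FileOutputStream("workbook.xlsx")) {
            wb.write(out);
        }
    }
}
```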








   New Sheet
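
A sketch of adding sheets, since the original slide code is not in the transcript. Sheet names here are made up. Excel limits a sheet name to 31 characters and forbids a few characters such as `/`, `\`, `?`, `*`, `[` and `]`, and POI rejects a duplicate name with an `IllegalArgumentException`.

```java
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class NewSheet {

    // Creates a workbook with one sheet per name, in the given order.
    static Workbook withSheets(String... names) {
        Workbook wb = new XSSFWorkbook();
        for (String name : names) {
            wb.createSheet(name);
        }
        return wb;
    }

    public static void main(String[] args) {
        Workbook wb = withSheets("Revenue", "Costs");
        Sheet costs = wb.getSheet("Costs"); // look a sheet up by name
        System.out.println(wb.getNumberOfSheets() + " sheets, Costs at index "
                + wb.getSheetIndex(costs));
    }
}
```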








   Creating Cells
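
A sketch of the cell-creation recipe (the slide's own code is not in the transcript). Cells hang off rows, rows hang off sheets, and both indexes are 0-based. `CreatingCells` and the sample values are illustrative.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class CreatingCells {

    static Sheet build(Workbook wb) {
        Sheet sheet = wb.createSheet("Cells");
        Row row = sheet.createRow(0);            // first row
        row.createCell(0).setCellValue("Item");  // A1: string
        row.createCell(1).setCellValue(3.5);     // B1: number
        row.createCell(2).setCellValue(true);    // C1: boolean
        return sheet;
    }

    public static void main(String[] args) {
        Sheet sheet = build(new XSSFWorkbook());
        Row row = sheet.getRow(0);
        System.out.println(row.getCell(0).getStringCellValue()
                + " / " + row.getCell(1).getNumericCellValue()
                + " / " + row.getCell(2).getBooleanCellValue());
    }
}
```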








   Cell types
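
A sketch of branching on the cell type. It assumes POI 4.0 or later, where `Cell.getCellType()` returns the `CellType` enum; the POI 3.5 API of this talk exposed the type as int constants such as `Cell.CELL_TYPE_STRING`. `CellTypes`, `describe` and `sampleRow` are illustrative names.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class CellTypes {

    // Reads a cell with the getter that matches its type.
    static String describe(Cell cell) {
        switch (cell.getCellType()) {
            case STRING:  return "string: " + cell.getStringCellValue();
            case NUMERIC: return "number: " + cell.getNumericCellValue();
            case BOOLEAN: return "boolean: " + cell.getBooleanCellValue();
            case FORMULA: return "formula: " + cell.getCellFormula();
            case BLANK:   return "blank";
            default:      return "other";
        }
    }

    static Row sampleRow(Workbook wb) {
        Row row = wb.createSheet("Types").createRow(0);
        row.createCell(0).setCellValue("text");
        row.createCell(1).setCellValue(3.5);
        row.createCell(2).setCellValue(true);
        row.createCell(3).setCellFormula("B1*2");
        row.createCell(4); // never given a value, so it stays blank
        return row;
    }

    public static void main(String[] args) {
        for (Cell cell : sampleRow(new XSSFWorkbook())) {
            System.out.println(describe(cell));
        }
    }
}
```

Dates do not have a type of their own: they are numeric cells with a date format applied.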








   Fills and colors
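
A sketch of a filled, colored cell. It assumes POI 4.0 or later, where the fill pattern is the `FillPatternType` enum; in the POI 3.5 API the pattern was a short constant on `CellStyle`. The class name, colors and text are illustrative.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.FillPatternType;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class FillsAndColors {

    static Cell highlightedCell(Workbook wb) {
        // Styles and fonts belong to the workbook and can be shared by many cells.
        CellStyle style = wb.createCellStyle();
        style.setFillForegroundColor(IndexedColors.YELLOW.getIndex());
        style.setFillPattern(FillPatternType.SOLID_FOREGROUND);

        Font font = wb.createFont();
        font.setColor(IndexedColors.RED.getIndex());
        style.setFont(font);

        Cell cell = wb.createSheet("Colors").createRow(0).createCell(0);
        cell.setCellValue("Warning");
        cell.setCellStyle(style);
        return cell;
    }

    public static void main(String[] args) {
        Cell cell = highlightedCell(new XSSFWorkbook());
        System.out.println("fill pattern: " + cell.getCellStyle().getFillPattern());
    }
}
```

For a solid fill, the visible color is the fill foreground color, not the background color.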




                             7
                             EXCEL
                             Imp/Exp to XML




   Export to XML
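
A sketch of exporting mapped data, assuming the workbook already carries custom XML mappings (stored in the package as `xl/xmlMaps.xml`). It uses `XSSFExportToXml` once per mapping; the class `ExportToXml`, the helper `exportAll` and the input path are illustrative, and a workbook without mappings simply yields no documents.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xssf.extractor.XSSFExportToXml;
import org.apache.poi.xssf.usermodel.XSSFMap;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class ExportToXml {

    // Returns one XML document per custom XML mapping in the workbook.
    static List<String> exportAll(XSSFWorkbook wb) throws Exception {
        List<String> documents = new ArrayList<>();
        for (XSSFMap map : wb.getCustomXMLMappings()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // true = validate the generated XML against the mapping's schema
            new XSSFExportToXml(map).exportToXML(out, true);
            documents.add(out.toString("UTF-8"));
        }
        return documents;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to an .xlsx file that contains XML mappings (hypothetical input)
        try (InputStream in = new FileInputStream(args[0])) {
            for (String xml : exportAll(new XSSFWorkbook(in))) {
                System.out.println(xml);
            }
        }
    }
}
```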








   xmlMaps.xml








   XML Import/Export




                             8
                             WORD
                             Simple Doc




   A simple doc
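
A sketch of a simple .docx built with XWPF, following the WordprocessingML structure of body, paragraphs and runs. The slide's own code is not in the transcript; the class `SimpleDoc`, the text and the output file name are illustrative.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

final class SimpleDoc {

    static XWPFDocument build() {
        XWPFDocument doc = new XWPFDocument();

        // First paragraph: a single bold run used as a title.
        XWPFParagraph title = doc.createParagraph();
        XWPFRun titleRun = title.createRun();
        titleRun.setBold(true);
        titleRun.setText("Apache POI Recipes");

        // Second paragraph: plain body text.
        XWPFParagraph body = doc.createParagraph();
        body.createRun().setText("A body holds paragraphs; a paragraph holds runs.");
        return doc;
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = new FileOutputStream("simple.docx")) {
            build().write(out);
        }
    }
}
```

Formatting such as bold is a property of the run, so mixed formatting inside one paragraph means several runs.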




                             9
                             Use Case 1
                             Alfresco Search




   Use Case
      ✴ Upload a document
      ✴ Detect document mimetype
      ✴ Extract text and metadata
      ✴ Create search index
      ✴ Search (and find) the document






   Without Tika
      ✴ Detect the document mimetype
               ✴ (source/target mimetype)

      ✴ Get the proper ContentTransformer
               ✴ (ContentTransformerRegistry)

      ✴ Transform Doc Content to Text
               ✴ (PoiHssfContentTransformer)                  POI here
      ✴ Create Lucene index




   With Tika
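
A sketch of the same detect-then-extract steps using the `org.apache.tika.Tika` facade. This is not Alfresco's integration code; it assumes tika-core plus the Tika parsers on the classpath for `parseToString`, and the class `WithTika` and the input path are illustrative.

```java
import java.io.File;

import org.apache.tika.Tika;

final class WithTika {

    private static final Tika TIKA = new Tika();

    // Name-based detection only; detect(File) also inspects the file's bytes.
    static String mimeTypeForName(String fileName) {
        return TIKA.detect(fileName);
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);                 // any document (hypothetical input)
        System.out.println(TIKA.detect(file));         // MIME type
        System.out.println(TIKA.parseToString(file));  // plain text, ready for indexing
    }
}
```

Mimetype detection, transformer lookup and text conversion collapse into two calls, and the output can go straight to the indexer.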








   Extension use case
      ✴ Adding support for Office Open
               XML documents (Office 2007+)
               ✴ Word 2007+

               ✴ Excel 2007+

               ✴ PowerPoint 2007+








   POI text extractors
      ✴ Remember?








   Apache Tika (Excel)








   Apache Tika








   Apache Tika (Word)








   Apache Tika (Word)




                             10
                             Use Case 2
                             JM Lafferty
                             Financial Forecasting




   Make your wb look pro-
      ✴ Rich text
      ✴ Graphics
      ✴ Formulas & Named Ranges
      ✴ Data validations
      ✴ Conditional formatting
      ✴ Cell comments




   Formula evaluation
      ✴ The evaluation engine enables you
               to calculate formula results from
               within a POI application
      ✴ Formulas may be added to your
               workbook by POI
      ✴ Evaluation is available for .xls
               and .xlsx




   Formula evaluation (continued)
      ✴ All arithmetic operators are
               implemented
      ✴ Over 280 Excel built-in functions
               are supported








   Formula evaluation (code)
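
The slide's code is not in the transcript. A sketch of the evaluation engine: add a formula with POI, then ask a `FormulaEvaluator` for its result. The same method works for .xls and .xlsx; the class `FormulaEvaluation` and the figures are illustrative.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellValue;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class FormulaEvaluation {

    static double evaluateTotal(Workbook wb) {
        Row row = wb.createSheet("Forecast").createRow(0);
        row.createCell(0).setCellValue(1200.0);  // A1
        row.createCell(1).setCellValue(300.0);   // B1
        Cell total = row.createCell(2);          // C1
        total.setCellFormula("SUM(A1:B1)");      // formula text, without the leading '='

        // evaluate() computes the result without changing the cell.
        FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
        CellValue value = evaluator.evaluate(total);
        return value.getNumberValue();
    }

    public static void main(String[] args) {
        System.out.println(evaluateTotal(new HSSFWorkbook())); // .xls
        System.out.println(evaluateTotal(new XSSFWorkbook())); // .xlsx
    }
}
```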




                             11
                             Use Case 3:
                             CQ5 Import




   importDocument()








   getParagraphs(...)
      ✴ Makes use of
               ✴ org.apache.poi.hwpf.usermodel.Range








   importDocument()








   getTitle(...)
      ✴ Gets the first paragraph’s text
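
The importer's real code is not in the transcript. The sketch below is one possible shape for the two helpers, using `org.apache.poi.hwpf.usermodel.Range` to read paragraphs from a binary .doc file; the signatures, the class name `WordImport` and the trimming of paragraph marks are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

final class WordImport {

    // Hypothetical signature: reads every paragraph of a .doc stream.
    static List<String> getParagraphs(InputStream docStream) throws IOException {
        HWPFDocument doc = new HWPFDocument(docStream);
        Range range = doc.getRange(); // the range covering the whole document
        List<String> paragraphs = new ArrayList<>();
        for (int i = 0; i < range.numParagraphs(); i++) {
            // text() ends with the paragraph mark, so trim it off
            paragraphs.add(range.getParagraph(i).text().trim());
        }
        return paragraphs;
    }

    // Hypothetical signature: the title is the first paragraph's text, or "" if there is none.
    static String getTitle(List<String> paragraphs) {
        return paragraphs.isEmpty() ? "" : paragraphs.get(0);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .doc file (hypothetical input)
        try (InputStream in = new FileInputStream(args[0])) {
            List<String> paragraphs = getParagraphs(in);
            System.out.println("Title: " + getTitle(paragraphs));
            System.out.println(paragraphs.size() + " paragraphs");
        }
    }
}
```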








   importDocument()




                             12
                             Want more?




   More Examples
      ✴ http://poi.apache.org/spreadsheet/examples.html








   Even more
      ✴ Get in touch
               ✴ http://poi.apache.org/

      ✴ Get informed
               ✴ dev@poi.apache.org

      ✴ Get involved
               ✴ http://svn.apache.org/repos/asf/poi/trunk/






      ✴ Get slides
               ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes




   Thanks



08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Apache POI Recipes for Excel, Word and PowerPoint Documents

  • 1. Apache POI Recipes Paolo Mottadelli - ApacheCon Oakland 2009 http://chromasia.com Thursday, November 5, 2009
  • 2. paolo@apache.org my to-do list - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 3. POI @ Content Tech ✴ Document to application (and back) ✴ Publish data ✴ Build a doc from your content ✴ Know your documents ✴ Extract text ✴ Extract content
  • 4. 1 A-B-C
  • 5. POI modules (1): OLE2 ✴ POIFS: reading/writing Office Documents ✴ HSSF: r/w Excel Spreadsheets ✴ HWPF: r/w Word Docs ✴ HSLF: r/w PowerPoint Docs ✴ HPSF: r/w property sets
  • 6. POI modules (2): OOXML ✴ XSSF: r/w OOXML Excel ✴ XWPF: r/w OOXML Word ✴ XSLF: r/w OOXML PowerPoint
  • 7. POI 3.5 http://chromasia.com
  • 8. OOXML dev status ✴ XSSF: Final in POI-3.5 ✴ XWPF: Draft (basic features) ✴ XSLF: Not covered (only text extraction)
  • 9. HSSF & XSSF ✴ Common user model interface ✴ User model based on existing HSSF ✴ Using OpenXML4J and SAX
  • 10. 2 Same recipe, different flavours
  • 11. Common H/XSSF access ✴ org.apache.poi.ss.usermodel
  • 12. Upgrading to POI-3.5 ✴ HSSFFormulaEvaluator.CellValue ✴ convert from .hssf. to .ss. ✴ HSSFRow.MissingCellPolicy ✴ convert from .hssf. to .ss. ✴ RecordFormatException in DDF (Dreadful Drawing Format) ✴ convert from .hssf. to .util.
  • 13. 3 Meet Office Open XML
  • 14. Open XML made (very) simple ✴ XML based ✴ WordprocessingML ✴ SpreadsheetML ✴ PresentationML ✴ Stored as a package ✴ Open Packaging Conventions
  • 15. Package concepts ✴ Package (the container) ✴ Part (xml file) ✴ Relationship ✴ package-relationship ✴ part-relationship
  • 16. Expanded package, Excel
  • 17. WordprocessingML ✴ body ✴ paragraphs ✴ runs ✴ properties (for runs and pars) ✴ styles ✴ headers/footers ...
  • 18. SpreadsheetML ✴ workbook ✴ worksheets ✴ rows ✴ cells ✴ styles ✴ formulas ✴ images ...
  • 19. PresentationML ✴ presentation ✴ slides ✴ slides-masters ✴ notes-masters ✴ layout, animation, audio, video, transitions ...
  • 20. 4 openxml4j
  • 21. openXML4J ✴ Package, parts, rels "/xl/worksheets/sheet1.xml"
  • 22. 5 Text Extraction
  • 23. Extractors ✴ POITextExtractor ✴ POIOLE2TextExtractor getText() ✴ POIXMLTextExtractor ✴ XSSFExcelExtractor ✴ XWPFWordExtractor ✴ XSLFPowerPointExtractor ✴ If text is all you need
  • 24. Text extraction ✴ made simple
  • 25. 6 EXCEL Simple Tasks
  • 26. New Workbook
  • 27. New Sheet
  • 28. Creating Cells
  • 29. Cell types
  • 30. Fills and colors
  • 31. 7 EXCEL Imp/Exp to XML
  • 32. Export to XML
  • 33. xmlMaps.xml
  • 34. XML Import/Export
  • 35. 8 WORD Simple Doc
  • 36. A simple doc
  • 38. 9 Use Case 1: Alfresco Search
  • 39. Use Case ✴ Upload a document ✴ Detect document mimetype ✴ Extract text and metadata ✴ Create search index ✴ Search (and find) the document
  • 40. Without Tika ✴ Detect the document mimetype ✴ (source/target mimetype) ✴ Get the proper ContentTransformer ✴ (ContentTransformerRegistry) ✴ Transform Doc Content to Text ✴ (PoiHssfContentTransformer) POI here ✴ Create Lucene index
  • 41. With Tika
  • 42. Extension use case ✴ Adding support for Office Open XML documents (Office 2007+) ✴ Word 2007+ ✴ Excel 2007+ ✴ PowerPoint 2007+
  • 43. POI text extractors ✴ Remember?
  • 44. Apache Tika (Excel)
  • 45. Apache Tika
  • 46. Apache Tika (Word)
  • 47. Apache Tika (Word)
  • 48. 10 Use Case 2: JM Lafferty Financial Forecasting
  • 49. Make your wb look pro- ✴ Rich text ✴ Graphics ✴ Formulas & Named Ranges ✴ Data validations ✴ Conditional formatting ✴ Cell comments
  • 52. Formula evaluation ✴ The evaluation engine enables you to calculate formula results from within a POI application ✴ Formulas may be added to your workbook by POI ✴ Evaluation is available for .xls and .xlsx
  • 53. Formula evaluation (continued) ✴ All arithmetic operators are implemented ✴ Over 280 Excel built-in functions are supported
  • 54. Formula evaluation (code)
  • 55. 11 Use Case 3: CQ5 Import
  • 58. importDocument()
  • 59. getParagraphs(...) ✴ Makes use of ✴ org.apache.poi.hwpf.usermodel.Range
  • 60. importDocument()
  • 61. getTitle(...) ✴ Gets the first paragraph’s text
  • 62. importDocument()
  • 66. 12 Want more?
  • 67. More Examples ✴ http://poi.apache.org/spreadsheet/examples.html
  • 68. Even more ✴ Get in touch ✴ http://poi.apache.org/ ✴ Get informed ✴ dev@poi.apache.org ✴ Get involved ✴ http://svn.apache.org/repos/asf/poi/trunk/
  • 69. Thanks ✴ Get slides ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes
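
The code shown on the slides did not survive in this transcript. As a rough illustration of two ideas the deck names, the OLE2 versus OOXML module split (slides 5 and 6) and the package/part layout (slides 15 and 21), here is a minimal sketch that uses only the JDK, not POI or openxml4j. It guesses the container family from the leading bytes of a file and, for a ZIP-based package, lists the entries in part-name style such as "/xl/worksheets/sheet1.xml". The class name `OfficeContainerSniffer`, its methods, and the in-memory sample package are hypothetical and exist only for this example; a ZIP signature alone does not prove a file is OOXML, so the result is labeled a candidate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of POI or openxml4j.
public class OfficeContainerSniffer {

    /** Container families matching the two "POI modules" slides. */
    public enum Container { OLE2, OOXML_CANDIDATE, UNKNOWN }

    // Signature at the start of an OLE2 compound document (.xls, .doc, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    // ZIP local file header signature; OOXML packages are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Guesses the container family from the first bytes of the data. */
    public static Container sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(data, ZIP_MAGIC)) return Container.OOXML_CANDIDATE;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Lists the non-directory ZIP entries, each prefixed with "/". */
    public static List<String> listParts(byte[] zipBytes) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) parts.add("/" + entry.getName());
            }
        }
        return parts;
    }

    /** Builds a tiny stand-in package in memory (placeholder content only). */
    public static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            String[] names = { "[Content_Types].xml", "_rels/.rels",
                               "xl/workbook.xml", "xl/worksheets/sheet1.xml" };
            for (String name : names) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pkg = samplePackage();
        System.out.println(sniff(pkg));
        listParts(pkg).forEach(System.out::println);
    }
}
```

In a real application the byte array would come from an uploaded file, and the actual reading would be handed to the matching POI module (HSSF/HWPF/HSLF for OLE2, XSSF/XWPF/XSLF for OOXML) or to Tika, as in the Alfresco use case on slides 39 to 41.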