SlideShare a Scribd company logo
1 of 69
Download to read offline
Apache POI
           Recipes
           Paolo Mottadelli - ApacheCon Oakland 2009




  http://chromasia.com
Thursday, November 5, 2009
paolo@apache.org



   my to-do list




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI @ Content Tech
      ✴ Document to application (and back)
               ✴ Publish data

               ✴ Build a doc from your content

      ✴ Know your documents
               ✴ Extract text

               ✴ Extract content



                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             1
                             A-B-C
paolo@apache.org




   POI modules (1): OLE2
      ✴ POIFS: reading/writing Office
               Documents
      ✴ HSSF r/w Excel Spreadsheets
      ✴ HWPF r/w Word Docs
      ✴ HSLF r/w PowerPoint Docs
      ✴ HPSF r/w property sets

                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI modules (2): OOXML
      ✴ XSSF: r/w OXML Excel
      ✴ XWPF: r/w OXML Word
      ✴ XSLF: r/w OXML PowerPoint




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
POI 3.5
  http://chromasia.com
Thursday, November 5, 2009
paolo@apache.org




   OOXML dev status
      ✴ XSSF: Final in POI-3.5
      ✴ XWPF: Draft (basic features)
      ✴ XSLF: Not covered (only text ext.)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   HSSF & XSSF
      ✴ Common user model interface
      ✴ User model based on existing HSSF
      ✴ Using OpenXML4J and SAX




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             2
                             Same recipe,
                             different flavours
paolo@apache.org




   Common H/XSSF access
      ✴ org.apache.poi.ss.usermodel




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Upgrading to POI-3.5
      ✴ HSSFFormulaEvaluator.CellValue
               ✴ convert from .hssf. to .ss.

      ✴ HSSFRow.MissingCellPolicy
               ✴ convert from .hssf. to .ss.

      ✴ RecordFormatException in DDF
               ✴ convert from .hssf. to .util.                           Dreadful Drawing
                                                                             Format


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             3
                             Meet
                             Office Open XML
paolo@apache.org



                                               made (very) simple
   Open XML
      ✴ XML based
               ✴ WordprocessingML

               ✴ SpreadsheetML

               ✴ PresentationML

      ✴ Stored as a package
               ✴ Open Packaging Conventions



                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Package concepts
      ✴ Package (the container)
      ✴ Part (xml file)
      ✴ Relationship
               ✴ package-relationship

               ✴ part-relationship




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Expanded package, Excel




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   WordprocessingML
      ✴ body
               ✴ paragraphs
                      ✴ runs


      ✴ properties (for runs and pars)
      ✴ styles
      ✴ headers/footers ...

                               - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   SpreadsheetML
      ✴ workbook
               ✴ worksheets
                      ✴ rows

                             ✴ cells



      ✴ styles
      ✴ formulas
      ✴ images ...
                                       - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   PresentationML
      ✴ presentation
               ✴ slides

               ✴ slides-masters

               ✴ notes-masters

      ✴ layout, animation, audio, video,
               transitions ...

                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             4
                             openxml4j
paolo@apache.org




   openXML4J
      ✴ Package, parts, rels


                                                                          "/xl/worksheets/sheet1.xml"




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             5
                             Text Extraction
paolo@apache.org




   Extractors
      ✴ POITextExtractor
               ✴ POIOLE2TextExtractor
                                                                    getT xt()
                                                                        e
               ✴ POIXMLTextExtractor
                      ✴ XSSFExcelExtractor

                      ✴ XWPFWordExtractor

                      ✴ XSLFPowerPointExtractor


      ✴ If text is all what you need

                              - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Text extraction
      ✴ made simple




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             6
                             EXCEL
                             Simple Tasks
paolo@apache.org




   New Workbook




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   New Sheet




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Creating Cells




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Cell types




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Fills and colors




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             7
                             EXCEL
                             Imp/Exp to XML
paolo@apache.org




   Export to XML




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   xmlMaps.xml




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   XML Import/Export




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             8
                             WORD
                             Simple Doc
paolo@apache.org




   A simple doc




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             9
                             Use Case 1
                             Alfresco Search
paolo@apache.org




   Use Case
      ✴ Upload a document
      ✴ Detect document mimetype
      ✴ Extract text and metadata
      ✴ Create search index
      ✴ Search (and find) the document


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Without Tika
   ✴ Detect the document mimetype
               ✴ (source/target mimetype)

      ✴ Get the proper ContentTransformer
               ✴ (ContentTransformerRegistry)

      ✴ Tranform Doc Content to Text
               ✴ (PoiHssfContentTransformer) I here
                                          PO
      ✴ Create Lucene index
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   With Tika




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Extension use case
      ✴ Adding support for Office Open
               XML documents (Office 2007+)
               ✴ Word 2007+

               ✴ Excel 2007+

               ✴ PowerPoint 2007+




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI text extractors
      ✴ Remember?




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Excel)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Word)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Word)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             10
                             Use Case 2
                             JM Lafferty
                             Financial Forecasting
paolo@apache.org




   Make your wb look pro-
      ✴ Rich text
      ✴ Graphics
      ✴ Formulas & Named Ranges
      ✴ Data validations
      ✴ Conditional formatting
      ✴ Cell comments
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation
      ✴ The evaluation engine enables you
               to calculate formula results from
               within a POI application
      ✴ Formulas may be added to your
               workbook by POI
      ✴ Evaluation is available for .xls
               and .xlsx
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation (continued)
      ✴ All arithmetic operators are
               implemented
      ✴ Over 280 Excel built in functions
               are supported




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation (code)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             11
                             Use Case 3:
                             CQ5 Import
Thursday, November 5, 2009
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   getParagraphs(...)
      ✴ Makes use of
               ✴ org.apache.poi.hwpf.usermodel.Range




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   getTitle(...)
      ✴ Gets the first paragraph’s text




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
                             12
                             Want more?
paolo@apache.org




   More Examples
      ✴ http://poi.apache.org/spreadsheet/examples.html




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Even more
      ✴ Get in touch
               ✴ http://poi.apache.org/

      ✴ Get informed
               ✴ dev@poi.apache.org

      ✴ Get involved
               ✴ http://svn.apache.org/repos/asf/poi/trunk/


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




      ✴ Get slides
               ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes




   Thanks


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009

More Related Content

Viewers also liked

Intro to Apache Spark - Lab
Intro to Apache Spark - LabIntro to Apache Spark - Lab
Intro to Apache Spark - LabMammoth Data
 
Catalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesCatalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesSpark Controles
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache TikaPaolo Mottadelli
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learninghoondong kim
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamFlink Forward
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMeeraj Kunnumpurath
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 

Viewers also liked (20)

Intro to Apache Spark - Lab
Intro to Apache Spark - LabIntro to Apache Spark - Lab
Intro to Apache Spark - Lab
 
Catalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesCatalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark Controles
 
Introdução ao desenvolvimento de jogos com unity3d
Introdução ao desenvolvimento de jogos com unity3dIntrodução ao desenvolvimento de jogos com unity3d
Introdução ao desenvolvimento de jogos com unity3d
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache Tika
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learning
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and Beam
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 

More from Paolo Mottadelli

Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Paolo Mottadelli
 
Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Paolo Mottadelli
 
Evolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkEvolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkPaolo Mottadelli
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkPaolo Mottadelli
 
Adobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisAdobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisPaolo Mottadelli
 
Jira as a Project Management Tool
Jira as a Project Management ToolJira as a Project Management Tool
Jira as a Project Management ToolPaolo Mottadelli
 
Interoperability at Apache Software Foundation
Interoperability at Apache Software FoundationInteroperability at Apache Software Foundation
Interoperability at Apache Software FoundationPaolo Mottadelli
 
Content analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaContent analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaPaolo Mottadelli
 

More from Paolo Mottadelli (11)

Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014
 
Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014
 
Evolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkEvolve13 cq-commerce-framework
Evolve13 cq-commerce-framework
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce Framework
 
Adobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisAdobe AEM Commerce with hybris
Adobe AEM Commerce with hybris
 
Java standards in WCM
Java standards in WCMJava standards in WCM
Java standards in WCM
 
JCR and Sling Quick Dive
JCR and Sling Quick DiveJCR and Sling Quick Dive
JCR and Sling Quick Dive
 
Open Development
Open DevelopmentOpen Development
Open Development
 
Jira as a Project Management Tool
Jira as a Project Management ToolJira as a Project Management Tool
Jira as a Project Management Tool
 
Interoperability at Apache Software Foundation
Interoperability at Apache Software FoundationInteroperability at Apache Software Foundation
Interoperability at Apache Software Foundation
 
Content analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaContent analysis for ECM with Apache Tika
Content analysis for ECM with Apache Tika
 

Recently uploaded

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Apache Poi Recipes

  • 1. Apache POI Recipes Paolo Mottadelli - ApacheCon Oakland 2009 http://chromasia.com Thursday, November 5, 2009
  • 2. paolo@apache.org my to-do list - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 3. paolo@apache.org POI @ Content Tech ✴ Document to application (and back) ✴ Publish data ✴ Build a doc from your content ✴ Know your documents ✴ Extract text ✴ Extract content - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 4. Thursday, November 5, 2009 1 A-B-C
  • 5. paolo@apache.org POI modules (1): OLE2 ✴ POIFS: reading/writing Office Documents ✴ HSSF r/w Excel Spreadsheets ✴ HWPF r/w Word Docs ✴ HSLF r/w PowerPoint Docs ✴ HPSF r/w property sets - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 6. paolo@apache.org POI modules (2): OOXML ✴ XSSF: r/w OXML Excel ✴ XWPF: r/w OXML Word ✴ XSLF: r/w OXML PowerPoint - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 7. POI 3.5 http://chromasia.com Thursday, November 5, 2009
  • 8. paolo@apache.org OOXML dev status ✴ XSSF: Final in POI-3.5 ✴ XWPF: Draft (basic features) ✴ XSLF: Not covered (only text ext.) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 9. paolo@apache.org HSSF & XSSF ✴ Common user model interface ✴ User model based on existing HSSF ✴ Using OpenXML4J and SAX - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 10. Thursday, November 5, 2009 2 Same recipe, different flavours
  • 11. paolo@apache.org Common H/XSSF access ✴ org.apache.poi.ss.usermodel - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 12. paolo@apache.org Upgrading to POI-3.5 ✴ HSSFFormulaEvaluator.CellValue ✴ convert from .hssf. to .ss. ✴ HSSFRow.MissingCellPolicy ✴ convert from .hssf. to .ss. ✴ RecordFormatException in DDF ✴ convert from .hssf. to .util. Dreadful Drawing Format - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 13. Thursday, November 5, 2009 3 Meet Office Open XML
  • 14. paolo@apache.org made (very) simple Open XML ✴ XML based ✴ WordprocessingML ✴ SpreadsheetML ✴ PresentationML ✴ Stored as a package ✴ Open Packaging Conventions - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 15. paolo@apache.org Package concepts ✴ Package (the container) ✴ Part (xml file) ✴ Relationship ✴ package-relationship ✴ part-relationship - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 16. paolo@apache.org Expanded package, Excel - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 17. paolo@apache.org WordprocessingML ✴ body ✴ paragraphs ✴ runs ✴ properties (for runs and pars) ✴ styles ✴ headers/footers ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 18. paolo@apache.org SpreadsheetML ✴ workbook ✴ worksheets ✴ rows ✴ cells ✴ styles ✴ formulas ✴ images ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 19. paolo@apache.org PresentationML ✴ presentation ✴ slides ✴ slides-masters ✴ notes-masters ✴ layout, animation, audio, video, transitions ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 20. Thursday, November 5, 2009 4 openxml4j
  • 21. paolo@apache.org openXML4J ✴ Package, parts, rels "/xl/worksheets/sheet1.xml" - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 22. Thursday, November 5, 2009 5 Text Extraction
  • 23. paolo@apache.org Extractors ✴ POITextExtractor ✴ POIOLE2TextExtractor getT xt() e ✴ POIXMLTextExtractor ✴ XSSFExcelExtractor ✴ XWPFWordExtractor ✴ XSLFPowerPointExtractor ✴ If text is all what you need - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 24. paolo@apache.org Text extraction ✴ made simple - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 25. Thursday, November 5, 2009 6 EXCEL Simple Tasks
  • 26. paolo@apache.org New Workbook - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 27. paolo@apache.org New Sheet - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 28. paolo@apache.org Creating Cells - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 29. paolo@apache.org Cell types - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 30. paolo@apache.org Fills and colors - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 31. Thursday, November 5, 2009 7 EXCEL Imp/Exp to XML
  • 32. paolo@apache.org Export to XML - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 33. paolo@apache.org xmlMaps.xml - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 34. paolo@apache.org XML Import/Export - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 35. Thursday, November 5, 2009 8 WORD Simple Doc
  • 36. paolo@apache.org A simple doc - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 37. paolo@apache.org - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 38. Thursday, November 5, 2009 9 Use Case 1 Alfresco Search
  • 39. paolo@apache.org Use Case ✴ Upload a document ✴ Detect document mimetype ✴ Extract text and metadata ✴ Create search index ✴ Search (and find) the document - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 40. paolo@apache.org Without Tika ✴ Detect the document mimetype ✴ (source/target mimetype) ✴ Get the proper ContentTransformer ✴ (ContentTransformerRegistry) ✴ Tranform Doc Content to Text ✴ (PoiHssfContentTransformer) I here PO ✴ Create Lucene index - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 41. paolo@apache.org With Tika - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 42. paolo@apache.org Extension use case ✴ Adding support for Office Open XML documents (Office 2007+) ✴ Word 2007+ ✴ Excel 2007+ ✴ PowerPoint 2007+ - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 43. paolo@apache.org POI text extractors ✴ Remember? - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 44. paolo@apache.org Apache Tika (Excel) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 45. paolo@apache.org Apache Tika - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 46. paolo@apache.org Apache Tika (Word) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 47. paolo@apache.org Apache Tika (Word) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 48. Thursday, November 5, 2009 10 Use Case 2 JM Lafferty Financial Forecasting
  • 49. paolo@apache.org Make your wb look pro- ✴ Rich text ✴ Graphics ✴ Formulas & Named Ranges ✴ Data validations ✴ Conditional formatting ✴ Cell comments - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 52. paolo@apache.org Formula evaluation ✴ The evaluation engine enables you to calculate formula results from within a POI application ✴ Formulas may be added to your workbook by POI ✴ Evaluation is available for .xls and .xlsx - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 53. paolo@apache.org Formula evaluation (continued) ✴ All arithmetic operators are implemented ✴ Over 280 Excel built in functions are supported - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 54. paolo@apache.org Formula evaluation (code) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 55. Thursday, November 5, 2009 11 Use Case 3: CQ5 Import
  • 58. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 59. paolo@apache.org getParagraphs(...) ✴ Makes use of ✴ org.apache.poi.hwpf.usermodel.Range - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 60. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 61. paolo@apache.org getTitle(...) ✴ Gets the first paragraph’s text - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 62. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 63. paolo@apache.org - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 66. Thursday, November 5, 2009 12 Want more?
  • 67. paolo@apache.org More Examples ✴ http://poi.apache.org/spreadsheet/examples.html - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 68. paolo@apache.org Even more ✴ Get in touch ✴ http://poi.apache.org/ ✴ Get informed ✴ dev@poi.apache.org ✴ Get involved ✴ http://svn.apache.org/repos/asf/poi/trunk/ - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 69. paolo@apache.org ✴ Get slides ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes Thanks - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009