Learn more about office users
-- Feature usage study by document
element statistics
Rui SuYing
IBM Lotus Symphony
Agenda
● Why we need to analyse office feature usage
● Feature usage study by document element
statistics
– Introduction to methodology and tool
● Sharing statistical results
● Future work
● Q&A
Why we need to analyse office feature usage
● Thousands of features in office applications
● About 270 menu items in Office 2003, more features in
2007
● 400+ subsections in the ODF spec used to describe office
features
● A large quantity of features brings challenges to
office products
– UI design sometimes depends on feature usage
– Task prioritization
– Limited dev resources vs. endless requirements
Some approaches
● User Survey
– Questionnaire Survey
– Customer evaluation
– Can gather specific requirements from specific user groups
● User behaviour collection in office application
– Records user actions while the office application is in use
– Focuses on improving UE
– Can get accurate user data
– Not all users are willing to join, because of privacy concerns
– A cross-network framework is needed
Feature usage study
by document element statistics
Sample File Collection → Document Element Collection → Result Analysis
● A large quantity of files was collected for analysis
● We extracted document element usage from the
sample files statically
● Result analysis converts raw data into visual results
Feature usage study by document element
statistics
-- Sample File collection
● Two key points
● Large quantity
● As random as possible
● Methods
● Google search with only the file extension as the keyword
● Web download, one file at a time
● Sample File Coverage
● 1400+ spreadsheet files (xls, ods, 123)
● 1600+ document files (doc, odt, lwp)
● 400+ presentation files (ppt, odp, prz) (to be added)
● 90%+ written in English, covering multiple languages (Chinese,
French, Japanese, etc.)
Document element collection
-- Methodology
● We need to analyse several document formats
● ODF
● MS Binary
● Lotus SmartSuite
● Parse and load sample files with the different filters in IBM
Lotus Symphony/OpenOffice
● Collect document elements with UNO calls after
document loading
● Why collect after file loading rather than work on the file
on disk?
● An XML parser can handle the ODF format, but cannot deal with the MS and
Lotus SmartSuite formats
● Some information cannot be collected before document formatting
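The on-disk alternative that the slide sets aside can be sketched for the ODF case only. This is not the collection tool described in the talk, which loads files and uses UNO calls. It is a minimal illustration, assuming an ODF package is a ZIP archive containing `content.xml` and that tables and images appear as `table:table` and `draw:image` elements. The function names and the in-memory sample file are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs for the table: and draw: prefixes.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def count_odf_elements(odf_file):
    """Count tables and images in the content.xml of an ODF package.

    odf_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    return {
        "tables": sum(1 for _ in root.iter(f"{{{TABLE_NS}}}table")),
        "images": sum(1 for _ in root.iter(f"{{{DRAW_NS}}}image")),
    }


def make_sample_odt():
    """Build a tiny in-memory ODF-like package (hypothetical test data)."""
    content = (
        '<office:document-content'
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        f' xmlns:table="{TABLE_NS}" xmlns:draw="{DRAW_NS}">'
        '<office:body><table:table/><table:table/><draw:image/></office:body>'
        '</office:document-content>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer
```

A sketch like this works only for ODF files. The MS binary and Lotus SmartSuite formats would need their own parsers, which is the reason the slide gives for loading every file through the Symphony/OpenOffice filters instead.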
Statistical Result Analysis
● Raw result – document element usage per file
● Average value, maximum value, minimum value
● Element use frequency distribution analysis
● We used D. Scott's method
● Find a proper bin width, then count the document files
whose element usage falls in each bin
● The counts together with the bins make up the distribution
● Bin width = 3.49 × (standard deviation of the sample data) ×
(sample size)^(-1/3)
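The bin-width rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool used in the study: the function names are hypothetical, the sample standard deviation is assumed (the slide does not say which estimator was used), and bins are assumed to start at the smallest observed value.

```python
import statistics


def scott_bin_width(values):
    """Bin width = 3.49 * standard deviation * n^(-1/3)."""
    n = len(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sigma = statistics.stdev(values)
    return 3.49 * sigma * n ** (-1 / 3)


def histogram(values, bin_width):
    """Return {bin_index: file_count}.

    Bin i covers [low + i * bin_width, low + (i + 1) * bin_width),
    where low is the smallest value (an assumption of this sketch).
    """
    low = min(values)
    counts = {}
    for value in values:
        index = int((value - low) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return counts
```

For example, `histogram(page_counts, scott_bin_width(page_counts))` would give the number of files per bin for a list of per-file page counts, where `page_counts` stands for whatever per-file element count is being analysed.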
Statistical Result Sharing
Presentation Documents (odp + ppt files)
[Figure: Presentation Document Page Number Distribution]
● 412 sample files
● 30.71 slides on average
● Presentation files with fewer than 30 slides cover more than
90% of usage
What the presentation slide count tells us
● Load/save performance evaluation
● 90% coverage when the page count is less than 30
● 95% coverage when the page count is less than 70
● Page Slider Design
● Why we need a page slider in presentations
● A reference for page slider design: 6 pages are shown
in the page slider by default in Symphony, 7 pages
by default in MS PPT 2003
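Coverage figures such as "90% when the page count is less than 30" can be derived from the raw per-file counts roughly as follows. This is a minimal sketch, not the study's code. The function names are hypothetical, and "coverage" is assumed to mean the fraction of sample files strictly below a threshold.

```python
def coverage_below(counts, threshold):
    """Fraction of files whose count is strictly below threshold."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for count in counts if count < threshold) / len(counts)


def coverage_table(counts, thresholds):
    """Map each threshold to the fraction of files below it."""
    return {t: coverage_below(counts, t) for t in thresholds}
```

Called with the per-file slide counts and thresholds such as `[30, 70]`, `coverage_table` would return the kind of figures quoted on the slide. No sample data from the study is reproduced here.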
Spreadsheet Documents (xls + ods files)
Formula Usage
[Figure: formula usage chart; legend entries: IF, SUM, COUNTIF, LEN, CONCATENATE, VLOOKUP, ROUND, PROPER, STYLE, PRODUCT, ROUNDDOWN, AVERAGE, MAX, COUNTBLANK, INDEX, SQRT, SUMPRODUCT, TEXT, ABS]
● Top 10 formulas cover 88.31% of usage
● 129 formulas in total are used in the 1531 sample files
What Formula Usage tells us
● Assumption:
● The spreadsheet files collected from the web reflect
normal users' behaviour
● Only 129 formulas are used in more than 1500 sample files
● OpenOffice supports 371, Symphony supports 377
● A reference when we develop a light-weight
spreadsheet (web spreadsheet)
● Helps find where to focus formula testing
● Thinking...
● If we could get sample files from enterprise users, perhaps we
would get a different result.
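A figure like "top 10 formulas cover 88.31% of usage" can be computed from a flat list of formula function names, one entry per use. The sketch below is illustrative only. The function name is hypothetical, and it assumes usage is counted per occurrence, which the slides do not state (it could equally be counted per file).

```python
from collections import Counter


def top_n_coverage(function_uses, n):
    """Share of all function uses accounted for by the n most used functions.

    Returns (share, names), where names lists the top functions in
    descending order of use.
    """
    tally = Counter(function_uses)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no function uses recorded")
    top = tally.most_common(n)
    share = sum(count for _, count in top) / total
    return share, [name for name, _ in top]
```

`len(Counter(function_uses))` gives the number of distinct functions seen, which corresponds to the "129 formulas" figure.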
Word Processor Document
● Word Count Distribution & Analysis
[Figure: Word Count Distribution]
[Figure: Word Count Distribution 2]
● Page Number Distribution & Analysis
● Average page number: 10.15 pages
● Short documents published on the web
[Figure: Page Number Distribution]
● Table usage in sample documents
● Tables are used in 44.58% of sample documents
● Most of them are medium-sized
● Graphic usage in sample documents
● Graphics are used in 43.41% of sample documents
Limitations of document element analysis
by file sampling
● Issues in file sampling
● Coverage
● Randomness
● Lack of files from enterprise environments
● Limitations in document element collection
● Limited filter capability of Symphony and OpenOffice
● Quality of UNO calls
Future Work
● We will go deeper in this work
● Animation usage statistics – for development priority
and UI design
● Chart usage – chart type and chart property usage
● Paragraph statistics – reference for collaborative writing
and paragraph sharing
● Document element statistics for sample files
● Documents from different industries and in different languages
● Issue: document categorisation by industry
● A smarter way to collect sample files
Q & A
Reference
● MS CEIP -
http://www.microsoft.com/products/ceip/EN-US/default.mspx
● D. Scott, “On Optimal and Data-based
Histograms,” Biometrika, vol. 66, no. 3, pp. 605–610, 1979.
Feature usage study
by document element statistics
● Sample files in actual use are a resource for feature
usage study
● Document element usage information is stored in those files
● A large quantity of sample files will tell us something
● We can find a large quantity of files on the
web
● Assumption: most documents on the web are in actual use
● We have existing tools that can be reused for the feature
analysis
● IBM Lotus Symphony/OpenOffice can open multiple
types of documents
● IBM Lotus Symphony/OpenOffice can recognize most
document elements
Document element collection
– Symphony plugin
[Diagram labels: Java Part, Java UNO Runtime, C++ Part, C++ UNO Components, C++ UNO Runtime, Toolkit API, UNO Services, Menu/Toolbars, Views]
Spreadsheet Documents(xls+ods file)
● Spreadsheet document sampling issues
● Different usage between enterprise users and individual users
● Sheet number distribution
[Figure: Sheet Number Distribution]
