SlideShare a Scribd company logo
1 of 28
Learn more about office users
-- Feature usage study by document
element statistics
Rui SuYing
IBM Lotus Symphony
Agenda
● Why we need analyse office feature usage
● Feature usage study by document element
statistics
– Introduction on methodology and tool
● Statistic result sharing
● Future work
● Q&A
Why we need analyse office features usage
● Thousands of features in office application
● About 270 menu items in Office 2003, more features in
2007
● 400+ subsections in ODF spec used to describe office
features
● Large quality of features brings challenges to
office product
– UI design sometimes depends on feature usage
– Task prioritization
– Limited dev resource vs. endless requirements
Some approaches
● User Survey
– Questionnaire Survey
– Customer evaluation
– Can get special requirement from special user group
● User behaviour collection in office application
– User action recording when using office application
– Focusing on UE improving
– Can get accurate user data
– Not all users are willing to join for privacy concern
– Cross network framework needed
Feature usage study
by document element statistics
Feature usage study
by document element statistics
Sample File
Collection
Document
Element
Collection
Result
Analysis
● Large quantity of files were collected for analysis use
● We detached document elements usage from the
sample files statically
● Result analysis convert raw data to visual result
Feature usage study by document element
statistics
-- Sample File collection
● Two key points
● Large Quantity
● As random as we can
● Methods
● Google search with only file extension name as key word
● Web download one by one
● Sample File Coverage
● 1400+ spreadsheet files(xls,ods, 123)
● 1600+ document files(doc, odt, lwp)
● 400+ presentation files(ppt, odp, prz)(to be added)
● 90%+ written in English, covering multiple language(Chinese,
French, Japanese, etc)
Document element collection
-- Methodology
● We need to analyse document formats
● ODF
● MS Binary
● Lotus SmartSuite
● Parse and load sample files with different filters in IBM
Lotus Symphony/OpenOffice
● Document element collection with UNO call after
document loading
● Why not work on disk file than collecting after file
loading?
● XML parser can handle ODF format, but cannot deal with MS and
Lotus SS format
● Some information can not be collected before document formatting
Statistic Result Analysis
● Raw result – document element usage per file
Statistic Result Analysis
● Average value, maximum value, minimum value
● Element use frequency distribution analysis
● We leveraged D.Scott's method
● Find a proper bin width, get the number of document files
whose element usage is in the bin
● The number combined with the bin composes distribution
● Bin width = 3.49 * Standard deviation of sample data * the
quantity of sample data ^(-1/3)
●
Statistic Result Sharing
Presentation Documents(odp+ppt files)
0 20 40 60 80 100 120 140 160 180 200
0
20
40
60
80
100
120
Presentation Document Page Number Distribution
Distribution
● 412 sample files
● 30.71 slides as average
● Presentation files with less than 30 slides covers more than
90% usage
What presentation slides number tells us
● Load/save performance evaluation
● 90% coverage when page number is less than 30
● 95% coverage when page number is less than 70
● Page Slider Design
● Why we need a page slider in presentation
● A reference for page slider design -- 6 pages shown
in page slider as default in Symphony/7 pages
shown as default in MS PPT 2003
Spreadsheet Documents(xls+ods file)
Formula UsageIF SUM
COUNTIF LEN
CONCATENATE VLOOKUP
ROUND PROPER
STYLE PRODUCT
ROUNDDOWN AVERAGE
MAX COUNTBLANK
INDEX SQRT
SUMPRODUCT TEXT
ABS
 Top 10 formulas covers 88.31% usage
 Total 129 formula used in 1531 sample files
What Formula Usage tells us
● Assumption:
● The spreadsheet file collected from web indicates
normal users behavior
● Only 129 formulas used in more than 1500 sample files
● OpenOffice supports 371, Symphony supports 377
● A reference when we develop a light-weight
spreadsheet(web spreadsheet)
● Formula testing focus finding
● Thinking...
● If we can get enterprise user's sample file, perhaps we
can get a different result.
●
Word Processor Document
● Word Count Distribution & Analysis
●
●
●
●
●
●
●
0 20000 40000 60000 80000 100000 120000
0
200
400
600
800
1000
1200
1400
Word Count Distribution
Distribution
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
0
50
100
150
200
250
300
350
Word Count Distribution2
Distribution
Word Processor Document
● Page Number Distribution & Analysis
●
●
●
●
●
● Average Page Number: 10.15 pages
● Short Documents published in web
0 20 40 60 80 100 120 140 160 180 200
0
100
200
300
400
500
600
700
800
900
Page Number Distribution
Distribution
Word Processor Document
● Table usage in sample document
● Table used in 44.58% of sample documents
● Most of them are middle size
● Graphic usage in sample document
● Graphic usage in 43.41% of sample documents
Limitation of document element analysis
by file sampling
● Issues in file sampling
● Coverage
● Randomicity
● Lack of files in enterprise environment
●
● Limitation in document element collection
● Limitation of filter capability of Symphony and OpenOffice
● UNO Call quality
Future Work
Future Work
● We will go deeper in this work
● Animation usage statistic – For development priority
and UI design
● Chart usage - Chart type & Chart property usage
● Paragraph statistic – Reference for collaboration writing
and paragraph sharing
● Document element statistic for sample files
● documents for different industries and different language
● Issues: Document categorisation for industries
●
● A more smart way to collect sample file
Q & A
Reference
● MS CEIP -
http://www.microsoft.com/products/ceip/EN-
US/default.mspx
● D. Scott, “On Optimal and Data-based
Histograms,” Biometrika, vol. 66, no. 3, pp. 605–
610, 1979.
Feature usage study
by document element statistics
● Sample files in actual use are resource for feature
usage study
● Document element usage information are stored in those files
● Large quantity of sample files will tell us something
●
● We can happen to find large quality of files from
web
● Assumption: most of documents in web are for actual use● We have existing tool to be reused for the feature
analysis
● IBM Lotus Symphohy/OpenOffice have ability to open multiple
types of documents
● IBM Lotus Symphony/OpenOffice can recognize most of
document elements
Document element collection
– Symphony plugin
Java Part
Java UNO Runtime
C++ Part
C++ Uno Components
C++ UNO Runtime
Toolkit API
UNO Services
Menu/Toolbars
Views
Spreadsheet Documents(xls+ods file)
● Spreadsheet Document Sampling issues
● Different usage between enterprise users and individual users
● Sheet number distribution show
0 10 20 30 40 50 60 70
0
100
200
300
400
500
600
700
Sheet Number Distribution
Distribution

More Related Content

Viewers also liked

onboarding-in-a-box-v03-06
onboarding-in-a-box-v03-06onboarding-in-a-box-v03-06
onboarding-in-a-box-v03-06Nitin Sharma
 
10 Real Estate Trends for the Future
10 Real Estate Trends for the Future10 Real Estate Trends for the Future
10 Real Estate Trends for the FuturePeter Bubel
 
Stout 2016 Curriculum Vitae 1
Stout 2016 Curriculum Vitae 1Stout 2016 Curriculum Vitae 1
Stout 2016 Curriculum Vitae 1Bill Stout
 

Viewers also liked (7)

onboarding-in-a-box-v03-06
onboarding-in-a-box-v03-06onboarding-in-a-box-v03-06
onboarding-in-a-box-v03-06
 
10 Real Estate Trends for the Future
10 Real Estate Trends for the Future10 Real Estate Trends for the Future
10 Real Estate Trends for the Future
 
CUENTACUENTOS.LOS TRES CERDITOS
CUENTACUENTOS.LOS TRES CERDITOSCUENTACUENTOS.LOS TRES CERDITOS
CUENTACUENTOS.LOS TRES CERDITOS
 
La globalización
La globalizaciónLa globalización
La globalización
 
Directorio minero
Directorio mineroDirectorio minero
Directorio minero
 
Caída del bloque socialista
Caída del bloque socialistaCaída del bloque socialista
Caída del bloque socialista
 
Stout 2016 Curriculum Vitae 1
Stout 2016 Curriculum Vitae 1Stout 2016 Curriculum Vitae 1
Stout 2016 Curriculum Vitae 1
 

Similar to digital marketing

Natural Language Processing for Annual Report in Australia
Natural Language Processing for Annual Report in AustraliaNatural Language Processing for Annual Report in Australia
Natural Language Processing for Annual Report in AustraliaMalcolm Yang
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...MongoDB
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsClaudiu Coman
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Android developer fundamentals training overview Part II
Android developer fundamentals training overview Part IIAndroid developer fundamentals training overview Part II
Android developer fundamentals training overview Part IIYoza Aprilio
 
Extract and Analyze Data from PDF File and Web : A Review
Extract and Analyze Data from PDF File and Web : A ReviewExtract and Analyze Data from PDF File and Web : A Review
Extract and Analyze Data from PDF File and Web : A ReviewIRJET Journal
 
Resume_2016Aug
Resume_2016AugResume_2016Aug
Resume_2016AugI-Fan Chu
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Intro to XPages for Administrators (DanNotes, November 28, 2012)
Intro to XPages for Administrators (DanNotes, November 28, 2012)Intro to XPages for Administrators (DanNotes, November 28, 2012)
Intro to XPages for Administrators (DanNotes, November 28, 2012)Per Henrik Lausten
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientistsStitch Fix Algorithms
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industryTommaso Teofili
 
European SharePoint Conference 2017 Summary
European SharePoint Conference 2017 SummaryEuropean SharePoint Conference 2017 Summary
European SharePoint Conference 2017 SummaryJeff ANGAMA
 
(ATS6-DEV02) Web Application Strategies
(ATS6-DEV02) Web Application Strategies(ATS6-DEV02) Web Application Strategies
(ATS6-DEV02) Web Application StrategiesBIOVIA
 
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...Sylvain Utard
 
Practical automation for beginners
Practical automation for beginnersPractical automation for beginners
Practical automation for beginnersSeoweon Yoo
 
Machine Data Is EVERYWHERE: Use It for Testing
Machine Data Is EVERYWHERE: Use It for TestingMachine Data Is EVERYWHERE: Use It for Testing
Machine Data Is EVERYWHERE: Use It for TestingTechWell
 

Similar to digital marketing (20)

Natural Language Processing for Annual Report in Australia
Natural Language Processing for Annual Report in AustraliaNatural Language Processing for Annual Report in Australia
Natural Language Processing for Annual Report in Australia
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...
MongoDB World 2019: Enabling Global Tire Design Leveraging MongoDB's Document...
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Android developer fundamentals training overview Part II
Android developer fundamentals training overview Part IIAndroid developer fundamentals training overview Part II
Android developer fundamentals training overview Part II
 
Extract and Analyze Data from PDF File and Web : A Review
Extract and Analyze Data from PDF File and Web : A ReviewExtract and Analyze Data from PDF File and Web : A Review
Extract and Analyze Data from PDF File and Web : A Review
 
Resume_2016Aug
Resume_2016AugResume_2016Aug
Resume_2016Aug
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Intro to XPages for Administrators (DanNotes, November 28, 2012)
Intro to XPages for Administrators (DanNotes, November 28, 2012)Intro to XPages for Administrators (DanNotes, November 28, 2012)
Intro to XPages for Administrators (DanNotes, November 28, 2012)
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientists
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industry
 
European SharePoint Conference 2017 Summary
European SharePoint Conference 2017 SummaryEuropean SharePoint Conference 2017 Summary
European SharePoint Conference 2017 Summary
 
(ATS6-DEV02) Web Application Strategies
(ATS6-DEV02) Web Application Strategies(ATS6-DEV02) Web Application Strategies
(ATS6-DEV02) Web Application Strategies
 
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...
Fast & relevant search: solutions and trade-offs (January 2020 - Search Techn...
 
Practical automation for beginners
Practical automation for beginnersPractical automation for beginners
Practical automation for beginners
 
XML Performance
XML PerformanceXML Performance
XML Performance
 
Machine Data Is EVERYWHERE: Use It for Testing
Machine Data Is EVERYWHERE: Use It for TestingMachine Data Is EVERYWHERE: Use It for Testing
Machine Data Is EVERYWHERE: Use It for Testing
 

Recently uploaded

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

digital marketing

  • 1.
  • 2. Learn more about office users -- Feature usage study by document element statistics Rui SuYing IBM Lotus Symphony
  • 3. Agenda ● Why we need analyse office feature usage ● Feature usage study by document element statistics – Introduction on methodology and tool ● Statistic result sharing ● Future work ● Q&A
  • 4. Why we need analyse office features usage ● Thousands of features in office application ● About 270 menu items in Office 2003, more features in 2007 ● 400+ subsections in ODF spec used to describe office features ● Large quality of features brings challenges to office product – UI design sometimes depends on feature usage – Task prioritization – Limited dev resource vs. endless requirements
  • 5. Some approaches ● User Survey – Questionnaire Survey – Customer evaluation – Can get special requirement from special user group ● User behaviour collection in office application – User action recording when using office application – Focusing on UE improving – Can get accurate user data – Not all users are willing to join for privacy concern – Cross network framework needed
  • 6. Feature usage study by document element statistics
  • 7. Feature usage study by document element statistics Sample File Collection Document Element Collection Result Analysis ● Large quantity of files were collected for analysis use ● We detached document elements usage from the sample files statically ● Result analysis convert raw data to visual result
  • 8. Feature usage study by document element statistics -- Sample File collection ● Two key points ● Large Quantity ● As random as we can ● Methods ● Google search with only file extension name as key word ● Web download one by one ● Sample File Coverage ● 1400+ spreadsheet files(xls,ods, 123) ● 1600+ document files(doc, odt, lwp) ● 400+ presentation files(ppt, odp, prz)(to be added) ● 90%+ written in English, covering multiple language(Chinese, French, Japanese, etc)
  • 9. Document element collection -- Methodology ● We need to analyse document formats ● ODF ● MS Binary ● Lotus SmartSuite ● Parse and load sample files with different filters in IBM Lotus Symphony/OpenOffice ● Document element collection with UNO call after document loading ● Why not work on disk file than collecting after file loading? ● XML parser can handle ODF format, but cannot deal with MS and Lotus SS format ● Some information can not be collected before document formatting
  • 10. Statistic Result Analysis ● Raw result – document element usage per file
  • 11. Statistic Result Analysis ● Average value, maximum value, minimum value ● Element use frequency distribution analysis ● We leveraged D.Scott's method ● Find a proper bin width, get the number of document files whose element usage is in the bin ● The number combined with the bin composes distribution ● Bin width = 3.49 * Standard deviation of sample data * the quantity of sample data ^(-1/3) ●
  • 13. Presentation Documents(odp+ppt files) 0 20 40 60 80 100 120 140 160 180 200 0 20 40 60 80 100 120 Presentation Document Page Number Distribution Distribution ● 412 sample files ● 30.71 slides as average ● Presentation files with less than 30 slides covers more than 90% usage
  • 14. What presentation slides number tells us ● Load/save performance evaluation ● 90% coverage when page number is less than 30 ● 95% coverage when page number is less than 70 ● Page Slider Design ● Why we need a page slider in presentation ● A reference for page slider design -- 6 pages shown in page slider as default in Symphony/7 pages shown as default in MS PPT 2003
  • 15. Spreadsheet Documents(xls+ods file) Formula UsageIF SUM COUNTIF LEN CONCATENATE VLOOKUP ROUND PROPER STYLE PRODUCT ROUNDDOWN AVERAGE MAX COUNTBLANK INDEX SQRT SUMPRODUCT TEXT ABS  Top 10 formulas covers 88.31% usage  Total 129 formula used in 1531 sample files
  • 16. What Formula Usage tells us ● Assumption: ● The spreadsheet file collected from web indicates normal users behavior ● Only 129 formulas used in more than 1500 sample files ● OpenOffice supports 371, Symphony supports 377 ● A reference when we develop a light-weight spreadsheet(web spreadsheet) ● Formula testing focus finding ● Thinking... ● If we can get enterprise user's sample file, perhaps we can get a different result. ●
  • 17. Word Processor Document ● Word Count Distribution & Analysis ● ● ● ● ● ● ● 0 20000 40000 60000 80000 100000 120000 0 200 400 600 800 1000 1200 1400 Word Count Distribution Distribution 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 0 50 100 150 200 250 300 350 Word Count Distribution2 Distribution
  • 18. Word Processor Document ● Page Number Distribution & Analysis ● ● ● ● ● ● Average Page Number: 10.15 pages ● Short Documents published in web 0 20 40 60 80 100 120 140 160 180 200 0 100 200 300 400 500 600 700 800 900 Page Number Distribution Distribution
  • 19. Word Processor Document ● Table usage in sample document ● Table used in 44.58% of sample documents ● Most of them are middle size ● Graphic usage in sample document ● Graphic usage in 43.41% of sample documents
  • 20. Limitation of document element analysis by file sampling ● Issues in file sampling ● Coverage ● Randomicity ● Lack of files in enterprise environment ● ● Limitation in document element collection ● Limitation of filter capability of Symphony and OpenOffice ● UNO Call quality
  • 22. Future Work ● We will go deeper in this work ● Animation usage statistic – For development priority and UI design ● Chart usage - Chart type & Chart property usage ● Paragraph statistic – Reference for collaboration writing and paragraph sharing ● Document element statistic for sample files ● documents for different industries and different language ● Issues: Document categorisation for industries ● ● A more smart way to collect sample file
  • 23. Q & A
  • 24. Reference ● MS CEIP - http://www.microsoft.com/products/ceip/EN- US/default.mspx ● D. Scott, “On Optimal and Data-based Histograms,” Biometrika, vol. 66, no. 3, pp. 605– 610, 1979.
  • 25.
  • 26. Feature usage study by document element statistics ● Sample files in actual use are resource for feature usage study ● Document element usage information are stored in those files ● Large quantity of sample files will tell us something ● ● We can happen to find large quality of files from web ● Assumption: most of documents in web are for actual use● We have existing tool to be reused for the feature analysis ● IBM Lotus Symphohy/OpenOffice have ability to open multiple types of documents ● IBM Lotus Symphony/OpenOffice can recognize most of document elements
  • 27. Document element collection – Symphony plugin Java Part Java UNO Runtime C++ Part C++ Uno Components C++ UNO Runtime Toolkit API UNO Services Menu/Toolbars Views
  • 28. Spreadsheet Documents(xls+ods file) ● Spreadsheet Document Sampling issues ● Different usage between enterprise users and individual users ● Sheet number distribution show 0 10 20 30 40 50 60 70 0 100 200 300 400 500 600 700 Sheet Number Distribution Distribution