2. Outline
• Our ingestion rate experiments
• Hardware and Software setup
• Experimental design
• Observations and Conclusions from these tests
• Implications for Repository Sizing and
Organization in SharePoint and UCM
• Lessons Learned and Recommendations
• Q and A
3. Overall aims of this research
• Apply real-world scenarios to ingestion testing
• Rather than ultra-high-performance / ultra-high-cost configurations
• Determine actual ingestion rates for different
scenarios on identical hardware
• Expose weaknesses / issues in large imports
• Derive recommendations for best practices in
importing existing content into new CMS
repositories
4. Experimental Approach
• Import existing files from file system into newly-installed
CMS
– Standard configurations
– Commodity hardware
– No specialized tuning or optimizations
– Vendor recommended OS and databases
• Four scenarios
– 20,000 files @ 40kB
– 20,000 files @ 100kB
– 1,000,000 files @ 40kB
– 1,000,000 files @ 100kB
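To put the four scenarios in perspective, the total payload of each can be worked out directly from the file counts and sizes above. This is a minimal sketch; it assumes 1 kB means 1,000 bytes, which the slides do not specify, and the names `SCENARIOS` and `total_gb` are illustrative only.

```python
# File counts and sizes come from the four scenarios on the slide.
# ASSUMPTION: 1 kB = 1,000 bytes (the slides do not say which convention is used).
SCENARIOS = [
    (20_000, 40),
    (20_000, 100),
    (1_000_000, 40),
    (1_000_000, 100),
]


def total_gb(file_count: int, size_kb: int) -> float:
    """Return the total payload of a scenario in decimal gigabytes."""
    return file_count * size_kb * 1_000 / 1_000_000_000


for count, size in SCENARIOS:
    print(f"{count:>9,} files @ {size:>3} kB = {total_gb(count, size):6.1f} GB")
```

Under that assumption the scenarios range from under 1 GB to 100 GB of raw content.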
5. Are these Scenarios Realistic?
• >80% of single instance CMS repositories
contain 50,000-200,000+ items
• Average “document” size in most
industries is ~100kB.
• Most projects need to import existing
content from file shares or other systems
6. Commodity Hardware
• Dell PowerEdge R710 server
• Dual Intel Xeon 5560 CPUs (quad core each)
running at 2.8GHz
• 16GB RAM
• Eight 146GB 10K RPM SAS drives
8. SharePoint Installation
• Operating System: Windows Server 2008 Std Edition for
Partners
• Database: Microsoft SQL Server 2008 R2
Enterprise
• Web Server: IIS7 (Standard with Windows Server
2008 - specifically v 7.5.76)
• Content Management System: SharePoint Server 2010 Enterprise for
Partners
• File storage: Database Storage in SQL server
9. Ingestion Approaches
• UCM
– used BatchBuilder and BatchLoader
• SharePoint
– Had to use a third-party tool (UploadZen by
Roxority)
– Need to organize content before import
– Limited flexibility in directory size
10. Supported SharePoint 2010 bulk
import strategies
• Multiple file upload applet
– Silverlight; supports up to 100 docs, does not support
subdirectories
• Windows Explorer view
– Extension of WebDAV
– Limited performance
• SharePoint Workspace
– Client integration
– Only supports up to 500 documents
11. Differences between Import Strategies
• BatchLoader
– Supported system tool
– Allows automated file system crawl (BatchBuilder)
– Storage / browse location in repository unrelated to source
location
– Supports high volume
• UploadZen
– Third-party application
– Requires organization and sizing of import directories
– Organization within repository reflects import location
– Major challenges with high volume imports
12. Considerations for Repository Sizing
1. Should be primarily driven by business / infosec needs
2. Practicality
– Import / migrate
– Search / organize
– Backup / DR
3. Flexibility
– Growth in content volume / size
– Leverage HSM / partitioning
– Provide options for storage strategies
13. Ingestion Rate Testing
• Major things to test:
– Overall rate of ingestion with different sized
files and different sized collections
– Ease of use of import tools
– Flexibility in organization of content during /
after import
14. 20,000 files – each 40kB
• First set of tests
• Single directory for SharePoint source
• UCM – File System storage – 198,000 docs/hr
• UCM – JDBC storage – 156,000 docs/hr
• SharePoint – 153,000 docs/hr
15. 20,000 files – each 100kB
• UCM – File System storage – 171,000 docs/hr
• SharePoint – 138,000 docs/hr
• Ingestion rates fell 10-15% for larger file size
• SharePoint RAM usage higher, primarily in
database
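The 10-15% figure can be checked against the rates reported for the two 20,000-file tests. The sketch below only restates the numbers from the slides; the helper name `percent_decrease` is illustrative.

```python
def percent_decrease(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# docs/hr at 40kB and at 100kB, as reported for the 20,000-file tests
RATES = {
    "UCM (file system storage)": (198_000, 171_000),
    "SharePoint": (153_000, 138_000),
}

for system, (rate_40kb, rate_100kb) in RATES.items():
    drop = percent_decrease(rate_40kb, rate_100kb)
    print(f"{system}: {drop:.1f}% slower at 100kB")
```

This gives roughly 13.6% for UCM and 9.8% for SharePoint, in line with the 10-15% range quoted.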
16. 1,000,000 files – each 40kB
• Need to organize files in directories for SharePoint
– 50 folders each with 20,000 items - failed
– 2,000 folders each with 500 items – succeeded
• UCM – FS storage & Sun JRE – 205,000 docs/hr
• UCM – FS storage & JRockit JRE – 212,000 docs/hr
• UCM – JDBC storage & Sun JRE – 171,000 docs/hr
• SharePoint w/ 50 import folders – failed
• SharePoint w/ 2,000 import folders – 217,000 docs/hr
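Splitting a flat source directory into import folders of 500 items each could be scripted along the following lines. This is a hypothetical sketch, not the procedure used in the tests: the function name, the `batch_NNNNN` folder naming, and the choice to copy rather than move files are all assumptions.

```python
import shutil
from pathlib import Path


def partition_into_folders(source: Path, dest: Path, max_items: int = 500) -> int:
    """Copy the files in `source` into numbered subfolders of `dest`,
    placing at most `max_items` files in each. Returns the folder count."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")

    # List the files up front so folders created later are never re-scanned.
    files = sorted(p for p in source.iterdir() if p.is_file())

    folder_count = 0
    for start in range(0, len(files), max_items):
        # Hypothetical naming scheme: batch_00000, batch_00001, ...
        folder = dest / f"batch_{folder_count:05d}"
        folder.mkdir(parents=True, exist_ok=True)
        for src_file in files[start:start + max_items]:
            shutil.copy2(src_file, folder / src_file.name)
        folder_count += 1
    return folder_count


# Example call (paths are placeholders):
# partition_into_folders(Path("/data/source"), Path("/data/import"), max_items=500)
```

With 1,000,000 source files and the default limit of 500, this produces the 2,000 folders used in the successful SharePoint run.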
17. 1,000,000 files – each 40kB
(contd.)
• Substantial work to organize content for
SharePoint import
• SharePoint much more RAM intensive
– Primarily with database process
• UCM more CPU intensive
– Much more linear response
18. 1,000,000 files – each 100kB
• UCM – FS storage & Sun JRE – 179,000 docs/hr
– About 13% decrease in rate relative to 40kB files (205,000 docs/hr), due to file size
• Unable to complete test with SharePoint
19. Conclusions
• SharePoint requires third-party tools and substantial work
before import
• SharePoint has limited flexibility in terms of repository
sizing, content organization, and import strategies
• With optimized import, SharePoint ingestion rates are
comparable to UCM
• UCM has much more flexibility in import strategies
• UCM has consistent import rates between 156,000 and
212,000 docs/hr (OOTB)
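For planning purposes, the observed UCM rate range translates into an approximate import window. The sketch below assumes the rate stays constant for the whole import, which is a simplification; `import_hours` is an illustrative helper, not part of any product.

```python
def import_hours(file_count: int, docs_per_hour: float) -> float:
    """Estimated wall-clock hours to ingest `file_count` documents.
    ASSUMPTION: the ingestion rate is constant for the whole run."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return file_count / docs_per_hour


# Slowest and fastest UCM rates reported in these tests (docs/hr)
SLOWEST, FASTEST = 156_000, 212_000
FILES = 1_000_000

best = import_hours(FILES, FASTEST)
worst = import_hours(FILES, SLOWEST)
print(f"{FILES:,} files: about {best:.1f} to {worst:.1f} hours")
```

For 1,000,000 files this works out to roughly 4.7 to 6.4 hours.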
20. Conclusions (contd.)
• Ingestion rates are dependent on average file size (10-15%
decrease in rate between 40kB and 100kB file size)
• UCM can be deployed on commodity hardware for
repositories of 1,000,000 items
• SharePoint has challenges importing 1,000,000 files on
commodity hardware
• Both systems function well on this hardware after import.
• SharePoint import is much more RAM intensive whereas
UCM import is CPU intensive