So you think you can crawl?
Stretching the Boundaries of SharePoint 2013!
Petter Skodvin-Hvammen
AD-Gruppen, Norway
Who am I?
Petter Skodvin-Hvammen
Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD
• Solutions Architect
• SharePoint Consultant
• Search Enthusiast
• Community Lead
@pettersh - psh@adgruppen.no
www.adgruppen.no
Enterprise Search
Index thousands of sources
Automate index management
Infrastructure sizing
Challenges and Solutions
Not Included: code/scripts, user experience, relevancy, governance
www.sharepointeurope.com
Enterprise Search using SharePoint Server 2013
• 30,000 users
• 85 locations in 30 countries
• 15,000 daily searches
• 100,000,000 documents(?)
• 60 core systems, 2,000 applications
The Mission…
What do we index?
100,000,000 documents
3,000 file shares
500 servers
Where is the data?
• Datacenters
• Time zones
• Bandwidth
How can we get it?
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant the crawler read access per file share
• Avoid token bloat issues, which occur with more than 1,015* groups per account
* http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
How do we operate it?
• File shares are created, changed, and deleted every day using a custom self service solution
• File shares are moved between servers every day by automation rules
• Manage indexing and crawling of each file share with minimum manual effort
What can SharePoint do?
• Max 50 content sources per service application
– Max 500 with October 2013 CU installed
• Max 100 start addresses per content source
– Max 500 with October 2013 CU installed
• Max 20 concurrent crawls per service application
– Limitation has been removed
http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
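As a rough sanity check of these limits against the 3,000 file shares mentioned earlier, the arithmetic can be sketched as below. This assumes one start address per file share, which is a simplification and not necessarily how the farm was configured; the function and dictionary names are made up for illustration.

```python
import math

# Limits as quoted on the slide; the second set applies with the October 2013 CU.
LIMITS_RTM = {"content_sources": 50, "start_addresses_per_source": 100}
LIMITS_OCT_2013_CU = {"content_sources": 500, "start_addresses_per_source": 500}


def sources_needed(start_addresses, limits):
    """Minimum number of content sources needed to hold all start addresses."""
    return math.ceil(start_addresses / limits["start_addresses_per_source"])


def fits(start_addresses, limits):
    """True if the start addresses fit within one service application."""
    return sources_needed(start_addresses, limits) <= limits["content_sources"]


if __name__ == "__main__":
    shares = 3000  # assumption: one start address per file share
    for name, limits in (("RTM", LIMITS_RTM), ("Oct 2013 CU", LIMITS_OCT_2013_CU)):
        print(name, sources_needed(shares, limits), fits(shares, limits))
```

Without the CU, a single service application tops out at 50 x 100 = 5,000 start addresses, so a one-address-per-share layout leaves limited headroom and spreads the shares over dozens of content sources.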
It’s complicated
• More data than we have space for
• It’s located all over the place
• Everything changes all of the time
• There are limitations in SharePoint
• Someone’s gotta maintain this
• It has to be secure and relevant
What did we do?
• Created logical groups of file shares
• Used symbolic linking
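Fewer content sources
[Diagram: file shares \\file01\share01, \\file02\share03, and \\file03\share03 are exposed as symbolic links \\file00\share\sym01, \\file00\share\sym02, and \\file00\share\sym03; the single start address is \\file00\share]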
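A minimal sketch of how such links could be planned and created follows. The deck does not include the actual timer job, so the hash-based link naming, the function names, and the link root are all assumptions made for illustration.

```python
import hashlib
import os


def link_name(unc_path, length=10):
    """Derive a short, stable link name from a UNC path (hypothetical scheme)."""
    digest = hashlib.sha1(unc_path.lower().encode("utf-8")).hexdigest()
    return digest[:length]


def plan_links(shares, link_root):
    """Map each link path under link_root to the share it should point at."""
    return {os.path.join(link_root, link_name(share)): share for share in shares}


def create_links(plan):
    """Create directory symlinks; on Windows this requires the symlink privilege."""
    for link, target in plan.items():
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=True)
```

Separating the plan from the creation step makes it possible to compare the desired links with what already exists on disk, and to remove links for shares that have been deleted or moved.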
What did we do?
• Grouped file shares based on region
• One content source per region
• Incremental crawls every night
Crawling based on time zones
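To illustrate scheduling per region, the sketch below converts a local nightly start time into UTC. The region offsets are hypothetical fixed values with no daylight saving handling, and the function name is invented here; the deck does not say how schedules were computed.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical fixed UTC offsets per region (no daylight saving handling).
REGION_UTC_OFFSET_HOURS = {"europe": 1, "asia": 8}


def nightly_start_utc(region, local_start, day):
    """Return the UTC instant at which the region's local crawl start occurs."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    local = datetime.combine(day, local_start, tzinfo=tz)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    for region in REGION_UTC_OFFSET_HOURS:
        print(region, nightly_start_utc(region, time(22, 0), date(2014, 5, 5)))
```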
What did we do?
• Created a DNS alias per crawler impact rule in etc/hosts on the crawl servers
Reduced crawler impact
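The alias idea can be sketched as generating one hosts-file line per impact rule, all pointing at the same server, presumably so that each host name can carry its own crawler impact rule. The IP address is a placeholder, and the alias names are taken from the start addresses shown later in the deck.

```python
def hosts_entries(link_server_ip, impact_aliases):
    """Build hosts-file lines that map every alias to the same server IP."""
    return [f"{link_server_ip}\t{alias}" for alias in impact_aliases]


if __name__ == "__main__":
    # 10.0.0.10 is a placeholder address, not taken from the deck.
    for line in hosts_entries("10.0.0.10", ["default", "wait-60"]):
        print(line)
```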
What did we do?
• Granted file share access to the crawl account that is a member of the fewest groups
• Monitored group memberships
• Grouped file shares by crawl account
• Crawl rules matched folder structure
Managed pool of crawl accounts
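Choosing an account from the pool could look like the sketch below. It assumes that granting access adds the account to one more group and that the group counts come from some directory query not shown here; the function name and sample counts are illustrative only.

```python
MAX_GROUPS = 1015  # group limit per account quoted earlier in the deck


def pick_account(group_counts, groups_to_add=1):
    """Return the pool account with the fewest groups that stays within the limit.

    group_counts maps account name to current group membership count.
    Returns None when every account would exceed the limit.
    """
    candidates = {
        account: count
        for account, count in group_counts.items()
        if count + groups_to_add <= MAX_GROUPS
    }
    if not candidates:
        return None
    # Break ties on the account name so the choice is deterministic.
    return min(candidates, key=lambda account: (candidates[account], account))
```

Returning None when the whole pool is exhausted gives the caller an explicit signal that a new crawl account needs to be provisioned.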
file://.*/spcrwl01/.*  Include  SP\spcrwl01
file://.*/spcrwl02/.*  Include  SP\spcrwl02
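How these patterns select a crawl account can be illustrated with ordinary regular expressions. This only mimics the matching in Python; it is not SharePoint's crawl rule engine, and the account names are shown without a domain prefix.

```python
import re

# Patterns copied from the slide; account names shown without a domain prefix.
CRAWL_RULES = [
    (re.compile(r"file://.*/spcrwl01/.*"), "spcrwl01"),
    (re.compile(r"file://.*/spcrwl02/.*"), "spcrwl02"),
]


def account_for(url):
    """Return the crawl account whose rule matches the URL, or None."""
    for pattern, account in CRAWL_RULES:
        if pattern.fullmatch(url):
            return account
    return None
```

Because the account name is a folder in the path, two rules cover every share assigned to those accounts, regardless of region or impact rule.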
The bigger picture
• Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
• Start addresses: file://<crawler impact>/<content source>/<crawler impact>

Source  Start address                  Folder                   Crawl rule             Impact rule
Europe  file://default/europe/default  europe/default/spcrwl01  file://.*/spcrwl01/.*  Default
                                       europe/default/spcrwl02  file://.*/spcrwl02/.*  Default
        file://wait-60/europe/wait-60  europe/wait-60/spcrwl01  file://.*/spcrwl01/.*  Wait-60
                                       europe/wait-60/spcrwl02  file://.*/spcrwl02/.*  Wait-60
Asia    file://default/asia/default    asia/default/spcrwl01    file://.*/spcrwl01/.*  Default
                                       asia/default/spcrwl02    file://.*/spcrwl02/.*  Default
        file://wait-60/asia/wait-60    asia/wait-60/spcrwl01    file://.*/spcrwl01/.*  Wait-60
                                       asia/wait-60/spcrwl02    file://.*/spcrwl02/.*  Wait-60
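The naming scheme in the table above can be reproduced with a few string templates. The function names are invented for this sketch; the patterns themselves follow the folder structure and start address formats given on the slide.

```python
from itertools import product


def start_address(source, impact):
    """file://<crawler impact>/<content source>/<crawler impact>"""
    return f"file://{impact}/{source}/{impact}"


def folder(source, impact, account):
    """<content source>/<crawler impact>/<crawl account>"""
    return f"{source}/{impact}/{account}"


def crawl_rule(account):
    """Crawl rule pattern that matches every path below the account folder."""
    return f"file://.*/{account}/.*"


def rows(sources, impacts, accounts):
    """One (start address, folder, crawl rule) tuple per combination."""
    return [
        (start_address(s, i), folder(s, i, a), crawl_rule(a))
        for s, i, a in product(sources, impacts, accounts)
    ]
```

With two sources, two impact rules, and two accounts this yields the eight folder rows listed in the table.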
How did we manage this?
• Self service portal for enabling indexing of file shares
• Custom web service integration in self service portal
• Custom solution for granting access to crawl accounts
• Custom timer job to get list of file shares to crawl from self service portal
• Custom timer job for creating and removing symbolic links
• Custom lists for mapping server to content source, schedule and impact, shares to crawl accounts and metadata, UNC to symlink
• Content enrichment service for replacing symlinks in paths with actual file paths
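The path replacement done by the enrichment step can be sketched as a prefix lookup. This shows only the mapping logic, not the web service contract that SharePoint's content enrichment callout expects; the mapping contents and function name are assumptions for illustration.

```python
def resolve_path(crawled_url, symlink_to_unc):
    """Replace a known symlink prefix with the UNC path it points at.

    symlink_to_unc maps the crawled link URL to the real UNC share.
    URLs that match no known link are returned unchanged.
    """
    # Try longer prefixes first so nested links resolve to the closest match.
    for prefix in sorted(symlink_to_unc, key=len, reverse=True):
        if crawled_url == prefix or crawled_url.startswith(prefix + "/"):
            rest = crawled_url[len(prefix):]
            return symlink_to_unc[prefix] + rest.replace("/", "\\")
    return crawled_url
```

Without this step, search results would show the link path rather than the location users know, so the lookup table has to stay in sync with the links that the timer job creates.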
Example: Self Service Portal
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: Assigned automatically
Crawl Account: Assigned automatically
Cancel / Save
Example: Custom Lists
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: \\file01\share01
Crawl Account: SP\spcrawl01
Symlink: \\default\europe\default\spcrwl01\e5dc12a41d
Location: europe (server file01 is located in Oslo DC)
Bandwidth: 5 Mbps
[Diagram: search farm topology. Components shown: Query/WFE servers; index servers Index-0 to Index-3 (each shown twice); Doc Proc and Enrichment on multiple servers; Crawling, Analytics, Admin, and Central Admin; two Caching servers; two SQL Server instances hosting the Admin, Analytics, Crawl, and Link DBs and other SP DBs. Callouts: 40 million documents, 10 queries/second.]
Capacity testing
Purpose
• Crawling of symbolic links
• Scaling of virtual machines
• Sizing of disk space
• Verify Microsoft’s recommendations
Approach
• 4 server farm with 2 partitions
• 8 vCPU, 16 GB RAM, 850 GB
• Crawl 10 file shares (3.7M files)
• Replay top 300 queries
• Apache JMeter
Capacity testing – findings
• Crawl rate declined 1% per million items indexed
• Query latency increased exponentially beyond 12 million items indexed per partition
• Database latency was insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage was significantly lower than expected
– Reduced data volume from 850 GB to 450 GB
– 40+ servers => huge cost savings
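One way to use the latency finding for sizing is to treat 12 million items per partition as an upper bound and derive a partition count from it. Treating the figure as a hard cap is an assumption of this sketch, not a rule stated in the deck.

```python
import math

# Item count per partition at which query latency started to climb in the test.
LATENCY_KNEE_ITEMS = 12_000_000


def partitions_needed(total_items, max_items_per_partition=LATENCY_KNEE_ITEMS):
    """Smallest partition count that keeps every partition at or below the cap."""
    return math.ceil(total_items / max_items_per_partition)


if __name__ == "__main__":
    for items in (40_000_000, 100_000_000):
        print(items, partitions_needed(items))
```

For 40 million documents this gives four partitions, which is consistent with the Index-0 to Index-3 servers shown in the topology.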
Infrastructure – VM sizing
Dedicated ESX Cluster
• 14 x VM for SharePoint 2013
– 4 physical machines
– 4 x 32 = 128 CPUs
– 4 x 256 = 1024 GB memory
• HA max utilization = ¾
– 3 x 32 = 96 CPUs
– 3 x 256 = 768 GB memory
• CPU and memory can be over-committed
• CPU over-committed 1.34
(1.78 if one physical host fails)
• VMs must wait for physical CPU
Wait time for 8 cpu = 2 x 4 cpu
• Mitigation:
a) Reduce allocated virtual CPU, or
b) Increase physical CPU
• Memory factor 0.44 (0.59)
• Reserved and locked memory
prevents HA failover
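The HA arithmetic on this slide can be checked with a short sketch. It assumes 32 CPUs and 256 GB per physical host (the values implied by the 128 CPU and 1024 GB totals) and assumes that a commit ratio scales with the number of surviving hosts; the function names are illustrative.

```python
def ha_capacity(hosts, per_host, tolerated_failures=1):
    """Capacity that remains when the tolerated number of hosts is down."""
    return (hosts - tolerated_failures) * per_host


def ratio_after_failure(ratio, hosts, failed=1):
    """Commit ratio once the same allocation runs on fewer hosts."""
    return ratio * hosts / (hosts - failed)


if __name__ == "__main__":
    print(ha_capacity(4, 32), ha_capacity(4, 256))
    print(round(ratio_after_failure(0.44, 4), 2))
```

Applied to the memory factor this reproduces the slide's 0.59. Applied to the CPU ratio of 1.34 it gives roughly 1.79 against the slide's 1.78, presumably because the quoted 1.34 is itself rounded.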
Infrastructure – VM tuning
DC  Role                                             vCPU  Peak    Average  Calculated  Recommended  Change
A   Web, Query, Admin                                8     187.55  37.03    2           4            -4
B   Web, Query, Admin                                8     621.88  92.69    8           8            0
A   Crawl, Analytics, Content, CEWS, Central Admin   8     724.35  210.59   8           8            0
B   Crawl, Analytics, Content, CEWS, Symbolic Links  8     724.56  198.44   8           8            0
A   Index 0, Content, CEWS                           8     486.18  62.55    6           6            -2
B   Index 0, Content, CEWS                           8     520.63  63.98    6           6            -2
A   Index 1, Content, CEWS                           8     547.08  69.3     6           6            -2
B   Index 1, Content, CEWS                           8     546.44  91.74    6           6            -2
A   Index 2, Content, CEWS                           8     491.38  65.6     6           6            -2
B   Index 2, Content, CEWS                           8     532.01  77.83    6           6            -2
A   Index 3, Content, CEWS                           8     540.45  78.72    6           6            -2
B   Index 3, Content, CEWS                           8     621.88  92.69    8           8            0
A   Distributed Cache                                4     91.71   5.99     2           2            -2
B   Distributed Cache* (added later)                 -     -       -        -           -            -
    Total                                            100                    78          80           -20
Peak and average CPU usage are calculated over 30 days
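The totals row can be verified by summing the per-VM figures. The sketch below transcribes only the allocated and recommended vCPU columns and does not attempt to reproduce how the "Calculated" values were derived, since the deck does not say.

```python
# (role, allocated vCPU, recommended vCPU), transcribed from the table.
VMS = [
    ("A Web, Query, Admin", 8, 4),
    ("B Web, Query, Admin", 8, 8),
    ("A Crawl", 8, 8),
    ("B Crawl", 8, 8),
    ("A Index 0", 8, 6),
    ("B Index 0", 8, 6),
    ("A Index 1", 8, 6),
    ("B Index 1", 8, 6),
    ("A Index 2", 8, 6),
    ("B Index 2", 8, 6),
    ("A Index 3", 8, 6),
    ("B Index 3", 8, 8),
    ("A Distributed Cache", 4, 2),
]


def totals(vms):
    """Return (allocated, recommended, change) summed over all VMs."""
    allocated = sum(alloc for _, alloc, _ in vms)
    recommended = sum(rec for _, _, rec in vms)
    return allocated, recommended, recommended - allocated


if __name__ == "__main__":
    print(totals(VMS))
```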
Summary
1. Indexing thousands of content sources
2. Automation for rapidly changing index requirements
3. Sizing the infrastructure for performance and HA
Questions?
petter.skodvin-hvammen@adgruppen.no - http://linkedin.com/in/petterskodvin - @pettersh

Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 

ESPC14 380 So you think you can crawl? Stretching the Boundaries of SharePoint 2013!

  • 1. So you think you can crawl? Stretching the Boundaries of SharePoint 2013! Petter Skodvin-Hvammen AD-Gruppen, Norway
  • 2. Who am I? Petter Skodvin-Hvammen Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD • Solutions Architect • SharePoint Consultant • Search Enthusiast • Community Lead @pettersh - psh@adgruppen.no www.adgruppen.no
  • 3. Enterprise Search: Challenges and Solutions • Index thousands of sources • Automate index management • Infrastructure sizing • Not included: code/scripts, user experience, relevancy, governance www.sharepointeurope.com
  • 4. Enterprise Search using SharePoint Server 2013 • 30,000 users • 85 locations in 30 countries • 15,000 daily searches • 100,000,000 documents(?) • 60 core systems, 2,000 applications The Mission…
  • 5. What do we index? • 100,000,000 documents • 3,000 file shares • 500 servers
  • 6. Where is the data? • Datacenters • Time zones • Bandwidth
  • 7. How can we get it? • Limit bandwidth usage for specific server locations • Limit crawler impact within local business hours • Grant read access to crawler per file share • Avoid token bloat issues with more than 1,015* groups per account * http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
  • 8. How do we operate it? • File shares are created, changed, and deleted every day using a custom self-service solution • File shares are moved between servers every day by automation rules • Manage indexing and crawling of each file share with minimum manual effort
  • 9. What can SharePoint do? • Max 50 content sources per service application – Max 500 with October 2013 CU installed • Max 100 start addresses per content source – Max 500 with October 2013 CU installed • Max 20 concurrent crawls per service application – Limitation has been removed http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
  • 10. It’s complicated • More data than we have space for • It’s located all over the place • Everything changes all of the time • There are limitations in SharePoint • Someone’s gotta maintain this • It has to be secure and relevant
  • 11. What did we do? Fewer content sources • Created logical groups of file shares • Used symbolic linking • Diagram: file shares \\file01\share01, \\file02\share03 and \\file03\share03 are linked as \\file00\share\sym01, \\file00\share\sym02 and \\file00\share\sym03; start address: \\file00\share
  • 12. What did we do? Crawling based on time zones • Grouped file shares based on region • One content source per region • Incremental crawls every night
  • 13. What did we do? Reduced crawler impact • Created DNS alias per impact rule in etc/hosts on crawl servers
  • 14. What did we do? Managed pool of crawl accounts • Granted file share access to the account that is a member of the fewest groups • Monitored group memberships • Grouped file shares by crawl account • Crawl rules matched folder structure: file://.*/spcrwl01/.* (Include, account SP\spcrwl01), file://.*/spcrwl02/.* (Include, account SP\spcrwl02)
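The account selection described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name, the pool contents and the idea that each share costs the account one additional group membership are assumptions; only the 1,015 group limit and the "fewest groups" rule come from the slides.

```python
MAX_GROUPS = 1015  # group limit per account cited on the "How can we get it?" slide


def pick_crawl_account(group_counts, groups_needed=1):
    """Return the crawl account with the fewest group memberships that can
    still take `groups_needed` more groups without exceeding MAX_GROUPS.

    `group_counts` maps account name -> current number of group memberships.
    The one-group-per-share cost model is an assumption for illustration.
    """
    candidates = {
        account: count
        for account, count in group_counts.items()
        if count + groups_needed <= MAX_GROUPS
    }
    if not candidates:
        raise RuntimeError("every crawl account is at the group limit; extend the pool")
    # Ties are broken by account name so the choice is deterministic.
    return min(candidates, key=lambda account: (candidates[account], account))


if __name__ == "__main__":
    pool = {"spcrwl01": 1015, "spcrwl02": 310}  # hypothetical membership counts
    print(pick_crawl_account(pool))
```

Monitoring group memberships, as the slide mentions, would amount to refreshing `group_counts` before each assignment.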
  • 15. The bigger picture • Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link> • Start addresses: file://<crawler impact>/<content source>/<crawler impact>
      Source | Start address | Folder | Crawl rule | Impact rule
      Europe | file://default/europe/default | europe/default/spcrwl01 | file://.*/spcrwl01/.* | Default
      Europe | file://default/europe/default | europe/default/spcrwl02 | file://.*/spcrwl02/.* | Default
      Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
      Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
      Asia | file://default/asia/default | asia/default/spcrwl01 | file://.*/spcrwl01/.* | Default
      Asia | file://default/asia/default | asia/default/spcrwl02 | file://.*/spcrwl02/.* | Default
      Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
      Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
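The naming scheme on this slide is regular enough to generate. The sketch below builds the start address, folder and crawl rule strings from their parts; the function names are hypothetical, and only the patterns themselves are taken from the slide.

```python
def start_address(content_source, crawler_impact):
    """file://<crawler impact>/<content source>/<crawler impact>"""
    return f"file://{crawler_impact}/{content_source}/{crawler_impact}"


def symlink_folder(content_source, crawler_impact, crawl_account, symbolic_link=None):
    """<content source>/<crawler impact>/<crawl account>[/<symbolic link>]"""
    parts = [content_source, crawler_impact, crawl_account]
    if symbolic_link:
        parts.append(symbolic_link)
    return "/".join(parts)


def crawl_rule(crawl_account):
    """Include rule that matches every path under a crawl account's folder."""
    return f"file://.*/{crawl_account}/.*"


if __name__ == "__main__":
    # Print one row per (source, impact rule, crawl account) combination.
    for source in ("europe", "asia"):
        for impact in ("default", "wait-60"):
            for account in ("spcrwl01", "spcrwl02"):
                print(start_address(source, impact),
                      symlink_folder(source, impact, account),
                      crawl_rule(account))
```

Because the crawler impact name doubles as the host part of the start address, each impact rule can be tied to its own DNS alias, which is what the hosts-file approach on the earlier slide relies on.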
  • 16. How did we manage this? • Self-service portal for enabling indexing of file shares • Custom web service integration in self-service portal • Custom solution for granting access to crawl accounts • Custom timer job to get list of file shares to crawl from self-service portal • Custom timer job for creating and removing symbolic links • Custom lists for mapping server to content source, schedule and impact, shares to crawl accounts and metadata, UNC to symlink • Content enrichment service for replacing symlinks in paths with actual file paths
  • 17. Example: Self Service Portal • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: Assigned automatically • Crawl Account: Assigned automatically • Buttons: Cancel, Save. Example: Custom Lists • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: \\file01\share01 • Crawl Account: SP\spcrawl01 • Symlink: \\default\europe\default\spcrwl01\e5dc12a41d • Location: europe (server file01 is located in Oslo DC) • Bandwidth: 5 Mbps
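The symlink-to-path replacement performed by the content enrichment service can be sketched as a prefix lookup against the UNC-to-symlink mapping list. This is only an illustration under stated assumptions: the function name, the `file://` URL form of both paths and the sample document path are hypothetical, and the real service would be called through SharePoint's content enrichment callout rather than directly.

```python
def resolve_symlink(path, symlink_map):
    """Replace a leading symlink prefix in `path` with the actual share path.

    `symlink_map` maps symlink prefix -> actual share prefix. Longer prefixes
    are tried first so a nested link wins over its parent. Matching is
    case-sensitive here, which is a simplification.
    """
    for prefix in sorted(symlink_map, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return symlink_map[prefix] + path[len(prefix):]
    return path  # not under any known symlink: leave untouched


if __name__ == "__main__":
    # Hypothetical mapping entry, shaped like the custom list example.
    mapping = {
        "file://default/europe/default/spcrwl01/e5dc12a41d": "file://file01/share01",
    }
    crawled = "file://default/europe/default/spcrwl01/e5dc12a41d/reports/q3.docx"
    print(resolve_symlink(crawled, mapping))
```

Requiring a `/` after the prefix avoids rewriting a path that merely starts with the same characters as a link name.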
  • 18. [Architecture diagram: farm topology with Query/WFE servers, index partitions Index-0 to Index-3 (each shown twice), Doc Proc and Enrichment components, Crawling, Analytics, Admin, Central Admin and Caching roles; sized for 40 million documents and 10 queries/second; SQL Server hosting the Admin DB, Analytics DB, Crawl DB, Link DB and other SP DBs]
  • 19. Capacity testing • Purpose: – Crawling of symbolic links – Scaling of virtual machines – Sizing of disk space – Verify Microsoft’s advice • Approach: – 4-server farm with 2 partitions – 8 vCPU, 16 GB RAM, 850 GB – Crawl 10 file shares (3.7M files) – Replay top 300 queries – Apache JMeter
  • 20. Capacity testing – findings • Crawl rate declined 1% per million items indexed • Query latency increased exponentially from 12 million items indexed per partition • Database latency was insignificant during crawling • Successfully crawled file shares via symbolic directory links • Disk space usage was significantly lower than expected – Reduced data volume from 850 GB to 450 GB – 40+ servers => huge cost savings
  • 21. Infrastructure – VM sizing • Dedicated ESX cluster • 14 x VM for SharePoint 2013 – 4 physical machines – 4 x 32 = 128 CPUs – 4 x 256 = 1024 GB memory • HA max utilization = ¾ – 3 x 32 = 96 CPUs – 3 x 256 = 768 GB memory • CPU and memory can be over-committed • CPU over-committed 1.34 (1.78 if one physical host fails) • VMs must wait for physical CPU: wait time for 8 CPU = 2 x 4 CPU • Mitigation: a) reduce allocated virtual CPU, or b) increase physical CPU • Memory factor 0.44 (0.59) • Reserved and locked memory prevents HA failover
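The arithmetic behind this slide can be checked with a short sketch. The function names are hypothetical, and 256 GB per host is inferred from the 1024 GB and 768 GB totals rather than stated directly. The second function shows why the over-commit factors grow by 4/3 when one of the four hosts fails: the same allocation is spread over three hosts instead of four.

```python
def cluster_capacity(hosts, cpus_per_host, mem_gb_per_host, failed_hosts=0):
    """Physical CPU count and memory (GB) available with `failed_hosts` down."""
    usable = hosts - failed_hosts
    return usable * cpus_per_host, usable * mem_gb_per_host


def ratio_after_failure(ratio, hosts, failed_hosts=1):
    """Scale an allocation/capacity ratio to the reduced host count."""
    return ratio * hosts / (hosts - failed_hosts)


if __name__ == "__main__":
    print(cluster_capacity(4, 32, 256))                  # all hosts up
    print(cluster_capacity(4, 32, 256, failed_hosts=1))  # HA case, one host down
    print(round(ratio_after_failure(1.34, 4), 2))        # CPU over-commit after failure
    print(round(ratio_after_failure(0.44, 4), 2))        # memory factor after failure
```

The scaled CPU value differs from the slide's 1.78 in the second decimal, presumably because the slide's figures were rounded from the underlying allocation before scaling.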
  • 22. Infrastructure – VM tuning (peak and average CPU usage is calculated over 30 days)
      DC | Role | vCPU | Peak | Average | Calculated | Recommended | Change
      A | Web, Query, Admin | 8 | 187.55 | 37.03 | 2 | 4 | -4
      B | Web, Query, Admin | 8 | 621.88 | 92.69 | 8 | 8 | 0
      A | Crawl, Analytics, Content, CEWS, Central Admin | 8 | 724.35 | 210.59 | 8 | 8 | 0
      B | Crawl, Analytics, Content, CEWS, Symbolic Links | 8 | 724.56 | 198.44 | 8 | 8 | 0
      A | Index 0, Content, CEWS | 8 | 486.18 | 62.55 | 6 | 6 | -2
      B | Index 0, Content, CEWS | 8 | 520.63 | 63.98 | 6 | 6 | -2
      A | Index 1, Content, CEWS | 8 | 547.08 | 69.3 | 6 | 6 | -2
      B | Index 1, Content, CEWS | 8 | 546.44 | 91.74 | 6 | 6 | -2
      A | Index 2, Content, CEWS | 8 | 491.38 | 65.6 | 6 | 6 | -2
      B | Index 2, Content, CEWS | 8 | 532.01 | 77.83 | 6 | 6 | -2
      A | Index 3, Content, CEWS | 8 | 540.45 | 78.72 | 6 | 6 | -2
      B | Index 3, Content, CEWS | 8 | 621.88 | 92.69 | 8 | 8 | 0
      A | Distributed Cache | 4 | 91.71 | 5.99 | 2 | 2 | -2
      B | Distributed Cache* (added later) | - | - | - | - | - | -
      Total | | 100 | | | 78 | 80 | -20
  • 23. Summary 1. Indexing thousands of content sources 2. Automation for rapidly changing index requirements 3. Sizing the infrastructure for performance and HA
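The slides do not say how the "Calculated" column was derived. One rule that happens to reproduce every row is to treat Peak as a percentage of a single core, round up to whole cores, then round up to the next even number. The sketch below checks that inferred rule against the table; it is an observation about the published numbers, not the presenter's documented method.

```python
import math


def calculated_vcpus(peak_cpu_percent):
    """Inferred rule: whole cores needed at peak, rounded up to an even count.

    Assumes Peak is expressed as a percentage of one core (so 800 would mean
    eight fully busy cores); the slides do not state the unit.
    """
    cores = math.ceil(peak_cpu_percent / 100)
    return cores + cores % 2  # add one core when the count is odd


# (peak, calculated) pairs copied from the table, decimal commas converted.
ROWS = [
    (187.55, 2), (621.88, 8), (724.35, 8), (724.56, 8),
    (486.18, 6), (520.63, 6), (547.08, 6), (546.44, 6),
    (491.38, 6), (532.01, 6), (540.45, 6), (621.88, 8),
    (91.71, 2),
]

if __name__ == "__main__":
    for peak, expected in ROWS:
        print(peak, calculated_vcpus(peak), expected)
    print(sum(calculated_vcpus(peak) for peak, _ in ROWS))
```

The "Recommended" column departs from this rule only for the first web server (4 instead of 2), so it reflects a separate judgment that the slides do not explain.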