Optimizing SharePoint for Transactional Content Management

While SharePoint 2010 and 2013 have a wide range of great document management features, organizations that need "transactional content management" (such as invoices, purchase orders, claims, registration forms or other high-volume documents related to a business process or transaction) face numerous challenges in optimizing SharePoint for this purpose. This presentation covers how best to configure and optimize SharePoint for this type of document management.

  • Introduction slide
  • Transactional docs: Users need graphical "query builders" that let them combine multiple search conditions. Queries should search metadata only, not keywords or document content. Queries should return exactly the results the user specified; the system should not try to "figure out" what the user really wanted based on ratings or other social algorithms, or by removing duplicates. Results are displayed in tabular format, with the default sort order determined by the user or admin, and with ad-hoc sorting by any column.
    SharePoint docs: Users find documents using Search. A query builder is not available out of the box (OOB), but is available from 3rd party vendors. Search looks at both metadata and document content. Search tries to be "intelligent" and figure out what you really want. Search results are formatted like a Google or Bing search. The search engine decides how to order the results, and users cannot re-sort them. Metadata Navigation supports tabular results with sorting, but supports only a limited set of column types (e.g. single line of text is not supported). Also, with large libraries, it searches only the most recent 1,000 rows or so (fallback queries).
  • Conclusion slide
  • SharePint

Transcript

  • 1. Optimizing SharePoint Search for Transactional Content Management (TCM)
  • 2. About Hershey Technologies…
    • Founded in 1991
    • Microsoft Partner
    • Specialists in
      • Document Imaging / Scanning
      • OCR (data and document capture)
      • ECM
      • BPM / workflow
    • End to End SharePoint Consulting Services
    • Follow us on Twitter: @HersheyTech
  • 3. About Tom Castiglia…
    • Principal at Hershey Technologies
    • Twitter: @tomcastiglia
    • Email: tcastiglia@hersheytech.com
    • Joined HersheyTech in 1998
    • Director of Hershey’s professional services team since 2001
  • 4. Agenda
    • Explanation of “Transactional Content Management” (TCM)
    • Overview of SharePoint features that are relevant to TCM
    • How to make SharePoint support TCM
    • Demo of solutions that fill the feature gaps to ensure SharePoint is successful for your transactional content management project
      • Ad-hoc scanning / document capture into SharePoint
      • Optimizing SharePoint search for large scale TCM deployments
      • Enable collaboration of static, transactional documents
      • Make scanned images and PDF documents a 1st class citizen within SharePoint
  • 5. Topics not covered in this presentation
    • Assumptions - I presume that you understand:
      • Columns (document metadata)
      • Content Types
      • Document Libraries
    • Other topics not covered (just not enough time to include):
      • Automated Data Capture/OCR
      • Records Management
      • Workflow
      • RBS
  • 6. Enterprise Content Management in SharePoint
    • Web Content: SharePoint Rocks at this!
    • Document Collaboration: SharePoint Rocks at this!
    • Transactional Documents: SharePoint needs a little help here
  • 7. What is “Transactional Content Management”?
    • “high-volume throughput of relatively static documents”
    • “content which typically originates outside an organization from external parties – customers or partners – and relies on workflow or business process management (BPM) to drive transactional, back-office business processes.”
    - Forrester Research
  • 8. Typical types of documents
    TRANSACTIONAL DOCUMENTS
    • Purchase Orders
    • Vendor Invoices
    • Application Forms
    • Insurance claims
    • Student Records
    • Enrollment Forms
    • (Not project based)
    COLLABORATIVE DOCUMENTS
    • Proposals, reports, spreadsheets, presentations and other documents created and edited by knowledge worker users
    • Office docs (Word, Excel, PowerPoint)
    • PDF files
    • Created and uploaded on an ad-hoc basis to support day to day operations
    • (Often project based)
  • 9. How documents are typically received
    TRANSACTIONAL DOCUMENTS (diagram labels)
    • Fax Server
    • Invoices@mycompany.com
    • Orders@mycompany.com
    • OCR Form Processing
    • External Systems (AP, claims, etc.)
  • 10. Information Architecture
    TRANSACTIONAL CONTENT
    • Centralized
    • Often isolated to just one or a few site collections
    • Document Center or Record Center
    • Thousands to millions of documents per library
    COLLABORATIVE CONTENT
    • Decentralized
    • Documents are often spread throughout many site collections, sub-sites, libraries and content types
    • Typically under 5K documents per library.
  • 11. How users find documents
    TRANSACTIONAL DOCUMENTS
    • Navigation doesn’t work - too many documents per library
    • Search via metadata queries only
      • Ignore document content
      • Ignore social based algorithms like ratings
    • Users expect intuitive, graphical query builders to specify precise search conditions against one or more metadata fields.
    COLLABORATION SCENARIOS
    • Navigation
      • Site > Sub-Site > Library > Folder > Document
    • Keyword Search
      • Searches both metadata and document content
      • Use of social algorithms improves search results (e.g. highly rated documents are returned above other documents)
  • 12. How users find documents
    TRANSACTIONAL DOCUMENT SEARCH
    TYPICAL SHAREPOINT SEARCH
  • 13. What about Metadata Navigation and Filtering?
    • This native SharePoint feature does provide a limited query builder …
      • Allows users to query against specific SharePoint columns and choose various search operators (Equals, At Most, At Least, On, Before, etc.)
      • Filters document library providing results in a sortable, tabular display.
  • 14. Limits of Metadata Navigation and Filtering
    • Doesn’t support text columns
      • Transactional documents generally need text based columns for fields like InvoiceNumber, PONumber, VendorId, ClaimNumber, etc.
    • Doesn’t scale well for libraries that exceed the list view threshold (5,000 documents by default)
  • 15. Integrating Metadata with Search
    Metadata Columns → Crawled Properties → Managed Properties → Search Results
  • 16. Four Challenges to Transactional Content Management in SharePoint
    • Configuring Managed Properties in SharePoint Search is more complex than it needs to be.
    • SharePoint does not provide a robust query builder for users to intuitively query documents (other ECM solutions offer this OOB)
    • SharePoint formats Search results like a search engine, not like a document management product.
    • SharePoint treats PDF documents and scanned images as a 2nd class citizen.
  • 17. Crawled Properties
    • Crawled properties are metadata (such as author, title, or subject) that are extracted from SharePoint columns during crawls.
    • However, this is the internal representation of the metadata. To enable users to search on this metadata, we need to use managed properties that are mapped to the crawled properties.
  • 18. Crawled Properties
    • A new crawled property is created for each new custom column, after…
      • The column is added to at least one list or library
      • The column is populated with a value in at least one item
      • A Full Crawl is performed
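
The three conditions on this slide can be read as a simple checklist. The sketch below is only an illustration of that checklist: `ColumnState` and its fields are hypothetical names invented here, not part of any SharePoint API.

```python
from dataclasses import dataclass


@dataclass
class ColumnState:
    """Hypothetical snapshot of a custom column (not a SharePoint type)."""
    lists_using_column: int        # lists/libraries the column was added to
    items_with_value: int          # items where the column holds a value
    full_crawl_since_change: bool  # a Full Crawl ran after the two steps above


def crawled_property_expected(state: ColumnState) -> bool:
    """True only when all three conditions from the slide are met."""
    return (
        state.lists_using_column >= 1
        and state.items_with_value >= 1
        and state.full_crawl_since_change
    )


# A column that exists in a library but has never been populated.
empty_column = ColumnState(lists_using_column=1, items_with_value=0,
                           full_crawl_since_change=True)
print(crawled_property_expected(empty_column))
```

This is a handy mental check when a new column does not show up in the crawled property list: one of the three inputs is usually still missing.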
  • 19. Crawled Properties - Categories
    • All crawled properties are grouped into various categories.
    • For Transactional Content Management solutions, we generally care about the “SharePoint” category, which contains crawled properties that are tied to list columns in SharePoint.
    • Accessible from Search Service Application: Metadata Properties > Categories
  • 20. Crawled Properties
    • The naming convention is fully controlled by SharePoint, using this convention: ows_[internal name of column]
    • However, spaces or other symbols (.-!@#$%^, etc.) within the internal column name are escaped, such as:
      Column Internal Name | Crawled Property Name
      InvoiceNumber        | ows_InvoiceNumber
      Invoice Number       | ows_Invoice_x0020_Number
      Invoice.Number       | ows_Invoice_x002e_Number
      Invoice-Number       | ows_Invoice_x002d_Number
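
To predict a crawled property name before looking it up, the escaping shown on this slide can be approximated in a few lines. This is a minimal sketch inferred from the four examples above: the function name is made up for illustration, and the handling of anything other than ASCII letters, digits and underscores is an assumption.

```python
def crawled_property_name(internal_name: str) -> str:
    """Return 'ows_' plus the internal name, escaping symbols as _xNNNN_."""
    escaped = []
    for ch in internal_name:
        if ch == "_" or (ch.isascii() and ch.isalnum()):
            escaped.append(ch)  # letters, digits and underscores pass through
        else:
            # Assumption: every other character becomes its 4-digit hex code,
            # e.g. a space (0x20) becomes _x0020_.
            escaped.append(f"_x{ord(ch):04x}_")
    return "ows_" + "".join(escaped)


for name in ["InvoiceNumber", "Invoice Number", "Invoice.Number", "Invoice-Number"]:
    print(name, "->", crawled_property_name(name))
```

Always confirm the real name in the Search Service Application after a full crawl; the sketch only mirrors the pattern in the table.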
  • 21. Crawled Properties
    • In SP2010, most SharePoint columns get one crawled property
    • Managed Metadata columns get a 2nd crawled property, with a prefix of “ows_taxid”
    • This extra crawled property is used to store the internal GUID value that is associated with the managed metadata term. For example:
      Column Name: CostCenter
      Normal Crawled Property: ows_CostCenter
      MM Id Crawled Property: ows_taxid_CostCenter
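
As a quick illustration of the CostCenter example, the helper below lists the crawled properties to expect for a column in SP2010. It is a sketch only: the function name is hypothetical, the prefix follows the spelling on this slide, and it assumes an internal name with no characters that need escaping.

```python
def crawled_properties_for_column(internal_name: str,
                                  is_managed_metadata: bool = False) -> list:
    """List the crawled property names expected for one column (SP2010)."""
    names = [f"ows_{internal_name}"]  # the normal crawled property
    if is_managed_metadata:
        # Second property that stores the term's internal GUID value.
        names.append(f"ows_taxid_{internal_name}")
    return names


print(crawled_properties_for_column("CostCenter", is_managed_metadata=True))
print(crawled_properties_for_column("InvoiceNumber"))
```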
  • 22. Managed Properties…
    • …Allow you to enable standardization in the terms used for searching SharePoint.
    • …Represent the end-user’s vision of the SP taxonomy (at least with regards to Search)
    • So the names of your managed properties should normally be something intuitive to your end-users
  • 23. Managed Properties
    • One managed property may be mapped to one or more crawled properties.
      • Useful in low governance situations where multiple site owners or site collection admins have duplicated site columns using different names (e.g. InvoiceNumber vs ‘Invoice Number’)
    • One crawled property may be mapped to one or more managed properties
      • Useful if different applications create their own managed properties, and need to reference the same crawled property.
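
The many-to-many relationship on this slide is easy to picture as two lookup tables. The sketch below uses plain dictionaries; the managed property name `APInvoiceNo` is invented for illustration, and nothing here calls SharePoint.

```python
from collections import defaultdict

# Hypothetical mappings: managed property -> crawled properties it draws from.
MANAGED_TO_CRAWLED = {
    # One managed property covering two duplicated site columns.
    "InvoiceNumber": ["ows_InvoiceNumber", "ows_Invoice_x0020_Number"],
    # A second application's managed property reusing the same crawled property.
    "APInvoiceNo": ["ows_InvoiceNumber"],
}


def invert(mapping: dict) -> dict:
    """Build the reverse view: crawled property -> managed properties."""
    inverted = defaultdict(list)
    for managed, crawled_list in mapping.items():
        for crawled in crawled_list:
            inverted[crawled].append(managed)
    return dict(inverted)


CRAWLED_TO_MANAGED = invert(MANAGED_TO_CRAWLED)
for crawled, managed in CRAWLED_TO_MANAGED.items():
    print(crawled, "->", managed)
```

Keeping an inventory like this outside SharePoint can help during governance reviews, since the admin pages show only one direction at a time.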
  • 24. Using Managed Properties
    WITHOUT MANAGED PROPERTIES
    • Returns 16 items, only 6 of which are related to what I wanted.
    • Included other documents that happen to contain the StudentId value either as text in the document or in some other field (like an Invoice Number, or something else)
    WITH MANAGED PROPERTIES
    • Returns only the 6 correct items
  • 25. Advanced Search Web Part
    • Provides an OOB search interface that allows users to select a Managed Property from a drop down list, rather than having to type out the managed property name (e.g. “StudentID:” or “StudentID=”)
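
For readers who have not typed a property restriction by hand, here is roughly what a query builder has to produce behind the scenes. In SharePoint's Keyword Query Language, `Property:value` matches values containing the term and `Property=value` requires an exact match. The sketch below only assembles the query text; the function name and the `DocType` managed property are assumptions for illustration.

```python
ALLOWED_OPERATORS = {":", "=", "<", ">", "<=", ">=", "<>"}


def build_kql(conditions, joiner="AND"):
    """Combine (managed_property, operator, value) tuples into one KQL string."""
    clauses = []
    for prop, op, value in conditions:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        text = str(value)
        if any(c.isspace() for c in text):
            # Values containing whitespace are wrapped in double quotes.
            text = '"' + text.replace('"', "") + '"'
        clauses.append(f"{prop}{op}{text}")
    return f" {joiner} ".join(clauses)


query = build_kql([
    ("StudentID", "=", "12345"),        # exact match on a managed property
    ("DocType", ":", "Transcript"),     # hypothetical managed property
])
print(query)
```

The resulting text is what a user would otherwise have to type into the search box, which is why a drop-down or graphical builder matters for transactional users.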
  • 26. Configuring Advanced Search Web Part
    • Use your favorite XML editor (VS 2012)
  • 27. Using Advanced Search Web Part
  • 28. Creating Managed Properties
    • Unlike crawled properties (which are always auto-generated by SharePoint), managed properties can be created in one of three ways…
  • 29. Creating Managed Properties (Option 1)
    • Managed properties can be created manually by a SharePoint Administrator from the Search Service Application configuration.
      • SP2010: “Metadata Properties” link
      • SP2013: “Search Schema” link
  • 30. Creating Managed Property (SP2010)
    • Click “New Managed Property” link from Metadata Property Mappings
    • Property Name can contain most characters, except for spaces (but please don’t use special characters)
    • Based on the selected type, this managed property can only be mapped to crawled properties with the same type.
    • Add Mapping – Select 1 or more crawled properties to map to this managed property.
      • If multiple are selected, decide whether to include all values or just the first one found
    • Scopes – preset filter on content – like a global where clause
    • Reduce storage requirements (“hash”) – option actually works in reverse to what is stated.
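
The "include all values or just the first one found" choice is easier to see with a small model. The sketch below is a simplified illustration of that choice, not how the search engine is implemented; the function and parameter names are hypothetical.

```python
def managed_property_values(item_crawled_values: dict,
                            mapped_crawled_props: list,
                            include_all: bool) -> list:
    """Resolve a managed property's value(s) for one item.

    mapped_crawled_props is in mapping order; empty or missing values are skipped.
    """
    found = [
        item_crawled_values[name]
        for name in mapped_crawled_props
        if item_crawled_values.get(name) not in (None, "")
    ]
    # include_all=True keeps every value; otherwise only the first one found.
    return found if include_all else found[:1]


item = {"ows_InvoiceNumber": "INV-1001", "ows_Invoice_x0020_Number": "INV-1001-B"}
mapping_order = ["ows_Invoice_x0020_Number", "ows_InvoiceNumber"]

print(managed_property_values(item, mapping_order, include_all=True))
print(managed_property_values(item, mapping_order, include_all=False))
```

With "first one found", the order of the mappings decides which duplicated column wins, so it is worth listing the most trusted column first.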
  • 31. Creating Managed Property (SP2013)
    • Property Name - Same as SP2010
    • Add Mapping - Same as in SP2010
    • Reduce storage requirements (“hash”) option - No longer exists in SP2013
    • Many additional settings
      • Searchable – Enables querying against the content of the managed property
      • Queryable – Enables querying against the specific managed property
      • Retrievable – Enable this setting for managed properties that are relevant to present in search results.
      • Refinable – Can be used as a search refiner
      • Sortable –
      • Token Normalization
      • Complete Matching
  • 32. Creating Managed Properties (Option 2)
    • Automatically generated by custom code or a 3rd party application
    • For example, Hershey’s XenDocs ECM for SharePoint will validate that a managed property is properly configured or automatically create a managed property for each column when our web part is configured.
  • 33. Creating Managed Properties (Option 3)
    • Let SharePoint auto-generate new managed properties when it crawls
  • 34. Auto-Generating Managed Properties
    • In SharePoint 2010…
      • This feature is off by default, but it can be enabled in your Search Service Application
      • From the Categories list, hover over the SharePoint category, click the drop down arrow and then select the Edit Category option.
      • Select the option to “automatically generate a new managed property for each crawled property…”
  • 35. Auto-Generating Managed Properties
    • In SharePoint 2013…
      • All site columns that contain data will have a managed property auto-generated upon a full crawl
      • This does not happen for list columns
      • This feature cannot be turned off and is not configurable (as far as I can tell)
      • http://technet.microsoft.com/en-us/library/jj613136.aspx
  • 36. Comparison of Naming Conventions for Crawled and Auto-Generated Managed Properties
    Column Name | SP2010 Crawled Property | SP2010 Managed Property | SP2013 Crawled Properties | SP2013 Managed Property
    FooBar      | ows_FooBar              | owsFooBar1              | ows_FooBar                | Not mapped
                |                         |                         | ows_q_TEXT_FooBar         | FooBarOWSTEXT
    Foo Bar     | ows_Foo_x0020_Bar       | owsFoox0020Bar          | ows_Foo_x0020_Bar         | FooBarOWSTEXT
    Foo_Bar     | ows_Foo_Bar             | owsFooBar               | ows_Foo_Bar               | Not mapped
                |                         |                         | ows_q_TEXT_Foo_Bar        | FooBarOWSTEXT
    Foo-Bar     | ows_Foo_x002d_Bar       | owsFoox002dBar          | ows_Foo-Bar               | Not mapped
                |                         |                         | ows_q_TEXT_Foo-Bar        | Foo-BarOWSTEXT
    Foo.Bar     | ows_Foo_x002e_Bar       | owsFoox002eBar          | ows_Foo.Bar               | Not mapped
                |                         |                         | ows_q_TEXT_Foo.Bar        | Foo.BarOWSTEXT
    The auto-generated names for managed properties are not “end-user friendly”!
  • 37. Enhancing the User Experience…
  • 38. Hershey’s XenDocs ECM for SharePoint
    • A vast improvement compared to the native Advanced Search Web Part
  • 39. Viewing PDF files and scanned images
    • MS Office documents are 1st class citizens in SharePoint
      • When Office files are opened in Office 2007, 2010 or 2013, users can perform many SharePoint functions on those documents:
        • Edit document content
        • Check in/out/discard
        • See version history
        • Edit metadata
        • Preview Thumbnails in SP 2013
    • Most other file types, especially PDF files and scanned images, are 2nd class citizens
      • Read only view of document
  • 40. Viewing Scanned Images and/or PDF files
    • Files typically open in native apps such as Windows Photo Gallery or Adobe Reader
    • Users cannot edit metadata
    • If a user rotates, re-orders or deletes a page, the changes cannot be saved to SP
    • Users cannot annotate pages (e.g. sticky notes, redactions, etc.)
  • 41. Vizit Essential™ - Integrated viewing of scanned images and/or PDF documents in SharePoint
    • A powerful, low cost PDF and imaging viewer for SharePoint
  • 42. Vizit Essential - Integrated viewing of scanned images and/or PDF documents in SharePoint
    • Visually search documents with thumbnails and quick previews
  • 43. Vizit Essential - Integrated viewing of scanned images and/or PDF documents in SharePoint
    • Search for text within a PDF file (just like Adobe Reader/Acrobat)
  • 44. Vizit Essential - Integrated viewing of scanned images and/or PDF documents in SharePoint
    • Edit SharePoint metadata within the viewer for PDF documents and scanned images
  • 45. Vizit Pro™ - Integrated viewing of scanned images and/or PDF documents in SharePoint
    • Adds robust image editing features – annotations, re-order, rotate or delete pages, image cleanup
  • 46. Conclusion
    • To leverage SharePoint’s native features for transactional document management…
      • Extensive upfront planning
      • Complex configuration (many more steps to configure SP compared to most dedicated document management products)
    • To make the overall user experience in SharePoint comparable with dedicated Document Management products, plan on:
      • Lots of custom code ... OR …
      • 3rd party solutions
  • 47. Join us right after the event at the Firehouse Grill! Socialize and unwind after our day of learning. 1765 E. Bayshore Road, East Palo Alto, CA