SPS Phoenix Optimizing SharePoint for Transactional Content Management

Transactional content consists of documents that an organization receives from external parties, such as vendors, customers and partners. Common examples of transactional documents are vendor invoices, purchase orders from clients, claim forms, application and enrollment forms, and almost any document that a company receives on paper and needs to scan in.

While SharePoint is an excellent platform for Enterprise Content Management, there are a few challenges in using SharePoint to manage transactional content.

This presentation addresses some of the more obscure (yet complex) out-of-the-box features that can be configured in SharePoint to improve management of transactional content, along with some software products from Hershey Technologies that simplify and enhance the user experience.

  • Introduction slide
  • Transactional Docs: Users need graphical "query builders" that allow them to combine multiple search conditions. Queries should search metadata only, not keywords or content in the document. Queries should return the exact results specified by the user. The system should not attempt to "figure out" what the user really wanted based on ratings or other social algorithms, or by removing duplicates, etc. Results are displayed in tabular format, with the default sort order determined by the user or admin, and allow ad-hoc sorting by any column.
    SharePoint Docs: Users find documents using Search. A query builder is not available OOB, but is available through 3rd party vendors. Search looks at both metadata and document content. Search tries to be "intelligent" and figure out what you really want. Search results are formatted like a Google or Bing search. The search engine decides how to order the results, and results cannot be re-sorted by the user. Metadata Navigation supports tabular results with sorting, but is limited in terms of which column types are supported (e.g. single line of text is not supported). Also, with large libraries, it only searches through the most recent 1,000 rows or so (fallback queries).

Presentation Transcript

  • Optimizing SharePoint for Transactional Content Management
  • » Principal at Hershey Technologies ˃ Twitter: @tomcastiglia ˃ Email: tcastiglia@hersheytech.com » Joined Hershey Tech in 1998 » This is my 7th SharePoint Saturday » Director of Hershey’s professional services team since 2001 » Founding member of San Diego SharePoint User Group (@sanspug) » Founding member of San Diego .NET User Group
  • » Founded in 1991 » Microsoft Partner » Specialists in ˃ End to End SharePoint Consulting Services ˃ Document Imaging / Scanning ˃ OCR (data and document capture) ˃ ECM / Document Management ˃ BPM / workflow » SharePoint ISV ˃ XenDocs ECM for SharePoint » Follow us on Twitter: @HersheyTech
  • Intuitive Document Query Builder
  • Middleware component (Windows Service) to integrate content from multi-function scanners, fax servers and reporting apps with SharePoint
  • » Explanation of “Transactional Content Management” (TCM) » Overview of SharePoint features that are relevant to TCM » How to make SharePoint support TCM » Demo of solutions that fill the feature gaps to ensure SharePoint is successful for your transactional content management project ˃ Ad-hoc scanning / document capture into SharePoint ˃ Optimizing SharePoint search for large scale TCM deployments ˃ Enable collaboration of static, transactional documents ˃ Make scanned images and PDF documents a 1st class citizen within SharePoint
  • » Assumptions - I presume that you understand: ˃ Columns (document metadata) ˃ Content Types ˃ Document Libraries » Other topics not covered (just not enough time to include): ˃ Automated Data Capture/OCR ˃ Records Management ˃ Workflow ˃ RBS
  • » Web Content: SharePoint rocks at this! » Document Collaboration: SharePoint rocks at this! » Transactional Documents: SharePoint needs a little help here
  • “high-volume throughput of relatively static documents” “content which typically originates outside an organization from external parties – customers or partners – and relies on workflow or business process management (BPM) to drive transactional, back-office business processes.” - Forrester Research
  • » Capturing content from MFPs & Fax servers » Indexing scanned documents is clumsy » Configuring Metadata Taxonomy for Search requires unique expertise » Lacks intuitive metadata query driven document search » Treats scanned images and PDF files as a “2nd class citizen” (compared to MS Office documents)
  • Transactional Documents » Purchase Orders » Vendor Invoices » Application Forms » Insurance claims » Student Records » Enrollment Forms » (Not project based)
    Collaborative Documents » Proposals, reports, spreadsheets, presentations and other documents created and edited by knowledge worker users ˃ Office docs (Word, Excel, PowerPoint) ˃ PDF files » Created and uploaded on an ad hoc basis to support day to day operations » (Often project based)
  • Transactional Documents (capture diagram) » Sources: Invoices@mycompany.com, Orders@mycompany.com, External Systems (AP, claims, etc.), Fax Server » Processing: OCR, Page Rotation, Barcode Rec., Doc Sep., Form Processing
  • Transactional Content » Centralized » Often isolated to just one or a few site collections ˃ Document Center or Record Center » Thousands to millions of documents per library
    Collaborative Content » Decentralized » Documents are often spread throughout many site collections, subsites, libraries and content types » Typically under 5K documents per library
  • Collaboration Scenarios » Navigation ˃ Site > SubSite > Library > Folder > Document » Keyword Search ˃ Searches both metadata and document content ˃ Use of social algorithms improves search results (e.g. highly rated documents are returned above other documents)
    Transactional Documents » Navigation doesn’t work - too many documents per library » Search via metadata queries only ˃ Ignore document content ˃ Ignore social based algorithms like ratings » Users expect intuitive, graphical query builders to specify precise search conditions against one or more metadata fields.
  • Typical SharePoint search vs. Transactional Document search
  • » This native SharePoint feature (Metadata Navigation) does provide a limited query builder … ˃ Allows users to query against specific SharePoint columns and choose various search operators (Equals, At Most, At Least, On, Before, etc.) ˃ Filters the document library, providing results in a sortable, tabular display.
  • » Doesn’t support text columns » Transactional documents generally need text based columns for fields like InvoiceNumber, PONumber, VendorId, ClaimNumber, etc. » Doesn’t scale well for libraries that exceed the list view threshold (5,000 documents by default)
  • Metadata Columns → Crawled Properties → Managed Properties → Search Results
  • » Configuring Managed Properties in SharePoint Search is more complex than it needs to be. » SharePoint does not provide a robust query builder for users to intuitively query documents (other ECM solutions offer this OOB) » SharePoint formats Search results like a search engine, not like a document management product. » SharePoint treats PDF documents and scanned images as a 2nd class citizen.
  • » Crawled properties are metadata (such as author, title, or subject) that are extracted from SharePoint columns during crawls. » However, this is the internal representation of the metadata. To enable users to search on this metadata, we need to use managed properties that are mapped to the crawled properties.
  • » A new crawled property is created for each new custom column, after… ˃The column is added to at least one list or library ˃The column is populated with a value in at least one item ˃A Full Crawl is performed
  • » All Crawled properties are grouped into various categories. » For Transactional Content Management solutions, we generally care about the “SharePoint” Category, which contains crawled properties that are tied to list columns in SharePoint. » Accessible from Search Service Application: Metadata Properties>Categories
  • » The naming convention is fully controlled by SharePoint, using this convention: ˃ ows_[internal name of column] » However, spaces or other symbols (.-!@#$%^, etc.) within the internal column name are escaped, such as:
      Column Internal Name    Crawled Property Name
      InvoiceNumber           ows_InvoiceNumber
      Invoice Number          ows_Invoice_x0020_Number
      Invoice.Number          ows_Invoice_x002e_Number
      Invoice-Number          ows_Invoice_x002d_Number
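The escaping rule on this slide can be sketched in a few lines. This is an illustration in Python rather than SharePoint code. The function name `crawled_property_name` is hypothetical, and the rule it applies (every character other than an ASCII letter, digit or underscore becomes `_xHHHH_`, with a 4-digit lowercase hex code) is an assumption inferred from the examples above, not a documented specification.

```python
import re


def crawled_property_name(internal_name):
    """Return the 'ows_' crawled property name for a column's internal name.

    Assumption: any character that is not an ASCII letter, digit or
    underscore is escaped as _xHHHH_ (lowercase hex of its code point),
    which reproduces the examples shown on the slide.
    """
    def escape(match):
        return "_x{:04x}_".format(ord(match.group(0)))

    return "ows_" + re.sub(r"[^A-Za-z0-9_]", escape, internal_name)


for name in ["InvoiceNumber", "Invoice Number", "Invoice.Number", "Invoice-Number"]:
    print(name, "->", crawled_property_name(name))
```

A helper like this is handy when you need to predict which crawled property to map before the first full crawl has run.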
  • » In SP2010, most SharePoint columns get one crawled property ˃ Managed Metadata columns get a 2nd crawled property, with a prefix of “ows_taxid” » This extra crawled property is used to store the internal GUID value that is associated with the managed metadata term. For example:
      Column Name:               CostCenter
      Normal Crawled Property:   ows_CostCenter
      MM Id Crawled Property:    ows_taxid_CostCenter
  • Managed properties… » …Allow you to standardize the terms used for searching SharePoint. » …Represent the end-user’s vision of the SP taxonomy (at least with regards to Search) ˃ So the names of your managed properties should normally be something intuitive to your end-users
  • » One managed property may be mapped to one or more crawled properties. ˃ Useful in low governance situations where multiple site owners or site collection admins have duplicated site columns using different names (e.g. InvoiceNumber vs ‘Invoice Number’) » One crawled property may be mapped to one or more managed properties ˃ Useful if different applications create their own managed properties, and need to reference the same crawled property.
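The many-to-many relationship can be pictured as a simple lookup table. The sketch below is a toy model in Python, not the SharePoint API. The `MAPPINGS` table, the `InvoiceNo` property and the `include_all` flag are hypothetical names chosen for illustration.

```python
# Hypothetical mapping table: managed property -> crawled properties (in priority order).
MAPPINGS = {
    # One managed property covering two differently named site columns.
    "InvoiceNumber": ["ows_InvoiceNumber", "ows_Invoice_x0020_Number"],
    # A second managed property that reuses the same crawled property.
    "InvoiceNo": ["ows_InvoiceNumber"],
}


def managed_values(managed_name, crawled_values, include_all=False):
    """Resolve a managed property for one document.

    crawled_values maps crawled property names to the values found on the
    document. Returns every mapped value when include_all is True,
    otherwise only the first one found.
    """
    found = [crawled_values[c] for c in MAPPINGS[managed_name] if c in crawled_values]
    return found if include_all else found[:1]


doc_a = {"ows_InvoiceNumber": "INV-1001"}          # library using "InvoiceNumber"
doc_b = {"ows_Invoice_x0020_Number": "INV-2002"}   # library using "Invoice Number"

print(managed_values("InvoiceNumber", doc_a))  # ['INV-1001']
print(managed_values("InvoiceNumber", doc_b))  # ['INV-2002']
print(managed_values("InvoiceNo", doc_a))      # ['INV-1001']
```

Either way the column was named, a user querying the single managed property finds both documents.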
  • » Without managed properties: returns 16 items, only 6 of which are related to what I wanted. The results included other documents that happen to contain the StudentId value, either as text in the document or in some other field (like an Invoice Number, or something else) » With managed properties: returns only the 6 correct items
  • Provides an OOB search interface that allows users to select a Managed Property from a drop down list, rather than having to type out the managed property name (e.g. “StudentID:” or “StudentID=“)
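A property-restricted query such as `StudentID=12345` is just a string, so a query builder mostly has to assemble conditions safely. The following Python sketch shows one way to do that. The function names and the `DocType` property are hypothetical, and only the simple `Property<operator>value` form joined with `AND` is covered.

```python
ALLOWED_OPERATORS = {":", "=", "<", ">", "<=", ">=", "<>"}


def kql_condition(prop, op, value):
    """Build one property restriction, e.g. StudentID=12345."""
    if op not in ALLOWED_OPERATORS:
        raise ValueError("unsupported operator: " + op)
    text = str(value)
    if '"' in text:
        # Quote escaping is out of scope for this sketch.
        raise ValueError("double quotes are not supported in values")
    if any(ch.isspace() for ch in text):
        text = '"' + text + '"'  # phrases must be quoted
    return prop + op + text


def kql_query(conditions):
    """Combine (property, operator, value) tuples so that all must match."""
    return " AND ".join(kql_condition(*c) for c in conditions)


query = kql_query([("StudentID", "=", "12345"), ("DocType", ":", "Enrollment Form")])
print(query)  # StudentID=12345 AND DocType:"Enrollment Form"
```

Because every condition names a managed property, the query ignores matches that merely appear in document text, which is the behavior transactional users expect.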
  • Use your favorite XML editor (VS 2012)
  • Unlike crawled properties (which are always auto-generated by SharePoint), managed properties can be created in one of three ways…
  • Managed properties can be created manually by a SharePoint Administrator from the Search Service Application configuration. » SP2010: “Metadata Properties” link » SP2013: “Search Schema” link
  • » Click “New Managed Property” link from Metadata Property Mappings ˃ Property Name can contain most characters, except for spaces (but please don’t use special characters) ˃ Based on the selected type, this managed property can only be mapped to crawled properties with the same type. ˃ Add Mapping – Select 1 or more crawled properties to map to this managed property. + If multiple are selected, decide whether to include all values or just the first one found ˃ Scopes – preset filter on content – like a global where clause ˃ Reduce storage requirements (“hash”) – option actually works in reverse to what is stated.
  • » Property Name - same as in SP2010 » Add Mapping - same as in SP2010 » Reduce storage requirements (“hash”) option - no longer exists in SP2013 » Many additional settings ˃ Searchable – Enables querying against the content of the managed property ˃ Queryable – Enables querying against the specific managed property ˃ Retrievable – Enable this setting for managed properties that are relevant to present in search results. ˃ Refinable – Can be used as a search refiner ˃ Sortable ˃ Token Normalization ˃ Complete Matching
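The difference between Searchable, Queryable and Retrievable is easy to mix up, so here is a deliberately simplified toy model in Python. It is not how the SharePoint index works internally. The `ManagedProperty` class, the helper functions and the `Notes` property are hypothetical and only illustrate the three settings as described above.

```python
from dataclasses import dataclass


@dataclass
class ManagedProperty:
    name: str
    searchable: bool = False   # value is part of the free-text index
    queryable: bool = False    # value can be targeted as Name:value
    retrievable: bool = False  # value can be shown in the result row


def matches(doc, props, term, prop_name=None):
    """Return True if the document matches the term under the toy rules."""
    if prop_name is None:
        # Free-text query: only Searchable properties are consulted.
        return any(
            p.searchable and term.lower() in str(doc.get(p.name, "")).lower()
            for p in props
        )
    # Property-restricted query: the named property must be Queryable.
    prop = next((p for p in props if p.name == prop_name), None)
    if prop is None or not prop.queryable:
        return False
    return str(doc.get(prop_name, "")).lower() == term.lower()


def result_row(doc, props):
    """Only Retrievable properties appear in the search results."""
    return {p.name: doc[p.name] for p in props if p.retrievable and p.name in doc}


props = [
    ManagedProperty("StudentID", queryable=True, retrievable=True),
    ManagedProperty("Notes", searchable=True),
]
doc = {"StudentID": "12345", "Notes": "Transfer student"}

print(matches(doc, props, "12345"))               # False: StudentID is not Searchable
print(matches(doc, props, "12345", "StudentID"))  # True: StudentID is Queryable
print(result_row(doc, props))                     # {'StudentID': '12345'}
```

For transactional libraries this suggests making key fields Queryable and Retrievable, and being deliberate about which of them should also be Searchable.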
  • Automatically generated by custom code or a 3rd party application » For example, Hershey’s XenDocs ECM for SharePoint will validate that a managed property is properly configured or automatically create crawled and managed properties for each column when our web part is configured.
  • Set references to…
      • Microsoft.Office.Server.dll
      • Microsoft.Office.Server.Search.dll
    DLLs located in: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI
    Initialize the Search Schema…
        using Microsoft.Office.Server.Search.Administration;
        using Microsoft.SharePoint;   // needed for SPSite
        // Field (not a local variable) so the methods on the following slides can use it
        private Schema _searchSchema;
        public void InitSearchSchema(string url)
        {
            SPSite site = new SPSite(url);
            _searchSchema = new Schema(SearchContext.GetContext(site));
        }
  • public void CreateCrawledProperty(PropertySet propSet)
    {
        // crawledPropertyName, MAX_PROPS, _crawledProps and GetPropertySetId are not shown on the slide
        var propSetId = GetPropertySetId(propSet);
        var category = _searchSchema.AllCategories[propSetId];
        category.CreateCrawledProperty(crawledPropertyName, false, propSetId);
        category.Update();
        // Re-read the crawled properties so the new one is included
        _crawledProps = _searchSchema.QueryCrawledProperties(string.Empty, MAX_PROPS, Guid.NewGuid(), string.Empty, true).Cast<CrawledProperty>();
    }
  • public void CreateManagedProperty()
    {
        // Create the new managed property
        // (propertyName, dataType and crawledProperty are not shown on the slide)
        ManagedPropertyCollection allProperties = _searchSchema.AllManagedProperties;
        ManagedProperty managedProperty = allProperties.Create(propertyName, dataType);
        // Map the new managed property to the existing crawled property
        MappingCollection mappings = managedProperty.GetMappings();
        Mapping mapping = new Mapping(crawledProperty.Propset, crawledProperty.Name, crawledProperty.VariantType, managedProperty.PID);
        // Added: the slide builds the mapping but never saves it
        mappings.Add(mapping);
        managedProperty.SetMappings(mappings);
    }
  • public static void VerifySettings(ManagedDataType dataType, ManagedProperty property, string fieldDataType, bool enabledForScoping, bool respectPriority)
    {
        property.Searchable = true;
        property.Retrievable = true;
        property.Sortable = true;
        property.Queryable = true;
        property.SortableType = SortableType.Enabled;
        if (fieldDataType == "SPFieldLookupValueCollection" || fieldDataType == "TaxonomyFieldValueCollection")
        {
            property.HasMultipleValues = true;
        }
        if (dataType == ManagedDataType.Text && property.MaxCharactersInPropertyStoreIndex != 64)
        {
            // Reduce storage requirements by using hash:
            // yes for Text fields, no for all others
            property.MaxCharactersInPropertyStoreIndex = 64;
        }
        property.RespectPriority = respectPriority;
        property.EnabledForScoping = enabledForScoping;
        property.SafeForAnonymous = true;
        property.TokenNormalization = true;
        property.Update();
    }
  • Let SharePoint Auto-Generate new managed properties when it crawls
  • » In SharePoint 2010… ˃ This feature is off by default, but it can be enabled in your Search Service Application ˃ From the Categories list, hover over the SharePoint category, click the drop-down arrow and then select the Edit Category option ˃ Select the option to “automatically generate a new managed property for each crawled property…”
  • » In SharePoint 2013… ˃All site columns that contain data will have a managed property auto-generated upon a full crawl ˃This does not happen for list columns ˃This feature cannot be turned off and is not configurable (as far as I can tell) http://technet.microsoft.com/en-us/library/jj613136.aspx
  • Auto-generated property names by column name:
      Column     SharePoint 2010                          SharePoint 2013
      Name       Crawled Property     Managed Property    Crawled Properties     Managed Property
      FooBar     ows_FooBar           owsFooBar1          ows_FooBar             Not mapped
                                                          ows_q_TEXT_FooBar      FooBarOWSTEXT
      Foo Bar    ows_Foo_x0020_Bar    owsFoox0020Bar      ows_Foo_x0020_Bar      FooBarOWSTEXT
      Foo_Bar    ows_Foo_Bar          owsFooBar           ows_Foo_Bar            Not mapped
                                                          ows_q_TEXT_Foo_Bar     FooBarOWSTEXT
      Foo-Bar    ows_Foo_x002d_Bar    owsFoox002dBar      ows_Foo-Bar            Not mapped
                                                          ows_q_TEXT_Foo-Bar     Foo-BarOWSTEXT
      Foo.Bar    ows_Foo_x002e_Bar    owsFoox002eBar      ows_Foo.Bar            Not mapped
                                                          ows_q_TEXT_Foo.Bar     Foo.BarOWSTEXT
    The autogenerated names for managed properties are not “end-user friendly”!
  • A vast improvement compared to the native Advanced Search Web Part
  • » MS Office documents are 1st class citizens in SharePoint ˃ When Office files are opened in Office 2007, 2010 or 2013, users can perform many SharePoint functions on those documents: + Edit document content + Check in/out/discard + See version history + Edit metadata ˃ Preview thumbnails in SP 2013 » Most other file types, especially PDF files and scanned images, are 2nd class citizens ˃ Read only view of document
  • » Files typically open in native apps such as Windows Photo Gallery or Adobe Reader » Users cannot edit metadata » If a user rotates, reorders or deletes a page, the changes cannot be saved to SP » Users cannot annotate pages (e.g. sticky notes, redactions, etc.)
  • A powerful, low cost PDF and imaging viewer for SharePoint
  • Visually search documents with thumbnails and quick previews
  • Search for text within a PDF file (just like Adobe Reader/Acrobat)
  • Edit SharePoint metadata within the viewer for PDF documents and scanned images
  • Adds robust image editing features – annotations, reorder, rotate or delete pages, image cleanup
  • » To leverage SharePoint’s native features for transactional document management… ˃ Extensive upfront planning ˃ Complex configuration (many more steps to configure SP compared to most dedicated document management products) » To make the overall user experience in SharePoint comparable with dedicated Document Management products, plan on: ˃ Lots of custom code ... OR … ˃ 3rd party solutions