Transactional content consists of documents that an organization receives from external parties, such as vendors, customers and partners. Common examples of transactional documents are vendor invoices, purchase orders from clients, claim forms, application and enrollment forms, and almost any document that a company receives on paper and needs to scan in.
While SharePoint is an excellent platform for Enterprise Content Management, there are a few challenges in using SharePoint to manage transactional content.
This presentation addresses some of the more obscure (yet complex) out-of-the-box features that can be configured in SharePoint to improve management of transactional content, along with some software products from Hershey Technologies that simplify and enhance the user experience.
2. » Principal at Hershey Technologies
˃ Twitter: @tomcastiglia
˃ Email: tcastiglia@hersheytech.com
» Joined Hershey Tech in 1998
» This is my 7th SharePoint Saturday
» Director of Hershey’s professional services team since 2001
» Founding member of San Diego SharePoint User Group (@sanspug)
» Founding member of San Diego .NET User Group
3. » Founded in 1991
» Microsoft Partner
» Specialists in
˃ End to End SharePoint Consulting Services
˃ Document Imaging / Scanning
˃ OCR (data and document capture)
˃ ECM / Document Management
˃ BPM / workflow
» SharePoint ISV
˃ XenDocs ECM for SharePoint
» Follow us on Twitter: @HersheyTech
6. » Explanation of “Transactional Content Management” (TCM)
» Overview of SharePoint features that are relevant to TCM
» How to make SharePoint support TCM
» Demo of solutions that fill the feature gaps to ensure SharePoint is successful for your transactional content management project
˃ Ad-hoc scanning / document capture into SharePoint
˃ Optimizing SharePoint search for large scale TCM deployments
˃ Enabling collaboration on static, transactional documents
˃ Making scanned images and PDF documents 1st class citizens within SharePoint
7. » Assumptions - I presume that you understand:
˃ Columns (document metadata)
˃ Content Types
˃ Document Libraries
» Other topics not covered (just not enough time to include):
˃ Automated Data Capture/OCR
˃ Records Management
˃ Workflow
˃ RBS
8. » Web Content: SharePoint rocks at this!
» Document Collaboration: SharePoint rocks at this!
» Transactional Documents: SharePoint needs a little help here
9. “high-volume throughput of relatively static documents”
“content which typically originates outside an organization from external parties – customers or partners – and relies on workflow or business process management (BPM) to drive transactional, back-office business processes.”
-Forrester Research
10. » Capturing content from MFPs & Fax servers
» Indexing scanned documents is clumsy
» Configuring Metadata Taxonomy for Search requires
unique expertise
» Lacks intuitive metadata query driven document search
» Treats scanned images and PDF files as a “2nd class
citizen” (compared to MS Office documents)
11. Transactional Documents
» Purchase Orders
» Vendor Invoices
» Application Forms
» Insurance claims
» Student Records
» Enrollment Forms
» (Not project based)
Collaborative documents
» Proposals, reports, spreadsheets, presentations and other documents created and edited by knowledge worker users
˃ Office docs (Word, Excel, PowerPoint)
˃ PDF files
» Created and uploaded on an ad hoc basis to support day-to-day operations
» (Often project based)
13. Transactional Content
» Centralized
» Often isolated to just one or a few site collections
˃ Document Center or Record Center
» Thousands to millions of documents per library
Collaborative content
» Decentralized
» Documents are often spread throughout many site collections, subsites, libraries and content types
» Typically under 5K documents per library.
14. Collaboration scenarios
» Navigation
˃ Site > SubSite > Library > Folder > Document
» Keyword Search
˃ Searches both metadata and document content
˃ Use of social algorithms improves search results (e.g. highly rated documents are returned above other documents)
Transactional Documents
» Navigation doesn’t work - too many documents per library
» Search via metadata queries only
˃ Ignore document content
˃ Ignore social based algorithms like ratings
» Users expect intuitive, graphical query builders to specify precise search conditions against one or more metadata fields.
16. » This native SharePoint feature does provide a
limited query builder …
˃ Allows users to query against specific SharePoint columns and choose
various search operators (Equals, At Most, At Least, On, Before, etc.)
˃ Filters document library providing results in a sortable, tabular display.
17. » Doesn’t support text columns
» Transactional documents generally need text based columns for fields like InvoiceNumber, PONumber, VendorId, ClaimNumber, etc.
» Doesn’t scale well for libraries
that exceed the list view
threshold (5,000 documents by
default)
19. » Configuring Managed Properties in SharePoint
Search is more complex than it needs to be.
» SharePoint does not provide a robust query builder
for users to intuitively query documents (other
ECM solutions offer this OOB)
» SharePoint formats Search results like a search
engine, not like a document management product.
» SharePoint treats PDF documents and scanned
images as a 2nd class citizen.
20. » Crawled properties are metadata (such as
author, title, or subject) that are extracted from
SharePoint columns during crawls.
» However, this is the internal representation of
the metadata. To enable users to search on this
metadata, we need to use managed properties
that are mapped to the crawled properties.
21. » A new crawled property is created for each new custom
column, after…
˃The column is added to at least one list or library
˃The column is populated with a value in at least one item
˃A Full Crawl is performed
22. » All Crawled properties are grouped into
various categories.
» For Transactional Content Management
solutions, we generally care about the
“SharePoint” Category, which contains
crawled properties that are tied to list
columns in SharePoint.
» Accessible from Search Service
Application: Metadata
Properties>Categories
23. » The Naming convention is fully controlled by
SharePoint, using this convention:
˃ ows_[internal name of column]
» However, spaces or other symbols (.-!@#$%^, etc.)
within the internal column name are escaped, such
as:
Column Internal Name → Crawled Property Name
InvoiceNumber → ows_InvoiceNumber
Invoice Number → ows_Invoice_x0020_Number
Invoice.Number → ows_Invoice_x002e_Number
Invoice-Number → ows_Invoice_x002d_Number
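The naming convention above can be sketched in a few lines of Python. This is only an illustration of the pattern shown in the examples, not SharePoint's actual implementation: the function name `crawled_property_name` is hypothetical, and treating every character other than ASCII letters, digits and underscore as "escaped" is an assumption that happens to reproduce the four rows above.

```python
def crawled_property_name(internal_name: str) -> str:
    """Build an 'ows_' crawled property name from a column internal name.

    Assumption: any character that is not an ASCII letter, digit or
    underscore is replaced by _xHHHH_, where HHHH is its code point in
    lowercase hex (space -> _x0020_, '.' -> _x002e_, '-' -> _x002d_).
    """
    parts = []
    for ch in internal_name:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            parts.append(ch)                    # kept as-is
        else:
            parts.append(f"_x{ord(ch):04x}_")   # escaped symbol
    return "ows_" + "".join(parts)


for name in ["InvoiceNumber", "Invoice Number", "Invoice.Number", "Invoice-Number"]:
    print(f"{name!r:18} -> {crawled_property_name(name)}")
```

A helper like this is mainly useful when a script has to predict which crawled property to look for after adding a new column.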
24. » In SP2010, most SharePoint columns get one crawled property
˃ Managed Metadata Columns get a 2nd crawled property, with a
prefix of “ows_taxid”
» This extra crawled property is used to store the internal
GUID value that is associated with the managed metadata
term. For example:
Column Name: CostCenter
Normal Crawled Property: ows_CostCenter
MM Id Crawled Property: ows_taxid_CostCenter
25. Managed properties…
» …allow you to standardize the terms used for searching SharePoint.
» …represent the end user’s vision of the SP taxonomy (at least with regards to Search)
˃ So the name of your managed properties should normally be something intuitive to your end users
26. » One managed property may be mapped to one or more crawled
properties.
˃ Useful in low governance situations where multiple site owners or site
collection admins have duplicated site columns using different names
(e.g. InvoiceNumber vs ‘Invoice Number’)
» One crawled property may be mapped to one or more managed
properties
˃ Useful if different applications create their own managed
properties, and need to reference the same crawled property.
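The many-to-many relationship can be pictured as a simple lookup table. The sketch below is a toy model in Python, not the SharePoint search schema API: the `MAPPINGS` table, the `APInvoiceNo` managed property and the sample documents are all hypothetical, chosen to mirror the InvoiceNumber vs 'Invoice Number' scenario above.

```python
# Hypothetical mapping table: managed property -> crawled properties.
# "InvoiceNumber" shows one managed property mapped to two crawled
# properties; "ows_InvoiceNumber" appearing twice shows one crawled
# property mapped to two managed properties.
MAPPINGS = {
    "InvoiceNumber": ["ows_InvoiceNumber", "ows_Invoice_x0020_Number"],
    "APInvoiceNo": ["ows_InvoiceNumber"],
}


def managed_values(crawled_values: dict) -> dict:
    """Collect, per managed property, the values of its mapped crawled properties."""
    result = {}
    for managed, crawled_names in MAPPINGS.items():
        values = [crawled_values[c] for c in crawled_names if c in crawled_values]
        if values:
            result[managed] = values
    return result


# Two libraries that duplicated the same column under different names.
doc_a = {"ows_InvoiceNumber": "INV-1001"}
doc_b = {"ows_Invoice_x0020_Number": "INV-2002"}

print(managed_values(doc_a))
print(managed_values(doc_b))
```

Both documents end up searchable through the single managed property InvoiceNumber, even though their columns were named differently.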
27. Without Managed Properties
Returns 16 items, only 6
of which are related to
what I wanted.
Included other
documents that happen
to contain the StudentId
value either as text in
the document or in
some other field (like an
Invoice Number, or
something else)
With managed properties
Returns only
the 6 correct
items
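The difference between the two result sets can be reproduced with a toy in-memory example. This Python sketch does not call SharePoint; the three sample documents and the function names `keyword_search` and `property_search` are invented for illustration, and the counts are deliberately smaller than the 16 and 6 on the slide.

```python
# Hypothetical sample documents: metadata fields plus extracted body text.
docs = [
    {"id": 1, "StudentId": "12345", "InvoiceNumber": "", "body": "transcript"},
    {"id": 2, "StudentId": "67890", "InvoiceNumber": "12345", "body": "tuition invoice"},
    {"id": 3, "StudentId": "67890", "InvoiceNumber": "", "body": "refers to case 12345"},
]


def keyword_search(term: str) -> list:
    """Match the term anywhere: any metadata field or the document text."""
    return [
        d["id"]
        for d in docs
        if any(term in str(value) for key, value in d.items() if key != "id")
    ]


def property_search(prop: str, value: str) -> list:
    """Match only documents whose named property equals the value."""
    return [d["id"] for d in docs if d.get(prop) == value]


print(keyword_search("12345"))                 # every document mentioning 12345
print(property_search("StudentId", "12345"))   # only the student's own record
```

The keyword search picks up the invoice and the unrelated letter because the number appears in another field or in the text, which is the noise a managed property restriction removes.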
28. Provides an OOB
search interface that
allows users to select a
Managed Property
from a drop down
list, rather than having
to type out the
managed property
name (e.g.
“StudentID:” or
“StudentID=“)
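For readers who do type the query by hand, the text a user enters is a keyword query with a property restriction, using ":" for a contains match or "=" for an exact match. The Python sketch below only builds such query strings; `property_restriction` and `all_of` are hypothetical helper names, `VendorName` is a made-up managed property, and dropping embedded double quotes is a simplifying assumption rather than a documented escaping rule.

```python
def property_restriction(prop: str, value, exact: bool = False) -> str:
    """Build 'Property:value' (contains) or 'Property=value' (equals)."""
    text = str(value).replace('"', "")      # assumption: drop embedded quotes
    if any(ch.isspace() for ch in text):
        text = f'"{text}"'                  # phrases must be quoted
    return f"{prop}{'=' if exact else ':'}{text}"


def all_of(*restrictions: str) -> str:
    """Combine restrictions so that every one of them must match."""
    return " AND ".join(restrictions)


query = all_of(
    property_restriction("StudentID", 12345, exact=True),
    property_restriction("VendorName", "Acme Corp"),
)
print(query)
```

A graphical query builder of the kind users expect would generate a string like this behind the scenes from the fields and operators the user picks.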
31. Unlike Crawled Properties (which are always auto-generated by SharePoint), managed properties can be created in one of three ways…
32. Managed Properties can be created manually by a SharePoint Administrator from the Search Service Application configuration.
» SP2010: “Metadata Properties” link
» SP2013: “Search Schema” link
33. » Click “New Managed Property” link from Metadata Property Mappings
˃ Property Name can contain most characters, except for spaces (but please don’t use special characters)
˃ Based on the selected type, this managed property can only be mapped to crawled properties with the same type.
˃ Add Mapping – Select 1 or more crawled properties to map to this managed property.
+ If multiple are selected, decide whether to include all values or just the first one found
˃ Scopes – preset filter on content – like a global where clause
˃ Reduce storage requirements (“hash”) – option actually works in reverse to what is stated.
34. » Property Name - same as SP2010
» Add Mapping - same as in SP2010
» Reduce storage requirements (“hash”) option - no longer exists in SP2013
» Many additional settings
˃ Searchable – Enables querying against the content
of the managed property
˃ Queryable – Enables querying against the specific
managed property
˃ Retrievable – Enable this setting for managed
properties that are relevant to present in search
results.
˃ Refinable – Can be used as a search refiner
˃ Sortable –
˃ Token Normalization
˃ Complete Matching
35. Automatically generated by custom
code or a 3rd party application
» For example, Hershey’s XenDocs
ECM for SharePoint will validate
that a managed property is
properly configured or
automatically create crawled and
managed properties for each
column when our web part is
configured.
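36. Set References to…
• Microsoft.Office.Server.dll
• Microsoft.Office.Server.Search.dll
DLLs Located in:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI
Initialize the Search Schema…
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

// Field (rather than a local) so the methods on the following slides can use it.
private Schema _searchSchema;

public void InitSearchSchema(string url)
{
    // SPSite is IDisposable; dispose it once you are finished working with the schema.
    SPSite site = new SPSite(url);
    _searchSchema = new Schema(SearchContext.GetContext(site));
}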
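37. public void CreateCrawledProperty(PropertySet propSet)
{
    // GetPropertySetId, crawledPropertyName, variantType, MAX_PROPS and _crawledProps
    // are members of the surrounding class that are not shown on the slide.
    var propSetId = GetPropertySetId(propSet);
    var category = _searchSchema.AllCategories[propSetId];
    // Assumption: the variant type argument (e.g. 31 for text) was dropped from the slide.
    category.CreateCrawledProperty(crawledPropertyName, false, propSetId, variantType);
    category.Update();
    // Reload the crawled properties so the new one is included (Cast needs System.Linq).
    _crawledProps = _searchSchema.QueryCrawledProperties(string.Empty,
                                                         MAX_PROPS,
                                                         Guid.NewGuid(),
                                                         string.Empty,
                                                         true).Cast<CrawledProperty>();
}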
38. public void CreateManagedProperty()
{
    // propertyName, dataType and crawledProperty are members of the surrounding
    // class that are not shown on the slide.
    // Create the new Managed Property
    ManagedPropertyCollection allProperties = _searchSchema.AllManagedProperties;
    ManagedProperty managedProperty = allProperties.Create(propertyName, dataType);
    // Map the new Managed Property to the existing Crawled Property
    MappingCollection mappings = managedProperty.GetMappings();
    Mapping mapping = new Mapping(crawledProperty.Propset,
                                  crawledProperty.Name,
                                  crawledProperty.VariantType,
                                  managedProperty.PID);
    // Save the mapping; without these two calls it is never persisted.
    mappings.Add(mapping);
    managedProperty.SetMappings(mappings);
}
43. » In SharePoint 2010…
˃This feature is off by default, but it can be enabled in
your Search Service Application
From the Categories list, hover over the SharePoint
category, click the drop down arrow and then select the Edit
Category option.
Select the option to “automatically generate a new managed
property for each crawled property…”
44. » In SharePoint 2013…
˃All site columns that contain data will have a managed property
auto-generated upon a full crawl
˃This does not happen for list columns
˃This feature cannot be turned off and is not configurable (as far
as I can tell)
http://technet.microsoft.com/en-us/library/jj613136.aspx
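45. Column Name: FooBar
˃ SharePoint 2010: Crawled Property ows_FooBar → Managed Property owsFooBar1
˃ SharePoint 2013: Crawled Property ows_FooBar → Not mapped; ows_q_TEXT_FooBar → Managed Property FooBarOWSTEXT
Column Name: Foo Bar
˃ SharePoint 2010: Crawled Property ows_Foo_x0020_Bar → Managed Property owsFoox0020Bar
˃ SharePoint 2013: Crawled Property ows_Foo_x0020_Bar → Managed Property FooBarOWSTEXT
Column Name: Foo_Bar
˃ SharePoint 2010: Crawled Property ows_Foo_Bar → Managed Property owsFooBar
˃ SharePoint 2013: Crawled Property ows_Foo_Bar → Not mapped; ows_q_TEXT_Foo_Bar → Managed Property FooBarOWSTEXT
Column Name: Foo-Bar
˃ SharePoint 2010: Crawled Property ows_Foo_x002d_Bar → Managed Property owsFoox002dBar
˃ SharePoint 2013: Crawled Property ows_Foo-Bar → Not mapped; ows_q_TEXT_Foo-Bar → Managed Property Foo-BarOWSTEXT
Column Name: Foo.Bar
˃ SharePoint 2010: Crawled Property ows_Foo_x002e_Bar → Managed Property owsFoox002eBar
˃ SharePoint 2013: Crawled Property ows_Foo.Bar → Not mapped; ows_q_TEXT_Foo.Bar → Managed Property Foo.BarOWSTEXT
The autogenerated names for managed properties are not “end-user friendly”!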
47. » MS Office documents are 1st class citizens in SharePoint
˃ When office files are opened in Office 2007, 2010 or 2013, users can
perform many SharePoint functions on those documents:
+ Edit document content
+ Check in/out/discard
+ See version history
+ Edit metadata
˃ Preview Thumbnails in SP 2013
» Most other file types, especially PDF files and
scanned images are 2nd class citizens
˃ Read only view of document
48. Files typically open in native
apps such as Windows Photo
Gallery or Adobe Reader
» Users cannot edit
metadata
» If user rotates, reorders or deletes a
page, the changes
cannot be saved to SP
» User cannot annotate
pages (e.g. sticky
notes, redactions, etc.)
54. » To leverage SharePoint’s native features for
transactional document management…
˃ Extensive upfront planning
˃ Complex configuration (many more steps to configure SP compared to
most dedicated document management products)
» To make the overall user experience in
SharePoint comparable with dedicated
Document Management products, plan on:
˃ Lots of custom code ... OR …
˃ 3rd party solutions
Editor's Notes
Introduction slide
Transactional Docs: Users need graphical "query builders" that allow them to combine multiple search conditions. Queries should search metadata only, not keywords or content in the document. Queries should return the exact results specified by the user. The system should not attempt to "figure out" what the user really wanted based on ratings or other social algorithms, removing duplicates, etc. Results are displayed in tabular format, with the default sort order determined by the user or admin, and allowing ad-hoc sorting by any column.
SharePoint Docs: Users find documents using Search. Query builders are not OOB, but are available through 3rd party vendors. Search looks at both metadata and document content. Search tries to be "intelligent" and figure out what you really want. Search results are formatted like a Google or Bing search. The search engine decides how to order the results. Results cannot be re-sorted by the user. Metadata Navigation supports tabular results with sorting, but is limited in terms of what column types are supported (e.g. single line of text is not supported). However, with large libraries, it only searches through the most recent 1,000 rows or so (fallback queries).