PDF Association Technical Conference June 18-19 2013

PDF and Microsoft SharePoint
Hurdles to Overcome

Neil Pitman
Aquafor...

PDF and Microsoft SharePoint: Hurdles to Overcome


Microsoft SharePoint is a natural target for PDF storage, with 125 million Microsoft SharePoint Server licenses sold. An overview of PDF and SharePoint, with PDF as a SharePoint "First Class Citizen".

Published in: Technology

  1. PDF Association Technical Conference, June 18-19 2013
     PDF and Microsoft SharePoint: Hurdles to Overcome
     Neil Pitman, Aquaforest Limited
     Version 1.120613
  2. Objective
     PDF as a SharePoint “First Class Citizen”
  3. Agenda
     • Objectives
     • SharePoint Overview
     • PDF Capture
     • PDF Search
     • iFilters
     • Handling Image and Mixed Mode PDFs
     • PDF Metadata
     • Dictionary, XMP and Entity Extraction
     • Configuration
     • SharePoint 2010, 2013
     • Summary
  4. SharePoint Overview
     Microsoft SharePoint Server - 125 million licenses sold
     SharePoint to be a natural target for PDF storage
     • What is SharePoint?
     • On-Premise and Cloud-based Collaboration & Document Management Platform
     • Origin - 2001
     • Usage
     • Focus on MS Office Documents
     • Typically distributed capture
  5. SharePoint Overview
     • SharePoint Editions (2010, 2013)
     • Foundation
     • Standard
     • Enterprise
     • Office 365 / SharePoint Online
     • Ecosystem
     • Partner Products
     • Office / SharePoint Marketplace
  6. SharePoint Architecture Overview
     • MS Web-based (IIS)
     • MS Office Integration
     • SQL Server Storage
     • List or library data in a site collection is stored in a SQL Server database table, which uses queries, indexes and locks to maintain overall performance, sharing, and accuracy.
     • Filtered views with column indexes (and other operations) create database queries that identify a subset of columns and rows and return this subset to your computer.
     • Thresholds and limits help throttle operations and balance resources for many simultaneous users.
     • Privileged developers can use object model overrides to temporarily increase thresholds and limits for custom applications.
     • Administrators can specify dedicated time windows for all users to do unlimited operations during off-peak hours.
     • Information workers can use appropriate views, styles, and page limits to speed up the display of data on the page.
     Microsoft Technology Stack
     • Windows Server 2008/12
     • Internet Information Services (IIS)
     • .NET Framework
     • SQL Server
     • MS Office
  7. PDF Capture for SharePoint
     Options
     • SharePoint UI
     • Acrobat XI
     • Load Tools
     • Custom Code
     • Workflow & Event Receivers

         WebRequest request = WebRequest.Create(destUrl);
         request.Credentials = CredentialCache.DefaultCredentials;
         request.Method = "PUT";
         byte[] buffer = new byte[1024];
         using (Stream stream = request.GetRequestStream())
         using (MemoryStream ms = new MemoryStream(fileBytes))
         {
             for (int i = ms.Read(buffer, 0, buffer.Length); i > 0; i = ms.Read(buffer, 0, buffer.Length))
             {
                 stream.Write(buffer, 0, i);
             }
         }
         WebResponse response = request.GetResponse();
         response.Close();
         Logging.Log("Upload successful");
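The upload snippet on slide 7 takes `destUrl` as given. A minimal sketch of how that URL might be assembled is shown here; `UploadTarget`, `BuildDestUrl` and its three parameters are hypothetical names, and the sketch assumes `libraryName` is the URL segment of the document library (which can differ from its display name).

```csharp
using System;

public static class UploadTarget
{
    // Builds the HTTP PUT destination for a file in a document library.
    // All names here are illustrative assumptions, not taken from the slides.
    public static string BuildDestUrl(string siteUrl, string libraryName, string fileName)
    {
        // Escape each path segment so spaces and other reserved characters are valid in the URL.
        return siteUrl.TrimEnd('/') + "/" +
               Uri.EscapeDataString(libraryName) + "/" +
               Uri.EscapeDataString(fileName);
    }
}
```

The result can be passed straight to `WebRequest.Create` in the snippet above.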
  8. Acrobat XI SharePoint Integration
     http://www.adobe.com/uk/products/acrobat/pdf-version-control-sharepoint-integration.html
  9. PDF Search in SharePoint Overview
     • Item 1
     • Item 2
  10. iFilter Architecture
     iFilters scan documents for text and attributes, primarily in support of Microsoft Search technologies.
  11. iFilter Configuration
     • Architecture
     • Code Sample
     • Suppliers
     • Issues
  12. PDF Search in SharePoint: iFilters
     • iFilter Explorer
  13. Using iFilters directly in Code
     https://gist.github.com/jimschubert/1473904

         StringBuilder Buffer = new StringBuilder();
         string PDFFile = @"C:devPDF Conferences.pdf";
         FilterCode f = new FilterCode();
         f.GetTextFromDocument(PDFFile, ref Buffer);
         Console.WriteLine(Buffer);

         [DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
         static extern int LoadIFilter(string pwcsPath,
             [MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
             ref IFilter ppIUnk);

         public void GetTextFromDocument(string Path, ref StringBuilder Buffer)
         {
             IFilter filter = null;
             int hresult;
             IFilterReturnCodes rtn;

             // Initialize the return buffer to 64K.
             Buffer = new StringBuilder(64 * 1024);

             // Try to load the filter for the path given.
             hresult = LoadIFilter(Path, new IntPtr(0), ref filter);
             if (hresult == 0)
             {
                 IFILTER_FLAGS uflags;

                 // Init the filter provider.
                 rtn = filter.Init(
                     IFILTER_INIT.IFILTER_INIT_CANON_PARAGRAPHS |
                     IFILTER_INIT.IFILTER_INIT_CANON_HYPHENS |
                     IFILTER_INIT.IFILTER_INIT_CANON_SPACES |
                     IFILTER_INIT.IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                     IFILTER_INIT.IFILTER_INIT_INDEXING_ONLY,
                     0, new IntPtr(0), out uflags);
                 if (rtn == IFilterReturnCodes.S_OK)
                 {
                     STAT_CHUNK statChunk;
  14. iFilter Test
     • Bookmark
     • PDF Attachment
     • XMP Metadata
     • Text
     • Image/OCR Text
     • Dictionary Metadata
     • Annotation
  15. iFilter Test Results
     Products compared: Adobe iFilter, Foxit iFilter, Microsoft Format Handler, PDFlib iFilter
     Features tested: Body Text, Bookmarks, Dictionary Metadata, Annotations, XMP Metadata, PDF Attachment
     [Results matrix: the per-product check marks are not recoverable from this transcript]
  16. Dealing with Image and Mixed-Mode PDFs
     Classify:
     • Image-Only
     • Born-Digital
     • Part Image-Only, Part Born-Digital
     • Previously OCRed
  17. Dealing with Image and Mixed-Mode PDFs
     Objectives:
     • Ensure Full Searchability
     • Avoid Text to Image Processing
     Process:
     • Capture Time?
     • Scheduled In-Place?
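The four classes on slide 16 and the objectives on slide 17 can be sketched as a small decision function. This is only an illustration under stated assumptions: `PageInfo`, its three flags and `PdfClassifier` are hypothetical, and the per-page flags are assumed to come from whatever PDF library is in use. A page with neither a full-page image nor text is treated here as born-digital.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PdfKind { ImageOnly, BornDigital, Mixed, PreviouslyOcred }

// Hypothetical per-page summary; not an API from any particular PDF library.
public sealed class PageInfo
{
    public bool HasVisibleText { get; set; }
    public bool HasHiddenText { get; set; }    // invisible text layer, as typically left by OCR
    public bool HasFullPageImage { get; set; }
}

public static class PdfClassifier
{
    public static PdfKind Classify(IReadOnlyList<PageInfo> pages)
    {
        if (pages == null || pages.Count == 0)
            throw new ArgumentException("At least one page is required.", "pages");

        // A page is image-only when it is a scan with no text of any kind.
        int imageOnly = pages.Count(p => p.HasFullPageImage && !p.HasVisibleText && !p.HasHiddenText);
        // A page counts as previously OCRed when a scan carries a hidden text layer.
        bool anyOcr = pages.Any(p => p.HasFullPageImage && p.HasHiddenText);

        if (imageOnly == pages.Count) return PdfKind.ImageOnly;
        if (imageOnly > 0) return PdfKind.Mixed;
        return anyOcr ? PdfKind.PreviouslyOcred : PdfKind.BornDigital;
    }

    // Only documents with at least one unsearchable page need OCR; born-digital and
    // already-OCRed documents are left alone, so existing text is never rasterised.
    public static bool NeedsOcr(PdfKind kind)
    {
        return kind == PdfKind.ImageOnly || kind == PdfKind.Mixed;
    }
}
```

The same check could run either at capture time or in a scheduled in-place pass, which are the two process options the slide raises.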
  18. PDF Metadata in SharePoint
     • Text Search vs Metadata Search
     • Crawled vs Managed Properties
     • Review Requirements
     • Dictionary Metadata
     • XMP Metadata
     • Entity Extraction
     • Consider Automation
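XMP metadata is stored as an RDF/XML packet, so once the packet has been pulled out of the PDF it can be read with standard XML tooling. The sketch below reads the Dublin Core title from such a packet; `XmpReader` and `GetTitle` are hypothetical names, and extracting the packet from the PDF file itself is assumed to happen elsewhere.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmpReader
{
    // Standard namespaces used by XMP for Dublin Core properties and RDF containers.
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns the first dc:title value in the packet, or null when there is none.
    public static string GetTitle(string xmpPacket)
    {
        XDocument doc = XDocument.Parse(xmpPacket);
        XElement title = doc.Descendants(Dc + "title").FirstOrDefault();
        if (title == null) return null;

        // dc:title is normally a language alternative: rdf:Alt containing rdf:li entries.
        XElement li = title.Descendants(Rdf + "li").FirstOrDefault();
        return li != null ? li.Value : title.Value;
    }
}
```

A value read this way could then be assigned to a library column, which is one form the automation mentioned on the slide might take.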
  19. PDF Metadata in SharePoint
     Crawled vs Managed Properties
  20. PDF Metadata in SharePoint: Using Event Receivers
     • Event Receivers can enable Metadata assignment
  21. PDF Metadata in SharePoint
     Entity Extraction
  22. Configuration
     • SharePoint 2010
     • SharePoint 2013
  23. SharePoint 2010 PDF Configuration
     • Missing icon and iFilter
     http://www.adobe.com/devnet-docs/acrobatetk/tools/AdminGuide/Acrobat_Reader_IFilter_configuration.pdf
  24. SharePoint 2010 PDF Configuration
  25. SharePoint PDF Configuration
     • Default for PDF: 'X-Download-Options: noopen' added to HTTP Response Header
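Whether a given SharePoint response carries this header can be checked from code. The sketch below is a hedged illustration only: `InlineOpenCheck` and `IsInlineOpenBlocked` are hypothetical names, and the headers are assumed to come from an `HttpWebResponse` for a PDF in a library.

```csharp
using System;
using System.Net;

public static class InlineOpenCheck
{
    // True when the response headers include "X-Download-Options: noopen",
    // which tells the browser to offer Save rather than open the file in place.
    public static bool IsInlineOpenBlocked(WebHeaderCollection headers)
    {
        string value = headers["X-Download-Options"];
        return value != null &&
               value.Trim().Equals("noopen", StringComparison.OrdinalIgnoreCase);
    }
}
```

In use, `response.Headers` from the `WebResponse` returned by a GET or HEAD request for the PDF would be passed in.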
  26. SharePoint 2013 and PDF Configuration
     • PDF Format Handler Support
     • Currently no iFilter Support for PDF !?!?!!
  27. SharePoint 2013 and PDF Configuration
     Inline Viewing PDF in SharePoint 2013
     http://stevemannspath.blogspot.co.uk/2012/10/sharepoint-2013-pdf-preview-in-search.html
     http://stevemannspath.blogspot.co.uk/2013/04/sharepoint-2013-pdf-support-and.html
  28. Summary
     • Microsoft SharePoint Server - 125 million licenses sold
     • SharePoint to be a natural target for PDF storage
     • PDF as a SharePoint “First Class Citizen”
     Contact: neil.pitman@aquaforest.com
