Getting Information In and Out of SharePoint


Learn how to deliver your document-based data into SharePoint – and how to effectively manage your large volumes of active content and govern old inactive information once it’s in there. This presentation will cover how to convert paper to electronic information that can easily be routed into and accessed via SharePoint. Also discover how you can provide storage optimization to SharePoint – even with massive amounts of data. Learn ways to enable secure access to active and inactive SharePoint content. The presentation includes discussions on how to explore and manage content throughout its lifecycle, while supporting regulatory and corporate requirements.

Published in: Technology

  • ns are using SharePoint, 17,000 plus companies and over 120 million end-users. Not many other apps out there with that many eyes looking and fingers tapping on them… oh well, maybe one other one… Office. Oh hey, isn’t that seamlessly connected to SharePoint? Basically SharePoint connects the people with the content, or more specifically it connects the most ubiquitous content-making solution with the most ubiquitous solution for storing and accessing content, and I mean lots of content, sometimes too much content. And as SharePoint adoption continues to drive forward we’re seeing a lot of organizations leverage it far more now than ever before – it has in some ways become the central point for information access, meaning it’s where we’re now trying to put everything.
  • So SharePoint is going to mean big content problems… I personally blame SharePoint for all of this content that EMC is going to have to store in EMC hardware and manage in EMC archives… it’s terrible. But let’s seriously look at the top six things SharePoint is being used for; all of these are really going to mean more content in content repositories… and this one here, file share replacement, is my favorite. I mean, for just one second think about all of the content living, in a relatively disorganized fashion, in file shares on all the network drives in organizations around the world. I mean, file shares were SharePoint until SharePoint came around, and many of us are planning on porting ALL of this content into SharePoint… Show of hands here: how many have or are considering getting rid of file shares altogether and moving that content to SharePoint? So what am I alluding to… [NEXT SLIDE]
  • Many industries rely heavily on paper documents to communicate and deliver services to customers. Documents are received as paper files and routed through complex workflows to make important business decisions. Even when business processes are not involved, paper remains an important part of a business’s compliance or eDiscovery initiative. Unfortunately, working with large volumes of paper documents and files presents several challenges:
      • Physical storage of documents carries significant risk of physical loss
      • Add the risk of not being able to locate misfiled or lost documents
      • The cost of managing documents in paper form is clear
    Some important facts regarding paper documents. The typical organization…
      • Spends $20 in labor to file each document
      • Spends $120 in labor searching for each misfiled document
      • Loses one out of every 20 documents
      • Spends 25 hours recreating each lost document
      • Spends $8 to process every invoice, 70 percent (or $5.60) of which is related to document handling
    Sources: PricewaterhouseCoopers and IAPP (International Accounts Payable Professionals)
  • If we look at a recent survey conducted by Enterprise Strategy Group (ESG), corporations ranked their most important considerations over the next 12-18 months when considering IT investment. And what we see is that reduction in operational costs and business process improvement were the two top reasons for justifying technology investment.
  • And if we look at those two key reasons, plus the driver surrounding “improved security and risk management”, document capture addresses three of the top four considerations directly:
      • Reduced operational costs: Capture is able to reduce paper storage and processing costs.
      • Business process improvement: By transforming paper into electronic content, organizations can streamline paper-based business processes.
      • Risk management: By applying consistent business processes and reporting throughout the capture and business process, organizations can better manage their important data and improve the consistency with which business decisions are made.
  • There are three key considerations that we’d like to focus on which will help you understand why organizations invest in “intelligent enterprise capture” solutions. The first consideration is to “capture anything, from anywhere”. By that we mean capturing documents wherever they are received within the organization and capturing documents in whatever form they are received.
  • Effective intelligent enterprise capture solutions support the ability to capture documents wherever they are received. First, intelligent enterprise capture solutions support high-volume centralized capture environments. Even distributed organizations must be able to capture very high volumes of incoming documents in a mailroom environment. [Note to Presenter: Click now in Slide Show mode for animation.] Second, to accelerate document capture, an intelligent enterprise capture solution allows front-office workers to capture documents using scanners on their desktops or shared capture devices, such as network scanners or multi-function peripherals (MFPs). [Note to Presenter: Click now in Slide Show mode for animation.] Lastly, a true enterprise capture solution must support the ability to capture documents throughout the organization, from branch offices and distributed offices, and even allow partners or customers to capture documents.
  • Meeting these diverse requirements takes support for a variety of devices. Intelligent enterprise capture solutions provide a platform to capture from anything and anywhere, including:
      • High-speed scanners to support centralized batch capture requirements
      • Integration with fax servers to capture information received by fax
      • Support for ad-hoc capture through both desktop attached scanners…
      • …as well as browser-based support to enable capture from diverse locations
      • Support for MFPs to allow the use of existing hardware to support entire workgroups or departments
      • And e-mail integration to support the automatic capture of paper-like documents that are received as e-mail messages or attachments (e.g., Acrobat PDF documents)
  • The second consideration is to “intelligently connect document information to business systems”. By that we mean applying intelligence and technology to automate the transformation of documents into business-ready data. And once the data is available, intelligently connecting it with downstream systems that can process and manage this information.
  • By applying intelligent automation, intelligent enterprise capture solutions can dramatically improve capture processes. Automated document classification leverages several technologies to automate document sorting. [Note to Presenter: Click now in Slide Show mode for animation.] By using both graphic identification and intelligent text-based identification, documents are automatically identified, dramatically reducing or in some cases eliminating document preparation tasks.
  • Even more significant is the value that intelligent data extraction provides, transforming paper documents into electronic business information. With intelligent enterprise capture solutions, zonal and freeform technologies enable data extraction from all document types. The data is automatically validated using set business rules and by performing business validations against business systems and databases, ensuring the data is extracted correctly. The result is a significant cost reduction in terms of labor, and a faster, more reliable business process.
  • Intelligence goes beyond identifying documents and data on documents. Intelligent enterprise capture systems must be able to connect this information – including both the document image and the extracted business data – with the business systems that need to manage and process this information. Captiva integrates with all of the most common enterprise content management systems, and your most critical business systems and processes. Through out-of-the-box connectors, and by providing a service-oriented architecture, Captiva can both receive documents and return images and/or business data using Web Services.
  • Captiva transforms paper documents into electronic information that can be stored and managed in a variety of different systems. And if we look at the various systems Captiva can connect to, Captiva obviously integrates tightly with EMC Documentum and Documentum xCP, allowing you to control where information is stored and what happens to the documents when they are delivered. When integrating, Captiva delivers:
      • Electronic images, which can be stored in specific locations based on document content or business rules
      • Document metadata, which can be used to find documents within a large repository
      • Extracted data, which can be used to execute business processes
      • Process triggers, so that Captiva can facilitate fully automated business processes
    [Note to Presenter: Click now in Slide Show mode for animation.] But Captiva’s integration with business systems goes well beyond integration with Documentum. Captiva can integrate with a variety of systems, including other leading ECM systems, business systems such as SAP and Oracle, Microsoft SharePoint, and many other business systems. And with each of these systems, Captiva’s integrations feature the same ability to store images, metadata and extracted data, and the ability to trigger workflows or business processes within the business system.
  • The final consideration is to support “mission-critical enterprise scalability requirements”. That includes the ability to scale and provide availability to support mission-critical processes that rely upon paper documents, and the ability to scale to address complex, high-volume requirements throughout worldwide organizations. It also means allowing customers to quickly address capture requirements in various lines of business and departments throughout the larger organization.
  • In March 2010, EMC commissioned Wipro to perform a lab-based competitive benchmarking of the EMC Captiva intelligent capture solution along with the equivalent product suite of a leading competitor. The objective was to assess and compare the ability of each capture solution to meet the requirements of today’s business and IT environments. Wipro, an integrator familiar with building capture solutions with EMC Captiva and several other systems, performed a complete benchmark of EMC Captiva and a leading competitor and found that EMC Captiva was superior in a number of different measures including performance, manageability, scalability, and modularity. In the lab-based test, Wipro found the Captiva solution to be 2.7 times faster than the competitor when performing end-to-end capture.
  • An important trend in capture today is the desire to consolidate capture applications for a number of different departments onto a single platform, dramatically reducing ongoing maintenance costs and simplifying deployment. Rather than maintaining a number of different systems throughout the enterprise, EMC Captiva provides capabilities that make it suitable for:
      • Mailroom operations, including support for very high document capture volumes.
      • Invoice capture, including advanced support for intelligently extracting data from less-structured documents.
      • New accounts applications, including support for capture at distributed locations, such as branch offices.
      • And many other applications throughout the enterprise.
  • So now that we’ve tidied up our physical content and are managing it in SharePoint… [NEXT SLIDE – garbage pile]
  • We may end up with something that looks a little like this. SharePoint can turn into a content repository with a bunch of really good, important content at the top… But it can become difficult to find and manage some of the older content when we get to certain volume levels, because we’ve done things like scan a few thousand documents and import those as image files into SharePoint. Reality is, importing massive volumes of large file content can hamper the performance of our deployment, so we need to consider how we are now going to govern all of this content. My point is that the resultant problem of not addressing the “end stage” of the typical content lifecycle within SharePoint is that the best content can get buried and lost amongst content that is stale, old, obsolete, or worst of all – inaccurate. And the catch is that we often have to keep a lot of this business-critical content in line with long-term preservation requirements, or information governance requirements.
  • To further nutshell this down, [CLICK] when we’re talking about governance and SharePoint we need to understand two sides of the same coin. We need to govern the people first, making sure we have control factors in place that best manage who can create a new SharePoint site and how they can fill it with content, to ensure maximum usability and so things don’t fly off the rails in terms of overloading the deployment. The second thing we need to govern is the content itself, and that’s really what I aim to address today. So with that, what should we be considering? [NEXT SLIDE - Pains]
  • Now I wanted to outline the big pains. For the most part organizational pain points can be divided into two camps, IT and Compliance. IT feels those pains more associated with the operational issues driven by large SharePoint deployments: things like exponential information growth, where they should be storing it all, how they should be controlling its growth as line-of-business workers go out and create new sites unbeknownst to IT, and ensuring rapid search times and not-so-long back-up times. Now the compliance folks are a different breed; their focus isn’t what they can do for the infrastructure, it’s what the infrastructure can do for them! They want to make sure the content is under control, that all regulatory obligations are being met, that they are mitigating any unnecessary risk and, more so, that they are prepared when legal action strikes! But these two are joined in their misery, as they both ultimately desire information governance – one wants to govern the people creating the content, the other to govern the content created by the people. They both strive for operational efficiencies, with one group looking to ensure the infrastructure is performing at the required levels and the other wanting to make sure the content is being centrally managed against a unified policy. In terms of end-user transparency, the IT folks want the end-users to feel as though nothing has changed in their experience, and the compliance folks want unfettered access to content when required. And lastly, everyone wants reduced costs – costs attributed to down time and admin time, and those costs accorded to legal bills and fines.
  • So I’ve painted a pretty obvious picture here around SharePoint, what we’re using it for and more specifically some of the issues we’re going to have to manage and overcome. As mentioned, my plan is to discuss information governance for SharePoint, and for the next few slides I hope to outline what we should be thinking of around this topic. So here we have what I would consider a short list of the bigger things we should be considering as we continue to use, and in some cases abuse, our new favorite solution. In the end we should have a better understanding of what we need to be doing in order to be taking part in Good Information Governance.
      • What are your most immediate needs?
      • How much content are you planning to hold in SharePoint? And what’s your budget?
      • What type of content are you trying to capture?
      • What are your compliance requirements?
      • Are you going to break the “experience”?
      • Do you have other content outside of SharePoint that needs to be archived?
  • What are your most immediate needs? Well, for each organization it would be fair to say needs are all pretty unique, at least in my experience. I mean, at a high level we all need the same things, but as we delve down into the minutiae of individual organizations, requirements can get pretty specific. In terms of Info Gov I generally see a few key areas in which to start. I’d say the biggest request I get, and have gotten in the past 3 years, is to get content out of SQL. As the amount of information grows in SharePoint we can quickly run into operational issues with slow performance and scalability. Another classic concern specific to my line of work is around old content in SharePoint. Although we’ll get to specifics later, there are a lot of folks out there with lots of sites that do very little but take up space. And lastly, we live in a litigious nation, people sue people, it’s fun. Well, information can be one of two things: your saviour, or your undoing. You need to ensure fast access to content, all of the content, all of the time. In some cases, cases that I am particularly fond of, we see a requirement for all of this. The bottom line is that we all have immediate needs, and we need a solution that is flexible enough to address these needs as they arise, either individually or, frighteningly, all at once.
  • So we’ve come to the agreement that we have lots of information, there’s no question about that… [CLICK] we have our Word, PowerPoint and Excel docs, email, all of that stuff we have in file shares and network drives, we have dynamic content like wikis and blogs, and of course we have physical content… LOTS of physical content. And we’re trying to get all of this content into SharePoint so we can leverage it in a nice orderly fashion. Ideally we want to manage all of our information in the same way and in the same place as the rest of our content, so we can leverage a centrally managed ecosystem under a set of unified policies. But there are other considerations we must make in terms of the type of content we are trying to manage…
  • After I ask customers how much content they have and what type it is, I ask, how old is it? According to the industry average, last year we were seeing that roughly one quarter of all sites were inactive, or orphaned. And let’s think about the process SharePoint end-users undertake here… we get a request from a group in the organization for a site, where they will actively post and version all sorts of content for Project A. A few months go by and Project A comes to completion, and out of the blue we get another request for a new site… this time for Project B. And although some folks may grab some content and repurpose it in Project B, for the most part, after the first little while of Project B has passed, the Project A site sits idle. So not only do we have another site with all sorts of new content, we are now starting to build out our volumes of duplicate content… and this is when things get ugly. Now think 2 slides back to the example of the large financial firm with 40 TBs of data in SharePoint… based on the stat you see here, that means 10 TBs of that data is doing nothing but taking up the most costly form of storage. There must be a better way!?!
  • It would be fair to say that these days you’d be hard pressed not to be confronted with compliance – be it corporate policies or full-on industry regulations, there are a few out there and, frighteningly enough, many more on the way. This list here includes just those I could remember myself… From an information governance standpoint, if you are moving SharePoint content into an RM repository or archive, you want to either apply new policies according to your requirements, or maintain the policies you had when the content lived in SharePoint. Applying a lifecycle to content through retention and disposition policies is a must for a couple of reasons. One: you’re accountable for all of the information you create. And two: once all regulatory obligations have been met on content it really behooves you to get rid of it, because as they say, any information you have can and will be used against you in a court of law.
  • Ah yes, the “Experience!” Show of hands, how many of you have watched someone from MSFT present SharePoint? I have a number of times, and for fun the last time, at Tech Ed this Spring, I counted the number of times the presenter used the word experience: 442. Seriously though, the experience is really what SharePoint is all about, right? I mean, that’s why it has been so immensely popular; it really connects the people and the content through technology. One thing you have to ask yourself when you’re a vendor looking to develop some sort of solution that enhances or extends SharePoint is, will I break the experience? If the answer is yes, the solution will fail. It is of paramount importance that the SharePoint end-user in particular never be taken outside of that work experience, and that if any enhancements or extensions to SharePoint are made, they be made behind the scenes. You can’t add to the effort of the end-user; the solution has to maintain its simple and easy-to-use… you know.
  • It wouldn’t be a stretch to say that if you have SharePoint you probably have email and file shares. Most organizations also have things like back-up tapes and DVDs, flash drives and network-connected PCs. Oh! And physical content. We learned a long time ago that storing stuff here and there was not optimal. Centralized management of information spread across the information infrastructure was the ideal. This concept of centralized management makes a lot of sense, and although SharePoint is an information solution, it is likely not going to be your ONLY information solution; rather, it will be part of the broader information infrastructure. So back to what you see on the screen here: we have multiple silos of content, and we probably want to centrally manage everything we’ll be archiving. So if you’re moving content from SharePoint to an archive, it should really live alongside the file content you’ve archived, the email you’ve archived and God knows what else you feel like archiving. The benefits of centralization relate to a lot of the things we’ve already discussed, like compliance and litigation readiness: managing all of your archived content under one set of unified compliance policies, or having a central place where all your information can be retrieved from when, not if, you get sued and need to get at it.
  • Before I dive into the solution overview, I wanted to do a quick review of the SourceOne platform for information governance. SourceOne solutions allow you to understand what content you have, and centrally manage that content, be it email, file system or SharePoint content, against your organization’s specific compliance requirements, and if you need to get to that content SourceOne provides the tools to discover and maintain it for litigation. And lastly, but certainly not least, SourceOne helps you save money by reducing IT overhead and automating processes.
  • So now I present to you, EMC SourceOne for Microsoft SharePoint, or SourceOne for SharePoint to save some time. There are really three facets to this solution: operational efficiencies, information governance and end-user transparency. You’ll note that these align very closely to the Ideal Solution I outlined moments ago.
  • In terms of operational value we are again reducing the load on SharePoint by rerouting content into a more appropriate and less costly tier of storage. To do this we leverage Microsoft’s recommended method for externalization and ensure 100% transparency between the content that has been re-directed and the end-user that uses it. This solution also can enhance SharePoint’s overall scalability and can dramatically improve performance, as up to 95% of the load can be displaced from SharePoint.
  • So how does the externalization of active SharePoint content help? Well, first let’s begin with how it works. [CLICK] Natively SharePoint stores its content in a SQL Server database. Now it is important to understand that the content itself can be split into two things: one, the metadata, and two, the Binary Large Object or BLOB, which again can account for up to 95% of the content’s mass. [CLICK] What the SourceOne solution does is leverage Microsoft’s recommended approach to externalization and use a MSFT-created API to essentially dissect the content into metadata and BLOB, where we then route the BLOB to a SourceOne folder and the metadata continues on its trip to SQL. As we already know, this provides improved efficiencies and reduced load on SharePoint, but the big thing that leaving the metadata in SQL allows is 100% end-user transparency, such that the end-user sees and can access the content as though it lives natively in SharePoint!
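To make the split between metadata and BLOB a little more concrete, here is a minimal Python sketch of the general idea. It is only an illustration under stated assumptions: the class and function names (`ExternalBlobStore`, `ContentDatabase`, `save_document`, `open_document`) are hypothetical, and this is neither the Microsoft externalization API nor the SourceOne implementation. The database keeps a small metadata row with a pointer, the heavy bytes go to a separate store, and the caller reads the document back without knowing where the bytes live.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ExternalBlobStore:
    """Hypothetical stand-in for the cheaper external storage tier."""
    blobs: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # A content hash serves as the pointer the database will keep.
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self.blobs[blob_id]


@dataclass
class ContentDatabase:
    """Hypothetical stand-in for the SQL content database (metadata only)."""
    rows: dict = field(default_factory=dict)


def save_document(db: ContentDatabase, store: ExternalBlobStore,
                  name: str, author: str, data: bytes) -> None:
    # Route the BLOB to the external store; only metadata reaches the database.
    blob_id = store.put(data)
    db.rows[name] = {"author": author, "size": len(data), "blob_id": blob_id}


def open_document(db: ContentDatabase, store: ExternalBlobStore,
                  name: str) -> bytes:
    # The caller asks by name, as before; the pointer lookup is hidden.
    row = db.rows[name]
    return store.get(row["blob_id"])
```

Because the lookup by name still goes through the metadata row, the reader of a document never sees that the bytes were stored somewhere else, which is the transparency point made above.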
  • So what about the information governance side of this coin? Well, we already noted that 25% of SharePoint content is outdated or orphaned, the product of completed projects now sitting idle in SharePoint. To help manage the growth of SharePoint farms and, more importantly, provide good information governance, SourceOne for SharePoint brings forward three big benefits. The first is one of the biggest, and that is the ability to manage archived content against a lifecycle, or more specifically, apply retention and disposition policies against any and all content moved into a SourceOne archive. And in this day and age of legal preparedness we must ensure that when required we can find, access and gather content, in its original form, and present it in a court of law. To do this organizations can also leverage EMC SourceOne’s industry-leading eDiscovery solutions. And lastly, content accessibility. In a classic archival process content is copied and moved out of its original repository and stored in an archive, or separate repository; once it is confirmed that the content is in the archive, it is most often, for the sake of storage management efficiencies, deleted from its original repository so as not to be storing twice the amount of content. But with SharePoint we must always be aware of the end-user. And even though we’ve archived the content and most likely “removed” it from SharePoint, we must ensure accessibility to the end-user, and with the SourceOne solution for SharePoint we do so by providing a SharePoint search web part that lives right in SharePoint’s native Search Services.
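As a rough illustration of what managing archived content against a lifecycle can mean, here is a small Python sketch of a retention and disposition check. Everything in it is an assumption made for the example: the `ArchivedItem` fields, the per-item retention period in days, and the legal hold flag are hypothetical and do not describe how SourceOne models its policies.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    name: str
    archived_on: date
    retention_days: int       # assumed policy length for this content type
    legal_hold: bool = False  # content held for litigation is never disposed of


def eligible_for_disposition(item: ArchivedItem, today: date) -> bool:
    """Return True once the retention period has run out and no hold applies."""
    if item.legal_hold:
        return False
    return today >= item.archived_on + timedelta(days=item.retention_days)
```

The hold check comes first on purpose: content needed for a legal matter stays put even after its retention period has expired.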
  • This is a view of the SourceOne administrator’s console where admins can execute on activities to archive email, file system AND of course SharePoint content. It is a simple wizard driven UI that provides a complete information governance platform for the centralized management, long term archival and preservation of old and outdated content under a unified set of policies, from multiple sources. In this image you can see that through the SourceOne admin console, amongst other archiving options, an administrator can now select an activity to archive SharePoint content specifically.
  • [2 x CLICKS] One use case I wanted to bring up surrounds the migration from SharePoint 2003/2007 to 2010. [CLICK] It is again important to understand that although a lot of content will reside in SharePoint, chances are ALL of your content won’t live in SharePoint. [CLICK] But during migration you will have the proverbial hood open, so why not take this opportunity to tune the whole engine as opposed to just replacing the spark plugs?
  • The first thing we need to do is simply understand what information we have. So there are a number of tools that can help you search across your entire information infrastructure that help you figure out two key things: What type of content do you have, and when was it last accessed.
  • With this information we can define 3 simple buckets: [CLICK] The first bucket is our active content bucket, so all of the information that we are using regularly, collaborating on, versioning, etc. in SharePoint. And if we’ve done something like a file share migration, or moved all of our paper forms into SharePoint to get a better handle on all of the working content we have in our organization, we might find that we’ve placed too large a burden on SharePoint, but that’s okay, because we can always externalize active content to a smarter, more cost-effective tier of storage. [CLICK] The next bucket is the inactive stuff, that old, orphaned content that was, for example, once part of a project now complete. Additionally, during that file share migration effort we undertook we probably realized, hey, a lot of this stuff is old and not being used, but we have to keep it under compliance due to regulatory obligations… no point in bogging down SharePoint with old stuff, right? No, move it into a retention-enabled archive where it can be managed long-term at a much lower cost. [CLICK] And lastly, and often the biggest, the Delete Bucket. This is where we put those music libraries some guy in marketing had on the G-drive, but even more so, this is where we put all the duplicate information we managed to find. And please remember: we only press the giant red delete button once we are sure the content has lived up to all regulatory obligations. In the end we are left with a nice tidy information infrastructure, where active content is being managed in SharePoint with large loads being externalized, and we’ve established a great set of long-term policies to manage older content that we must maintain under compliance.
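The three buckets can be sketched as a simple classification over when content was last accessed. This is a minimal Python sketch under assumptions of my own: the 180-day activity window, the flag names and the bucket labels are hypothetical, and a real assessment tool would look at far more than these three inputs.

```python
from datetime import date, timedelta

# Assumed cut-off for "regularly used"; pick whatever fits your organization.
ACTIVE_WINDOW = timedelta(days=180)


def classify(last_accessed: date, today: date,
             is_duplicate: bool = False,
             under_retention: bool = False) -> str:
    """Sort one piece of content into the active, archive or delete bucket."""
    recently_used = (today - last_accessed) <= ACTIVE_WINDOW
    if recently_used and not is_duplicate:
        return "active"   # stays in SharePoint, large loads externalized
    if under_retention:
        return "archive"  # old or duplicate, but obligations still apply
    return "delete"       # no outstanding obligations, safe to dispose of
```

Note that the retention check sits in front of the delete branch, so nothing reaches the giant red button while a regulatory obligation is still outstanding.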
  • For more content and information around the EMC solutions please check out If I do say so myself it is one of the most unique interactive web experiences out there and a place where you can find data sheets, white papers, videos and more. Or feel free to email me at davidm.martin@emc.com. Thanks.