Requirements for Managing Unstructured Data

Presentation Transcript

  • Determining Requirements for Managing Unstructured Data ✤ Christine Connors, TriviumRLG LLC, Information Management Consulting ✤ Thursday, March 22, 2012
  • Overview ✤ Triggers ✤ Techniques ✤ Input ✤ Output ✤ Scale ✤ Mapping requirements to capabilities
  • Triggers ✤ “Didn’t we already do that?” ✤ “I found it once. It’s in there somewhere.” ✤ “Who knows how to do this?” ✤ “We maintain how many document management systems?!?” ✤ “Why can’t we use this content to do ... ?” ✤ “Which customer wanted that feature?”
  • As true today... ✤ “The search engine is poor to inadequate. I needed to find an appropriations data sheet and was returned 366 entries, none which had anything to do with appropriations. I spend far too much time looking through the search results for this engine to be effective. If I could find this document on the INTERNET I would do so, but this is an internal document that is successfully hidden somewhere in the archives with the Ark of the Covenant.” (Unidentified search and browse survey participant, June 2003) ✤ “Who gets more hits: or Listen up people: Our intranet is a wasteland of information. We need to unify - we need to standardize. Information is power - but only if it is on my desktop, not hidden away in some server waiting for a lucky adventurer to uncover it like some lost continent.” (Another unidentified search and browse survey participant, June 2003)
  • Wonderful objects with no metadata (context): a secret garden ✤ Photo: “Secret Garden” by wonderlane | Flickr | CC Attribution 2.0 Generic
  • Objects with can’t-be-bothered metadata: a maze ✤ Photo: “Longleat Maze” by odolphie | Flickr | CC Attribution 2.0 Generic
  • Lots of unmarked repositories: silos ✤ Photo: “Silo” by Plano Light | Flickr | CC Attribution 2.0 Generic
  • Techniques
  • Sometimes, it’s obvious ✤ Environmental scan ✤ Do we really need 40 document management systems? ➡ We need to reduce the number of systems ➡ Improve the findability of the objects they contain ✤ Budget analysis ✤ Projections indicate unsupportable costs of maintaining servers ✤ Costs are going down, but not as fast as our rate of acquisition ➡ We need to archive or compress the data, intelligently
  • Here there be dragons...
  • Standard Techniques ✤ Surveys ✤ Focus groups ✤ Observation ✤ SWOT ✤ Capabilities analysis
  • User Personas ✤ Craft fictional characters based on your key user groups ✤ These archetypes will represent the users of your new system or process ✤ Give them attributes and stories ✤ Figure out what you need to solve their problems
  • Knowledge Audits ✤ Identify what types of information are critical for the organization ✤ Dashboards? ✤ Note gaps ✤ Note overlaps (redundancies, duplication) and collaborate
  • Business Process Map ✤ Document the steps in standard business processes ✤ Identify where unstructured data is used and created ✤ Identify critical inputs/outputs ✤ Identify breaks and blocks in the system ✤ Photo by ottonassar | CC Attribution-Share Alike
  • Social Tagging Analysis ✤ Analyze the metadata and folksonomy - the organic hierarchies and social tags that have been created ad hoc in the systems ✤ Are there synonymous or near-synonymous terms? ✤ Are there trends by date or location?
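A first pass at the "synonymous or near-synonymous terms" question can be automated. The sketch below is an illustration only, not something from the talk: the tag list, the `normalize` and `near_synonyms` helpers, and the 0.85 threshold are all assumptions. It normalizes case and separators, counts tag usage, and flags pairs whose spelling is very similar. String similarity only catches spelling variants such as plurals; true synonyms ("bill" vs. "invoice") still need a human reviewer or a taxonomy.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def normalize(tag: str) -> str:
    """Lowercase a tag and treat hyphens/underscores as spaces."""
    return tag.strip().lower().replace("-", " ").replace("_", " ")


def near_synonyms(tags, threshold=0.85):
    """Return (usage counts, candidate near-duplicate tag pairs).

    The threshold is an arbitrary starting point (an assumption);
    tune it against a sample of your own folksonomy.
    """
    counts = Counter(normalize(t) for t in tags)
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return counts, pairs


# Hypothetical tags harvested from a document management system.
raw_tags = ["Invoice", "invoices", "invoice", "Purchase-Order",
            "purchase order", "contract"]
counts, pairs = near_synonyms(raw_tags)
print(counts.most_common())
print(pairs)
```

Candidate pairs are best treated as a review queue for whoever governs the vocabulary, rather than merged automatically.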
  • Survey Stakeholders ✤ What problem(s) are you solving? ✤ What are the pain points in the digital asset management strategy? Discovery, re-use, IP management? ✤ What are the benefits? ✤ New products, increased customer and/or employee satisfaction? ✤ Are there restrictions on how it gets done?
  • Typical Project Structure ✤ Analysis of needs & wants ✤ Define requirements ✤ Commit ✤ Resourcing ✤ Develop and Deploy ✤ Define & Publish Maintenance Processes and Governance Rules
  • Improve Efficiencies, Reduce Costs
  • Input - Lay of the Land ✤ Data discovery in an 80k employee multi-national ✤ 85% of the data “unstructured” ✤ 90% had no metadata ✤ most of that was “bad” metadata ✤ 13% exact duplicates ✤ True age of object hard to determine due to web scripting, server migrations, shared access
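A figure like "13% exact duplicates" typically comes from comparing file contents rather than file names. The following is a minimal sketch of one common way to do that, hashing every file with SHA-256 and grouping identical digests. The talk does not say which tool or method produced its number; the function names and the temporary demo files are assumptions for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("quarterly report")
    (root / "copy_of_a.txt").write_text("quarterly report")
    (root / "b.txt").write_text("meeting notes")
    groups = find_exact_duplicates(root)
    duplicate_names = [[p.name for p in group] for group in groups]

print(duplicate_names)
```

This only finds exact copies. Near duplicates (a re-saved PDF, an edited draft) need a different technique and are out of scope for the sketch.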
  • Input ✤ Qualify searches by ✤ function, organization, and business ✤ date ✤ document type (especially web pages) ✤ category (tags) ✤ Provide sorting of results by date, document type ✤ Do not change URLs of pages (users have bookmarked them)
  • Improved Efficiencies ✤ Delphi Group: ✤ Business professionals spend more than 2 hours per day searching for information ✤ Half of that time, 1 hour per day, is wasted by failure to find what they seek ✤ The factors most often blamed for the time wasted were ✤ data changes (location, 35%) and ✤ bad tools (ineffective search and lack of labeling, 28%)
  • Output ✤ Objects must have metadata ✤ Title, Author, Subject ✤ Repositories should be created for organization/business/function ✤ Objects must be stored in one location to reduce duplicates ✤ Objects need to be shared to many locations ✤ Search & browse UI tools must provide filters for the index created ✤ File naming conventions need to be created and enforced
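The "objects must have metadata" requirement is easy to enforce at ingest time. The sketch below assumes each object arrives with a plain dictionary of metadata; the lowercase field keys and the `missing_metadata` helper are hypothetical names, and only the three required elements (Title, Author, Subject) come from the slide.

```python
# Required elements from the slide; the lowercase keys are an assumption.
REQUIRED_FIELDS = ("title", "author", "subject")


def missing_metadata(record: dict) -> list[str]:
    """Return the required fields that are absent, None, or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Hypothetical records as they might arrive from an upload form.
complete = {"title": "Q1 plan", "author": "J. Smith", "subject": "Planning"}
incomplete = {"title": "Q1 plan", "author": "", "subject": None}

print(missing_metadata(complete))
print(missing_metadata(incomplete))
```

A repository could reject or quarantine any object for which the returned list is non-empty, which keeps "can't-be-bothered" metadata from entering the index in the first place.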
  • Improved Efficiencies ✤ [Chart: Dollars Returned to the Business for Growth (1 hour per year per general employee plus 1 hour per month). Vertical axis: $0 to $4,000,000. Bar labels: 1.2k, 2k, 10.4k, 12.3k, 4k, 11.6k, 11.9k, 8.3k, 13.9k.]
  • Reduce Storage Costs ✤ [Chart: Data growth assuming 60% annual growth rate, Year 1 to Year 5. Series: T1 Only, General tiered move, Unintelligent Move, Policy based Move. Left axis: Millions (Annual Cost), $0 to $90. Right axis: 0 to 3000. Data labels: 340, 544, 870, 1393, 2228 and $10.8, $12.2, $17.3, $27.7, $44.3.] ✤ Relative to the starting point, the growth curves represent storage acquisition cost increases over time.
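The volume labels on this chart (340, 544, 870, 1393, 2228) are consistent with compounding a starting value of 340 at the stated 60% annual growth rate. The sketch below reproduces them. The function name is made up for illustration, and the unit of the volume axis is not stated in the transcript, so none is assumed here.

```python
def project_growth(start: float, annual_rate: float, years: int) -> list[float]:
    """Volume for each year, with Year 1 equal to the starting volume."""
    return [start * (1 + annual_rate) ** n for n in range(years)]


# 340 is the smallest volume label on the chart; 0.60 is the stated rate.
volumes = project_growth(340, 0.60, 5)
for year, volume in enumerate(volumes, start=1):
    print(f"Year {year}: {volume:.0f}")
```

At this rate the volume grows about 6.5x over the four compounding steps between Year 1 and Year 5, which is why the slide argues for policy-based movement of data rather than keeping everything on the first storage tier.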
  • Identify Opportunities
  • Input ✤ Curate the content for me ✤ Allow me to reuse content easily ✤ a part, not the whole ✤ in a new package ✤ without copying/pasting ✤ with citations ✤ Allow me to annotate content ✤ Allow me to refine content based on my needs
  • Content Re-use and Re-purposing ✤ Skills: people do not learn at the same pace nor neatly align to ‘grade’ levels ✤ Product catalog: name and image as a tile on a sale page as well as in a detailed product description ✤ A taxonomy focused on a subject from introductory to mastery levels of understanding can be used to tag content fragments ✤ Combined with a taxonomy of skill levels, the content can be aggregated into packages consistently addressing the right audience in the right order ✤ These fragments can be re-used in a variety of products: multiple skill levels, multiple assessments, multiple delivery channels
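The fragment tagging idea can be sketched in a few lines. This is a toy model under stated assumptions, not the speaker's system: the `Fragment` class, the `build_package` function, the sample fragments, and the "intermediate" level are all hypothetical (the slide names only the introductory and mastery ends of the scale). Each fragment carries a subject tag and a skill-level tag, and a package is the set of fragments for one subject up to a given level, in level order.

```python
from dataclasses import dataclass

# Ordered skill taxonomy; "intermediate" is an assumed middle level.
SKILL_LEVELS = ["introductory", "intermediate", "mastery"]


@dataclass(frozen=True)
class Fragment:
    fragment_id: str
    subject: str
    skill_level: str
    text: str


def build_package(fragments, subject, max_level):
    """Select fragments for a subject up to max_level, easiest first."""
    cutoff = SKILL_LEVELS.index(max_level)
    selected = [
        f for f in fragments
        if f.subject == subject and SKILL_LEVELS.index(f.skill_level) <= cutoff
    ]
    # sorted() is stable, so fragments at the same level keep source order.
    return sorted(selected, key=lambda f: SKILL_LEVELS.index(f.skill_level))


# Hypothetical tagged fragments.
fragments = [
    Fragment("f3", "fractions", "mastery", "Dividing mixed numbers"),
    Fragment("f1", "fractions", "introductory", "What is a fraction?"),
    Fragment("f2", "fractions", "intermediate", "Adding unlike fractions"),
    Fragment("g1", "geometry", "introductory", "Naming shapes"),
]

package = build_package(fragments, "fractions", "intermediate")
print([f.fragment_id for f in package])
```

Because the same fragment objects feed every package, a correction to one fragment propagates to each product that re-uses it, which is the point of storing content once and sharing it to many locations.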
  • Output ✤ CRM content must be indexed and categorized ✤ Objects must have metadata ✤ Title, Author, Subject, Skill Level, Process Step ✤ Objects need to be shared to many locations ✤ Objects must be usable in multiple systems and platforms ✤ File naming conventions need to be created and enforced ✤ Source data/citations must be available ✤ Objects must be written in a re-usable, neutral voice
  • Define Requirements ✤ Functional Requirements ✤ User Requirements ✤ Administrative Requirements ✤ Authentication/Authorization/Security ✤ Metrics ✤ Documentation requirements ✤ Technical Requirements ✤ Back End ✤ Front End ✤ Platform ✤ Interoperability
  • Authentication, Authorization and Security ✤ Consider the content collections that will be part of the program. ✤ Do you anticipate any of it having restrictions? ✤ If so, then what are those restrictions? ✤ How will authorized users authenticate and gain access? ✤ Will you restrict access by entity type? ✤ By rules-based classification? ✤ By system access and control policies?
  • Back End ✤ How will you architect the back end to scale effectively? ✤ Will it be easily repeated on additional clusters? ✤ What OS and software will it need to run? ✤ Will it fail over? ✤ Can it scale to handle the number of users, documents and entities predicted for the anticipated life of the hardware?
  • Front End ✤ How will users interact with the system? ✤ Create - Read - Update - Delete as permissioned ✤ Search, browse, publish, integrate, migrate and import to and from other systems. ✤ What tools are needed to support these actions? ✤ Should select users be able to perform administrative tasks via a client or browser interface? ✤ How about the ability to generate reports? ✤ What operating system(s) does this interface need to function on? ✤ Mobile? Offline?
  • Interoperability ✤ How are you going to package and publish the data? ✤ File servers? ✤ Cloud? ✤ XML? Office suites? Analytics packages? Other tools? ✤ What other applications need to use the data created by one of the above? ✤ DMS/DAM/CMS/CRM
  • Metadata Management ✤ What kinds of information are important to manage - what metadata elements? ✤ Title, Author, Subject, Process, Skill, Dates, Business, Function... ✤ Will you need a taxonomy? ✤ Enforce some control on the description of attributes ✤ Do you need an external tool or is there a module within your CMS, DMS or portal solution that will suffice?
  • Resourcing ✤ Build vs. buy ✤ Human resources - staff or contractors needed ✤ Technology needs ✤ Hardware? Software? Network? Costs?
  • Define & Publish Processes and Rules ✤ Maintenance processes ✤ Schedule for review and updates ✤ Rules for additions, changes, deletions ✤ Implementation and publishing process ✤ Governance rules ✤ Editor? Committee? User input? ✤ Standards compliance?
  • Scale
  • Scale ✤ According to the 2011 Digital Universe study by IDC (sponsored by EMC), by 2020 the world will generate 50x the amount of information we have now, on 75x the number of containers, while IT support for those systems will increase only by a factor of 1.5.
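The implication of those three multipliers is easier to see per unit of IT support. The arithmetic below uses only the figures quoted on the slide; the variable names are illustrative, and treating "IT support" as a single divisor is a simplifying assumption.

```python
# Multipliers quoted from the 2011 Digital Universe study on the slide.
information_growth = 50.0   # 50x the amount of information
container_growth = 75.0     # 75x the number of containers
it_support_growth = 1.5     # IT support grows only 1.5x

# Load per unit of IT support, relative to today (today = 1.0).
information_per_support = information_growth / it_support_growth
containers_per_support = container_growth / it_support_growth

print(f"Information per unit of IT support: {information_per_support:.1f}x")
print(f"Containers per unit of IT support: {containers_per_support:.1f}x")
```

In other words, each unit of IT support would have to handle roughly 33 times as much information in about 50 times as many containers, which is the case for scaling with tools and processes rather than headcount.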
  • Scale Using Tools ✤ Compression technologies ✤ Metadata management ✤ Indexing, NLP, Search ✤ Business rule generation and application ✤ Virtualization
  • Scale Using Processes ✤ Standards ✤ Metadata governance ✤ Schema ✤ Taxonomy ✤ Subject Matter Experts ✤ Editorial Boards ✤ Product development
  • Mapping requirements to technologies
  • What’s available? ✤ Latest technologies ✤ Information management frameworks ✤ Business process best practices
  • Questions?
  • Thank you for attending! ✤ Christine Connors