Big Data – Shining the Light on Enterprise Dark Data

Content stored for a business purpose is often without structure or metadata required to determine its original purpose. With Hitachi Data Discovery Suite and Hitachi Content Platform, businesses can uncover dark data that could be leveraged for better business insight and uncover compliance issues that could prevent business risks. View this session and learn: What is enterprise dark data? How can enterprise dark data impact business decisions? How can you augment your underutilized data and deliver more value? How can you decrease the headache and challenges created by dark data? For more information please visit: http://www.hds.com/products/file-and-content/

Transcript

  • 1. BIG DATA – SHINING THE LIGHT ON ENTERPRISE DARK DATA (EDD) APRIL 17, 2013
  • 2. Content stored for a business purpose often lacks structure or metadata required to determine its original purpose. With Hitachi Data Discovery Suite and Hitachi Content Platform, businesses can uncover dark data that could be leveraged for better business insight and uncover compliance issues that could prevent business risks. Attend this session and learn: • What is enterprise dark data? • How can enterprise dark data impact business decisions? • How can you augment your underutilized data and deliver more value? • How can you decrease the headache and challenges created by dark data? BIG DATA – SHINING THE LIGHT ON ENTERPRISE DARK DATA WEBTECH EDUCATIONAL SERIES
  • 3. SPEAKERS Jeff Lundberg, senior product marketing manager, Hitachi Content Platform; Marcelline Sanders, senior product manager, Hitachi Data Discovery Suite; Eamon O’Neill, senior product manager, Hitachi Content Platform
  • 4. WHAT IS ENTERPRISE DARK DATA? • Dark data is ‒ Old files ‒ Data that you kept just in case ‒ Content on devices and clouds outside of IT control • It's created almost everywhere and stored anywhere • Organizations hoard this unanalyzed information because its value is unknown and storage is “cheap” • It may be worthless, invaluable or somewhere in between ‒ It’s clogging up production systems ‒ It’s all being treated the same despite widely varying value to the organization
  • 5. INFORMATION IS CREATED IN SILOS: OPERATIONS, DISTRIBUTION, MARKETING, CALL CENTER, MANUFACTURING, R&D, IT, STORES AND SALES. Content types shown: EMAIL, PDF
  • 6. UNSTRUCTURED DATA IS A MESS
  • 7. OLD WAYS OF INFORMATION GATHERING
  • 8. HOW TO GAIN INSIGHT ACROSS THE BUSINESS? Legal Counsel CEO CIO What’s the next big opportunity for the company? Is the business at risk due to dark data? How do I understand my enterprise dark data? CMO How can we influence market sentiment for our brand?
  • 9. COLLECT AND ORGANIZE YOUR DATA Corporate Compliance Operational Intelligence New Insight
  • 10. HOW IT WORKS IN THE REAL WORLD
  • 11. HEALTHCARE, LIFE SCIENCES THE KNOWLEDGE OF ALL FOR THE TREATMENT OF ONE RESEARCH EVALUATION TREATMENT CLINICAL TRIALS = The next cure = Better patient care
  • 12. HEALTHCARE EXAMPLE KLINIKUM WELS Primary Site 8 HCP nodes 2 HDDS Nodes (Full content and metadata search) USP-V Secondary Site 4 HCP nodes 1 HDDS node USP-V Replication Health Portal Ingest and consolidate data from 37 departments, 26 specialties Metadata-based repository Metadata Robot (CDA, PDF and XML) Adds metadata and custom metadata to create context (information and intelligence) • The environment ‒ Consolidate content from 37 departments ‒ 30-year compliant preservation ‒ Aggregation, search and metadata mining • How they use big data ‒ Intelligent data management ‒ Improve patient care, research and education capabilities ‒ Trend analysis ‒ Reduce cost and complexity of backups ‒ Make data independent of applications
  • 13. FINANCIAL SERVICES PROACTIVELY SEARCH FOR REGULATORY ISSUES BLOOMBERG MESSAGES EMAIL CALL RECORDINGS DATABASE RECORDS = Smart Intelligence from enterprise dark data = Protect business from risk XML
  • 14. FINANCIAL SERVICES − REGULATORY XML AUDIO RECORDS BLOOMBERG MESSAGES Add Custom Metadata: Google $600.00 11 AM PST; Apple 523.00; Trader – Sam Malone; Bloomberg 11 AM; JPMorgan 3rd Party 11:20 AM PST; Equity E; NPV 11 Billion; Nov 15, 2012 HDDS Search “Nov 15, 2012” and “Sam Malone” and “I have a deal for you” Legal Hold Index and Search
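
The search-then-hold flow on slide 14 can be sketched in a few lines of Python. This is a minimal illustration, not the HDDS API: the `IndexedObject` layout and the `search_all` and `place_legal_hold` helpers are hypothetical names, and the three sample records are made up around the values shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedObject:
    """One indexed item: full text plus custom metadata (hypothetical layout)."""
    name: str
    text: str
    metadata: dict = field(default_factory=dict)
    legal_hold: bool = False

    def searchable(self) -> str:
        # Search covers both the content and the metadata values.
        return " ".join([self.text, *map(str, self.metadata.values())]).lower()


def search_all(objects, *phrases):
    """Return objects whose content or metadata contains every phrase (AND)."""
    wanted = [p.lower() for p in phrases]
    return [o for o in objects if all(p in o.searchable() for p in wanted)]


def place_legal_hold(objects):
    """Flag every hit so it is protected while the hold is active."""
    for o in objects:
        o.legal_hold = True
    return objects


# Made-up sample records for illustration only.
corpus = [
    IndexedObject("bloomberg-msg", "I have a deal for you",
                  {"trader": "Sam Malone", "date": "Nov 15, 2012", "source": "Bloomberg"}),
    IndexedObject("call-recording", "transcript: I have a deal for you",
                  {"trader": "Sam Malone", "date": "Nov 15, 2012", "source": "audio"}),
    IndexedObject("email", "Lunch on Friday?",
                  {"trader": "Sam Malone", "date": "Nov 16, 2012", "source": "email"}),
]

hits = place_legal_hold(
    search_all(corpus, "Nov 15, 2012", "Sam Malone", "I have a deal for you"))
print([o.name for o in hits])
```

Only the two records that match all three phrases are placed on hold; the unrelated email is left untouched.
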
  • 15. INSURANCE MOVING BEYOND I.T.-CENTRIC VALUE TO BUSINESS VALUE ACCIDENT CLAIM INVESTIGATION PAYOUT = Competitive differentiation = Increased customer loyalty
  • 16. INSURANCE EXAMPLE ENTERPRISE CONTENT LIFECYCLE MANAGEMENT AND DISCOVERY Unified Search (HDDS) Virtualized Content Content Creation Unified Management Mobile Remote/Branch Office On-Site <claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445> <claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930> <claim id=1203 date=20110930> <policy id=101> <invoice id=72273881 vendor=2833> Search across all content independent of applications, physical location of data Cloud Storage
  • 17. INDEX AND SEARCH DISCOVER, CONNECT, FILTER, ASSESS, ACT
  • 18. DISCOVER GAIN INSIGHT BY CONNECTING TO YOUR DATA SEARCH ANALYZE INSIGHT
  • 19. MANY DATA SOURCES Structured: Presentation of RDBMS Data. Unstructured: Well File, PDF of Scanned Documents, Seismic, etc. Example shown: Retrieve Well Production Data, https://www.dmr.nd.gov/oilgas/basic/getwellprod.asp?filenumber=19119
    NDIC File No: 19119   API No: 33-105-01865-00-00   CTB No: 119119
    Well Type: OG   Well Status: A   Status Date: 11/5/2010   Wellbore type: Horizontal
    Location: NENW 26-155-101   Footages: 320 FNL 2529 FWL   Latitude: 48.225686   Longitude: -103.636598
    Current Operator: BRIGHAM OIL & GAS, L.P.   Current Well Name: HEEN 26-35 1-H
    Elevation(s): 2073 KB 2053 GR 2053 GL   Total Depth: 20400   Field: TODD
    Spud Date(s): 7/27/2010   Casing String(s): 9.625" 2160' 7" 10896'
    Completion Data: Pool: BAKKEN   Perfs: 10896-20400   Comp: 11/5/2010   Status: AL   Date: 2/10/2011   Spacing: 2SEC
    Cumulative Production Data: Pool: BAKKEN   Cum Oil: 162510   Cum MCF Gas: 141410   Cum Water: 150629
    Production Test Data: IP Test Date: 11/8/2010   Pool: BAKKEN   IP Oil: 3425   IP MCF: 2194   IP Water: 6265
    Monthly Production Data:
    Pool    Date     Days  BBLS Oil  Runs   BBLS Water  MCF Prod  MCF Sold  Vent/Flare
    BAKKEN  3-2012   31    5301      5217   4079        4301      3667      634
    BAKKEN  2-2012   29    5050      4971   3756        2723      1185      1538
    BAKKEN  1-2012   31    5624      5786   4239        2846      1705      1141
    BAKKEN  12-2011  31    5708      5407   4272        4033      3134      899
    BAKKEN  11-2011  30    6112      6228   4536        4647      4368      279
    BAKKEN  10-2011  31    6227      7526   4857        4903      4303      600
    BAKKEN  9-2011   30    6516      5544   4866        5418      5113      305
    BAKKEN  8-2011   31    7430      7276   7724        5996      2532      3464
    BAKKEN  7-2011   31    8085      7866   5699        7500      7499      1
    BAKKEN  6-2011   30    8438      8682   5501        6481      1816      4665
    BAKKEN  5-2011   28    6221      6526   6709        4456      0         4456
    BAKKEN  4-2011   30    8201      7379   8189        5943      0         5943
    BAKKEN  3-2011   31    11263     11928  9963        8345      0         8345
    BAKKEN  2-2011   23    10035     10365  7819        9841      0         9841
  • 20. SCALE-OUT INDEXING OF INFORMATION Index Metadata and Full Content in Complex Formats and Multiple Languages Process Petabytes of Data Security Protection!
  • 21. DISCOVER, CONNECT, AND ASSESS INFORMATION • Hitachi Data Discovery Suite (HDDS) ‒ Scales using latest open source technologies ‒ Hadoop ‒ HDFS ‒ Zookeeper ‒ 1,000 objects per second per server/node (NFS metadata indexing) ‒ Parallel processing • Structured queries against unstructured information • Rich API • Results for further analysis
  • 22. BREAK DOWN SILOS SOPHISTICATED INSIGHT ACROSS DISPARATE INFORMATION TYPES Identify Trends and Insights With a Single View Across Previously Siloed Data Net-New Revenue Opportunity, Innovation or Competitive Differentiation SINGLE VIRTUALIZATION PLATFORM Block Object File Structured/Unstructured Healthcare Insurance Manufacturing ANALYTICS
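
To put the "1,000 objects per second per server/node" figure from slide 21 in perspective, here is a rough back-of-the-envelope calculation. It assumes perfectly linear scale-out across nodes, which the slide does not claim, and the one-billion-object workload is a hypothetical example.

```python
OBJECTS_PER_SECOND_PER_NODE = 1_000  # figure quoted on the slide (NFS metadata indexing)


def indexing_hours(total_objects: int, nodes: int) -> float:
    """Estimated hours to index metadata, assuming perfectly linear scale-out."""
    if nodes < 1:
        raise ValueError("need at least one node")
    seconds = total_objects / (OBJECTS_PER_SECOND_PER_NODE * nodes)
    return seconds / 3600


# Hypothetical workload: one billion objects.
for nodes in (1, 2, 4, 8):
    print(f"{nodes} node(s): {indexing_hours(1_000_000_000, nodes):.1f} hours")
```

Under these assumptions a single node would need roughly eleven and a half days for a billion objects, which is why the parallel, multi-node design matters at this scale.
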
  • 23. BRING STRUCTURE TO UNSTRUCTURED DATA
  • 24. USE METADATA TO ORGANIZE AND QUERY Block File METADATA Object QUERIES
  • 25. BIG METADATA PREPARE DATA FOR ANALYTICS Block File Object ANALYTICS
  • 26. OBJECT STORAGE FOR STORING, CONTROLLING, TAGGING, ANALYZING, ENRICHING, AND SHARING ENTERPRISE DARK DATA
  • 27. STORE EDD − VOLUME AND VELOCITY 80 Nodes 40 Petabytes of Storage 64 Billion User Objects • Volume: Grow from 4 TB to 40 PB, by adding storage • Velocity: Rapid read-write of data. Increase bandwidth by adding nodes Scale-Out Architecture • With compression and deduplication, store big data efficiently in Hitachi Content Platform (HCP), inside the enterprise or in cloud-hosted HCP
  • 28. STORE EDD − VARIETY • 10,000 namespace divisions within the reservoir • Different data management policies for each kind of data – retention, compliance, etc. HCP DESIGNED TO STORE A WIDE VARIETY OF UNSTRUCTURED DATA: Microsoft® SharePoint® (Office SharePoint Server 2007), Microsoft Exchange, X-rays, Legal contracts, Instant messages, Surveillance, Call Recordings • Metadata Schema Adapted for Various Content Types
  • 29. CONTROL EDD – BACKUP-FREE • Use of proven RAID-6 protection • 2 copies of all metadata • Customer configurable redundant local object copies (2, 3, or 4) • Content validation via hashes and automatic object repair • Replication – offsite copies with automated repair from replica • Object versioning – protection from accidental deletes and changes Active data protection built into the object store Equals unparalleled data protection and reduced backup burden
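
The headline figures on slide 27 imply some per-node averages, worked out below. This is simple arithmetic on the quoted maxima, assuming decimal units (1 PB = 10^15 bytes) and that all three maxima are reached at once; neither assumption is stated on the slide.

```python
PB = 10**15  # decimal petabyte; an assumption, the slide does not say which unit
TB = 10**12

max_nodes = 80
max_capacity_bytes = 40 * PB
max_objects = 64 * 10**9
min_capacity_bytes = 4 * TB

capacity_per_node_pb = max_capacity_bytes / max_nodes / PB   # PB per node
objects_per_node = max_objects // max_nodes                  # objects per node
avg_object_bytes = max_capacity_bytes / max_objects          # mean object size
growth_factor = max_capacity_bytes // min_capacity_bytes     # 4 TB -> 40 PB

print(f"{capacity_per_node_pb} PB per node")
print(f"{objects_per_node:,} objects per node")
print(f"{avg_object_bytes / 1000:.0f} KB average object size")
print(f"{growth_factor:,}x growth from smallest to largest configuration")
```
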
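
"Content validation via hashes and automatic object repair" on slide 29 can be illustrated with a small sketch. It is not HCP's implementation: `validate_and_repair` is a hypothetical helper, SHA-256 is chosen only as an example, and the in-memory list stands in for the redundant local copies.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 digest of an object's content."""
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(copies: list[bytes], expected: str) -> list[bytes]:
    """Replace any copy whose hash does not match with a verified good copy."""
    good = next((c for c in copies if digest(c) == expected), None)
    if good is None:
        raise RuntimeError("no intact copy left; restore from an off-site replica")
    return [c if digest(c) == expected else good for c in copies]


original = b"scanned contract, page 1"
stored_hash = digest(original)  # recorded at ingest time
copies = [original, b"scanned c0ntract, page 1", original]  # second copy is corrupted

repaired = validate_and_repair(copies, stored_hash)
print(all(digest(c) == stored_hash for c in repaired))
```

When every local copy fails validation the sketch raises an error, which corresponds to the slide's fallback of repairing from an off-site replica.
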
  • 30. CONTROL EDD – PRESERVE AND SECURE Authentication • Policy-based object management guarantees archived data is authentic, available and secure • Guards against corruption or tampering • Selectable hash algorithms include SHA-1, SHA-256/384/512, MD5, and RIPEMD-160 Retention • Prevents deletion before retention period expires • Strict “compliance” or more liberal “enterprise” mode • Retention classes, date in object, or deferred options. Privileged delete, retention hold Protection • Self-configuring and self-healing with automated policy enforcement, failover and ongoing integrity checks • Ensures specified number of replica copies are maintained to tolerate simultaneous points of failure, depending on value of data Encryption of data at rest • Protects content if media is stolen, using patented Secret Sharing technology • Transparently encrypts all content, metadata, and search indexes • Implements a distributed key management solution Replication • Bidirectional, inbound star, chain topologies • Transparent object-level restore, repair, and read recovery from replica Shredding • Ensures no trace of file is recoverable from disk after deletion; U.S. DoD 5220.22-M spec.
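
The retention rules on slide 30 can be read as a small decision function. The sketch below is an interpretation rather than HCP's documented behavior: it assumes a retention hold blocks every delete and that a privileged delete is honored only in "enterprise" mode. `StoredObject` and `can_delete` are hypothetical names and the dates are examples.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredObject:
    name: str
    retain_until: date      # last day of the retention period
    on_hold: bool = False   # retention hold, e.g. during litigation


def can_delete(obj: StoredObject, today: date,
               mode: str = "compliance", privileged: bool = False) -> bool:
    """Decide whether a delete request should be allowed (illustrative rules)."""
    if obj.on_hold:
        return False                      # a hold blocks every delete
    if today > obj.retain_until:
        return True                       # retention period has expired
    # Still under retention: only a privileged delete in "enterprise" mode passes.
    return mode == "enterprise" and privileged


record = StoredObject("claim-1203.pdf", retain_until=date(2036, 5, 21))
print(can_delete(record, date(2030, 1, 1)))                                      # False
print(can_delete(record, date(2030, 1, 1), mode="enterprise", privileged=True))  # True
print(can_delete(record, date(2036, 5, 22)))                                     # True
```
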
  • 31. TAG EDD – CUSTOM METADATA <claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445> <claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930> <policy id=101> <object type=car plate=454756> <customer id=2355> <tow plate=454756> Object Consists of Files (JPG, PDF, etc.) Plus Appended Tags
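
The tags on slide 31 are shown in shorthand (unquoted attributes, no closing tags). A minimal sketch of how such an annotation might be written as well-formed XML and read back is given below. The nesting of `claim`, `policy` and `party` is an assumption made for illustration, and `build_claim_tags` is a hypothetical helper, not part of any Hitachi API.

```python
import xml.etree.ElementTree as ET


def build_claim_tags(claim_id, claim_date, policy_id, party):
    """Build a well-formed XML annotation from the values shown on the slide."""
    claim = ET.Element("claim", id=str(claim_id), date=claim_date)
    policy = ET.SubElement(claim, "policy", id=str(policy_id))
    # Attribute values must be strings, so convert the party fields explicitly.
    ET.SubElement(policy, "party", {k: str(v) for k, v in party.items()})
    return ET.tostring(claim, encoding="unicode")


tags = build_claim_tags(1203, "20110925", 101,
                        {"id": 1, "type": "car", "plate": 509445})
print(tags)

# Reading the annotation back lets a search index pick out individual fields.
parsed = ET.fromstring(tags)
print(parsed.get("id"), parsed.find("policy/party").get("plate"))
```

Keeping the annotation well-formed means any standard XML parser can extract the fields for indexing, independent of the application that created the file.
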
  • 32. ANALYZE EDD • Built-in metadata search index • Object query API enables web dashboards • Relational queries link together many kinds of unstructured objects and connect those to structured data • Metadata policy engine – automated management actions on search results Put HOLD on all files related to lawsuit Retrieve all scanned-doctor-notes related to tibia-fracture-xray-images and related insurance-claim-records in SQL DBs
  • 33. EDD LIFECYCLE ENRICH EDD Analyze Enrich Store and Control Capture • HCP makes existing data more useful. Outcome of analysis leads to more tags for the content. Continuously append custom metadata • Over time, what you learn about EDD becomes more important than the data itself
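
The second example query on slide 32 (doctor notes related to tibia-fracture X-rays and the matching insurance claims) is essentially a join between object metadata and a structured table. The sketch below shows the idea with Python's built-in `sqlite3`. The table names, columns and rows are all invented for illustration, and this is not the HCP query API.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE object_metadata (name TEXT, kind TEXT, patient_id INTEGER, tag TEXT);
    CREATE TABLE insurance_claims (claim_id INTEGER, patient_id INTEGER);
""")
# Made-up metadata for unstructured objects (files) and structured claim records.
db.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?)", [
    ("xray-001.jpg", "xray-image", 7, "tibia-fracture"),
    ("note-001.pdf", "scanned-doctor-note", 7, None),
    ("xray-002.jpg", "xray-image", 8, "wrist-fracture"),
    ("note-002.pdf", "scanned-doctor-note", 8, None),
])
db.executemany("INSERT INTO insurance_claims VALUES (?, ?)", [(5001, 7), (5002, 8)])

# Link notes to X-rays through shared metadata, then to the structured claims table.
rows = db.execute("""
    SELECT note.name, xray.name, claim.claim_id
    FROM object_metadata AS note
    JOIN object_metadata AS xray ON xray.patient_id = note.patient_id
    JOIN insurance_claims AS claim ON claim.patient_id = note.patient_id
    WHERE note.kind = 'scanned-doctor-note'
      AND xray.kind = 'xray-image'
      AND xray.tag = 'tibia-fracture'
""").fetchall()
print(rows)
```

The shared `patient_id` key is what makes the link possible, which is why consistent custom metadata on each object matters for this kind of query.
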
  • 34. SHARE EDD – MANY ACCESS METHODS Linux/Unix Filers (NFS) Document Management (WebDAV) Microsoft® Windows® (CIFS) Amazon S3 (Compatible RESTful HTTP(S)) Email Journaling (SMTP) https://marketing.xenos. /browser/contract.pdf
  • 35. ADDITIONAL RESOURCES For more information about the technologies behind enterprise dark data, please refer to the following links: Hitachi Data Discovery Suite http://www.hds.com/products/file-and-content/data-discovery-suite.html?WT.ac=us_mg_pro_dds Hitachi Content Platform http://www.hds.com/products/file-and-content/content-platform/?WT.ac=us_mg_pro_hcp General EDD questions − Laura Chu-Vial, laura.chu@hds.com
  • 36. SUMMARY • Currently, Dark Data is a burden: ‒ It's created almost everywhere and stored anywhere ‒ Organizations hoard this data because its value is unknown and storage is ‘cheap’ ‒ It’s all being treated the same despite widely varying value to the organization ‒ Provides low value outside of legal and compliance • Put your data to work for you: ‒ Identify dark data and assess its value with index and search ‒ Collect, store and organize data in an object store ‒ Analyze your dark data’s content and metadata ‒ Enrich and share insight to drive new innovation
  • 37. QUESTIONS AND DISCUSSION
  • 38. UPCOMING WEBTECHS • HDS Big Data Roadmap, May 1, 9 a.m. PT, noon ET • Hitachi’s Cloud Strategy, Enabling Technologies, and Solutions, May 21, 9 a.m. PT, noon ET • Environmental Pressures Driving an Evolution in File Storage, May 23, 9 a.m. PT, noon ET • HDS Hadoop Reference Architecture, June 5, 9 a.m. PT, noon ET Check www.hds.com/webtech for: • Links to the recording, the presentation and Q&A (available next week) • Schedule and registration for upcoming WebTech sessions
  • 39. THANK YOU