Structuring Big Data


Mark Wilson's lightning talk at CloudCamp London on 25 January 2012, about using linked data to integrate data silos in the world of big data.


  1. Structuring big data. Mark Wilson, January 2012. #CloudCamp. UNCLASSIFIED. © Copyright 2012 Fujitsu Services Limited
  2. The problem with big data, and a solution. The problem: “New reference architectures will include both big data and enterprise data warehouses” [IDC, 19 January 2012]. There are two worlds: structured and unstructured data (plus external data sources, documents stored in structured databases, etc.). Silos create issues with management, integration, etc. The solution: linked data, a single reference point for all data in the enterprise.
  3. 3. Some history Fixed structure  Difficult to change schema Simple reporting capabilities  Complex to create new reports#CloudCamp 2 UNCLASSIFIED
  4. 4. Some history Completed transactions transferred to separate database for analysis  “Data warehouse” Better reporting, data mining, etc.  Still highly structured Data is historical  May be aggregated#CloudCamp 3 UNCLASSIFIED
  5. 5. The smart guysReal-time update of completed transactions  Transactions moved to data warehouse upon completion  Smaller transactional databaseAllows for alerts to be generated when specific conditions met and action taken#CloudCamp 4 UNCLASSIFIED
  6. A third “data silo”. Masses of unstructured and semi-structured data are being processed in NoSQL databases. This data may or may not be transferred to or from structured databases, which is time-consuming and inefficient. The result is three types of data, each with its own limitations and its own management considerations.
  7. Data everywhere!
  8. 8. Linked DataTie records together – even from separate data setsWe can express as triples with a specific grammar:Build up a graph to show machine-readable data in human form#CloudCamp 7 UNCLASSIFIED
  9. Then add lots more data… [diagram; source not preserved]. Each node is itself another graph (zoom in).
  10. Aren’t we missing a trick? Use linked data as the optimal reference source: a broker of all data sources. It gives a single view on structured and unstructured data, and can bring in external sources too. Mapping, interconnecting, indexing and feeding all happen in real time. Query linked data to derive new value from old: infer relationships and gain new insights.
  11. About the author. Mark Wilson, Strategy Manager, Fujitsu. Mark is an analyst working within Fujitsu’s UK and Ireland Office of the CTO, providing thought leadership both internally and to customers, shaping business and technology strategy. He has 17 years’ experience of working in the IT industry, 12 of which have been with Fujitsu. Mark has a background in leading large IT infrastructure projects with customers in the UK, mainland Europe and Australia. He has a degree in Computer Studies from the University of Glamorgan. Mark is also active in social media and won the Individual IT Professional (Male) award in the 2010 Computer Weekly IT Blog Awards. Mark may be found on Twitter @markwilsonit. If you would like to comment on the topics in this presentation, Mark would welcome your feedback, by email to
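Slides 8 to 10 describe linked data only at a high level: records from separate silos are tied together as triples, the triples form a graph, and querying that graph lets us infer relationships. The Python below is a minimal sketch of that idea, assuming a toy in-memory store rather than a real RDF triple store or SPARQL endpoint. It links a customer record from a structured silo to a user record from a NoSQL silo with a single sameAs triple, then runs a query that only succeeds once the relationship has been inferred. The identifiers, the `ex:` prefix and all helper names (`TripleStore`, `infer_same_as`, `build_example_store`, `crm_customers_with_negative_reviews`) are hypothetical; `owl:sameAs` and `foaf:name` are borrowed from real vocabularies but are treated here as plain strings.

```python
from collections import defaultdict
from typing import Optional

# A triple is (subject, predicate, object). All identifiers used below
# ("crm:", "nosql:", "review:", "ex:") are hypothetical examples.
Triple = tuple[str, str, str]

SAME_AS = "owl:sameAs"  # links two identifiers for the same real-world thing


class TripleStore:
    """A minimal in-memory graph of triples, indexed by subject."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.by_subject: defaultdict[str, set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return triples matching a pattern, sorted; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def infer_same_as(self) -> int:
        """Copy facts in both directions across sameAs links until nothing
        new appears (a simplified reading of owl:sameAs). Returns the
        number of triples added."""
        added = 0
        changed = True
        while changed:
            changed = False
            links = [(s, o) for s, p, o in self.triples if p == SAME_AS]
            for a, b in links:
                for target, source in ((a, b), (b, a)):
                    # Iterate over a copy: self.add() may grow the index.
                    for _, p, o in list(self.by_subject[source]):
                        if p != SAME_AS and (target, p, o) not in self.triples:
                            self.add(target, p, o)
                            added += 1
                            changed = True
        return added


def build_example_store() -> TripleStore:
    store = TripleStore()
    # Structured silo: a customer record from a relational database.
    store.add("crm:customer/42", "foaf:name", "Alice Smith")
    # Unstructured silo: activity extracted into a NoSQL store.
    store.add("nosql:user/asmith", "ex:postedReview", "review:981")
    store.add("review:981", "ex:sentiment", "negative")
    # The single link that ties the two silos together.
    store.add("crm:customer/42", SAME_AS, "nosql:user/asmith")
    return store


def crm_customers_with_negative_reviews(store: TripleStore) -> list[str]:
    """Query: which CRM customers posted a negative review?"""
    return [
        s for s, _, review in store.match(p="ex:postedReview")
        if s.startswith("crm:")
        and store.match(s=review, p="ex:sentiment", o="negative")
    ]


if __name__ == "__main__":
    store = build_example_store()
    print(crm_customers_with_negative_reviews(store))  # → []
    print(store.infer_same_as())                       # → 2
    print(crm_customers_with_negative_reviews(store))  # → ['crm:customer/42']
```

Before inference the query finds nothing, because the customer record and the review live in different silos; after `infer_same_as()` the one sameAs link lets the same query span both. In practice an RDF library and SPARQL would replace the hand-written pattern matching, and a reasoner would handle `owl:sameAs` more completely than this sketch does.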