Structuring Big Data


Mark Wilson's lightning talk at the London Cloud Camp on 25 January 2012 about using linked data to integrate data silos in the world of big data

  • Everyone’s talking about big data, but the bulk of the conversation seems to focus on a new level of business intelligence and an ever-increasing volume of data organised into OLTP, OLAP and NoSQL silos. In this talk, Mark Wilson puts forward the view that the real value comes not from the big data itself but from how we can employ linked data concepts to integrate structured, unstructured and semi-structured data sets, and then use this unified data source to derive new value.
  • Structuring Big Data

    1. Structuring big data. Mark Wilson, January 2012. #CloudCamp UNCLASSIFIED © Copyright 2012 Fujitsu Services Limited
    2. The problem with big data, and a solution. The problem: “New reference architectures will include both big data and enterprise data warehouses” [IDC, 19 January 2012]. Two worlds: structured and unstructured data (plus external data sources, documents stored in structured databases, etc.). Silos create issues with management, integration, etc. The solution: linked data, a single reference point for all data in the enterprise.
    3. Some history. Fixed structure: difficult to change schema. Simple reporting capabilities: complex to create new reports.
    4. Some history. Completed transactions transferred to a separate database for analysis: the “data warehouse”. Better reporting, data mining, etc., but still highly structured. Data is historical and may be aggregated.
    5. The smart guys. Real-time update of completed transactions: transactions are moved to the data warehouse upon completion, leaving a smaller transactional database. This allows alerts to be generated when specific conditions are met, and action to be taken.
    6. A third “data silo”. Masses of unstructured/semi-structured data being processed in NoSQL databases. May, or may not, be transferred to/from structured databases: time-consuming and inefficient. Three types of data, each with its own limitations and its own management considerations.
    7. Data everywhere!
    8. Linked Data. Tie records together, even from separate data sets. We can express these as triples with a specific grammar. Build up a graph to show machine-readable data in human form.
    9. Then add lots more data… Each node is itself another graph (zoom in).
    10. Aren’t we missing a trick? Use linked data as the optimal reference source: a broker of all data sources. Single view on structured and unstructured data, bringing in external sources too. Mapping, interconnecting, indexing and feeding, in real time. Query linked data to derive new value from old: infer relationships and gain new insights.
    11. About the author. Mark Wilson, Strategy Manager, Fujitsu. Mark is an analyst working within Fujitsu’s UK and Ireland Office of the CTO, providing thought leadership both internally and to customers, shaping business and technology strategy. He has 17 years’ experience of working in the IT industry, 12 of which have been with Fujitsu. Mark has a background in leading large IT infrastructure projects with customers in the UK, mainland Europe and Australia. He has a degree in Computer Studies from the University of Glamorgan. Mark is also active in social media and won the Individual IT Professional (Male) award in the 2010 Computer Weekly IT Blog Awards. Mark may be found on Twitter @markwilsonit. If you would like to comment on the topics in this presentation, Mark would welcome your feedback, by email to
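
As a rough illustration of the linked data idea in slides 8 and 10 (records from separate data sets tied together as triples, then queried to infer a relationship), here is a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the talk: the three data sets, the identifiers such as `customer:42`, the predicate names, and the helper functions. A real deployment would more likely use an RDF store and a query language such as SPARQL rather than plain Python lists.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples from three imagined silos.
# All identifiers and predicate names are invented for this sketch.
crm_triples = [  # transactional (OLTP-style) data
    ("customer:42", "hasName", "Alice Example"),
    ("customer:42", "placedOrder", "order:1001"),
]
warehouse_triples = [  # data warehouse (OLAP-style) data
    ("order:1001", "containsProduct", "product:7"),
    ("product:7", "hasName", "Widget"),
]
social_triples = [  # unstructured data after processing in a NoSQL store
    ("post:abc", "mentionsProduct", "product:7"),
    ("post:abc", "hasSentiment", "negative"),
]


def build_graph(*sources):
    """Merge triples from every source into one graph keyed by subject."""
    graph = defaultdict(list)
    for source in sources:
        for subject, predicate, obj in source:
            graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


def subjects(graph, predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, edges in graph.items() if (predicate, obj) in edges]


def negative_posts_for_customer(graph, customer):
    """Follow customer -> order -> product -> post across the linked sets."""
    hits = []
    for order in objects(graph, customer, "placedOrder"):
        for product in objects(graph, order, "containsProduct"):
            for post in subjects(graph, "mentionsProduct", product):
                if "negative" in objects(graph, post, "hasSentiment"):
                    hits.append((product, post))
    return hits


graph = build_graph(crm_triples, warehouse_triples, social_triples)
print(negative_posts_for_customer(graph, "customer:42"))
```

No single silo in the sketch knows that a customer bought a product that is attracting negative comment; the relationship only appears once the three sets share identifiers and are traversed as one graph, which is the kind of "new value from old" that slide 10 points at.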