Big Data Brighton | Big Data in Academia | Jan 2013


Four talks about Big Data in Academia at Big Data Brighton Jan 2013. Two of the talks' slides are here. I'll upload Miltos' slides when I receive them.

Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters.

Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.

Published in: Technology


  1. January 2013 at University of Brighton
     http://meetup.com/Big-Data-Brighton
  2. Agenda
     • Miltos Petridis, Professor of Computer Science, University of Brighton
     • Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters.
     • Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.
     • Kevin Long, Teradata - Summary and Business context
  3. Big Data
     “A new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-speed capture, discovery and/or analysis” [1]
     New investment initiatives are coming, such as in the US in 2012: “more than $200 million in new funding through six agencies and departments to improve the nation’s ability to extract knowledge and insights from large and complex collections of digital data” [2]
  4. Knowledge and insights... hmm
     Before companies rush to use the technologies they should be asking some questions:
     • Can we make any assumptions about the quality of the data we are using?
     • Is there a significant difference between structured and unstructured data?
     • Can the underlying structure of the data affect what you can do with it?
  5. In this brief talk, I will be examining these questions with reference to my research and recent trends
  6. Can we make any assumptions about the quality of the data we are using?
     • One of the problems with the recent explosion in the amount of data is that some data (particularly data collected from social networking sites) is of dubious quality
       – A straw poll of my students found that 1 in 5 deliberately enter incorrect data about themselves online to protect their identity
     • We might not have any assurance that the data is true or that it is correctly linked to metadata
       – Is data typed?
       – Is the data related to other data? How is it related?
       – Are relationships between data and its meaning being lost?
  7. A view of different data models [3]
  8. Is there a significant difference between structured and unstructured data?
     • How is data structured?
     • Does the underlying data model matter?
     • What are the options for a data model?
     • Over the years many models of data have evolved and most are still in use
     • Data models used give insights into assumptions about the semantics of the data
  9. Finding meaning from ‘flat’ data
     • A problem with ‘flat’ or unstructured data representations is that it has traditionally been difficult to aggregate and present to users in a way that they can understand
     • In contrast, structured data can be summarised easily and its structure represents the meaning of data within an organization
     • Data analytics are changing this by presenting accessible information from ‘flat’ data
  10. Can the underlying structure of the data affect what you can do with it?
     • The short answer from my research is ‘YES’
     • How it affects what you can do with the data is the long answer
       – It is really easy to store a piece of data but retrieving it (intact with its meaning and its relationships to other data) is more difficult
       – When ‘Big Data’ technologies are used to extract knowledge and insights from the data we should be sure that the technology is not introducing new problems
  11. Impedance mismatch problems
     • Moving data from one paradigm to another often causes the meaning to be lost
     • Can cause problems for developers who move data from one paradigm to another
     • Also a problem for end users who may lose the connections
  12. A way forward
     • Working out goals in your data management
     • Understanding the structure of the data you are using, wherever it comes from
     • Getting assurance about the quality of the data
     • Then having confidence that the knowledge and insights are based in firm foundations
  13. Thank you
     Any questions?
  14. References
     [1] Carter, P. (2011). Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO. SAS white paper, IDC Go-to-Market Services.
     [2] Gianchandani, E. (2012). Obama administration unveils $200m big data R&D initiative. The Computing Community Consortium (CCC) Blog.
     [3] Angles, R. and Gutierrez, C. (2008). Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008).
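
The point on slides 10 and 11, that storing data is easy but retrieving it intact with its meaning is harder, can be illustrated with a small sketch. This is not from the talk: the `Enrolment` record, its field names and the sample values are hypothetical, and CSV stands in for any 'flat' representation. The sketch writes a typed record to flat text and reads it back, at which point every value is a plain string unless the reader already knows the schema.

```python
import csv
import io
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Enrolment:
    # Hypothetical record, invented for illustration only.
    student_id: int
    course_code: str
    enrolled_on: date


FIELDS = ["student_id", "course_code", "enrolled_on"]


def to_flat_text(records):
    """Write records as CSV text: the values survive, their types do not."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def from_flat_text(text):
    """Read the CSV text back naively: every field comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def restore(row):
    """Re-apply the types by hand; this only works if the schema is known."""
    return Enrolment(
        student_id=int(row["student_id"]),
        course_code=row["course_code"],
        enrolled_on=date.fromisoformat(row["enrolled_on"]),
    )


original = [Enrolment(42, "CI101", date(2013, 1, 15))]
flat_rows = from_flat_text(to_flat_text(original))
restored = [restore(row) for row in flat_rows]
```

Here `flat_rows` holds only strings, while `restored` equals `original` again because `restore` encodes knowledge of the structure that the flat text itself does not carry.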
  15. Event Detection on Twitter
     Simon Wibberley, Text Analytics Group, University of Sussex
     simon.wibberley@sussex.ac.uk
  16. What are Events? We just don’t know.
  17. Event Categories
     • Well reported, constrained: Relatively Easy
     • Well reported, unconstrained: Interesting
     • Poorly reported, constrained: Interesting
     • Poorly reported, unconstrained: Very Tricky
  18. Algorithms
     • Query Driven
       – Volume / rate analysis of matching data
       – Addresses constrained event type
     • Data Driven
       – Mine stream for interesting data
       – Addresses unconstrained event type
  19. GB Dressage Gold
  20. London Riots
  21. London Riots
  22. Event Characterisation
     • Fill in unknowns
     • Self explanatory for (very) constrained events
     • Select representative / well formed Tweet[s]
     • Term relevance / clustering
     • Topic analysis
     • Geo-location / Entity extraction
  23. CASM
     • Centre for the Analysis of Social Media
     • Collaboration between DEMOS and TAG
     • Applying text analytics to social media to answer sociological questions
     • OSI funded EU sentiment analysis pilot project
     http://www.demos.co.uk/projects/casm/
  24. Ethics
     Figure: grid with an Identity axis (Preserving / Anonymous) and a Narrow / Broad axis, labelled Judiciary, Stasi, Social Science and Me! (Reffin, J, 2012)
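
The query-driven approach on slide 18 (volume / rate analysis of matching data) can be sketched roughly as follows. The slides do not say how the analysis was done, so everything here is an assumption: the function names, the whole-word matching, the fixed-width time buckets, and the rule that flags a bucket when its count exceeds the recent mean by a few standard deviations.

```python
from collections import Counter
from statistics import mean, pstdev


def matching_timestamps(messages, query_terms):
    """Keep the timestamps of messages that contain any query term as a word."""
    terms = {term.lower() for term in query_terms}
    return [ts for ts, text in messages if terms & set(text.lower().split())]


def counts_per_bucket(timestamps, bucket_seconds):
    """Count messages in fixed-width time buckets, including empty buckets."""
    if not timestamps:
        return []
    start = min(timestamps)
    tally = Counter(int((ts - start) // bucket_seconds) for ts in timestamps)
    return [tally.get(i, 0) for i in range(max(tally) + 1)]


def detect_bursts(counts, history=5, threshold=3.0):
    """Return indices of buckets whose count is far above the recent baseline."""
    bursts = []
    for i in range(history, len(counts)):
        window = counts[i - history:i]
        baseline = mean(window)
        spread = pstdev(window) or 1.0  # assumed fallback when history is flat
        if counts[i] > baseline + threshold * spread:
            bursts.append(i)
    return bursts


# Hypothetical (timestamp in seconds, text) pairs, invented for illustration.
sample = [(0, "Riots in London"), (5, "nice weather"), (9, "london calling")]
sample_hits = matching_timestamps(sample, ["london"])
sample_bursts = detect_bursts([2, 3, 2, 3, 2, 20, 3])
```

A fixed query makes this suitable only for the constrained event type; the data-driven case, where the terms of interest are not known in advance, would need a different approach.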
