January 2013 at University of Brighton

http://meetup.com/Big-Data-Brighton
Agenda
•   Miltos Petridis, Professor of Computer Science, University
    of Brighton

•   Dr Patricia Roberts, Senior Lecturer & Researcher in
    database design, development and management,
    University of Brighton - Structured vs Unstructured Data:
    why structure matters.

•   Simon Wibberley, PhD student in computational linguistics
    at the Text Analytics Group at the University of Sussex.
    Real-time text stream analysis, event detection, and entity
    recognition. Event detection on Twitter.

•   Kevin Long, Teradata - Summary and Business context
Big Data

“A new generation of technologies and
    architectures, designed to economically
    extract value from very large volumes of a
    wide variety of data, by enabling high-speed
    capture, discovery and/or analysis” [1]
New investment initiatives are coming, such as
    one announced in the US in 2012:
“more than $200 million in new funding
    through six agencies and departments to
    improve the nation’s ability to extract
    knowledge and insights from large and
    complex collections of digital data” [2]
Knowledge and insights... hmm
Before companies rush to use the technologies
    they should be asking some questions:


• Can we make any assumptions about the
  quality of the data we are using?

• Is there a significant difference between
  structured and unstructured data?

• Can the underlying structure of the data
  affect what you can do with it?
In this brief talk, I will be examining these
   questions with reference to my research and
   recent trends
Can we make any assumptions about
the quality of the data we are using?
• One of the problems with the recent explosion
  in the amount of data is that some data
  (particularly data collected from social networking
  sites) is of dubious quality
   – A straw poll of my students found that 1 in 5
     deliberately enter incorrect data about themselves
     online to protect their identity
• We might not have any assurance that the data is
  true or that it is correctly linked to metadata
   – Is the data typed?
   – Is the data related to other data? How is it related?
   – Are relationships between data and its meaning
     being lost?
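The question “Is the data typed?” can be illustrated with a minimal Python sketch. The field names (user_id, birth_date, city) and the check_profile function are hypothetical, chosen only for illustration. The sketch also shows the limit of typing: a type check catches malformed values, but a well-formed yet false value, such as a deliberately wrong birth date, passes unnoticed.

```python
from datetime import date

# Hypothetical field names for a self-reported profile record (illustrative only).
def check_profile(raw: dict) -> list:
    """Return the problems found when interpreting an untyped record.

    An empty list means the record is well-formed; it does NOT mean it is true.
    """
    problems = []
    user_id = raw.get("user_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        problems.append("user_id is not an integer")
    try:
        born = date.fromisoformat(str(raw.get("birth_date")))
        if born > date.today():
            problems.append("birth_date is in the future")
    except ValueError:
        problems.append("birth_date is not an ISO date")
    if not raw.get("city"):
        problems.append("city is missing")
    return problems


# A plausible but possibly false record passes every check.
print(check_profile({"user_id": 7, "birth_date": "1970-01-01", "city": "Brighton"}))  # []
# A malformed record is caught.
print(check_profile({"user_id": "7", "birth_date": "yesterday"}))
```

The second call reports all three problems; nothing in the first call can tell us whether 1970-01-01 is the user's real birth date.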
A view of different data models [3]
Is there a significant difference between
structured and unstructured data?
• How is data structured?
• Does the underlying data model matter?
• What are the options for a data model?
• Over the years many models of data have
  evolved and most are still in use
• Data models used give insights into
  assumptions about the semantics of the data
Finding meaning from ‘flat’ data

• A problem with ‘flat’ or unstructured data
  representations is that they have traditionally
  been difficult to aggregate and present to
  users in a way that they can understand
• In contrast, structured data can be
  summarised easily, and its structure
  represents the meaning of the data within an
  organization
• Data analytics are changing this by
  presenting accessible information from ‘flat’
  data
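The contrast between structured and ‘flat’ data can be sketched in Python. The order records, the free-text notes and the regular expression are all invented for illustration; the regex stands in, very crudely, for real text analytics. The structured records summarise in one pass, while the same facts held as text must be parsed, and any line phrased differently from the assumed pattern silently drops out of the summary.

```python
import re
from collections import Counter

# Structured: each field's meaning is explicit, so summarising is one step.
orders = [
    {"region": "South East", "product": "laptop", "qty": 2},
    {"region": "South East", "product": "phone", "qty": 1},
    {"region": "North", "product": "laptop", "qty": 3},
]

def qty_by_region(rows):
    totals = Counter()
    for row in rows:
        totals[row["region"]] += row["qty"]
    return dict(totals)

# Flat: the same facts as free text; meaning must be recovered by parsing.
notes = [
    "2 laptops shipped to South East",
    "South East got one phone",  # phrased differently: will not match
    "3 laptops shipped to North",
]
# Hypothetical, deliberately naive pattern: "<number> <item> shipped to <region>".
PATTERN = re.compile(r"^(\d+) \w+ shipped to (.+)$")

def qty_by_region_from_text(lines):
    totals = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:  # lines that do not match are lost without any warning
            totals[m.group(2)] += int(m.group(1))
    return dict(totals)

print(qty_by_region(orders))            # {'South East': 3, 'North': 3}
print(qty_by_region_from_text(notes))   # {'South East': 2, 'North': 3}
```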
Can the underlying structure of the
data affect what you can do with it?
• The short answer from my research is
  ‘YES’
• How it affects what you can do with the
  data is the long answer
   – It is really easy to store a piece of data but
     retrieving it (intact with its meaning and
     its relationships to other data) is more
     difficult
   – When ‘Big Data’ technologies are used to extract
     knowledge and insights from the data, we
     should be sure that the technology is not
     introducing new problems
Impedance mismatch problems

• Moving data from one paradigm to another
  often causes the meaning to be lost
• Can cause problems for developers who
  move data from one paradigm to another
• Also a problem for end users, who may lose
  the connections between related data
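A minimal Python sketch of how meaning can be lost when data moves from one paradigm to another: a graph whose edges carry labels is exported to a two-column ‘connections’ table, as a naive migration might do. The people, labels and function names are hypothetical. After the export, a ‘blocks’ edge is indistinguishable from a ‘follows’ edge, so the question “who follows carol?” gets a wrong answer.

```python
# A tiny graph in which each edge records what the relationship means.
edges = [
    ("alice", "follows", "bob"),
    ("bob", "blocks", "carol"),
    ("carol", "follows", "alice"),
]

def to_flat_pairs(labelled_edges):
    """Naive export to a two-column table: the relationship label is dropped."""
    return [(src, dst) for src, _label, dst in labelled_edges]

def followers(labelled_edges, person):
    """Answer 'who follows person?' using the labelled graph."""
    return sorted(src for src, label, dst in labelled_edges
                  if dst == person and label == "follows")

def followers_after_export(pairs, person):
    """The best the flat table can offer: anyone connected to person."""
    return sorted(src for src, dst in pairs if dst == person)

print(followers(edges, "carol"))                              # []
print(followers_after_export(to_flat_pairs(edges), "carol"))  # ['bob']
```

Storing the pairs was easy; retrieving them with their meaning intact is no longer possible, which is the point made on the previous slide.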
A way forward
• Working out the goals of your data management
• Understanding the structure of the data you
  are using, wherever it comes from
• Getting assurance about the quality of the
  data
• Then having confidence that the knowledge
  and insights are based on firm foundations
Thank you

Any questions?
References
1.   Carter, P. (2011). Big Data Analytics: Future
     Architectures, Skills and Roadmaps for the CIO. SAS
     white paper, IDC Go-to-Market Services.
2.   Gianchandani, E. (2012). Obama Administration
     Unveils $200M Big Data R&D Initiative. The Computing
     Community Consortium (CCC) Blog.
3.   Angles, R. and Gutierrez, C. (2008). Survey of
     graph database models. ACM Computing Surveys 40(1),
     Article 1 (February 2008).
Event Detection on Twitter

Simon Wibberley
Text Analytics Group
University of Sussex
simon.wibberley@sussex.ac.uk
What are Events? We just don’t know.
Event Categories

                   Constrained        Unconstrained
Well Reported      Relatively Easy    Interesting
Poorly Reported    Interesting        Very Tricky
Algorithms
• Query Driven
   –   Volume / rate analysis of matching data
   –   Addresses constrained event type
• Data Driven
   –   Mine stream for interesting data
   –   Addresses unconstrained event type
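The query-driven approach can be sketched as counting keyword-matching tweets per time bucket and flagging buckets that rise well above the recent baseline. This is a generic burst-detection sketch under stated assumptions, not the Text Analytics Group's algorithm: tweets are assumed to arrive as (unix_timestamp, text) pairs, and the window size, the threshold multiplier k and the floor of 1.0 on the spread are illustrative choices. A data-driven approach would instead have to discover which terms are worth counting rather than being given them.

```python
import statistics

def matching_counts(tweets, keywords, start, n_buckets, bucket_seconds=60):
    """Count tweets containing any keyword (case-insensitive) in each time bucket.

    tweets: iterable of (unix_timestamp, text) pairs (assumed input format).
    Returns a list of n_buckets counts, the first bucket beginning at `start`.
    """
    keys = [k.lower() for k in keywords]
    counts = [0] * n_buckets
    for ts, text in tweets:
        idx = int((ts - start) // bucket_seconds)
        if 0 <= idx < n_buckets and any(k in text.lower() for k in keys):
            counts[idx] += 1
    return counts

def bursts(counts, window=5, k=3.0):
    """Return indices whose count exceeds mean + k * spread of the previous `window` buckets.

    The spread (population standard deviation) is floored at 1.0 so that a
    perfectly flat history does not make one extra tweet look like an event.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        spread = max(statistics.pstdev(history), 1.0)
        if counts[i] > mean + k * spread:
            flagged.append(i)
    return flagged

# Hypothetical tweets about a constrained event, queried by a single keyword.
tweets = [(0, "Dressage gold!"), (30, "lunch"), (61, "GB dressage"),
          (125, "dressage medal"), (130, "DRESSAGE")]
print(matching_counts(tweets, ["dressage"], start=0, n_buckets=3))  # [1, 1, 2]
print(bursts([2, 3, 2, 4, 3, 40, 5]))                               # [5]
```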
Examples (figures not reproduced): GB Dressage Gold; London Riots (two slides)
Event Characterisation
• Fill in unknowns
• Self-explanatory for (very) constrained events
• Select representative / well-formed Tweet[s]
• Term relevance / clustering
• Topic analysis
• Geo-location / Entity extraction
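Two of the characterisation steps above, term relevance and selecting a representative tweet, are easy to sketch in Python. This is a generic approach, not the group's method: it scores each term by the add-one-smoothed log ratio of its relative frequency in event tweets versus background tweets, then picks the event tweet containing the most top-ranked terms. The example tweets are invented for illustration, and the tokeniser is deliberately simple.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z#@']+")

def tokens(text):
    return TOKEN.findall(text.lower())

def relevant_terms(event_tweets, background_tweets, top_n=5):
    """Rank event terms by smoothed log-ratio of event vs background frequency."""
    ev = Counter(t for tw in event_tweets for t in tokens(tw))
    bg = Counter(t for tw in background_tweets for t in tokens(tw))
    ev_total, bg_total = sum(ev.values()), sum(bg.values())
    vocab = len(set(ev) | set(bg))

    def score(term):
        # Add-one smoothing so terms absent from the background do not divide by zero.
        p_ev = (ev[term] + 1) / (ev_total + vocab)
        p_bg = (bg[term] + 1) / (bg_total + vocab)
        return math.log(p_ev / p_bg)

    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(ev, key=lambda t: (-score(t), t))[:top_n]

def representative_tweet(event_tweets, terms):
    """Pick the tweet covering the most relevant terms; ties go to the earliest."""
    wanted = set(terms)
    return max(event_tweets, key=lambda tw: len(wanted & set(tokens(tw))))

# Invented example data.
event = ["riots in tottenham tonight", "shops burning in tottenham",
         "stay safe everyone riots spreading"]
background = ["nice weather in london tonight", "watching football tonight",
              "stay safe on the roads"]
terms = relevant_terms(event, background, top_n=2)
print(terms)                               # ['riots', 'tottenham']
print(representative_tweet(event, terms))  # riots in tottenham tonight
```

Common words such as “stay”, “safe” and “tonight” score low because they are just as frequent in the background, which is the intuition behind term relevance.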
CASM
• Centre for the Analysis of Social Media
• Collaboration between DEMOS and TAG
• Applying text analytics to social media to
  answer sociological questions
• OSI-funded EU sentiment analysis pilot project
   http://www.demos.co.uk/projects/casm/
Ethics

                        Narrow            Broad
Identity Preserving     Judiciary         Stasi
Anonymous               Social Science    Me!

Reffin, J (2012)

Big Data Brighton | Big Data in Academia | Jan 2013
