There’s no such thing as “Big Data”

Dr. Andrew Clegg
Data Analytics & Visualization Team
Pearson Technology

Twitter: @andrew_clegg

Opinions are my own.
Speaking of Twitter…


            Many companies think they have a “big data” problem
                        when they really have a big “data problem.”


                              -- @dbasch (Diego Basch), 17 Nov 2012




                                                     Followed the next day by:


                         I interact with companies whose “Big Data” problems
                                                     can be solved on laptops.




2   There's no such thing as “Big Data” l 27/03/13
What is “Big Data” anyway?
    Customer preference and behaviour data is.




What is “Big Data” anyway?
    Apparently sensor data is too.




     Image from http://en.wikipedia.org/wiki/Wind_farm (cc) Tomasz Sienicki



What is “Big Data” anyway?
    So is web crawl data.




What is “Big Data” anyway?
    Definitely social media data…




What is “Big Data” anyway?
    … and social media metadata.




     Image from http://www.facebook.com/note.php?note_id=469716398919



What is “Big Data” anyway?
    Video, audio and other media, surely.




What is “Big Data” anyway?
    Health, medical and life sciences data might well be.




What is “Big Data” anyway?
   So particle physics data must be.




     Image from Flickr (cc) “Image Editor”



What do all these types of data have in common?




What do all these types of data have in common?




                                                    Nothing.




What about size?




Size isn’t everything.




   All laptop-sized data sets.


What about storage?




Pick the right architecture for the job.

    E-commerce data                     Table-structured, transactional                      Relational database (good old SQL)
    Sensor data                         Simple, high-volume, often aggregated                Columnar database, Hadoop sequencefile
    Social feeds e.g. Twitter           JSON-based, textual, nested structures               Document database, search engine
    Networks e.g. social connections    Directed or undirected graphs                        Graph database (Aurelius, Neo4j)
    Multimedia                          Processed sequentially, opaque internal structure    Distributed filesystem (HDFS, Mogile, S3)
    Biochemical sequences and models    Complex 2D or 3D structure                           Specialized file formats and tools

                     These are just examples. Make the right choice for your application.
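
A minimal sketch of three rows of the table above, using only the Python standard library. It is not from the talk: the order records, the feed item, the follower graph and the helper `reachable` are all hypothetical, and in-memory stand-ins (sqlite3, a parsed JSON dict, an adjacency dict) take the place of a real relational, document or graph database. It only shows that each kind of data has a different natural shape.

```python
import json
import sqlite3

# E-commerce data (hypothetical rows): flat and transactional, so a relational table fits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
with conn:  # commits the inserts as one transaction
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("alice", 20.0), ("bob", 5.5), ("alice", 4.5)],
    )
totals = dict(conn.execute("SELECT customer, SUM(total) FROM orders GROUP BY customer"))

# Social feed item (hypothetical): nested JSON, so a document is the natural unit.
tweet = json.loads(
    '{"user": {"name": "alice"}, "text": "hello", "entities": {"hashtags": ["data", "sql"]}}'
)
hashtags = tweet["entities"]["hashtags"]

# Social connections (hypothetical): a directed graph, stored here as an adjacency dict.
follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}

def reachable(graph, start):
    """Return every node that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(totals)                       # spend per customer
print(hashtags)                     # nested field pulled out of the document
print(reachable(follows, "alice"))  # accounts alice can reach through follows
```

Asking the graph question of the orders table, or the aggregate question of the nested document, is possible but awkward in each case.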




Pick the right architecture for the job.

    Relational database (good old SQL)

    NoSQL?
        Columnar database, Hadoop sequencefile
        Document database, search engine
        Graph database (Aurelius, Neo4j)
        Distributed filesystem (HDFS, Mogile, S3)
        Specialized file formats and tools

    These are all as different from each other…
    … as each one of them is from a relational database.




What about analysis?




Pick the right methodology for the job.

   Text → topic modelling, sentiment analysis, information extraction


   E-commerce data → propensity analysis, collaborative filtering


   Multimedia → speech-to-text, audio fingerprinting, face recognition


   Clickstream logs → frequent pattern mining, sequence analysis


   Proton-proton collisions from LHC → I have absolutely no idea
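
As one illustration of the clickstream line above, here is a rough sketch of the simplest form of frequent pattern mining: counting which pairs of pages occur together in at least `min_support` sessions. It is not from the talk. The session data and the function name `frequent_pairs` are hypothetical, and this is naive pair counting, not a full algorithm such as Apriori or FP-growth.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sessions, min_support):
    """Count page pairs that co-occur in a session; keep those seen in >= min_support sessions."""
    counts = Counter()
    for pages in sessions:
        # De-duplicate and sort so each unordered pair is counted once per session.
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical clickstream: one list of visited pages per session.
sessions = [
    ["home", "search", "product"],
    ["home", "product", "cart"],
    ["search", "product"],
    ["home", "about"],
]

pairs = frequent_pairs(sessions, min_support=2)
print(pairs)
```

Each of the other methodologies would need its own, quite different, sketch.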




So why does any of this matter?




Because all of these phrases…


   “rethinking our Big Data strategy”


                                  “enterprise-ready Big Data solution”


                                                              “facing the Big Data challenge”



                                                    … are pretty close to meaningless.




And statements like these…


   “We can’t do any of that clever Big Data stuff because we’re not
   collecting or storing any Big Data.”


          “We can’t do any of that clever Big Data stuff because we don’t
                         have big enough hardware or expensive tools.”




                          … are signs of opportunities being missed.




What should we call it then, Andy?




What should we call it then, Andy?




                                                    Data.




Thank you for listening!




