MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures Pr. Clement Levallois
Note
• Some terms in the slides are framed in a box.
• These terms are part of your quiz assignment for the week, to be found on the online platform.
• They are often technical terms, and it is vital that you know their meaning, as they are the basic vocabulary of data science.
What you will learn here:
• The definition of data.
• The many ways to speak about data.
What is data?
• Definition:
– Originally, “data” is the plural of “datum”, a Latin word.
– A “datum” is a single fact, a single entity, a single piece of information.
– Datums are most often called “data points”.
– Data is a collection of data points.
• We also speak of datasets instead of data (a dataset is a collection of data points).
– Today, “data” is used as either a singular or a plural.
-> “My data is…” is common, but we still sometimes hear “My data are…”
Examples!
• A date
• A color
• A grade
• An address
• A price
• A number of friends
• A longitude
• An index of poverty
• An item in a catalogue
• A sound frequency
• A list of favorite movies
• A movie
• A number of clicks on a web page
• A duration
• A book
• An author of a book
• A vote at an election
• A still image
• A measurement of CO2
• A response to a consumer survey
• A purchase ticket
• A curriculum vitae
• Your blood pressure
Data or Metadata?
• Metadata: data that describes other data.
• Example:
– The bibliographical reference describing a book.
– Key takeaway: data without metadata can be worthless.
-> What would you do with a pile of 10,000 books without any indication of their titles, authors, or dates of publication?
– The difference between data and metadata is not always relevant.
-> In the alumni network dataset, what is data and what is metadata?
[Figure: part of the dataset labelled “The metadata” and part labelled “The data”.]
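To make the book example concrete, here is a minimal Python sketch. It assumes a hypothetical record layout in which the text of a book is the data and its bibliographical reference is the metadata. The field names, the helper functions and the sample book are illustrative assumptions, not part of the course material.

```python
# Hypothetical record: "data" holds the book's text, "metadata" describes it.
book = {
    "data": "Call me Ishmael. Some years ago...",
    "metadata": {
        "title": "Moby-Dick",
        "author": "Herman Melville",
        "year": 1851,
    },
}


def describe(record):
    """Build a one-line bibliographical reference from the metadata."""
    meta = record["metadata"]
    return f'{meta["author"]}, {meta["title"]} ({meta["year"]})'


def find_by_author(records, author):
    """Use metadata to search a pile of books without reading their text."""
    return [r for r in records if r["metadata"]["author"] == author]


print(describe(book))
```

Without the "metadata" entry, the only way to find this book in a pile of 10,000 would be to read the text of each one.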
Data: how to talk about it
• Example of a data point -> “Four more years. http://t.co/bAJE6Vom”
• This textual data is in digital form (as opposed to analog), because it is stored as bits on a computer, not handwritten on a piece of paper.
• The tweet is textual (as opposed to numerical; in programming, text is also called a string) -> this is the type (or format) of the data.
• The tweet appears as plain text. “Plain text” is one sort of format for text; other formats are JSON, XML or CSV -> this is the format of the data.
• The text of the tweet is encoded in UTF-8 -> this is the encoding of the data.
• The tweet is part of a list of tweets I collected -> this is the data structure.
• The tweet is stored in a Word file on my laptop -> this is the format of the data.
• Notice the ambiguity in the terminology: “format” appears three times above with different meanings.
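The vocabulary on this slide (type, encoding, data structure, format) can be illustrated with a short Python sketch. It only uses the tweet quoted above; the second tweet in the list is a made-up placeholder, and the variable names are assumptions chosen for illustration.

```python
import json

# Type: the data point from the slide, held as text (a Python string).
tweet = "Four more years. http://t.co/bAJE6Vom"

# Encoding: the same text turned into bytes using UTF-8.
tweet_bytes = tweet.encode("utf-8")

# Data structure: the tweet as one item in a list of collected tweets.
# The second item is a made-up placeholder.
tweets = [tweet, "A second, made-up tweet"]

# Format: the same list written out as JSON text instead of plain text.
tweets_as_json = json.dumps(tweets)

print(type(tweet).__name__)   # name of the type
print(len(tweet_bytes))       # number of bytes once encoded
print(tweets_as_json)         # the list in JSON format
```

The content never changes in this sketch; only the way it is typed, encoded, structured and formatted does.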
Data stored in tables: vocabulary
• A spreadsheet, or a table: this is still the most common way to represent a dataset.
• Rows, or lines: each represents a data point.
• Columns: each represents an attribute of the data.
• Header: the names of the attributes.
• A value: the content of a single cell (it can be empty).
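The table vocabulary can be tried out on a tiny example. The sketch below uses Python's standard csv module on a made-up three-column table; the names, cities and ages are invented for illustration only.

```python
import csv
import io

# A tiny made-up table in CSV form. The first line is the header.
raw = "name,city,age\nAlice,Paris,31\nBob,,27\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)   # names of the attributes
rows = list(reader)     # each row is one data point

# A column gathers one attribute across all data points.
city_column = [row[header.index("city")] for row in rows]

print(header)
print(rows[1])
print(city_column)      # the second city is an empty value
```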
Data and size.
• The size of data gives an idea of what can be done with it and of the challenges it might pose.
• The size of a dataset can be expressed as a number of data points.
– These are often called lines, because we store them as lines in a spreadsheet.
• Or the size can be expressed as the storage space the data takes on a computer drive (see next slide).
– A dataset with 23,000 lines and 16 columns takes ~ 2.6 MB when saved as an Excel file.
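Both ways of measuring size can be computed on the same dataset. The sketch below builds a placeholder table with 23,000 lines and 16 columns, writes it as CSV text in memory, and reports its size in data points and in bytes. The values are invented, and CSV is used instead of Excel as a simplifying assumption, so the byte count is not expected to match the Excel figure quoted above.

```python
import csv
import io

N_ROWS, N_COLS = 23_000, 16   # same shape as the example in the slide

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([f"col{i}" for i in range(N_COLS)])            # header
for r in range(N_ROWS):
    # Placeholder values: consecutive integers.
    writer.writerow([r * N_COLS + c for c in range(N_COLS)])

size_in_rows = N_ROWS
size_in_bytes = len(buffer.getvalue().encode("utf-8"))

print(size_in_rows, "data points")
print(round(size_in_bytes / 1_000_000, 2), "MB as CSV text")
```

The same data points take a different amount of storage depending on the file format and on how long each value is.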
Bytes!
Size                      Unit                 What it can store
1 bit                                          A yes / no value
8 bits                    1 byte (or octet)    A single letter
~ 1,000 bytes             1 kilobyte (kB)      A paragraph
~ 1 million bytes         1 megabyte (MB)      A low-res picture
~ 1 billion bytes         1 gigabyte (GB)      A movie
~ 1 trillion bytes        1 terabyte (TB)      1,000 movies; size of commercial hard drives in 2014
~ 1,000 trillion bytes    1 petabyte (PB)      20 PB = Google Maps in 2013
[Slide annotation: “Most firms today”, marking part of this scale.]
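The scale above can be turned into a small conversion helper. This is a minimal sketch that assumes the decimal convention used on the slide (each unit is about 1,000 times the previous one); the function name `human_size` is a hypothetical choice.

```python
UNITS = ["bytes", "kB", "MB", "GB", "TB", "PB"]


def human_size(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the last unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(human_size(8 // 8))           # 8 bits make one byte
print(human_size(2_600_000))        # the spreadsheet example
print(human_size(20 * 1000**5))     # the Google Maps example
```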
Much more…
• Do the readings for Week 1.
• Watch the video on big data, also in Week 1.
• Start following #bigdata and #dataanalytics on Twitter.
