FROM DATA POINTS TO DATA LAKES
OPEN DATA
DR J ROGEL-SALAZAR
IMPERIAL COLLEGE LONDON
AND UNIVERSITY OF HERTFORDSHIRE
j.rogel@physics.org
@quantum_tunnel / @hidden_node
USE
#dialogo_opendata
SOCIAL MEDIA
DATA?
LET’S START AT THE BEGINNING
DATA
INTERCONNECTED
KNOWLEDGE
KNOWLEDGE
LINKED
INFORMATION
INFORMATION
STRUCTURED DATA
DATA EVERYWHERE!
• Lots of data is being collected and warehoused
• Scientific studies
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/Credit card transactions
• Social network
HOW MUCH DATA?
• Google processes 100 PB a day (2014)
• Facebook 600 TB/day (2014)
• Twitter 100 TB/day (2013/14)
• CERN’s Large Hadron Collider (LHC) generates 15 PB a year
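To put the figures above on one scale, the short sketch below converts them to terabytes per day. It assumes decimal units (1 PB = 1000 TB) and that the LHC's yearly output is spread evenly over 365 days; both are simplifying assumptions, and the names used are illustrative only.

```python
# Decimal (SI) units are assumed: 1 PB = 1000 TB.
TB_PER_PB = 1000
LHC_TB_PER_YEAR = 15 * TB_PER_PB  # 15 PB a year, as quoted on the slide

# Daily volumes in TB, taken from the slide figures.
daily_volume_tb = {
    "Google (2014)": 100 * TB_PER_PB,
    "Facebook (2014)": 600,
    "Twitter (2013/14)": 100,
    "CERN LHC": LHC_TB_PER_YEAR / 365,  # assumes an even spread over the year
}


def days_to_match_lhc_year(tb_per_day: float) -> float:
    """Days a source needs to produce what the LHC generates in a year."""
    return LHC_TB_PER_YEAR / tb_per_day


# Print the sources from largest to smallest daily volume.
for name, tb in sorted(daily_volume_tb.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {tb:>10.1f} TB/day")
```

On these figures, `days_to_match_lhc_year(600)` gives 25 days for the Facebook rate.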
640K ought to be
enough for anybody.
Source: https://followthedata.wordpress.com/2014/06/24/data-size-estimates/
Maximilien Brice, © CERN
THE EARTHSCOPE
• The Earthscope is also a large science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyses seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more.
1. http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI
TYPE OF DATA
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can only scan the data once
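The single-scan constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's online algorithm to compute the mean and sample variance without storing the stream; the `sensor_readings` generator and its values are hypothetical stand-ins for a real stream.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) after a single pass (Welford)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: a generator can be consumed only once."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

Each value is seen once and then discarded, so memory use stays constant however long the stream runs.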
WHAT TO DO WITH THESE DATA?
• Aggregation and Statistics
  • Data warehouse and OLAP
• Indexing, Searching, and Querying
  • Keyword based search
  • Pattern matching (XML/RDF)
• Knowledge discovery
  • Data Mining
  • Statistical Modeling
THE DATA
• Fundamental to research
• Basis for writing papers
• Important for experiment replication
• Meet contractual/funding requirements
• Settle intellectual property claims
• Defense against a charge of fraud
Images from the front covers of Circulation Research – S. Elliott (Van Eyk Lab)
INDIVIDUAL RESPONSIBILITY
DATA MANAGEMENT
Some aspects to consider:
• Ownership
• Collection
• Storage/protection of confidentiality/sharing
• Interpretation and publication
WHAT IS COPYRIGHT?
“To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”
- US CONSTITUTION
NOT A TOOL TO CONTROL ALL CONTENT FOREVER IN ALL MEDIA
A SET OF RIGHTS
• The right to reproduce the work
• The right to prepare derivative works
• The right to distribute the work
• The right to perform the work
• The right to display the work
• The right to license any of the above to third parties
HOW?
First, it must meet some basic requirements:
• It must be original.
• It must have some level of creativity.
• It must be in a fixed medium.
In the old days, you would use this symbol: ©
Provide a date and register it.
NOWADAYS IT’S INSTANT!
Copyright protects…
Writing
Choreography
Music
Visual art
Film
Architectural works
Copyright doesn’t protect…
Ideas
Facts
Data (mostly)
Useful articles (that’s patent)
HOW LONG DOES IT LAST?
FOR NOW…
The life of the author plus 70 years
And then?
THE PUBLIC DOMAIN
GENERAL RULES FOR STATUS
Works No Longer Protected by Copyright
• Published before 1923
• Published between '23 and '63, but it depends.
• Authored by the Federal Government (US)
VERBOSE MODE…
• All works published in the United States before 1923 are in the public domain.
• Works published after 1922, but before 1978 are protected for 95 years from the date
of publication. If the work was created, but not published, before 1978, the copyright
lasts for the life of the author plus 70 years. However, even if the author died over 70
years ago, the copyright in an unpublished work lasts until December 31, 2002.
• For works published after 1977, the copyright lasts for the life of the author plus 70
years. However, if the work is a work for hire (that is, the work is done in the course of
employment or has been specifically commissioned) or is published anonymously or
under a pseudonym, the copyright lasts between 95 and 120 years, depending on the
date the work is published.
• Lastly, if the work was published between 1923 and 1963, you must check with the U.S.
Copyright Office to see whether the copyright was properly renewed. If the author
failed to renew the copyright, the work has fallen into the public domain and you may
use it.
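The rules above can be read as a small decision procedure. The sketch below encodes only the cases the slide spells out for published works with a named author; it leaves out works for hire, anonymous or pseudonymous works, and unpublished works. The function name, the `Status` labels, and the inclusive treatment of boundary years are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PUBLIC_DOMAIN = "public domain"
    PROTECTED = "protected"
    CHECK_RENEWAL = "check renewal with the U.S. Copyright Office"


def us_copyright_status(
    published: int, author_death: Optional[int], current_year: int
) -> Status:
    """Classify a published, named-author US work using the slide's rules.

    author_death is None while the author is still alive.
    """
    if published < 1923:
        return Status.PUBLIC_DOMAIN
    if published <= 1963:
        # Protection depends on whether the copyright was renewed.
        return Status.CHECK_RENEWAL
    if published <= 1977:
        # Protected for 95 years from the date of publication.
        if current_year <= published + 95:
            return Status.PROTECTED
        return Status.PUBLIC_DOMAIN
    # Published after 1977: life of the author plus 70 years.
    if author_death is None or current_year <= author_death + 70:
        return Status.PROTECTED
    return Status.PUBLIC_DOMAIN


print(us_copyright_status(1950, None, 2015).value)
```

Returning a separate `CHECK_RENEWAL` value keeps the 1923 to 1963 case explicit instead of guessing at a renewal record the function cannot see.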
CONFUSED?
Hard to share
WHY SHARE?
ALL OF THEM CAN… AND SHOULD BE SHARED!
ALL OF THEM WERE BUILT UPON OTHER PEOPLE’S WORK
WHY?
TERENCE TAO BLOG
https://terrytao.wordpress.com
KAGGLE
https://www.kaggle.com/competitions
A SPECTRUM OF RIGHTS
Public Domain (least restrictive) … All Rights Reserved (most restrictive)
WHAT IS OPEN DATA?
SO…
Open data is information that is available for
anyone to use, for any purpose, at no cost.
GOOD OPEN DATA
• Can be linked: shared more easily
• Available in a standard format: easily processed
• Guaranteed availability and consistency: easily relied upon
• Traceable: easily trusted
FROM OPEN ACCESS TO OPEN DATA
OVERLEAF
https://www.overleaf.com
DRYAD
http://datadryad.org
DATAVERSE
https://dataverse.harvard.edu
DATA GOV UK
http://data.gov.uk
DATA GOV US
http://www.data.gov
DATOS GOB MEX
http://datos.gob.mx
ANY BIG INSTITUTION COULD PUBLISH OPEN DATA
ANYONE ELSE?
THE GUARDIAN
http://www.theguardian.com/news/datablog/
OPEN DATA 500
http://www.opendata500.com
FIGSHARE
http://figshare.com
UCI MACHINE LEARNING
http://archive.ics.uci.edu/ml/
TRANSPORT FOR LONDON
https://tfl.gov.uk/info-for/open-data-users/
WHAT CAN WE DO WITH IT?
WHERE TO FIND OPEN DATA?
DATAHUB
http://datahub.io
FIGSHARE
http://figshare.com
REGISTER OF DATA REPOS
http://www.re3data.org
DATABIB
http://databib.org
DATACITE
https://www.datacite.org
OPENDOAR
http://www.opendoar.org
CKAN
http://ckan.org
GITHUB
https://github.com
GITHUB - DATADRYAD
https://github.com/datadryad
SPIRAL - IMPERIAL COLLEGE
https://spiral.imperial.ac.uk
DSPACE
http://www.dspace.org
SOUNDS GOOD… NOW WHAT?
SIMPLE GUIDELINES
3 THINGS
• Keep it simple
• Engage early and often
• Address common fears and misunderstandings
4 STEPS
• Choose your dataset(s)
• Licensing
• Make the data available
• Make it discoverable
DATASETS
• Asking the community
• Cost basis
• Ease of release
• Observe peers
LICENSING
Data that doesn’t explicitly have an open license is NOT open data
COPYRIGHT OVER WORKS YOU CREATE AND ARE ORIGINAL TO YOU.
DATABASE RIGHT OVER COLLECTIONS OF DATA YOU HAVE PUT A SUBSTANTIAL EFFORT INTO OBTAINING, VERIFYING OR PRESENTING (ONLY EU, MEXICO, BRAZIL)
OWNERSHIP
CREATIVE COMMONS LICENSING
KNOW THE TYPES!!
OPEN DATA COMMONS
http://opendatacommons.org
AVAILABILITY
• Data should be complete
• In an (open) machine-readable format
• It should contain metadata
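One possible way to meet these three points is to publish the complete table as CSV next to a small JSON metadata file. The sketch below uses only the Python standard library; the file names, the metadata keys, and the sample rows are hypothetical, and the check for a `license` entry is an assumption that reflects the point that data without an explicit open license is not open data.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish_dataset(rows, fieldnames, metadata, out_dir):
    """Write rows to data.csv and a metadata.json sidecar in out_dir."""
    if not metadata.get("license"):
        raise ValueError("metadata must name an explicit open license")
    out_dir = Path(out_dir)
    data_path = out_dir / "data.csv"
    meta_path = out_dir / "metadata.json"
    # CSV is an open, machine-readable format; the header row names the fields.
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # The sidecar records what the file contains and how it may be used.
    sidecar = {**metadata, "fields": list(fieldnames), "row_count": len(rows)}
    with meta_path.open("w", encoding="utf-8") as fh:
        json.dump(sidecar, fh, indent=2)
    return data_path, meta_path


# Hypothetical example data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rows = [{"station": "A", "count": 3}, {"station": "B", "count": 5}]
    meta = {"title": "Example counts", "license": "CC0-1.0"}
    data_path, meta_path = publish_dataset(rows, ["station", "count"], meta, tmp)
    print(meta_path.read_text(encoding="utf-8"))
```

Keeping the metadata in a separate file leaves the CSV readable by any tool while still recording the license, fields, and row count alongside it.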
HOW?
• Your website
• Existing repositories
• Creating your own repository
MAKE IT DISCOVERABLE
• Publish it in public services (e.g. Datahub)
• Index it in a catalog (e.g. Databib)
• Promote it in your community
• Engage with users
DATA SCIENCE AND DATA LAKES
WHAT IS DATA SCIENCE?
A set of tools and techniques used to extract useful information from data.
An interdisciplinary, problem-oriented subject.
THE INGREDIENTS OF A DATA SCIENTIST
ONE MORE THING!
COMMUNICATION SKILLS
WHAT IS IT?
DATA LAKE
Analytics
DW
Hadoop
YOUR THOUGHTS?
THANKS!
DR J ROGEL-SALAZAR
j.rogel@physics.org
@quantum_tunnel / @hidden_node

04242024_CCC TUG_Joins and Relationships
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 

From Data Points to Data Lakes

  • 1. From Data Points to Data Lakes: Open Data. Dr J Rogel-Salazar, Imperial College London and University of Hertfordshire. j.rogel@physics.org, @quantum_tunnel / @hidden_node. Social media: use #DIALOGO_OPENDATA
  • 2. Data? Let's start at the beginning. Data; interconnected knowledge; knowledge; linked information; information; structured data
  • 3. Data everywhere! • Lots of data is being collected and warehoused • Scientific studies • Web data, e-commerce • Purchases at department/grocery stores • Bank/credit card transactions • Social network. How much data? • Google processes 100 PB a day (2014) • Facebook 600 TB/day (2014) • Twitter 100 TB/day (2013/14) • CERN's Large Hadron Collider (LHC) generates 15 PB a year. "640K ought to be enough for anybody." Source: https://followthedata.wordpress.com/2014/06/24/data-size-estimates/
  • 4. Maximilien Brice, © CERN. The Earthscope • The Earthscope is also a large science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyses seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. 1. http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI
  • 5. Type of data • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data • Social Network, Semantic Web (RDF), … • Streaming Data • You can only scan the data once. What to do with these data? • Aggregation and Statistics • Data warehouse and OLAP • Indexing, Searching, and Querying • Keyword based search • Pattern matching (XML/RDF) • Knowledge discovery • Data Mining • Statistical Modeling
  • 6. The data • Fundamental to research • Basis for writing papers • Important for experiment replication • Meet contractual/funding requirements • Settle intellectual property claims • Defense against a charge of fraud. Images from the front covers of Circulation Research – S. Elliott (Van Eyk Lab). Individual responsibility: data management. Some aspects to consider: • Ownership • Collection • Storage/protection of confidentiality/sharing • Interpretation and publication
  • 7. What is copyright? - US Constitution: “To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”
  • 8. Not a tool to control all content forever in all media. A set of rights • The right to reproduce the work • The right to prepare derivative works • The right to distribute the work • The right to perform the work • The right to display the work • The right to license any of the above to third parties
  • 9. How? First, it must meet some basic requirements: • It must be original. • It must have some level of creativity. • It must be in a fixed medium. In the old days, you would use this symbol: ©. Provide a date and register it. Nowadays, it's instant!
  • 10. Copyright protects… Writing, Choreography, Music, Visual art, Film, Architectural works. Copyright doesn’t protect… Ideas, Facts, Data (mostly), Useful articles (that’s patent). How long does it last?
  • 11. For now… The life of the author plus 70 years. And then? The public domain. General rules for status: Works No Longer Protected by Copyright • Published before 1923 • Published between '23 and '63, but it depends. • Authored by the Federal Government (US)
  • 12. Verbose mode… • All works published in the United States before 1923 are in the public domain. • Works published after 1922, but before 1978 are protected for 95 years from the date of publication. If the work was created, but not published, before 1978, the copyright lasts for the life of the author plus 70 years. However, even if the author died over 70 years ago, the copyright in an unpublished work lasts until December 31, 2002. • For works published after 1977, the copyright lasts for the life of the author plus 70 years. However, if the work is a work for hire (that is, the work is done in the course of employment or has been specifically commissioned) or is published anonymously or under a pseudonym, the copyright lasts between 95 and 120 years, depending on the date the work is published. • Lastly, if the work was published between 1923 and 1963, you must check with the U.S. Copyright Office to see whether the copyright was properly renewed. If the author failed to renew the copyright, the work has fallen into the public domain and you may use it. Confused?
  • 13. Hard to share. Why share?
  • 14.
  • 15. All of them can… and should be shared! All of them were built upon other people's work. Why?
  • 16. Terence Tao blog: https://terrytao.wordpress.com
  • 17. Kaggle: https://www.kaggle.com/competitions. A spectrum of rights: Public Domain (least restrictive) to All Rights Reserved (most restrictive)
  • 18. So… what is open data?
  • 19. Open data is information that is available for anyone to use, for any purpose, at no cost. Good open data • Can be linked: shared more easily • Available in a standard format: easily processed • Guaranteed availability and consistency: easily relied upon • Traceable: easily trusted
  • 20. From open access to open data
  • 21. Overleaf: https://www.overleaf.com • Dryad: http://datadryad.org
  • 22. Dataverse: https://dataverse.harvard.edu • Data.gov.uk: http://data.gov.uk
  • 23. Data.gov (US): http://www.data.gov • Datos.gob.mx: http://datos.gob.mx
  • 24. Any big institution could publish open data. Anyone else? The Guardian: http://www.theguardian.com/news/datablog/
  • 25. Open Data 500: http://www.opendata500.com • Figshare: http://figshare.com
  • 26. UCI Machine Learning: http://archive.ics.uci.edu/ml/ • Transport for London: https://tfl.gov.uk/info-for/open-data-users/
  • 27. What can we do with it?
  • 28. Where to find open data?
  • 29. Datahub: http://datahub.io • Figshare: http://figshare.com
  • 30. Register of data repos: http://www.re3data.org • Databib: http://databib.org
  • 31. DataCite: https://www.datacite.org • OpenDOAR: http://www.opendoar.org
  • 32. CKAN: http://ckan.org • GitHub: https://github.com
  • 33. GitHub - Data Dryad: https://github.com/datadryad • Spiral - Imperial College: https://spiral.imperial.ac.uk
  • 34. DSpace: http://www.dspace.org. Sounds good… now what?
  • 35. Simple guidelines: 3 things • Keep it simple • Engage early and often • Address common fears and misunderstandings
  • 36. 4 steps • Choose your dataset(s) • Licensing • Make the data available • Make it discoverable. Datasets • Asking the community • Cost basis • Ease of release • Observe peers
  • 37. Licensing: Data that doesn’t explicitly have an open license is NOT open data. Ownership • Copyright over works you create and are original to you. • Database right over collections of data you have put a substantial effort into obtaining, verifying or presenting (only EU, Mexico, Brazil)
  • 38. Creative Commons licensing: know the types!!
  • 39. Open Data Commons: http://opendatacommons.org. Availability • Data should be complete • In an (open) machine-readable format • It should contain metadata
  • 40. How? • Your website • Existing repositories • Creating your own repository. Make it discoverable • Publish it in public services (Datahub) • Index it in a catalog (Databib) • Promote it in your community • Engage with users
  • 41. Data science and data lakes
  • 42. What is data science? A set of tools and techniques used to extract useful information from data. An interdisciplinary, problem-oriented subject.
  • 43. The ingredients of a data scientist. One more thing! Communication skills
  • 44.
  • 45. What is it? Data lake: Analytics, DW, Hadoop. Your thoughts? Thanks! Dr J Rogel-Salazar, j.rogel@physics.org, @quantum_tunnel / @hidden_node
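
Slide 5 notes that streaming data can only be scanned once. As a minimal sketch of what that constraint could mean for the "Aggregation and Statistics" item on the same slide, the snippet below keeps a running count, mean and variance while reading each value exactly once. It uses Welford's update rule, which is a choice made here and not something the talk prescribes; `RunningStats` and `summarise` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Count, mean and variance updated one value at a time (Welford's method)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        # Fold one new observation into the summary without storing it.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; not defined for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarise(stream: Iterable[float]) -> RunningStats:
    """Consume an iterable once and return the accumulated statistics."""
    stats = RunningStats()
    for value in stream:  # each value is seen exactly once
        stats.update(value)
    return stats
```

Because only three numbers are kept in memory, the same function works on a generator that reads from a socket or a very large file, where going back for a second pass is not an option.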
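
Slides 37 and 39 ask for an explicit license, an open machine-readable format, and metadata. One possible way to satisfy all three, sketched here under assumptions, is to write the data as CSV and place a small JSON metadata file next to it. The function name `write_dataset`, the `.metadata.json` suffix and the metadata keys are hypothetical choices for illustration; they do not follow any particular repository's schema.

```python
import csv
import json
from pathlib import Path


def write_dataset(rows, fieldnames, out_dir, name, title, license_name):
    """Write rows (a list of dicts) as CSV plus a JSON metadata sidecar."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"{name}.csv"
    meta_path = out_dir / f"{name}.metadata.json"  # hypothetical naming scheme

    # CSV is an open, machine-readable format that most tools can read.
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Illustrative metadata keys; a real repository may require others.
    metadata = {
        "title": title,
        "license": license_name,
        "format": "text/csv",
        "columns": list(fieldnames),
        "row_count": len(rows),
    }
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path
```

A call such as `write_dataset(rows, ["station", "count"], "release", "counts", "Example counts", "CC-BY-4.0")` would produce `release/counts.csv` and `release/counts.metadata.json`; the license string is a placeholder, and choosing the actual license is the step slide 38 covers.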