Open Data ≠ Open Source - Monktoberfest 2012

Black Duck by Synopsys

Rich Sands's presentation on Open Data from Monktoberfest 2012

Open Data
≠
Open Source

Rich Sands – Product Manager @ Ohloh.net

@ohloh #ohloh

Open Definition

A piece of content or data is open if anyone
is free to use, reuse, and redistribute it –
subject to only, at most, the requirement to
attribute and/or share-alike.

A piece of [content or data] is
open if anyone is free to use, reuse, and
redistribute it – subject to only, at most, the
requirement to attribute and/or share-alike.

A [piece] of content or data is open if
anyone is free to use, reuse, and redistribute
it – subject to only, at most, the requirement
to attribute and/or share-alike.

[T]he first person to find and report a particular fact
has not created the fact; he or she has merely
discovered its existence. . . . Census-takers, for
example, do not “create” the population figures that
emerge from their efforts; in a sense, they copy
these figures from the world around them. . . .
Census data therefore do not trigger copyright
because these data are not “original” in the
constitutional sense.

-Justice Sandra Day O’Connor,
Feist Publications, Inc.
v. Rural Telephone Service

A piece of content or data is open if
[anyone is free to use], reuse, and redistribute it
– subject to only, at most, the requirement to
attribute and/or share-alike.

A piece of content or data is open if anyone
is free to use, [reuse], and redistribute it
– subject to only, at most, the requirement to
attribute and/or share-alike.

A piece of content or data is open if
anyone is free to use, reuse, and
[redistribute] it – subject to
only, at most, the requirement to
attribute and/or share-alike.

[Diagram: a cycle alternating between “Crowdsource” and “Grow Adoption”]

Original: I need yellow & blue.
I will contribute
there.

Data Forks, Often Not So Easy
[Figure: side-by-side hex dumps of two PNG files. Both begin with the same 16 bytes, 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 (the PNG signature and the IHDR chunk header); from offset 0000010 onward the two dumps show little visible correspondence.]
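The two files compared on this slide are PNG images, and a byte-level comparison says very little about how they differ. As a minimal sketch (not part of the original talk), the Python below decodes the first 29 bytes of each file as they appear in the slide's hex dumps and compares the decoded header fields instead of the raw bytes. The names `LEFT`, `RIGHT`, `read_ihdr`, and `changed_bytes` are illustrative assumptions; the byte layout follows the PNG format's IHDR chunk.

```python
import struct

PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")

# First 29 bytes of each file, transcribed from the slide's hex dumps:
# 8-byte signature, 4-byte chunk length, 4-byte chunk type, 13-byte IHDR payload.
LEFT = bytes.fromhex(
    "89504e470d0a1a0a" "0000000d" "49484452" "00000096000000f80804000000"
)
RIGHT = bytes.fromhex(
    "89504e470d0a1a0a" "0000000d" "49484452" "0000022c0000014a0806000000"
)

# Color type codes defined by the PNG format.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def read_ihdr(blob: bytes) -> dict:
    """Decode the image header fields from the start of a PNG file."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", blob[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a 13-byte IHDR")
    width, height, depth, color, _comp, _filt, _interlace = struct.unpack(
        ">IIBBBBB", blob[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": depth,
        "color_type": COLOR_TYPES.get(color, "unknown"),
    }


def changed_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print("raw bytes that differ in the header:", changed_bytes(LEFT, RIGHT))
    print("left :", read_ihdr(LEFT))
    print("right:", read_ihdr(RIGHT))
```

Even this small step needs knowledge of the file format to turn "some bytes changed" into "the image size and color model changed". Data forks usually lack a shared, human-readable diff of the kind source code has.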

A piece of content or data is open if anyone
is free to use, reuse, and redistribute it –
subject to [only, at most], the
requirement to attribute and/or share-alike.

Nice data you got.
I’ll just use it.
Thanks!

Our Data is Open, But Thou Shalt Not
Use Our API To:
• Do bad stuff with the data
• Violate people’s privacy
• Use without attribution
• Modify without giving back
• Fragment the data by copying the DB
• Make $ without our permission
• Do other stuff with the data we don’t like

Thanks for asking, but no data dump
is available. Sorry!

How can we open data,
but avoid the pitfalls?

Contact Info
rsands@blackducksoftware.com

Twitter:
• @richsands
• @ohloh
• #ohloh
• #ohlohcode

Ohloh: www.ohloh.net

Open Data ≠ Open Source - Monktoberfest 2012

1. Open Data ≠ Open Source Rich Sands – Product Manager @ Ohloh.net @ohloh #ohloh

2. Open Definition A piece of content or data is open if anyone is free to use, reuse, and redistribute it – subject to only, at most, the requirement to attribute and/or share-alike.

3. Sounds good?

5. A piece of [content or data] is open if anyone is free to use, reuse, and redistribute it – subject to only, at most, the requirement to attribute and/or share-alike.

6. Content

7. Data

8. A [piece] of content or data is open if anyone is free to use, reuse, and redistribute it – subject to only, at most, the requirement to attribute and/or share-alike.

9. [T]he first person to find and report a particular fact has not created the fact; he or she has merely discovered its existence. . . . Census-takers, for example, do not “create” the population figures that emerge from their efforts; in a sense, they copy these figures from the world around them. . . . Census data therefore do not trigger copyright because these data are not “original” in the constitutional sense. -Justice Sandra Day O’Connor, Feist Publications, Inc. v. Rural Telephone Service

11. Europe disagrees.

12. A piece of content or data is open if [anyone is free to use], reuse, and redistribute it – subject to only, at most, the requirement to attribute and/or share-alike.

13.

14.

15.

16.

17.

18. A piece of content or data is open if anyone is free to use, [reuse], and redistribute it – subject to only, at most, the requirement to attribute and/or share-alike.

19.

20.

21.

22. ?

23. A piece of content or data is open if anyone is free to use, reuse, and [redistribute] it – subject to only, at most, the requirement to attribute and/or share-alike.

24. [Diagram: a cycle alternating between “Crowdsource” and “Grow Adoption”]

25. Original: I need yellow & blue. I will contribute there.

26. Original: But I need yellow & blue.

27. Not Hypothetical

28. Code Forks Can Be Evaluated

29. Data Forks, Often Not So Easy [Figure: side-by-side hex dumps of two PNG files that share only their first 16 bytes, the PNG signature and IHDR chunk header]

30. A piece of content or data is open if anyone is free to use, reuse, and redistribute it – subject to [only, at most], the requirement to attribute and/or share-alike.

31. Nice data you got. I’ll just use it. Thanks!

32. This is a pain.

33. Our Data is Open, But Thou Shalt Not Use Our API To: • Do bad stuff with the data • Violate people’s privacy • Use without attribution • Modify without giving back • Fragment the data by copying the DB • Make $ without our permission • Do other stuff with the data we don’t like. Thanks for asking, but no data dump is available. Sorry!

34.

35.

36. How can we open data, but avoid the pitfalls?

37. Contact Info rsands@blackducksoftware.com Twitter: • @richsands • @ohloh • #ohloh • #ohlohcode Ohloh: www.ohloh.net

Editor's Notes

Hi, I’m Rich Sands, product and community wrangler for Ohloh.net, Black Duck Software’s comprehensive databank of FOSS metrics. We recently put Ohloh’s data under a Creative Commons Attribution license. I didn’t know all that much about Open Data before we decided to take this approach. I’ve learned a bit about it in the course of implementing our Open Data initiative, which we rolled out at OSCON in July of this year. That is what this talk is about: understanding some of the pragmatic and legal complications of Open Data, and thinking about how to make the right things happen. So what is open data?
The concept of Open Data owes a lot to the FOSS movement. This definition by the Open Knowledge Foundation is simple, easily understood, and derived mostly from the Open Source definition: (read) This very short, simple definition incorporates all of the critical bits. For data to be open, it must be freely redistributable; allow for derived works; place no restrictions on field of use, bundling, or use by particular persons or groups; and not demand particular implementations or technologies. Basically, let anyone do anything, subject at most to a requirement to attribute and/or share modifications alike.
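The criteria in this note can be sketched as a small check, which makes it easier to see why terms such as "non-commercial only" fall outside the definition. This is a hedged illustration only: the `LicenseTerms` fields and the `is_open` function are hypothetical names invented here. They are not part of the Open Definition, of any real license, or of any legal test.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical summary of what a set of data terms allows and demands."""
    allows_use: bool
    allows_reuse: bool             # modification and derived works
    allows_redistribution: bool
    restricts_field_of_use: bool   # e.g. "non-commercial only"
    restricts_who: bool            # excludes particular persons or groups
    requires_attribution: bool = False
    requires_share_alike: bool = False
    other_conditions: Tuple[str, ...] = ()


def is_open(terms: LicenseTerms) -> bool:
    """Apply the note's reading of the Open Definition.

    Attribution and share-alike are the only conditions tolerated, so they
    do not appear in the check; any other condition or restriction fails it.
    """
    return (
        terms.allows_use
        and terms.allows_reuse
        and terms.allows_redistribution
        and not terms.restricts_field_of_use
        and not terms.restricts_who
        and not terms.other_conditions
    )


# Illustrative inputs, not descriptions of any specific real license text.
ATTRIBUTION_ONLY = LicenseTerms(True, True, True, False, False,
                                requires_attribution=True)
NON_COMMERCIAL = LicenseTerms(True, True, True, True, False,
                              requires_attribution=True)
API_ONLY = LicenseTerms(True, True, False, False, False,
                        other_conditions=("no bulk copies of the database",))

if __name__ == "__main__":
    for name, terms in [("attribution only", ATTRIBUTION_ONLY),
                        ("non-commercial", NON_COMMERCIAL),
                        ("API only, no dump", API_ONLY)]:
        print(f"{name}: open = {is_open(terms)}")
```

Under these assumptions only the attribution-only terms pass. The sketch models the definition's wording and leaves out the legal complications the rest of the talk raises.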
Sounds good, right?
This is a pretty uncontroversial sounding definition – motherhood and apple pie. Folks who spend time thinking about free software, free culture, open source, and open access probably would expect such a definition for Open Data as essentially allowing complete freedom. It lays out principles that help us judge whether specific data is Open Data. It helps us evaluate existing licenses, and new ones. But this definition, when applied in the real world, with existing laws and legal precedents, and the realities of how data is used and potentially abused, can create some unexpected and possibly unintended results. So let’s go through this definition piece by piece and look at some of these consequences, and the differences between data and code that underlie some of these unintended outcomes.
Let’s start with the thing itself we’re defining as open: some content or data. Right away we see the definition wrestling with one of the biggest challenges in this area: content and data are quite different, legally. And our definition needs to work within existing legal frameworks, treaties, and agreements so that open data is open everywhere, and there are laws that will allow us to enforce its openness.
Content is something that has been created by a person through original thought. Something written by a person, and made concrete by a visible expression. So a poem, a written paragraph, a song, a drawing, that sort of thing. A piece of source code is content. A map – the visual representation of geodata – is content. Content can be copyrighted. Copyright law is specifically intended to protect content, and there are treaties and conventions that mostly harmonize copyright law around the world so that the protections offered to content in different jurisdictions are similar – things like what is covered, for how long, the specific rights that are protected, and so on. For the purposes of copyright, something is protectable as long as it has some originality. Some spark of creativity. Not much is needed – really, if it requires any human mental energy to create it or express it, or if a human applies any judgment to its choice, arrangement, or presentation, then it is copyrightable content. When something is copyrightable, certain rights to that content are reserved to the author for a period of time. Copyright locks down content such that only the author gets to decide how it is copied and distributed, whether you have to pay someone to make a copy and use it, and any conditions or requirements placed on such copies. Copyright is a really old legal idea – it’s been around for hundreds of years. In the U.S., the Constitution grants Congress the power to write copyright laws, “…to Promote the Progress of Science and useful Arts.” Its purpose isn’t to reward authors specifically. Rather, its purpose is to establish a limited monopoly on the exploitation of an author’s original content, so that there is an incentive to create. Authors get this monopoly, but only for a time, and copyright is intended to allow others to build on and use the ideas, information, and facts that are conveyed in a work.
The whole structure of copyright law and precedent and ideas like “fair use” all derive from both this basic monopoly grant, and the underlying rationale as to why granting it is good for society. And without copyright, open source, open data, and free culture could not be protected. Isn’t it interesting how this works? When you make something open, you are applying a copyright license – a set of rules that govern when and how and to whom your work may be copied and distributed. But the rules you apply state that your work may be freely copied! You get to set the rules. You have the copyright, the monopoly. But you use that right to explicitly free your work – to grant the right to copy and use and distribute your work without any further restrictions. You can’t give those rights away if you don’t have them in the first place! So if you love free culture and open source, you shouldn’t be too keen to abolish copyright.
If content is the expression of original or creative ideas, data is just the facts. Some examples of facts are the temperature at which water freezes or lead melts. A tide table for Nantucket. A list of names and phone numbers. Data, meaning the information or the fact itself, is not copyrightable. Where do we draw the line though? How little creativity is required to make something content, rather than data? And this leads us to another aspect of the Open Definition…
What do we mean by a “piece” of content or data? The Open Definition talks about what you can do with an individual chunk of information – content or data. But we care about compilations of information – that is, after all, what a database is, right? If there are separate rights in the collection of information that limit someone’s freedom to use it, then that piece, and every piece like it, isn’t open.
There is a famous Supreme Court case in the U.S. from 1991 that established the modern-day boundary between content and data and that also addresses the rights in compilations of information. The Feist case held that an alphabetized list of names and phone numbers does not have enough originality to be copyrighted. If you collect a bunch of facts, and write them down in an arrangement that does not entail any human mental originality, the result cannot be copyrighted. In other words, the law does not grant a monopoly on controlling copies of these facts or an unoriginal collection of these facts. A list of cities and their populations cannot be copyrighted. But a list of cities and their populations, chosen by someone as “The 10 Best Cities For Barbecue”, is protected. Human judgment went into the choice of the cities to be included in the list. Let’s say you spend a huge amount of time and money and employ a bunch of people to collect some set of facts. Say, the DNA sequence for a human being. Or the topography of the surface of the moon, down to a one foot resolution. Whatever. Feist holds that only the compiler’s selection and arrangement of facts in a compilation may be copyrighted. The individual facts themselves cannot be copyrighted – there is no right to control copying of facts, and they can be copied at will. This is true no matter how hard it is to collect the facts. Copyright does not protect or reward the labor of collecting facts. If you compile that mega database, you can’t copyright it. That means…
Familiar copyright licenses like our friends the GPL, or the Apache License, or Creative Commons Attribution, can’t protect databases of facts, or the facts inside databases. Anyone who has access to a database can copy the facts (but not content held in a database – it is separately copyrighted!) and use them any way they want. Mash them up with other data. Lock them up behind a paywall on the Internet. Use them as part of the knowledge needed to build a bomb. “This data is copyright Rich Sands, and licensed under Creative Commons Attribution ShareAlike” is an interesting bit of verbiage that has no effect. It turns out though that if what you’re trying to do is make Open Data, you don’t need this. You need only to have a license that calls out the parts of a database that can be copyrighted, and that uses copyright in those parts to free the database. That would be content, as opposed to data, stored in a database, and any original or creative selection, arrangement, or presentation of the data. The facts themselves are already free.
The Feist case is U.S. law. Things are different in Europe. In 1996, the E.U. adopted a Database Directive that grants additional “sui generis” database rights which protect the labor inherent in collecting a body of facts, even if the facts and their arrangement are entirely unoriginal. This means that an Open Data license must acknowledge these sui generis database rights and, along with the copyrightable elements of a database, explicitly grant others the free use of the collection of data. There are a number of such Open Data licenses out there, but they’re not well-known or understood, and have not been tested. Tricky!
Let’s move on to a different aspect of the Open Data definition. This one seems pretty straightforward as well. It isn’t Open Data unless anyone can use content or data for any purpose whatsoever. No discrimination. No restrictions on field of use (“non-commercial” is non-free). This is a familiar idea from the world of FOSS. But where the consequences of freeing code seem overall to be pretty benign (sure, people can use open source code to accomplish evil purposes), the use of data for any purpose whatsoever runs into more fundamental and troubling potential for bad consequences. So do we really mean anyone? Like…
Spammers?
Recruiters? Government agencies and law enforcement? Employers?
Corporations, no matter what they do?
Repressive regimes?
Financial services and insurance companies? Free culture advocates usually defend freedom on principle: while the specific consequences of granting unfettered freedom to information and code may be bad, the enormous benefits of freedom can only be had if we accept the downsides as well. I can defend this principle for code. But I don’t know that Open Data’s potential negative consequences can be tolerated without some sort of limitations. One problem we ran into when opening up the Ohloh data is that authorship is inherent in the way source code is developed, and a part of what is open about open source. There is a long-standing desire and convention in the world of FOSS that attribution – knowing and crediting who did what – is central to the establishment of working communities. When you commit code to a FOSS project, you’re publishing, as part of that code, the fact that you’re the one who created it. Your identity gets tied to your commits, and as a consequence of how projects work and SCMs are implemented, this means your email becomes public. FOSS developers know this. They accept it as an inevitable consequence of participating in an open process. But Ohloh is collecting all the committer IDs of everyone who has ever contributed to FOSS, and collating those identities in a centralized and conveniently queried database. Sure, recruiters might be able to go into the individual repositories of projects with developers they think could be interesting, and scrape the email addresses. Most recruiters wouldn’t know how to do that, or how to combine search and selection of projects with extracting the IDs from various SCM repositories around the web. Ohloh is far more convenient though. And yes, Ohloh could hide the identities of all the contributors.
But that would defeat the purpose of Ohloh! Did someone contributing to a project using their email address as their committer ID really think that this data would be repurposed and republished in a form ready-made for recruiters and spammers to use in targeting them? Probably not. It would be bad indeed if developers decided to curtail their participation in FOSS because participating becomes an inherent privacy nightmare. It is easy to imagine other, even more disturbing scenarios where data collected for entirely different purposes is used in ways not originally contemplated, to bad effect. It seems to me unavoidable that, somehow, there need to be some limits on how Open Data is used, and by whom. But isn’t that the very opposite of Open Data?
Let’s move on to another bit of the definition: reuse. This is really about how Open Data can be combined.
There are two options here: is the Open Data under a “ShareAlike” license, or not? “ShareAlike” is like the GPL’s copyleft: if you modify some data, you must also share the modifications as Open Data. ShareAlike has similar effects on data as copyleft has on code.
Open Data licensed under a ShareAlike license may be more attractive to contributors. They know that their contributions cannot be mashed up or modified by someone who then locks up those modifications. So inbound information flow may be enhanced. But outbound use of the data may be inhibited if commercial users believe that using such data puts too much burden of disclosure on their own data. This is not dissimilar to how commercial entities often look at the GPL and copyleft.
Likewise, Open Data licensed under a more permissive license requiring only attribution might dissuade some contributors from participating, since their participation might end up aiding efforts they don’t wish to support. But commercial interests will have less worry about using it, since there is no requirement to disclose their own data or data mixed and modified by them.
Wouldn’t it be nice if there were some sort of compromise that everyone could agree is Open Data, and that creates strong incentives for both contribution and use?
Redistribution is another interesting challenge for Open Data. The challenge comes in what happens to the community, and to the integrity of the data when lots of copies are out there, fulfilling lots of uses, by different players in the ecosystem.
When a community is contributing to a body of data, a virtuous cycle develops where useful data attracts people who want to make it even more useful, which in turn attracts more adoption. You have seen this before. When you’re trying to build a large and accurate body of knowledge and put it out as Open Data to the world, this dynamic is your friend. What drives this cycle is the aggregation of the data into a single data set with consistent format and expectations on quality and coverage. This authoritative consistency attracts more participation and makes the data more trustworthy. This is what I mean by integrity.
So what happens when you have Open Data without a ShareAlike requirement? Because the data can be freely redistributed, multiple copies of the data set may spring up and be used by different players for different purposes. Where do contributions get made in such a situation? If contributions are made where data is used, there won’t be a single, authoritative version of data. Rather, the data will end up fragmented, and users will have to evaluate which version of the data is more accurate for their particular needs. Some copies might be missing some data, and over time these versions will diverge. Community engagement will be spread across lots of versions.
When you have Open Data with ShareAlike, there is less chance that the data will fragment. But there is also less likelihood that the data will be as widely adopted. What if you need some combination of Open Data with proprietary data to solve your problem, but the proprietary owners don’t think using a ShareAlike data set fits their model?
This is just what happens with mobile mapping. Google Maps does not use OpenStreetMap data today. Apple’s new iOS 6 Maps app needed OpenStreetMap data to fill out some parts of its geo database. If you notice a mistake in iOS 6 Maps and submit a correction, and it is a correction to OpenStreetMap info, Apple will be required to share that correction. But Apple will only use OpenStreetMap data when there are no commercially viable geodata sets for a particular region. Otherwise the corrections will go to its proprietary partners.
If OpenStreetMap were not ShareAlike, it is possible that Google, Apple, and others might use it more heavily, and even contribute back. But there wouldn’t be a requirement to do so, and the FOSS cartographic community would be much less keen on contributing.
Another difference between open data and open source: you can read diffs of a fork and understand what is happening and decide whether the original or the forked code best meets your needs.
Data is often not like that. It may be much harder to know which version of a fragmented data set is the one you want to use.
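To make the contrast concrete, here is a minimal Python sketch, not anything Ohloh uses. It diffs two versions of a small piece of source code, then compares two binary blobs. The function names (`text_fork_diff`, `differing_byte_offsets`) and the tiny `area` example are made up for illustration; the byte strings are the first 32 bytes of each of the two hex dumps on the “Data Forks, Often Not So Easy” slide.

```python
import difflib


def text_fork_diff(original: str, fork: str) -> list[str]:
    """Return a unified diff of two source-code versions, one entry per line."""
    return list(difflib.unified_diff(
        original.splitlines(), fork.splitlines(),
        fromfile="original", tofile="fork", lineterm=""))


def differing_byte_offsets(original: bytes, fork: bytes) -> list[int]:
    """Return every offset at which two binary blobs differ."""
    shared = min(len(original), len(fork))
    offsets = [i for i in range(shared) if original[i] != fork[i]]
    # Any trailing bytes present in only one blob also count as differences.
    offsets.extend(range(shared, max(len(original), len(fork))))
    return offsets


# A forked piece of code: the diff states exactly what changed.
code_a = "def area(w, h):\n    return w * h\n"
code_b = "def area(w, h):\n    return abs(w * h)\n"
code_diff = text_fork_diff(code_a, code_b)
print("\n".join(code_diff))

# A "forked" piece of data: the first 32 bytes of the two dumps on the slide.
png_a = bytes.fromhex("89504e470d0a1a0a0000000d49484452"
                      "00000096000000f80804000000a4f6d7")
png_b = bytes.fromhex("89504e470d0a1a0a0000000d49484452"
                      "0000022c0000014a08060000003e3204")
data_diff = differing_byte_offsets(png_a, png_b)
print(data_diff)
```

The code diff reads as a sentence: one line was replaced by another. The byte comparison only lists the positions that differ, and tells you nothing about which of the two files is the one you want.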
So this brings us to the last element of the Open Data definition I’ll look at today. The Open Data Definition has a familiar bit of language that says you cannot add additional limitations or conditions that constrain the basic freedoms. This stops end-runs around the licenses and rules which lock down data. But it also prevents many of the most likely solutions to the challenges I’ve been talking about.
Here is the usual concept behind using Open Data: grab it in a free download, in aggregate, and have at it. This is the easiest possible way to gain value from Open Data. It is also the fastest route to spam, privacy problems, fragmentation, and other potential evils. What to do?
Mediate access to your Open Data through an API. The data you get from the API is Open Data, licensed freely, and able to be copied and redistributed. But your access to that resource is through an API with Terms of Use, rate limitations, and restrictions on what you can do using the API or website through which you access the data.
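The idea of mediated access can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ohloh’s actual API: the class `MediatedDataset`, its parameters, the limit of three requests per 60 seconds, and the response fields are all hypothetical. It shows the two halves of the approach: every record handed out carries its open license and attribution, while the access path itself enforces a rate limit that makes wholesale copying impractical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Creative Commons Attribution, as chosen in the talk; the exact license
# version is not stated there, so none is given here.
LICENSE = "CC BY"


class RateLimitExceeded(Exception):
    """Raised when an API key has used up its allowance for the window."""


@dataclass
class MediatedDataset:
    records: dict[str, dict]
    max_requests: int = 3                      # hypothetical allowance
    window_seconds: float = 60.0               # hypothetical window length
    clock: Callable[[], float] = time.monotonic
    _hits: dict[str, list[float]] = field(default_factory=dict)

    def get(self, api_key: str, record_id: str) -> dict:
        """Return one record with license and attribution, if the key is within its limit."""
        now = self.clock()
        # Keep only the request timestamps that still fall inside the window.
        recent = [t for t in self._hits.get(api_key, [])
                  if now - t < self.window_seconds]
        self._hits[api_key] = recent
        if len(recent) >= self.max_requests:
            raise RateLimitExceeded(api_key)
        recent.append(now)
        return {
            "data": self.records[record_id],
            "license": LICENSE,
            "attribution": "Ohloh.net",  # illustrative attribution string
        }


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
api = MediatedDataset(records={"p1": {"name": "example-project"}},
                      clock=lambda: fake_now[0])
for _ in range(3):
    api.get("demo-key", "p1")
try:
    api.get("demo-key", "p1")
    blocked = False
except RateLimitExceeded:
    blocked = True

fake_now[0] = 61.0  # the window has passed, so the key may ask again
after_window = api.get("demo-key", "p1")
```

The design choice to notice is that the license travels with the data, while the restrictions live in the access layer rather than in the license.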
This is a way of attaching limitations and conditions of use on Open Data while still putting it out there under an Open Data license. The Terms of Use might say something like this: (read)
The Open Data movement calls limits on data access through APIs, and their corresponding limitations, “Webstacles”. Ouch!
But in the end, this was the approach we chose for Ohloh. We were not willing to accept the potential for fragmentation and loss of integrity, or for privacy violations and other evils. We knew we wanted to make Ohloh’s data as useful as possible for all kinds of applications, including commercial ones (but not spam or gratuitous advertising) in order to drive more adoption, engagement, and attribution traffic back to Ohloh. And the Open Data licenses that address the EU’s database directive were too unfamiliar and untested. We chose:
Creative Commons Attribution (not ShareAlike)
No data dumps
Terms of Use and API License that limit “evil” uses, prohibit wholesale copying of the entire database, and block commercial use without our permission (which we will grant if you aren’t a recruiter, a spammer, an ad network, or similar).
Is it open? We think it is open enough so that most people will decide the benefits far outweigh the negatives. There will be people who don’t agree with us and that’s ok!
Our host has just published this very insightful piece on the ultimate value of data in driving revenue, building barriers to entry, and constructing sustainable competitive advantage. Steve’s right – anyone thinking software and data are two separate and distinct things is living in the last decade, or century. Steve concludes that companies would be smart to lock down their data as a “moat” – a barrier to entry – to compensate for the commoditization of software driven by FOSS. As a business strategy that makes sense. But I think it would be profoundly bad for the planet if data becomes a weapon, instead of a tool that can be leveraged for common benefit. It isn’t as easy as just declaring Open Data to be like Open Source.
How do we meet these challenges of fragmentation, intolerable “bad uses”, differing laws in multiple jurisdictions, and the tension between ShareAlike and commercial use, while gaining the benefits of Open Data?
Thanks!

Editor's Notes