
FINODEX open data training


Handbook on open data, aimed at training proposers for the FINODEX 1st Open Call.
Data licences, open data business models, open data definition, etc.


  1. 1. OPEN DATA TRAINING MATERIAL November 2014 Page 1
  2. 2. Table of contents
     1. Defining Open Data
     2. Understanding Law and Licensing
     3. Big Data vs. Open Data
     4. Open data as part of your business model
     5. Case studies: Open Data Business
     6. Where do I find Open Data?
     7. How to develop your open data business
     8. Open Data training materials already available: a list
     9. Slides and inspiring presentations: link-o-graphy
     10. Recommended videos, audio files and books
  3. 3. 1. Defining Open Data
     “A key promise of open data is that it can be freely accessed and used. Without a clear definition of what exactly that means (e.g. used by whom, for what purpose) there is a risk of dilution especially as open data is attractive for data users” (Pollock, 2014).

     The main goal of this material is to make sure that people willing to re-use open datasets are aware of what “open” really means. The first step is to explore some guidelines available online. The Open Data Institute and Open Knowledge regularly publish simple guides and other content for open data publishers and reusers. Let’s start from the basics: what makes data open, and the Open Definition v2.0.

     What makes data open?
     The original content for this material is available online at http://theodi.org/guides/what-open-data and http://theodi.org/guides/publishers-guide-open-data-licensing .

     Open data is data that is made available by organisations, businesses and individuals for anyone to access, use and share. Open data has to have a licence that says it is open data. Without a licence, the data can’t be reused. The licence might also say:
     ● that people who use the data must credit whoever is publishing it (this is called attribution)
     ● that people who mix the data with other data have to also release the results as open data (this is called share-alike)
     ● that people can do whatever they want with the work, if the holder has waived the copyright or database rights (public domain)

     Example: The Department for Education makes available open data about the performance of schools in England. The data is published as CSV under the Open Government Licence, which only requires reusers to say that they got the data from the Department for Education.

     These principles for open data are detailed in the Open Definition, described in the next section.
     Good open data:
     ● is rich in documentation and metadata
     ● can be linked to, so that it can be easily shared and talked about
     ● is available in a standard, structured format, so that it can be easily processed
  4. 4. Open Definition
     The Open Definition, created in 2005, is the main international standard for open data and open data licences, and provides principles and guidance for all things “open”. (Open Data Mark: indicates compliance with the Open Definition.)

     Definition
     You can find the entire updated version of the Open Definition at http://opendefinition.org/od/ . The Open Definition is a project by Open Knowledge, which provides details and additional content on its official web page. This material is licensed under Creative Commons Attribution 4.0: https://creativecommons.org/licenses/by/4.0/ .

     Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose. The “standard” provided by the Open Definition (the common requirements that must be met if data is to be called “open”) is crucial because much of the value of open data lies in the ease with which different sources of open data can be combined: practically every app or insight made with data requires combining several pieces of data. For example, you need to know the bus timetable and have a map showing bus stops to be able to reach your destination on time.

     Both legal and technical compatibility are vital, and the Open Definition ensures that openly-licensed data can be combined successfully. This eliminates the risk of a “Tower of Babel” of data, with a proliferation of licences and terms of use for open data leading to complexity and incompatibility. The Open Definition prevents this fragmentation, and the resulting destruction in value, by ensuring a common standard for all “open” data.

     Evidence for the practical success of the effort can be found in the reuse of the definition’s key principles and language in other important areas, including UK and US government policy and the transition in terminology from “public sector information” to “open government data”. Thanks to the efforts of many translators in the community, the Open Definition is available in 30+ languages.
     The Open Definition explains what counts as an open work and an open license. The term work is used to denote the item or piece of knowledge being transferred. The term license refers to the legal conditions under which the work is made available. Where no license has been offered, this should be interpreted as referring to the default legal conditions governing use of the work (for example, copyright or public domain).
  5. 5. Open Works
     An open work must satisfy the following requirements in its distribution:
     ● Open License: The work must be available under an open license (as defined in Section 2). Any additional terms accompanying the work (such as terms of use, or patents held by the licensor) must not contradict the terms of the license.
     ● Access: The work shall be available as a whole and at no more than a reasonable one-time reproduction cost, preferably downloadable via the Internet without charge. Any additional information necessary for license compliance (such as names of contributors required for compliance with attribution requirements) must also accompany the work.
     ● Open Format: The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format (i.e., a format with a freely available published specification which places no restrictions, monetary or otherwise, upon its use) or, at the very least, can be processed with at least one free/libre/open-source software tool.
  6. 6. Open Licenses
     A license is open if its terms satisfy the following conditions:
     ● Required Permissions: The license must irrevocably permit (or allow) the following:
       1.1 Use: The license must allow free use of the licensed work.
       1.2 Redistribution: The license must allow redistribution of the licensed work, including sale, whether on its own or as part of a collection made from works from different sources.
       1.3 Modification: The license must allow the creation of derivatives of the licensed work and allow the distribution of such derivatives under the same terms of the original licensed work.
       1.4 Separation: The license must allow any part of the work to be freely used, distributed, or modified separately from any other part of the work or from any collection of works in which it was originally distributed. All parties who receive any distribution of any part of a work within the terms of the original license should have the same rights as those that are granted in conjunction with the original work.
       1.5 Compilation: The license must allow the licensed work to be distributed along with other distinct works without placing restrictions on these other works.
       1.6 Non-discrimination: The license must not discriminate against any person or group.
       1.7 Propagation: The rights attached to the work must apply to all to whom it is redistributed without the need to agree to any additional legal terms.
       1.8 Application to Any Purpose: The license must allow use, redistribution, modification, and compilation for any purpose. The license must not restrict anyone from making use of the work in a specific field of endeavor.
       1.9 No Charge: The license must not impose any fee arrangement, royalty, or other compensation or monetary remuneration as part of its conditions.
     ● Acceptable Conditions: The license shall not limit, make uncertain, or otherwise diminish the permissions required in the previous section except by the following allowable conditions:
       Attribution: The license may require distributions of the work to include attribution of contributors, rights holders, sponsors and creators as long as any such prescriptions are not onerous.
       Integrity: The license may require that modified versions of a licensed work carry a different name or version number from the original work or otherwise indicate what changes have been made.
       Share-alike: The license may require copies or derivatives of a licensed work to remain under a license the same as or similar to the original.
       Notice: The license may require retention of copyright notices and identification of the license.
       Source: The license may require modified works to be made available in a form preferred for further modification.
  7. 7. Technical Restriction Prohibition: The license may prohibit distribution of the work in a manner where technical measures impose restrictions on the exercise of otherwise allowed rights.
       Non-aggression: The license may require modifiers to grant the public additional permissions (for example, patent licenses) as required for exercise of the rights allowed by the license. The license may also condition permissions on not aggressing against licensees with respect to exercising any allowed right (again, for example, patent litigation).

     A list of conformant licenses is available at http://opendefinition.org/licenses/ . We explore licensing in the next section.
  8. 8. 2. Understanding Law and Licensing
     This section provides additional material on the licenses that applicants are invited to look for. Below is an extended list of licenses that are conformant with the principles laid out in the Open Definition.

     Conformant Licenses
     The following licenses are conformant with the principles set forth in the Open Definition.
     ● Domain = Domain of application, i.e. what type of material this license should/can be applied to. Note: if you are looking for an open license for software, please see the Open Source Definition conformant licenses.
     ● BY = requires attribution
     ● SA = requires share-alike

     Recommended conformant licenses
     These licenses conform to the Open Definition and are:
     ● Reusable: Not specific to an organization or jurisdiction.
     ● Compatible: Must be compatible with at least one of GPL-3.0+, CC-BY-SA-4.0, and ODbL-1.0. Permissive/attribution-only licenses must be compatible with all 3 of the aforementioned licenses, and at least one of Apache-2.0, CC-BY-4.0, and ODC-BY-1.0.
     ● Current: Widely used and generally considered best practice by a broad spectrum of projects and actors within the domains of applicability of the license.

     License | Domain | BY | SA | Comments
     Creative Commons CCZero (CC0) | Content, Data | N | N | Dedicate to the Public Domain (all rights waived)
     Open Data Commons Public Domain Dedication and Licence (PDDL) | Data | N | N | Dedicate to the Public Domain (all rights waived)
     Creative Commons Attribution 4.0 (CC-BY-4.0) | Content, Data | Y | N |
     Open Data Commons Attribution License (ODC-BY) | Data | Y | N | Attribution for data(bases)
     Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) | Content, Data | Y | Y |
     Open Data Commons Open Database License (ODbL) | Data | Y | Y | Attribution-ShareAlike for data(bases)
  10. 10. Other conformant licenses
     These licenses conform to the Open Definition, but do not meet reusability or compatibility requirements for recommended licenses, or have been superseded by newer license versions or newer licenses with similar use cases, or are little-used. These licenses may be reasonable for the particular organization they were crafted for to use, or to use for legacy reasons. Projects outside such contexts are strongly advised to use a recommended conformant license from the list above.

     License | Domain | BY | SA | Comments
     Against DRM | Content | Y | Y | Little used.
     Creative Commons Attribution versions 1.0-3.0 | Content | Y | N | Includes all jurisdiction "ports"; superseded by CC-BY-4.0.
     Creative Commons Attribution-ShareAlike versions 1.0-3.0 | Content | Y | Y | Includes all jurisdiction "ports"; superseded by CC-BY-SA-4.0. Additionally, CC-BY-SA-1.0 is incompatible with any other license.
     Data licence Germany – attribution – version 2.0 | Data | Y | N | Non-reusable. For use by Germany government licensors. Note version 1.0 is not approved as conformant.
     Data licence Germany – Zero – version 2.0 | Data | N | N | Non-reusable. For use by Germany government licensors. Note there is no previous version.
     Design Science License | Content | Y | Y | Little used; incompatible with any other license.
     EFF Open Audio License | Content | Y | Y | Deprecated in favor of CC-BY-SA.
     Free Art License (FAL) | Content | Y | Y |
     GNU Free Documentation License (GNU FDL) | Content | Y | Y | Incompatible with any other license. Only conformant if used with no cover texts and no invariant sections.
     MirOS License | Code, Content | Y | N | Little used.
  11. 11. Other conformant licenses (continued)
     License | Domain | BY | SA | Comments
     Open Government Licence Canada 2.0 | Content, Data | Y | N | Non-reusable. For use by Canada government licensors. Note version 1.0 is not approved as conformant.
     Open Government Licence United Kingdom 2.0 and 3.0 | Content, Data | Y | N | Non-reusable. For use by UK government licensors; re-uses of OGL-UK-2.0 and OGL-UK-3.0 material may be released under CC-BY or ODC-BY. Note version 1.0 is not approved as conformant.
     Talis Community License | Data | Y | Y | Draft only; deprecated in favour of ODC licenses.

     Non-Conformant Licenses
     Non-conformant licenses are usually those that support some of the definition’s principles but not all of them.

     ● Creative Commons No-Derivatives Licenses
       Creative Commons No-Derivatives licenses (by-nd-*) violate OD 1.1#3., “Reuse”, as they do not allow works, in part or in whole, to be re-used in derivative works. Creative Commons licenses with the NoDerivs stipulation include:
       ● Attribution-NoDerivs (by-nd)
       ● Attribution-NonCommercial-NoDerivs (by-nc-nd)

     ● Creative Commons NonCommercial Licenses
       Creative Commons NonCommercial licenses (by-nc-*) do not support OD 1.1#8., “No Discrimination Against Fields of Endeavor”, as they exclude usage in commercial activities. Creative Commons licenses with the non-commercial stipulation include:
       ● Attribution-NonCommercial (by-nc)
       ● Attribution-NonCommercial-ShareAlike (by-nc-sa)
       ● Attribution-NonCommercial-NoDerivs (by-nc-nd)
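The rule stated on this slide (any Creative Commons licence carrying an NC or ND element is not open) can be sketched as a small check. This is a minimal illustration only: the function name `is_open_cc_licence` is hypothetical, and the check looks at the short licence code alone, not at the legal text.

```python
def is_open_cc_licence(code: str) -> bool:
    """Return True when a Creative Commons licence code carries no NC or ND element.

    Assumption: the code uses hyphen-separated elements, e.g. "CC-BY-NC-SA-4.0"
    or "by-nc-sa". Prefixes and version numbers are ignored.
    """
    parts = code.lower().split("-")
    elements = {p for p in parts if p in {"by", "sa", "nc", "nd"}}
    # NonCommercial (nc) and NoDerivs (nd) are the two non-conformant stipulations.
    return not (elements & {"nc", "nd"})


for code in ["CC-BY-4.0", "CC-BY-SA-4.0", "by-nd", "by-nc-sa", "by-nc-nd"]:
    print(code, "open" if is_open_cc_licence(code) else "not open")
```

A check like this is only a first filter when scanning a catalogue; the list of conformant licences at opendefinition.org remains the reference.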
  12. 12. Licence Compatibility
     Applicants, as reusers and publishers of open data, often need to understand whether the licenses applied to datasets are "compatible". We recommend that FINODEX proposers look at this page: https://github.com/theodi/open-data-licensing/blob/master/guides/licence-compatibility.md

     The most important step towards understanding compatibility in more detail is to understand the basic provisions of each license. The Creative Commons Rights Expression Language defines some basic facets of licenses, covering Permissions, Requirements and Prohibitions. As the CC licenses are already described using these facets, which are also common to many other licenses, it is possible to put together a matrix that identifies which facets apply to which licenses. Table 1 summarises how a number of licenses can be classified based on these facets. There are several things to note here:
     ● The Share Alike requirement requires that derived data is published under the same or compatible terms as the original. This places limits on how remixes can be distributed, i.e. only under compatible terms.
     ● The Derivative Works prohibition stops re-users from creating any form of derivative work at all, even if those derivatives are not distributed. However, it is still possible to include the database in a collection in which the original is preserved.

     When it comes to publishing derivatives there are, broadly, two different scenarios to consider: publishing a simple derivative based on a single source, and publishing a remix of several datasets. Once a derivative has been created, it too can be the source of additional derivation. Derivation is a process that can be repeated either by the original publisher (e.g. mixing in further datasets) or by third parties (e.g. to create new derivatives).
     Questions about licence compatibility:
     ● Can some data published with Licence X be combined with some additional data published under Licence Y?
     ● What license(s) could be applied to a derived or aggregated dataset?
     ● Are there provisions associated with a licence that inhibit or constrain the creation and ...

     Set of questions for open data publishers and reusers (Author: David Tarrant):
     ● Do you have rights or permission to publish?
     ● Do you have rights to use the information/data?
     ● Is the data derived from other sources?
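The first two compatibility questions can be explored with a deliberately simplified sketch built on the attribution and share-alike facets discussed on this slide. The `Licence` class, the `remix_constraints` function and the rule "two different share-alike licences are flagged as a conflict" are assumptions made for illustration; real compatibility depends on the actual licence texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    name: str
    attribution: bool   # BY: reusers must credit the publisher
    share_alike: bool   # SA: derivatives must stay under the same/compatible terms


CC0 = Licence("CC0", attribution=False, share_alike=False)
CC_BY = Licence("CC-BY-4.0", attribution=True, share_alike=False)
CC_BY_SA = Licence("CC-BY-SA-4.0", attribution=True, share_alike=True)
ODBL = Licence("ODbL", attribution=True, share_alike=True)


def remix_constraints(sources):
    """Summarise what a remix of several datasets has to respect (simplified)."""
    share_alike = sorted({lic.name for lic in sources if lic.share_alike})
    credit = [lic.name for lic in sources if lic.attribution]
    return {
        # More than one distinct share-alike licence: assumed to need manual review.
        "conflict": len(share_alike) > 1,
        # A single share-alike source dictates the licence of the remix.
        "required_licence": share_alike[0] if len(share_alike) == 1 else None,
        # Every attribution licence in the mix still has to be credited.
        "credit": credit,
    }


print(remix_constraints([CC0, CC_BY]))
print(remix_constraints([CC_BY, ODBL]))
print(remix_constraints([CC_BY_SA, ODBL]))
```

The point of the sketch is the shape of the reasoning: share-alike sources constrain the outgoing licence, while attribution requirements accumulate across all sources.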
  13. 13. Further readings:
     ● http://www.scribd.com/doc/128356210/Business-considerations-or-privacy-and-open-data-how-not-to-get-caught-out
     ● http://www.scribd.com/doc/125638490/Getting-to-grips-with-the-National-Pupil-Database-personal-data-in-an-open-data-world

     USEFUL GUIDES for reusers and publishers released by The Open Data Institute

     The ODI Publisher's Guide to Open Data Licensing
     Source: http://theodi.org/guides/publishers-guide-open-data-licensing

     In Europe, there are two kinds of rights that you are automatically given over things that you have created:
     ● you get copyright over works (content) that you create and which are original to you, such as text that you write or photographs you take
     ● you get a database right over collections of data that you have put a substantial effort into obtaining, verifying or presenting

     Note: As far as we know the database right only arises within the European Union and in Mexico. In some countries there may be no protection for collections of data.

     Database right: 15 years since database was last updated
     Database copyright: Life of author + 70 years from date database was created

     Suggestion for the proposers: If you are uncertain about what rights you may have over a piece of content or dataset or how you can use it... Contact the owner. Ask.
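The two durations quoted on this slide can be turned into a small worked calculation. This is a rough sketch that works at whole-year granularity and takes the slide's figures at face value; the function names are hypothetical and the exact start dates are an assumption.

```python
DATABASE_RIGHT_YEARS = 15          # "15 years since database was last updated"
COPYRIGHT_YEARS_AFTER_DEATH = 70   # "Life of author + 70 years"


def database_right_expiry_year(last_updated_year: int) -> int:
    """Year in which the database right would lapse, counted from the last update."""
    return last_updated_year + DATABASE_RIGHT_YEARS


def database_copyright_expiry_year(author_death_year: int) -> int:
    """Year in which database copyright would lapse, counted from the author's death."""
    return author_death_year + COPYRIGHT_YEARS_AFTER_DEATH


# A database last updated in 2014 and a (hypothetical) author who died in 2000.
print(database_right_expiry_year(2014))
print(database_copyright_expiry_year(2000))
```

Because the database right is counted from the last update, a regularly maintained database keeps renewing its protection, which is one more reason to ask the owner when in doubt.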
  14. 14. If you apply original judgement in putting together a database, for example in choosing which items to include within the database or which information about them to include, you have a copyright over that database, because it is a creative work. For example, if you were to build a database about the best 100 cars, this might involve: ● choosing which cars count as the best cars ● writing a description about each car ● researching and gathering facts about them You would have copyright over the database, because you chose which cars were “best”. You would have copyright over the descriptions, because you wrote them. And you would probably have the database right for the database you’ve built, because you put substantial effort into gathering information about them. Importantly, you don’t own the facts about the cars — anyone else can build their own database containing exactly those facts without violating your database right — but no one else can reuse your database or your descriptions without your permission because you own the copyright over them. You probably do not have a database right if you create the facts in a database, as opposed to gathering them from elsewhere, unless you put substantial effort into verifying or presenting the database. For example, if you own a restaurant and create a database of the dishes that you offer and when you offer them, you probably do not have a database right over that database, though you might have copyright because of the creative judgement involved in working out which dishes should be offered on particular days to provide a balanced menu. Copyright and database right are types of Intellectual Property Rights (IPR). There are other kinds of IPR that you can get, such as patents, trademarks and (some) design rights, which must be registered (for example with the Intellectual Property Office). 
     Database definition: “A collection of independent works, data or materials which are a) arranged in a systematic or methodical way and
  15. 15. ● What About Data From Other Organisations? You might not own all the content or data that you have and use within your organisation. In particular, rather than creating the content or gathering the data yourself, some of the content and data you hold and use within your organisation, and might want to publish, might be: ● completely licensed from someone else ● include an extract of content or data that you have licensed from someone else ● be derived from the content or data that you have licensed from someone else The Reuser’s Guide to Open Data Licensing describes what you can do with content or data that you licence from someone else. If you do reuse that content or data in your own publications, you should indicate the licence under which you are reusing that content, so that people reusing that content or data know what they can do with it. ● What About My Brand? Organisations who publish content or data under an open licence are often concerned that this might enable reusers to also copy their brand. Your brand should be protected through a trade mark. A trade mark restricts how other people use your logo or company name. You will also have copyright on the logo. Although your trade mark will protect you from other people using your logo directly, if your logo is incorporated into some content that you licence, you should make sure the logo is explicitly not covered by that licence, as you will usually want to place additional restrictions on its use (such as its adaptation). For example, if you have written a report that includes your logo, and you want to licence the content of the report under the Creative Commons Attribution licence, you could say: The text, figures and tables in this report are licensed under a Creative Commons Attribution 4.0 International License. What If I Publish the Data on a Website? November 2014 Page 15
  16. 16. You still have rights over your database and your content when you publish them on a website. Others cannot legally extract and reuse a substantial portion of your data or content without your permission. You can also indicate that others should not scrape data from your website through your Terms and Conditions and through technical mechanisms such as robots.txt.

     There are two sets of open licences. You should use a licence from one of these sets rather than creating your own licence, for three reasons:
     1. it’s less work
     2. it ensures that the legal language in the licence is correct
     3. it makes it a lot easier for reusers to know what they can do with your data

     ● Open Licences for Creative Content
     Creative content, such as text, photographs, slides and so on, should be licensed using a Creative Commons Licence. There are three of these that you should consider using for open content:

     Level of Licence | Creative Commons Licence
     public domain | CC0
     attribution | CC-by
     attribution & share-alike | CC-by-sa

     Make sure that you use the latest (version 4.0) Creative Commons licenses, which are international.

     There are other types of Creative Commons licences that are not open licences. For example, the Creative Commons Attribution-NonCommercial licence does not allow commercial reuse of content, and therefore is not an open licence. If you use the Creative Commons licence chooser, only those that are described as “Free Culture” licences are open licences.

  17. 17. ● Open Licences for Databases
     We now recommend that you also use a Creative Commons 4.0 licence for data as well as for content. You may alternatively use a similar set of licences that was created specifically for databases by Open Data Commons. There are again three levels that you can choose from:

     Level of Licence | Open Data Commons Licence
     public domain | PDDL
     attribution | ODC-by
     attribution & share-alike | ODbL

     The ODbL licence is used for OpenStreetMap. You can find more details here: https://blog.openstreetmap.org/2014/08/06/at-the-edge-of-the-license/

     Which Licence Should I Use?
     The licence that you use should support your open data business model. It is unusual for organisations to place content or data in the public domain, as being given attribution for the content or data usually helps to achieve some of the goals of opening it up.

     It is possible to license content or data under more than one licence, and let reusers choose which licence to use it under. Typically you would dual-license some content or data by making it available under an open licence and under a paid-for licence that does not have the same restrictions. Dual-licensing is typically used with a share-alike licence, as outlined below.
  18. 18. Some open data business models work best with a share-alike licence. For example: ● a share-alike licence will usually be unattractive to commercial businesses who don’t want to open up their own data, so using a share-alike licence coupled with a charged licence can be a good basis for a freemium business model ● when you are collaborating with others to create a shared resource, a share-alike licence can help to ensure that you can bring back into that resource any work that others do on their own copies On the other hand, if you are hoping to gain other benefits for your business through the reuse of your data, using a cross-subsidy business model, you may find that a share-alike licence prevents people from reusing it, and therefore want to avoid having a share-alike restriction. There are two cases where you have no choice over what licence you can use for the content or data that you publish. 1. If you are publishing content or data that is derived from content or data that was licensed to you using a share-alike licence, then you must publish your content or data using that same licence. 2. With very few exceptions, if you are a government department or arms-length body then the content or data that you have created or gathered is owned by the Crown. Unless you have an exemption, granted by the Office of Public Sector Information (OPSI), you must publish this data using the Open Government Licence. What Attribution Should I Ask For? If you choose a licence that includes a requirement for attribution, you need to specify what that attribution should look like. In choosing what attribution to ask for, you should consider the ways in which your data or content might be reused, and the fact that it might be combined with other data or content that might require its own attribution. If you want to encourage the reuse of your data or content, you need to make it easy for reusers to satisfy your attribution requirements. 
There are two things you should document:
  19. 19. 1. What should the attribution include? You will usually want the name of your organisation, and a link to either your organisation’s home page or a page about the data or content you are licensing. Keep this as minimal as possible.
     2. Where and how should the attribution be presented? Some attribution requirements specify that the attribution must be presented directly wherever the data is used, and may even specify the size or format of the attribution. These requirements can be difficult to adhere to, particularly for mobile application developers who have limited screen space to include such attributions. Allowing reusers to provide attribution on a separate page makes this easier.

     Note that under the terms of the licences listed above, when a reuser uses your data or content to add value to or to create new data or content, they cannot relicense your work. Any onward reusers are bound by the same attribution requirements as the direct reusers of your content or data. It’s a good idea to explicitly document this requirement because it might not be obvious to reusers.

     How Do I Indicate the Licence of Content or Data?
     You should indicate the licence for content or data you make available using both a human-readable description and computer-readable metadata. The clearer you make it which licence applies to your content or data, the easier it is for reusers to know that they can reuse the content or data you are licensing.

     The human-readable descriptions and marks that you should use are spelled out on the Creative Commons and Open Data Commons websites:
     ● Creative Commons licence chooser
     ● Open Data Commons licences

     It is best to embed information about the licence that some content or data is available under directly within the content or data. This ensures that the licensing information is carried around with the content or data. In addition to human-readable text, you should provide computer-readable metadata.
     The separate Publisher’s Guide to the Open Data Rights Statement Vocabulary describes how to do this. If you add your dataset to a catalog, such as data.gov.uk or the Data Hub, you should make sure that you indicate the licence under which the dataset is available within that catalog. This gives people searching the catalog a quick and easy way of seeing that they will be able to reuse the dataset.
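As an illustration of pairing a human-readable licence statement with computer-readable metadata, the sketch below builds both from the same inputs. The function name, the JSON field names and the example dataset are all hypothetical; they are not the terms defined by the Open Data Rights Statement Vocabulary, which the guide referenced above describes.

```python
import json


def licence_statement(title, publisher, publisher_url, licence_name, licence_url):
    """Return a (human_readable, machine_readable_json) pair for a dataset."""
    human = (
        f"{title} by {publisher} ({publisher_url}) "
        f"is licensed under {licence_name} ({licence_url})."
    )
    machine = {
        "title": title,
        "licence": {"name": licence_name, "url": licence_url},
        # Minimal attribution: organisation name plus one link, as advised above.
        "attribution": {"text": publisher, "url": publisher_url},
    }
    return human, json.dumps(machine, indent=2)


human, machine = licence_statement(
    title="Example bus stops dataset",            # hypothetical dataset
    publisher="Example Transport Authority",      # hypothetical publisher
    publisher_url="https://example.org/",
    licence_name="Creative Commons Attribution 4.0",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)
print(human)
print(machine)
```

Generating both forms from one source keeps the text shown to people and the metadata read by catalogues from drifting apart.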
  21. 21. The ODI Reuser's Guide to Open Data Licensing
     Source: http://theodi.org/guides/reusers-guide-open-data-licensing

     The fact that you can get hold of some information does not necessarily mean that you can do whatever you want with it. You need to have permission from the owner of that information to do what you want to do. A licence tells you what you can do. But what does it mean to license data? What requirements can a licence place on you? What different licences do publishers use? How can you find out what licence a dataset is available under? This guide answers these questions.

     Note: This guide focuses on data published by organisations based in the UK. Licensing law is different in different countries, so some of this information might not apply to you if you are reusing information that is published elsewhere. It does not address other potential legal considerations, such as compliance with the Data Protection Act.

     ● What Do Publishers Own?
     In Europe, there are two kinds of rights that publishers (organisations or individuals who make available content or data) are given over things that they have created:
     ● they get copyright over works (content) that they create and which are original to them, such as text that they write or photographs they take
     ● they get a database right over collections of data that they have put a substantial effort into obtaining, verifying or presenting

     Note: As far as we know the database right is unique to the European Union. In some countries there may be no protection for collections of data.

     If someone applies original judgement in putting together a database, for example in choosing which items to include within the database or which information about them to include, they have a copyright over that database, because it is a creative work.
     For example, if someone were to build a database about the best 100 cars, this might involve:
     ● choosing which cars count as the best cars
     ● writing a description about each car
     ● researching and gathering facts about them
  22. 22. They would have copyright over the database, because they chose which cars were “best”. They would have copyright over the descriptions, because they wrote them. And they would probably have the database right for the database they’ve built, because they put substantial effort into gathering information about the cars. Importantly, they don’t own the facts about the cars — you or anyone else could build your own database containing exactly those facts without violating their database right — but no one else can reuse their database or their descriptions without their permission because they own the copyright over them. Publishers probably do not have a database right if they create the facts in a database, as opposed to gathering them from elsewhere, unless they put substantial effort into verifying or presenting the database. For example, if someone owns a restaurant and creates a database of the dishes that they offer, and when they offer them, they probably do not have a database right over that database, though they might have copyright because of the creative judgement involved in working out which dishes should be offered on particular days to provide a balanced menu. ● What About Data From Third Parties? Publishers might not own all the content or data that they publish themselves. In particular, rather than creating the content or gathering the data themselves, some of the content and data they publish might be: ● completely licensed by them from someone else ● include an extract of content or data that they have licensed from someone else ● be derived from the content or data that they have licensed from someone else When they publish the data, the publisher should tell you about which content or data is owned by another organisation, and under which licence it is being republished. ● What About Brands? Brands are usually protected through a trade mark. A trade mark restricts how you can use an organisation’s logo or company name. 
They will also have copyright on the logo. Licences for content or data usually explicitly exclude logos and company names, so you cannot, for example, adapt a logo by changing the colours used within it. You also cannot use the company name or logo to lend weight to your product without permission to do so. However, the attribution requirements of a licence may require you to use the company name and logo to indicate that you have reused data owned by that company. ● What Can’t You Do? There are a few things that you can do with content or data without a licence, but in general you need to be given a licence by a publisher if you want to reuse their content or data. Having access to some content or data — for example by downloading it from a publisher’s website — does not give you the right to reuse it. November 2014 Page 22
  23. 23. ● Republishing and Adding Value You do not automatically have the right to republish, in its entirety, content or data that someone else owns, even if they have given you a licence to use it yourself. You need to check the terms of the licence for the content or data to make sure that you can republish it. The same applies if you are adding value to the content or data, for example by automatically adding links or styling to content, or adding columns with extra information into a dataset. The new content or data includes the entirety of someone else’s content or data, so you cannot publish it unless you have their permission. ● Publishing Extracts You have the right to publish extracts of content or databases that you have access to, regardless of what the licence says, so long as the extract is not “substantial”. However, it is often hard to tell if the extract that you have made is “substantial”. The licence that you have been given might let you republish any amount of the content or data (open licences do this). Otherwise, you should take legal advice about whether the extracts that you want to publish are likely to count as substantial or not. ● Publishing Derived Content or Data You might want to create new content or databases by adapting, deriving, or otherwise processing some content or data. To do that, you first have to ensure you have been given a licence to use the data in the first place. You then need to look at what the licence says about creating derived works. For example, say you have been given a licence to use a photograph on your website. You could create a new version of that photograph by changing it from colour to black & white, or by adding a speech bubble to it. In this case, the photograph is a creative work, and the person who took it owns the copyright. Because the photograph is protected by copyright, you can only create these new images if the licence under which you are using the photograph allows you to do so. 
Copyright can exist in small pieces of content, such as phrases. For example, if you analyse some content to create a new database, you should make sure that you have the right to reuse any snippets of content that you might keep in the new database. If the content includes a presentation of data from a database, you have to consider database rights as well: scraping data from the page might equate to creating an extract. Database rights are slightly different, because they only extend to creating extracts or re-utilising (republishing) a database. For example, say you analysed the data about prescriptions of each drug within each GP practice within the UK, along with other data about the coverage of each practice, to create a new dataset that provided the average spend per patient of each practice. So long as you had no separate contractual obligations to the owners of the two datasets you have brought together, you might well be free to do what you liked with the result, as it would not be possible to reconstruct the original databases from the aggregated data. November 2014 Page 23
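The aggregation described in the prescriptions example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the practice identifiers, drug names, spend figures and patient counts are invented, and the function name `average_spend_per_patient` is hypothetical. It shows that the derived dataset keeps a single figure per practice, so the original rows cannot be rebuilt from it.

```python
from collections import defaultdict

# Hypothetical prescription rows: (practice_id, drug, spend_in_pounds)
prescriptions = [
    ("P1", "statin_a", 1200.0),
    ("P1", "statin_b", 300.0),
    ("P2", "statin_a", 900.0),
]

# Hypothetical coverage data: practice_id -> number of registered patients
patients = {"P1": 3000, "P2": 1500}


def average_spend_per_patient(prescriptions, patients):
    """Combine the two datasets into one aggregated figure per practice."""
    totals = defaultdict(float)
    for practice_id, _drug, spend in prescriptions:
        totals[practice_id] += spend
    return {
        practice_id: totals[practice_id] / patients[practice_id]
        for practice_id in totals
        if patients.get(practice_id)  # skip unknown or empty practices
    }


result = average_spend_per_patient(prescriptions, patients)
print(result)
```

Whether such a result is free of the original database rights still depends on the licences and any contractual obligations attached to the source datasets.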
  24. 24. ● What Do Licences Say? Licences tell you what you can do with the content or data that you access. A licence will tell you whether you can: ● republish the content or data on your own website ● derive new content or data from it ● make money by selling products that use it ● republish it while charging a fee for access Many licences will let you access content or data for free, but say that you cannot republish it or adapt it, or use it within commercial products. If you break the terms of the licence, the owner of the content or data can take you to court. ● What Do Open Licences Say? An open licence is one that places very few restrictions on what you can do with the content or data that is being licensed. According to the Open Definition, there are only two kinds of restrictions that an open licence can place: ● that you must give attribution to the source of the content or data ● that you must publish any derived content or data under the same licence (this is called share-alike) An open licence might do neither or one or both of these. So, you might encounter content or data available under one of three levels of licence: 1. a public domain licence has no restrictions at all (technically, these indicate that the rights owner has waived their rights to the content or data) 2. an attribution licence just says that you must give attribution to the publisher 3. an attribution & share-alike licence says that you must give attribution and share any derived content or data under the same licence November 2014 Page 24
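The three levels above can be expressed as a small lookup. The sketch below is not part of the ODI guide: the restriction labels and the function name `licence_level` are assumptions made for illustration. It treats a licence as open only if its restrictions are limited to attribution and share-alike, as the Open Definition summary above states.

```python
# The only restrictions an open licence may impose, per the text above.
ALLOWED_RESTRICTIONS = frozenset({"attribution", "share-alike"})


def licence_level(restrictions):
    """Map a collection of restriction labels to a licence level."""
    found = set(restrictions)
    if not found <= ALLOWED_RESTRICTIONS:
        return "not open"
    if not found:
        return "public domain"
    if found == {"attribution"}:
        return "attribution"
    if found == {"attribution", "share-alike"}:
        return "attribution & share-alike"
    # Share-alike without attribution is possible in principle,
    # but it is not one of the three levels listed in the guide.
    return "share-alike only"


print(licence_level([]))
print(licence_level(["attribution", "share-alike"]))
print(licence_level(["attribution", "non-commercial"]))
```

A non-commercial restriction makes the example return "not open", which matches the later remark that the Creative Commons Attribution-NonCommercial licence is not an open licence.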
  25. 25. ● How Do You Provide Attribution? You should provide attribution even if the licence does not require it. Giving attribution is a way of recognising both the efforts that the publisher has made to put together the content or data you are reusing, and their generosity in making it available for reuse. When content or data is licensed using a licence that includes attribution, the publisher might specify: ● what wording the attribution should include ● where and how the attribution should be presented You should follow what the publisher asks you to do. If it is not practical, for example if you are providing a service that does not have room for the attribution statement that they request, then get in touch with them to ask what to do. It is good practice to provide the name of the organisation that published the data or content, and a link to their home page. Specifying the name of the dataset and providing a link to its location also helps other reusers to find the data you are reusing. If you are building a tool that reuses some content or data, you should try to include attribution on every page or screen in which the content or data is used. If this is impractical (for example because you are pulling together information from lots of different sources), you should provide a clear link to a page or screen that then provides attribution information. If you are republishing data or content, its reusers are still bound by the attribution requirements of the original data or content. To make it easier for them to understand and fulfil those requirements, it is good practice to include the attribution for the source data or content in the attribution that you ask for. This might sometimes be impractical, for example because you are creating derived data or content that includes data or content from a large number of sources. In these cases, you should provide a full list of the sources and request an attribution which links to that list. 
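As a rough sketch of the good practice described above, an attribution line can be assembled from the publisher name, a link to their home page and, optionally, the dataset name and location. The function name, the "Source:" wording and the example.org URLs are all assumptions for illustration. When a publisher specifies its own wording, that wording should be used instead, which the `required_wording` parameter models.

```python
def attribution_statement(publisher, publisher_url, dataset=None,
                          dataset_url=None, required_wording=None):
    """Build an attribution line; publisher-specified wording always wins."""
    if required_wording:
        return required_wording
    text = f"Source: {publisher} ({publisher_url})"
    if dataset and dataset_url:
        text += f", dataset: {dataset} ({dataset_url})"
    elif dataset:
        text += f", dataset: {dataset}"
    return text


# Placeholder values only; real URLs and wording come from the publisher.
line = attribution_statement(
    "Department for Education",
    "https://example.org/dfe",
    dataset="School performance tables",
    dataset_url="https://example.org/dfe/schools.csv",
)
print(line)
```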
● How Do You Share-Alike? A share-alike licence requires you to republish new content or data that you create using the given content or data under the same, share-alike licence. Creating new ways of presenting data does not count as derivation or adaptation, but combining two sets of data to create a new set probably does. Publishing the content and data that you create from open data, as open data, is a good thing to do even if the licence does not require it. Opening up your content and data enables others to reuse and build on your work, and can add value to your work. ● What Open Licences Are There? There are two sets of open licences that you may encounter. November 2014 Page 25
  26. 26. ● Open Licences for Creative Content Creative content, such as text, photographs, slides and so on, may be licensed using a Creative Commons Licence. There are three of these that you might encounter:
Level of Licence | Creative Commons Licence
public domain | CC0
attribution | CC-by
attribution & share-alike | CC-by-sa
There are different versions for each of these licences, the most recent being version 4.0. There are also different variants which take into account differences in the law in different countries. The links in the table above are to the version 4.0 versions, which apply internationally, but you may find publishers using other versions. You can reuse content under these licences no matter what country you are in. There are other types of Creative Commons licences that are not open licences. For example, the Creative Commons Attribution-NonCommercial licence does not allow commercial reuse of content, and therefore is not an open licence. The human-readable summaries of the Creative Commons licences spell out exactly what you can do under each licence. ● Open Licences for Databases You might encounter a similar set of licences which is available for databases from the Open Data Commons. There are again three levels:
Level of Licence | Open Data Commons Licence
public domain | PDDL
attribution | ODC-by
attribution & share-alike | ODbL
● Other Licences There are other licences that enable reuse and which you may encounter, particularly around public sector information:
  27. 27. ● Open Government Licence is an attribution licence that covers both copyright and database right and is mainly used for information made available by UK central government ● OS Open Licence is an attribution licence that is exactly the same as the Open Government Licence but ensures that the attribution is to the Ordnance Survey ● How is the Licence Indicated? The licence under which information is published should be clear both in human-readable content and as machine-readable data. If you cannot work out the licence for information that you discover on the web, you should contact the owner of the site to ask: the lack of licensing information means that you cannot assume the right to reuse the content or data. Human-readable descriptions and marks that you may encounter are shown on the Creative Commons and Open Data Commons websites: ● Creative Commons licence chooser ● Open Data Commons licences Where possible, the publisher should have embedded information about the licence directly within the content or data itself. Often, however, you will have to look at the page from which you access the content or data, or the licence information for the entire website, which is often linked to from the footer of the page. If a publisher adds their dataset to a catalog, such as data.gov.uk or the Data Hub, they may indicate the licence under which the dataset is available in the metadata supplied by the catalog. You should check that this is consistent with any licence information they supply on their own site or within the data itself: if it is not, you should ask them for clarification. Legal tools for open data Open Data Commons is the home of a set of legal tools to help you provide and use Open Data http://opendatacommons.org/ http://opendatacommons.org/faq/licenses/ 3. Big Data vs. Open Data November 2014 Page 27
  28. 28. Big Data vs Open Data - Diagram Source: http://www.opendatanow.com/2013/11/new-big-data-vs-open-data-mapping-it-out/#.VGDCrfSG9Zt As Joel Gurin points out: “there’s general agreement that Open Data should be free of charge or cost just a minimal amount. Starting with some basic descriptions, the intersection of these three concepts (big data, open data, open government) defines the six subtypes of data shown on the Venn diagram. (There’s no separate category for the intersection of Big Data and Open Government – anything in that category is also Open Data.) Here are characteristic examples of each, referring to the numbers above. 1. Big Data that’s not Open Data. A lot of Big Data falls in this category, including some Big Data that has great commercial value. All of the data that large retailers hold on customers’ buying habits, that hospitals hold about their patients, or that banks hold about their credit-card holders, falls here. It’s information that the data-holders own and can use for commercial advantage. National security data, like the data collected by the NSA, is also in this category. 2. Open Government work that’s not Open Data. This is the part of Open Government that focuses purely on citizen engagement. For instance, the White House has started a petition website, called We the People, to open itself to citizen input. While the site makes its data available, publishing Open Data – beyond numbers of signatures – is not its main purpose. 3. Big, Open, Non-Governmental Data. Here we find scientific data-sharing and citizen science projects like Zooniverse. Big data from astronomical observations, from large biomedical projects like the Human Genome Project, or from other sources realizes its greatest value through an open, shared approach. While some of this research may be government-funded, it’s not “government data” because it’s not generally held, maintained, or analyzed by government agencies. 
This category also includes a very different kind of Open Data: the data that can be analyzed from Twitter and other forms of social media. 4. Open Government Data that’s not Big Data. Government data doesn’t have to be Big Data to be valuable. Modest amounts of data from states, cities, and the federal government can have a major impact when it’s released. This kind of data fuels the participatory budgeting movement, where cities around the world invite their residents to look at the city budget and help decide how to spend it. It’s also the fuel for apps that help people use city services like public buses or health clinics. November 2014 Page 28
  29. 29. 5. Open Data – not Big, not from Government. This includes the private-sector data that companies choose to share for their own purposes – for example, to satisfy their potential investors or to enhance their reputations. Environmental, social, and governance (ESG) metrics fall here. In addition, reputational data, such as data from consumer complaints, is highly relevant to business and falls in this category. 6. Big, Open, Government Data (the trifecta). These datasets may have the most impact of any category. Government agencies have the capacity and funds to gather very large amounts of data, and making those datasets open can have major economic benefits. National weather data and GPS data are the most often-cited examples. U.S. Census data, and data collected by the Securities and Exchange Commission and the Department of Health and Human Services, are others. With the new Open Data Policy, this category will likely become larger, more robust, and even more significant. November 2014 Page 29
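The six subtypes can be read as a decision over three yes/no questions: is the data big, is it open, and does it come from open government work. The sketch below encodes that reading; the function name `gurin_category` and the use of `None` for combinations outside the diagram are assumptions, not something Gurin's article defines.

```python
def gurin_category(big, open_data, open_government):
    """Return the subtype number (1-6) from the list above, or None."""
    if open_data:
        if big and open_government:
            return 6  # Big, Open, Government Data
        if open_government:
            return 4  # Open Government Data that is not Big Data
        if big:
            return 3  # Big, Open, Non-Governmental Data
        return 5      # Open Data, not Big, not from Government
    if big and open_government:
        # The text says anything in this intersection is also Open Data,
        # so this combination has no category of its own.
        return None
    if big:
        return 1      # Big Data that is not Open Data
    if open_government:
        return 2      # Open Government work that is not Open Data
    return None       # outside all three circles


print(gurin_category(big=True, open_data=True, open_government=True))
print(gurin_category(big=False, open_data=True, open_government=False))
```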
  30. 30. 4 key steps These are in very approximate order; many of the steps can be done simultaneously.
1. Choose your dataset(s). Choose the dataset(s) you plan to make open. Keep in mind that you can (and may need to) return to this step if you encounter problems at a later stage.
2. Apply an open license. ○ Determine what intellectual property rights exist in the data. ○ Apply a suitable ‘open’ license that licenses all of these rights and supports the definition of openness. ○ NB: if you can’t do this go back to step 1 and try a different dataset.
3. Make the data available in bulk and in a useful format. You may also wish to consider alternative ways of making it available such as via an API.
4. Make it discoverable: post on the web and perhaps organize a central catalog to list your open datasets.
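The steps above can be turned into a simple self-check for a dataset you plan to publish. This is a minimal sketch under stated assumptions: the record fields (`licence`, `bulk_download_url`, `catalogue_url`) and the function name `publishing_gaps` are hypothetical and do not follow any particular catalogue standard.

```python
def publishing_gaps(record):
    """List the publishing steps that still look incomplete for a dataset."""
    gaps = []
    if not record.get("licence"):
        gaps.append("2. Apply an open license")
    if not record.get("bulk_download_url"):
        gaps.append("3. Make the data available in bulk")
    if not record.get("catalogue_url"):
        gaps.append("4. Make it discoverable")
    return gaps


# Hypothetical dataset record with placeholder values.
dataset = {
    "title": "Example bus stops",
    "licence": "ODC-by",
    "bulk_download_url": "https://example.org/bus-stops.csv",
    "catalogue_url": None,
}
remaining = publishing_gaps(dataset)
print(remaining)
```

The check cannot tell whether the chosen licence really covers every right in the data; that part of step 2 still needs human judgement.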
  31. 31. 4. Categories and Type of Data Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals). There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance. So the explanation of what open means applies to all of these information sources and types. Source: http://blog.okfn.org/2013/10/03/defining-open-data/#sthash.nXnXf8Bx.dpuf November 2014 Page 31
  32. 32. Categories Business and Legal services Data/Technology Education Energy Environment and weather Finance and Investment Food and Agriculture Geospatial/Mapping Governance Healthcare Housing/ real estate Insurance Lifestyle and Consumer Media Research and Consulting Scientific Research Transportation November 2014 Page 32
  33. 33. The Open Data Consumers Checklist: Source: http://theodi.org/guides/the-open-data-consumers-checklist The Open Data Handbook: Source: http://opendatahandbook.org/ The handbook introduces you to the legal, social and technical aspects of open data. It can be used by anyone but is especially useful for those working with government data. It discusses the why, what and how of open data: why to go open, what open is, and how to do open. Read it online or download a PDF.
  34. 34. 4. Open data as part of your business model Al-Debei and Avison (2010) derived a unified business model framework based on a comprehensive review of the literature. They argue that the model provides an abstract but holistic view and that the fundamental dimensions are value based. There are four relevant aspects to the business model framework: ● Value proposition: the business logic for creating value for customers by offering products and services for targeted segments, ● Value architecture: an architecture for the technological and organizational infrastructure used in the provisioning of products and services, ● Value network: collaboration and coordination with other organizations, and ● Value finance: the costing, pricing, and revenue breakdown associated with sustaining and improving the creation of value. New business models and practices driven by social media and open data have hardly been investigated. Exceptions are the analyses of companies in the United Kingdom (Hammell, Perricos, Lewis, & Branch, 2012) and a classification of social business models based on the revenue model (for instance, Ferro, 2012; Ferro & Osella, 2012; Ferro & Osella, 2013; Ubaldi, 2013). Based on the analysis of a number of companies in the United Kingdom, five archetypes of business models can be identified (Hammell et al., 2012). These include: (1) suppliers (public and private sector organizations) publishing the data, (2) aggregators linking open data to produce useful insights, (3) developers (organizations and individuals) building apps, (4) enrichers using open data to enable their existing products and services, and (5) enablers facilitating the supply and use of open data. Ferro and Osella (2013) identify the following models: 1. Premium: end users are offered a service or product in exchange for payment. 2. Freemium: basic services or products are offered free of charge. Profit is made by having end users pay for extended features. 3. 
Open source like—data are offered for free through cross subsidization. 4. Infrastructural Razor and Blades—data sets are stored for free and are accessible to everyone via Application Programming Interfaces (APIs) (‘‘razors’’), while reusers are charged only for the computing power that they employ on demand in as-a-service mode (‘‘blades’’). 5. Demand-oriented platform—the company provides developers with a one-stop shop of data sets that are catalogued using metadata. Revenue is made in exchange for advanced services and refined data sets or data flows. 6. Supply-oriented platform—this business model is quite similar to the previous one, but the PSI providers are charged in lieu of developers. 7. Free as branded advertising—the company uses PSI as a tool to attract attention from November 2014 Page 34
  35. 35. customers by providing them with useful services. The company expects that the public will then favor its particular brand or company. Revenue is expected not to come directly from PSI, but from other business lines that represent the company’s core business. 8. White label development—a company wants to use PSI as an attraction tool but does not have the competencies required to do so. The company then uses an advertising factory, which receives payment in the form of a lump sum or recurring fees in exchange for turnkey solutions, depending upon whether the solution is in the form of a product or a service (Ferro & Osella, 2013). The revenue model can be payment by open data providers or users in the form of (1) recurring fees, granting access for a specific time period, or pay per use, (2) advertisement, or (3) ensuring visibility for creating revenue for other activities (Ferro & Osella, 2013). Although these eight options describe a complete array of possible business models, they are derived from the revenue. Infomediary Business Models for Connecting Open Data Providers and Users Available here: http://ssc.sagepub.com/content/early/2014/01/30/0894439314525902.full.pdf+html All infomediary business models can be developed and operated by either public or private organizations. The business model might be initiated by public events (hackathons) but operated by private party, yet when a best practice is adopted the roles can be reversed. The following six business models were identified. 1. Single-purpose apps provide real-time services such as information about weather, quality of restrooms, vehicles, houses, and pollution. These apps often provide a single function, based on one type of open data provided. The app processes the data and presents it visually for the ease of the users. 2. Interactive apps: In addition to single-purpose apps, this type of business model provides users the opportunity to add content. 
Ratings are often included, as is additional information such as complaints. 3. Information aggregators take many published open data sources and combine and process them for subsequent presentation to the users. An example is a transportation planner that aggregates information from various transport modalities and companies. Often interoperability is a challenge that requires agreements among data providers. 4. Comparison models: This type of business model aggregates open data from various sources for the purpose of comparing the performance of entities with each other. For example, it can be used to compare schools and other public organizations. The data can originate from official sources (school inspection) or from users (criminal chart) and used by citizens (in determining a school for their children or a place to live) and public organizations (in developing measures to improve schools or for crime interventions). 5. Open data repositories are used by governments to publish their information. These can be national open data portals or more specialized portals, such as websites of statistical agencies. The essence is that these portals are relatively closed and only a limited number of public organizations can publish open data on them. There is little to no user interaction, and the focus is on being able to indiscriminately open data sets. Searching for open data is a key aspect, although it is often difficult to find the right information. They can provide basic functionalities for processing and visualizing data. November 2014 Page 35
  36. 36. 6. Service platforms: These platforms provide all kinds of features for searching, importing, cleansing, processing, and visualizing information. Service platforms often contain open data repositories or are connected to open data repositories that function as the data source. Service platforms can vary in the level of openness; some are based on payment (e.g., www.junar.com) whereas others are free of charge (www.engagedata.eu). Further reading: Business models for open data applications available at: http://www.appsforeurope.eu/article/business-models-open-data-applications
  37. 37. 5. Case studies: Open Data Business Success stories about the open data startups from the ODI Startup Programme
Transport API http://www.transportapi.com
Clients: Transport for London, Heathrow Airport, Greater London Authority, Citymapper, Elgin, Giraffe.co.uk, Network Rail
Products: TransportAPI
Achievements: Transport API solutions have powered award-winning apps, such as Citymapper.
The TransportAPI story: TransportAPI is Britain’s first comprehensive open platform for transport solutions. The company’s objective is to enhance the travel experience through real-time information, and to enable new transportation insights through analytics. It uses open data feeds from key industry sources such as Traveline, Network Rail and Transport for London. The company offers nationwide timetables, departure and infrastructure information for schedules, live departures and archived service running across all transport modes. The data feeds are available for integration by web and app developers. Data components such as the ‘nearest transport’ widget can be used in travel portals, hyperlocal sites and business analytics. TransportAPI currently has 700 developers and organizations signed up on its platform. They are individual taxpayers, but also public sector organizations like universities and local authorities who are getting free data. As Jonathan Raper, Managing Director, says, “Our intervention in the market has led prices for transport data fall and previous monopoly transport data providers to relax their terms.” The company also scales data usage and provides a new, single source option for its customers, like Heathrow Airport, who now use TransportAPI for all their public transport information. Jonathan further explains that “TransportAPI employs 6 people now and the tax we generate per year is nudging £75K”.
  38. 38. Mastodon C http://www.mastodonc.com
Clients: Technology Strategy Board, CDEC’s Open Health Data platform, Nesta
Products: Kixi Data Platform
Achievements: Mastodon C identified £200m of potential savings to the NHS in its prescribing analytics project, which investigated the use of branded statins over cheaper generic versions.
The Mastodon C story: Mastodon C helps businesses make sense of the proliferation of data that now exists, allowing them to make better decisions. It does this using a cloud-based open source data processing and analytics platform, which it customises to each client’s datasets. The team also applies data science techniques to gain insights, make predictions and find business value from data, which is built back into client systems. The team at Mastodon C uses open data together with the closed data that clients own. Francine Bennett, Co-Founder and CEO at Mastodon C says: “We often find ourselves introducing clients to open data concepts through our work, as we’ll suggest useful datasets which they can make use of to help their business.”
  39. 39. 6. Where do I find open data? A list of open data catalogs: http://publicdata.eu/ https://open-data.europa.eu/it/data http://datacatalogs.org/ http://planet.openstreetmap.eu http://wikidata.org dbpedia.org
  40. 40. 7. How can you develop your open data business? This chapter has been prepared by the Finodex team and is already included in the Finodex Handbook. Summary: In this chapter we provide basic knowledge about how you can develop your business using open data. We’ll show how to generate a business model, exploring the components of the Business Model Canvas in detail. In particular, we’ll offer an overview of open data business models. For the reuse of PSI (Public Sector Information), Osella & Ferro have developed an interesting framework “that focuses on decision-making levers that a business developer has at his/her fingertips for molding the overarching architecture of a business venture hinged on public data re-use”. They combined the framework with the business model ontology by employing the Business Model Canvas in order to visualize archetypal business models at an enterprise level. The tool has proved very useful and could probably be adopted in the development and assessment of any data-intensive business venture. After exploring eight business models we introduce the Lean methodology for business development and explain why adopting it matters, offering a case study of open data business development in which the Lean approach has been used. Moreover, defining and setting your business goals requires a competitor analysis, which is also explained. Last but not least, we describe the rights connected to using open datasets. Licensing and related issues of compatibility between licenses are crucial when you deal with open data. Index: a. Business Modeling b. Open Data Business models c. Lean methodology d. Competitor Analysis e. Intellectual Property Rights
  41. 41. a) Business Modeling A business model is a strategic tool that indicates how the company makes money, specifying the sources of the company’s revenues as well as how much and how often these sources are willing to pay. Since its publication in 2004, the book “Business Model Generation” by Osterwalder and Pigneur has become the bible for startups and SMEs. In their book the authors explain the so-called Business Model Canvas (Figure 1), a tool that helps you visualize and capture the components of a business model and assists you in the business model generation process. In order to keep track of all of your steps in creating your business model, you may want to download the “canvas” here and start writing down all the assumptions and progress that you make! Figure 1. Business Model Canvas Source: “A business model describes the rationale of how an organization creates, delivers, and captures value” in Osterwalder & Pigneur, Business Model Generation, 2004. According to Osterwalder, in order to build an effective business model you have to identify several building blocks. We briefly list them below. For each of them, rather than a theoretical description, we provide a set of practical questions for you to answer. Down to work! 1. Customer segments First of all, you need to define which customers you aim to reach. You have to answer two important questions: ● For whom are we creating value? ● Who are our most important customers?
  42. 42. 2. Value Proposition You should provide your customers with a product or a service that has added value. The “value proposition” is a statement that summarizes why potential consumers should buy your particular product or service and prefer it to similar offerings. In this case, you should answer the following questions: ● What value do we deliver to the customer? ● Which one of our customer’s problems are we helping to solve? ● Which customer needs are we satisfying? ● What bundles of products and services are we offering to each Customer Segment? Factors such as newness, performance, customization, design, brand/status, cost reduction, risk reduction, accessibility, and convenience/usability can add value to your business. Your value proposition may be qualitative (privileging customer experience and outcome) and/or quantitative (price and efficiency). 3. Sales Channels Once you have understood your value proposition and your customer segment, you need to take care of the channels that deliver the value to your clients. You should ask yourself: ● Through which channels do our customer segments want to be reached? ● How are we reaching them now? ● How are our channels integrated? Which ones work best? ● Which ones are most cost-efficient? ● How are we integrating them with customer routines? You can reach your clients either through your own channels (store front), your partner channels (major distributors), or a combination of both. 4. Customer Relationships Another important step: you have to identify the kind of relationship you establish with each of your customer segments. These are the main questions you should answer: ● What type of relationship does each of our customer segments expect us to establish and maintain with them? ● Which ones have we established? ● How costly are they? ● How are they integrated with the rest of our business model? 
The different types of customer relationships are: personal assistance, automated service, communities and so on. 5. Revenue streams You need to plan how you are going to generate cash through the customer segment (costs must be subtracted from revenues to create earnings). The meaningful questions are: ● For what value are our customers really willing to pay? ● For what do they currently pay? ● How are they currently paying? November 2014 Page 42
  43. 43. ● How would they prefer to pay? ● How much does each Revenue Stream contribute to overall revenues? There are several ways to generate revenue streams, such as asset sales, usage fees, subscription fees, lending/leasing/renting, licensing, etc. 6. Key resources & key activities You then need to understand which assets will make your business model work. Hence, answer the following questions: ● What Key Resources do our Value Propositions require? ● Our Distribution Channels? ● Customer Relationships? ● Revenue Streams? Then consider which actions you must take in order to make your business model work: ● What Key Activities do our Value Propositions require? ● Our Distribution Channels? ● Customer Relationships? ● Revenue Streams? 7. Key partnerships You will probably need the help of external partners and/or suppliers in order to make your business model work properly: ● Who are our Key Partners? ● Who are our key suppliers? ● Which Key Resources are we acquiring from partners? ● Which Key Activities do partners perform? 8. Cost structure Last but not least, you should consider which costs you will incur, as well as their consequences, when you start applying your business model to your product: ● What are the most important costs inherent in our business model? ● Which Key Resources are most expensive? ● Which Key Activities are most expensive? Further reading ● A. Osterwalder & Y. Pigneur, Business Model Generation, 2004 ● Elements of a business plan, available online b) Open data business models In the case of PSI (Public Sector Information) reuse performed by private sector entrepreneurs, many inherent roadblocks, coupled with a certain vagueness surrounding the rationale underlying business endeavors, keep slowing the process down. The advent of the Open Data framework, oriented towards data openness (i.e. open by default), poses new issues regarding the access to
  44. 44. information which occurs free of charge and different forms of payment may be required for restricting the access to derivative works. Two Italian researchers Michele Osella and Enrico Ferro (2012) developed a framework “that focuses on decision-making levers that a business developer has at his/her fingertips for molding the overarching architecture of a business venture hinged on public data re-use”. Figure 2. Framework for PSI business model analysis by Osella & Ferro Source: Osella & Ferro, “Business Models for PSI Re-Use: A Multidimensional Framework”, 2012 Figure 3. Framework for PSI business model analysis by Osella & Ferro Source: Osella & Ferro, “Business Models for PSI Re-Use: A Multidimensional Framework”, 2012 November 2014 Page 44
  45. 45. While developing the framework for PSI reuse, they realized that it was not sufficient to grasp the business logic and the mechanisms needed to build an effective strategy. A solution came from the combination with Osterwalder's business model ontology, by employing the Business Model Canvas (explained in the previous paragraphs) in order to visualize archetypal business models at an enterprise level. The tool has proved very useful and could probably be adopted in the development and assessment of any data-intensive business venture. The result is the identification of eight business models currently employed by the actors present in the Public Sector Information centric (PSI-centric) ecosystem. In particular, the choice of the business model to adopt is a function of the position occupied in the value chain and of the strategic choices made. Why are they useful? From a business model viewpoint, which is one of the perspectives on the PSI realm shown by Osella here, our interest is to identify the steps needed to maximise the benefits for reusers of open data, “a profit-driven reuse and value creation”. In the following list you can find the eight business models as described by Osella and Ferro: 1. Premium Product / Service. 2. Freemium Product / Service. A classic example in this vein is represented by mobile apps related to public transportation in urban areas. 3. Open Source. Examples: OpenCorporates and OpenPolis. 4. Infrastructural Razor & Blades. Example: Public Data Sets on Amazon Web Services. 5. Demand-Oriented Platform. Examples: DataMarket and Infochimps. 6. Supply-Oriented Platform. Example: Socrata. 7. Free, as Branded Advertising. 8. White-Label Development. This business model has not consolidated yet, but some embryonic attempts seem to be particularly promising. In this section we explore the eight identified business models in more detail. 
The main references are two papers co-authored by Ferro and Osella: “Business Models for PSI Re-Use: A Multidimensional Framework” (2012) and “Eight Business Model Archetypes for PSI Re-Use” (2013). #1 Premium Product / Service: While implementing this business model, a core re-user offers to end-users a product or a service presumably characterized by high intrinsic value in exchange for a payment that could occur à la carte or in the guise of a recurring fee: while the former implies the
  46. 46. payment of an amount of money for each unit of product purchased (pay-per-use), the latter has an "all-inclusive" nature since it grants access to certain features for a given timeframe in accordance with contractual terms. In this business model, which most analysts probably regard as the “mainstream” model, the high intrinsic value, coupled with the price mechanism, calls for B2B customers (often called the “high-end market”) and for long- or medium-term relationships going beyond single transactions (Osella & Ferro, 2013). Figure 4. Premium Product / Service (framework view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013
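The two payment forms of the Premium model described above (à la carte pay-per-use versus an all-inclusive recurring fee) can be compared with simple arithmetic. The following is a minimal sketch, not part of Osella and Ferro's framework: the function names and the prices used in the example are hypothetical and chosen only for illustration.

```python
from math import ceil


def break_even_units(price_per_unit: float, recurring_fee: float) -> int:
    """Smallest number of units per period at which the recurring fee
    costs no more than paying for each unit separately."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return ceil(recurring_fee / price_per_unit)


def cheaper_option(units: int, price_per_unit: float, recurring_fee: float) -> str:
    """Return the cheaper payment form for a customer buying `units` per period.
    A tie is resolved in favour of the recurring fee."""
    pay_per_use_total = units * price_per_unit
    if pay_per_use_total < recurring_fee:
        return "pay-per-use"
    return "recurring fee"


if __name__ == "__main__":
    # Hypothetical prices: 2.0 per dataset download, 500.0 per month flat.
    print(break_even_units(2.0, 500.0))
    print(cheaper_option(100, 2.0, 500.0))
    print(cheaper_option(300, 2.0, 500.0))
```

A re-user could use this kind of check to decide where to set the flat fee: customers whose expected usage lies above the break-even point have a reason to prefer the recurring fee, which supports the longer-term relationships this model calls for.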
  47. 47. Figure 5. Premium Product / Service (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #2 Freemium Product / Service. Core re-users resorting to this business model offer to end-users a product or a service in accordance with freemium price logic: one of the offerings is free of charge and entails only basic features, while customers willing to take advantage of refined features or add-ons are charged. In the PSI realm, the implementation of this business model has its roots in limitations deliberately imposed by the core re-user in terms of data access: as a result, ad-hoc payments may be required to enjoy advanced features, to have recourse to additional formats or, sometimes, to weed out advertising. In contrast with the previous model, here the prominent target market is the consumer one (often called the “low-end market”), with which the firm establishes medium- or short-term relationships that usually do not involve customization. Target customers are generally reached via the Web or via the mobile channel, which promise to “hit” a considerable installed base (Osella & Ferro, 2013). Figure 6. Freemium Product / Service (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #3 Open Source Like. This very peculiar business model takes place on top of products, services, or simple unpackaged data that are provided for free and in an open format. In terms of economics, a cross-subsidization occurs in the enterprise under examination since the costs incurred for the free offering of data are covered by revenues stemming from supplementary business lines that are still PSI-based: in fact, trickles of revenue for the core re-users may stem only from added-value services or from license variations (dual licensing). 
The resemblance to Open Source software lies in the fact that in this case data is provided in a totally open format that allows free elaboration, usage and redistribution without any technical barrier (Osella & Ferro, 2013).
  48. 48. Figure 7. Open Source Like (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #4 Infrastructural Razor & Blades. Entering the realm of enablers, this business model is chosen by enterprises acting as intermediaries that facilitate access to PSI resources for profit-oriented developers or for scientists not driven by commercial intent. As happens in the well-known “razor & blades” model, the value proposition hinges on an attractive, inexpensive or free initial offer (“razor”) that encourages continuing future purchases of follow-up items or services (“blades”), which are usually consumables characterized by an inelastic demand curve and high margins. Applying this model in the PSI environment, datasets are stored for free on cloud computing platforms and are accessible by everyone via APIs (“razor”), while re-users are charged only for the computing power that they employ on demand in as-a-service mode (“blades”). This business model exhibits another case of cross-subsidization, whereby profits accrued from the provision of on-demand computing capacity cover costs attributable to the storage and maintenance of data. Finally, it goes without saying that application of this model is limited to contexts and domains in which the computational costs are significant (Osella & Ferro, 2013).
  49. 49. Figure 8. Infrastructural Razor and Blades (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #5 Demand-Oriented Platform. Following this business model, the enabler acting as intermediary provides developers with easier access to PSI resources that are stored on highly reliable proprietary servers. Once collected, PSI datasets are catalogued using metadata, harmonized in terms of formats and exposed through APIs, making it easier to retrieve data dynamically in a meaningful way. As a result, a wide range of critical issues pertaining to the original raw data are made irrelevant by platforms capable of converting datasets into data streams, contributing significantly to the "commoditization" and "democratization" of data. In addition, developers may reap the benefits of the "one stop shopping" nature of such platforms: they may resort to one supplier and access a variety of information resources through standardized APIs, even beyond the borders of PSI, without having to worry about interfaces connecting to each original source. This “procurement” approach is crucial to minimize search costs and, consequently, transaction costs. In terms of pricing, as a good that was born free and open (such as Open Government Data) cannot be charged for in the absence of added value on top of it, enablers adopting this business model earn revenues in exchange for advanced services and refined datasets or data flows. To sum up, re-users are charged according to a freemium pricing model that sets the boundary between free and premium in light of feature limitations (Osella & Ferro, 2013).
  50. 50. Figure 9. Demand-oriented platform (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #6 Supply-Oriented Platform. To conclude with enablers, this business model entails the presence of an intermediary business actor that again has an infrastructural role. However, in contrast to the previous case, according to this logic PSI holders are charged instead of developers. In fact, the enabler, following the golden rules of two-sided markets, sets the price according to the degree of positive externality that each side is able to exert on the other. Consequently, this approach is beneficial for both sides of the resulting arena: from the developers’ perspective, their barriers are wiped out (i.e., they can retrieve data without incurring costs) while, from the governmental angle, PSI holders become platform owners taking advantage of some handy features such as cloud storage, rapid upload of brand-new datasets by public employees, standardization of formats, tagging with metadata and, above all, automated external exposure of data via APIs and GUI. Public agencies that adhere to such programs in order to dip their toes into the water of Open Data establish long-term relationships with providers and are required to pay a periodic fee that depends on the degree of sophistication of the solutions purchased and on some technical parameters (Osella & Ferro, 2013).
  51. 51. Figure 10. Supply-oriented platform (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #7 Free as Branded Advertising. Service advertising is an emerging form of communication aimed at encouraging or persuading an audience towards a brand or a company. In contrast to the better-known “display advertising”, where commercial messages are simply displayed, in service advertising the advertiser strives to win over the customer by providing him or her with services of general usefulness. That said, in the PSI realm, services offered in this way do not generate any direct revenue, but they are supposed to bring a positive return in a broad sense, driving economic results on other business lines, unrelated to PSI, that represent the enterprise’s core business. The rationale fuelling this “enlightened” business model is twofold. Firstly, it may be based on a powerful advertising boost that leads the company to consider the cost as a promotional investment in the marketing mix. Secondly, it seems to be very convenient in the presence of zero marginal costs, a situation that occurs when the costs of distribution and usage are not significant (Osella & Ferro, 2013).
  52. 52. Figure 11. Free as Branded Advertising (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 #8 White-Label Development. Last but not least, if service advertisers do not have sufficient in-house competencies to develop their business endeavors, they can knock on the door of advertising factories. Such firms, in fact, come into play as outsourcers carrying out duties that otherwise would be handled by service advertisers. Hence, the development of PSI-based solutions is particularly compelling for companies willing to use PSI as an "attraction tool" but not equipped with the competencies required to do so (e.g., data retrieval, software development, service maintenance, marketing promotion). In order to let the service advertiser’s brand stand out, solutions are developed in a white-label manner, i.e., hiding the outsourcer’s brand and giving full visibility solely to the service advertiser’s brand. Taking into account the “one stop shopping supply” and the business-criticality of the solutions in terms of corporate image, the resulting one-to-one relationship between provider and customer is tailor-made and “cemented”. Concerning financials, advertising factories collect lump-sum payments or recurring fees in exchange for the turn-key solutions so developed, depending on whether the crafted solution takes the form of a product or a service: whilst in the former case service advertisers perceive the cost as CAPEX, in the latter the cost assumes an OPEX nature (Osella & Ferro, 2013).
  53. 53. Figure 12. White Label Development (“Canvas” view) Source: M.Osella & E.Ferro, Eight Business Model Archetypes for PSI Re-Use, 2013 Case studies You can find a lot of examples of companies that employ the business models described above here. Here we describe one example of the freemium model. A variety of web applications use the freemium business model. The free product or service is subsidised through a paid-for product or service that offers some kind of added value on top of what is made available as open data. The free product acts as marketing, establishing the provider in the marketplace and increasing the take-up of the paid-for product (The ODI Guide, How to make a business case for open data). One way of using a freemium model is to release your open data under a share-alike license. This ensures that organisations that do things with your data either have to openly share their results (which means you can benefit from what they do) or have to negotiate with you to be able to use the data under a different (potentially charged) license. OpenCorporates uses this business model, licensing their database under a share-alike license while offering paid-for licenses for companies who do not want to share their data. Another approach to a freemium model is to offer a paid-for product that: ● incorporates additional data, perhaps from third-party sources ● is provided in a different format from the open data ● is more up-to-date, complete or detailed than the open data ● is the result of an analysis or model based on the released open data
  54. 54. ● is a dump of data that can otherwise be accessed through an API Alternatively, you could offer a paid-for service based on the open data you are publishing that: ● provides an API over open data that can otherwise be accessed as a dump ● provides availability guarantees through a Service-Level Agreement ● removes rate limits Recently the U.S. Government has launched a new section of the open government data catalog, data.gov. The new sub-domain “Impact” profiles companies that are making use of open government data. References and further reading ● The Open Data Institute, How to make a business case for open data, available online. ● Alex Howard, Open data economy: Eight business models for open data and insight from Deloitte UK, available here. ● Elements of open data startups, presentation available here. ● Enrico Ferro, Emerging Business models in PSI reuse, available here. ● E.Ferro & M.Osella, Business Models for PSI Re-Use: A Multidimensional Framework, 2012, available online. ● E.Ferro & M.Osella, Eight Business Model Archetypes for PSI Re-Use, 2013, available online. c) Lean methodology After exploring the eight business models on which PSI reuse relies, we introduce the importance of adopting the Lean methodology for business development. You have already identified the opportunities offered by the reuse of open data by employing the Business Model Canvas and the framework developed by Osella and Ferro, and now you want to start developing your own business. Lean methodology is a method for developing businesses and products with the goal of finding product-market fit and building a cash-flow-positive, sustainable company before it runs out of money. “Validated learning”, experimentation, testing, measuring actual progress and learning what customers really want are the main pillars of the methodology. The whole process, then, should be accomplished as fast and as cheaply as possible. 
Pioneers of the Lean Startup movement are Steve Blank (The startup owner’s manual: the step by step guide for building a company, 2012; The four steps to the epiphany, 2006) and Eric Ries (The Lean Startup, 2011). The lean approach aims at being as effective as possible in achieving your final goal. According to the lean methodology you should follow a build-measure-learn feedback loop: Ideas > build > product > measure > data > learn > ideas > and so on (a circle)
  55. 55. Figure 13. Build-measure-learn feedback loop Image source: Andrew Walpole, Build - Measure - Learn Feedback Loop infographic, 2013 Here we explain the loop step by step: 1) Idea: When you work on your idea, keep in mind that the final goal is to provide benefit to your customer; the rest is just a waste of time. So, first of all, ask yourself: ➢ Can I build a sustainable business around this set of products and services? What you want to achieve is, in fact, a compromise between your vision and what your customers would accept. Hence, you want to focus on an idea that answers a problem that really needs a solution. You also want to make explicit all the implicit assumptions you are making about how you can create a business on that idea. Please answer the following questions before building your product: ➢ Do consumers recognize that they have the problem you are trying to solve? ➢ If there was a solution, would they buy it? ➢ Would they buy it from you? ➢ Can you build a solution for that problem? “Success is not delivering a feature; success is learning how to solve the customer’s problem.” (Eric Ries, The Lean Startup, 2011). 2) Build: Develop a minimum viable product (MVP) in order to start the learning process as soon as possible. ➢ MVP A minimum viable product is a version of a new product or feature that allows you to test the assumptions you made. When you are building your MVP, remove any feature, process or effort that does not contribute directly to the learning you seek. When you test your MVP you will learn which elements of your product or strategy are not appropriate. 3) Measure:
  56. 56. When the MVP is established, measure how your customers respond, building on metrics that lead to cause-and-effect questions. Metrics have to point to a clearly defined action to take once analyzed. Examples: ➢ A/B Split-Test Results ➢ Per-customer metrics ➢ Direct customer feedback 4) Learn: Analyze your product, feedback and metrics to assess your progress in an objective way. ➢ Validated learning “Validated learning” means that you need to run experiments and validate them scientifically, based on empirical data collected from real customers, so that you can test each element of your vision. Throughout the whole process you should use an investigative technique, the so-called "Five Whys": asking yourself simple questions to study and solve problems along the way. When this process of measuring and learning is done and you have made small changes to optimize your product, you should be able to understand whether the drivers of your business model are appropriate or not and decide to pivot or persevere. Figure 14. Step-by-step description of the feedback loop Image source: Andrew Walpole, Build - Measure - Learn Feedback Loop infographic, 2013
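The A/B split test listed among the Measure examples above can be evaluated with a few lines of code. The sketch below is one common way to do it, a two-proportion z-test using only the Python standard library; it is an illustration and not a method prescribed by the Lean Startup literature cited here. The function name and the visitor and conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare the conversion rates of variants A and B.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided
    and based on the normal approximation with a pooled standard error.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # All visitors converted or none did: the variants cannot be told apart.
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical MVP experiment: 1000 visitors per variant.
    rate_a, rate_b, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests that the difference between the two variants is unlikely to be noise, which is the kind of clearly defined, actionable signal the Measure step asks for. With small samples the normal approximation becomes unreliable, so treat the result as a first indication only.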
  57. 57. Pivot: If you decide to pivot, you need to make a big change in direction or a structural course correction to test new ideas/hypotheses about the product, strategy and engine of growth, and start the cycle once again from the beginning. If your new experiment runs in a more productive way than the experiments you were running before, it is probably a sign that you made a successful pivot. Persevere: If you think that your test is going in the right direction, then you should continue to test more assumptions and build towards executing your current vision. The lean methodology underlines the importance of experimenting in order to learn. Pivoting is just a part of the process: “if you cannot fail, you cannot learn.” (Eric Ries, The Lean Startup, 2011). Until a precise business model is found, it is important to keep your initial vision. This way, adjustments can be made to the model without reassessing the entire market. Lean approach in open data business development: a case study Steve Blank cites the story of a startup called Tidepool as a perfect example of the power of customer development, one of the key parts of the Lean methodology. The Tidepool team was severely criticized for its business model. They began believing they were selling an open data and software platform for people with Type 1 Diabetes into a multi-sided market comprised of patients, providers, device makers, app builders and researchers. They first reduced what they thought was a five-sided market to a simpler two-sided one. But the big payoff came when their discussions with medical device customers revealed an entirely new way to think about pricing, potentially tripling their revenue. Figure 15. 
Screenshot of Tidepool home page Image source: http://tidepool.org Further reading ● Eric Ries, The Lean Startup, available online ● Steve Blank, The Four Steps to the Epiphany, available online ● Steve Blank & Bob Dorf, The Startup Owner’s Manual: The Step by Step Guide for Building a Great Company, available online ● Steve Blank, When Customers Make You Smarter, available online ● Andrew Walpole, Build - Measure - Learn Feedback Loop, available online ● The Lean Startup Methodology, available online Learning resources ● Steve Blank, How to Build a Startup, available online
  58. 58. ● Steve Blank, Lean Customer Development - Part 1, available online ● Steve Blank, Lean Customer Development - 3 tool for startups, Part 2, available online ● Steve Blank, Lean Customer Development - Customer Development in action, Part 3 - 3 tool for startups, available online ● Steve Blank, Lean Customer Development - Closing, Part 3, available online November 2014 Page 58
  59. 59. 8. Open Data training materials already available. A list ● Useful links that the ODI uses on its 3-day Open Data in Practice course, available here ● Slides used in the business sections of ODI’s Open Data in Practice course, available here ● ODI’s stories section: a good place to find examples of real-world impact. ● It's also worth looking at the ODI Start-Ups page for ways entrepreneurs are using open data to build new businesses. You'll find details of business approach, short pitch videos and, for some of the companies, case studies. ● You can explore all the materials and tutorials released by the School of Data team. You can find interesting guides at http://schoolofdata.org/courses/
  60. 60. 9. Slides and inspiring presentations: link-o-graphy http://www.slideshare.net/MicheleOsella http://www.slideshare.net/search/slideshow?searchfrom=header&q=open+data+business http://www.slideshare.net/OReillyStrata http://www.slideshare.net/TheODINC http://www.slideshare.net/MGHProfessional/leading-with-data?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=1 http://www.slideshare.net/JenvanderMeer/strata-open-data-its-not-just-for-govts2112014?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=default&b=&from_search=15 http://www.slideshare.net/deirdrelee/deirdre-lee-opendata?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=8 http://www.slideshare.net/WorldBankGroupFinances/world-bank-gurin?qid=9626d5fe-9a72-4e37-9bcf-579ef5d75c88&v=qf1&b=&from_search=6 http://training.theodi.org/resources/ODP_Business.pdf http://theodi.github.io/presentations/2013-10-tsb-workshop-tom.html#/cover http://www.slideshare.net/napo/a-dive-into-open-data
  61. 61. 10. Videos, Audio files and books So you want to build an open data business? https://www.youtube.com/watch?v=jNscjJ5DetM The value of open data to business - the Open Data 500 Study http://theodi.org/lunchtime-lectures/friday-lunchtime-lecture-the-value-of-open-data-to-business-the-open-data-500-study Learning from New York City’s open-data effort http://www.mckinsey.com/insights/public_sector/learning_from_new_york_citys_open_data_effort Some useful webinars: http://www.socrata.com/webinars/ Opening up open data: An interview with Tim O’Reilly http://www.mckinsey.com/insights/business_technology/opening_up_open_data_an_interview_with_tim_o_reilly What is Open Data and how can it transform your business? https://www.youtube.com/watch?v=hXZaf08gjfo A very interesting list of recommended books is available here: https://github.com/theodi/training-web/blob/gh-pages/Bibliography/index.md
