If we build it will they come? BOSC2012 Keynote Goble

8

Share

Loading in …3
×
1 of 61
1 of 61

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

If we build it will they come? BOSC2012 Keynote Goble

  1. If we build it will they come? Prof Carole Goble FREng FBCS CITP, carole.goble@manchester.ac.uk. BOSC, Long Beach, July 14 2012. http://www.mygrid.org.uk
  2. Est. 2001. Improving Knowledge Turning, Enabling Reuse and Reproducibility [Josh Sommer]. Keep the vision, modify the plan.
  3. Computational Methods (LGPL): scientific workflows; distributed web/grid/cloud services; third-party, independent service reuse; data pipelines and analytics; volunteerist human computation. e-Laboratories (BSD): social collaboration and sharing environments for scientific artefacts; libraries and catalogues; asset safe havens, sharing, reuse. Knowledge Acquisition Tools (various): semantic technology, semantic applications, research objects, executable papers; OWL; data/metadata curation & reuse; POPULOUS, SKOSEdit.
  4. The Taverna Suite of Tools: web portals, workflow repository, GUI workbench, client user interfaces, virtual machine, service catalogue, third-party tools, workflow engine, provenance, workflow store, command line, server, activity and service plug-in manager, Open Provenance Model, programming and secure service access APIs.
  5. myExperiment: community haven, sharing resource, social collaboration. http://www.myexperiment.org. 5820 members, 304 groups, 2415 workflows, 604 files and 229 packs (research objects). http://wiki.myexperiment.org/index.php/Galaxy
  6. BioCatalogue: crowd curation of web services. Contribute, find and understand web services; curate, review and comment; learning resource; monitor services; cloud registry. 2295 REST and SOAP services, 169 service providers, 674 members, 27 countries.
  7. Find experts, colleagues and peers. Find, exchange, interlink, preserve and publish data, models, publications, SOPs and analyses. ISA compliant. SysMO: 16 consortia, 110 institutes, 1600+ assets, 350+ members. Launch and validate models and analyses (JWS Online). Gateway to public tools and resources, e.g. BioModels. GerontoSys, livSYSiPS.
  8. Public SEEK: http://www.seek4science.org
  9. Standards & Content; Sharing Platform; Governance & Policy; Trusted Service; Software & Tools; Open Source; Gateway; Comp Sci Research Platform; Knowledge Network; Preservation; Skills & Community Building; Publication Platforms.
  10. Laissez-faire Philosophy • Bottom up – emergent & scruffy (to a degree…) • Reliant on third-party contributions – non-prescriptive, non-interfering and flexible – we make no content ourselves… • Part of a wider ecosystem – other services, data, tools, platforms, people… • Inspired by social environments • Scarred by top-down, dictated, tech-driven and unused monoliths
  11. Never underestimate how scruffy third-party stuff can be, and how often metadata is missing and messy if left to its own devices… Liberty through Limitations: people say they want flexibility, but they prefer the simplicity of order and will adapt to adopt. Image: http://www.flickr.com/photos/hellaoakland/3137360455/
  12. Who is they? • Jobbing bioinformatician? • Expert bioinformatician? • Sys admin? • Service provider? • Application developer? • Tool developer? • Biologist?
  13. Who is THEY? Drug toxicity (OpenTox Project); pharmacogenomics GWAS; trypanosomiasis in African cattle (genetic differences between breeds of cattle); the Virtual Liver; physiopathology of the human body; systems biology of micro-organisms; metagenomics; medical imaging.
  14. Consortia: organised, planned, strong connections with resource providers and each other. Research groups, e.g. the Bovine Trypanosomiasis Consortium. Distributed groups & independents. Independent lone rangers and individuals: long tail, disconnected from data providers and each other, emergent.
  15. Specialise or Diversify? • Flexibility and extensibility -> customised, cookie cutter • Widen adoption • Spread risk, extend resourcing streams • Cross-development alignment and coordination • More communities to build, nurture, support and sustain • Core drift and bashing. Communities: software and document preservation services, heliophysics, biodiversity, astronomy, social science, engineering (JPL, NASA), FLOSS.
  16. BioDiversity Virtual e-Laboratory, http://www.biovel.eu. Biodiversity services: phylogenetic (MrBayes, PAML), BLAST, HMMER, EMBOSS…, taxonomic synonyms, modelling/geoprocessing (R, openModeller, WPS / WCPS). Catalogues / repositories. Execution environment: Taverna Workbench, Taverna Workflow Engine and Server. Provenance. Data management (WebDAV). Search (CSW). Visualisation (BioSTIF). Google Refine. Authentication / authorisation. Platforms: grid, cloud, etc.
  17. Who is We? The ego-system: biologists, bioinformaticians, biodiversity informaticians, astro-informaticians, social scientists, modellers, software engineers, computer scientists, systems administrators, resource providers.
  18. My World: CS Research; Methods & Practice; Production; Science.
  19. Wf4ever, http://www.wf4ever-project.org. Research Objects: carriers of research context for citation, reproducibility and integrated publishing. • Aggregation • Annotation • Provenance • Lifecycle • Preservation • Decay • Sharing • Stereotypical profiles • Services and APIs • myExperiment 2.0. Encodings: Semantic Web: LOD, VoID, OAI-ORE, AO/OAC, SIOC, OPM/PROV, Memento…
  20. Applications: Production Community; Publishing; Training; Research Community.
  21. So if we build it will they come? Be useful for something: immediately, continuously, responsively. Be usable by somebody: user experience, worth the effort, adoption path. Some of the time: as part of a big picture. Under-promise and over-deliver. Acquire critical mass.
  22. Four things that drive adoption of software or service. 1. Added value – do something you couldn’t do before, or can now do faster; gain competitive advantage, improve productivity, scale up. 2. New asset – get or retain access to something important (data, method, technique, skills, knowledge). 3. Keep up with the field; a community – future-proof my practice, new skills and capacity; there is a vibe about it and I’ll be left out. 4. Because there is no choice – the business depends on it, it’s mandated, it’s de facto mandated.
  23. Seven things that hinder adoption of software or service. 1. Not enough added value. It sucks. • It doesn’t solve a problem, or not as well or as cheaply as something else; no content, or not the right content. 2. Not fit for take-on. It doesn’t work! • No: help, guides, documentation, manuals, examples, content, templates, portability, migration / legacy support, easy installation, virtual machines, testing, stability, version control, release cycle, roadmap, sustainability prospect, way of introducing my favourite component/data/environment. 3. No time or capacity to take on • To learn, to migrate personal legacy code/data/applications; no pathway/ramp to adoption • Training and special system needs.
  24. Software practices. Zeeya Merali, Nature 467, 775-777 (2010), doi:10.1038/467775a. Computational science: ...Error…why scientific programming does not compute. “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software.”
  25. Software Stewardship. “Better Science through Superior Software” (C. Titus Brown). Software sustainability; software practices; software deposition; long-term access to software; credit for software; licensing advice; open licenses. Reproducible Research Standard, Victoria Stodden, Intl J Comm Law & Policy, 13, 2009.
  26. Seven things that hinder adoption of software or service (continued). 4. Cost – of disruption, of long-term ownership. It’s too costly. 5. Exposure to risk – first to take up; support and sustainability dependencies; fear of scrutiny, misrepresentation or being scooped. 6. No community – for support and comfort. 7. Changes to work practices – obligations; unclear or unenforced reciprocity protocols.
  27. Betamax vs VHS. • It sucks but it’s the only thing around. • It’s ace but it’s one of many, too late in the game, and not enough to switch. • The tipping point is likely not technical.
  28. Bonus hinder: Never heard of it. We’ve built it but we haven’t told anyone. • Make noise… physically and virtually • Customer and contributor relationship building • Self-supporting communities, multi-level marketing • Highly resource intensive
  29. Bonus hinder: Never heard of it. We’ve built it but we haven’t told anyone. Market; user community; development; developer community. It all kicks off.
  30. Adoption Intentions: be careful what you wish for. • Incidental – “I built it for myself, and stuck it out there” • Familial – “I built it for people just like me” • Fundamental – “I built it for others, many who are not like me”
  31. Open Innovation: development and content. You are not alone; you can’t do it all alone; motivate and enable others to fill gaps. “App Store Style” software, services, content, examples… • Really interoperate. Don’t tweak. • Be simple and standard. • Be helpful. Be set up. Be reusable. Be smart. • Others will develop on top of you, but don’t assume they will re-contribute or tell you. • It’s much harder than you think. • It’s unequal. Circles: Family, Friends (Galaxy+Taverna/myExperiment), Acquaintances, Strangers.
  32. Ladder Model of OSS Adoption (adapted from Carbone P., Value Derived from Open Source is a Function of Maturity Levels): Family, Friends, Acquaintances, Strangers. Moore's technology adoption curve [FLOSS@Syracuse].
  33. “It's better, initially, to make a small number of users really love you than a large number kind of like you.” Paul Buchheit, paulbuchheit.blogspot.com
  34. PALs: Building Friendships. Intelligence, guidance, advocacy, evangelism, market research. • What’s in it for the PAL? – Long tail: money, kudos, special support, special resources, skills, reputation building, influence, stuff they can’t do alone, CV building – Consortia: co-funded • Who is a PAL? – Post-docs, post-grads, administrators, developers – PI: protector/champion • PAL handlers – customer relationship manager, nanny and mediator, scientist
  35. Do not underestimate… the power of the sprint / *-athon / fest / drinking; the power of a whizzy interface, even for plumbing; the importance of supporting and propagating best practice.
  36. Participatory, embedded Design-Build-Run-Manage is good. Act local, think global. Reality check. Eat your own dog food. The bigger picture.
  37. Participatory Design: work together on a real problem. Stakeholders: funders, project PIs, PALs. Tensions: data sharing vs data control; data standards vs own databases; project dependence vs project independence; understanding standards and their limitations; long-term preservation; visibility. Approach: spreadsheets; Yellow Pages; just enough SOPs; a database exchange; curating; examples; safe haven. 3 years later, 15/16 consortia abandoned their own systems and went with the SEEK system.
  38. If you build it will they come and contribute?
  39. Participation: Cooperation? Coordination? Collaboration? Integration? Evolution and entropy models. Participants: lone scholars, private groups, trusted collaborators, public scientists, citizens. Access: closed, controlled, open [based on an idea by Liz Lyon].
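  40. Critical mass spiral: 90:9:1 (most users only consume, a few contribute occasionally, and about 1% create most of the content). Driven by the needs of, and benefits to, the scientist rather than top-down policies. Content tipping point [Andrew Su].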
  41. Trust, Fame and Blame: Reciprocity, Competition, Contribution and Use. • Scooping, scrutiny and misinterpretation • Curation cost • Poor quality • Reputation / asset economics • Public peer pressure. Reciprocity sucks: • Flirting • Hugging • Controlled sharing • Voyeurism • Poor feedback / credit. Nature 461, 145 (10 September 2009). Victoria Stodden, The Scientific Method in Practice: Reproducibility in the Computational Sciences, Feb 9, 2010, MIT Sloan Research Paper No. 4773-10, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1550193
  42. Carrots: Harness Competitiveness. • Pride: reputation, cult, credit & attribution for all • Protection: just-enough sharing, licensing & liability; quality, peer review, metadata • Preservation: safe havens and sunsets (project churn) • Publishing / Release: citability, supporting exchange • Productivity: availability of assets, help, capability, ramps
  43. Sticks? Community, journal and funder mandates. There are very few real sticks.
  44. Adoption Ramps: instrument familiar, widely used tools, e.g. spreadsheets and email. http://www.rightfield.org.uk
  45. Adoption Stealth • Data-at-home promise with automated harvesting • Sharing creep, incremental metadata, low obligations • URL upload in BioCatalogue • Web service “come as you are” take-on in Taverna • Metadata prompting: right tools, right time, right place • Service collections & packaged services
  46. Be vigilant • PAL burn-out and over-familiarity • Unadjusted over-user accommodation • Drifting apart and not keeping it fresh • Step back, observe and adapt/intervene! • So relieved to get a community… • Instrument adoption and observation. Participatory development is a mutual long-term relationship, not flirty speed dating, one-night stand, crush, me me me.
  47. Urgent-Important • Technical bog-down, operational burn-out • Little things that are important but don’t seem that urgent… • Dominant projects • Non-software content • It all takes way longer than you think • Simplicity drift
  48. Beware Version 2 Syndrome!
  49. The Jam-based Adoption Model, aka Added Value, Value Proposition, Return on Investment. Image: http://delicious-cooks.com/photos/raspberry-jam/04/
  50. What is the Special Jam? What is your jam value chain, and for whom? What: SysMO: safe haven, spreadsheet tooling, linking SOPs, models and data, examples. Taverna: power, adaptability and myExperiment. Who: focused on contributors and experts. Provider-consumer balance. Functionality-Simplicity Syndrome. Changing who: challenging baked-ins.
  51. Jam today and more, better jam tomorrow. Just enough jam, just in time, not just in case. * Feature Creep Conundrum * Big Picture Paradox * Core vs Specifics Syndrome * Content Decay Dilemma * Working-to-Working Stability Stress
  52. Customised, specific jam beats generic. * Flexibility/Functionality – Simplicity Conundrum * Diversification Dilemma
  53. Where is my Jam? Jam for All • What are WE (platform providers, software builders, community builders and service providers) getting out of it? • We need credit and interest too. • Altmetrics. Howison and Herbsleb, Scientific Software Production: Incentives and Collaboration, CSCW 2011, March 19–23, 2011, Hangzhou, China, http://james.howison.name/pubs/HowisonHerbsleb2011SciSoftIncentives.pdf. Image: http://www.gettyimages.co.uk/detail/photo/empty-jam-jar-royalty-free-image/136976198
  54. Jam forever. They came. Have the evidence. Have a plan. Did you wish for this? Do you want it? Fragile flux: • Content, services, bits, communities. Funding plan: • Novelty over sustainability • Research-production falsehoods • Wave invention, political lobbying. Securing the community: • Leadership & foundations. Business model??? Software is free like puppies are free.
  55. Jam not forever • Acquire • Retain • Widen – more/different • Reposition – different/new stage • Changing community is challenging… [Daron Green]
  56. Adoption is a Merry-Go-Round. The Social and the Technical are Inseparable.
  57. You know they came when… …you were useful and usable to someone some of the time, but they might not tell you; …people ask you to join their consortia or use it; …they gave up their own home-grown stuff for yours; …someone you don’t know uses it and tells you all about your own stuff; …someone publishes papers about it, without citing you; …someone else claims credit; …people you don’t know start bitching about it; …it’s just expected to be there, and you are kind of expected to be there too; …your Head of School complains you don’t do enough CS research because you are doing too much software engineering and support.
  58. Acknowledgements (1): James Howison, Heather Piwowar, Victoria Stodden, Janet Vertesi, Christine Borgman, Nosh Contractor, Jay Liebowitz, Robert Kraut.
  59. Acknowledgements (2) • The myGrid family, friends and contributors • But especially: Katy Wolstencroft, David Withers, Marco Roos, Alan Williams, Jits Bhagat, Stuart Owen, Stian Soiland-Reyes, Shoaib Sufi, Robert Stevens, Paul Fisher, Peter Li, Ian Dunlop, Finn Bacall, Mannie Tags, Niall Beard, Rob Haines, Christian Brenninkmeijer, Alasdair Gray, Tim Clark, Pinar Alper, Paolo Missier, Khalid Belhajjame, Duncan Hull, Sean Bechhofer, David De Roure, Don Cruickshank, Wolfgang Mueller, Olga Krebs, Franco Du Preez, Quyen Nguyen, Jacky Snoep. • The members of Wf4ever, SysMO, BioVeL, HELIO, SCAPE, OMII, SSI, NeiSS, Obesity e-Lab and anyone else I forgot
  60. Further Information • myGrid – http://www.mygrid.org.uk • Taverna – http://www.taverna.org.uk • myExperiment – http://www.myexperiment.org • BioCatalogue – http://www.biocatalogue.org • SysMO-SEEK – http://www.sysmo-db.org • MethodBox – http://www.methodbox.org.uk • RightField – http://www.rightfield.org.uk • Wf4ever – http://www.wf4ever-project.org • BioVeL – http://www.biovel.eu • Software Sustainability Institute – http://www.software.ac.uk • Software Carpentry – http://software-carpentry.org/
  61. SUMMARY (De Roure and Goble, IEEE Software 2009). Principles: Keep your friends close; Fit in; Favours will favour you; Embed; Jam today, jam tomorrow; Act local, think global; Just enough, just in time; Design for network effects; Know your users; Anticipate change; Enable users to add value; Keep sight of the bigger picture. Stakeholders: coalface users, patrons, skeptics, champions, friends and family, end users, developers, service providers, system administrators.

Editor's Notes

  • If I build it, will they come? What is it we are building? Who is they? Who are we? Over the years I have built a bunch of open source software and services for researchers: the Taverna workflow system, myExperiment for workflow sharing, BioCatalogue for services, SEEK for Systems Biology data and models, and most recently MethodBox for longitudinal data sets. As well as building software, we built communities: development communities and user communities. So what drives or hinders adoption? What do I know now that I wish I had known before? How do we sustain communities on time-limited grants? How do we build it so they come, stay and join in?
  • Because we don’t make any content ourselves
  • Templates, controlled vocabularies, metadata collection, components, better descriptions….
  • Distributed groups, independents and partners. Organised teams: planned, strong connections with resource providers and each other; structured, cross-partner sharing, retained results. Distributed groups & independent lone rangers: long tail, disconnected from data providers and each other, emergent, fluid, personal stores, small science from big. Make workflows for group; run workflows from platforms; store and find workflows; catalogue and find services; catalogue, store and find data, SOPs, models; link stuff; release & share stuff; curate stuff; cooperate / collaborate / coordinate / co-shape. They vary on coordination, collaboration, cooperation, contribution, integration, sustainability, longevity.
  • Still some people missing!
  • Knowledge transfer. Three tracks. Large team.
  • Developer and user adoption; contributed collaborative content; collaborative development.
  • Maybe you don’t care… Content and promotion matter more than software, but they are harder to fund and need different people from software developers.
  • Incidental – not really building for adoption or for others to take up. Familial – the producer and the consumer are the same; many at BOSC are like this.
  • CLAs for set-up. Remember upgrade paths. Cooperate, network effects, amplify. Self-supporting, multi-level marketing. There are no green fields.
  • Please some of the people some of the time
  • They all start off like this…
  • Working the first time. User experience over smart. Cool interfaces (even for plumbing).
  • Primary community review. Facebook generation! Community participation, sharing, commons-based production, social curation, voluntary contribution: 1. primary content; 2. curation duties. GeneWiki, Rfam, myExperiment, PLoS, UsefulChem, OpenWetWare. Open science vs the long tail. Social networks vs the long tail. Incentives and obstacles. Myths and miracles. Contribution. Curation. Volunteer science.
  • Limited focus. Social networking around content. Feedback loops.
  • PAL recruitment. Content contribution. Stick: community, journal and funder mandates; there is no stick. Credit for peer review.
  • Don’t forget to make more demands though!
  • User burn-out and over-familiarity: over-friendly Stockholm syndrome, absence of friendly fire. Keep enemies even closer. Unadjusted over-user accommodation: fit in at first, get buy-in, move in, move on. Drifting apart and not keeping it fresh: keep jointly working on real, concrete cases. Don’t assume they will stay: users are fickle. Step back, observe and adapt/intervene! So relieved to get a community that you forget to see what they do (e.g. dubious workflow designs); this is much easier with e-Laboratory services that are inherently social collaboration spaces. Complacency: especially dangerous outside funded collaborations. Measuring impact and getting feedback: downloads ≠ useful (or usable). Don’t be prescriptive; scientists control. But actually we need to be a bit prescriptive. Danger! Going native. Missing users. Fossilisation and complacency. User experience over smart. Cool interfaces (even for plumbing). *-athons. Embedded co-working. The total problem. Replying. Eating your own dog food. Examples! Working the first time.
  • Version 2 Syndrome: being too clever, forgetting about engagement. Technical bog-down and operational burn-out: fire fighting, heads down not eyes up. Little simple things that are important but don’t seem that urgent… but are the ha’peth of tar that sinks the ship. Major project dominance: he who pays the piper calls the tune. Non-software innovations: seek and contribute content/components and contributing partners.
  • Activation energy argument. Balance against feature creep and short-termism. Keep planning the big stuff… Balance the cost against the benefit. But hacks survive, and don’t do the strategy.
  • 58% by students, 24% unmaintained: Schultheiss et al. (2010), PLoS Comp Bio. Content and promotion matter more than software, but they are harder to fund and need different people from software developers. What’s your plan? Maintaining content, software, services. Different groups, evolving practices, changing times, new patterns… Funding cycles, chasms and reinventions. Reward, not hinder, adoption. Foundations, friends and business models… and the open source community silver bullet!
  • Hard to Plan….
  • When the program’s Data Management Group chair claims it’s the only data system they have used that works. To your funders. Whoo-hoo!
  • Computer Supported Cooperative Work, Team Science, Knowledge Management, Social Science, Information Science, Library Science, Digital Scholarship, Collaboratories…