Opening presentation from the launch of The Hague Declaration on Knowledge Discovery in the Digital Age, by LIBER Executive Director Susan Reilly. Launched in Brussels on 6 May 2015.
OpenAIRE-COAR conference 2014: Open peer review to save the world, by Michael Taylor (OpenAIRE)
Presentation at the OpenAIRE-COAR Conference: "Open Access Movement to Reality: Putting the Pieces Together", Athens - May 21-22, 2014.
Session 5: The now and the future of open scholarly communication.
Open peer review to save the world, by Michael Taylor - Co-founder of Open Scholar, National Observatory Athens
Taking ownership of the challenges and problems of owning a grotty API and tu... (Jexia)
Mendeley is part of Elsevier, a large scientific, technical and medical publisher, and provides tools for working with and sharing documents. Joyce, a Developer Advocate for the Mendeley API, will share the story of the API.
Apps for Science - Elsevier Developer Network Workshop 2011-02 (Remko Caprio)
This presentation is an introduction to programming OpenSocial Gadgets for Science.
1. overview of apps
2. social networks
3. opensocial
4. SciVerse Platform
5. SciVerse APIs
6. Coding OpenSocial Gadgets for SciVerse
7. Resources
As the media leader who first brought a public content API to market in 2008, NPR continues to innovate and learn what it means to have flexible content. Our philosophy assumes that, to maintain relevance in an online world, media companies need to be adroit at delivering content to multiple channels and disparate platforms. This in turn has led us to keep a strategic focus on our API development, which positions us not just to meet our distribution needs but also to drive business opportunity and allow for effective design and user experience, whether in a browser or on a mobile device. This presentation will share our lessons learned and key metrics around the successful creation and use of flexible content – from technology needs to business, editorial and design opportunities in an increasingly fragmented online product landscape.
The Guardian's Open Platform initiative enables partners to build applications with The Guardian. As part of this initiative, The Guardian provides the Content API - a rich interface to all The Guardian's content and metadata back to 1991 - over 1 million documents. This talk starts with a brief overview of the latest iteration of the content API. It will then cover how we implemented this in Scala using Solr, addressing real-world problems in creating an index of content:
how we represented a complex relational database model in Solr
how we keep the index up to date, meeting a sub-5 minute end-to-end update requirement
how we update the schema as the API evolves, with zero downtime
how we scale in response to unpredictable demand, using cloud services
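As a rough illustration of the client side of such a content API, here is a minimal sketch of querying the public Content API search endpoint in Python. It assumes a registered api-key (the documented "test" key is used as a placeholder); the parameter and field names follow the public Open Platform documentation and should be checked against the current version.
```python
# Minimal sketch of querying The Guardian Content API search endpoint.
import requests

API_URL = "https://content.guardianapis.com/search"

def search_guardian(query, api_key="test", page_size=10):
    """Return (publication date, title, url) tuples for matching content."""
    params = {"q": query, "api-key": api_key, "page-size": page_size}
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    results = response.json()["response"]["results"]
    return [(r["webPublicationDate"], r["webTitle"], r["webUrl"]) for r in results]

for date, title, url in search_guardian("open platform"):
    print(date, title, url)
```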
This presentation tackles new ways of distributing content that put Application Programming Interfaces at the heart of distribution strategy, and shows how APIs are set to become the primary channel for companies seeking to reach partners and users. The keynote covers technical scenarios and architectures as well as customer examples.
Our Users Who Settle for Less – Ensuring Usable Accessibility for Blind Users... (LIBER Europe)
Our Users Who Settle for Less – Ensuring Usable Accessibility for Blind Users of a Digital Library (Heli J. Kautonen, National Library of Finland, Finland). This presentation was one of the 10 most highly ranked at LIBER's Annual Conference 2014 in Riga, Latvia. Learn more: www.libereurope.eu
Knowledge and Wisdom: the role of research libraries in supporting the Europe... (LIBER Europe)
The paper will set the scene for challenges facing research libraries in Europe, using the United Kingdom (UK) experience as an exemplar. It will look at pan-European developments to bring resource discovery to the network layer, highlighting two developments: Europeana, Libraries and Research; and, as a case study, the introduction of the Primo search engine into UCL Library Services (University College London) in the UK. In addition, Open Access to research publications and its potential impact on the dissemination of scholarly research outputs will be examined, including PEER's (Publishing and the Ecology of European Research) investigation of the effects of the large-scale, systematic depositing of authors' final peer-reviewed accepted manuscripts (so-called Green Open Access), with the aim of providing input for evidence-based policy-making in the area of Green Open Access. Two examples of Gold Open Access will also be illustrated: Gold Open Access monograph publishing and the development of Gold 'overlay journals'. This will be followed by a look at Research Data and the importance of data-driven science, concentrating on three exemplars from the UK. The requirements for the storage and preservation of research data will be explored, and the potential of tools offered by Ex Libris investigated to see what is required. Finally, the paper will map its findings in terms of network developments, Open Access to research publications, and the storage and re-use of research data against the findings of the opening section – the strategic needs of European research universities. The paper will end by identifying how the technical developments outlined need to be aligned with the top-level strategic needs of European universities in order for research libraries to support their home universities.
Haystack 2019 - Ontology and Oncology: NLP for Precision Medicine - Sean Mullane (OpenSource Connections)
This session gives an overview of the importance of precision medicine in cancer treatment and describes an approach used by UVA in the TREC 2018 Precision Medicine workshop. The PM track aims to encourage research into precision oncology medicine to provide more relevant information to physicians and researchers.
For this task, we ranked articles from a corpus of biomedical article abstracts from PubMed and MEDLINE for relevance to the treatment, prevention, and prognosis of the disease, given specific medical information about each patient.
We demonstrated, using a flexible graph-based query expansion method, that existing medical ontologies can be leveraged to improve precision in document relevance ranking with little to no other clinical input.
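As an illustration of the general idea (not the authors' actual pipeline), the sketch below expands a query by walking a toy ontology graph a bounded number of hops before ranking; the ONTOLOGY dictionary and terms are hypothetical stand-ins for a resource such as MeSH or SNOMED CT.
```python
# Illustrative sketch of graph-based query expansion over an ontology.
from collections import deque

ONTOLOGY = {                       # term -> related / narrower terms (toy data)
    "melanoma": ["skin cancer", "BRAF V600E melanoma"],
    "skin cancer": ["carcinoma of skin"],
}

def expand_query(terms, max_hops=2):
    """Breadth-first expansion of query terms over the ontology graph."""
    expanded = set(terms)
    frontier = deque((t, 0) for t in terms)
    while frontier:
        term, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for related in ONTOLOGY.get(term, []):
            if related not in expanded:
                expanded.add(related)
                frontier.append((related, depth + 1))
    return expanded

print(expand_query(["melanoma"]))
# {'melanoma', 'skin cancer', 'BRAF V600E melanoma', 'carcinoma of skin'}
```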
Microtask crowdsourcing for disease mention annotation in PubMed abstracts (Benjamin Good)
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Benjamin M. Good, Max Nanis, Andrew I. Su
Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses that would otherwise be impossible. As a result, many biological natural language processing (BioNLP) projects attempt to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are vital to the process of knowledge extraction but are always in short supply. Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text.
Here, we investigated the use of AMT in capturing disease mentions in PubMed abstracts. We used the recently published NCBI Disease corpus as a gold standard for refining and benchmarking the crowdsourcing protocol. After merging the responses from 5 AMT workers per abstract with a simple voting scheme, we were able to achieve a maximum F-measure of 0.815 (precision 0.823, recall 0.807) over 593 abstracts as compared to the NCBI annotations on the same abstracts. Comparisons were based on exact matches to annotation spans. The results can also be tuned to optimize for precision (max = 0.98 when recall = 0.23) or recall (max = 0.89 when precision = 0.45). It took 7 days and cost $192.90 to complete all 593 abstracts considered here (at $0.06/abstract, with 50 additional abstracts used for spam detection).
This experiment demonstrated that microtask-based crowdsourcing can be applied to the disease mention recognition problem in the text of biomedical research articles. The F-measure of 0.815 indicates that there is room for improvement in the crowdsourcing protocol but that, overall, AMT workers are clearly capable of performing this annotation task.
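A minimal sketch of the kind of voting-based merge the abstract describes, assuming each worker's annotations are reduced to exact (start, end) character spans; the threshold k is the knob that trades precision against recall. This is an illustration, not the published pipeline.
```python
# Merge disease-mention annotations from several workers by simple voting:
# keep a span when at least k workers marked exactly the same offsets.
from collections import Counter

def merge_annotations(worker_spans, k=3):
    """worker_spans: one set of (start, end) spans per worker."""
    votes = Counter(span for spans in worker_spans for span in set(spans))
    return {span for span, count in votes.items() if count >= k}

workers = [
    {(0, 14), (40, 52)},   # worker 1
    {(0, 14)},             # worker 2
    {(0, 14), (40, 52)},   # worker 3
    {(3, 14)},             # worker 4
    {(0, 14), (40, 52)},   # worker 5
]
print(merge_annotations(workers, k=3))   # {(0, 14), (40, 52)}
```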
Can Computers Understand the Scientific Literature? (includes compsci material) (TheContentMine)
Published on Jan 24, 2014 by PMR
With the semantic web, machines can autonomously carry out many knowledge-based tasks as well as humans can. The main problems are not technical but stem from the prevention of access to information. I advocate the automatic downloading and indexing of all scientific information.
E. Gombocz: Semantics in a Box (SemTech 2013-04-30) (Erich Gombocz)
Semantic W3C standards provide a framework for the creation of knowledge bases that are extensible, coherent, interoperable, and on which interactive analytics systems can be developed. A growing number of knowledge bases are being built on these standards, in particular as Linked Open Data (LOD) resources, and their availability has received increasing attention in industry and academia. Using LOD resources to provide value to industry is challenging, however, and early expectations have not always been met: issues arise from the alignment of public and experimental corporate standards, from inconsistent URI policies, and from the use of internal, non-formal application ontologies. To add to this, the reliability of resources is often problematic, from service levels to SPARQL endpoint uptime to URI persistence. Not least, in many cases provenance issues have not been properly resolved, and there are serious funding concerns related to government grant-backed resources. For these reasons, an integrated data appliance (iDA) preloaded with semantically integrated public knowledge bases provides an enterprise-ready "Semantics In-a-Box" solution to address those shortcomings effectively.
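For readers unfamiliar with consuming LOD resources, the sketch below queries a public SPARQL endpoint with the SPARQLWrapper library; DBpedia is used only as a well-known example, and, as the abstract warns, endpoint uptime and URI persistence cannot be taken for granted.
```python
# Sketch of consuming a Linked Open Data resource via its public SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Linked_data> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"])
```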
Data Dialogue - Human Genomic Data Discovery (Fiona Nielsen)
Presenting at The Data Dialogue. Time to Share: Navigating Boundaries & Benefits - Afternoon session: Sharing difficult data.
July 28, 2016 @ University of Cambridge
http://www.ses.ac.uk/event/data-dialogue-time-share-navigating-boundaries-benefits/
In this talk I present an overview of human genomic data sources around the world, their funding, access policies and the types of data they contain. I also discuss why data sharing is hard, including issues of data privacy and a research culture that does not incentivise the sharing of data and results.
Presented by Fiona Nielsen, founder and CEO of Repositive
http://repositive.io
MseqDR consortium: a grass-roots effort to establish a global resource aimed ... (Human Variome Project)
The success of whole exome sequencing (WES) for highly heterogeneous disorders, such as mitochondrial disease, is limited by substantial technical and bioinformatics challenges to correctly identify and prioritize the extensive number of sequence variants present in each patient. The likelihood of success can be greatly improved if a large cohort of patient data is assembled in which sequence variants can be systematically analysed, annotated, and interpreted relative to known phenotype. This effort has engaged and united more than 100 international mitochondrial clinicians, researchers, and bioinformaticians in the Mitochondrial Disease Sequence Data Resource (MSeqDR) consortium, which formed in June 2012 to identify and prioritize the specific WES data analysis needs of the global mitochondrial disease community. Through regular web-based meetings, we have familiarized ourselves with existing strengths and gaps facing integration of MSeqDR with public resources, as well as the major practical, technical, and ethical challenges that must be overcome to create a sustainable data resource. We have now moved forward toward our common goal by establishing a central data resource (http://mseqdr.org/) that has both public access and secure web-based features that allow the coherent compilation, organization, annotation, and analysis of WES and mtDNA genome data sets generated in both clinical- and research-based settings of suspected mitochondrial disease patients. The most important aims of the MSeqDR consortium are summarized in the MSeqDR portal within the Consortium overview sections. Consortium participants are organized in 3 working groups: (1) Technology and Bioinformatics; (2) Phenotyping, databasing, IRB concerns and access; and (3) Mitochondrial DNA-specific concerns. The online MSeqDR resource is organized into discrete sections to facilitate data deposition and common reannotation, data visualization, data set mining, and access management. With the support of the United Mitochondrial Disease Foundation (UMDF) and the NINDS/NICHD U54-supported North American Mitochondrial Disease Consortium (NAMDC), the MSeqDR prototype has been built. Current major components include common data upload and reannotation using a novel HBCR-based annotation tool that has also been made publicly available through the website; MSeqDR GBrowse, which allows ready visualization of all public and MSeqDR-specific data, including lab-specific aggregate data visualization tracks; an MSeqDR-LSDB instance of nearly 1250 mitochondrial disease and mitochondrially localized genes based on the Locus Specific Database model; exome data set mining in individuals or families using the GEM.app tool; and Account & Access Management. Within MSeqDR GBrowse it is now possible to explore data derived from MitoMap, HmtDB, ClinVar, UCSC-NumtS, ENCODE, 1000 Genomes, and many other resources that bioinformaticians recruited to the project are organizing.
Join us in Boston this coming Fall to attend Cambridge Healthtech Institute's (CHI) 2nd Annual FAST: Functional Analysis & Screening Technologies Congress on November 17-19, 2014 and meet with a community of 250+ biologists, screening managers, assay developers, engineers and pharmacologists dedicated to improving in vitro cell models and phenotypic screening to advance drug discovery and development at 6 conferences: Phenotypic Drug Discovery (Part I & II), Engineering Functional 3D Models, Screening and Functional Analysis of 3D Models, Organotypic Culture Models for Toxicology and Physiologically-Relevant Cellular Tumor Models for Drug Discovery. Delegates have the opportunity to share insights in interactive panel discussions and connect during networking breaks. View innovative technologies and scientific research revolutionizing early-stage drug discovery in the exhibit/poster hall.
Cambridge Healthtech Institute (CHI) is pleased to announce the Third Annual FAST: Functional Analysis and Screening Technologies Congress. Now in its third year, the FAST Congress brings you the latest technologies and research in cellular screening.
The Third Annual Phenotypic Drug Discovery meeting will return with new updates and case studies in phenotypic screening, high-content analysis, physiologically-relevant cellular models, chemical genomics and chemical proteomics. The rapidly evolving area of 3D cellular models will be addressed by two back-to-back meetings, with the Inaugural 3D Cell Culture: Organoid, Spheroid, and Organ-on-a-Chip Models meeting focusing on the new predictive cellular models for drug discovery and toxicity assessment. It will review the use of primary and stem cells, complex co-culture cell models, tumor spheroid models, novel organ-on-a-chip models for efficacy and safety screening, functional analysis, and compound profiling. The Third Annual Screening and Functional Analysis of 3D Models meeting will follow with case studies of phenotypic and high-content screening of complex 3D cellular systems for compound and target selection.
The 2014 Congress attracted more than 250 senior delegates, representing over 160 companies from 20 countries. With half of the attendees from big pharma and biotech and a third from academia and government, the FAST Congress offers exclusive networking opportunities with diverse international attendance. Please join our focused Screening event and learn from 60+ scientific presentations, an assortment of educational courses, 20+ exhibitors and your fellow expert delegates. We look forward to seeing you at the event.
canSAR Database: An Overview on System, Role and Application (inventionjournals)
The intention of this paper is to provide a technical overview of the largest cancer database, the canSAR database system. This overview includes the basic definitions and terminology, findings and advancements in the field of cancer research through the canSAR database, along with the basic system architecture, design, data sources, processing pipelines, screening tests and structure-activity relationships of the system.
Similar to Launch: The Hague Declaration on Knowledge Discovery in the Digital Age (20)
LIBER Webinar: Turning FAIR Data Into Reality (LIBER Europe)
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
Copyright Reform: EU Legislative Process & LIBER Advocacy (LIBER Europe)
LIBER's Copyright & Legal Matters Working Group met in Helsinki on 7 December 2017. This presentation, outlining the EU legislative process on copyright reform and LIBER advocacy, was given at the meeting by Helena Lovegrove, LIBER's Advocacy Adviser.
Enabling the Exchange and Use of Data in Agriculture (LIBER Europe)
This presentation by Imma Subirats was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
GDPR - Thoughts on the EU Data Protection Regulation, Research and Libraries (LIBER Europe)
This presentation by Jonas Holm was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
Research Data Services and Data Collections: Library Synergies for Economic R... (LIBER Europe)
This presentation by Thomas Bourke was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, operate over a graph representation; Compressed Sparse Row (CSR) is an adjacency-list-based graph representation (a minimal CSR sketch follows the experiment list below).
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
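As referenced above, here is a minimal sketch of a Compressed Sparse Row (CSR) adjacency representation: all destination vertices live in one flat array, and an offsets array marks where each vertex's neighbour list begins and ends. This is illustrative only; the experiments listed above use OpenMP and CUDA rather than Python.
```python
# Build a CSR representation from an edge list and read back a vertex's neighbours.
def build_csr(num_vertices, edges):
    """edges: iterable of (source, destination) pairs."""
    edges = list(edges)
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbours = [0] * len(edges)
    cursor = list(offsets[:-1])          # next free slot per vertex
    for u, v in edges:
        neighbours[cursor[u]] = v
        cursor[u] += 1
    return offsets, neighbours

def out_neighbours(offsets, neighbours, v):
    return neighbours[offsets[v]:offsets[v + 1]]

offsets, neighbours = build_csr(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(out_neighbours(offsets, neighbours, 0))   # [1, 2]
```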
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
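The following is a small illustrative sketch (not the report's implementation) of the levelwise idea using networkx: the graph is condensed into its strongly connected components, components are processed in topological order, and ranks are iterated only within the current component, with contributions from upstream vertices frozen. It assumes the stated precondition that the graph has no dead ends.
```python
# Levelwise PageRank sketch over the SCC condensation of a directed graph.
import networkx as nx

def levelwise_pagerank(G, d=0.85, iters=50):
    N = G.number_of_nodes()
    rank = {v: 1.0 / N for v in G}
    condensed = nx.condensation(G)                 # SCC DAG, node attribute "members"
    for scc in nx.topological_sort(condensed):
        block = condensed.nodes[scc]["members"]
        # fixed contribution into each block vertex from already-ranked upstream vertices
        ext = {v: sum(rank[u] / G.out_degree(u)
                      for u in G.predecessors(v) if u not in block)
               for v in block}
        for _ in range(iters):                     # iterate only within the block
            new = {v: (1 - d) / N + d * (ext[v] +
                       sum(rank[u] / G.out_degree(u)
                           for u in G.predecessors(v) if u in block))
                   for v in block}
            rank.update(new)
    return rank

# toy graph: {0, 1, 2} form one strongly connected component; 3 has a self-loop (no dead end)
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 3)])
print(levelwise_pagerank(G))
```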
Launch: The Hague Declaration on Knowledge Discovery in the Digital Age
1. Elsevier TDM Policy
• Access through API only (see the request sketch below)
• Text only - no images, tables
• Researchers must register details
• Click-through licence
• Terms can change any time
• Reproducibility of results
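As referenced in the first bullet, the sketch below shows roughly what API-only, text-only access looks like in practice, using Elsevier's Article Retrieval endpoint. The URL pattern and X-ELS-APIKey header follow Elsevier's developer documentation, but the key, the Accept value and the DOI shown are placeholders, and actual use is governed by the click-through TDM licence, whose terms can change.
```python
# Hedged sketch: retrieve one article's text via Elsevier's Article Retrieval API.
import requests

API_KEY = "YOUR-REGISTERED-TDM-KEY"   # issued after registering researcher details

def fetch_article_text(doi):
    url = f"https://api.elsevier.com/content/article/doi/{doi}"
    headers = {"X-ELS-APIKey": API_KEY, "Accept": "text/plain"}  # text only: no images or tables
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text

# print(fetch_article_text("10.1016/j.example.2015.01.001")[:500])  # hypothetical DOI
```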
2. Knowledge Discovery in 3 Hops
• Human computers
• Shakespeare’s frugality with words
• Cancer diagnosis
4. Use of words (2009)
Marsden J, Budden D, Craig H, Moscato P (2013) Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon. PLoS ONE 8(6): e66813. doi:10.1371/journal.pone.0066813
6. The point?
• Multidisciplinary practitioner-led approach
• Potential will be more than we can begin to imagine now
• Open underpins evolution
• Cannot erect artificial barriers
Physicist T.C. Mendenhall hired two women to count the lengths of words in Shakespeare's works. The word-length frequency curve remains consistent, giving a way to ascertain authenticity. Unlike most English authors, he used more 4-letter words than 3-letter words. There was no correlation to Bacon, but (as was discovered years later) he was as similar to Christopher Marlowe (another Elizabethan playwright and poet) as he was to himself.
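A minimal sketch of the Mendenhall-style word-length "characteristic curve" described above: count how often words of each length occur and normalise, so curves from texts of different sizes can be compared. The sample text is only a placeholder.
```python
# Compute a normalised word-length frequency curve for a text.
import re
from collections import Counter

def word_length_curve(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}

sample = "To be, or not to be, that is the question"
print(word_length_curve(sample))   # {2: 0.6, 3: 0.2, 4: 0.1, 8: 0.1}
```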
A marker of Shakespeare's writing is his comparative underuse of certain words and selections of words.
Identified the 20 most used words across the body of literature of the period. The new scoring of markers is based not just on the use of words but on their underuse, giving excellent methods for identifying markers in large datasets from fluctuations in the observed frequencies of words (a scoring sketch follows the word list below).
all
to (infinitive)
now
ye
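A sketch of the marker-word scoring idea: compare an author's observed frequency of very common words against the average across a corpus of the period, so that underuse (a negative score) counts as a marker as much as overuse. The word list reuses the examples above; the corpus texts are placeholders.
```python
# Score candidate marker words by an author's over- or under-use relative to a corpus.
import re
from collections import Counter

MARKER_WORDS = ["all", "to", "now", "ye"]

def relative_frequencies(text, words=MARKER_WORDS):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) for w in words}

def marker_scores(author_text, corpus_texts):
    author = relative_frequencies(author_text)
    corpus = [relative_frequencies(t) for t in corpus_texts]
    avg = {w: sum(f[w] for f in corpus) / len(corpus) for w in MARKER_WORDS}
    return {w: author[w] - avg[w] for w in MARKER_WORDS}   # >0 overuse, <0 underuse

# scores = marker_scores(shakespeare_text, [marlowe_text, bacon_text])  # hypothetical corpora
```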
A biomarker could be elevated enzyme levels, helping to individualise treatment; panels of biomarkers are more effective, increasing sensitivity and specificity. Application of the scoring method developed previously can identify mislabelled samples, significant outliers, etc. in big data.