Creating a Truly Innovative Holistic System that Captures and Channels Insights out to the Right People.
Global Data Office Biogen
Sebastien Lefebvre, Sr Director
This presentation was provided by Marilyn White, Katelynd Bucher, and Briget Wynne, all of NIST, during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
Hadoop and Data Virtualization - A Case Study by VHA (Denodo)
Access to full webinar: http://goo.gl/dQjxRe
This webinar by Hortonworks, VHA, and Denodo covers the functionality and benefits of Hadoop in modern data architectures; how Hadoop, combined with data virtualization, simplifies data management and enables faster data discovery; and what data virtualization can offer in big data projects. VHA explains how it deployed data virtualization and Hadoop together and presents its lessons learned and best practices for data lake and data virtualization deployment.
PA webinar on benefits & costs of FAIR implementation in life sciences (Pistoia Alliance)
Slides from the Pistoia Alliance Debates Webinar, in which a panel of experts from technology providers and the biopharma industry were invited to share their views on the "Benefits and costs of FAIR implementation for the life science industry".
Data analytics is extremely powerful, but trouble is on the horizon. This deck makes the case for a new approach to data science that captures its benefits without the pitfalls.
Every company collaborates on data—and often it’s done with unsecured, untraceable email attachments.
Keeping control over the process and the data is tough, and consolidating or re-using it is almost impossible, not to mention data security or process audits.
rogr.io is the secure platform to asynchronously request, control, distribute, and collaborate on data.
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics (Brett Tully)
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
Islam M1,2, Christiansen J3, Mahboob S4, Valova V4, Baker M4, Capes-Davis D4, Hains P4, Balleine R1,4, Zhong Q1,4, Reddel R1,4, Robinson P1,4, Tully B4
1 The University of Sydney, Camperdown, Sydney, NSW, 2050, Australia
2 Intersect, Level 13/50 Carrington St, Sydney, NSW, 2000, Australia
3 Queensland Cyber Infrastructure Foundation Ltd, Axon Building 47, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
4 Children’s Medical Research Institute, Westmead, NSW, 2145, Australia
Background
The ACRF International Centre for the Proteome of Cancer (ProCan) at Children’s Medical Research Institute (CMRI) is an “industrial scale” program specialising in small-sample proteomics analysis from human cancer tissue.
ProCan seeks to generate both a wide and deep analytics pipeline and requires an enabling data framework. The framework must accommodate initial analysis and proteomic profiling of a large number of tumor samples, along with the clinical and demographic information, subsequent multi-omics studies, and any previously recorded responses to treatment. The curated datasets will provide a valuable resource beyond their primary use and ProCan is committed to making its data accessible to collaborators and the wider scientific community.
Objectives
The objective of the project is to establish an efficient, reliable, secure, and ethical data sharing and publication framework based on best-practice data sharing principles, such as the FAIR principles. The framework must address challenges that stem from the scale and complexity of the program, as well as from ProCan's focus on human-derived data and the difficulty of sharing these data while maintaining the privacy of research participants.
Method
The project adopted a requirements-driven methodology and engaged a wide range of ProCan stakeholders nationally and internationally. Together, they explored various industrial-scale proteomics data management and sharing scenarios to ensure robust and ethical sharing of the data.
Results
The project developed a data sharing framework based on the FAIR principles that currently forms the basis of ongoing implementation work within the ProCan program.
We live in a time when digital technology is profoundly impacting our lives, from the way we connect with each other to how we interpret our world. First and foremost, this digital transformation is causing a tsunami of data. In fact, IDC estimates that in 2025 the world will create and replicate 163 ZB of data, a tenfold increase from the amount of data created in 2016. In the past, organizations primarily dealt with documents and emails, but now they are also dealing with instant messaging, text messaging, video files, images, and DIO files. The Internet of Things (IoT) will only add to this explosion in data.
Managing this data overload and the variety of devices from which it is created is complicated and onerous as the market for solutions is fragmented and confusing. There are many categories of solutions, and within each, there are even more solutions to choose from. Many companies are struggling to decide how many of those solutions they need and where to start. Additionally, using multiple solutions means they won’t be integrated, so companies end up managing multiple applications from multiple disparate interfaces.
The question we often get asked is, “How can Microsoft 365 help me?”
This presentation was provided by Ellen Rotenberg and Rick Stevenson, both of Clarivate Analytics, during the NISO Webinar, Engineering Access Under the Hood, Part One, held on Wednesday, November 1, 2017.
Speakers from Australian Data Archive described their journey to become a trusted data repository certified with the CoreTrustSeal. Webinar given on 13th March 2018.
Recordings and transcript available from the ANDS website: http://www.ands.org.au/news-and-events/presentations/2018
The universe of identifiers and how ANDS is using them (Andrew Treloar)
Presentation on identifiers in general, and ANDS' approach to identifiers for objects and people in particular. Given at ODIP 3rd Workshop on August 7, 2014.
Understanding human information
•Access and understand virtually any source of information on-premise and in the cloud
•A strategic pillar of HP's HAVEn Big Data platform
•Non-disruptive, manage-in-place approach complements any organization
Webinar: Leveraging big data in life sciences & healthcare (Knowledgent)
Slides from May 2014 webinar hosted by Knowledgent, Hortonworks, and the CEOi. Titled “Leveraging Big Data in the Life Sciences and Healthcare,” the webinar featured thoughts on using big data to further Alzheimer's care delivery from Justin Sears, Industry Specialist at Hortonworks, Drew Holzapfel, Executive Director from the Global CEO Initiative on Alzheimer’s Disease, and Knowledgent's Tom Johnstone and Chris Young.
FAIR webinar, Ted Slater: Progress towards commercial FAIR data products and ... (Pistoia Alliance)
Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science to improve performance for the benefit of humanity.
In this webinar, we discuss how Elsevier is increasingly leveraging the FAIR Guiding Principles to improve its products and services to better serve the scientific community.
SciBite is an award-winning provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within the life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure the accuracy and reliability of results. Headquartered in the UK, we support our customers from additional sites in the US and Japan.
More info at: www.scibite.com
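The blurb above describes dictionary- and ontology-driven extraction of scientific terms from free text. As a hedged sketch of that general approach only (the vocabulary, ontology identifiers, and sample text below are hypothetical, and this is not SciBite's implementation or API), a naive scan against a curated vocabulary could look like this:

// Hedged sketch of dictionary-based term extraction; the vocabulary and
// ontology identifiers below are hypothetical placeholders, not a real ontology.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <string>

// Curated vocabulary: surface form -> (placeholder) ontology identifier.
static const std::map<std::string, std::string> kVocabulary = {
    {"multiple sclerosis", "ONTO:0001"},
    {"working memory", "ONTO:0002"},
};

// Lowercase a copy of the input so matching is case-insensitive.
static std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
  return s;
}

int main() {
  const std::string text =
      "Patients with Multiple Sclerosis showed changes in working memory.";
  const std::string lowered = toLower(text);

  // Naive substring scan; production systems use tries or finite-state matchers.
  for (const auto& [term, id] : kVocabulary) {
    for (size_t pos = lowered.find(term); pos != std::string::npos;
         pos = lowered.find(term, pos + 1)) {
      std::cout << id << "\t" << term << "\toffset=" << pos << "\n";
    }
  }
  return 0;
}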
Research Data (and Software) Management at Imperial: Everything you need to ... (Sarah Anna Stewart)
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
Denodo's Data Catalog: Bridging the Gap between Data and Business (Denodo)
Watch full webinar here: https://bit.ly/3rrE6rh
Self-service is a major goal of modern data strategists. Denodo's Data Catalog is a key piece of Denodo's portfolio for bridging the gap between the technical data infrastructure and business users. It provides documentation, search, governance, and collaboration capabilities, as well as data exploration wizards. It is the perfect companion to a virtual layer, fully empowering self-service initiatives with minimal IT intervention and giving business users the tools to generate their own insights with proper security, governance, and guardrails.
In this session we will see:
- The role of a virtual semantic layer in self-service initiatives
- The key capabilities of Denodo's new Data Catalog
- Best practices and advanced tips for a successful deployment
- How customers are using Denodo's Data Catalog to enable self-service initiatives
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects to see demand grow and supply continue to evolve, facilitated by institutional investment rotating out of offices amid the shift to work from home ("WFH"), and by the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, as illustrated by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has made key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to be over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank ...
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply (see the sketch after this list).
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
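As a rough, hedged illustration of the first comparison in the list above (a sketch only, not the report's actual benchmark code; the problem size and timing harness are assumptions), a sequential versus OpenMP element-wise vector multiply might look like this in C++:

// Hedged sketch, not the report's benchmark code: sequential vs OpenMP
// element-wise vector multiply (c[i] = a[i] * b[i]); problem size is assumed.
#include <cstdio>
#include <vector>
#include <omp.h>

// Sequential baseline: one thread walks the whole vector.
void multiplySeq(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c) {
  for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] * b[i];
}

// OpenMP variant: loop iterations are split across threads.
void multiplyOmp(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c) {
  #pragma omp parallel for
  for (long long i = 0; i < (long long)a.size(); ++i) c[i] = a[i] * b[i];
}

int main() {
  const size_t n = 1 << 24;  // hypothetical problem size
  std::vector<float> a(n, 1.5f), b(n, 2.0f), c(n);

  double t0 = omp_get_wtime();
  multiplySeq(a, b, c);
  double t1 = omp_get_wtime();
  multiplyOmp(a, b, c);
  double t2 = omp_get_wtime();

  std::printf("sequential: %.3fs  OpenMP: %.3fs\n", t1 - t0, t2 - t1);
  return 0;  // build with: g++ -O3 -fopenmp
}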
1. Creating a Truly Innovative Holistic System that Captures and Channels Insights out to the Right People
Global Data Office
Sebastien Lefebvre, Director Data Engineering & Technology
Biogen Idec, Feb 2015
Sebastien.lefebvre@biogenidec.com
2. DATA / INFORMATION SHARING CHALLENGES
Enterprise Content Management restricts access for most people
Increasing partnerships and collaborations across the industry
Increasing internal and external complexity of information sources
Provide simplified access to an ever-growing corpus of information
Low accessibility to information and data
Low quality of information curation and lack of ability to integrate disparate information
Ad hoc approach to managing the flow of information with collaborators and partners
High accessibility to information through search and knowledge dashboard capabilities
High linkage across information and datasets
A single, simplified external partner and collaborator platform for information sharing
3. OUR APPROACH
Knowledge is a shared resource that we need to collect, evolve, and leverage
3-clicks to any information
Introduce Content Analytics
MDM, Search, Dashboards, Graph computing, Information Portal, Information Hub… old concepts meet new technologies
Get user adoption one iteration at a time… and do Beta releases
5. LET'S LOOK BACK AT 2014
[Diagram of 2014 capabilities: Sharing data and information • Data and Information Dashboard • Common Vocabulary and Data Catalog • Promote re-use of data, information • Xlab Collaboration Service • D360 • Fly2Man]
• Reduced compliance risk
• Improved ability to respond
• More time available for innovative work
• Data and information are available for analysis and use when needed
• Streamline data and information flow across R&D
• Knowledge Discovery, Trends & Insights
• Information & Data Catalog
• Improved ability to find and integrate data/information
• Bring Data and Information together within a given context and ready for analysis
• Leverage existing findings/work
• Real time data analysis
• Recommendation engine
• Capture the language of Biogen
• Provide the right environment
6. Solution & Approach
Old concepts meet new technological capabilities
[Architecture diagram: ETL & ELT • Search Platform • Data / Information Lake • Social Information Portal • Knowledge Dashboard • Taxonomies • Dictionaries • TrustedKeys • Lexicon • Ownership, Security, Lineage • Apply constant transformation • Graph Computing • FingerPrinting • Crowd Curation • Channeling • Profiling • xlab • D360 • Search Based Applications • Fly2Man]
7. SEARCH & ANALYTICS PLATFORM
[Platform diagram: Index • Document • Record • Linguistics (Concepts & Similarity) • Language of Science (Concepts & Vocabs) • Custom Semantic Enrichment • TMA • Taxonomies • Dictionaries • TrustedKeys • Lexicon • API calls]
Faceted Search {Entity, {Document, {Concepts}}} • Trend Analytics • Social Portal {Document, {Concepts}} • Content Analytics {Document Cluster, {Concepts}} • Content Clustering
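The nested notation {Entity, {Document, {Concepts}}} above describes a faceted structure: an entity maps to the documents that mention it, and each document carries the concepts added by semantic enrichment. A minimal sketch of that shape (assumed names and data, not Biogen's implementation) could be:

// Hedged sketch of the faceted shape {Entity, {Document, {Concepts}}}:
// an entity facets over the documents that mention it, and each document
// keeps the concepts the enrichment step tagged it with.
#include <iostream>
#include <map>
#include <set>
#include <string>

// {Document, {Concepts}}: document id -> set of enriched concepts.
using DocConcepts = std::map<std::string, std::set<std::string>>;
// {Entity, {Document, {Concepts}}}: entity -> its documents and their concepts.
using FacetIndex = std::map<std::string, DocConcepts>;

int main() {
  FacetIndex index;
  // Hypothetical records; the concept labels echo slide 8 of the deck.
  index["entity-001"]["abstract-0042"] = {"fatigue severity scale", "working memory"};
  index["entity-001"]["abstract-0107"] = {"working memory"};

  // Faceted lookup: every document (and its tagged concepts) for one entity.
  for (const auto& [doc, concepts] : index["entity-001"]) {
    std::cout << doc << ":";
    for (const auto& tag : concepts) std::cout << " [" << tag << "]";
    std::cout << "\n";
  }
  return 0;
}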
8. [Chart: Fatigue severity scale • Working memory]
25,000 abstracts over 10 years; xlab: 1,472 abstracts; 332 abstracts
Objectively interpret 10 years of congress abstracts in less than 4 seconds