This document summarizes a keynote presentation about big data integration in the context of drug discovery. It discusses challenges with integrating diverse data sources, including issues with data volume, variety, veracity, and velocity. It presents the Open PHACTS platform as a case study, which integrates multiple biomedical databases into a single access point using semantic web technologies. Open PHACTS has developed apps and APIs to enable complex queries across integrated data related to diseases, tissues, targets, compounds and pathways. The talk highlights ongoing work to address issues like data licensing, identity resolution, quantitative data standards, quality assurance, and data provenance tracking in big data integration efforts.
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t... (Databricks)
It is widely known that the discovery, development, and commercialization of new classes of drugs can take 10-15 years and greater than $5 billion in R&D investment only to see less than 5% of the drugs make it to market.
AstraZeneca is a global, innovation-driven biopharmaceutical business that focuses on the discovery, development, and commercialization of prescription medicines for some of the world’s most serious diseases. Our scientists have been able to improve our success rate over the past 5 years by moving to a data-driven approach (the “5R”) to help develop better drugs faster, choose the right treatment for a patient and run safer clinical trials.
However, our scientists are still unable to make these decisions with all of the available scientific information at their fingertips. Data is sparse across our company as well as external public databases, every new technology requires a different data processing pipeline and new data comes at an increasing pace. It is often repeated that a new scientific paper appears every 30 seconds, which makes it impossible for any individual expert to keep up-to-date with the pace of scientific discovery.
To help our scientists integrate all of this information and make targeted decisions, we have used Spark on Azure Databricks to build a knowledge graph of biological insights and facts. The graph powers a recommendation system which enables any AZ scientist to generate novel target hypotheses, for any disease, leveraging all of our data.
In this talk, I will describe the applications of our knowledge graph and focus on the Spark pipelines we built to quickly assemble and create projections of the graph from hundreds of sources. I will also describe the NLP pipelines we have built – leveraging spaCy, BioBERT or Snorkel – to reliably extract meaningful relations between entities and add them to our knowledge graph.
Preservation Metadata, CARLI Metadata Matters series, December 2010 (Claire Stewart)
This document discusses preservation metadata and provides examples of how it can be implemented. Preservation metadata helps ensure the long-term usability of digital resources by documenting their creation, format, and any events that affect them over time. The document outlines the PREMIS data model and provides sample PREMIS XML documents following that model. It also presents case studies of how preservation metadata has been implemented for Northwestern University's digitized book collection and by organizations like Portico and HathiTrust.
A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare S... (Work-Bench)
This document summarizes a statistician's experience working at a healthcare technology startup that uses electronic health record data. It describes how the company initially had just one quantitative scientist but grew its team to include 70 software engineers and 10 quantitative scientists. It discusses how the company cultivated an R culture through internal packages, training, and hiring. It provides examples of when the company uses R for prototyping but implements in other languages for production, when R is used as a long-term solution, and when R and other languages are used in parallel for analysis.
GORM, which started as part of the Grails framework, is now a standalone library. Developers can use GORM to build the data layer of their applications. This presentation demonstrates how GORM provides a unified API for working across different types of data stores without sacrificing their uniqueness and strengths.
1) BCG's Gamma division provides data science teams and expertise to clients. It has over 550 analytics practitioners worldwide with experience across industries.
2) Gamma organizes data science teams with a mix of roles including data scientists, software engineers, and data engineers. It advocates for surgical teams based on principles from the 1970s.
3) Gamma views people, platform, and process as pillars of transformative analytics. It structures teams, advises on integrated systems to streamline projects, and takes a business-led approach through pilots and roadmaps.
Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, and what is it made of? The answer is a mix of measurements, models and statistics, meaning that the use of weather and climate data can get complex very quickly. This session provides a brief overview of the science behind weather and climate forecasts and provides you with the tools to get started with weather data - even if you aren't a meteorologist.
Sqrrl Enterprise: Big Data Security Analytics Use Case (Sqrrl)
Organizations are utilizing Sqrrl Enterprise to securely integrate vast amounts of multi-structured data (e.g., tens of petabytes) onto a single Big Data platform and then are building real-time applications using this data and Sqrrl Enterprise’s analytical interfaces. The secure integration is enabled by Accumulo’s innovative cell-level security capabilities and Sqrrl Enterprise’s security extensions, such as encryption.
Keynote at Gateways 2017 Conference, Ann Arbor MI
Speaker: Ian Stokes-Rees
"Connecting Cyberinfrastructure Back To The Laptop"
Science Gateways today are generally built to provide a web-accessible interface for a particular scientific community to access a combination of software, hardware, and data deployed in an expertly managed computing center. But what happens when the scientist wants to repatriate their data? Or perform some analysis that is not supported by the gateway? Both for the purposes of encouraging innovative workflows and serving an audience with a wide range of computational experience it is important to consider how a gateway can fit into the broader computational ecosystem of a particular researcher or research group. One simple starting point for this is to ask the question "how can the gateway connect back to the laptop?". This talk will consider how this is being done today in science gateways and present some ideas for how this could be expanded in the future.
Leading organizations today all have data scientists and analytics teams. A key challenge is establishing cross-functional teams that can collaboratively derive insights from data and move exploratory interactive analytics into automated production systems. Boston Consulting Group, founded on quantitative decision making, guides global F500 companies in the technical and organizational structures that will provide a foundation for agility, innovation, and competitive advantage. This talk will outline key strategies for building effective cloud-native analytics teams.
Seeing at the Speed of Thought: Empowering Others Through Data Exploration (Greg Goltsov)
This document appears to be a slide deck presentation on empowering others through data exploration. The presentation discusses removing barriers to data, making feedback fast, and removing yourself from blocking others. It emphasizes visualizing data pipelines and augmenting data warehouses with data lakes to handle varying data volumes, varieties, and velocities. The goal is to turn data into insights that create business value.
The document summarizes Second Genome's Helios2 platform for discovering drugs and biomarkers from microbiome data. It describes how the platform collects clinical microbiome data and conducts multi-omics analysis to find bacterial biomarkers. It then uses these biomarkers to select bacterial polypeptide therapeutic candidates and test them in disease models. The key technology underpinning the platform is a Neo4j graph database called SGKnowledgeBase that organizes omics data and clinical metadata for systematic mining. Future work aims to integrate additional biomedical data layers and network analysis features to further accelerate discovery.
The document discusses the evolution of big data and Hadoop technologies over time. It notes that Google published the seminal paper on MapReduce in 2004. Yahoo developed the first version of Hadoop in 2005, which later became open source. Hadoop 1.0 was released in 2011 and Hadoop 2.0 in 2013. The document also notes that the big data landscape extends beyond Hadoop to include other open source technologies like Spark, Storm, HBase and MongoDB. It argues that different use cases have different requirements for scale and speed, and that each is best served by different big data capabilities.
A talk from AnacondaCON presenting my personal journey from physics to finance to biology and how collaborative team-based data science has been the big enabler. The talk looks at Python, Big Data, Jupyter Notebooks, Anaconda. Discusses CERN LHCb particle physics computing, protein structure determination, and patterns in data science.
The document discusses big data and business analytics. It notes that more data was created in the last two years than in all previous history, and that the volume is estimated to grow 50 times by 2020. It highlights the challenges of data volume, velocity, and variety and the importance of analyzing data to run and change businesses. The document promotes Oracle's comprehensive big data solutions including Hadoop, NoSQL databases, and analytics applications.
This document is a curriculum vitae for Ravi Kumar. It outlines his personal and contact information, objectives, professional experience working as a billing executive and operations executive, educational qualifications including an MBA in healthcare administration and bachelor's degree in hospital management, professional qualifications and skills including Microsoft Office, networking, problem solving and languages. It also provides details on his extracurricular activities, hobbies and interests.
This document provides an introduction to HTML basics, including:
- The objectives of learning HTML tags to format text, add images, tables, colors and hyperlinks.
- Instructions on using a basic text editor and saving files with the .html extension to author HTML documents.
- Examples of basic HTML tags for headings, paragraphs, bold text, and line breaks.
Prabhat Kumar Singh is applying for the position of AME (A&C)/B1 Engineer. He has over 16 years of experience as an aircraft maintenance engineer and holds Cat A & C licenses for the ATR 72-212A and ATR 42-500. Currently he works for FlyMe (Villa Air) in the Maldives as a B1 certifying engineer and maintenance controller for the ATR 72-212A and ATR 42-500. Previously he has worked for several other aviation companies in India performing line and base maintenance on ATR and other aircraft types. He has undergone extensive training to maintain his certifications and skills.
Unsure how tight integration with the SAP HANA Cloud database can ensure optimal performance? Lesson Three of our IoT series will illustrate the true value of maximizing application development simplicity and deployment, while minimizing architectural layers. You’ll learn step-by-step how to create a web browser enabled XS application, which you can run directly on your Free SAP HANA Cloud trial account.
The document describes the role of John the Baptist according to the Gospels. John the Baptist is presented as the forerunner of Jesus and bears witness that Jesus is the Messiah, having seen the Holy Spirit descend upon him. Although his followers saw him as Elijah, the Gospel of John makes clear that John the Baptist was not the light but a witness to the light, which is Jesus.
The document summarizes surveys of global agricultural inputs and services exports from Israel from 2008-2011. It finds that while fertilizer exports declined, exports of other inputs such as irrigation systems, seeds, and equipment increased. As a result, total agricultural input exports returned to 2008 levels. It attributes Israel's success in this sector to decades of developing advanced technologies to address its own agricultural constraints of scarce land and water.
This document discusses strategies for improving lambing and kidding percentages on sheep and goat farms. It begins by stating that the average lambing rate in the US in 2015 was 111 lambs per 100 ewes, and in Virginia it was 116% in 2015 and 104% in 2014. The rest of the document provides tips for increasing birthing rates, including focusing on genetics through selection and crossbreeding, optimal nutrition, culling underperforming females, matching breeding seasons to natural cycles, and accelerated lambing/kidding systems. The key factors that influence birthing rates are fertility, litter size, and survival from birth to weaning.
The Gospel of Mark is unique. John Mark was a Jew from a well-to-do family in Jerusalem; his mother Mary often hosted the early church in her home. He was Barnabas's nephew and accompanied Paul on his first missionary journey, but abandoned him on reaching what is now Turkey. Years later Barnabas and Paul argued over John Mark: Paul rejected him and Barnabas gave him another chance. Later we see him at Peter's feet, learning from him. Many call his gospel “the Gospel of Peter”. Finally he is reconciled with Paul and called to serve at his side. He writes with a specific aim: to show the Roman people that Jesus was no revolutionary or enemy of Rome. On the contrary, he presents Jesus under the title “the Son of Man”, as a loyal and helpful man who is seen serving the people. Ezekiel 1:10 and Revelation 4:7 mention four angels or living creatures, each of which represents one gospel. Because Mark presents Jesus as the one who came not to be served but to serve, he would be represented by the animal regarded as the beast of hard work and service in farm labor: the ox. Mark is the ox. We give a brief overview of the book and present its most notable passages.
Small ruminant nutrition is important as feed costs account for up to 70% of production costs. Proper nutrition is key to health, productivity and profitability. The main nutrients required are energy, protein, minerals, vitamins and water. Energy and protein requirements vary based on factors like species, size, production stage and desired performance. Common nutritional disorders include acidosis, bloat, copper toxicity, enterotoxemia and pregnancy toxemia. Proper feeding management is needed to meet requirements and prevent issues.
Modern storage structures for seeds and grains include silos and storage bins. Silos are large steel structures constructed in clusters at processing plants to store grains in bulk. Storage bins can be made from reinforced concrete, steel, aluminum, fiber glass or brick. Modern storage structures provide advantages like less expensive and easier handling, quality control, less space needed, and protection from losses due to birds and rodents. Silos and bins are classified as deep or shallow depending on whether a plane of rupture from the grain surface reaches the opposite side before emerging.
This document discusses the benefits of exercise for seniors, such as maintaining weight and reducing illness. It recommends getting medical approval and then starting slowly with a regimen that incorporates cardio, strength, flexibility, and balance training in order to feel stronger. It also lists common excuses seniors give for not exercising and shows how the benefits outlined answer them.
In this session we will explore how Google's Cloud services (CloudML, Vision, Genomics API) can be used to process genomic and phenotypic data and solve problems in healthcare and agriculture.
According to research firm Gartner, Big Data is no longer considered hype but a trend that is here to stay. While Hadoop is commonly associated with Big Data, Big Data encompasses more than just Hadoop. Big Data requires not only technical changes but also cultural changes in how organizations approach data. Example applications of Big Data were presented, including a cloud-based electronic traceability system for semiconductor manufacturing and a research project aiming to profitably share vehicle diagnostic data across automotive partners while protecting private data. In conclusion, while Big Data applications are still developing, the idea of collecting all available data now in order to analyze it later has taken hold as storage and processing capabilities increase.
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org... (Neo4j)
AstraZeneca share their experience of building a knowledge graph platform and central service to power the next generation of insights and analytics at AstraZeneca.
The rise of big data governance: insight on this emerging trend from active o... (DataWorks Summit)
Each of today’s most forward-thinking enterprises has been forced to face similar data challenges: a reliance on real-time data to better serve its customers and, as a consequence, the requirement to comply with regulations that protect that data, one example being the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one – for companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem and collaboration with an active open source community.
In this joint presentation, John Mertic, Director of ODPi, and Ferd Scheepers, Global Chief Information Architect of ING, will address the benefits of a vendor-neutral approach to data governance and the need for an open metadata standard, along with insight into how companies such as ING, IBM, Hortonworks and more are delivering solutions to this challenge as an open source initiative.
Audience Takeaways include:
Understand the role of metadata;
Understand the need for a cross technology view on metadata;
Understand the role of Apache Atlas as a reference implementation; and
Understand the role of ODPi in offering value-added services including certification.
Speaker
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, and Self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that users can explore for insights following FAIR Principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on Tag.bio data mesh platform and learn how major pharma industries and university health systems are using this technology to promote value based healthcare, precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real world evidence datasets, privacy centric data products (confidential computing) as well as integration with cloud services
The document introduces Tag.bio as a low-code analytics application platform built from interconnected data products in a data mesh architecture. It consists of data, algorithms, and analysis apps contributed by different groups - data engineers, data scientists, and domain experts. The platform can integrate various data sources and enable collaboration between groups. It then provides demos of the Tag.bio developer studio and data portal. Key capabilities discussed include integration with AWS services like AI/ML and HealthLake, as well as security features like confidential computing. Example use cases presented are for clinical trials, healthcare, life sciences, and universities.
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Inside Analysis
The Briefing Room with Robin Bloor and Pervasive Software
Slides from the Live Webcast on May 1, 2012
The old methods of delivering data for analysts and other business users will simply not scale to meet new demands. Hadoop is rapidly emerging as a powerful and economic platform for storing and processing Big Data. And yet, the biggest obstacle to implementing Hadoop solutions is the scarcity of Hadoop programming skills.
Check out this episode of The Briefing Room to learn from veteran Analyst Robin Bloor, who will explain why modern information architectures must embrace the new, massively parallel world of computing as it relates to several enterprise roles: traditional business analysts, data scientists, and line-of-business workers. He'll be briefed by David Inbar and Jim Falgout of Pervasive Software, who will explain how Pervasive RushAnalyzer™ was designed to accommodate the new reality of Big Data.
For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E
Querying open data with R - Talk at April SheffieldR Users GpPaul Richards
Presentation given at the April SheffieldR meeting by Paul Richards, looking at how R fits into the open data philosophy and a few examples of packages to query open datasets
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
Each of today’s most forward-thinking enterprises has been forced to face similar data challenges: the reliance on real-time data to better serve their customers and, subsequently, the requirement of complying with regulations to protect that data – one example being the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one – for companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem and collaboration with an active open source community.
In this joint presentation, John Mertic (director of program management for ODPi) and Ferd Scheepers (Global Chief Information Architect of ING) will address the benefits of a vendor-neutral approach to data governance and the need for an open metadata standard, along with insight into how companies such as ING, IBM, Hortonworks and more are delivering solutions to this challenge as an open source initiative.
Speakers
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
Maryna Strelchuk, Information Architect, ING
Agile development of data science projects | Part 1 Anubhav Dhiman
This document discusses agile development of data science projects. It begins by defining data science as focusing on predicting, prescribing, or explaining something, distinct from business intelligence which focuses on reporting past events. It notes data science encompasses quantitative research, advanced analytics, predictive modeling, and machine learning. It then discusses how reliably data science teams can deliver value, showing a data science readiness level chart ranging from algorithm design to proven systems. The rest of the document discusses collaborating across teams and organizations to move from initial concepts to specific, integrated predictive systems.
The document provides information about a course on Big Data Analytics taught at Malla Reddy College of Engineering & Technology. It includes 5 units that will be covered: Introduction to Big Data and Analytics, Introduction to Technology Landscape, Introduction to MongoDB and MapReduce Programming, Introduction to Hive and Pig, and Introduction to Data Analytics with R. The course aims to introduce students to big data tools and information standard formats. It will cover topics such as structured and unstructured data, Hadoop, MongoDB, MapReduce, Hive, Pig, and machine learning algorithms.
The document provides information about a course on Big Data Analytics taught at Malla Reddy College of Engineering & Technology. It includes 5 units that will be covered: Introduction to Big Data and Analytics, Introduction to Technology Landscape, Introduction to MongoDB and MapReduce Programming, Introduction to Hive and Pig, and Introduction to Data Analytics with R. The course aims to introduce students to big data tools and information standard formats to help them design data for analytics and work with tools like Hadoop, Scala, and machine learning algorithms.
Knowledge Graphs: Changing How We Think About DataTim Williams
An introduction to Knowledge Graphs for FDA Department of Regulatory Science Technology Forum 2020-08-17. Includes SPARQL querying, SHACL, 3D visualization.
The document discusses transforming information into a liquid form and channeling liquid insights to the right people. It describes challenges with existing enterprise content management restricting access and increasing complexity from partnerships and information sources. The proposed approach is to create a centralized information hub and dashboard that simplifies access to information through search capabilities and links across disparate data sources using graph computing and controlled vocabularies. This will provide a 360-degree view of information and enable high accessibility, linkage, and flow of information to collaborators.
This document discusses opportunities for using the open source cBioPortal platform in a commercial setting. It summarizes The Hyve's experiences supporting cBioPortal for the Center for Translational Molecular Medicine's TraIT project. The Hyve provides professional support for open source bioinformatics software like cBioPortal through software development, data services, consultancy, and hosting. For translation projects, The Hyve employs a phased approach including definition, pilot, implementation, and evaluation phases to implement cBioPortal and demonstrate its capabilities for data integration and analysis.
This presentation was provided by Mark Hahnel of Figshare, during the NISO Hot Topic Virtual Conference "Building Access, Openness, and Sharing." The event was held on Wednesday, September 28, 2022.
The document discusses big data testing using the Hadoop platform. It describes how Hadoop, along with technologies like HDFS, MapReduce, YARN, Pig, and Spark, provides tools for efficiently storing, processing, and analyzing large volumes of structured and unstructured data distributed across clusters of machines. These technologies allow organizations to leverage big data to gain valuable insights by enabling parallel computation of massive datasets.
The document discusses the rise of big data and how organizations can leverage it. It defines big data as data that cannot be analyzed with traditional tools due to its large volume, velocity, and variety. It describes how technological advances have led to more data being generated and collected from a variety of sources. The document advocates that organizations must find ways to analyze all this data to gain valuable insights that can improve decision making, customer experiences, and business strategies. It provides several examples of how companies in different industries have successfully used big data analytics.
Similar to The crusade for big data in the AAL domain (20)
EIP-AHA: Towards Platform InteroperabilityAALForum
The document summarizes two sessions from an EIP-AHA meeting focused on platform interoperability. The first session presented requirements from various organizations, including a need for unified IoT services across homes, open service platforms, and interoperability profiles. The second session featured presentations on different platforms and architectures, including Allseen, OneM2M, FIWARE, and universAAL. Discussions addressed issues like scalability, liability, quality of service, and privacy. Participants agreed more work is needed on semantic and platform interoperability, and that continued discussions could help progress these issues in the context of future IoT initiatives.
Smart engagement for smart solutions: innovative methods of involving users i...AALForum
Presentation by Vesna Dolničar and Edwin Mermans during the session 'Smart engagement for smart solutions: innovative methods of involving users in developing ICT for AAL' (Vesna Dolničar and Edwin Mermans) - AAL Forum 2015
Requirements meet solutions: How to successfully transfer stakeholder needs i...AALForum
Presentation by Markus Garschall, Katja Neureiter, Mona Marill, Christiane Moser and Lex van Velsen during the session 'Requirements meet solutions: How to successfully transfer stakeholder needs in AAL projects' (Markus Garschall) - AAL Forum 2015
Visual Monitoring of People in Private SpacesAALForum
Presentation by Francisco Flórez-Revuelta during the session 'Monitoring People in Private Spaces: technological advances and societal issues' (Francisco Flórez-Revuelta) - AAL Forum 2015
Unobtrusive monitoring of patients with dementia in nursing homes facilities:...AALForum
Presentation by Carlos Chiatti, Susanna Spinsante, Ennio Gambi, Lorena Rossi and Laura Raffaeli during the session 'Monitoring People in Private Spaces: technological advances and societal issues' (Francisco Flórez-Revuelta) - AAL Forum 2015
Legal considerations on the use of monitoring systems at homeAALForum
Presentation by Griet Verhenneman during the session 'Monitoring People in Private Spaces: technological advances and societal issues' (Francisco Flórez-Revuelta) - AAL Forum 2015
Interoperability defined by its reason d'êtreAALForum
Presentation by Paul Valckenaers and Patrick De Maziére during the workshop Interoperability defined by its reason d'être by Paul Valckenaers - AAL Forum 2015
- Middelpunt is a holiday care center in Belgium that provides accommodation and care assistance for people with disabilities. It aims to be a hotel run by and for people with disabilities.
- The 44 room facility cost €7.2 million to build and provides family rooms, rooms for groups, and full accessibility. Care assistance is provided through partnerships with local medical services.
- The business plan focuses on room reservations as the core activity while ensuring costs such as infrastructure, replacements, and sustainable energy are managed over the 27 year period. The goal is for Middelpunt to become a sustainable social enterprise providing holiday care.
From construction & concrete to valued and smart ageingAALForum
Presentation by Piet Verhoeve during the workshop From construction & concrete to valued smart ageing inspired by the PRoF consortium by Piet Verhoeve - AAL Forum 2015
Presentation by Serge Lefevere during the workshop From construction & concrete to valued smart ageing inspired by the PRoF consortium by Piet Verhoeve - AAL Forum 2015
ICT for Active and Healthy Ageing: Requirements for platforms and interoperab...AALForum
This document discusses ICT solutions for active and healthy aging. It outlines several key areas that ICT can support, including health monitoring, prevention, well-being, social inclusion, and daily living. The European Innovation Partnership on Active and Healthy Ageing aims to improve quality of life for European citizens through collaborative innovation in healthcare and a sustainable care system. The partnership focuses on specific actions across sectors to engage stakeholders and support three pillars: prevention, care and cure, and independent living and active aging. The document also discusses requirements for platforms and interoperability to effectively deliver ICT aging solutions.
Research, Monitoring and Evaluation, in Public Healthaghedogodday
This presentation gives an overview of the role of monitoring and evaluation in public health. It describes the various components and how a robust M&E system can positively impact the results or effectiveness of a public health intervention.
Test bank clinical nursing skills a concept based approach 4e pearson educati...rightmanforbloodline
Test bank clinical nursing skills a concept based approach 4e pearson education
Basics of Electrocardiogram
CONTENTS
●Conduction System of the Heart
●What is ECG or EKG?
●ECG Leads
●Normal waves of ECG.
●Dimensions of ECG.
● Abnormalities of ECG
CONDUCTION SYSTEM OF THE HEART
ECG:
●ECG is a graphic record of the electrical activity of the heart.
●Electrical activity precedes the mechanical activity of the heart.
●Electrical activity has two phases:
Depolarization- contraction of muscle
Repolarization- relaxation of muscle
ECG Leads:
●6 Chest leads
●6 Limb leads
1. Bipolar Limb Leads:
Lead 1- Between right arm(-ve) and left arm(+ve)
Lead 2- Between right arm(-ve) and left leg(+ve)
Lead 3- Between left arm(-ve) and left leg(+ve)
2. Augmented unipolar Limb Leads:
AvR- Right arm
AvL- Left arm
AvF- Left leg
3. Chest Leads:
V1: Over 4th intercostal space near right sternal margin
V2: Over 4th intercostal space near left sternal margin
V3: In between V2 and V4
V4: Over left 5th intercostal space on the mid clavicular line
V5: Over left 5th intercostal space on the anterior axillary line
V6: Over left 5th intercostal space on the mid axillary line.
Normal ECG:
Waves of ECG:
P Wave
•P Wave is a positive wave and the first wave in ECG.
•It is also called as atrial complex.
Cause: Atrial depolarisation
Duration: 0.1 sec
QRS Complex:
•‘QRS’ complex is also called the initial ventricular complex.
•‘Q’ wave is a small negative wave. It is continued as the tall ‘R’ wave, which is a positive wave.
‘R’ wave is followed by a small negative wave, the ‘S’ wave.
Cause:Ventricular depolarization and atrial repolarization
Duration: 0.08- 0.10 sec
T Wave:
•‘T’ wave is the final ventricular complex and is a positive wave.
Cause: Ventricular repolarization
Duration: 0.2 sec
Intervals and Segments of ECG:
P-R Interval:
•‘P-R’ interval is the interval between the onset of ‘P’ wave and onset of ‘Q’ wave.
•‘P-R’ interval signifies atrial depolarization and conduction of impulses through the AV node.
Duration: 0.18 (0.12 to 0.2) sec
Q-T Interval:
•‘Q-T’ interval is the interval between the onset of ‘Q’ wave and the end of ‘T’ wave.
•‘Q-T’ interval indicates the ventricular depolarization and ventricular repolarization, i.e. it signifies the electrical activity in ventricles.
Duration: 0.4-0.42 sec
S-T Segment:
•‘S-T’ segment is the time interval between the end of ‘S’ wave and the onset of ‘T’ wave.
Duration: 0.08 sec
R-R Interval:
•‘R-R’ interval is the time interval between two consecutive ‘R’ waves.
•It signifies the duration of one cardiac cycle.
Duration: 0.8 sec
Dimension of ECG:
How to find the heart rhythm?
Regular rhythm:
Irregular rhythm:
More than or less than 4
How to find heart rate using ECG?
If heart rhythm is regular:
Heart rate = 300 / No. of large boxes between 2 QRS complexes
= 300/4
= 75 beats/min
How to find heart rate using ECG?
If heart rhythm is irregular:
Heart rate = 10 × No. of QRS complexes in 6 sec
5 large boxes = 1 sec
5 × 6 = 30 large boxes in 6 sec
10 × 7 = 70 beats/min
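The two rules above reduce to simple arithmetic. A minimal Python sketch, assuming (as the slide states) that 5 large boxes span 1 sec, so that 300 large boxes span one minute; the function names are illustrative and do not come from the slides:

```python
def heart_rate_regular(large_boxes_between_qrs):
    """Regular rhythm: 300 divided by the large boxes between two QRS complexes."""
    if large_boxes_between_qrs <= 0:
        raise ValueError("number of large boxes must be positive")
    # 5 large boxes = 1 sec, so 300 large boxes = 60 sec
    return 300 / large_boxes_between_qrs


def heart_rate_irregular(qrs_count_in_6_seconds):
    """Irregular rhythm: QRS complexes counted in a 6 sec strip, times 10."""
    if qrs_count_in_6_seconds < 0:
        raise ValueError("QRS count cannot be negative")
    # 6 sec is one tenth of a minute, hence the factor of 10
    return 10 * qrs_count_in_6_seconds


print(heart_rate_regular(4))    # the slide's regular example: 300/4
print(heart_rate_irregular(7))  # the slide's irregular example: 10 x 7
```

The irregular-rhythm rule averages over a 6 sec strip (30 large boxes), which is why it tolerates varying R-R intervals where the 300 rule does not.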
Abnormalities of ECG:
Cardiac Arrhythmias:
1.Tachycardia
Heart Rate more than 100 beats/min
English Drug and Alcohol Commissioners June 2024.pptxMatSouthwell1
Presentation made by Mat Southwell to the Harm Reduction Working Group of the English Drug and Alcohol Commissioners. Discusses stimulants, OAMT, NSP coverage and a community-led approach to DCRs, focussing on active drug user perspectives and interests.
2024 Media Preferences of Older Adults: Consumer Survey and Marketing Implica...Media Logic
When it comes to creating marketing strategies that target older adults, it is crucial to have insight into their media habits and preferences. Understanding how older adults consume and use media is key to creating acquisition and retention strategies. We recently conducted our seventh annual survey to gain insight into the media preferences of older adults in 2024. Here are the survey responses and marketing implications that stood out to us.
Digital Health in India_Health Informatics Trained Manpower _DrDevTaneja_15.0...DrDevTaneja1
Digital India will need a big trained army of Health Informatics educated & trained manpower in India.
Presently, generalist IT manpower does most of the work in the healthcare industry in India. Academic Health Informatics education is not readily available at school & health university level or IT education institutions in India.
We look into the evolution of health informatics and its applications in the healthcare industry.
HIMSS TIGER resources are available to assist Health Informatics education.
Indian Health universities, IT Education institutions, and the healthcare industry must proactively collaborate to start health informatics courses on a big scale. An advocacy push from various stakeholders is also needed for this goal.
Health informatics has huge employment potential and provides a big business opportunity for the healthcare industry. A big pool of trained health informatics manpower can lead to product & service innovations on a global scale in India.
The Importance of Black Women Understanding the Chemicals in Their Personal C...bkling
Certain chemicals, such as phthalates and parabens, can disrupt the body's hormones and have significant effects on health. According to data, hormone-related health issues such as uterine fibroids, infertility, early puberty and more aggressive forms of breast and endometrial cancers disproportionately affect Black women. Our guest speaker, Jasmine A. McDonald, PhD, an Assistant Professor in the Department of Epidemiology at Columbia University in New York City, discusses the scientific reasons why Black women should pay attention to specific chemicals in their personal care products, like hair care, and ways to minimize their exposure.
9. Data Integration in a Big Data Context
Open PHACTS Case Study
Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
alasdairjggray.co.uk
@gray_alasdair
10. Big Data
@gray_alasdair Big Data Integration 11
Volume Velocity
Variety Veracity
http://i.kinja-img.com/gawker-media/image/upload/lvzm0afp8kik5dctxiya.jpg
11. Open PHACTS Use Case
“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”
Chemical Properties (ChemSpider)
Launched drugs (Drugbank)
Human => Mouse (Homologene)
Protein Families (Enzyme)
Bioactivity Data (ChEMBL)
… other info (Uniprot/Entrez etc.)
12. Open PHACTS Mission:
Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point
16. OPS Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive responses
Production quality integration platform
Method Calls
Standard Web Technologies
17. App Ecosystem
An “App Store”?
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
https://www.openphacts.org/2/sci/apps.html
21. API Hits
[Chart: number of API hits per month, in millions (y-axis 0 to 60), January 2013 to June 2015. Annotations: public launch of 1.2 API, then 1.3 API, 1.4 API and 1.5 API.]
22. OPS Discovery Platform
[Architecture diagram; labels recovered from the slide]
Data sources: Public Content, Commercial, Public Ontologies, User Annotations (shown with VoID, Db and Nanopub labels)
Components: Core Platform, Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing, Domain Specific Services
Access: Linked Data API (RDF/XML, TTL, JSON), Apps
Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
24. Data Licensing (Or Lack Of!)
John Wilbanks consulted for us
A framework built around STANDARD well-understood Creative Commons licences, and how they interoperate
Deal with the problems by:
Interoperable licences
Appropriate terms
Declare expectations to users and data publishers
One size won’t fit all requirements
28. Identity Mapping
P12047
X31045
GB:29384
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
30. Gleevec®: Imatinib Mesylate
Drugbank, ChemSpider, PubChem
Imatinib, Mesylate, Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
Are these records the same?
It depends upon your task!
31. Structure Lens
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
I need to perform an analysis, give me details of the active compound in Gleevec.
32. Name Lens
skos:closeMatch (Drug Name)
skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
Which targets are known to interact with Gleevec?
37. Open PHACTS Approach
1. Know your audience
Web developers
2. Understand your use cases
Prioritised business questions
3. Identify access pathways
Identify data
Identify connections
Implement API
38. Questions
Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
alasdairjggray.co.uk
@gray_alasdair
Open PHACTS
contact@openphacts.org
openphacts.org
@open_phacts
45. Some tips!
Delay your judgement
Be open to naive and crazy ideas
Openness & enthusiasm
Use associative thinking
Piggyback on ideas of others
46. Selection of ideas
• Summarize 3 key ideas
• How to select?
– Keep the goal in mind!
– Think in opportunities
– What are you enthusiastic about?
– Personal engagement
– What is needed in the short term?
– Most promising
47. Selection of ideas
• 5 Votes
• Put your name & e-mail on the sheet if you want to be involved in working out the idea!
48. THANK YOU FOR YOUR TIME
Contact me @ Femke.Ongenae@intec.ugent.be
Editor's Notes
A trend is emerging towards AAL services that are truly personalized. Modern AAL services need to be adapted to the needs and preferences of care receivers and they need to accurately take into account context specificities. Moreover, modern AAL services need to be designed in a way such that they offer added value to the care process.
In order to achieve true personalization and to evaluate the design of the services, large data sets based on real-life context and profiles are needed.
Living lab environments, such as the Care Living Labs (Zorgproeftuinen) in Flanders, Karolinska Living Lab in Sweden and CASALA (Centre for Affective Solutions for Ambient Living Awareness) in Ireland, have been set up in recent years to enable the collection of such real-life context and profile data. The valorization and dissemination of context-aware and personalized AAL services could be significantly stimulated by allowing various parties to re-use these data sets in a user-friendly manner.
However, these data sets are not readily available for further research or the development of novel services, as several issues remain to be discussed with regard to a smart data sharing culture for AAL services, such as:
How to express which types of data are available from which living lab environment?
How do we achieve structured, exchangeable data?
How to maintain and express the quality and reliability of the data?
How can these different data sets easily be aligned?
How can these different data sets easily be shared and accessed, without too much effort?
How can these different data sets easily be shared and accessed, without legal constraints?
How to process and synthesize the data so that it is useful and usable by various stakeholders?
Can a payment model be set up for usage of the data and thus support the operation of the living lab?
What about the ethics and privacy related to these data sets?
Who or what should be the frontrunner in realizing this idea? How will this be organized?
What can we learn from other domains where sharing of big data sets has been made possible?
Deriving value from the data
Volume: More data than you can process – relative term; complexity of processing
Velocity: Data constantly being generated
Variety: Multiple sources, formats, models
Veracity: Accuracy of the data
Open PHACTS: has not dealt with Velocity, although it is a challenge for us
1 of 83 business driver questions
Took a team of 5 experienced researchers 6 hours to manually gather the answer
At the start of the project it couldn’t be answered by a computer system
6 months in: 30s with prototype
Now: sub-second
Pharma are all accessing, processing, storing & re-processing external research data
Big waste of resources
No competitive advantage
OPS: 29 partners including many major pharma
83 questions ranked and top 20 taken as target
18 of top 20
A platform for integrated pharmacology data
Relied upon by pharma companies
Public domain, commercial, and private data sources
Provides domain specific API
Making it easy to build multiple drug discovery applications: examples developed in the project
Not just in-house apps
Actively being used for different purposes
Public launch April 2013
Averaging 20 million hits a month from the start of 2015
38 million in the last 30 days
Heavy usage from pharma, academia, and biotech
500+ registered users
Import data into cache
Integration approach
Data kept in original model but cached centrally
API call translated to SPARQL query
Query expressed in terms of original data
Queries expanded by IMS to cover URIs of original datasets
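The query-expansion step in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EQUIVALENTS` table stands in for whatever the IMS returns, the example.org URIs are invented, and the query shape is hypothetical rather than an actual Open PHACTS query.

```python
# Hypothetical mapping table: a URI from one dataset mapped to the URIs
# that other datasets use for the same thing. In the platform these
# mappings would come from the IMS; the values here are invented.
EQUIVALENTS = {
    "http://example.org/dataset-a/compound1": [
        "http://example.org/dataset-b/c-17",
        "http://example.org/dataset-c/x9",
    ],
}


def expand_uris(uri, equivalents):
    """Return the input URI plus every URI mapped to it, sorted and de-duplicated."""
    return sorted({uri, *equivalents.get(uri, [])})


def build_query(uri, equivalents):
    """Build a SPARQL query whose VALUES clause covers all equivalent URIs."""
    values = " ".join(f"<{u}>" for u in expand_uris(uri, equivalents))
    return (
        "SELECT ?compound ?property ?value WHERE {\n"
        f"  VALUES ?compound {{ {values} }}\n"
        "  ?compound ?property ?value .\n"
        "}"
    )


print(build_query("http://example.org/dataset-a/compound1", EQUIVALENTS))
```

Because the data stays in each publisher's original model, the expansion happens on the identifiers in the query rather than by rewriting the cached data.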
Data provided by many publishers
Originally in many formats: relational, SD files and RDF
Worked closely with publishers
Data licensing was a major issue
Over 3 billion triples – 12 datasets
Hosted on beefy hardware; data in memory (aim)
Extensive memcaching
Pose complex queries to extract data
Interactions needed to satisfy use cases
Gradually added additional types of data and interactions
No standard units
Even in curated sources!
Feedback issues to data providers
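One way to picture the units problem is a small normalisation step. The sketch below is an assumption-laden illustration, not the project's method: it takes concentration values with a unit label, converts them to nanomolar, and raises an error for any unit it does not recognise so that the record can be reported back to the data provider. The function name and the unit table are hypothetical.

```python
# Conversion factors from common molar concentration units to nanomolar.
# The set of units handled here is an assumption for illustration.
TO_NANOMOLAR = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value, unit):
    """Convert a concentration to nanomolar, rejecting unknown units."""
    try:
        factor = TO_NANOMOLAR[unit]
    except KeyError:
        # Unknown units are surfaced rather than guessed at
        raise ValueError(f"unrecognised unit: {unit!r}") from None
    return value * factor


print(to_nanomolar(2, "uM"))
print(to_nanomolar(5, "nM"))
```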
Validation & Standardization Platform
Developed by Royal Society of Chemistry
http://bit.ly/NZF5VB
Example drug: Gleevec
Cancer drug for leukemia
Lookup in three popular public chemical databases
Different results
Chemistry is complicated, often simplified for convenience
Data is messy!
Are these records the same? It depends on what you are doing with the data!
Each captures a subtly different view of the world
Interested in physicochemical properties of Gleevec
Interested in biomedical and pharmacological properties
sameAs != sameAs: it depends on your point of view
Links relate individual data instances: source, target, predicate, reason.
Links are grouped into Linksets, each with a VoID header providing provenance and justification for the links.
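The idea that record equivalence depends on the task can be sketched as a lens that decides which link predicates to follow. This is a minimal illustration, assuming a strict "structure" lens that follows only skos:exactMatch and a relaxed "name" lens that also follows skos:closeMatch; the record identifiers, the `LENSES` table and the one-hop, symmetric treatment of links are simplifying assumptions, not the platform's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str
    reason: str


# Hypothetical lenses: which predicates count as "the same" for a task
LENSES = {
    "structure": {"skos:exactMatch"},
    "name": {"skos:exactMatch", "skos:closeMatch"},
}

# Invented example links between records in three datasets
LINKS = [
    Link("chemspider:A", "pubchem:A", "skos:exactMatch", "same InChI"),
    Link("chemspider:A", "drugbank:B", "skos:closeMatch", "same drug name"),
]


def equivalent_records(start, links, lens):
    """Return the start record plus records directly linked under the lens."""
    allowed = LENSES[lens]
    matches = {start}
    for link in links:
        if link.predicate not in allowed:
            continue
        if link.source == start:
            matches.add(link.target)
        elif link.target == start:
            matches.add(link.source)
    return matches


print(sorted(equivalent_records("chemspider:A", LINKS, "structure")))
print(sorted(equivalent_records("chemspider:A", LINKS, "name")))
```

Keeping the reason on each link is what lets a stricter or more relaxed lens be chosen at query time without changing the stored links.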
Open for anybody
API grouped into theme areas
Two phase interaction:
Resolve thing to identifier
Retrieve data about the identifier
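The two-phase interaction can be sketched with in-memory stand-ins for the two kinds of API call. Everything here is hypothetical: the function names, the example.org URI and the placeholder record are invented for illustration and do not reflect the actual Open PHACTS API.

```python
# Phase 1 stand-in: free-text label to identifier (invented values)
LABEL_INDEX = {
    "gleevec": "http://example.org/compound/1",
}

# Phase 2 stand-in: identifier to data about it (placeholder record)
RECORDS = {
    "http://example.org/compound/1": {"label": "Gleevec", "type": "compound"},
}


def resolve(text):
    """Phase 1: resolve a thing named in free text to an identifier, or None."""
    return LABEL_INDEX.get(text.strip().lower())


def retrieve(identifier):
    """Phase 2: retrieve the data held about an identifier."""
    return RECORDS[identifier]


def lookup(text):
    """Run both phases; return None when the text cannot be resolved."""
    identifier = resolve(text)
    if identifier is None:
        return None
    return retrieve(identifier)


print(lookup("Gleevec"))
print(lookup("unknown drug"))
```

Separating the phases means every later call works on an unambiguous identifier instead of a name.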
Sustainability
API -> queries
3 steps: we’ll do the first two now; the rest will be for after the workshop, for interested participants
Use the paper to write on, use the post-its, one idea per post-it
Make it easy for the moderator to group things