A podium abstract presented at AMIA 2016 Joint Summits on Translational Science. This discusses Data Café — A Platform For Creating Biomedical Data Lakes.
An update on the latest BioSharing work, including work with ELIXIR and NIH BD2K, our survey to assess user needs (530 replies), and the work on the recommender tool.
This is the presentation from the DMAH workshop, held in conjunction with VLDB'17. It describes my work during my stay at Emory BMI.
More information: https://kkpradeeban.blogspot.com/2017/08/on-demand-service-based-big-data.html
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data... – Databricks
Chesapeake Regional Information System for our Patients (CRISP) is a nonprofit healthcare information exchange (HIE) whose customers include states like Maryland and healthcare providers such as Johns Hopkins. CRISP’s work supports the local healthcare community by securely sharing the kind of data that facilitates care and improves health outcomes.
When the pandemic started, the Maryland Department of Health reached out to CRISP with a request: Get us the demographic data we need to track COVID-19 and proactively support our communities. As a result, CRISP employees spent long hours attempting to handle multiple data sources with complex data enrichment processes. To automate these requests, CRISP partnered with Slalom to build a data platform powered by Databricks and Delta Lake.
Using the power of the Databricks Lakehouse platform and the flexibility of Delta Lake, Slalom helped CRISP provide the Maryland Department of Health with near real-time reporting of key COVID-19 measures. With this information, Maryland has been able to track the path of the pandemic, target the locations of new testing sites, and ultimately improve access for vulnerable communities.
The work did not stop there—once CRISP’s customers saw the value of the platform, more requests started coming in. Now, nearly one year since the platform was created, CRISP has processed billions of records from hundreds of data sources in an effort to combat the pandemic. Notable outcomes from the work include hourly contact tracing with data already cross-referenced for individual risk factors, automated reporting on COVID-19 hospitalizations, real-time ICU capacity reporting for EMTs, tracking of COVID-19 patterns in student populations, tracking of the vaccination campaign, connecting Maryland MCOs to vulnerable people who need to be prioritized for the vaccine, and analysis of the impact of COVID-19 on pregnancies.
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki... – OpenAIRE
Presentation from a joint FREYA and OpenAIRE webinar "New developments in the field of Persistent Identifiers" (PIDs) that covers OpenAIRE Content Acquisition Policy, role of PIDs in OpenAIRE, OpenAIRE Guidelines and their objectives, use of PIDs for different kinds of entities and provides some examples.
This is module 6 in the EDI Data Publishing training course. In this module, you will learn how to create quality metadata and be introduced to the landscape of data repositories and their functions.
Role of PIDs in connecting scholarly works – OpenAIRE
Presentation from a joint webinar FREYA and OpenAIRE: New developments in the field of Persistent Identifiers by Dr. Amir Aryani, Director, Research Graph Foundation
Presentation from a joint FREYA and OpenAIRE webinar "New developments in the field of Persistent Identifiers" (PID) on FREYA-WP3: New PID developments by Ketil Koop-Jakobsen, PANGAEA, Bremen University, Germany
9 facts about Statice's data anonymization solution – Statice
Are you wondering if Statice has the right synthetic data solution for your needs? In this post, we discuss some of the advantages of working with our software. From integration to evaluation, our data anonymization solution has everything to fit your team’s requirements.
Data is produced at a phenomenal rate
Our ability to store data has grown
Users expect more sophisticated information
How?
Objective: Fit data to a model
Potential Result: Higher-level meta information that may not be obvious when looking at raw data
Similar terms
Exploratory data analysis
Data driven discovery
Deductive learning
As part of my CRM course, we had the opportunity to work on the various functions through which CRM enables organizations. The aspect we focused on was the data collection, storage, and access that CRM enables.
It Don’t Mean a Thing If It Ain’t Got Semantics – Ontotext
With the vast amounts of data surrounding enterprises and the challenge of turning those data into knowledge, meaning arguably lives in the systems of whoever holds the best database.
Turning data pieces into actionable knowledge and data-driven decisions takes a good and reliable database. The RDF database is one such solution.
It captures and analyzes large volumes of diverse data while also managing and retrieving every connection those data ever enter into.
In our latest slides, you will find out why we believe RDF graph databases work wonders with serving information needs and handling the growing amounts of diverse data every organization faces today.
Introduction
Domain Expert
Goal identification and Data Understanding
Data Cleaning
Missing values
Noisy Data
Inconsistent Data
Data Integration
Data Transformation
Data Reduction
Feature Selection
Sampling, Discretization
Role of Data Cleaning in Data Warehouse – Ramakant Soni
Data cleaning, also called data cleansing or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data.
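As a toy illustration of those operations (not from the deck itself; the table and rules are invented), a few lines of pandas cover deduplication, normalization of inconsistent values, and handling of missing and out-of-range entries:

```python
import pandas as pd

# Hypothetical messy table; the columns and rules are illustrative only.
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age": [34, -5, -5, None, 210],           # negative/absurd ages are errors
    "state": ["MD", "md", "md", "VA", "MD"],  # inconsistent casing
})

df = df.drop_duplicates()                          # remove duplicate records
df["state"] = df["state"].str.upper()              # normalize inconsistent values
df["age"] = df["age"].fillna(df["age"].median())   # impute missing values
df = df[df["age"].between(0, 120)]                 # drop out-of-range ages
print(df)
```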
Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations – Nitish Aggarwal
This presentation describes three contributions of my PhD work:
1. Distributional Semantics for Entity Relatedness (DiSER)
2. Wikipedia Features for Entity Recommendations (WiFER)
3. Non-Orthogonal Explicit Semantic Analysis (NESA) for Word Relatedness
Further, it presents some of our work in collaboration with IBM Watson and Yahoo Research.
Presented at the Workshop on the Potential of Social Media Tools and Data in the News Media Industry (SocMedNews) of the 6th International Conference on Weblogs and Social Media (ICWSM 12).
Linked data in the digital humanities skills workshop for realising the oppo... – jodischneider
This workshop will introduce participants to Linked Data, a key semantic web technology, and its uses in the digital humanities. Through examples of Linked Data websites and applications, we will explore how Linked Data is being used by individual digital humanities scholars, by organisations such as the BBC and the Central Statistics Office, and by cultural heritage institutions worldwide. We will make comparisons to other approaches to structuring data (including markup and metadata approaches such as TEI and XML) and discuss best practices for creating and reusing Linked Data (such as the importance of identifiers and standard vocabularies). Participants will also be introduced to tools for creating and exploring Linked Data. The workshop will also include a hands-on exercise in creating Linked Data.
Linked Data in the Digital Humanities was a Skills Workshop (http://dri.ie/skills-workshops), part of Realising the Opportunities of Digital Humanities (http://dri.ie/realising-opportunities-digital-humanities).
Presenters: Jodi Schneider and Michael Hausenblas
with support from Stefan Decker, Nuno Lopes, and Bahareh Heravi, all of the Digital Enterprise Research Institute, National University of Ireland Galway
Brief presentation on social media conversations at the 4th Research Data Alliance (RDA) Plenary. Amsterdam, Sept. 21-24, 2014.
For more information see https://rd-alliance.org/plenary-meetings/fourth-plenary/communications-social-media.html
Protein-Protein Interaction using SVM based kernel, Jacob Coefficient and Gene... – Ronak Shah
Protein-protein interactions discovered by existing high-throughput techniques contain a very high proportion of false positives. Here we present an SVM-based approach to generate a model built on sequence and non-sequence based information about the interacting proteins. This model is used to assess the reliability of given protein-protein interactions. It was run on the interaction data of a pathogenic bacterium, Treponema pallidum (which causes syphilis in humans), obtained from yeast two-hybrid experiments. Various kernels were used for building the model; of these, the sigmoid kernel performed best when used with all the features combined, with an area under the receiver operating characteristic (ROC) curve of 0.53.
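A rough sketch of that general recipe, with synthetic data standing in for the real sequence and non-sequence features (the feature set, sizes, and split are invented; this is not the authors' actual pipeline):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Invented feature matrix: each row describes one candidate protein pair
# (e.g., sequence similarity, co-expression, shared annotations).
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = rng.integers(0, 2, 500)  # 1 = true interaction, 0 = false positive

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="sigmoid", probability=True).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # reliability score per pair
print("ROC AUC:", roc_auc_score(y_te, scores))
```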
Data centers offer computational resources with various levels of guaranteed performance to the tenants, through differentiated Service Level Agreements (SLA). Typically, data center and cloud providers do not extend these guarantees to the networking layer. Since communication is carried over a network shared by all the tenants, the performance that a tenant application can achieve is unpredictable and depends on factors often beyond the tenant’s control.
We propose ViTeNA, a Software-Defined Networking-based virtual network embedding algorithm and approach that aims to solve these problems by using the abstraction of virtual networks. Virtual Tenant Networks (VTN) are isolated from each other, offering virtual networks to each of the tenants, with bandwidth guarantees. Deployed along with a scalable OpenFlow controller, ViTeNA allocates virtual tenant networks in a work-conserving system. Preliminary evaluations on data centers with tree and fat-tree topologies indicate that ViTeNA achieves both high consolidation on the allocation of virtual networks and high data center resource utilization.
2016 07 12_purdue_bigdatainomics_seandavis – Sean Davis
Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data including concepts of data sharing, hypothesis-driven vs generating research, and the potential to expand our thinking on biomedical research to be much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
Industry Report: The State of Customer Data Integration in 2013 – Scribe Software Corp.
A report from Scribe Software, based on a survey of over 900 businesses worldwide, finds that customer data integration has become a core business issue as organizations struggle to attain the ideal of the connected enterprise and drive business value from IT investments while managing increasingly complex IT environments. “Businesses are struggling to reach the connected enterprise nirvana,” noted Lou Guercia, CEO of Scribe. “With the continued move to cloud and complex hybrid environments, the lack of integration between these systems is becoming clearer and significantly slowing business value.”
High Performance Data Analytics and a Java Grande Run Time – Geoffrey Fox
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However, the same is not so true for data-intensive problems, even though commercial clouds devote many more resources to data analytics than supercomputers devote to simulations.
Here we use a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures.
We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on the Apache software stack that is well used in modern cloud computing.
We give some examples including clustering, deep-learning and multi-dimensional scaling.
One suggestion from this work is the value of a high-performance Java (Grande) runtime that supports both simulations and big data.
Big Data at Geisinger Health System: Big Wins in a Short Time – DataWorks Summit
Geisinger Health System is well known in the healthcare community as a pioneer in data and analytics. We have had an Electronic Health Record (EHR) since 1996, and an Electronic Data Warehouse (EDW) since 2008. Much of daily and weekly operational reporting, as well as an abundance of ad hoc analytics, come from the EDW.
Approximately 18 months ago, the Data Management team implemented Hadoop in the Hortonworks Data Platform (HDP), and successes in implementation and development have proven to the organization that we should abandon the traditional EDW in favor of the Big Data (HDP) platform.
In less than 18 months, we stood up the platform, created a data ingestion pipeline, duplicated all source feeds from the EDW into HDP, and had several analytics developed with HDP and Tableau. Furthermore, we have exploited the new capabilities of the platform, where we use Natural Language Processing (NLP) to interrogate valuable (but previously hidden) clinical notes. The new platform has data that is modeled and governed, setting the stage to push Geisinger Health System from a pioneer to a leader in Big Data and Analytics.
This session will focus on Hortonworks Data Platform, covering data architecture, security, data process flow, and development. It is geared toward Data Architects, Data Scientists, and Operations/I.T. audiences.
Dataverse, Cloud Dataverse, and DataTags – Merce Crosas
Talk given at Two Sigma:
The Dataverse project, developed at Harvard's Institute for Quantitative Social Science since 2006, is a widely used software platform to share and archive data for research. There are currently more than 20 Dataverse repository installations worldwide, with the Harvard Dataverse repository alone hosting more than 60,000 datasets. Dataverse provides incentives to researchers to share their data, giving them credit through data citation and control over terms of use and access. In this talk, I'll discuss the Dataverse project, as well as related projects such as DataTags to share sensitive data and Cloud Dataverse to share Big Data.
Agile Big Data Analytics Development: An Architecture-Centric Approach – SoftServe
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Green Shoots: Research Data Management Pilot at Imperial College London – Torsten Reimer
This presentation by Ian McArdle and Torsten Reimer was given at the 10th International Digital Curation Conference in London (10th February 2015). It describes a "Green Shoots" research data management pilot programme at Imperial College London.
RDBMS gave us table schemas. A table schema, an essential metadata component, gave us the power to validate data types and enforce constraints. In the age of varying data and schema-less data stores, how can we enforce these rules, and how can we leverage metadata (even in RDBMS) to empower data validity, code checks, and automation?
This is a brief background on Big Data (the data lake), to put in context the importance of metadata from a governance perspective, especially in today's heterogeneous big data platforms.
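One way to recover table-schema-like guarantees over schema-less records is sketched below, assuming the `jsonschema` package; the record layout and constraints are invented for illustration:

```python
from jsonschema import ValidationError, validate

# Illustrative schema: the type and constraint metadata an RDBMS table
# definition would normally enforce for us.
RECORD_SCHEMA = {
    "type": "object",
    "required": ["id", "mrn", "age"],
    "properties": {
        "id":  {"type": "string"},
        "mrn": {"type": "string", "pattern": "^[0-9]{8}$"},
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
    },
}

record = {"id": "r1", "mrn": "123", "age": 47}   # mrn violates the pattern
try:
    validate(instance=record, schema=RECORD_SCHEMA)
except ValidationError as e:
    print("rejected:", e.message)
```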
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad... – DataWorks Summit
Businesses often have to interact with different data sources to get a unified view of the business or to resolve discrepancies. These EDW data repositories are often large and complex, are business critical, and cannot afford downtime. This session will share best practices and lessons learned for building a Data Fabric on Spark / Hadoop / HIVE / NoSQL that provides a unified view, enables simplified access to the data repositories, resolves technical challenges, and adds business value.
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa – larsgeorge
Keynote during BiDaTA 2013 in Genoa, a special track of the ADBIS 2013 conference. URL: http://dbdmg.polito.it/bidata2013/index.php/keynote-presentation
Using The Hadoop Ecosystem to Drive Healthcare Innovation – Dan Wellisch
Presentation delivered to the Chicago Technology For Value-Based Healthcare Meetup (https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/)
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
A Data Ecosystem to Support Machine Learning in Materials Science – Globus
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Ben Blaiszik from University of Chicago and Argonne National Laboratory Data Science and Learning Division.
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ... – Geoffrey Fox
Keynote at Sixth International Workshop on Cloud Data Management CloudDB 2014 Chicago March 31 2014.
Abstract: We introduce the NIST collection of 51 use cases and describe their scope over industry, government and research areas. We look at their structure from several points of view or facets covering problem architecture, analytics kernels, micro-system usage such as flops/bytes, application class (GIS, expectation maximization) and very importantly data source.
We then propose that in many cases it is wise to combine the well known commodity best practice (often Apache) Big Data Stack (with ~120 software subsystems) with high performance computing technologies.
We describe this and give early results based on clustering running with different paradigms.
We identify key layers where HPC Apache integration is particularly important: File systems, Cluster resource management, File and object data management, Inter process and thread communication, Analytics libraries, Workflow and Monitoring.
See
[1] A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures, Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha and Geoffrey Fox, accepted in IEEE BigData 2014, available at: http://arxiv.org/abs/1403.1528
[2] High Performance High Functionality Big Data Software Stack, G Fox, J Qiu and S Jha, in Big Data and Extreme-scale Computing (BDEC), 2014. Fukuoka, Japan. http://grids.ucs.indiana.edu/ptliupages/publications/HPCandApacheBigDataFinal.pdf
Hadoop for Bioinformatics: Building a Scalable Variant Store – Uri Laserson
Talk at Mount Sinai School of Medicine. Introduction to the Hadoop ecosystem, problems in bioinformatics data analytics, and a specific use case of building a genome variant store backed by Cloudera Impala.
Matching Data Intensive Applications and Hardware/Software Architectures – Geoffrey Fox
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However the same is not so true for data intensive problems even though commercial clouds presumably devote more resources to data analytics than supercomputers devote to simulations. We try to establish some principles that allow one to compare data intensive architectures and decide which applications fit which machines and which software.
We use a sample of over 50 big data applications to identify characteristics of data intensive applications and propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks. We consider hardware from clouds to HPC. Our software analysis builds on the Apache software stack (ABDS) that is well used in modern cloud computing, which we enhance with HPC concepts to derive HPC-ABDS.
We illustrate issues with examples including kernels like clustering, and multi-dimensional scaling; cyberphysical systems; databases; and variants of image processing from beam lines, Facebook and deep-learning.
Similar to Data Café — A Platform For Creating Biomedical Data Lakes
Google Summer of Code (GSoC) is a remote open-source internship program funded by Google, for contributors to remotely work with an open source organization (and get paid) over a summer.
https://kkpradeeban.blogspot.com/2022/11/google-summer-of-code-gsoc-2023.html
GSoC 2022 comes with more changes and flexibility. This presentation aims to give an introduction to the contributors and what to expect this summer.
https://kkpradeeban.blogspot.com/2022/01/google-summer-of-code-gsoc-2022.html
Niffler is an efficient DICOM framework for machine learning pipelines and processing workflows on metadata. It facilitates the efficient transfer of DICOM images, on demand and in real time, from PACS to research environments, to run processing workflows and machine learning pipelines.
https://github.com/Emory-HITI/Niffler/
This is an introductory presentation to GSoC 2021. This year there were a few specific changes to GSoC compared to past years. Specifically, the workload and the student stipend were halved in 2021 compared to previous years.
We propose Niffler (https://github.com/Emory-HITI/Niffler), an open-source ML framework that runs in research clusters by receiving images in real time, using the DICOM protocol, from hospitals' PACS.
This presentation aims to introduce GSoC to new mentors and mentoring organizations. More details - https://kkpradeeban.blogspot.com/2019/12/google-summer-of-code-gsoc-2020-for.html
An introductory presentation to Google Summer of Code (GSoC), focusing on the year 2020. More information can be found at https://kkpradeeban.blogspot.com/search/label/GSoC
The diversity of data management systems affords developers the luxury of building heterogeneous architectures to address the unique needs of big data. It allows one to mix-n-match systems that can store, query, update, and process data based on specific use cases. However, this heterogeneity brings with it the burden of developing custom interfaces for each data management system. Existing big data frameworks fall short in mitigating these challenges. In this paper, we present Bindaas, a secure and extensible big data middleware that offers uniform access to diverse data sources. By providing a RESTful web service interface to the data sources, Bindaas exposes query, update, store, and delete functionality of the data sources as data service APIs, while providing turn-key support for standard operations involving access control and audit-trails. The research community has deployed Bindaas in various production environments in healthcare. Our evaluations highlight the efficiency of Bindaas in serving concurrent requests to data source instances with minimal overheads.
This is the 2nd defense of my Ph.D. double degree.
More details - https://kkpradeeban.blogspot.com/2019/08/my-phd-defense-software-defined-systems.html
The presentation slides of my Ph.D. thesis. For more information - https://kkpradeeban.blogspot.com/2019/07/my-phd-defense-software-defined-systems.html
The presentation slides of my Ph.D. thesis proposal (known as the "CAT" at my university). I received a score of 18/20.
Supervisors:
Prof. Luís Veiga (IST, ULisboa)
Prof. Peter Van Roy (UCLouvain)
Jury:
Prof. Javid Taheri (Karlstad University)
Prof. Fernando Mira da Silva (IST, ULisboa)
This is my presentation at IFIP Networking 2018 in Zurich.
In this paper, we propose a cloud-assisted network as an alternative connectivity provider.
More details: https://kkpradeeban.blogspot.com/2018/05/moving-bits-with-fleet-of-shared.html
Services that access or process a large volume of data are known as data services. Big data frameworks consist of diverse storage media and heterogeneous data formats. Through their service-based approach, data services offer a standardized execution model to big data frameworks. Software-Defined Networking (SDN) increases the programmability of the network, by unifying the control plane centrally, away from the distributed data plane devices. In this paper, we present Software-Defined Data Services (SDDS), extending the data services with the SDN paradigm. SDDS consists of two aspects. First, it models the big data executions as data services or big services composed of several data services. Then, it orchestrates the services centrally in an interoperable manner, by logically separating the executions from the storage. We present the design of an SDDS orchestration framework for network-aware big data executions in data centers. We then evaluate the performance of SDDS through microbenchmarks on a prototype implementation. By extending SDN beyond data centers, we can deploy SDDS in broader execution environments.
https://kkpradeeban.blogspot.com/2018/04/software-defined-data-services.html
This is a poster I presented at ACRO Summer School at Karlstad University. This presents my PhD work.
More details: http://kkpradeeban.blogspot.com/2017/07/my-first-polygonal-journey.html
This is the presentation I did to the audience of EMJD-DC Spring Event 2017 Brussels to discuss my research. http://kkpradeeban.blogspot.be/2017/05/emjd-dc-spring-event-2017.html
Global launch of the Healthy Ageing and Prevention Index 2nd wave – alongside... – ILC-UK
The Healthy Ageing and Prevention Index is an online tool created by ILC that ranks countries on six metrics: life span, health span, work span, income, environmental performance, and happiness. The Index helps us understand how well countries have adapted to longevity and informs decision-makers on what must be done to maximise the economic benefits that come with living well for longer.
Alongside the 77th World Health Assembly in Geneva on 28 May 2024, we launched the second version of our Index, allowing us to track progress and give new insights into what needs to be done to keep populations healthier for longer.
The speakers included:
Professor Orazio Schillaci, Minister of Health, Italy
Dr Hans Groth, Chairman of the Board, World Demographic & Ageing Forum
Professor Ilona Kickbusch, Founder and Chair, Global Health Centre, Geneva Graduate Institute and co-chair, World Health Summit Council
Dr Natasha Azzopardi Muscat, Director, Country Health Policies and Systems Division, World Health Organisation EURO
Dr Marta Lomazzi, Executive Manager, World Federation of Public Health Associations
Dr Shyam Bishen, Head, Centre for Health and Healthcare and Member of the Executive Committee, World Economic Forum
Dr Karin Tegmark Wisell, Director General, Public Health Agency of Sweden
R3 Stem Cells and Kidney Repair: A New Horizon in Nephrology.pptx – R3 Stem Cell
"R3 Stem Cells and Kidney Repair: A New Horizon in Nephrology" explores groundbreaking advancements in the use of R3 stem cells for kidney disease treatment. This insightful piece delves into the potential of these cells to regenerate damaged kidney tissue, offering new hope for patients and reshaping the future of nephrology.
The dimensions of healthcare quality refer to various attributes or aspects that define the standard of healthcare services. These dimensions are used to evaluate, measure, and improve the quality of care provided to patients. A comprehensive understanding of these dimensions ensures that healthcare systems can address various aspects of patient care effectively and holistically. Dimensions of Healthcare Quality and Performance of care include the following; Appropriateness, Availability, Competence, Continuity, Effectiveness, Efficiency, Efficacy, Prevention, Respect and Care, Safety as well as Timeliness.
Leading the Way in Nephrology: Dr. David Greene's Work with Stem Cells for Ki... – Dr. David Greene Arizona
As we watch Dr. Greene's continued efforts and research in Arizona, it's clear that stem cell therapy holds a promising key to unlocking new doors in the treatment of kidney disease. With each study and trial, we step closer to a world where kidney disease is no longer a life sentence but a treatable condition, thanks to pioneers like Dr. David Greene.
CHAPTER 1 SEMESTER V PREVENTIVE-PEDIATRICS.pdf – Sachin Sharma
This content provides an overview of preventive pediatrics. It defines preventive pediatrics as preventing disease and promoting children's physical, mental, and social well-being to achieve positive health. It discusses antenatal, postnatal, and social preventive pediatrics. It also covers various child health programs like immunization, breastfeeding, ICDS, and the roles of organizations like WHO, UNICEF, and nurses in preventive pediatrics.
CHAPTER 1 SEMESTER V - ROLE OF PEADIATRIC NURSE.pdf – Sachin Sharma
Pediatric nurses play a vital role in the health and well-being of children. Their responsibilities are wide-ranging, and their objectives can be categorized into several key areas:
1. Direct Patient Care:
Objective: Provide comprehensive and compassionate care to infants, children, and adolescents in various healthcare settings (hospitals, clinics, etc.).
This includes tasks like:
Monitoring vital signs and physical condition.
Administering medications and treatments.
Performing procedures as directed by doctors.
Assisting with daily living activities (bathing, feeding).
Providing emotional support and pain management.
2. Health Promotion and Education:
Objective: Promote healthy behaviors and educate children, families, and communities about preventive healthcare.
This includes tasks like:
Administering vaccinations.
Providing education on nutrition, hygiene, and development.
Offering breastfeeding and childbirth support.
Counseling families on safety and injury prevention.
3. Collaboration and Advocacy:
Objective: Collaborate effectively with doctors, social workers, therapists, and other healthcare professionals to ensure coordinated care for children.
Objective: Advocate for the rights and best interests of their patients, especially when children cannot speak for themselves.
This includes tasks like:
Communicating effectively with healthcare teams.
Identifying and addressing potential risks to child welfare.
Educating families about their child's condition and treatment options.
4. Professional Development and Research:
Objective: Stay up-to-date on the latest advancements in pediatric healthcare through continuing education and research.
Objective: Contribute to improving the quality of care for children by participating in research initiatives.
This includes tasks like:
Attending workshops and conferences on pediatric nursing.
Participating in clinical trials related to child health.
Implementing evidence-based practices into their daily routines.
By fulfilling these objectives, pediatric nurses play a crucial role in ensuring the optimal health and well-being of children throughout all stages of their development.
Defecation
Normal defecation begins with movement in the left colon, moving stool toward the anus. When stool reaches the rectum, the distention causes relaxation of the internal sphincter and an awareness of the need to defecate. At the time of defecation, the external sphincter relaxes, and abdominal muscles contract, increasing intrarectal pressure and forcing the stool out
The Valsalva maneuver exerts pressure to expel faeces through a voluntary contraction of the abdominal muscles while maintaining forced expiration against a closed airway. Patients with cardiovascular disease, glaucoma, increased intracranial pressure, or a new surgical wound are at greater risk for cardiac dysrhythmias and elevated blood pressure with the Valsalva maneuver and need to avoid straining to pass the stool.
Normal defecation is painless, resulting in passage of soft, formed stool
CONSTIPATION
Constipation is a symptom, not a disease. Improper diet, reduced fluid intake, lack of exercise, and certain medications can cause constipation. For example, patients receiving opiates for pain after surgery often require a stool softener or laxative to prevent constipation. The signs of constipation include infrequent bowel movements (less than every 3 days), difficulty passing stools, excessive straining, inability to defecate at will, and hard feces.
IMPACTION
Fecal impaction results from unrelieved constipation. It is a collection of hardened feces wedged in the rectum that a person cannot expel. In cases of severe impaction the mass extends up into the sigmoid colon.
DIARRHEA
Diarrhea is an increase in the number of stools and the passage of liquid, unformed feces. It is associated with disorders affecting digestion, absorption, and secretion in the GI tract. Intestinal contents pass through the small and large intestine too quickly to allow for the usual absorption of fluid and nutrients. Irritation within the colon results in increased mucus secretion. As a result, feces become watery, and the patient is unable to control the urge to defecate. Normally an anal bag is safe and effective in long-term treatment of patients with fecal incontinence at home, in hospice, or in the hospital. Fecal incontinence is expensive and a potentially dangerous condition in terms of contamination and risk of skin ulceration
HEMORRHOIDS
Hemorrhoids are dilated, engorged veins in the lining of the rectum. They are either external or internal.
FLATULENCE
As gas accumulates in the lumen of the intestines, the bowel wall stretches and distends (flatulence). It is a common cause of abdominal fullness, pain, and cramping. Normally intestinal gas escapes through the mouth (belching) or the anus (passing of flatus)
FECAL INCONTINENCE
Fecal incontinence is the inability to control passage of feces and gas from the anus. Incontinence harms a patient’s body image
PREPARATION AND GIVING OF LAXATIVES ACCORDING TO POTTER AND PERRY
An enema is the instillation of a solution into the rectum and sigmoid colon.
Explore our infographic on 'Essential Metrics for Palliative Care Management' which highlights key performance indicators crucial for enhancing the quality and efficiency of palliative care services.
This visual guide breaks down important metrics across four categories: Patient-Centered Metrics, Care Efficiency Metrics, Quality of Life Metrics, and Staff Metrics. Each section is designed to help healthcare professionals monitor and improve care delivery for patients facing serious illnesses. Understand how to implement these metrics in your palliative care practices for better outcomes and higher satisfaction levels.
India Clinical Trials Market: Industry Size and Growth Trends [2030] Analyzed... – Kumar Satyam
According to TechSci Research report, "India Clinical Trials Market- By Region, Competition, Forecast & Opportunities, 2030F," the India Clinical Trials Market was valued at USD 2.05 billion in 2024 and is projected to grow at a compound annual growth rate (CAGR) of 8.64% through 2030. The market is driven by a variety of factors, making India an attractive destination for pharmaceutical companies and researchers. India's vast and diverse patient population, cost-effective operational environment, and a large pool of skilled medical professionals contribute significantly to the market's growth. Additionally, increasing government support in streamlining regulations and the growing prevalence of lifestyle diseases further propel the clinical trials market.
Growing Prevalence of Lifestyle Diseases
The rising incidence of lifestyle diseases such as diabetes, cardiovascular diseases, and cancer is a major trend driving the clinical trials market in India. These conditions necessitate the development and testing of new treatment methods, creating a robust demand for clinical trials. The increasing burden of these diseases highlights the need for innovative therapies and underscores the importance of India as a key player in global clinical research.
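For concreteness, the projection those two figures imply is simple compound growth (the 2030 value below is computed from the stated numbers, not quoted from the report):

```python
value_2024 = 2.05   # USD billions, as reported
cagr = 0.0864       # 8.64% per year
years = 6           # 2024 -> 2030

projected_2030 = value_2024 * (1 + cagr) ** years
print(f"~USD {projected_2030:.2f}B by 2030")  # roughly USD 3.37B
```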
Telehealth Psychology Building Trust with Clients.pptx – The Harvest Clinic
Telehealth psychology is a digital approach that offers psychological services and mental health care to clients remotely, using technologies like video conferencing, phone calls, text messaging, and mobile apps for communication.
Data Café — A Platform For Creating Biomedical Data Lakes
1. Data Café — A Platform For Creating Biomedical Data Lakes
Pradeeban Kathiravelu1,2, Ameen Kazerouni2, Ashish Sharma2
1 Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
2 Department of Biomedical Informatics, Emory University, Atlanta, USA
www.sharmalab.info
2. Data Landscape for Precision Medicine
DATA CHARACTERISTICS
• Large number of small datasets
• Structured … Semi-structured … Unstructured … Ill-formed
• Noisy and Fuzzy/Uncertain
• Spatial, Temporal relationships
DATA MANAGEMENT
• Variety in storage and messaging protocols
• No shared interface
3. Illustrative Use Case
Execute a Radiogenomics workflow on the diffusion images of GBM patients who received a TMZ + experimental regimen with an overall survival of 18 months or more.
PACS + EMR + AIM + RT + Molecular
4. Motivation
• Most current solutions require a DBA to initiate the migration of data into a Data Warehousing environment
• to query and explore all the data at once.
• Costly to set up such warehouses.
• Unified warehouse with access to query and explore the data.
• Limitations
• Scalability and extensibility to incorporate new data sources
• A priori knowledge of the data models of the different data sources.
5. BIOMEDICAL DATA LAKES
• Cohort Discovery and Creation — Assembled per-study
• Heterogeneous data collected in a loosely structured fashion.
• Agile and easy to create.
• Integrate with data exploration/visualization via REST APIs.
• Problem or hypothesis specific virtual data set.
• Powered by Drill + HDFS, Data Sources via APIs.
6. Data Café
• An agile approach to creating and extending the concept of a star schema
• to model a problem/hypothesis specific dataset.
• by leveraging Apache Drill to easily query the data.
• Tackles the limitations in the existing approaches.
• Provides researchers the ability to add new data models and sources.
7. Core Concepts
Step 1. Given a set of data sources, create a graphical representation of the join attributes. This graph represents how data is connected across the various data sources.
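As a rough sketch (not the paper's actual data structure), such a join-attribute graph can be kept as a simple edge map; the source and attribute names below are invented:

```python
from typing import Optional

# Each edge records the shared attribute on which two sources can be
# joined. Sources A..E echo the figure; the names here are made up.
JOIN_GRAPH = {
    ("clinical_A", "imaging_B"):  "patient_id",
    ("imaging_B",  "genomics_C"): "patient_id",
    ("genomics_C", "reports_D"):  "case_id",
    ("reports_D",  "registry_E"): "case_id",
}

def join_key(src1: str, src2: str) -> Optional[str]:
    """Return the attribute on which two sources join, if any."""
    return JOIN_GRAPH.get((src1, src2)) or JOIN_GRAPH.get((src2, src1))

print(join_key("imaging_B", "clinical_A"))  # patient_id
```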
8. Core Concepts
Step 2. Run a set of parallel queries on the data sources that include the attributes that are present in the query graph. In the top figure, our query is of type: {id1: A1 > x and B2 == y}. We run similar queries across C, D, and E and retrieve the set of relevant ids (join attributes).
9. Core Concepts
Step 3. Compute the intersection across the various ids (join attributes). The data of interest can now be obtained using the ids in this intersection. A subsequent query allows us to stream, in parallel, data from the individual sources, given the relevant ids (join attributes).
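A minimal end-to-end sketch of Steps 2 and 3 in plain Python; the in-memory sources and predicates stand in for real MongoDB/Hive queries and are purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory stand-ins for real sources (PACS, EMR, molecular stores).
SOURCES = {
    "A": [{"id1": 1, "A1": 20}, {"id1": 2, "A1": 5}, {"id1": 3, "A1": 30}],
    "B": [{"id1": 1, "B2": "y"}, {"id1": 3, "B2": "y"}, {"id1": 4, "B2": "n"}],
}

# Per-source predicates, mirroring the slide's {id1: A1 > x and B2 == y}.
PREDICATES = {
    "A": lambda r: r["A1"] > 10,
    "B": lambda r: r["B2"] == "y",
}

def query_ids(name):
    """Placeholder per-source query; a real one would be a MongoDB find,
    a Hive SQL statement, or a REST call returning matching join ids."""
    return {r["id1"] for r in SOURCES[name] if PREDICATES[name](r)}

# Step 2: run the queries against all sources in parallel.
with ThreadPoolExecutor() as pool:
    id_sets = list(pool.map(query_ids, SOURCES))

# Step 3: intersect; only these ids need be streamed from each source.
cohort_ids = set.intersection(*id_sets)
print(cohort_ids)  # {1, 3}
```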
11. Apache Drill
• Variety – Query a range of non-relational data sources.
• Flexibility.
• Agility – Faster Insights.
• Scalability.
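To make Drill's role concrete, the sketch below submits one ANSI SQL query spanning a MongoDB collection and a JSON file in HDFS through Drill's REST endpoint; the storage-plugin names (`mongo`, `dfs`), the paths, and the column names are all assumptions, not the deck's actual queries:

```python
import requests

# One ANSI SQL statement spanning a MongoDB collection and a raw JSON
# file in HDFS; Drill needs no prior schema registration for either.
SQL = """
SELECT c.patient_id, i.series_uid
FROM mongo.clinical.patients AS c
JOIN dfs.`/lake/imaging/index.json` AS i
  ON c.patient_id = i.patient_id
WHERE c.regimen LIKE 'TMZ%'
  AND c.overall_survival_months >= 18
"""

resp = requests.post(
    "http://localhost:8047/query.json",          # Drill REST endpoint
    json={"queryType": "SQL", "query": SQL},
)
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)
```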
12. Evaluation Environment
• Data Café was deployed along with the data sources and Drill in Amazon EC2.
• MongoDB instantiated in EC2 instances.
• Hive on Amazon EMR (Elastic MapReduce).
• EMR HDFS was configured with 3 nodes.
• Various datasets for evaluation
• Two synthetic datasets.
• Clinical Data from the TCGA BRCA collection
13. Results
• Quick creation of data lakes
• without prior knowledge of the data schema.
• Very fast execution of large queries
• with Apache Drill.
• Data Café can be an efficient platform for exploring an integrated data source.
• Integrated data source construction process may be time consuming.
• Less critical path.
• Done less frequently than the data queries from HDFS/Hive using Drill.
14. Conclusion
• A novel platform for integrating multiple data sources.
• Without a priori knowledge of the data models of the sources that are being integrated.
• Indices to do the actual integration
• Enables parallelizing the push of the actual data into HDFS.
• Apache Drill as a fast query execution engine that supports SQL.
• Currently ingesting data from TCGA.
15. Current State and Future Plans
• Ongoing efforts to evaluate the platform with diverse and heterogeneous data sources.
• Expanding to a larger multi-node distributed cluster.
• Integration with DataScope.
• Multiple data stores and larger data sets.
• Integration with imaging clients such as caMicroscope, as well as archives such as The Cancer Imaging Archive (TCIA).
16. Acknowledgements
Google Summer of Code 2015
NCIP/Leidos 14X138, caMicroscope — A Digital Pathology Integrative Query System; Ashish Sharma PI; Emory/WUSTL/Stony Brook
NCI U01 [1U01CA187013-01], Resources for development and validation of Radiomic Analyses & Adaptive Therapy; Fred Prior, Ashish Sharma (UAMS, Emory)
The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/