Studying Public Medical Images from Open Access Literature and Social Networks for Model Training and Knowledge Extraction
Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya
Multi-Label Modality Classification for Figures in Biomedical Literature
Athanasios Lagopoulos
CBMS 2017 presentation on multi-label modality classification for figures in biomedical literature. It presents three multi-label approaches to classifying biomedical figures from PubMed Central, and MEDIEVAL, a web application for searching PMC figures and filtering them by modality.
Advances in Learning Analytics and Educational Data Mining
MehrnooshV
This presentation covers the state of the art in Learning Analytics and Educational Data Mining. It was presented by Mehrnoosh Vahdat as the introductory tutorial of the Special Session 'Advances in Learning Analytics and Educational Data Mining' at the ESANN 2015 conference.
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
IJDKP
Knowledge Discovery in Databases (KDD) is the process of finding knowledge in massive amounts of data, with data mining at its core. Data mining can be used to mine understandable, meaningful patterns from large databases, and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process, which helps in crucial decision-making. Data mining works with a data warehouse, and the whole process is divided into an action plan to be performed on the data: selection, transformation, mining, and results interpretation. In this paper, we have reviewed the Knowledge Discovery perspective in data mining and consolidated different areas of data mining, its techniques, and its methods.
2011.10.10 Multi-Disciplinary Research Themes and Training
NUI Galway
Dr Diane Payne, Director of the Dynamics Lab, Geary Institute, University College Dublin talked about the Geary Institute in this seminar "Multi-Disciplinary Research Themes and Training" at the Whitaker Institute on 10th October 2011.
Learning Analytics: Seeking new insights from educational data
Andrew Deacon
CPUT Fundani TWT - 22 May 2014
Analytics is a buzzword that encompasses the analysis and visualisation of big data. Current interest results from the growing access to data and the many software tools now available to analyse this data in Higher Education, through platforms such as Learning Management Systems. This seminar provides an overview of current applications and uses of learning analytics and how it can help institutions of learning better support their learners. The illustrative examples look at institutional and social media data that together provide rich insights into institutional, teaching and learning issues. A few simple ways to perform such analytics in a context of Higher Education will be introduced.
Educational Data Mining/Learning Analytics issue brief overview
Marie Bienkowski
An overview of the Draft Issue Brief prepared by SRI International for the US Department of Education on Educational Data Mining and Learning Analytics.
Dmitry Vetrov. The Mathematics of Big Data: Tensors, Neural Networks, Bayesian ...
Yandex
A lecture by Dmitry Vetrov, one of Russia's best-known machine learning specialists, who heads the Department of Big Data and Information Retrieval at the Faculty of Computer Science, which operates at HSE with support from Yandex.
This is a North Central University PowerPoint presentation (EDR 8204-3). It is written in APA format, has been graded by an instructor (A), and includes references. Most education communities submit assignments to Turnitin, so remember to paraphrase.
Case Studies in Teaching and Learning with Social Media in Higher Education
Michael Johnson
In this session the presenters shared best practices in using social media by presenting data derived from multiple case studies at a large university in the western United States. The researchers will discuss the effects of these technologies on students’ learning experiences, general principles for successful use of social media, challenges encountered by their use, and ideas for improving the use of social media in higher education courses from both the instructor and student perspectives.
For more information on our cases, see http://spreadsheets5.google.com/a/byu.edu/ccc?key=tponeuwhMQ-XEY2p0c5i02A&hl=en
Connections b/w active learning and model extraction
Anmol Dwivedi
Code at https://github.com/anmold-07/Model-Extraction-with-RL
https://www.usenix.org/conference/usenixsecurity20/presentation/chandrasekaran
This paper formalizes model extraction and discusses possible defense strategies by drawing parallels between model extraction and an established area of active learning. In particular, the authors show that recent advancements in the active learning domain can be used to implement powerful model extraction attacks and investigate possible defense strategies.
Data Management Lab: Data mapping exercise instructions
IUPUI
Spring 2014 Data Management Lab: Session 1 Data mapping exercise instructions (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
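Presentation materials for the 2016 Winter Academic Conference of the Society for Educational Information (교육정보학회)
Date: January 14, 2016
Venue: Gwangju National University of Education
<Abstract>
Education using information and communication technology has so far taken place in learning environments built from large numbers of digital materials and content. Digital resources were at the center of these environments, and curriculum information was used only in a limited way, as a classification scheme within resource metadata. Such environments, however, are limited in their ability to assemble resources tailored to individual needs or to the learner's level. To overcome this limitation, we studied a model that publishes curricula and achievement standards as linked data and connects learning resources to them. The distinguishing feature of this model is an approach that reorganizes learning resources around competencies. Because achievement standards published as linked data are connected structurally and semantically, they can also serve as reference nodes when recommending individualized learning paths in the future. This work lays a foundation for linking to the achievement standards of other countries as well as Korea, and for searching and using rich learning resources through topic-level achievement standards.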
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
Sunayana Gawde
I made these slides for the 1st round of 2nd semester M Tech seminars. They are based on the work done by the DOCEAR team and on their research papers. I also referred to other material on mind maps to understand the concept.
On March 23, 2016, Prof. Henning Müller (HES-SO Valais-Wallis and Martinos Center) presented "Medical image analysis and big data evaluation infrastructures" at Stanford Medicine.
Presentation by Prof. Dr. Henning Müller.
Overview:
- Medical image retrieval projects
- Image analysis and 3D texture modeling
- Data science evaluation infrastructures (ImageCLEF, VISCERAL, EaaS – Evaluation as a Service)
- What comes next?
Automating Data Science over a Human Genomics Knowledge Base
Vaticle
# Automating Data Science over a Human Genomics Knowledge Base
Radouane Oudrhiri, the CTO of Eagle Genomics, will talk about how Eagle Genomics is building a platform for automating data science over a human genomics knowledge base. Rad will dive into the architecture of Eagle Genomics and discuss how Grakn serves as the knowledge base foundation of the system. Rad will also give a brief history of databases and semantic expressiveness, and show how Grakn fits into the big picture.
# Radouane Oudrhiri, CTO, Eagle Genomics
Radouane has extensive experience in leading world-class software and data-intensive system developments in industries ranging from Telecom to Healthcare, Nuclear, Automotive, and Financials. Radouane is a Lean/Six Sigma Master Black Belt specializing in high-tech, IT and Software engineering, and he is recognised as a leader and early adopter of Lean/Six Sigma and DFSS in IT and Software. He is a fellow of the Royal Statistical Society (RSS) and a member of the ISO Technical Committee (TC69: Applications of Statistical methods), where he is co-author of the Lean & Six Sigma Standard (ISO 18404) as well as the new standard under development (Design for Six Sigma). He is also part of the newly formed international Group on Big Data (nominated by BSI as the UK representative/expert). Radouane has also been Chair of the working group on Measurement Systems for Automated Processes/Systems within the ISPE (International Society for Pharmaceutical Engineering).
A Summary of Computational Social Science - Lecture 8 in Introduction to Comp...
Lauri Eloranta
Final lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
Lauri Eloranta
Third lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
The state of the art in integrating machine learning into visual analytics
Cagatay Turkay
Slides for my talk on our paper at EuroVis 2017 on the STAR track:
Endert, A., Ribarsky, W., Turkay, C., Wong, B.L., Nabney, I., Blanco, I.D. and Rossi, F., 2017, March. The state of the art in integrating machine learning into visual analytics. In Computer Graphics Forum.
http://openaccess.city.ac.uk/16739/
On April 11th 2016, Prof. Henning Müller (HES-SO Valais-Wallis and Martinos Center) presented "Challenges in medical imaging and the VISCERAL model" at the National Cancer Institute in Washington.
Introduction to Big Data and its Potential for Dementia Research
David De Roure
Presentation at Dementia Conference (Evington Initiative) held at Wellcome Trust, 22-23 October 2012. Acknowledgements to McKinsey & Company, also Tim Clark (MGH) and Iain Buchan (University of Manchester), for input to slides.
First Steps Towards a Risk of Bias Corpus of Randomized Controlled Trials. The risk of bias specifically pertains to systematic errors in the design, conduct, or reporting of a study that can potentially lead to a deviation from the true effect being measured.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Presentation of Anjani K. Dhrangadhariya (Institute of Information Systems, HES-SO Valais-Wallis, Sierre) at SPIE Medical Imaging 2020.
Presentation by Prof. Yann Bocchi of the Institute of Information Systems, HES-SO Valais-Wallis, at the TechnoArk 2020 conference on the theme of connected industry.
Maria Tootell (Oprisko)
Operational risks and the internal control system: the limits of such a system
Cyrille Reynard and Jean-Jaques Kohler (Oprisko)
Practical cases from risk management, applicable to the public or private sector
eGov Workshop – The added value of the internal control system
Creating an optimal travel plan is not an easy task, particularly for people with mobility disabilities, for whom even simple trips, such as eating out in a restaurant, can be extremely difficult. Many of their travel plans need to be made days or even months in advance, including the route and time of day to travel. These plans must take into account ways in which to navigate the area, as well as the most suitable means of transportation. In response to these challenges, this study was designed to develop a solution that used linked data technologies in the domains of tourism services and e-governance to build a smart city application for wheelchair accessibility. This smart phone application provides useful travel information to enable those with mobility disabilities to travel more easily.
Or, a few reflections on the behaviors of a strategic leader that seem to have no measurable value but certainly add high value for the team/company/organization.
After a short introduction presenting a definition of strategic leadership, this workshop follows, as its common thread, the 10 commonly accepted principles of strategic leadership (drawn from a large PWC study). For each of these principles, we will discuss with the participants both the behaviors that add (high) value and those that are rather toxic, then debate possible measurement indicators (or ones the participants have already tried).
The main objective is for each participant to reflect on their own strategic leadership and the value it brings to the company/organization, and at times to be challenged by the views of other participants.
We propose a novel imaging biomarker of lung cancer relapse from 3-D texture analysis of CT images. Three-dimensional morphological nodular tissue properties are described in terms of 3-D Riesz-wavelets. The responses of the latter are aggregated within nodular regions by means of feature covariances, which leverage rich intra- and inter-variations of the feature space dimensions. The obtained Riesz-covariance descriptors lie on a manifold governed by Riemannian geometry requiring specific geodesic metrics to locally approximate scalar products. The latter are used to construct a kernel for support vector machines (SVM). The effectiveness of the presented models is evaluated on a dataset of 92 patients with non-small cell lung carcinoma (NSCLC) and cancer recurrence information. Disease recurrence within a timeframe of 12 months could be predicted with an accuracy above 80, and highlighted the importance of covariance-based texture aggregation. At the end of the talk, computer tools will be presented to easily extract 3D radiomics quantitative features from PET-CT images.
As part of the Swiss Mobility Days held in Martigny (Switzerland) in April 2016, Yann Bocchi, professor at the Institute of Information Systems of HES-SO Valais-Wallis, presents the NOSE project (Nomadic, Modular and Scalable IT Ecosystem for Pervasive Sensing).
At the KNIME Berlin summit 2016, Prof. Dr. Dominique Genoud presented a novel way to implement a KNIME workflow that performs machine learning and signal processing on an Android platform. The use case was to detect soft falls (not from a standing position) using an Android watch. This application has a big impact on how we can automatically detect when elderly people fall from their bed or their chair. This work was originally based on the Master Thesis in Business Administration completed by Vincent Cuendet in 2015 at the HES-SO with the help of the FST (Fédération Suisse pour les Téléthèses), an organization that helps disabled and elderly people keep their autonomy.
Presented by Adrien Depeursinge, PhD, at MICCAI 2015 Tutorial on Biomedical Texture Analysis (BTA), Munich, Oct 5 2015.
Texture-based imaging biomarkers complement focal, invasive biopsy based biomarkers by providing information on tissue structure over broad regions, non-invasively, and repeatedly across multiple time points. Texture has been used to predict patient survival, tissue function, disease subtypes and genomics (imagenomics and radiogenomics). Nevertheless, several challenges remain, such as: the lack of an appropriate framework for multi-scale, multi-spectral analysis in 2D and 3D; localization uncertainty of texture operators; validation; and, translation to routine clinical applications.
Mocodis is a web application facilitating the transfer of skills between senior and junior associates. It can be used in companies, institutions to capitalize on the experience of older employees, or can be used to train employees top down. Mocodis automatically generates dynamic micro-courses combining text, audio and video resources, and uses an algorithm to analyze user satisfaction to produce better courses at the next request.
This paper reports on the findings of two quantitative studies and one qualitative study conducted among HES-SO undergraduate and graduate students. We have outlined the characteristics of the “digital natives” generation of students attending our courses and have submitted a sample of these students to an experiment using Google Glass, in order to assess whether this new device could meet the students’ expectations for accessing enriched learning resources. This paper also presents some thoughts regarding future research to be led in the field of innovative technologies and learning processes.
This work presents a data-intensive solution to predict photovoltaic (PV) energy production.
PV and other renewable sources have widely spread in recent years. Although those sources provide an environmentally-friendly solution, their integration is a real challenge in terms of power management as it depends on meteorological conditions. The ability to predict those variable sources considering meteorological uncertainty plays a key role in the management of the energy supply needs and reserves.
This paper presents an easy-to-use methodology to predict PV production using time series analyses and sampling algorithms. The aim is to provide a forecasting model to set the day-ahead grid electricity need. This information is useful for power dispatching plans and grid charge control. The main novelty of our approach is to provide an easy-to-implement and flexible solution that combines classification algorithms, to predict the PV plant efficiency considering weather conditions, with nonlinear regression, to predict weather forecast errors in order to improve prediction results.
The results are based on the data collected in the Techno-pôle’s microgrid in Sierre (Switzerland) described further in the paper.
The best experimental results have been obtained using hourly historical weather measures (radiation and temperature) and PV production as training inputs and forecast weather parameters as prediction inputs. Considering a 10-month dataset, and despite the presence of 17 missing days, we achieve a Percentage Mean Absolute Deviation (PMAD) of 20% in August and 21% in September. Better results could be obtained with a larger dataset, but as more historical data were not available, other months have not been tested.
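The abstract reports PMAD without defining it. As a minimal sketch, assuming the common definition (sum of absolute forecast errors divided by the sum of absolute actual values, times 100), the metric could be computed as follows. The function name `pmad` and the sample values are hypothetical and are not taken from the paper.

```python
def pmad(actual, forecast):
    """Percentage Mean Absolute Deviation under the assumed definition:
    100 * sum(|actual - forecast|) / sum(|actual|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("PMAD is undefined when all actual values are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_error / total_actual


if __name__ == "__main__":
    # Hypothetical hourly PV production (kWh) and day-ahead forecasts.
    measured = [10.0, 20.0, 30.0]
    predicted = [12.0, 18.0, 33.0]
    print(f"PMAD = {pmad(measured, predicted):.2f}%")
```

Normalizing by total production rather than per-sample values keeps the metric defined during hours with zero output, which is one reason this form suits PV data.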
Switzerland is one of the most desirable European destinations for Chinese tourists, so a better understanding of Chinese tourists is essential for successful business practices. In China, the largest and leading social media platform, Sina Weibo, has more than 600 million users. Weibo’s great market penetration suggests that tourism operators and markets need to understand how to build effective and sustainable communications on Chinese social media platforms. The goal of this research is to understand Chinese tourists’ behaviors and patterns in Switzerland by adopting a linked data approach on Sina Weibo, and to design a decision support system based on the findings.
How can social media be used to interpret the satisfaction of clients visiting a destination, based on real use cases?
Beyond communicating with clients, this analysis lets a resort examine the actual needs and expectations of tourists coming to the tourism hotspot (mountain biking, skiing, …). With a semantic approach, it is possible to know what the interests of tourists are when they are travelling in a specific region.
This training was given as part of the 1st Université de la Valeur, held at the University of Geneva from 31 August to 4 September 2015.
After a presentation of the principles of agile methodology and the main concepts of agile management, the participants worked on clarifying their values and aligning them with those of their company.
Training outline:
- Principles of agility
- Concepts of agile management
- Determining one's own values and those of the company
- Aligning values
- Exercise in co-creating the rest of the day (applying an agile tool)
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
Graph algorithms, like PageRank ... Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Opendatabay - Open Data Marketplace.pptx
Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Studying Public Medical Images from Open Access Literature and Social Networks for Model Training and Knowledge Extraction
1. Studying Public Medical Images from the Open Access
Literature and Social Networks for Model Training and
Knowledge Extraction
Vincent Andrearczyk
HES-SO, Switzerland
MMM 2020, 08.01.2020
Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya,
Roger Schaer, and Manfredo Atzori
2. Motivation
• Deep learning has been a driving force for
improving many applications of image analysis
• Complex networks require large amounts of
training data
- Data diversity is important for generalizability
• Most medical data sets have strong class
imbalances (rare diseases)
- Rare diseases require data from multiple centers
making the organization complex
• Many resources that include images have become
available in the past few years
- PubMed Central, TCIA, social networks, etc.
3. Objectives of this article
• Summarize existing approaches that harvest
public data
– Focusing on PubMed Central and social networks
• Highlight advantages and difficulties in exploiting
the data
– (+) Very diverse data
– (+) Rare cases are oversampled
– (-) Much pre-treatment and filtering is required
• Develop next steps required to fully use the data
5. PubMed Central
• Repository with the biomedical open access
literature, including images as files, etc.
– 3-4 images per article
– Increasing number of articles
6. Methodology for finding articles
• Analysis of tasks of ImageCLEF and work done
on these tasks using data from ImageCLEF
– Over the past 12 years
– Data-filtering steps were taken from this work
• Use of Google Scholar to add references
– Terms “medical image classification”, “publicly
accessible resources”, “medical literature”,
“machine learning” were combined
• Dynamically growing data sets were favored
• Journal papers were preferred over
conference publications
7. Image retrieval
• Allows searching for images with text
– Or semantic terms such as UMLS or MeSH
• Content-based image retrieval
Demner-Fushman, et al. (2012), Journal of Computing Science and Engineering
8. Structuring the visual content
• Define types of images to make the literature
images classifiable
– Extremely large variety in most categories
– Many sub-categories are possible
– Categories with clinical relevance
are most important
– Allows removing noise
– Compound figures
are separately treated
[ImageCLEF 2013]
10. Challenges in the data
• Look-alikes
– Much strange content that needs to be removed
• Compound figures cannot easily be classified,
as they may contain aspects of several classes
– Cutting them into subfigures makes content
accessible
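One simple way to cut a compound figure into parts is to look for uniform white gutters between panels. The sketch below illustrates that idea only; it is not the method used in the cited work. The function name `split_on_white_columns`, the `white` threshold and the `min_gap` width are hypothetical choices, and the image is assumed to be grayscale, given as rows of 0-255 values.

```python
from typing import List, Tuple

def split_on_white_columns(
    image: List[List[int]], white: int = 250, min_gap: int = 2
) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of panels; `end` is exclusive.

    Panels are runs of non-blank columns separated by at least
    `min_gap` consecutive blank (near-white) columns.
    """
    if not image or not image[0]:
        return []
    width = len(image[0])
    # A column is blank if every pixel in it is at least `white`.
    blank = [all(row[x] >= white for row in image) for x in range(width)]

    panels: List[Tuple[int, int]] = []
    start = None  # first column of the panel currently being scanned
    gap = 0       # number of consecutive blank columns seen inside it
    for x in range(width):
        if not blank[x]:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The gutter is wide enough: close the panel before it.
                panels.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        # Close the last panel, dropping any trailing blank columns.
        panels.append((start, width - gap))
    return panels
```

Real compound figures also have horizontal gutters, panel labels and non-white backgrounds, so a production pipeline would need more than this column scan.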
11. Metadata available for PMC
• Text of the figure caption
– Relatively specific but often short
– Hard for compound figures that contain many parts
• Full text of the article
– Non-specific for individual figures
– Location of the figure is available
• Article title and author-generated key words
• Global MeSH terms (manually attached)
– Cover species and organs
• Not all metadata are available for every article (incomplete)
12. Tasks to make figures accessible
• Remove very small images and images with strange
aspect ratios
• Classify figures into figure types
– Using image data and also text
– Remove non-relevant images, e.g. flowcharts
• Detect and cut compound figures into their parts
– Classify these into figure types again
• Filter human and animal tissue
• Filter specific organs of interest
• Find diseases or grading/staging
– Ground truth classes for machine learning
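The first task in the list above, discarding very small images and images with unusual aspect ratios, can be sketched as a simple predicate on image dimensions. The function name `keep_image` and the thresholds `min_side` and `max_aspect` are illustrative assumptions; the slides do not state the values actually used.

```python
def keep_image(
    width: int, height: int, min_side: int = 100, max_aspect: float = 4.0
) -> bool:
    """Return True if an image passes the size and aspect-ratio filter.

    min_side and max_aspect are hypothetical thresholds, not values
    taken from the presentation.
    """
    if width <= 0 or height <= 0:
        return False
    # Reject very small images (icons, logos, thumbnails).
    if min(width, height) < min_side:
        return False
    # Reject very elongated images (banners, colour bars, rulers).
    aspect = max(width, height) / min(width, height)
    return aspect <= max_aspect


# Usage: filter a list of (name, width, height) records.
figures = [("a.jpg", 512, 512), ("b.jpg", 50, 400), ("c.jpg", 1200, 150)]
kept = [name for name, w, h in figures if keep_image(w, h)]
```

Applying this cheap check first keeps the later, more expensive classification steps from running on images that would be discarded anyway.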
13. Advantages of literature images
• Rare images are generally used for articles and
case descriptions
– Mostly extreme cases to share the knowledge
on them
– Creates critical mass for rare diseases
• Images are from many laboratories and thus
contain many image variations
– Increase generalizability of learned models
• Exponentially increasing content
14. Problems with filtered images
• Many images might be missed by automatic
filtering
• Ground truth is not always solid
• Images might not have clinical quality
– Grey level resolution
– No information on level/window setting
– Cropped images, arrows in images, other overlays
• Size of the images is often small for publications
• Scale of images is not known (can be detected)
Otalora et al. (2018) MICCAI 2018
15. An example of Twitter images
• Images and information posted by pathologists on
Twitter
• Create dataset of histopathology images
• Train machine learning algorithms
– identify stains (H&E, IHC ...)
– discriminate between different tissues
– predict malignant tumors
• Limitations:
– good results (AUROC 0.9) only for simple tasks: H&E
vs rest
Schaumberg et al. (2018), BioRxiv
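For readers unfamiliar with the AUROC figure quoted above: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The label convention (1 for the positive class, for example H&E, and 0 for the rest) is an assumption made for illustration, and the code is not taken from the cited study.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; for large datasets a rank-based implementation is preferable.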
16. Next steps
• Quickly increasing content offers many possibilities
– Automatic pipelines need to contain update
mechanisms based on latest imaging equipment
– Community efforts for data curation
• Distribute the class labels with confidence scores
via PMC
• Evaluate impact on machine learning tasks of
adding such diverse sources
17. Next steps
• We have been working on it!
– Mined out 32,486 light microscopy human rare
cancer images
Dhrangadhariya et al. (2020), SPIE 2020
– Automatic generalizable filtering pipeline
In preparation: Jimenez et al. (2020) Journal of the American Medical Informatics Association
– Benefits in deep learning clinical tasks … to come
18. Conclusions
• Images from public resources are complementary to
clinical images for machine learning
– Rare cases, much diversity
– Very large amount of data
• How can we obtain high quality annotations with
limited effort (for example via active learning)?
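As an illustration of the active learning idea mentioned above, one common strategy is uncertainty sampling: ask annotators to label only the images on which the current model is least certain. The sketch below ranks images by the entropy of their predicted class probabilities. The names `select_for_annotation` and `budget` are hypothetical, and this is one possible strategy, not one the authors state they used.

```python
import math
from typing import Dict, List, Sequence

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` image ids with the most uncertain predictions.

    predictions maps an image id to its predicted class probabilities.
    """
    ranked = sorted(
        predictions, key=lambda k: entropy(predictions[k]), reverse=True
    )
    return ranked[:budget]


# Usage: with a budget of 2, the near-certain image "a" is skipped.
preds = {"a": [0.99, 0.01], "b": [0.5, 0.5], "c": [0.7, 0.3]}
to_label = select_for_annotation(preds, budget=2)
```

After the selected images are annotated, the model is retrained and the selection is repeated, so annotation effort concentrates on the cases that are most informative.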
19. Contact
• More information can be found at
– http://medgift.hevs.ch/
– http://publications.hevs.ch
• Contact:
– vincent.andrearczyk@hevs.ch
– henning.mueller@hevs.ch