Brainomics is an open source management system for exploring and merging heterogeneous brain mapping data. It is based on CubicWeb, an open-source semantic web framework. Brainomics allows modeling of neuroimaging scans, questionnaires, genetic data, and other study information. It can handle large datasets with over 10,000 subjects. Future work includes improving support for transferring large files and better integration with reference databases.
A quick summary of some of the highlights of SXSWi 2009 for the Refresh Columbia meetup on March 25th, 2009. All sketch notes featured in the presentation were created by Mike Rohde of http://rohdesign.com and used with his permission.
This presentation was part of a workshop held at Arvetica. It is a general introduction to strategic thinking for those unfamiliar with the field: it walks through the schools of strategic thinking and gives a better understanding of the timeless strategy icons and management gurus of our time. Learn how their ideas apply to your business setting and your daily work in order to improve your strategic performance.
Discover BigQuery ML, build your own CREATE MODEL statement (Márton Kodok)
With BigQuery ML, you can build machine learning models without leaving the database environment and train them on massive datasets. In this demo session we demonstrate common marketing machine learning use cases and how to build, train, evaluate, and predict with your own scalable machine learning models using SQL in Google BigQuery, addressing the following use cases:
- Customer segmentation and product cross-sell recommendation
- Conversion/purchase prediction
- Inference with the more than 20 other built-in models
The audience will get first-hand experience of writing CREATE MODEL SQL syntax to build machine learning models such as multiclass logistic regression for classification, k-means clustering, matrix factorization, ARIMA time-series prediction, and more. Models are trained and accessed in BigQuery using SQL, a language data analysts know. This enables business decision-making through predictive analytics across the organization without leaving the query editor. In the end, the audience will learn how everyday developers can build, train, and run their own machine learning models straight from the database query editor by issuing CREATE MODEL statements.
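The CREATE MODEL shape discussed above can be templated from plain Python. The sketch below renders the documented `CREATE MODEL … OPTIONS(…) AS SELECT …` form; the dataset, model, and column names are hypothetical, and the helper itself is an illustration, not part of any BigQuery client library:

```python
def create_model_sql(model_path, model_type, training_query, **options):
    """Render a BigQuery ML CREATE MODEL statement as a string.

    `model_path` and `training_query` are placeholders to adapt to your
    own dataset; option keys must match BigQuery ML's OPTIONS names.
    """
    opts = {"model_type": model_type, **options}
    rendered = ", ".join(
        f"{k}={v!r}" if isinstance(v, str) else f"{k}={v}"
        for k, v in opts.items()
    )
    return (
        f"CREATE OR REPLACE MODEL `{model_path}`\n"
        f"OPTIONS({rendered}) AS\n"
        f"{training_query}"
    )

# Hypothetical churn-prediction model on a made-up customers table.
sql = create_model_sql(
    "mydataset.churn_model",
    "logistic_reg",
    "SELECT churned AS label, tenure, plan FROM mydataset.customers",
)
print(sql)
```

The resulting string would then be submitted like any other query, for example through the BigQuery console's query editor.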
Databricks MLflow demonstration using automatic feature engineering (Mohamed MEJDOUBI)
A demonstration of using the featuretools package to generate features and aggregates from raw relational data, and of using MLflow to track the entire model-building and hyperparameter-optimization process.
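What featuretools automates can be illustrated in miniature: the sketch below computes COUNT/SUM/MEAN aggregates of a child table per parent row, the kind of primitives deep feature synthesis derives automatically. The table and column names are made up for illustration, and only the standard library is used:

```python
from collections import defaultdict
from statistics import mean

# Raw relational data: a parent table (customers) and a child table (orders).
customers = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

def aggregate_features(parents, children, key, value, aggs):
    """Group child rows under each parent and compute aggregate features."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])
    out = []
    for p in parents:
        values = grouped.get(p["id"], [])
        feats = dict(p)  # copy parent columns, then append aggregates
        for name, fn in aggs.items():
            feats[f"{name}_{value}"] = fn(values) if values else 0
        out.append(feats)
    return out

features = aggregate_features(
    customers, orders, key="customer_id", value="amount",
    aggs={"count": len, "sum": sum, "mean": mean},
)
print(features)
```

A real pipeline would hand such feature matrices to a model and log each variant's hyperparameters and scores in MLflow.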
On the relation between Model View Definitions (MVDs) and Linked Data technol... (Ana Roxin)
This white paper outlines the proposals from the Linked Data Working Group (LDWG) on how technologies and approaches that are common to the domains of Semantic Web, Linked Data, and the Web of Data (hereafter jointly called ‘Linked Data’) are related to Model View Definitions (MVDs). After a brief introduction of both the MVD concept (Section 1) and linked data technologies (Section 2), two main topics are discussed:
● Technical: handling MVDs with Linked Data technologies (Section 3)
● Industrial use cases: making the most of the traditional MVD approach and Linked Data technologies (Section 4)
The purpose of this white paper is to discuss how Linked Data technologies and approaches could be effectively deployed to support industrial use cases that are typically related to the generation, use and maintenance of MVDs.
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh (IanFurlong4)
For organisations to successfully adopt data mesh, setting up and maintaining infrastructure needs to be easy.
We believe the best way to achieve this is to leverage the learnings from building a 'central nervous system', commonly used in modern data-streaming ecosystems. This approach formalises and automates the manual parts of building a data mesh.
This presentation introduces SpecMesh: a methodology and supporting developer toolkit to enable businesses to build the foundations of their data mesh.
MLOps pipelines using MLFlow - From training to production (Fabian Hadiji)
This talk was given at the Cologne AI and Machine Learning Meetup on April 13, 2023 (https://www.meetup.com/de-DE/cologne-ai-and-machine-learning-meetup/events/291513393/) by Dr. Andreas Weiden, Co-Lead Cloud / Data Engineering at skillbyte: MLOps pipelines using MLFlow - From training to production
In this talk we explore the world of MLOps pipelines and how MLFlow can be used to facilitate workflows for getting your machine learning models from training to production. We will briefly delve into the tracking aspects of MLFlow and how to store experiments and runs. Next, we will move on to an actual use case that involves managing artefacts generated by multiple training pipelines running on a daily schedule. These artefacts are used in prediction services but also in managed vector search engines such as ElasticSearch and Google VertexAI. A simple microservice that polls the MLFlow registry is used to update both REST-APIs running in Kubernetes and to ingest the models into the vector search services. Finally, we will compare different alternatives that were considered.
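The registry-polling microservice described above can be sketched as a simple version-comparison loop. The `get_latest_version` and `deploy` callables below are stubs standing in for real registry and deployment clients (MLflow, Kubernetes, vector-search ingestion); the loop itself is an assumed minimal design, not the talk's actual code:

```python
import time

def poll_registry(get_latest_version, deploy, current=None,
                  interval=0, max_polls=3):
    """Poll a model registry and redeploy whenever a newer version appears."""
    for _ in range(max_polls):
        latest = get_latest_version()
        if latest != current:
            deploy(latest)      # push to serving REST-API / vector store
            current = latest
        time.sleep(interval)    # a real service would sleep minutes here
    return current

# Simulated registry that publishes version 2 on the second poll.
versions = iter([1, 2, 2])
deployed = []
final = poll_registry(lambda: next(versions), deployed.append)
print(deployed)  # each new version is deployed exactly once
```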
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.
To solve for these challenges, Databricks unveiled last year MLflow, an open source project that aims at simplifying the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
In the past year, the MLflow community has grown quickly: over 120 contributors from over 40 companies have contributed code to the project, and over 200 companies are using MLflow.
In this tutorial, we will show you how using MLflow can help you:
Keep track of experiment runs and results across frameworks.
Execute projects remotely on a Databricks cluster, and quickly reproduce your runs.
Quickly productionize models using Databricks production jobs, Docker containers, Azure ML, or Amazon SageMaker.
We will demo the building blocks of MLflow as well as the most recent additions since the 1.0 release.
What you will learn:
Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, MLflow Models) and how each helps address challenges of the ML lifecycle.
How to use MLflow Tracking to record and query experiments: code, data, config, and results.
How to use the MLflow Projects packaging format to reproduce runs on any platform.
How to use the MLflow Models general format to send models to diverse deployment tools.
Prerequisites:
A fully-charged laptop (8-16GB memory) with Chrome or Firefox
Python 3 and pip pre-installed
Pre-Register for a Databricks Standard Trial
Basic knowledge of Python programming language
Basic understanding of Machine Learning Concepts
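The tracking component described above boils down to recording parameters and metrics per run so they can be queried later. The toy tracker below is a conceptual stdlib sketch of that idea, not MLflow's actual API (the method names only loosely echo it):

```python
import time
import uuid

class RunTracker:
    """A minimal stand-in for experiment tracking: one record per run."""

    def __init__(self):
        self.runs = {}

    def start_run(self):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"params": {}, "metrics": {},
                             "start": time.time()}
        return run_id

    def log_param(self, run_id, key, value):
        self.runs[run_id]["params"][key] = value

    def log_metric(self, run_id, key, value):
        self.runs[run_id]["metrics"][key] = value

    def search(self, metric, minimum):
        """Query runs whose metric meets a threshold."""
        return [r for r, data in self.runs.items()
                if data["metrics"].get(metric, float("-inf")) >= minimum]

tracker = RunTracker()
run = tracker.start_run()
tracker.log_param(run, "learning_rate", 0.1)
tracker.log_metric(run, "accuracy", 0.93)
print(tracker.search("accuracy", 0.9))  # runs meeting the threshold
```

MLflow adds on top of this concept persistent storage, a UI, and artifact logging.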
WSO2 Machine Learner takes data one step further, pairing data gathering and analytics with predictive intelligence: this helps you not only understand the present, but also predict scenarios and generate solutions for the future.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
Facilitating Data Curation: a Solution Developed in the Toxicology Domain (Christophe Debruyne)
Christophe Debruyne, Jonathan Riggio, Emma Gustafson, Declan O'Sullivan, Mathieu Vinken, Tamara Vanhaecke, Olga De Troyer.
Presented at the 2020 IEEE 14th International Conference on Semantic Computing, San Diego, California, 3-5 February 2020
Toxicology aims to understand the adverse effects of chemical compounds or physical agents on living organisms. For chemicals, much information regarding safety testing of cosmetic ingredients is now scattered in a plethora of safety evaluation reports. Toxicologists in our university intend to collect this information into a single repository. Their current approach uses spreadsheets, does not scale well, and makes data curation and querying cumbersome. Semantic technologies (e.g., RDF, OWL, and Linked Data principles) would be more appropriate for this purpose. However, this technology is not very accessible to toxicologists without extensive training. In this paper, we report on a tool that supports subject matter experts in the construction of an RDF-based knowledge base for the toxicology domain. The tool uses the jigsaw metaphor to guide the subject matter experts. We demonstrate that the jigsaw metaphor is a viable option for generating RDF. Future work includes investigating appropriate methods and tools for knowledge evolution and data analysis.
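The kind of RDF generation such a tool performs can be sketched minimally: turning one spreadsheet-like record into N-Triples lines. The namespace and predicate names below are invented for illustration; a real knowledge base would use agreed toxicology vocabularies rather than these placeholders:

```python
def row_to_triples(row, base="http://example.org/"):
    """Serialise one flat record as RDF N-Triples, using the 'id'
    column as the subject and the other columns as literal properties."""
    subject = f"<{base}compound/{row['id']}>"
    triples = []
    for key, value in row.items():
        if key == "id":
            continue
        triples.append(f'{subject} <{base}prop/{key}> "{value}" .')
    return triples

# A made-up safety-report record.
record = {"id": "C42", "name": "parabens", "assay": "skin sensitisation"}
for t in row_to_triples(record):
    print(t)
```

The jigsaw-style tool's contribution is letting experts assemble such mappings visually instead of writing them by hand.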
Open Source & Open Data: the benefits of the commons (Logilab)
The software world was a pioneer in reintroducing a "cooperative" mode of production as early as the beginning of the 1980s.
With the Open Data movement, this cooperation has extended to the production of data.
"Commons" is meant here in the sense of "a good shared by the members of a community".
The first professional open data trade fair, aiming to consolidate the French open data sector. The fair showcases the players positioned in open data and keeps participants informed through experience reports, strategic conferences, hands-on workshops, and direct contact with service providers and those doing open data in France today.
Pydata Paris - Python for manufacturing musical instruments (Logilab)
Context: making and repairing musical instruments.
Assets: traditional craftsmanship; world-famous quality.
Challenge: aggressive competition from foreign countries.
Olivier Cayrol presented the company Logilab and its services during the Semaine de l'Industrie.
Logilab: an industrial software company
- development
- consulting
- training
- expertise
The archives of the Gironde department, the City of Bordeaux, and the Bordeaux Métropole have set up a shared archiving system that lets them manage their documents throughout their life cycle, from initial creation through to archiving. The system relies on several free-software applications, each handling one part of the document life cycle (Alfresco for day-to-day document management, Asalae for archiving, etc.).
Within this system, a large amount of metadata was lost whenever documents moved from one tool to another and had to be re-entered. Hence the idea of building a shared repository that acts as a pivot for the other tools, preserving and enriching a document's metadata throughout its life cycle as it passes from one application to the next. Such metadata include, for example, filing within classification plans, the agents who worked on the document, and the steps of the validation process.
Under a public procurement contract, Logilab is developing the shared repository of the archiving system on behalf of the Conseil Départemental de la Gironde, the Bordeaux Métropole, and the City of Bordeaux.
The presentation introduces the free-software tool, which is intended to be shared with other institutions facing a similar problem, and also looks back at the project itself, which involves many stakeholders and was managed with an agile method and various innovative tools (MVP, UX design, etc.).
See http://saem.e-bordeaux.org/projet-module-r%C3%A9f%C3%A9rentiel for more details.
Using Salt to test your infrastructure on OpenStack or Docker (Logilab)
You can access this presentation at: http://slides.logilab.fr/2015/poss2015_salt-docker/#/
Configuring and orchestrating your infrastructure with a centralised configuration-management tool such as Salt has many advantages. One of them is keeping the configuration files, with their history, in a source repository managed by a DVCS (Mercurial or Git). Salt then lets you evolve your infrastructure by testing it in isolated environments. Once the description is complete, reproducing part of your production infrastructure in a virtualised environment such as a private cloud (OpenStack) becomes possible and can be automated with *salt-cloud*. The next step is to reproduce portions of your infrastructure in lightweight containers such as Docker or LXC, directly on your laptop. For this, driving Docker from Salt together with Salt's orchestration features allows unprecedented agility.
This is a good complement to TDI (Test-Driven Infrastructure): the infrastructure is tested in "continuous integration" mode, and parts of it can be tested and debugged in "sandbox" mode.
This model can then be extended using branches in Git or Mercurial, where some branches are applied to the production part of the infrastructure while others are applied to pre-production or to local Docker or LXC environments.
Salt is a centralised configuration-management tool generally used to configure and orchestrate a system infrastructure while keeping the configuration files, with their history, in a source repository managed by Mercurial or Git. The possibilities offered by Salt go much further, however. Once the Salt description of the production infrastructure is complete, all or part of it can be reproduced automatically with salt-cloud in a virtualised environment (a private OpenStack cloud), making it possible to run tests. Going further, portions of the infrastructure can be reproduced in lightweight containers (Docker, LXC) so you can work directly on your laptop.
Salt's orchestration features and its control of Docker bring unprecedented agility to this workflow.
In the model described above, an excellent complement to TDI (Test-Driven Infrastructure), the infrastructure is tested and debugged in "sandbox" mode, then deployed through an automated continuous-integration mechanism. The model can be extended by using branches in Salt's source repository and choosing which branches are applied to the infrastructure in production, in pre-production, or in test within local environments (Docker, LXC). Review and validation mechanisms can then be put in place.
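As a small illustration of the kind of state description Salt manages, here is a minimal `.sls` state file declaring that a package is installed and its service running. The `pkg.installed` and `service.running` state modules are standard Salt; the target package is just an assumed example, not taken from the talk:

```yaml
# webserver.sls: declarative desired state for one service
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx
```

Applied with `salt '*' state.apply webserver`, the same file works identically against a production machine, an OpenStack instance, or a local Docker/LXC container, which is what makes the test-driven workflow above possible.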
Importing data in Python with CubicWeb 3.21 (Logilab)
Slides in HTML: http://slides.logilab.fr/2015/pyconfr2015_import_donnees_cubicweb/
We will introduce a new method for importing external data (REST services, files, etc.) into an application. Simple and flexible, it favours component reuse so you can concentrate on what is specific to each case. An example implementation will be given with the CubicWeb 3.21 framework, which introduces a data-import API based on this method.
Abstract
Developers commonly have to implement data-import features: users may need data that is available elsewhere, for instance in a spreadsheet, in another database, or through web services. We introduce here a simple method to do this while favouring the genericity and reusability of the components developed.
A customisable stream of entities
The idea is to transform the external data step by step so that it becomes compatible with the application schema and can therefore be inserted into the database.
The principle is to start by storing the source data on an "external entity". Python generators are then used to create a "stream" of entities: each entity passes from function to function, each of which transforms it little by little, bringing it closer to the expected model until it is finally ready to be inserted, or on the contrary rejected if that is not possible.
The insertion step is then identical whatever source the data comes from.
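The stream-of-entities method maps naturally onto chained Python generators. Here is a minimal stdlib sketch of the idea; the transformation steps and field names are invented for illustration and are not CubicWeb's actual import API:

```python
def from_source(rows):
    """Wrap raw records coming from an external source (file, REST, ...)."""
    for row in rows:
        yield dict(row)

def normalise_names(entities):
    """One transformation step: each entity passes through and moves
    a little closer to the target schema."""
    for e in entities:
        e["name"] = e["name"].strip().title()
        yield e

def drop_invalid(entities):
    """Reject entities that cannot be made to fit the schema."""
    for e in entities:
        if e.get("name"):
            yield e

def insert(entities, store):
    """The insertion step is identical whatever the source was."""
    for e in entities:
        store.append(e)

store = []
raw = [{"name": "  ada lovelace "}, {"name": ""}]
insert(drop_invalid(normalise_names(from_source(raw))), store)
print(store)  # [{'name': 'Ada Lovelace'}]
```

Because each stage is a generator, stages compose freely and new sources only need a new `from_source`-style head on the same pipeline.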
Example with CubicWeb 3.21
With version 3.21, released in 2015, the CubicWeb framework has a brand-new API based on this method. It lets the developer concentrate on the stream of data coming from the sources of interest.
Moreover, the insertion step offers several interchangeable components. Depending on the desired trade-off between safety and speed, the developer can choose between a safe but slow insertion, where each entity is checked beforehand for the types of its data and the validity of its relations, and a faster insertion that may fail, where entities are inserted in bulk.
All of this will be illustrated by harvesting data from Open Data portals. The diversity of formats (RDF (open-data.europa.eu), REST (data.gouv.fr, OpenDataSoft), CSW (geocatalogue.fr), etc.) and of data models requires creating different streams so that all this data can be imported into a single CubicWeb schema.
Similarly, another example will be the import of SKOS data, where entire thesauri are imported into CubicWeb thanks to this API.
Simulagora makes numerical simulation available to all!
Simulagora is a cloud-based numerical simulation service which allows computations without any up-front investment:
- No dedicated hardware: only an Internet connection is needed
- No advanced IT knowledge: major free software packages are pre-installed and ready-to-use
- No financial investment: pay only for used resources.
Simulagora provides access to the virtually unlimited power of public clouds:
- Start your complex computations on machines with up to 32 CPUs and 120GB of RAM
- Get the results of your parametric studies in record time using hundreds of machines simultaneously.
Simulagora is a one-of-a-kind computation execution service:
- Easy to fit into your existing processes (scripts, terminal applications, Web interface)
- Saves your computation history to ensure traceability and build up reusable knowledge
- Guarantees the reproducibility of your computations by keeping a copy of the virtual machines on which they ran
- Enables collaboration among experts through the Web interface to carry out your most advanced numerical studies.
Simulagora is a service built by Logilab, a 15-year veteran of Web technologies for scientific computing.
Innovating with and for data - Logilab at ADBU Bibcamp 2015 (Logilab)
Innovating with and for data, illustrated by the example of data.bnf.fr and the principles of the semantic web.
Presentation by Logilab at #bibcamp15, organized in June 2015 by ADBU, the association of French university library directors.
Study of the dynamic behavior of a pump with Code_ASTER on Simulagora (Logilab)
Simulagora is a platform using Web technology to ease grid computation by leveraging cloud resources.
Based on public clouds:
- Computation and storage resources
- Enormous power
- Super-fast ramp-up of computation resources
Slides from our participation in the "Battle" organized by LibertTIC and DataLab at the Ruche Numérique in Le Mans.
PDF version: http://www.logilab.fr/file/2221/raw/Battle%20Opendata.pdf
Debconf14: Putting some salt in your Debian systems -- Julien Cristau (Logilab)
Salt allows scalable infrastructure management, including provisioning new systems and managing them over their lifetime. In this talk I'll show how it makes managing Debian systems easier.
The other poster presented by Logilab concerns Simulagora, an online service for collaborative numerical simulation. It lets you run computations in the cloud (hence with no investment in hardware or system administration) and emphasizes the traceability and reproducibility of computations as well as collaborative work (sharing software, data, and complete numerical studies).
Logilab was part of the research project PAFI (Plateforme d'Aide à la Facture Instrumentale) and developed an innovative web app, using CubicWeb, to facilitate the virtual prototyping of musical instruments and collaborative work between makers, users and museum curators.
Presentation by Nicolas Chauvat at "Fabrique de la Loi" 2014 (Open Legislative Data Conference)
Mirror of http://www.logilab.org/file/253393/raw/OLDC2014-chauvat.pdf
Blog entry : http://www.logilab.org/blogentry/253397
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days, 6 June 2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Monitoring Java Application Security with JDK Tools and JFR Events
BRAINOMICS A management system for exploring and merging heterogeneous brain mapping data based on CubicWeb
1. BRAINOMICS
A management system for exploring and merging
heterogeneous brain mapping data based on CubicWeb
Vincent Michel
Logilab
CrEDIBLE 2013 - 3/10/2013
1/36 Vincent Michel (Logilab) Brainomics CrEDIBLE 2013 - 3/10/2013
2. Plan
1 Introduction
2 Cubicweb and Brainomics
3 Data Model
4 Querying data
5 Future and Conclusions
3. Logilab
Computer science and knowledge management;
Founded in 2000;
20 experts in IT technologies;
Public and private clients (CEA, EDF R&D, EADS, ArcelorMittal, etc.).
Knowledge management:
http://collections.musees-haute-normandie.fr/collections/
http://data.bnf.fr/
4. Brainomics
Goals
Software solution for the integration of neuroimaging and genomic data;
Design and optimization (GPU) of algorithms for analysing these data;
Collaborative R&D project
NeuroSpin laboratory of CEA;
Supelec;
UMR 894 of INSERM;
UMR CNRS 8203 of IGR;
Logilab;
Alliance Services Plus (AS+);
Keosys.
This work was supported by grants from the French National Research Agency (ANR GENIM; ANR-10-BLAN-0128) and (ANR IA BRAINOMICS; ANR-10-BINF-04).
5. Context
Brain mapping data
Large datasets for brain mapping:
http://openfmri.org/data-sets
http://fcon_1000.projects.nitrc.org/indi/abide/
Neuroimaging + clinical data + genetic data;
Brain mapping databases
Neuroimaging and genomics databases are each dedicated to their own field of research;
XNAT - neuroimaging;
BASE - genetics;
SHANOIR - neuroimaging;
6. Plan
1 Introduction
2 Cubicweb and Brainomics
3 Data Model
4 Querying data
5 Future and Conclusions
7. What is CubicWeb
A semantic open-source web framework written in Python
An efficient knowledge management system
Entity-relationship data model;
RQL (Relational Query Language);
Separates querying from display (HTML UI, JSON, RDF, CSV, ...);
Conforms to Semantic Web standards;
Fine-grained security system coupled to the data model definition;
Migration mechanisms control model versions and ensure data integrity;
Industrial use: large databases, many users, security, logging;
Used in production environments since 2005; LGPL-licensed since 2008;
http://www.cubicweb.org/
8. What CubicWeb is not
CubicWeb is not
a pure web application framework - the Web/HTML interface is only one possible output;
a triple store - data is structured;
a CMS - it allows complex business data modeling;
CubicWeb is a framework
Used to build applications, with reusable components called cubes:
data models: persons, addressbook, billing, ...
displays (a.k.a. views): d3js, workflow, maps, threejs, ...
full applications: forge, intranet, ERP, ...
open databases: dbpedia, diseasome, pubmed, ...
9. Overview of the framework
Well-established core technologies: SQL, Python, HTML5, JavaScript;
Application
Based on a relational database (e.g. PostgreSQL) for storing information;
Web server for HTTP access to the data;
Integration with existing LDAP for high-level access management;
Code - written in Python
Schema - defines the data model.
Business logic - extends the logic of the data beyond the scope of the data model, using specific functions and adapters.
Views - specific display rules, from HTML to binary content.
10. CubicWeb and Semantics
Not using a triple store does not mean “not semantic web compliant”
CubicWeb conforms to the Semantic Web standards
One entity = a unique URI;
One request = a unique URI:
http://localhost:8080/?rql=MYRQLREQUEST&vid=MYVIEWID
HTTP content negotiation (HTML, RDF, JSON, etc.);
Import/export to/from RDF data, based on a specific mapping:
xy.add_equivalence('Person given_name', 'foaf:givenName')
Use RDF as a standard I/O format for data, but stick to a relational database for storage.
11. Visualization and interactions
CubicWeb views
A view is applied to the result of a query;
The same result may be visualized using different views;
Views are selected based on the types of the resulting data;
Exploring the data
Many different possible views;
Auto-completion RQL query form;
Filtering facets;
Using data in scripts via the rql/vid URL parameters;
And also: forms, widgets, ...
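The rql/vid URL pattern means any query result can be pulled straight into a script. A minimal sketch of building such a URL, assuming a local instance at localhost:8080 and the jsonexport view id (both stand-ins here):

```python
from urllib.parse import urlencode

def export_url(base_url, rql, vid="jsonexport"):
    """Build a CubicWeb URL that runs an RQL query and renders the
    result set with the given view, following the ?rql=...&vid=... pattern."""
    return "%s/?%s" % (base_url.rstrip("/"), urlencode({"rql": rql, "vid": vid}))

url = export_url("http://localhost:8080",
                 'Any S WHERE S is Subject, S gender "male"')
# Against a running instance, one would then fetch this URL,
# e.g. with urllib.request.urlopen(url), and parse the JSON body.
print(url)
```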
12. Brainomics
Brings together brain imaging and genetics data
A solution based on CubicWeb
Modeling of scans, questionnaires, genomics results, behavioural results, subject and study information, ...
Can deal with large volumes (>10,000 subjects);
Tested with several datasets (openfmri, abide, imagen, localizer);
Specific views: ZIP, XCEDE XML, CSV;
Open source solution
http://www.brainomics.net/demo
13. Plan
1 Introduction
2 Cubicweb and Brainomics
3 Data Model
4 Querying data
5 Future and Conclusions
14. Data model and Schema
CubicWeb schema
Based on a Python library: http://www.logilab.org/project/yams
Defined in a Python file (schema.py);
Allows creating entity types, relations and constraints;
Security can be tightly integrated into the data model;
class Subject(EntityType):
    identifier = String(required=True, indexed=True, maxsize=64)
    gender = String(vocabulary=('male', 'female', 'unknown'))
    date_of_birth = Date()
    ...

class related_studies(RelationType):
    subject = 'Subject'
    object = 'Study'
    cardinality = '1*'
15. Data model - Subject
16. Where are the reference models?
Reference models
Reference models may be implemented as CubicWeb schemas
additional modeling refinements for application features (sortability, readability, ...)
already existing cubes for some ontologies/taxonomies (e.g. FRBR);
Be pragmatic: if you need a new attribute, add it to the model → deal with the reference model in the I/O formats;
... or you could use your own specific schema
Ontologies do not exist for all fields;
Define your schema and only code the mappings (I/O) to reference models when data goes out.
17. Schema evolution
http://xkcd.com/
Standards are moving
Existing migration mechanisms keep up with the evolution of reference models;
Easy modification for modeling other application fields;
18. Data - Input
Keep the application data model as the pivot model → convert data to this model at insertion time
Datafeed - Periodic input
a URL;
some Python logic for integrating/updating information;
a synchronization interval;
→ used for almost any possible type of data, e.g. RSS feeds, files, RDF data, other CubicWeb instances, ...
Stores - Bulk loading
Tools similar to ETL
allow importing huge amounts of structured data;
mainly used for bulk loading;
different levels of security and data checking;
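To make the store idea concrete, here is a toy, CubicWeb-independent sketch: bulk rows are read from structured input and each becomes an entity-creation query. The insert_subject helper and the generated RQL strings are illustrative stand-ins; a real store batches this work into efficient SQL.

```python
import csv
import io

def insert_subject(row):
    # Build an RQL INSERT for one row (illustrative; a real store
    # would batch these instead of issuing one query per entity).
    return ('INSERT Subject S: S identifier "%s", S gender "%s"'
            % (row["identifier"], row["gender"]))

# Stand-in for a CSV file of subjects to bulk-load.
data = io.StringIO("identifier,gender\nsubj001,male\nsubj002,female\n")
queries = [insert_subject(row) for row in csv.DictReader(data)]
print(len(queries))  # one query per row
```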
19. Data - Output
Design a specific view for outputting to any reference model
If a reference model changes, change the I/O, not the internal model;
Avoid data redundancy if different reference models are based on the same information (e.g. dc:title, foaf:name);
Example of view - export a Scan in XCEDE format
class ScanXcedeItemView(XMLItemView):
    __select__ = XMLItemView.__select__ & is_instance('Scan')
    __regid__ = 'xcede-item'

    def entity_call(self, entity):
        self.w(u'<acquisition ID="%(id)s" projectID="%(p)s" '
               u'subjectID="%(s)s" visitID="%(v)s" '
               u'studyID="%(st)s" episodeID="%(e)s"/>\n'
               % {'id': entity.identifier,
                  'p': entity.related_study[0].name,
                  ...
20. Data Output - Example of XCEDE
21. Data Output - RDF specific case
Existing tools in CubicWeb for RDF mapping
RDF mapping
xy.register_prefix('foaf', 'http://xmlns.com/foaf/0.1/')
xy.add_equivalence('Subject', 'foaf:Subject')
xy.add_equivalence('MedicalCenter', 'foaf:Organization')
xy.add_equivalence('Subject given_name', 'foaf:givenName')
xy.add_equivalence('Subject family_name', 'foaf:familyName')
xy.add_equivalence('* same_as *', 'owl:sameAs')
xy.add_equivalence('* see_also *', 'foaf:page')
It is also possible to plug in specific Python functions for non-trivial mappings.
Used for RDF import/export and the SPARQL endpoint
22. Plan
1 Introduction
2 Cubicweb and Brainomics
3 Data Model
4 Querying data
5 Future and Conclusions
23. RQL - Relational Query Language
Features
Similar to W3C's SPARQL, but less verbose;
Supports the basic operations (select, insert, etc.), subquerying, ordering, counting, ...;
Tightly integrated with SQL, but abstracts away the details of the tables and the joins;
Uses the schema for data type inference, based on a syntactic analysis of the query.
→ A query returns a result set (a list of results) that can be displayed using specific views.
24. RQL - Example
Query all the c-map scans of left-handed male subjects that have a score greater than 4.0 for the "algebre" question of the Localizer questionnaire
↓
Any SA WHERE S handedness "left", S gender "male",
X concerns S, A questionnaire_run X,
A question Q, Q text "algebre", A value > 4,
SA concerns S, SA is Scan, SA type "c map"
and the SQL translation ...
SELECT _SA.cw_eid FROM cw_Answer AS _A, cw_Question AS _Q,
cw_QuestionnaireRun AS _X, cw_Scan AS _SA, cw_Subject AS _S
WHERE _S.cw_handedness="left" AND _S.cw_gender="male"
AND _X.cw_concerns=_S.cw_eid
AND _A.cw_questionnaire_run=_X.cw_eid
AND _A.cw_question=_Q.cw_eid AND _Q.cw_text="algebre"
AND _A.cw_value>4 AND _SA.cw_concerns=_S.cw_eid
AND _SA.cw_type="c map"
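To check the translation concretely, the generated SQL can be replayed on a throwaway SQLite schema mimicking the cw_* tables from the slide (the sample rows are invented):

```python
import sqlite3

# In-memory schema following the table/column names in the translated SQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cw_Subject (cw_eid INTEGER, cw_handedness TEXT, cw_gender TEXT);
CREATE TABLE cw_QuestionnaireRun (cw_eid INTEGER, cw_concerns INTEGER);
CREATE TABLE cw_Question (cw_eid INTEGER, cw_text TEXT);
CREATE TABLE cw_Answer (cw_eid INTEGER, cw_questionnaire_run INTEGER,
                        cw_question INTEGER, cw_value REAL);
CREATE TABLE cw_Scan (cw_eid INTEGER, cw_concerns INTEGER, cw_type TEXT);
-- One left-handed male subject with a 4.5 "algebre" score and one c-map scan.
INSERT INTO cw_Subject VALUES (1, 'left', 'male');
INSERT INTO cw_QuestionnaireRun VALUES (10, 1);
INSERT INTO cw_Question VALUES (20, 'algebre');
INSERT INTO cw_Answer VALUES (30, 10, 20, 4.5);
INSERT INTO cw_Scan VALUES (40, 1, 'c map');
""")
rows = con.execute("""
SELECT _SA.cw_eid FROM cw_Answer AS _A, cw_Question AS _Q,
       cw_QuestionnaireRun AS _X, cw_Scan AS _SA, cw_Subject AS _S
WHERE _S.cw_handedness='left' AND _S.cw_gender='male'
  AND _X.cw_concerns=_S.cw_eid
  AND _A.cw_questionnaire_run=_X.cw_eid
  AND _A.cw_question=_Q.cw_eid AND _Q.cw_text='algebre'
  AND _A.cw_value>4 AND _SA.cw_concerns=_S.cw_eid
  AND _SA.cw_type='c map'
""").fetchall()
print(rows)  # [(40,)] -- the eid of the matching scan
```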
25. RQL for Endusers
No perfect dashboard / no perfect search form
↓
Put an expressive query language in the hands of the end users
Exploring the data model
Explore the schema:
http://localhost:8080/schema
RQL completion form;
Learn RQL by seeing the RQL behind each page and each filter;
Deep exploration of the data by the end users
26. RQL VS SPARQL
Developed in parallel with SPARQL years ago, with a focus on SQL databases and support for SET / UPDATE / DELETE.
Why we should support SPARQL ...
SPARQL is a W3C standard, and a reference in the Semantic Web community.
To improve interoperability of CubicWeb applications with other Semantic Web technologies.
... and why we don't want to use only SPARQL
Companies have huge internal knowledge of RDBMS (vs. triplestores);
SPARQL is quite verbose; RQL is more intuitive and elegant ("syntax matters");
A basic translation from SPARQL to RQL exists (selection queries only).
27. Federated databases in CW
PostgreSQL federated databases
Let PostgreSQL do what it does best
Since PostgreSQL 9.3 ;
Based on Foreign Data Wrapper (FDW) ;
Could be used for federated queries in CubicWeb (WIP);
FROM clause (WIP)
Similar to SPARQL SERVICE
Any P, S WHERE GEN is GenomicMeasure, GEN concerns S,
GEN platform P, P related_snps SN, SN in_gene G, G name GN
WITH P BEING (Any X WHERE X is Paper, X keywords GN)
FROM http://pubmed:8080
Eventually, also allow SPARQL subqueries;
Or use the CubicWeb SPARQL endpoint with SERVICE.
28. Plan
1 Introduction
2 Cubicweb and Brainomics
3 Data Model
4 Querying data
5 Future and Conclusions
29. Conclusion
Brainomics
Open source solution to manage brain imaging datasets and associated metadata;
Powerful querying and reporting tool, customized for emerging multimodal studies.
Feedback from Brainomics
Do not store raw data in the database;
Try to interact with existing reference databases;
Using CubicWeb:
Easy modeling of other application fields in the schema (e.g. histology);
Security and migrations are already included;
Many different views, and an existing API to define your own;
30. Future work
Future of Brainomics
How to transfer large files (>10 GB for genotype files)?
Needs Content Delivery Network (CDN) support in CubicWeb;
Integration with reference databases (PubMed, RefSeq, ...);
Future of CubicWeb
Extended support for SPARQL;
Finish the work on federated queries;
REST support, Python's Web Server Gateway Interface (WSGI);
Better integration with Bootstrap;
32. RQL/SPARQL - Example
Cities of Île-de-France with more than 100,000 inhabitants?
RQL
Any X WHERE X region Y, X population > 100000,
Y uri "http://fr.dbpedia.org/resource/Île-de-France"
SPARQL
select ?ville where {
  ?ville db-owl:region <http://fr.dbpedia.org/resource/Île-de-France> .
  ?ville rdf:type db-owl:Settlement .
  ?ville db-owl:populationTotal ?population .
  FILTER (?population > 100000)
}
33. This is NOT big data !
Small databases ...
No need to store raw data in the database;
Structured data: metadata (patient information, ...), chosen scores (statistical values on genes of interest, ...);
10 MB/subject × 10,000 subjects = 100 GB;
Classical SQL databases (e.g. PostgreSQL) work great up to a few TB!
http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
http://www.vitavonni.de/blog/201309/2013092701-big-data-madness-and-reality.html
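The back-of-envelope estimate above can be sanity-checked in a couple of lines (decimal units, as on the slide):

```python
mb_per_subject = 10          # structured metadata + chosen scores per subject
subjects = 10_000
total_gb = mb_per_subject * subjects / 1000   # MB -> GB
print(total_gb)  # 100.0 -- well within classical RDBMS territory
```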
34. Global architecture of CubicWeb
35. What’s needed
Efficient data model to integrate all the measures;
Easy access to the relevant information (query language + UI);
Import/export in several formats, for merging heterogeneous studies;
Adaptable to the evolution of various dynamic application fields.
CubicWeb, a semantic data management framework
36. Data model - Assessment