As electricity is difficult to store, it is crucial to strictly maintain the balance between production and consumption. The integration of intermittent renewable energies into the production mix has made this balance more complex to manage. However, access to near real-time data and communication with consumers via smart meters make demand response feasible. Specifically, sending signals would encourage users to adjust their consumption according to the production of electricity. The algorithms used to select these signals must learn how consumers react and optimize the signals accordingly, while balancing exploration and exploitation. Various sequential or reinforcement learning approaches are being considered.
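To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one simple sequential learning strategy, epsilon-greedy. It is not the method from the talk, which the abstract does not specify. The three signals, the `TRUE_RESPONSE` values and the `simulated_reduction` consumer model are hypothetical and exist only for illustration.

```python
import random


def epsilon_greedy(n_signals, reward_fn, rounds=1000, epsilon=0.1, seed=0):
    """Send one signal per round and learn the mean reward of each signal."""
    rng = random.Random(seed)
    counts = [0] * n_signals   # how often each signal was sent
    means = [0.0] * n_signals  # running mean reward per signal
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_signals)  # explore: try a random signal
        else:
            arm = max(range(n_signals), key=means.__getitem__)  # exploit best estimate
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return means, counts


# Hypothetical consumer model: average consumption reduction per signal.
TRUE_RESPONSE = [0.2, 0.5, 0.8]


def simulated_reduction(arm, rng):
    """Noisy observed reduction for the chosen signal (assumed Gaussian noise)."""
    return TRUE_RESPONSE[arm] + rng.gauss(0, 0.1)


means, counts = epsilon_greedy(3, simulated_reduction)
print("estimated responses:", [round(m, 2) for m in means])
print("times each signal was sent:", counts)
```

A small `epsilon` keeps most rounds on the signal currently believed to be best while still sampling the others occasionally, so the estimates can be corrected if consumer behaviour differs from early observations.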
Online violence amplifies real-life discrimination, and the lack of diversity grows in a vicious circle. Understanding cyber-violence, its forms and mechanisms, can help us fight back. To process massive volumes of data, AI finally comes into play for good.
In the energy sector, the use of temporal data stands as a pivotal topic. At GRDF, we have developed several methods to effectively handle such data. This presentation will specifically delve into our approaches for anomaly detection and data imputation within time series, leveraging transformers and adversarial training techniques.
Natasha shares her experience to delve into the complexities, challenges, and strategies associated with effectively leading tech teams dispersed across borders.
Nour and Maria present the work they did at Tweag, Modus Create's innovation arm, where the GenAI team developed an evaluation framework for Retrieval-Augmented Generation (RAG) systems. RAG systems provide an easy and low-cost way to extend the knowledge of Large Language Models (LLMs), but measuring their performance is not an easy task.
The presentation will review existing evaluation frameworks, ranging from those based on the traditional ML approach of using ground-truth datasets, including Tweag's, to those that use LLMs to compute evaluation metrics.
It will also delve into the practical implementation of Tweag's chatbot over two distinct document datasets and provide insights on chunking, embedding, and how open-source and commercial LLMs compare.
Sharone Dayan, Machine Learning Engineer, and Daria Stefic, Data Scientist, both from Contentsquare, delve into evaluation strategies for dealing with partially labelled or unlabelled data.
Laure talked about a very hot topic in the community at the moment with the ChatGPT phenomenon: how to supervise a PhD thesis in NLP in the age of Large Language Models (LLMs)?
Abstract: Who hasn't heard of the "Pilot Syndrome"? 85% of Data Science pilots remain pilots and do not make it to the production stage. Let's build a production-ready and end-user-friendly Data Science application. 100% Python and 100% open source.
Phase 1 | Building the GUI: create an interactive and powerful interface in a few lines of code
Phase 2 | Integrated back end: Manage your models and pipelines and create scenarios the smart way
"Nature Language Processing for proteins" by Amélie Héliou, Software Engineer @ Google Research
Abstract: Over the past few months, Large Language Models have become very popular.
We'll see how a simple LLM works, from input sentence to prediction.
I'll then present an application of LLM to protein name prediction.
Twitter: @Amelie_hel
"We are not passing by, and we are not a trend". What if an automated and large scale version of the Bechdel-Wallace test could confirm the speech of Alice Diop at the Cesar 2023?
That's the objective of BechdelAI: to build an open-source tool based on Artificial Intelligence that measures inequalities and the under-representation of women in film and audiovisual media.
"Emergency plan to secure winter: what are the measures set up by RTE?" by Sophie Diakhate, Ingénieure Génie électrique, Consultante en énergie et utilities at Yélé
Abstract: The French electric system is currently going through an exceptional crisis, threatening electricity supplies for this winter, and potentially the next ones. As the guarantor of the balance between supply and demand, RTE must ensure security of supply in France. It has set up an emergency plan to secure winters. Yélé helps RTE carry out that plan.
We will look at the primary measures proposed by RTE for the winter of 2022-2023, and the options for individuals and industry to reduce the risk of network load shedding.
"New edge prediction and anomaly-detection in large computer networks" by Dr Silvia Metelli, Marie Skłodowska-Curie Individual Fellow
Abstract: Monitoring computer network traffic in search of anomalous behaviour is both a challenging and important task for cyber-security. New edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies and in rare cases might suggest the presence of intruders or malicious activity. In this talk, I will introduce a robust Bayesian model and anomaly detection method for simultaneously characterising network structure and modelling likely new edge formation in a large computer network graph. What constitutes normal behaviour for some hosts might be very unusual for others, and thus examining existing network structure (e.g. clusters of similar clients and servers) is key for accurately predicting likely future interactions. Finally, the model is used to construct an anomaly detection method, which successfully identifies some of the machines known to be compromised when demonstrated on real computer network authentication data.
Interest has increased in the use of prognosis factors as a cursor for personalized breast cancer treatment. For clinicians, early detection of those factors can help with good management of the disease and with the choice of an efficient treatment. Moreover, there exists a huge amount of meaningful information in pathological reports, biological measurements and clinical information in a patient journey that remains unexploited. In that context, I propose to develop and apply novel machine learning techniques to predict cancer outcomes such as recurrence or survival from multi-modal breast cancer patient data (including medical notes in natural language and the outcome of various lab analyses). For that, I use a deep neural sequence transduction model for electronic health records called BEHRT. This model is inspired by one of the most powerful transformer-based architectures in Natural Language Processing: BERT.
During the joint meetup with Duchess France and PyLadies Paris, Déborah Boyenval, PhD Student at Université Côte d'Azur, presented a part of her PhD work: “Formal modeling of biological cyclic behavior with control points: the case of the cell cycle”.
The main limitation of biologists rooted in an experimental practice is the difficulty of performing rigorous proofs in the absence of a language for formalizing the biological knowledge extracted from their experiments.
Biologists have identified numerous biochemical and genetic mechanisms involved in physiological functions or diseases, but once this knowledge is linked together it remains extremely difficult and expensive to predict the impact of genetic mechanisms on physiological functions.
Déborah will present her thesis, which focuses first on a reasonable mathematical specification of complex biological functions such as cell cycle checkpoints, which represent the main barrier against cancer. Second, it focuses on the development of an automated proof method, using the mentioned tools, for proving whether a set of genetic regulations is sufficient to generate cell cycle checkpoints.
A look behind L’Oréal’s tool for consumer feedback analysis. We will discuss how NLP and Computer Vision have been applied to analyse large volumes of product reviews. Specifically, we’ll talk about topic extraction, sentiment analysis and topic enrichment in NLP, and siamese neural networks with triplet loss in Computer Vision.
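As a brief illustration of the triplet loss mentioned above, here is a minimal sketch on plain vectors. In a siamese setup the same embedding network would produce the anchor, positive and negative vectors; that network is omitted here. The margin value and the example vectors are assumptions, and some formulations use squared distances instead.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Hypothetical 2-D embeddings: an easy triplet and a harder one.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
print(easy, hard)
```

Only triplets that violate the margin contribute a non-zero loss, which pushes embeddings of similar items together and dissimilar items apart.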
"Blood flow simulation for clinical applications" by Dr Irene Vignon-Clementel, Directrice de recherche @Inria
Abstract: The dynamics of how blood flows through our body can be numerically simulated. Such simulations provide an 'augmented intelligence' to better understand cardiovascular and organ disease and plan their treatment.
"Cost of war - employment impact" by Anastasiia Tryputen, AI enthusiast, researcher, entrepreneur, founder of NeuroHarb and Data unBlocked companies
Abstract: The talk will give a background of the Ukrainian talent market, historical trends, and the success of Ukrainians inside and outside of the country. The next part of the talk will focus on the truth of the Russian full-scale war and what it means to families, companies, and talents. The last part of the talk will focus on the statistics around people fleeing war, specifically children and women, and suggest that data science professionals and business owners consider helping those who have arrived to pick up new skills to enter the technical field. Finally, the closing part will focus on the gender gap in the tech field and how we can contribute to changing this situation for all women in the EU and globally.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf by fxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Water billing management system project report.pdf by Kamal Acharya
Our project, entitled “Water Billing Management System”, aims to generate water bills with all charges and penalties. The manual system currently employed is extremely laborious and quite inadequate, and it only makes the process more difficult.
The aim of our project is to develop a system that partially computerizes the work performed in the Water Board, such as generating monthly water bills, recording the units of water consumed, and storing customer records and previous unpaid bills.
We used HTML/PHP for the front end and MySQL for the back end. HTML is primarily a visual design environment. We can create an Android application by designing the forms that make up the user interface, adding application code to the forms and to the objects on them, such as buttons and text boxes, and adding any required support code in additional modules.
MySQL is a free, open-source database that facilitates the effective management of databases by connecting them to the software. It is a stable, reliable and powerful solution with advanced features and advantages, including data security.
Literature Review Basics and Understanding Reference Management.pptx by Dr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions by Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Student information management system project report ii.pdf by Kamal Acharya
Our project covers student management and the various actions related to student details. It makes adding, editing and deleting student details easier, and it also provides a less time-consuming process for viewing, adding, editing and deleting students' marks.
An Approach to Detecting Writing Styles Based on Clustering Techniques by ambekarshweta25
An Approach to Detecting Writing Styles Based on Clustering Techniques
Authors:
-Devkinandan Jagtap
-Shweta Ambekar
-Harshit Singh
-Nakul Sharma (Assistant Professor)
Institution:
VIIT Pune, India
Abstract:
This paper proposes a system to differentiate between human-generated and AI-generated texts using stylometric analysis. The system analyzes text files and classifies writing styles by employing various clustering algorithms, such as k-means, k-means++, hierarchical, and DBSCAN. The effectiveness of these algorithms is measured using silhouette scores. The system successfully identifies distinct writing styles within documents, demonstrating its potential for plagiarism detection.
Introduction:
Stylometry, the study of linguistic and structural features in texts, is used for tasks like plagiarism detection, genre separation, and author verification. This paper leverages stylometric analysis to identify different writing styles and improve plagiarism detection methods.
Methodology:
The system includes data collection, preprocessing, feature extraction, dimensional reduction, machine learning models for clustering, and performance comparison using silhouette scores. Feature extraction focuses on lexical features, vocabulary richness, and readability scores. The study uses a small dataset of texts from various authors and employs algorithms like k-means, k-means++, hierarchical clustering, and DBSCAN for clustering.
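As a rough illustration of the feature-extraction step, the sketch below computes three simple stylometric features per text. The exact features used in the paper are not listed, so these (average word length, average sentence length as a crude readability proxy, and type-token ratio as a vocabulary-richness measure) are assumptions chosen for illustration, as is the function name `stylometric_features`.

```python
import re


def stylometric_features(text):
    """Return a few lexical features describing the writing style of `text`."""
    # Split into sentences on runs of terminal punctuation; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),  # words per sentence
        "type_token_ratio": len(set(words)) / len(words),    # vocabulary richness
    }


features = stylometric_features("The cat sat. The cat ran away!")
print(features)
```

Each text then becomes a small numeric vector, which is the form the clustering algorithms named above operate on.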
Results:
Experiments show that the system effectively identifies writing styles, with silhouette scores indicating reasonable to strong clustering when k=2. As the number of clusters increases, the silhouette scores decrease, indicating a drop in accuracy. K-means and k-means++ perform similarly, while hierarchical clustering is less optimized.
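To show how a silhouette score is obtained and why it can fall when more clusters are requested than the data supports, here is a minimal pure-Python sketch. The four 2-D points and both labelings are hypothetical toy data, not the paper's dataset, and the paper does not say which implementation it used.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a point alone in its cluster contributes 0 by convention
        # a: mean distance to the other points of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated pairs of points (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
score_k2 = silhouette_score(points, [0, 0, 1, 1])  # natural two-cluster split
score_k3 = silhouette_score(points, [0, 0, 1, 2])  # one pair forced apart
print(score_k2, score_k3)
```

Values close to 1 indicate compact, well-separated clusters. Splitting a natural group lowers the average because the split points no longer have a clearly more distant neighbouring cluster.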
Conclusion and Future Work:
The system works well for distinguishing writing styles with two clusters but becomes less accurate as the number of clusters increases. Future research could focus on adding more parameters and optimizing the methodology to improve accuracy with higher cluster values. This system can enhance existing plagiarism detection tools, especially in academic settings.