A presentation by UVA Data Science Institute 2019-20 Presidential Fellow in Data Science Tianlu Wang, at the 2019 Tom Tom Applied Machine Learning Conference in Charlottesville, VA. Learn more at datascience.virginia.edu.
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
1. Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
Tianlu Wang-Gender Bias in Deep Image Representation-Applied Machine Learning Conference, Tom Tom Fest
Tianlu Wang
University of Virginia
2. Gender Bias in Visual Recognition Systems
Deep Neural Network
3. Gender Bias in Visual Recognition Systems
Trained Deep Neural Network
tie:
4. Quantifying Bias: Leakage
Trained Deep Neural Network
Is this prediction biased?
5. Quantifying Bias: Model Leakage
Model Leakage: gender prediction accuracy of a classifier trained on predictions.
Gender Classifier output: man / woman
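The model leakage definition on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the hand-written logistic regression, the 70/30 split, the synthetic "prediction scores", and the names `train_logistic` and `leakage` are all assumptions made for the example.

```python
import numpy as np

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression with plain gradient descent (illustrative only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y                      # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def leakage(features, gender, train_frac=0.7, seed=0):
    """Held-out accuracy of a gender classifier trained on `features`."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(gender))
    cut = int(train_frac * len(gender))
    train, test = order[:cut], order[cut:]
    mean = features[train].mean(axis=0)  # center features using training data only
    w, b = train_logistic(features[train] - mean, gender[train].astype(float))
    predicted = ((features[test] - mean) @ w + b) > 0
    return float(np.mean(predicted == (gender[test] == 1)))

# Synthetic stand-in for a recognition model's outputs (hypothetical data).
rng = np.random.default_rng(42)
n = 2000
gender = rng.integers(0, 2, size=n)      # arbitrary coding: 1 = man, 0 = woman

# Five object scores per image; column 0 (say, "tie") correlates with gender.
biased_scores = rng.random((n, 5))
biased_scores[:, 0] = 0.4 * gender + 0.6 * rng.random(n)

# Control: scores that carry no gender information at all.
control_scores = rng.random((n, 5))

leak_biased = leakage(biased_scores, gender)
leak_control = leakage(control_scores, gender)
print(f"leakage from biased scores:  {leak_biased:.2%}")
print(f"leakage from control scores: {leak_control:.2%}")
```

Feeding ground truth annotations instead of prediction scores into the same `leakage` function would give the dataset-level counterpart of this measure.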
6. Object & Action Recognition Models
COCO Object Recognition
• 22k images (16k man & 6k woman)
• 80 objects (kite, ski, handbag, tie…)
• Recognition performance (F1): 53.75%
• model leakage: 70.46%
imSitu Action Recognition
• 24k images (14k man & 10k woman)
• 211 activities (cooking, shooting, lifting…)
• Recognition performance (F1): 40.11%
• model leakage: 76.93%
7. Quantifying Bias: Dataset Leakage
Dataset Leakage: gender prediction accuracy of a classifier trained on annotations.
Gender Classifier input: Ground Truth Labels (in place of Predictions); output: man / woman
Does the model inherit 100% dataset leakage?
8. Performance Matters!
Ground Truth Labels: F1 score = 100%, Dataset Leakage = 67.72%
Predictions: F1 score = 53.75%, Model Leakage = 70.46%
Random Guess: F1 score ≈ 0, NO LEAKAGE!
9. Quantifying Bias: Adjusted Dataset Leakage
Adjusted Dataset Leakage: gender prediction accuracy of a classifier trained on perturbed annotations.
Diagram: Ground Truth Labels, then Random Perturbation, giving Perturbed Labels (match the performance), fed to the Gender Classifier (output: man / woman)
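One way to picture the "random perturbation that matches the performance" is sketched below. The slide does not say how labels are perturbed or which F1 variant is used, so the independent per-entry flips, the micro-averaged F1, the bisection search, and the names `micro_f1`, `perturb`, and `perturb_to_match_f1` are all assumptions for illustration.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a binary label matrix (an assumed choice of metric)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def perturb(labels, flip_prob, seed):
    """Flip each label independently with probability `flip_prob`."""
    rng = np.random.default_rng(seed)
    flips = rng.random(labels.shape) < flip_prob
    return np.where(flips, 1 - labels, labels)

def perturb_to_match_f1(labels, target_f1, seed=0, steps=30):
    """Bisect on the flip probability until the perturbed labels score about `target_f1`.

    Reusing the same seed for every probe means a larger flip probability only
    ever adds flips, so F1 never increases and bisection is valid.
    """
    lo, hi = 0.0, 0.5
    for _ in range(steps):
        mid = (lo + hi) / 2
        if micro_f1(labels, perturb(labels, mid, seed)) > target_f1:
            lo = mid   # still too accurate: flip more labels
        else:
            hi = mid   # too noisy: flip fewer labels
    return perturb(labels, (lo + hi) / 2, seed)

# Hypothetical sparse annotations: 2000 images, 80 object categories.
rng = np.random.default_rng(1)
ground_truth = (rng.random((2000, 80)) < 0.05).astype(int)

# 53.75% is the COCO recognition F1 reported in the slides.
perturbed = perturb_to_match_f1(ground_truth, target_f1=0.5375)
matched_f1 = micro_f1(ground_truth, perturbed)
print(f"F1 of perturbed labels vs. ground truth: {matched_f1:.2%}")
```

Training the gender classifier on `perturbed` rather than on the clean annotations would then give the adjusted dataset leakage, which can be compared with model leakage at the same F1.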
10. Quantifying Bias: Bias Amplification
Δ = Model Leakage - Adjusted Dataset Leakage; Δ > 0: Bias Amplification!
Bar charts of Model Leakage vs. Adjusted Dataset Leakage (%), one panel each for COCO Object Recognition and imSitu Action Recognition. Values shown: 20.47, 9.93.
11. Eliminating Bias: Adding Noise
Plot: Bias Amplification in COCO; axis label: F1 Score (%); series: original, randomization
12. Eliminating Bias: Balanced Datasets
Original: F1 score (%): 53.75; model leakage (%): 70.46
Balanced 3: F1 score (%): 52.60; model leakage (%): 67.78
Balanced 2: F1 score (%): 51.95; model leakage (%): 64.45
Balanced 1: F1 score (%): 42.89; model leakage (%): 63.22
Pie charts of gender split, as extracted: man 71% / woman 29%; man 68% / woman 38%; man 50% / woman 50%; man 57% / woman 43%
Fewer images, lower performance, lower model leakage
13. Eliminating Bias: Balanced Datasets
Plot: Bias Amplification in COCO; axis label: F1 Score (%); series: original, randomization, balanced 3, balanced 2, balanced 1
Balancing the co-occurrence of gender and target labels does not reduce bias amplification.
14. Eliminating Bias: Using Extra Annotations
Image variants: original, blackout-face, blur-segm, blackout-segm, blackout-box
15. Eliminating Bias: Using Extra Annotations
Plot: Bias Amplification in COCO; axis label: F1 Score (%); series: original, randomization, balanced 3, balanced 2, balanced 1, blackout-face, blur-segm, blackout-segm, blackout-box
16. Eliminating Bias: Adversarial Training
Convolutional Neural Network (ResNet-50)
Fully-connected Layer + Logistic Regressors: Handbag, Fork, Vase, Spoon, …, Knife, Car, Oven
Gender Classifier: man / woman
Gradient Reversal
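The gradient reversal step in this diagram can be illustrated with a small NumPy sketch. It is only a toy under stated assumptions: a linear map stands in for the ResNet-50 encoder, gradients are written out by hand, and the class `GradientReversal`, its `lam` scale factor, and the random data are hypothetical. In a deep learning framework the same idea would typically be written as a custom autograd operation.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                       # stand-in for image inputs
gender = rng.integers(0, 2, size=8).astype(float)
W = rng.normal(size=(4, 3)) * 0.1                 # toy linear "encoder" weights
v = rng.normal(size=3) * 0.1                      # gender classifier (adversary) weights

grl = GradientReversal(lam=1.0)

# Forward pass: encoder features go through the reversal layer into the adversary.
h = X @ W
logit = grl.forward(h) @ v
p = 1.0 / (1.0 + np.exp(-logit))

# Backward pass for the mean binary cross-entropy of the adversary.
d_logit = (p - gender) / len(gender)
grad_v = h.T @ d_logit                            # adversary learns to predict gender
grad_h_adv = np.outer(d_logit, v)                 # gradient arriving at the adversary's input
grad_h = grl.backward(grad_h_adv)                 # sign is flipped before reaching the encoder
grad_W = X.T @ grad_h                             # encoder is pushed to hide gender

# One gradient-descent step on both parts.
lr = 0.1
v -= lr * grad_v
W -= lr * grad_W
```

Because the encoder receives the negated gradient, an ordinary descent step on `W` increases the adversary's loss, while the adversary itself still descends on `v`. The recognition loss from the logistic regressors would be added to the encoder's gradient without reversal.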
17. Eliminating Bias: Adversarial Training
Plot: Bias Amplification in COCO; axis label: F1 Score (%); series: original, randomization, balanced 3, balanced 2, balanced 1, blackout-face, blur-segm, blackout-segm, blackout-box, adv @ image, adv @ conv4, adv @ conv5
18. Visualization of Adversarial Training
Mask Prediction, applied to the input image (X)
Convolutional Neural Network (ResNet-50)
Fully-connected Layer + Logistic Regressors: Handbag, Fork, Vase, Spoon, …, Knife, Car, Oven
Gender Classifier: man / woman
Gradient Reversal
19. Adversarial Training: Removing Face Area
20. Adversarial Training: Removing Face and Skin
21. Adversarial Training: Removing Entire Person
22. Adversarial Training: Removing Contextual Cues
Editor's Notes
During training, we feed images containing a tie and a man into the model.
At test time, the model is not able to recognize “tie” when there is a woman in the image.
Gender information leaks through the predictions generated by the model.
With an accuracy of 70.46%, you can correctly tell the gender of the person in the image from the prediction alone.
The model may leak gender information because the dataset is not balanced.
Instead of using predictions, we use ground truth labels to train the gender classifier.
Gender information revealed by ground truth annotations (dataset).
The perturbed labels have the same F1 score as the model; the perturbation introduces some randomness, which may reduce the bias.
Gender revealed by perturbed ground truth labels at different levels of accuracy.
Or: Gender leakage of a model with certain accuracy, whose errors are due entirely to chance.
Compare model leakage and adjusted dataset leakage: the two are measured at the same performance.
Imagine we have an ideal model that has the same performance as our model but makes mistakes entirely due to chance, not systematic bias.