1. The document discusses creating a machine learning model to predict beauty by analyzing photos and their aesthetic ratings. It outlines collecting a dataset of labeled images, developing a convolutional neural network model, and potential applications of predicting beauty such as improving online reviews and product personalization.
2. The presenter acknowledges limitations of the initial model due to the small dataset and discusses challenges such as overfitting. Potential uses of CNN models beyond beauty prediction are also mentioned, such as image recognition, text translation and game playing.
3. Key takeaways are that high quality training data is needed to build accurate models and that machine learning requires creativity in its applications. The presentation encourages exploring model development and potential uses.
27. OUR DATASET
AADB Dataset:
- 10,000 photographic Flickr images with Creative Commons license
- Each image was rated for aesthetic merit by 5 participants
- These participants were Amazon Mechanical Turk workers paid for their time.
34. ASSUMPTIONS
1. The model will attempt to predict the perceived level of beauty of an image as the participants themselves would rate it.
2. Due to the low number of images, the model will likely overfit the images used for training and isn’t expected to generalize well.
3. We will group images with similar ratings together, under the assumption that they represent about the same level of aesthetics.
4. Images rated a 3 likely did not have a strong influence on the participant and thus are not considered to have important features.
35. HOW TO IMPLEMENT
Method
- CNN knowledge required: High / Medium / Low
- Parameter control: Low / Medium / High
- Ability to brag at parties: Medium / High / Very high
- Chance of overwriting hard drive: Very low / Low / High
36. WHY SHOW A BAD MODEL?
Why 👏 are 👏 you 👏 wasting 👏 our 👏 time 👏 ?
55. THANKS!
You can find me at
github.com/fergusonrae
linkedin.com/in/fergusonrae
fergusonrae@outlook.com
56. REFERENCED
● AADB Dataset
○ Information
○ Data
● Extroversion Study
● Awesome Rating Model Video
● Yelp and Machine Learning
● Importance of Aesthetics in:
○ Car Company Loyalty
○ Phone Sales
○ Socially Sharing Products Bought
○ Increase in Sales After Personalization
57. VISUAL CITATIONS
● Presentation template by SlidesCarnival
● Machine Learning Images
○ CNN Filtering - all
○ MNIST dataset
○ Checkers
● Predicting Beauty
○ Hummingbird
○ Boats
○ Crowd of Stick Figures
○ Dabbing Figure
○ Happy Man
○ Happy Woman
○ Bubble Freezing
○ Black and White House
○ Wedding
○ Man and Son
58. VISUAL CITATIONS
● Predicting Beauty
○ Surfing Business Man
● Potential
○ Road with Hills
○ City Center
○ Selfie 1
○ Selfie 2
○ Selfie 3
○ Bag 1
○ Bag 2
○ Xray
○ Recipes
○ Portrait of Edmond Belamy
○ Parameter Tuning
59. RESOURCES
● TensorFlow Transfer Learning
● Andrew Ng’s Coursera Video Series
● Chris Olah’s Blog
● Freely Available Datasets for Exploration
Editor's Notes
The goal of this talk is to apply ML to a creative topic, dive into how to do it, and put the tools in your hands to explore further: to push the boundaries of what machine learning is and what it applies to, and to open up some really creative, useful things you can do.
A brief overview of the pieces we’ll explore. We start with the steps required to set up a machine learning project. This part has probably been covered a few times already today, so I’ll be quick. Then we’ll dig into what makes a convolutional neural network different from other models.
Next, we’ll set up our beauty classification problem. I’ve tweaked this part since writing the talk abstract, because of time constraints and a greater focus on the aspects of modeling that will matter most going forward, so it will be a bit different from the abstract.
Finally, we’ll walk through some interesting applications of modeling and beauty algorithms in general.
Just the basics here. We’ll add more info and background as we move along. Also, I noticed there were quite a few ML-focused talks today, so I’m sure y’all are pretty full already.
This flow is probably recognizable to most of you.
Walk through the steps.
Most important step is understanding your data going in.
We’re going to discuss these again and again, because they are currently the most critical parts.
For the most part, every other step has a defined list of things you do. Follow the steps, and your model will turn out as well as it can, which means those steps can be automated.
However, data selection is considered an art form and requires special human care. And it is the easiest thing to mess up.
The most important part of machine learning, and the central theme of this talk:
MAKE SURE YOU HAVE CLEAN DATA
If your model is going wrong, your first thought should always be about your data
Garbage in, garbage out
Your model means nothing if you’re feeding it biased, inaccurate, and/or too little good data
Sounds like a simple thing, but clean data is actually really hard to come by. You can pretty much guarantee that any dataset that is touched or filled in by a human will need to be examined closely because there will be errors. We’re not machines.
I know this talk is about modeling, but always remember that the most important part of this is cleaning your data, removing anomalies if needed, normalizing, the whole shebang. That is where most of your time will be spent.
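To make “clean data” a little more concrete, here is a minimal sketch of the kind of checks this step involves. It assumes a hypothetical record format in which each image has an `image_id` and a list of five ratings on a 1-5 scale; the format, the `clean_records` function, and the sample records are illustrative only and are not taken from the AADB files. It drops duplicates, incomplete rating lists, and out-of-range scores, exactly the sort of human errors described above.

```python
from typing import Dict, List

# Assumptions for illustration: every image should have exactly five
# ratings, each a whole number from 1 to 5.
EXPECTED_RATERS = 5
VALID_SCORES = range(1, 6)


def clean_records(records: List[Dict]) -> List[Dict]:
    """Keep only records with a unique ID and a complete, in-range rating list."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        image_id = rec.get("image_id")
        ratings = rec.get("ratings") or []
        if not image_id or image_id in seen_ids:
            continue  # missing ID, or a duplicate of an image already kept
        if len(ratings) != EXPECTED_RATERS:
            continue  # someone skipped the image, or extra ratings slipped in
        if any(r not in VALID_SCORES for r in ratings):
            continue  # typo or out-of-range score, e.g. 0 or 55
        seen_ids.add(image_id)
        cleaned.append(rec)
    return cleaned


# Hypothetical raw records showing the kinds of human errors to expect.
raw = [
    {"image_id": "img_001", "ratings": [4, 5, 4, 3, 5]},
    {"image_id": "img_001", "ratings": [4, 5, 4, 3, 5]},   # duplicate row
    {"image_id": "img_002", "ratings": [2, 2, 1]},         # incomplete
    {"image_id": "img_003", "ratings": [1, 2, 55, 2, 1]},  # typo
    {"image_id": "", "ratings": [3, 3, 3, 3, 3]},          # missing ID
]
cleaned = clean_records(raw)
print(f"kept {len(cleaned)} of {len(raw)} records")
```

Four of the five sample rows are rejected, which is a fair picture of how much a real human-labeled dataset can shrink once you actually look at it.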
For image recognition, we’re going to focus on convolutional neural networks (CNNs). These are currently the best option when implementing a model that takes images as inputs.
So CNNs are a type of neural network, like their name suggests.
What makes CNNs distinct as a particular kind of NN is that they:
Explicitly take images as inputs
Are better optimized for handling images than plain neural networks, thanks to the introduction of filters
So what makes a CNN optimized for handling images? It has preprocessing in the form of filters!
In this case, preprocessing the input image data means turning the images into data the computer can recognize: taking an image and running a filter over it.
So let’s start with a black and white image so we don’t have to deal with three dimensions. Guess who that is?
Example of a filter. This is how we are going to pull out and create data from the image.
The initial filters for an image are usually very simple: lines, slight curves, etc.
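The filter idea can be sketched in plain Python. This is a minimal illustration, not how TensorFlow implements it: a hand-picked 3x3 vertical-edge kernel (an assumption for the example; a real CNN learns its kernel weights during training) slides over a tiny black-and-white image, and the resulting feature map is large only where the dark-to-light edge sits.

```python
from typing import List

Matrix = List[List[float]]


def apply_filter(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped before sliding.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Tiny black-and-white image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked vertical-edge detector; a trained CNN would learn its own weights.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(apply_filter(image, vertical_edge))
# [[0.0, 3.0, 3.0, 0.0], [0.0, 3.0, 3.0, 0.0]]
```

Flat regions produce 0 and the columns straddling the edge produce 3. Stacking many such filters, and filters over their outputs, is how the network builds up from lines and curves to recognizable features.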
Now we’re using a mouse because it had more readily available images.
This is what it builds up to eventually: it can classify an image based on whether certain features are contained within it.
A common dataset that you have likely come across if you’ve researched training your own CNN.
There are countless tutorials for it because it’s such an interesting dataset.
Wonderful image right?
CNNs have been trained to learn what a checkerboard looks like, and what it needs to look like for a winning strategy
Your first thought might be: cool, but doesn’t this seem more complicated than just creating a virtual checkerboard in code and programming in the logic?
Valid point. Yes and no. Yes, in that the coded approach might be more efficient and require less computing power. But no, in that with a CNN we don’t need to represent the board in a way the computer can understand, because it learns how to understand it itself. There are no extra processing steps. Also, a human can play a physical checkers game with the machine. You don’t have to play online, where the computer is comfortable. No input is needed beyond what a human player would also need.
Which brings me to my next point. Who cares if it’s easier because you don’t have to change inputs, if you still have to know all this stuff about the model and code it into existence, and train it and tune it and test it?
In this section, we’re going to start introducing what humans consider abstract concepts. We’ll begin creating a model that takes images as data and outputs a classification of beautiful or ugly, and work through how we can do that.
We’re going to start off in the most exciting way possible, with a definition!
It’s important to know exactly what we want to predict so we know if we hit it or not.
Here, we’re going to extend that definition to also include aesthetically pleasing.
So this is how a human can define beauty. But if you feed this definition into the computer, it’s not going to know what in the world you are talking about. What does that even mean? Heck, even we humans aren’t quite sure exactly what beauty represents. There isn’t a quantifiable thing that you can reference.
How could an individual define beauty beyond abstract words and lots of hand waving? Well, the same way you describe concepts to a child: you point out examples. This is a dog, this is not. This might be a dog? And computers like examples.
So, the goal is to find a dataset that has images classified into beautiful and not beautiful. So we want labeled data. Cool, straightforward right?
This is where we start to slip down that abstract slide. Because while there tends to be a general consensus as to beauty, people are not the exact same when it comes to it.
Those photos aren’t the total definition of beauty. They are my definition of beauty. For instance, does anyone actually think that this frog is beautiful?
And this specification of “my” is how we can attempt to get around the complexity of everyone having a different notion of beauty.
Now, a large amount of data is needed to try to encompass all aspects of an abstract concept like beauty, and I don’t want to personally label a million images. So for this model, we’re instead going to find a good dataset that can represent consensus data.
Google’s NIMA model does this.
Now, even though we’re just beginning, there are some red flags here. It’s important to be critical at every step, most importantly when selecting data.
First, 10,000 images were used. At a glance, that seems like a lot, but remember that we are trying to quantify beauty. Could 10,000 labeled stills represent the entirety of human beauty? Questionable.
So, the worst offender here is that it is based on 5 people’s opinions. That is too small a sample to be statistically significant for really any group of people outside of these 5. So, right off the bat, we need to change the objective of this model.
Also, we don’t really know anything about these 5 people. We don’t know their background. We don’t know their state of mind when answering their questions. We don’t know their personalities.
Extroverted people rate things higher and have been shown to mentally react to stimuli more intensely. So, a 4 to an extrovert is likely not the same thing as a 4 to an introvert.
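One common way to keep “a 4 from an extrovert” and “a 4 from an introvert” comparable is to normalize each rater’s scores against their own average before combining them. A minimal sketch, assuming we know which rater gave which score (whether a given dataset exposes rater IDs should be checked in its docs); the rater names and scores below are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def normalize_per_rater(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Express each rater's scores as z-scores relative to that rater's own habits."""
    normalized = {}
    for rater, raw in scores.items():
        mu = mean(raw)
        sigma = pstdev(raw)
        if sigma == 0:
            # Someone who gives every image the same score tells us nothing
            # about which images they preferred.
            normalized[rater] = [0.0] * len(raw)
        else:
            normalized[rater] = [(s - mu) / sigma for s in raw]
    return normalized


# Hypothetical raters scoring the same three images.
scores = {
    "enthusiastic_rater": [5, 4, 5],  # rates everything high
    "reserved_rater": [3, 1, 3],      # rates everything low
}
normalized = normalize_per_rater(scores)
for rater, z in normalized.items():
    print(rater, [round(v, 2) for v in z])
```

After normalization both raters show the same pattern of preferences, even though the reserved rater’s favorite images only got a raw 3, the score that a simple “drop the 3s” rule would throw away as neutral.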
With this in mind, let's take a closer look at how these individuals rated their images.
An example of the screen the participants saw. It’s hard to see, but at the top they rate the level of aesthetics, and along the sides they indicate whether a photo contains any of the listed “negative” and “positive” attributes.
So, more red flags.
Second, there’s bias built in. Attributes are labeled negative and positive. This assumes that all people view the rule of thirds as aesthetically pleasing and a lack of object emphasis as not aesthetically pleasing.
Also, how were the 5 participants checking for these attributes? What does it mean for an image to have “color harmony” versus not? We’re adding subjective features to our subjective label.
So, we have all these issues right now. Okay, first instinct, we probably shouldn’t use this dataset due to the amount of bias and subjectivity baked in.
But I did use it. Which brings me to:
I went with this dataset because it’s free and legal to use, has documentation about where the data came from, and I had difficulty finding any other good datasets.
So, with that in mind, I continued with the data and tried to make it better where I could.
For example, I did not pull any of these negative/positive attribute features into my model, to eliminate those biases. Instead, I used the straight aesthetic score as a measure of the level of “beauty”, which from now on should really be called “aesthetics”, because that is what the dataset calls it and we can’t claim a one-to-one correlation between the two. Also, this model is assumed to predict only the level of aesthetics these 5 individuals would give the image.
And really, if you had a couple of months to focus on it, there are other things you could do: instead of simply averaging the responses to an image, take into account whether the image is polarizing, and so on. But at the end of the day, it’s still just 5 people. Not really a consensus model.
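The polarization idea could look something like this minimal sketch: keep the mean rating, but also flag images whose five ratings are widely spread. The `POLARIZATION_THRESHOLD` value and the `summarize` helper are arbitrary assumptions for illustration, not something taken from the talk or the dataset.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Arbitrary cutoff chosen for illustration: a population standard deviation
# above 1.5 points on the 1-5 scale counts as "polarizing".
POLARIZATION_THRESHOLD = 1.5


def summarize(ratings: List[int]) -> Tuple[float, bool]:
    """Return the mean rating and whether the raters strongly disagreed."""
    return mean(ratings), pstdev(ratings) > POLARIZATION_THRESHOLD


# Same average, very different stories.
print(summarize([3, 3, 3, 3, 3]))  # everyone shrugged
print(summarize([1, 1, 3, 5, 5]))  # a split audience
```

Both images average a 3, but only the second is flagged: a plain average hides the fact that some raters loved it and others hated it.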
So now we have clean data, and we’ve covered our assumptions.
A CNN, as used here, is a classification algorithm: it puts things into classes. But our data is set up as discrete ordinal values, meaning they have a defined ranking. A 3 is more like a 4 than a 1, so it’s not really its own class.
So another adjustment is to combine values 4 and 5 under the class beautiful, and values 1 and 2 under the class ugly. We will strip photos with a value of 3 from the dataset, on the idea that they don’t provide useful data to the model. That’s another assumption, but it also addresses the possibility of people rating things slightly differently. This turns the problem into a binary classification. One issue is that it whittles our dataset down to about 8,000 images.
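The relabeling step might be sketched like this, assuming each image already has a single averaged score on the 1-5 scale. Averages are bucketed by the whole rating they round to (3.5 and up counts as a 4 or 5, below 2.5 as a 1 or 2), which is an assumption about how to handle non-integer averages; the function names, image names, and scores are hypothetical.

```python
from typing import Dict, Optional


def to_binary_label(avg_score: float) -> Optional[str]:
    """Bucket an averaged 1-5 aesthetic score into "beautiful", "ugly", or None (dropped)."""
    if avg_score >= 3.5:
        return "beautiful"  # rounds to 4 or 5
    if avg_score < 2.5:
        return "ugly"       # rounds to 1 or 2
    return None             # rounds to 3: treated as neutral and removed


def build_dataset(avg_scores: Dict[str, float]) -> Dict[str, str]:
    """Keep only images that land in one of the two classes."""
    labels = {}
    for image_id, score in avg_scores.items():
        label = to_binary_label(score)
        if label is not None:
            labels[image_id] = label
    return labels


# Hypothetical averaged scores.
avg_scores = {"sunset": 4.6, "blurry_cat": 1.8, "parking_lot": 3.0, "borderline": 3.4}
labels = build_dataset(avg_scores)
print(labels)
print(f"dropped {len(avg_scores) - len(labels)} of {len(avg_scores)} images")
```

Explicit thresholds are used instead of Python’s `round()`, which rounds halves to the nearest even number (`round(2.5) == 2`) and would quietly shift borderline images between classes.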
So here are some of the ways we can now create a model based on our dataset. In particular, I want you to take notice of this CNN knowledge required section. And take notice of the fact that there is a low section.
Over the past year alone, modeling has become dramatically more accessible. Services like AutoML and TensorFlow 2.0 let you train, test, and put a model into production without really needing to know anything about tuning or testing. And that’s not counting the tons of startups.
Because of this increasing accessibility for non-devs, and the abstraction of things like testing and tuning, the most important part of your model is the data cleaning, because that is the part hardest to automate in the future.
After going through the previous steps, it should be apparent that this is not a solid model.
But I do have a working classification example of the medium option, TensorFlow, on my GitHub, linked in the references. Feel free to try it out if you’d like to see what 5 random strangers would likely think of your selfie. I’ve got to say from experience, though, they are rather harsh.
I would now go on to testing and tuning, but we’ve already established that this is not a good model and needs a lot of work. And if testing came back looking great, that should only increase suspicion. So really, this model is a dud: it shouldn’t be used to make any real-life decisions. Entertainment value only. But no worries, we’ve already worked through the most important aspects.
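One concrete reason a great test score should raise suspicion: if most images fall into one class, a model that always guesses that class already scores well, and a large gap between training and test accuracy points to overfitting. A minimal sketch of both checks follows; the helper names, the `gap` and `margin` thresholds, and the accuracy numbers are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[str], y_test: Sequence[str]) -> float:
    """Accuracy of a "model" that always predicts the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common] * len(y_test))


def red_flags(train_acc: float, test_acc: float, baseline_acc: float,
              gap: float = 0.15, margin: float = 0.05) -> List[str]:
    """List reasons to distrust a score; `gap` and `margin` are arbitrary examples."""
    flags = []
    if train_acc - test_acc > gap:
        flags.append("large train/test gap: likely overfitting")
    if test_acc - baseline_acc < margin:
        flags.append("barely beats always guessing the majority class")
    return flags


# Hypothetical, heavily imbalanced labels.
y_train = ["beautiful"] * 80 + ["ugly"] * 20
y_test = ["beautiful"] * 9 + ["ugly"]
baseline = majority_baseline(y_train, y_test)

# A "great" 91% test accuracy turns out to be nearly free.
print(red_flags(train_acc=0.99, test_acc=0.91, baseline_acc=baseline))
```

Neither check proves a model is good; they only catch two common ways a score can look better than the model really is.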
So, why walk through this setup if the model is bad and falls apart?
One, because I want to normalize this experience as a part of machine learning. It’s easy to see all these great, fancy models in production and being talked about, but you often don’t get to see the background. A lot of the time, your model is not going to work. There won’t be enough data. The data that exists is incomplete or biased. Or, you have fantastic perfect data, and the model just doesn’t find anything interesting. It doesn’t latch on to anything.
Two, I want to show how critical you need to be about the data going into your model, and how sensitive you need to be to recognizing bias. You gotta read the docs on where the data came from and ask yourself whether what is happening behind the scenes is okay. Because once you put biased data in, it doesn’t matter what fancy algorithm you choose or how much time you spend tuning and adjusting: it’s going to be a biased model. Models are only as smart as what you put in.
And three, show how much time is spent in this aspect. Actually training, testing, tuning the model? Small potatoes.
One reason I swapped out this part of my talk is that I have worked on several data science projects with incredibly smart people, math PhDs, building extremely sophisticated models that, sadly, fell apart in production because they didn’t focus on this step. They took data accumulated by data engineers and immediately built a model around it without questioning anything about it. This is the most important part.
Back at this again.
If you come away from this talk with anything, please make sure it is this. I think it is the most important/fundamental thing.
That being said, there are currently existing models that are based on good data. Things like Google’s NIMA, currently being used to predict aesthetic quality of photographs.
Now that we’ve seen two poor examples of models, how about an example of a great one?
And what’s on the horizon for future applications?
Well, here we have Yelp in 2015. Like a lot of search sites, when you search for a restaurant, images taken by members pop up and are one of the first things a user’s eyes are drawn to.
The images shown are ranked, usually by quantifiable statistics like the number of views a picture gets. So the more often people click on a picture, the faster it rises to the most visible spot.
So, with these top pictures in mind, how do you feel about dining at the Country Way restaurant? Looks appetizing, right?
Issues they had:
1. Users would click more often on fuzzy or blurry images to try to see them at a higher resolution. These clicks were falsely assumed to mean that users liked the image.
2. Reinforcement bias. The most-clicked pictures were, surprise, the ones already at the top, because they were viewable directly on the page.
A real-life application of a CNN trained not only on the images themselves, but also on additional data gleaned from them, like aspect ratios.
Much tastier right?
Now the photos are ranked on aesthetic merit versus clicks.
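The difference between the two ranking strategies can be sketched with a few hypothetical photos. This is not Yelp’s actual system; the `Photo` fields, the helper functions, and the `aesthetic_score` (imagined as the output of a trained aesthetics model) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Photo:
    name: str
    clicks: int
    aesthetic_score: float  # hypothetical output of a trained aesthetics model, 0-1


def rank_by_clicks(photos: List[Photo]) -> List[str]:
    """Rank by raw clicks, which can reward blurry photos people click to zoom in on."""
    return [p.name for p in sorted(photos, key=lambda p: p.clicks, reverse=True)]


def rank_by_aesthetics(photos: List[Photo]) -> List[str]:
    """Rank by predicted aesthetics; ignores clicks, so page position can't reinforce it."""
    return [p.name for p in sorted(photos, key=lambda p: p.aesthetic_score, reverse=True)]


photos = [
    Photo("blurry_plate", clicks=950, aesthetic_score=0.21),
    Photo("dim_booth", clicks=400, aesthetic_score=0.35),
    Photo("crisp_burger", clicks=120, aesthetic_score=0.88),
]
print(rank_by_clicks(photos))
print(rank_by_aesthetics(photos))
```

The same three photos come out in opposite orders: the blurry, much-clicked photo leads the click ranking, while the crisp one leads the aesthetic ranking.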
So with this example of a good model, what can we hope to look to in the future when it comes to aesthetic predictions?
Most scenic route to work. The model has to know what you find scenic and be able to classify points along your route. Maybe you like the look of tight, bustling cities. Or maybe you like rolling hills. Maybe you just hate the color orange and don’t want to see it anywhere on your trip.
Best selfie
Researchers found that people are more likely to be loyal to a car company if they value its aesthetics than if they like the cars’ actual attributes. They then found this again when it came to phone sales.
With this model, as consumers interact with our site, we can get valuable information about what type of images they prefer to look at.
When it comes to personalized CNNs, it’s pretty exciting stuff. If you can collect enough data on your opinions or thoughts on images, you can pretty much automate a lot of your life. Or, increase the visual appeal of it.
Airport security measures. I went through the airport recently and saw that an individual has to look at each scan.
People suffer from fatigue.
Classify types of skin rashes and X-rays. This is starting to be implemented, but hasn’t totally caught on.
Let’s get even more general. How about machine learning as a whole, including not just visual CNNs, but NNs, regressions, and clustering?
Absolutely bonkers.
Machine learning really is revolutionary
Really think of any decision you make. It can probably be turned into a model that will make the decision for you and, given enough information, likely will make it better.
Recipes, making the perfect recipe.
Portrait of Edmond Belamy
October 2018.
The first AI-generated artwork sold at a major auction house (Christie’s). It fetched $432,500. It uses a generative adversarial network (GAN): one network learns what real images look like, much as a CNN would, while another tries to create its own images and trick the first into thinking they’re real.
Now we’re getting into questions like: can machines be artists? Creativity is considered a human pursuit, what makes us us. But if a machine can learn what that subjective concept is, what is stopping a machine from having creativity?
Tune and test itself
Clean and find its own data. That whole thing about you making the decision as to what is good and bad data? People are starting to look into how to automate that as well.
Classifying aesthetics
One of those terms that means something different to everyone. Clarify that it’s the artistic movement, not the meme. It’s also currently used as a synonym for fashion style, but it’s not strictly that either. And it’s not really an art style? It can combine a bunch of senses at once and isn’t necessarily human-made.
The most important part of machine learning
MAKE SURE YOU HAVE CLEAN DATA
If your model is going wrong, your first thought should always be about your data
Garbage in, garbage out
Your model means nothing if you’re feeding it biased, inaccurate, and/or too little good data
Sounds like a simple thing, but clean data is actually really hard to come by. You can pretty much guarantee that any dataset that is touched or filled in by a human will need to be examined closely because there will be errors. We’re not machines.
I know this talk is about modeling, but always remember that the most important part of this is cleaning your data, removing anomalies if needed, normalizing, the whole shebang. That is where most of your time will be spent.
This includes non-devs. We’re right at the point where you don’t really need to be able to code to be able to model. And it’s only going to get easier from here on out. As long as you can be skeptical with data going in and results coming out, you’re golden.
There’s so much to discover. The goal of this talk was to apply ml to a creative topic, dive in to how to do it, and put the tools in your hands to explore more.
Maybe you don’t want to predict beauty, but you want to create a model that
______…
For fun, think about decisions you make throughout the day
Do it. Make silly models. Try your hand at automating possible decisions in your life.