Andreas Weigend (



Transcript of Andreas Weigend
Data Mining and E-Business: The Social Data Revolution
Stanford University, Dept. of Statistics

Andreas Weigend (
Data Mining and Electronic Business: The Social Data Revolution
STATS 252
April 13, 2009
Class 2 Ecosystems: (Part 1 of 2)

This transcript:
Corresponding audio file:
Next Transcript: (Part 2 of 2):
To see the whole series:
Containing folder:

Transcript by Tamara Bentzur
Andreas: STATS 252, Spring 2009, lecture 2. Welcome to the class. Here is the agenda for today. I will start by walking you through our wiki, specifically the logistics. Then we will recap what we did last time, which was the PHAME framework. After that, we have Linus Young, who was a student in this class three years ago. He then became a hotty by building what was, for a while, the most popular Facebook application. He then took what he knew from Facebook and became a mafia guy by running what was, for a while, the most popular iPhone app. He is now going to turn to babies, but we will talk about babies at the end of the first part.

In the second half, we will have the founder of RockYou, [0:00:51.7 unclear], come to class, and he will talk about ecosystems and platforms. At the end of class I will tell you what we’re going to do during the rest of the quarter. Somewhere in between, I will sneak in a discussion of the first homework and tell you what makes good metrics good metrics. That’s the agenda for today. Are there any questions?

If not, then let’s quickly run through the wiki. As all of you know, you should all have editing privileges on the course wiki. The point I want to make is that this is actually a wiki, which means it relies on all of you contributing to it. There is another page, which is the page I edit, and that is the Stanford teaching description. I will start uploading the mp3 files after each class, probably starting tomorrow. There will be mp3s of each class, so people who either miss the class, or have nothing better to do while they are flying or driving, can listen to them.

What is this class about? I will talk about this in the last part of today’s class. That will be filled in, in real time. Who is in the class? Most of you have actually found the Ning social network that we created for class.
I will ask you to please upload your pictures so I know how to associate the names with the pictures. I do give a grade for class participation of 35%. I don’t want to know just “the CS guy in the back,” but who corresponds to that picture, so I can do a fair job of grading.

0:02:39.0 Communication goes via email to me, and I’ve set up communication paths to the TAs. Don’t use Ning for mission-critical communication. Use Ning if you want to get to know each other or if you want to talk about your stuff. There is also a page on Stanford 2009, the Student Summary, which the TAs created by mistake. I’ve locked this page because I don’t want you to have to enter the information multiple times. What I ask you to do is to take that information, whatever you’re comfortable with since everything is public, and put it on the Ning network. I may unlock the page again, but don’t add stuff on this page; add it on Ning.

Back to the boring logistics here. Grading policy: homework is 60%. Contribution to the wiki is 30%. Class participation is 5%. Contributions elsewhere, like the Facebook group, are also 5%.
Most of you have figured out that the homework submission email was actually broken. The TA didn’t get around to creating it. I created it Sunday night. If people had problems, I apologize for this. This is, where you submit homework so we have a clear record of when it was submitted. When you need help, hit up the TAs, which is one email address for all of them,, which all of them share, and they will grab the questions you have and try to answer them.

In the last class, you met Enrique Allen, our social media TA, who has been more than pulling his weight on the first homework. He really rocks. I met with all 6 of my teaching people on Saturday at my house in San Francisco, so most of the stuff you see, he built. We have [0:04:42.9 unclear], who is actually grading your first homework. We have another grader, Ryan Mason, who took the class last year and is now working at 23andMe.

I told you about the Facebook page. We started as a group and moved it over to a Facebook page, which is either, or That is also where Matt is going to help me build the dashboard for the various metrics you have been creating. It will show, updated daily, how the various groups here and at Berkeley are doing on the homework.

I just wanted to make sure I take the five minutes to walk you through this once. If you have questions about this, if you see typos in the course description, or if you think something is missing, email me and tell me, “Andreas, you got this wrong.”

0:05:33.4 All right, time to get to content. I want to start by recapping the PHAME framework from last class. PHAME stands for problems, hypotheses, actions, metrics, and experiments. One question: can you actually see this? Is this a large enough font for people to read? Okay, good.
I will give my file, as a habit, at the end of class to one of the people responsible for making this wiki. The wiki is due three days after class. That would be Thursday evening. That way, you have all the notes I wrote, and you just wikify them and add things that I may have said but didn’t put in the notes. The person who is responsible for this week’s wiki should come up to me afterwards and say, “Email me your notes.”

Problems, Hypotheses, Actions, Metrics, and Experiments: last class I contrasted this with the old-style hope, the internal hope of “Let’s just do data mining and hope for insights to emerge.” We are still waiting for those insights. As you see, data is not the primary thing and mining is not the primary thing; the problem is the primary thing. Problems, for most people, actually start with: what are we doing it for? What is the business model, or what is the monetization?
I want to start the conversation today by giving you a pretty complete view of everything there is to monetization. Is that a good start? Okay, so first of all, you know that in the web 1.0 era, it was ads that fueled the Internet economy. I want to do this in the context of PHAME, and work with you in figuring out, if we have a certain monetization model, what the corresponding metrics are that matter. For instance, if you think about ads, what would be the metric you would like to have as large a number on as possible? Exactly: views, ad views, ad clicks. Conversion rate comes in here. You want to have people stay on your page as long as possible.

Then, the next thing is you can actually sell stuff. For example, and there, it’s already no longer clear whether you want to have people stay on your site as long as possible, or whether you want to give them what they’re looking for, so that they’re done and they know it was cool, it was smooth, as opposed to putting stumbling blocks in their way.

The best example of versioning is dating sites. On the one hand, you want to have a lot of people who are actually on the dating site, so there is a good inventory. On the other hand, you want to make some money. You have two different versions: the free version to get the inventory, and the paid version. You need to differentiate them in a smart way. For instance, you could differentiate so that you can find out contact details about the person if you pay, or so that you can send an unlimited number of messages as opposed to only a couple of messages a day.

What other ways of monetization do we have? Virtual goods is actually one of the upcoming ways to monetize stuff. People buy bits. We talked last time about whether a virtual gift is actually different from a real gift, besides the obvious.
What other ways of monetization are there, since we want to have a complete picture here?

0:09:21.7 Lead gen? Do you want to explain how lead generation works?

Student: …

Andreas: For instance, what’s the one where you get a certain number of points in Facebook apps and then they sell your name to some mortgage broker or some Russian mail-order brides, or whatever they’re selling there? [0:09:57.7 Offer Power] is one example. There are a whole bunch of them that do lead generation. Lead generation actually has its roots way before that. For instance, if you look at the keyword “mortgage,” at least the way it used to be, it was pretty expensive. People on average make a lot of money on the keyword “mortgage.”
On the other hand, if you look at “primary school” or “relocation,” you tended to get those keywords much cheaper. People always tried to understand how we could come up with keywords that are cheaper but correspond to the same idea of lead generation as the very expensive ones.

Google: we talked last time about the data sources that Google knows. By knowing the sequence of what people do, by knowing keywords, they understand what people subsequently do. Google can do an awesome job of suggesting to people, “You might also consider those keywords.” Why does it make sense for Google? Because by suggesting this, more people bid on those keywords, so the price goes up. Google runs a second-price auction, which means that if you bid $1 per click for your keyword and the second guy only bids $0.10, you don’t have to pay the $1 when someone clicks on yours; you only have to pay the $0.10. This is a more stable algorithm for auctions than one where you actually pay what you say you would pay. It also makes people hope, “We only have to pay what the second guy is paying,” so maybe the average price is going to be higher for Google.

Are there any other monetization ways? We have subscriptions and one-time payments. For instance, some games are one-time payments and you own the game. Other games are subscriptions, where you keep on paying to play. With dating sites, you keep on paying until you don’t need it anymore.

Student: Mining information… your site versus…

Andreas: Information products is a term I use for that. We’ll have one class toward the end of the quarter where I will talk about how we can leverage information, how we can use information we collect on the web for finance, for trading. That’s a classic example. A good specific example here is a company that had a great product.
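As a side note on the second-price auction mentioned above, here is a minimal sketch of the pricing rule for a single ad slot. It is a simplification for illustration only: the function name, the advertiser names, and the rule that a lone bidder pays their own bid are assumptions, and it does not claim to reproduce how Google's production auction works.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winner, price_per_click) for a single-slot second-price auction.

    The highest bidder wins but pays the second-highest bid.
    Assumption: with only one bidder, the price falls back to that bid.
    """
    if not bids:
        return None
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price


# The lecture's numbers: one bid of $1.00, the runner-up at $0.10.
print(second_price_auction({"advertiser_a": 1.00, "advertiser_b": 0.10}))
# ('advertiser_a', 0.1)
```

The winner's own bid decides whether they win, not what they pay, which is the property the lecture refers to when calling the mechanism more stable.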
They figured out how to understand something about a pharma by going to doctors and doing service with them, and measuring the implicit and explicit behavior.

0:12:31.2 How did they make money? It was not by going to the pharma industry. It was by going to Wall Street and telling Wall Street, “We know something about that product. We know the patterns of how doctors prescribe, and we know how the patterns change.” That was their way of monetizing their insights, as opposed to trying to help the pharma industry create better products or sell them better.

That’s a specific case of a more general case called freemium, which means something is free, and then when you want to have a premium version, you pay for it. Or, Chris Anderson has a wonderful paper in Wired, from about a year ago, with the title “Free,” which covers the classic Gillette model. You might know that you get something for free, such as a shaver, but then you buy the blades, and you continue buying blades. With Google that is yet another level. You pay nothing; you pay with your attention and your clicks, but Google gets the money from somebody else. Understanding who the
agents are, who is willing to pay, and what the incentives are, and aligning the incentives, is a key thing.

Student: … for access to API…

Andreas: There is a company in San Francisco which does the API stuff. Rapleaf is another company we will talk about later, in terms of data products. Mashery, that’s right. Mashery is a company that does just APIs for you. They know how to make sure that the calls happen the way they should happen, and stuff like this. Most of these things exist as very cheap building blocks, where the company sells the block thousands of times and makes its money that way, and you can just take the building block and now you have an API. For instance, BestBuy actually uses the Mashery API.

Student: … franchising… sell the brand … domain overseas…

Andreas: That may already be part of the selling and versioning.

Student: …

Andreas: Those are actually already included here. The versioning might be the LinkedIn thing, where certain things are free, but then you have a premium membership that allows you to do other things. You can message people. Information products, access to data; I think we pretty much have this space covered, up there. Ray, one other thing?

Ray: …

Andreas: What’s missing is the ecosystem, and the App Store is a very good example here. Widgets, which [0:15:29.1 Jao Shung] is going to talk about at the end of class today, are another example, where you have [0:15:33.5 unclear] building a widget and each of them has certain stakes in the game.

0:15:40.7 The point I’m making here is that these are different problems. The problems are driven by the way we monetize. If we now focus on one thing, namely on getting users, then acquisition x retention is the right thing. Nobody cares for users who only come once, unless they buy something that one time, which actually is not a subscription service. That’s what you want.
Most people want users to actually like the product and to come again. One problem is acquisition and the other problem is retention. Retention is where I sometimes say the product is the message. It’s not that marketing is just the money you spend because you have a bad product. Marketing spend gets people to decide once, but then people experience the product, and that’s what makes them come back. That’s the problem space.
For your homework we will focus on acquisition and retention. Sometimes acquisition is called the viral loop. Linus will be talking about this. Sometimes retention is expressed as, “We want to have really good engagement.”

Hypotheses is the second of those five letters of PHAME. Hypotheses, from my perspective, often come from cognitive science, from behavioral economics. Ultimately, what we’re interested in is helping people make better decisions, and this means we need to figure out how people make decisions.

For instance, if I offer you either $10 right now, or $100 a week from now, who would pick the $10 right now? I guess you know where to find me a week from now. Let’s say you meet somebody on the street or at [0:17:31.7 Tresidder] and he says, “Hey, do you want $10 right now, or do you want $100 in a week?” Who would take the $10 right now; about half of you? Who would take the $100 in a week; roughly the other half, okay. I guess it depends on the person.

Now, I’m shifting this into the future, and it’s the same guy at [0:17:53.4 Tresidder]. Do you want to have $10 fifty-one weeks from now, or do you want to have $100 fifty-two weeks from now? It’s the same delta of one week. What do you think? Everybody would basically go for the $100, because between fifty-one and fifty-two weeks there is not a big difference. This is called hyperbolic discounting.

0:18:15.9 One thing, when I was preparing with Linus here, was that we came up with a very good example. It’s an example from Amazon. Here is how that example worked. Mike [0:18:26.9 Shaw], who runs Wikinvest, tested the following; he tested what the economics behind co-branded credit cards were.
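As an aside on the $10-versus-$100 question above, the preference reversal can be sketched numerically. The sketch assumes the common one-parameter hyperbolic form V = A / (1 + kD), which the lecture does not state, and the rates `K` and `DELTA` are illustrative values chosen so that the "impatient" choice wins today; none of these numbers come from the lecture.

```python
def hyperbolic_value(amount: float, delay_weeks: float, k: float) -> float:
    """Present value under hyperbolic discounting: A / (1 + k * D)."""
    return amount / (1.0 + k * delay_weeks)


def exponential_value(amount: float, delay_weeks: float, delta: float) -> float:
    """Present value under exponential discounting: A * delta ** D."""
    return amount * delta ** delay_weeks


def prefers_smaller_sooner(value_fn, rate: float, start_week: int) -> bool:
    """True if $10 at start_week is valued above $100 one week later."""
    small = value_fn(10.0, start_week, rate)
    large = value_fn(100.0, start_week + 1, rate)
    return small > large


K = 10.0      # hypothetical hyperbolic rate per week
DELTA = 0.05  # hypothetical exponential weekly discount factor

print(prefers_smaller_sooner(hyperbolic_value, K, 0))        # True
print(prefers_smaller_sooner(hyperbolic_value, K, 51))       # False
print(prefers_smaller_sooner(exponential_value, DELTA, 0))   # True
print(prefers_smaller_sooner(exponential_value, DELTA, 51))  # True
```

Under the hyperbolic form the choice flips once both options are pushed a year out, while the exponential discounter makes the same choice at both horizons. That flip is the pattern the classroom poll illustrates.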
The economics behind co-branded credit cards, like the Amazon Visa card, is that if Amazon brings a new customer to Chase, Chase pays Amazon $100, and Chase pays Amazon another $30, which goes to the customer. Basically, each time Amazon brings a new customer to Chase, Amazon makes $100 and the customer gets $30.

The question is, how do you message that to the customer? There were two alternatives. One of them was saying, “You have $43 worth in your shopping cart. You can get that for only $13, today.” The other one was saying, “Come next time, and we’ll give you $30 off your next purchase.”

It is not as trivial as you think at first sight. You can argue both ways. You can argue the former, $30 off right now, is actually better, because people would rather take the $10 or $30 right now than wait for some point in the future; that’s why I talked about hyperbolic discounting. It depends on what you want to measure. If you are measuring how much Amazon is making overall, then the fact that somebody has a $30 credit may make them come and buy something later, since they have that
credit there. That is a good example of two hypotheses where we don’t know a priori, we don’t know before the experiment, which one will actually work better.

Now, it clearly depends on the metrics we are looking at. If we are looking at the metric of people signing up for the card, we’ll have more with the first one. If we take a different view, such as wanting to know how much Amazon is going to make down the road, including the $100 but also including selling more stuff and having better retention of customers, maybe the latter one is better. It turns out the former one was the one that actually worked better. Give people a discount right now. In the dating community, that’s typically called, “Whether Mr. Right or Mr. Right Now; we’ll take Mr. Right Now.”

That was the hypotheses part. I’m actually curious what hypotheses you have for building your pages for the social data revolution. You have to think big, think broad. The hypotheses don’t come by staring at code. They come by having ideas, by talking with friends, and by looking at data, this [0:21:10.5 unclear] process, and coming up with new ideas.

We will do a class on visualization in the quarter. In the visualization class, I will make the big distinction between real time and interactive. For most things, with the exception of performance metrics, we don’t care all that much about whether they’re real time or not. We do care that they are interactive, because it is our time. So the interaction time, how long it takes between having the idea and getting an answer back, counts much more than whether this hit was a minute or an hour old.

0:21:44.4 The actions clearly depend on what problem you’re trying to solve. I already mentioned, as an action, the varying of the text.
Another example of an action is sending emails out to people, reminder emails, marketing emails, so you have another channel and not just the website. It could be Facebook notifications. When do you send those emails? That’s another experimental question, and it turns out that sending them a little bit earlier than the time the person last read the previous email works well. If you read emails on Tuesday at 10:00 in the morning, we’ll send out an email on Tuesday at 9:30 in the morning. For some reason it seems to be that people read last in, first out. That seems to be a time that works well for people reading email. You don’t want to bury it by sending it on Friday evening. People will come to the office on Monday morning and have a stack of stuff and never get to the bottom of it.

These would be two variations here, two of the parameters that can vary: text, and time of sending out an email.

The lesson for the endpoint is twofold. One is you want to have many metrics. It is not one metric that matters. I got, from the TA, the whole union of all the stuff you created here. I will talk to you a little bit about a framework for how to put them together. There are a lot of metrics you want to consider, not just one.
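The send-time heuristic from the email example above (send shortly before the time the person last read) can be sketched in a few lines. The function name, the 30-minute lead, and the one-week period are assumptions drawn from the Tuesday 10:00 versus Tuesday 9:30 example, not a description of any particular mailing system.

```python
from datetime import datetime, timedelta


def next_send_time(
    last_read: datetime,
    lead: timedelta = timedelta(minutes=30),
    period: timedelta = timedelta(weeks=1),
) -> datetime:
    """Schedule the next email slightly before the recipient's usual reading time.

    Assumption: the recipient reads again one `period` after `last_read`,
    so the email is sent `lead` before that moment to sit at the top of the inbox.
    """
    return last_read + period - lead


# Last read on Tuesday, April 14, 2009 at 10:00.
last_read = datetime(2009, 4, 14, 10, 0)
print(next_send_time(last_read))  # 2009-04-21 09:30:00
```

In an actual experiment the lead time would itself be one of the parameters to vary, alongside the message text.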
Student: I was wondering why…

Andreas: What I call experiment is running the experiment. For me, experiment means the one-liner in the code when doing a split test, an A/B test. The point I try to make here is that I want to have differential actions. I want to have it either this way, $30 off now, or that way, $30 in the future. I believe that splitting it conceptually is better than lumping it all together and calling it an experiment. Also, metrics are part of the experiment as well. That was a good question. Any other questions?

Student: …

Andreas: I hate to tell you, but no, the actions are the actions you are taking. The consumer action or reaction comes in the last point, as you do the experiment. When I’m talking about actions here, these are differential actions where you decide it could be this way or it could be that way. Last time, I gave you the example of having the shopping cart on the left versus having the shopping cart on the right. Today, I gave you other examples. We have a lot more examples here. It’s the actions you are taking, so it’s a very empowering approach, as opposed to hoping for the insights. You start with the actions. I sometimes call it the primacy of the action; that’s the starting point. That was a good question, thank you.

0:24:52.5 Now, we get to the heart of it. We already talked a little bit about metrics. It is not just metrics that measure user behavior. It is metrics that measure site behavior, as well. For instance, if somebody sees a fatal, if something crashes, there will be negative effects for the user. If he is trying to buy something, the guy is not going to buy now. There will probably be long-term effects. As I said last time, short-term effects are easy. Long-term effects are hard. Then, we do the experiment.
For the experiment, what matters is to have this one line of pseudo-code here: if a certain condition is met, it shows them the new stuff. If the condition is not met, it shows them the old stuff. Here is a fun exercise for the computer scientist behind you. If you want to have 2% of all customers see the new stuff, how would you pick 2% of the customers?

Student: …

Andreas: So one is you take the user ID. You take the remainder modulo 100 and see where [0:26:14.1 unclear]. That’s reasonable. What other ways do you have? The problem there is that you always have the same 2%.

Student: …
Andreas: You can add a random number to this. There are many ways of doing it. They are all good. What’s important is that the answer was not, “In the morning you take the first 100 people, and then you take the next 500 or 5000.” You pretty much decide, before the person comes, whether they’re going to be in the “test set,” which is the old one, or in the new set where the new stuff is. Once you have decided this, then we actually show them the thing. The advantage of this is that you don’t have biases in any way.

The way it’s done at Amazon is that each person has an Amazon ID, which you probably don’t even know. You don’t want to start user IDs counting up from 1. That way your competitors would know how many customers you have. You want to start in a big space and take random numbers without replacement, maybe out of ten billion, a hundred billion, some big number. Storage is cheap. Assign a random number. Then, if you want to have 2% of the people, you first pick one of those ten digits. You say, “Okay, it’s the third digit.” You pick another random number; if the third digit is a 7, then you have 10% of the people. You have to AND this with another digit, and then you say, “Or, between this random number and that random number,” and that will get you 2%.

There are many ways of doing it. The big difference is: don’t try to do it through a soft launch such as “We did this and now we are moving over and we’ll compare it,” because you have other external effects, day-of-the-week effects or some other apps, which are typically much more important than what it is you want to measure. Are there any questions about that? It’s an important methodological point.

0:28:09.9 In terms of framework, that completes my part of the review of the PHAME model.
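The 2% exercise discussed above can be sketched as follows. This is one of the "many ways of doing it", not Amazon's actual method: the hash-based bucketing, the function names, and the experiment name `new_checkout` are all assumptions. Mixing the experiment name into the hash is a stand-in for the "add a random number" suggestion, so that different experiments do not always hit the same 2% of users.

```python
import hashlib


def in_treatment(user_id: int, experiment: str, percent: float = 2.0) -> bool:
    """Decide, before the user arrives, whether they see the new version.

    The decision depends only on (experiment, user_id), so it is stable
    across visits and independent of time of day or arrival order.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the hash to one of 10,000 buckets (0.01% resolution).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def render(user_id: int, experiment: str = "new_checkout") -> str:
    """The 'one line' of the split test: new stuff if selected, else old stuff."""
    return "new" if in_treatment(user_id, experiment) else "old"


# Rough check that about 2% of a hypothetical user base lands in the new set.
share = sum(in_treatment(uid, "new_checkout") for uid in range(100_000)) / 100_000
print(f"{share:.2%}")
```

Because assignment is fixed per user rather than per time window, the comparison between the two groups is not confounded by day-of-week effects, which is the methodological point made against a soft launch.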
For about five minutes, leading over to Linus, I want to talk with you now about a few dimensions of what I am looking for in good metrics, not on an example basis, but by contrasting two things. They always have in common that there is something called deep structure, which we are looking for, versus surface structure. People sometimes think this is like seeing the forest or seeing the trees. The answer is no. We are not making the distinction between seeing the forest, which is some aggregate phenomenon, and seeing the trees, which is some highly granular phenomenon. We are actually looking at the dynamics. We’re coming up with models that tell us, for instance, that if the sunlight is changing, because of photosynthesis or whatever is happening there, there will be effects on the dynamics. Things will be growing differently. Don’t think forest versus trees. Think about how we can actually do experiments that allow us to understand the underlying dynamics.
Here are a few ways I try to explain this. I hope we will get to it. I have three examples for this. The first one is that we talk about a model versus a description of the trends. A model tends to be something where you can plug something in. For instance, a linear operator is a model. Some input is going in and some stuff is coming out. A model, for me, is a predictive model. I can make predictions. I can compare these predictions to what actually happens in reality. This is in contrast to the descriptive model, where I just try to summarize things and talk about trends, but we don’t know whether there is any predictive power in that. Deep structure, for me, means something where I can make predictions. In the equation of the business, if I change this parameter, if I change this in the viral loop, I make a prediction that we get 7% more. If we get 7% more, we know we have a damn good model, which has some understanding of the underlying dynamics.

The second way I talk about this is that I’m interested in the axes of the space. I’m interested in the underlying dimensions, much more so than only being interested in the instances and the points in this space. When you work with clients, they often come and ask, “What software package should I buy?” That’s of no interest to us here in an academic setting. We want to understand what the dimensions of this space are. What are the characteristics of these packages? Why? Because statistics is a science that deals with noise. Statistics is a science that deals with generalization. The assumption we have everywhere in statistics is that if I move a little bit in input space, the output is only moving finitely.

0:31:29.8 If you want to be fancy you can speak Latin and say [0:31:32.6 unclear], which means nature doesn’t jump around. Germans always have Latin sayings for stuff.
It’s very different from people who just know that here are packages, and here are instances, but they don’t know whether they’re close by or whether they’re miles apart and there is no generalization from one to the other.

The third way I like to talk about this deep structure versus surface structure is that I like to talk about tools versus art. We are more or less engineers here, so we like to build stuff that people can use. The output, of course, depends on what we put in. That’s very different from people who create art. To create art, you make that movie, and when you are done with the movie you are done; you release it, and that’s it. It is up to people how they want to interpret it. Some visualization is that way, but that’s not how we really learn stuff. That’s the art aspect, and it can be very beautiful. But it is not how to make progress in data mining and in understanding what the data is.
We are doing deep structure here, which means we try to build models that make predictions. We try to look at the axes of the space, and we try to build tools. That doesn’t mean that there is something wrong with art. Art says things in other ways. For instance, there are great movies we love that tell us things. I want to make the distinction between the surface structure, which something fixed has, and the underlying structure, which we are interested in here in class. Are there any questions?

Student: …

Andreas: 6 to 10 is a number pulled out of a hat. I figured there are roughly ten groups. There are five groups at Berkeley, so it’s still a manageable thing. It makes it clear to you that I’m not asking for the one most important metric. In other cases, I will ask what the one most important reason for things is. It is not that I always believe in 6 to 10. It’s a reasonable, manageable size. There are 3 to 5 people in each group, so if each of you has two metrics you are really passionate about, you can each have them without having to fight over them.

0:33:50.3 Metrics fall into certain groups. There are the traditional metrics that come from the olden days when people actually looked at newspapers: unique users, how many people read my paper. Then, we have metrics that concern the individual. Those are engagement metrics: how often does a person come back? Customer lifetime value is one of the things that falls in here. The new thing for the last couple of years is social metrics, which are metrics between people. These could be implicit metrics, like I forward something to Rohit and see what he is doing with it, or I post on somebody’s wall; these are all things between people. Or, of course, they could be explicit feedback, where the standard example is from Amazon: I found this really useful, or this answered my questions, or this was helpful.
You give feedback between things. It’s either about the site, or it is about people/individuals, or it’s about the relationship between individuals. It’s the same as we had when we talked about the social data revolution; it was sniffing the digital exhaust, which scientists can do by themselves. It’s people revealing things about themselves, as a first stage, and then as a second stage, people revealing stuff about their relationships with others. Qualitatively and computationally, they are very different.

How do I analyze them? We look at the properties of individual metrics. What is expected; what is good; what is bad? If we are also looking at the set of metrics, we see what the tradeoffs between the metrics are. If this one goes up, do we expect that one to go up as well? Or if this one goes up, do we expect that one to go down? That’s how I want you to think about metrics, in that space.

Finally, to visualize them, an important thing is to look at the distribution. I gave you some examples last time. It’s not one number, like the average page views,
which we are interested in. It is the distribution across the whole sample. The second one is you want to look at how things change over time. You can also look at how robust they are by removing a certain number of data points that go into the metric. Does it have a big effect? If so, you are very sensitive towards outliers. If it does not have a big effect, then it’s probably more robust. You want to go for stuff that is robust. For example, I gave you the slope of the log-log plot, which is the exponent in a power law. After this introduction, it is a great pleasure for me to introduce my friend Linus Young. Linus took the class three years ago and I won’t embarrass you by showing you the pictures we took at the party. You can find them if you just look for “ling,” which is basically his first initial and last name. He’s so modest, he didn’t even put his name on the presentation, on Flickr. I’m super happy you’re here. Linus: Thank you. I’m very happy to be here, too. It’s been a while. This class is a lot of fun. I just miss class, in general. Like Andreas said, I took his class three years ago. I just recently graduated. Since then, what I’ve been doing is I started a Facebook company, right after class, which I eventually sold. Then, I did an iPhone app company, which I eventually left. Now, I do something completely different, which I’ll talk about a little bit later. 0:37:17.9 Today, Andreas asked me to talk a little bit about developing on the Facebook platform, and the iPhone platform, and to give some of my thoughts, my insights, and my experiences about how I did this. Feel free to interrupt me at any time. This is more of a discussion than me giving a formal talk. This is the first time I have given it, so it may not be as polished.
Andreas, please feel free to interrupt me with any questions you might have. Just to give you some relevant background about how I got started, the Facebook platform opened up in the summer of 2007. I was just doing nothing during that summer, between summer school and things like that. I had a lot of time. My buddy and I went to a bar one night and were talking about this. It was very interesting. He said, “What can we do with this?” For the first time, we have access to two hundred million users, their data, their social connections and things like that. We came up with this brilliant idea, after a few beers; what if you could hook up virtually with someone on Facebook? That’s what we found, on Facebook when we go there we look at a lot of pictures, and things like that. We had this brilliant idea; what if you could pick a friend, virtually mate with them, and then create a baby and take care of that baby? It was a cute little game. We coded it up that summer, and it did very well. It got a lot of users. We got a lot of Transcript by Tamara Bentzur, Page 13
data, and we got really interested in this. We said, “Here’s a market that we can go after.” Back then, not a lot of people knew whether this was something just for fun or whether you could build a business out of it. We wanted to build a business. We thought there was market potential. Framing it in Andreas’ PHAME model, what was the problem? Our problem was how do you get a lot of users. My partner and I knew that this was not something we wanted to be in, for a long time. Basically, we wanted to make a lot of money and capitalize on this hot market. To do that, we needed to get a lot of people, very quickly. That was one problem. The second problem was how do you scale this? We saw in the summer, that even with a hundred thousand users or so, our server started getting hiccups, started crashing, and we had little problems. If we wanted to get to the next order of magnitude, to a million users, how were we going to scale that? That’s a big problem. Finally, related to the first problem, how do you get an application compelling enough to get this huge amount of growth? Those were our problems. 0:39:34.5 When the summer ended, I actually took the Facebook class here, and I met one of the most brilliant guys I’ve ever met. He is a physicist. He brought the virality model from physics; it’s a very simple model you have in biology… he brought it from physics and basically put it in the context of Facebook. He created this “viral loop”. These are some of the metrics of it. It’s kind of hard to see, but basically, it’s a formula between your invitation rate, acceptance rate, and conversion rate. You just multiply them together and if you get a number over 1, you’re viral. I can talk more about this later. That kind of solved our first problem. He proved the model. He basically said if you follow this, you can grow very quickly.
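Editor's illustration: the "multiply the three rates and check whether the product is over 1" rule described here can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the model the speaker's team used: the function names are hypothetical, the reading of each rate (invites sent per new user, share of invites accepted, share of accepters who become users) is one plausible interpretation, and the numbers are made up for the example.

```python
def viral_coefficient(invites_per_user, acceptance_rate, conversion_rate):
    """Multiply the three rates into a single number (often called K)."""
    return invites_per_user * acceptance_rate * conversion_rate


def is_viral(k):
    """Rule of thumb from the talk: a product over 1 means viral growth."""
    return k > 1.0


def new_users_per_generation(seed_users, k, generations):
    """Project how many new users each successive wave of invites brings in.

    Assumes k stays constant from wave to wave, which is a simplification.
    """
    sizes = []
    cohort = float(seed_users)
    for _ in range(generations):
        cohort *= k
        sizes.append(cohort)
    return sizes


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the talk.
    k = viral_coefficient(invites_per_user=15, acceptance_rate=0.5, conversion_rate=0.2)
    print(f"K = {k:.2f}, viral: {is_viral(k)}")
    print(new_users_per_generation(1000, k, 3))
```

With a product above 1, each wave of new users is larger than the one before it; with a product below 1, each wave shrinks and growth dies out unless users arrive some other way.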
It came down to basically forcing people to invite their friends. That kind of solved the first problem because now we knew we could grow. The second problem was how do you scale this? Luckily, I had a roommate who was doing a PhD thesis on scaling. He was at Berkeley at the time. We brought him on board and with our skills together, we felt we could properly scale this. The last problem was harder to solve. What do you make that’s compelling enough to grow quickly? That brings me to my next life. We had several hypotheses. We knew from what was hot at the time, and what was really growing, was that people wanted to know something about themselves. More importantly, they also wanted to know what their friends thought about them. We wanted to figure out an Transcript by Tamara Bentzur, Page 14
  15. 15. Transcript of Andreas Weigend Data Mining and E-Business: The Social Data Revolution Stanford University, Dept. of Statistics application that was compelling enough that they would “annoy their friends” by forcing the invites that would grow our apps. Basically, we threw a lot of ideas at the wall. We threw about twenty or thirty ideas and we saw what stuck. These are some of the applications that stuck. One was, “What does my birthday mean,” so you enter in your birthday and you figure out what the meaning of it was by inviting friends, popular friends, second friends. The one that really did well was “You’re a hotty,” which was basically how hot your friends are. Andreas: One important element here is that it’s not that people try very hard to figure this out by arguing from first principles. With technology being so easy, you just throw it out and see what works. That’s a very different way of thinking about this compared to developing a nuclear power plant or something. Linus: That’s a great thing about software. You can try a lot of stuff. You change your logo, change some text, and just start throwing the thing out there. Certain things stuck really well. I’m going to dive deeper into this one, “You’re a hotty,” and explain what it was. 0:42:08.1 It was very simple. It’s two pages. You get the invite and it says, “Your friends think you’re hot. See how hot you are.” You click through it and you get presented the first page on the left, which is basically here is your top ten friends. See where you rank by inviting all these people. You force them to invite. Once they invite twenty or so friends, they get that second page which allows them to readjust those rankings. It’s very simple. We coded the thing in about six hours. You can see how it is. 
Some of the actions we took, and some of the things we varied, were: what happens when you rate someone hotter, do you send emails to all their friends, do you send it to one friend, when do you send emails to them, when do you send applications. All these different things we kind of varied. We thought of ways to optimize. We did manage to optimize. We had exponential growth. That was over the course of about a month when we were exponential. We were getting about three hundred thousand daily actives. All this growth was from new users. We eventually hit a peak where we burned through the first field of Facebook. We were just getting residual users in that viral loop, and eventually we trailed off. I’ll go into that later. It was pretty amazing. We were getting three hundred thousand new users a day, and eventually burned through all of Facebook. Andreas: Let me ask you a question. What were the hypotheses that led you to come up with hotties? Was it that you saw James Hong’s “Hot or Not,” was it that you figured out people love to know about themselves, was it that people don’t know what
they want, but they are good at comparing stuff? What led you to this one and what led you to the other ones, in terms of hypothesis creation? Linus: Our hypotheses were that most of the people on Facebook are younger users, or new college students. That’s the first time they actually opened it up to all the colleges. We wanted something very colloquial, like very fun. We wanted something that would attract their ego. All those things were kind of related around that. It’s also because that was what the top apps were doing at the time. They all revolved around that. Andreas: One of the questions that we always discuss is what of this is actually people 2.0, where people are different from the way they used to be, and what is just the same old, same old; people have certain things. If you could comment on that a little bit. 0:44:46.6 Linus: I think people are always the same. People are always concerned about themselves and what other people think about them. I think we see that in Facebook and eventually, in the iPhone… Student: … Linus: We actually knew you could make money off this model because it was proven by the other people in the Facebook cloud. We knew that advertisers were paying very high CPMs and it was a very easy thing to do. Does that answer your question? Student: … Linus: I’ll get back a little later on in the slides about how we have to re-engage them. Yes, to answer your first question, it was a couple of thousand dollars a day, at its peak. It was a pretty decent amount of money. Student: Sort of going off that, it seems like you’re optimizing a lot for getting a lot of new users… future engagement… brand recognition… Linus: We’ll get to that in the metrics section. Basically, we did. Actually, that’s a good tie into the next set. These are some of the metrics we came up with.
I’m not sure why it’s not showing very well. We spent six hours putting the application together, but we spent about six weeks trying to collect these metrics. It was a pretty hard thing to do. We coded it in Ruby, which luckily has a lot of packages to allow you to collect these statistics, but it was very hard to integrate it, especially on a platform. It’s much easier on a website, but when you have this middleman, Facebook, it’s very hard. Transcript by Tamara Bentzur, Page 16
We had to go and figure out new ways to figure out these metrics. We collected a ton of metrics. We would do this all asynchronously. Every night, we compiled gigabytes of data, and every morning we would have reports on these types of things. Even though we collected all this, we didn’t look at all these every day but only when we had a certain problem we would come back and look at these. Feel free to ask me questions on any of these. It was a lot of metrics and a very grueling process to actually code all this stuff to record it. Then we ran experiments. We did a lot of A/B testing. Going back to the other question. We burned through all our users. We may have annoyed a lot of people, so our numbers started trailing down. When we got bought out, one of the criteria was to bring that user base back. We had twenty million users at the time; how do you re-engage them? We did a lot of A/B testing. We did a lot of changing layouts, moving buttons, and things like that. We basically built an engine, called a sticky engine. We wanted to see how engaging, how sticky the app was by changing all these various metrics. 0:48:15.9 What we did was we had different branches. We did A/B testing, tagged each user, like you said, 2% of the users or 5% of the users with the new features to see how it performed. We compiled these stats to give us graphs. Unfortunately, I don’t have any more data so the only screenshot was the original design of the engine. We optimized it. We optimized the hell out of it and then our conclusion was actually that the best way to get new users was actually to do one-to-many notifications, or one-to-many actions. If I do an action, I inform many friends of mine. We built our whole application around doing this type of stuff.
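Editor's illustration: tagging a small share of users (say 2% or 5%) with a new feature and comparing how each branch performs could look roughly like the Python sketch below. It is not the "sticky engine" from the talk, which was written in Ruby and is not described in detail. The hashing scheme, the function names, and the "came back" metric are all assumptions chosen to keep the example short.

```python
import hashlib


def assign_branch(user_id, experiment, treatment_share):
    """Deterministically tag a user as 'treatment' or 'control'.

    Hashing the experiment name together with the user id gives each user a
    stable position in [0, 1); users below treatment_share get the new feature.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    position = int(digest[:8], 16) / 16**8
    return "treatment" if position < treatment_share else "control"


def return_rate_by_branch(observations):
    """Compute the share of users who came back, per branch.

    observations is an iterable of (branch, came_back) pairs.
    """
    totals = {}
    returned = {}
    for branch, came_back in observations:
        totals[branch] = totals.get(branch, 0) + 1
        returned[branch] = returned.get(branch, 0) + (1 if came_back else 0)
    return {branch: returned[branch] / totals[branch] for branch in totals}


if __name__ == "__main__":
    # Tag roughly 5% of 20,000 hypothetical users with a new feature.
    tagged = sum(
        assign_branch(user_id, "new_button", 0.05) == "treatment"
        for user_id in range(20000)
    )
    print(f"{tagged} of 20000 users see the new feature")
```

Because the assignment depends only on the user id and the experiment name, a user stays in the same branch across visits, and a nightly batch job can recompute the branch instead of storing it.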
We eventually changed “You’re a Hotty” to something where it’s like buy and sell your friends. This was also very engaging at the time. One of the actions we had was you would buy all these friends and a one-to-many action, for example, would be you could pet them. Their value increases. You email all their friends, notify all their friends that they’ve been petted and then you get re-engagement coming back. We did a lot of stuff like this. It is kind of interesting. You kind of have to do this dance with Facebook where you do something that is kind of evil but not too evil and then they push back and you push back and you eventually get to this equilibrium state where everyone is sort of happy. That’s how we would re-engage our numbers. Actually, we did better at re-engaging users than we did in our viral growth. That was fun. Eventually, this got boring. We
came with the expectation of making a lot of money. We kind of met that; we got bought out. Now, we’re in this company and it’s kind of boring because we have more developers but it’s very slow. You don’t have the freedom to do what you really want to do. I was able to renegotiate my contract, get out of it and I moved on to something else. Andreas: Before we move to the iPhone app, do you have any questions about what Linus did? Student: Did you look at the number of …or uninstalls? Do you think it makes sense to set either … if I’m over this… Linus: That’s very interesting that you say that. In terms of uninstalls, it doesn’t happen very often, actually. In the old profile, it did. It would always be under 5% or something. It was very low. I don’t know if people just don’t know how to uninstall or they just didn’t bother to. We didn’t really concern ourselves that much. 0:50:57.0 In terms of annoying users, that was a very interesting question. To tell you the truth, I didn’t really care about the users. I would never spam my friends. These networks would be bigger in the European countries and different countries. I didn’t know any of them and they didn’t personally affect me so I really didn’t care. To play in this game, if you want to be in this game, you have to annoy people. Even the top companies, like Playfish, which is one of the biggest app companies right now, annoy the hell out of the users. They do it subtly but they do it all the time. The second company was the one that bought us out, actually, they do it all the time too. This is the price of playing the game, you alienate your friends. Student: Do you think that devalues the platform for the brand…. Linus: I think they do better at it now. At the time, it did significantly hurt their brand. They were considered spam-type companies, basically.
Student: I was wondering… Linus: It’s funny, I don’t know if this completely answers your question but a lot of the A/B testing we would get would be marginal differences, like 0.1% or 0.2%. It was very insignificant. We would never get orders of magnitude difference on any text changes. They always say that all these 0.1% add up and eventually you get to something that really helps, but it didn’t really do much. At the end, we just kind of did away with text changes, button placement, and things like that. We kind of went with our intuition and figured out what it was. Then, we cracked … if you send one-to-many notifications, you’ll grow, very easily actually. Student: Did you try any method of rewarding those people who were spreading the word?
  19. 19. Transcript of Andreas Weigend Data Mining and E-Business: The Social Data Revolution Stanford University, Dept. of Statistics Linus: Sort of like an incentive program, right? Towards the end we did that. At the time, when we were doing the applications, we didn’t really need to. You have to do a lot of statistics. You have to figure out who are the most engaging users. We had those statistics, but then you have to go out and engage them or create some kind of incentive program to do that. We didn’t really feel we needed to do that because we had something that worked at the time. We were meeting our goals so we didn’t need to work harder, basically. Student: Do you feel that people are going to get sufficiently annoyed with this stuff sometime in the future… or do you think it’s just going to go on… and take more and more of it? Linus: I thought that we reached the tipping point, at some point. Facebook did crack down. They said, “This is getting out of control,” first with the request system. They completely closed that down. Then, with the notification system, they closed that down. They put more metrics, so the app developers did basically push that point. They annoyed the hell out of the users. Now, I think it’s in this equilibrium state. I think app growth is not in the U.S., primarily, but in different countries that are growing. They’re very popular there. 0:54:24.1 I’m not really sure. I think I’m still out on that one. In the U.S., I think it’s already reached that point. Andreas: One distinction we can make here is between deep structure and surface structure. The surface structure is some developer changing a parameter on Facebook and allowing you to only send out N as opposed to M invitations. This can make or break a company. You know that it’s sort of some developer who makes these distinctions. By contrast, it’s the deep structure of getting the incentives aligned. 
For instance, Google’s incentives in search are very well aligned with our incentives to search. Google tries to show us stuff that is most relevant to us. That way, we’re more likely to click. I think if we just try to reverse engineer, what a company like Facebook is doing, it is not at all as interesting as if you tried to build a two-sided market, where there is price discovery, where we set it up so that whatever happens there is some fair solution where the equilibrium is being found as opposed to the equilibrium being set by somebody tuning something up or tuning something down. Linus: That’s a much better model, I agree. Are there any other questions? Student: … Linus: It was basically daily actives. It was when we were viral. All our apps were viral so we sold at the peak. You guys are very interested in the acquisition. I should probably move on and take more questions later. Transcript by Tamara Bentzur, Page 19
  20. 20. Transcript of Andreas Weigend Data Mining and E-Business: The Social Data Revolution Stanford University, Dept. of Statistics Then, I left the company and basically started another company to do iPhone applications. We would try to take what we learned from Facebook and try to mimic that on the iPhone. It’s very interesting when you have a mobile phone, the games change a little bit. You don’t really have that social aspect anymore. You have more location-based aspects. Based on that, we tried to tailor an app to do that. We tried to make better apps. Our first app, to the far left, was “What’s Hot”. We like to use the word hot a lot; it seemed to work well on Facebook. You basically take a picture of where you are. If you are on the street and see something interesting, you take a picture, upload it to our server, it shows whoever has that app. It did fairly well. This was the first app we created. It wasn’t the best. It did okay but not as well as we thought. Mainly, the problem was it was our first app, but people started using it for porn. They started taking really explicit pictures, they were very hard to manage, so we shut it down. 0:57:11.5 Based on that, we moved on to the next one. A lot of people would take pictures of themselves and want to know how hot they were. We created a “Hot or Not” type application on the iPhone. We built more viral loops into it. We tried to bring that social aspect, the stuff we learned from Facebook, into the iPhone. We developed our own invite system. It would go through your contact list and you could pick who you wanted to SMS. It would send SMSs to come to the app. That didn’t actually work as well, either. It was very interesting. 
People don’t want to send SMSs to other people. For one thing, they have to send them to people who have iPhones, and basically try to push them to use the app, when the reward is just figuring out how hot they are. There are multiple sites that do this, on the web, on Facebook or wherever. That app kind of failed. We had a problem now. We didn’t know really what was working on the iPhone. We asked what else is very engaging that will actually make users go out there and actively go to their friends and pull people onto the app. We came up with this mafia game. This is not rocket science. This was actually a game developed about twenty years ago, on the TI-83. It took off on the web. It took off on Facebook. You’ve probably seen invites for this. Why don’t we just port this to the iPhone? I don’t know how many people have played this game, but it’s literally a text-based game. It’s very simple, but the whole game revolved around you bringing in more people to the game, to become the biggest mobster. This turned out to be wildly successful because people would just go out there and pull people into the game. The app was number two on the App Store, for about a month. Our users grew. That was that. Around this point, I started feeling I wanted to do something different. I did
  21. 21. Transcript of Andreas Weigend Data Mining and E-Business: The Social Data Revolution Stanford University, Dept. of Statistics annoy a lot of users. I did do a lot of things; coming to Stanford, getting a good education in engineering, I could probably spend my engineering skills doing something better. This is fun, it was fun to code this, it was fun to scale it, it was fun to do all these different things, but it wasn’t very rewarding. It wasn’t something that was changing the world very much, and frankly, it was taking other people’s stuff and being first to market, capitalizing, and taking advantage of the situation. It was easy to do but it wasn’t very challenging. Right now, I’ve started a non-profit that makes low-cost infant incubators for developing countries. It’s called “Embrace,” so check it out on the web, if you are interested more in this. I’ll take some questions and I can talk about anything. I can compare Facebook with the iPhone, talk about where I think the future is going, talk about Embrace. Student: … 1:00:29.8 Linus: This is completely different. Everything I’ve done was electrons and bits and things like that. This is a physical product. I think some of the key skills I can bring in is just in terms of how you run a startup, the processes you go through and what you need to do and figuring out what is most important at a certain time. As far as general engineering skills, I also bring those to the table. There is actually a lot of physics and engineering that goes into this product, like how you heat – it’s basically a pouch in a sleeping bag, and how you heat that pouch. We actually had to write code, very special code, to make it so it doesn’t overheat, so it’s very safe. I was actually coding two weeks ago, non-stop, to make this heater actually work. Finally, I did my focus on design, so there is a lot of user-centric design and a lot of that type of stuff. Student: ... back to the… on the iPhone. 
Did you stop because you weren’t … Linus: We were getting problems with Apple, but it wasn’t what we intended the thing to be. It wasn’t doing anything. It wasn’t making money. It was giving us a lot of liability. The risk to reward was not there to continue doing it. Student: … Linus: It’s crazy, they would go on forums, they would post everywhere. There really wasn’t a viral loop. They would actually be very proactive and go out there and get all their friends with iPhones to join their mob. They would have Twitter groups and things like that. It is a very addicting game. Student: … pricing Linus: Pricing on the App Store is very interesting too. The Mafia game itself is like Scrabble. You can’t really copyright a game. You can copyright the trademark, the Transcript by Tamara Bentzur, Page 21
images, the text you use, and things like that. The game itself, you can’t really copyright. We just did our own version, new text, new graphics. The game itself is the same but everything else is different. What we found works is to actually make it free for a little bit of time, getting a buzz, getting it to generate on the App Store, in the free section. Then you have a time it is free and then you start charging for it. You generate an initial buzz, initial user base where people are going out and grabbing people, and then you slowly charge more and more money. Student: Do you think that’s better than the model where you have … Linus: We tried that model and it works okay but it didn’t generate as much revenue as doing something like that, for that particular application. Student: The concept for the iPhone application, where you try to get users to draw them in and get them to invite their friends… 1:03:54.8 Linus: You’re completely right, the business models are different. More specifically, you’re just saying how did we go about creating a business model? Student: How did it affect your business process… Linus: I always believed that when you go in you have to figure out what your expectations are. Our expectations were always to just make money and exit quickly. For the Facebook one, it was basically do ads. That worked really well and when the CPMs dropped, we tried to find the exit. For the iPhone business model, we tried many different things. We made a free app and saw how ads did. It constantly evolved and eventually turned into this. We started where basically all the revenues were generated by ads. That actually did pretty poorly. It’s very hard to get money through ads on the iPhone. I guess there are not enough users and the CPMs are too low. Then, we tried a free version and then a paid version.
That worked fairly well, but I’m not really sure why it didn’t do as well. You initially get this user base and then, it’s kind of like a bait and switch. You get them and get them to pull in other people, and by then, you charge money. I still think the business model on the iPhone App Store is not settled, yet. It’s also very tailored to the specific type of application. Student: How did the user response change when you changed your application from the free mode to paid mode?
Linus: When you go from free to paid, I don’t think the users are actually that upset. There were a lot of users, so I tended not to read any of the emails that they sent to us. That’s another thing, when you get to twenty million users, by the way, you get hundreds of emails every morning. I just stopped reading emails. I look at the empirical evidence. If the revenues don’t go down, that’s fine. Student: Can you talk about the viral… Linus: Sure, on which one? On Facebook, right? There are three components. One is the invite rate. How many people, when they come to your app, do you require them to invite? If it is fifteen or whatever, you get that number multiplied by the number of invites they actually send out. In this case, we force them to send out the full fifteen, multiplied by the conversion rate. Once the other person gets the invite, how many people actually accept that? We found the sweet spot was about fifteen. If you force someone to invite fifteen people, about half of them would drop out and won’t do it. If the people that stay do send out the invites, and the conversion rate is fairly high, you will actually get a number greater than one. It’s very simple. 1:07:03.8 Andreas: It’s actually very intuitive. The viral loop might be scary, but it really shouldn’t be. It is the interaction between people being asked to do stuff. I gave you the example last time with the $10 and if you think it’s fair you keep it and if you think it’s not fair, you have to return it. It’s the same thing here. If people think it’s fair to invite 500 people, they will do it. If they don’t think it’s fair, then you’re [1:07:31.6 unclear] You are really trading off things here which are people constants. Ultimately, you are doing experiments about what people think is fair. What is the right price to pay?
How much do I annoy my friends? Do you think that a model where people would have, in the back of their minds, some cost, some price they have to pay in the friendship; if I hit up Linus five times, he’s not going to talk to me anymore, versus if I hit up Enrique two times, he’s not going to talk to me anymore. Do you think people have models like this? Should we try to model the cost of annoyance, the cost of interrupt? You mentioned the cost of unsubscribe, which is super high. If somebody says, “I don’t want this anymore,” one action to take is never to invite that person again. The ones that unsubscribed are probably not in the market anymore. What are your thoughts about having it in a more decision-theoretical framework? Transcript by Tamara Bentzur, Page 23
  24. 24. Transcript of Andreas Weigend Data Mining and E-Business: The Social Data Revolution Stanford University, Dept. of Statistics Linus: We actually thought about that. I would love to know how many times I can annoy particular friends. I don’t think people think that way. They can kind of get a sense. They can get feedback from their friends when they get annoyed. I used to pet a lot of my friends. They would say, “Stop petting me,” and things like that. I would stop petting them. That’s one way to get feedback. We actually tried stuff where with the metrics, we found the users who would accept requests more often and we tried to tailor it so that when other people came to the site and invited those people, they would pop up to the top of the list. People who are more susceptible to trying new applications. That’s one thing. I rarely invited my friends. I always had fake accounts. It’s hard for me to say because I really didn’t care about the users. I don’t know if that’s a good or bad thing. 1:07:03.8 Student: Dr. Weigend, you said about the fairness model, right?... Andreas: I just gave you the example last class, where I give you $10. It’s culturally dependent that if Matt gives Blake $.01, what is Blake going to say versus if Matt is going to give Blake $3, and Blake says, “Okay, that sounds fair.” People have some intrinsic notions about fairness. What we are probing now is whether the costs they have of bugging their friends seems worth it for them. What Linus is saying is people don’t care about their friends. Linus: Oh no, some people do. Andreas: Should we have a model, for instance, where we figure out, based on past behavior, how much that person can bring us? What about the second order model? If I only invite five friends, but they are really good friends and will do whatever I ask them to do, versus Matt invites five hundred friends and they don’t even click; did you take that into account? Linus: Yes, I probably misunderstood the question. 
I do care about annoying my friends. That’s why I created other accounts, to spread the word about my other app. We wanted to find users, and we got it from our statistics: who invites the most people, who is most susceptible, who the good users are, who likes to invite friends, and whether those friends actually invite other people too. It takes a long time, but you can actually compile these statistics. That was helpful.
Student: … before people start using your application…
Andreas: People don’t know what they’re getting, basically. It’s like you click on a link at Google; you don’t know what’s behind it.
Linus: That’s why the text has to be very compelling.
Student: Any words on future of the next cool app….
Linus: If I had to do something right now, if I had to do it all over again today, I was wondering what I would actually do. It depends on what you want to do. Do you want to build a company, or do you want to make a quick exit? If you want to make a quick exit, I would say Facebook is still a very good platform to develop on. You can get rapid users. It’s amazing; you have more than two hundred million users. You have access to all of them. If you want to make a big splash, just create something very innovative and creative, or copy something and just do it better than everyone else, out-execute them. That would be a good way to do it. You can make a lot of money, still. It’s a very good platform.
1:12:09.9 If you want to build a company, I would say mobile is probably a better idea. With Android coming out, with iPhone being dominant, I feel the mobile space is finally going to take off. Before this, I actually did a mobile startup. We built it on the Microsoft mobile platform, which was not as good. I think it’s kind of dying. The problem with doing mobile startups is there are so many different operating systems out there. You have to work with all these different companies and they’re also very private. They’re not as open. This was three years ago. It’s very hard to develop on all these different phones, port your app to all these different phones, to get them to work together. Now, I think and hope that there is going to be one dominant player, hopefully the iPhone or the Android. They are more open. Google is very open to opening their APIs and working with developers. So is Apple, to some extent. Hopefully, there will be two big players for now, and maybe later there will be one player.
Then you can really explore the space. The phones are getting more powerful. I really believe that’s the next big market. If I had to do something, I would probably do a startup in the mobile space.
Andreas: Linus told me I can ask him five questions. These are the five questions we agreed on when we prepared here, in the framework that you took the class three years ago and you can look at it here.
1. The first question is what has not changed and what has remained the same? If you just focus on what changes, you miss half of it.
2. The second one is how has the world changed and what is different now from what it was three years ago?
3. What will be there in another three years?
4. What would you have done differently?
5. What advice would you give to the class this year?
Are these fair questions?
Linus: Yes, these are fair questions. I sort of answered some of these. What has not changed? People have not changed. Just from the Mafia game; that was popular twenty years ago. It’s popular now, and it’s popular on the iPhone. People tend to stay the same, their behaviors stay the same. I think that’s a very interesting thing.
1:14:11.4 Andreas: But let me play Devil’s Advocate here. People talk about continuous partial attention, that people are constantly monitoring what’s happening on their mobile phones, what’s happening on their short messages, their IMs and so forth. That certainly was not the case before. The way we deal with communication, and the way we deal with communication and information overload is quite different from what it was ten years ago.
Linus: There is much more noise.
Andreas: For instance, the way people communicate on Facebook is pretty [1:14:44.0 unclear]. You might hit me up on Facebook, on Ning, on Twitter, and I might see it or I might not see it. By contrast, if you send me a short message, you can be pretty sure that I will be looking at it.
Linus: That is what has changed, though.
Andreas: It’s both. I think it’s important to understand where we changed and where we didn’t change. I think people changed and people didn’t change. What do you guys think?
Student: …
Andreas: As we always agree, people getting paid and people getting laid. [Laughter]
Linus: It works pretty well on Facebook, at least. It’s human behavior. The Mafia game is inherently addicting. I have no idea why. I’m not a game player, but whatever we did, we mimicked it. It’s addicting.
Student: …
Linus: That’s definitely a great point. What has changed is this whole app thing.
Applications are hot right now. Three years ago, there were apps out there, it’s just that no one had heard about them. I think Amazon even had their API opened up.
People could create little things but it just wasn’t as hot. I think Facebook was the first player. It wasn’t hot because people didn’t know you could make money off of it. A lot of people said, “This is just a tool I could leverage for my existing website for this existing business.” You can actually build a company just by making apps. Once people found that out, it became hot. I think people are more open with their information, definitely. You see all these celebrities now, on Twitter. It’s amazing what they share and how open they are. I would never have dreamed, three years ago, that it would have taken off so far, like Barack Obama and things like that, it’s leveraging those technologies and showing everything you have.
1:17:10.5 Andreas: Another big difference is that technology has been totally commoditized. Amazon spent a lot of money building its first round of service. Now, you can use elastic computing, scaling, you don’t have to worry about putting anything [1:17:29.2 unclear] anymore. The great thing about this is it has freed up our creativity, of coming up with ideas, and testing them rapidly, failing rapidly, and then 19 out of 20, one works. The technology we have really moved up the stack. The primitives are way higher than the way it used to be.
Linus: Programming language, like Ruby, it’s an easy programming language. If you spend a week doing it, if you have some computer science background, you can pick it up. Back then, PHP was a little easier than Python or whatever, but I really believe Ruby is a great technology to build on. People can learn it very quickly. What will it be like in three years? Again, I said mobile will be very big. I think data will just become more freely shared. Services based around that will do very well, as well. What would I have done differently? I don’t know, actually.
I have been kind of happy with what I’ve accomplished. Maybe I wouldn’t have been so spammy, I don’t know. It’s a very good experience about what really matters. I found out, through this process, that I do like technology but I don’t like it at the expense of doing things that aren’t very creative. I think it was a good learning experience. Maybe I would have done something more creative, but I wouldn’t have gone down the path I am on right now. I’m happy with what I’m doing. I probably would not have done much differently. What advice would I give? The best advice, and I really think this is important, is if you want to start a company, really figure out with your partners what the expectations are. Make sure all your partners are on the same page. Then just hammer at it.
Student: …
Linus: In terms of apps sort of dying in the U.S., I’m not saying they’re dying but people are kind of less sensitive to it. When the first request came, people were like “What the hell is this,” and they would go and check it out. Now, they’ve become desensitized to it. Also Facebook changed their profile so it’s less prominent. I don’t know if their behavior necessarily changed, it’s just that they got used to it.
Student: …
1:20:56.9 Linus: That’s a great question to ask [1:20:59.4 unclear], when he comes. He probably knows more about it. I’ve been out of this game for a while. Just look at your requests; they’ve probably gone down. I know that I’ve seen reports that international markets are growing. Whether they’ll eventually get desensitized to it, I don’t know, I assume they will.
Student: …
Linus: I’m all about giving out data, but I have to check with my partners. If you are really interested, maybe we should talk afterwards. Ask Andreas for my email. We share data with the research group. One of my partners was doing his PhD thesis on scaling, so he has shared data with his old group. There are interesting insights. I think there are a couple of papers on it.
Andreas: Okay, given that the [1:22:08.8 unclear] café closes at 4:00 and we need our shot of caffeine to get through the second half of class, let’s give Linus a very, very big thank you. Let’s be back here in 15 minutes, which is shortly before 4:00. Thank you.
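
The invite statistics discussed earlier in this part (which users accept requests most often, and whose invited friends go on to invite others) can be sketched in a few lines of Python. This is only a minimal illustration, not the code Linus's team used: the log format, the sample data, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical invite log: (inviter, invitee, accepted). Made-up sample data.
invites = [
    ("ann", "bob", True),
    ("ann", "cat", False),
    ("bob", "dan", True),
    ("bob", "eve", True),
    ("cat", "bob", True),
    ("dan", "eve", False),
]

def acceptance_rates(log):
    """Fraction of received invites that each user accepted."""
    received = defaultdict(int)
    accepted = defaultdict(int)
    for _, invitee, ok in log:
        received[invitee] += 1
        if ok:
            accepted[invitee] += 1
    return {user: accepted[user] / received[user] for user in received}

def second_order_invites(log):
    """For each inviter, total invites sent by the friends who accepted.

    A rough stand-in for the "second order" question: five friends who
    act can be worth more than five hundred who never click.
    """
    sent = defaultdict(int)
    for inviter, _, _ in log:
        sent[inviter] += 1
    downstream = defaultdict(int)
    for inviter, invitee, ok in log:
        if ok:
            downstream[inviter] += sent[invitee]
    return dict(downstream)

def rank_invitees(candidates, rates, default=0.0):
    """Order a friend list so likely accepters appear at the top."""
    return sorted(candidates, key=lambda u: rates.get(u, default), reverse=True)

rates = acceptance_rates(invites)
print(rank_invitees(["cat", "eve", "bob"], rates))
print(second_order_invites(invites))
```

A per-friend annoyance cost, as raised in the question about a decision-theoretic framework, could be added as a penalty subtracted from each score; that term is left out here because the discussion gives no way to estimate it.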