The Science Of Social Networks


Presentation discusses scientific method, common pitfalls of social media experiments. Defines some terms, shows neat tools, tries to move discussion forward.

Published in: Technology, Education

  • Hey Everybody – welcome to Social Media Club. I’d like to thank you for letting me speak today – hopefully it’s not something you’ll immediately regret. Also a big thank you to Bazaar Voice for hosting and also Kick Apps – I bet they know almost everything in this presentation as a significant chunk of it I think is in Bazaar Voice’s Intelligence Platform and Kick App’s stuff too. How’s that for a plug? Well, I just hope everyone will learn something. I know I will. So, I’m Ehren Foss, I run a firm here in town called Prelude Interactive. We do mainly nonprofit technology consulting and website building. Mostly PHP and Salesforce stuff – you know, donor databases, volunteer databases, and advocacy tools are the big three in that space, the bread and butter, but there’s all kinds of cool stuff happening in the nonprofit technology sector, a lot of it crossing over into social media, and I’d be happy to talk your ear off about that too afterward. And since it was actually my first Social Media Club meeting back at IBM in aught-8 that convinced me to sign up for Twitter, I’ll mention that too. I’m @ehrenfoss – say hi!
  • So, if there’s a main idea of this presentation it’s that very little is known for sure about how social networks are composed, how they behave, and how to analyze or manipulate them. It is only recently that people have been able to accurately study, simulate, and do research on social networks. Until maybe a decade ago you just couldn’t get that kind of data, and you couldn’t get the kind of computing resources necessary to analyze it. Also, people hadn’t really figured out how to simulate things accurately. So, just don’t confuse how much is actually known with how much seems to be known. So, science still knows very little – it hasn’t had much time to react, nobody has. Some very interesting stuff has been discovered, though, and we’ll cover that. I know it’s dangerous to say this in a room full of social media professionals, but social media consulting – the profession and industry that has grown rapidly around the rapidly growing online social networks – also isn’t very far along. I’d say – and this is an outside perspective – it’s between a skilled trade and an art form – not an accurate science. It’s not flattering, but think of yourselves as auto mechanics in 1910, mathematicians a generation after the invention of, say, numbers, or connoisseurs of fine wine, scotch and cognac five hours into your 21st birthday. It may feel like expertise, and by the standards of the day it is. But there’s a long way to go, and future generations looking back will not be kind to you. Today’s bleeding edge is tomorrow’s steampunk and Society for Creative Anachronism. But, don’t despair! I’m absolutely not condescending to your industry. I’ve met a lot of you, and I’m familiar with the energy and smarts that go into the work that you do. What I’m hoping you’ll do after this presentation, though, is slightly change your focus and your way of working with your own networks and those of your clients. 
I hope you’ll walk away more skeptical of the fluffy pseudoscience pouring out into the web and at conferences, and I hope you’ll feel more empowered with your own ability to experiment and discover what makes this stuff tick. In high level academics, so I’ve heard, ONE better-than-mediocre idea which you can prove correct can make a career, get you into a tenured position and an endowed chair. In your line of work, ONE new idea, new innovation, is enough to set you apart from the rest. I hope to help you find it!
  • So, I’ve been told that people retain information better when you tell them what you’re about to tell them, tell ’em, and then tell them what you just told ’em. So, step one, here’s what you’re going to hear tonight. I’m going to give a background on a couple of big deal, canonical papers that everybody cites and that have been watershed moments in the study of social networks. I’m also going to go over some of the jargon, the terms, and tie them to the terms you’re familiar with. Maybe all you get out of this is the ability to namedrop – “Well, I don’t agree with Barabasi’s latest work but Duncan Watts seems to be on the right track.” Next we’re going to talk about examples of things that network analysts DO use, and DO seem to work for many situations. Not all, but some. Third, and ultimately the most pragmatic and useful stuff is towards the end, I’ll briefly discuss and share some links with useful tools, toys, and software you can use in your explorations. Double finally, you might see this slide again.
  • So, first off, I am not the first person to address this topic, to attempt to bridge gaps between the scientific community and the community putting those ideas to practice. I had already found a few similar presentations online, but soon after the first tweets about tonight, Dan Zarrella wrote with a… polite note indicating that he saw some possible similarities and avenues of collaboration [slide]. The titles are nearly identical – I will definitely admit that much. His presentation debuted about three weeks ago on O’Reilly – big name. He works for Hubspot in Boston. I do encourage you to check out his presentation if you haven’t come across it already – it’s pretty interesting. He concentrates on explaining and discussing scientific metaphors of memes, evolution, and selfish genes as they apply to messaging, and also gets into the psychology and sociology of social networks. His main goal is to help people spread their message – get retweeted – more easily. I’ll be concentrating on something I think is potentially more significant – network structures and tools for looking at those. But, since it appears he’s curious about ways our presentations differ, I encourage you to Tweet to him at any time during the presentation or afterward any time you notice a possible difference between my presentation and his. Remember two Rs and two Ls in Zarrella. Seriously though, we’re all here because of the powerful moments like this that social networks, especially open ones like Twitter, can create. It took him 30 seconds to set up a search in TweetDeck for “science social media,” and he immediately found our tweets about it. If I WERE actually copying his presentation I would have been embarrassed and caught red handed. There’s some lesson here about… anonymity? Tact? Who knows, anyway….
  • I also ran across a very interesting looking slide deck about analyzing social networks. Unfortunately I couldn’t find the audio, so seek this guy out at conferences if you can, or we’ll see if we can dig that up – I know this is a place where I can’t get offended if people are staring at their phones the whole time, so if you happen to figure out a better link to these don’t be shy and share it. There are some very interesting related SlideShare decks, and a modicum of searching will get you to a procrastinatory workday’s worth of YouTube videos of other presentations. Anyway, you can all use Google and find this stuff. It’s out there. Great resources. Tonight is meant to pique your interest and encourage you to do more exploration and tinkering. All the answers are not here.
  • So, let’s say you want to change how you look at your use of social media and your work as a scientific pursuit – because you know that what you’re doing now is based mainly in experience, instinct, and other intangibles. Great! But, how long has it been since you’ve actually known the real steps of the scientific method? It’s a bit of a “Wizard of Oz” type thing most of the time, but the steps are amazingly simple and delightfully easy to screw up. Defining the question. Duh. But this is actually the hardest part. Very difficult. What are you specifically trying to figure out? Generally it starts out as “How do I get my message to more people?” or “How do I figure out who the important people are in my network?”, that kind of thing. But to get results you can trust out of the analysis you’re doing, it needs to be much more specific. “Are men or women more likely to retweet a message with these certain properties?” or “Can the community respond differently to an implied gender in a brand’s messaging? How is this affected by different kinds of communities?” – again, with all other conditions the same. Gathering information and resources sounds a little like drudgery, but you live in a time when the “be open” paradigm has actually started to spread, and it has never been easier to gather LARGE amounts of social network data. Assuming you have access to large networks – yours and your clients’ – or assuming you’ve purchased some tools like those of Small World Labs or Bazaar Voice – this is accomplishable. It’s not a snap, and sometimes it’s against the terms of service, so be careful of that. But, you can get this data by API hook or craigslist crook. Again, the hypothesis is a simple yet tricky step. Be as specific and narrow as possible here. Also, you must beware of bias both in the definition of your question and the framing of your hypothesis. 
So don’t ask “How much more successful will this technique be?”, ask “Will this technique have a measurable, significant effect?” Scientists screw this up all the time – their funding and careers depend on them finding something conclusive, but the dirty secret is that the vast majority of experiments are failures. The vast majority of your experiments will be failures too, if you do them right. If it works every time you are not doing it right.
  • Performing the experiment – pretty straightforward, but fraught with some danger. The hardest thing about performing experiments – scientific experiments – is making sure that the thing you’re trying to measure is ONLY affected by the knobs you’re twiddling, by the things you’re experimenting on. It is very common, and a great idea, to have a control group and an experimental group. Split up your contacts into two groups, perform the same thing on two different communities, etc. But if you run one test on Monday and your control group on Friday afternoon, having to do with retweets, that could really affect things. Likewise, if one community is made up of teenage World of Warcraft enthusiasts and the other a network of septuagenarian chefs, can you really trust your findings? Determining and sequestering your test groups can be very tricky indeed. Analyzing data, interpreting it and assessing your hypothesis. So, you might notice a pattern here. Every new step seems easy but is fraught with pitfalls. Still true with this step, but at least if you’ve gotten this far you can go back and review everything you did and double check. I know a lot of this will fall under NDAs, or you won’t want to share openly with competitors, but the more people you share your results with, and the more people who retry what you did, the more certain you can be of your conclusion. Remember, they also need to control all the variables but the one being tested, and need to be careful about their analysis. Very hard to guarantee! It is very easy to do bad science. It is more effort to do good science, but good science should be the only thing you’re interested in. Bad science is worse than worthless – it is misleading and potentially very damaging, especially if the world adopts it as truth or dogma. 
You can do good science, and use your results to inform your practices, and then you can destroy your competition… or engage with them, whatever you’re about, you can do it better with careful study. You don’t need a PhD or a master’s to do this, you don’t need advanced math degrees – though knowing statistics or having someone who knows statistics review things can really help. Be honest with yourself and make sure you’re avoiding the pitfalls, and you’ll probably do OK. And your experiment will still fail the first 9 times.
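One way to keep your own preferences out of the control group and the experimental group is to let a random number generator do the split. This is only a minimal sketch: the account names are made up, and `split_groups` is a hypothetical helper, not part of any tool mentioned in this talk.

```python
import random

def split_groups(contacts, seed=0):
    """Randomly assign contacts to a control and an experimental group."""
    rng = random.Random(seed)      # fixed seed so the same split can be reproduced
    shuffled = list(contacts)      # copy, so the caller's list is left alone
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical contact list, for illustration only.
contacts = [f"user{i}" for i in range(10)]
control, experimental = split_groups(contacts)
print("control:", control)
print("experimental:", experimental)
```

A random split does not fix the Monday versus Friday problem. Both groups still need to be tested at the same time and under the same conditions.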
  • This flub, confusing causation and correlation when analyzing the results of an experiment, happens quite a bit, even to tweedy professors. Causation is when you know that one particular event or thing has directly caused another, or has significantly shared in another thing happening. Correlation is when two events happen together frequently. When you have an emotional or financial stake in the outcome of an experiment, it is very easy to get these two mixed up. Even in everyday life, humans do this all…the…time. The human mind loves being able to find stories and anecdotes to explain what is happening around it. My bicycle tire went flat because my street has a lot of potholes. Well, maybe I bought cheap tires or maybe I didn’t pump them up enough. Dan provides a good example of this problem, and it’s really hard for people to spot. He shows a graph that seems to make common sense – people want to hear happy messages, right? That’s also present in the conventional wisdom of networking and, you know, “How to Win Friends and Influence People”. But, he has not proven his case. He’s showing a correlation – a line fit to his data that shows that fewer negative remarks occur with people who have lots of followers. But he has not proven that they have more followers BECAUSE they aren’t as negative, or even that being positive or negative influences it. Maybe people with more followers are happier people? Other things we don’t know here – that would make it hard to repeat the experiment – are how he determined what positive and negative means, or what the statistical error might be on these measurements. You know how those political polls say “+/- 3%”? Well… that matters. So, whenever listening to someone’s explanation of what’s happening in a social network, REMEMBER about causation and correlation. The word “because” or “this proves” or “this shows” should furrow your brow. A better experiment here would have been to measure this stuff over time. 
Start with a group of people who have a similar follower count, say, 1000, and similar interests, topics, and tweeting habits. Then see what happens and measure positive and negative effect.
  • Ok, so now we’ve had a refresher on the scientific method. It’s hard to remember seven or eight things at once, so just remember three: Define the question and hypothesis cleanly and simply without bias – beware causation and correlation – and share your results or repeat your experiment. So, are you sick of this slide yet? What is a social network? What is a network? Have you seen this before in presentations? It’s harder to precisely define than you’d think. You can list examples of social networks, but there are confounding ideas in there too. Can cities be nodes and interstates be the connections? What about ant colonies? Where are the nodes? Are they the ants? The whole mound? Can things without memory or individuality really socialize? Do computers socialize? If you move beyond Twitter, Facebook, LinkedIn and other online social networks – Ning, FourSquare, etc – you’ll find that networks are everywhere. We’re in this room because we work with networks that are already based in data, and accessible, and discrete. You can do your jobs because node, connection, and network have already been defined in this realm. IF, and I know this sounds crazy, but if some OTHER new technology comes along in 5 years and nobody cares about online social networks anymore – new disruptive technologies never happen but humor me – then the lessons you learn here will still apply somewhere. Job security, right? My definition is more broad, but you can ignore it if you want. I think a social network is any collection of the activities of entities which can socialize. Activities is more than textual communication, entities are more than people, and many things can ‘socialize’. The purpose of this slide is to introduce the accepted, dull, broad, and mostly unhelpful definition of a social network, and also to caution you against making big, long-term bets on a definition of social networks that is more narrow than this. 
Not every network is the same, no two communities are the same, and – do you think the FBI uses the same rules of thumb as you do when trying to unravel clandestine networks like terrorists or, say, a child pornography ring? Probably not.
  • So, you may have seen these before too, but I just wanted to do a quick reminder. I might accidentally use the wrong kinds of jargon here, so just have these in mind. A node is a thing – a person, an airport, an academic paper, a street intersection. An edge connects two things – a one night stand, an airplane flight, 6th Street between intersections at Red River and Brazos. A graph is a mathier term for a network. Same thing. Graph theory and network theory are almost the same thing. If you want to quibble read the references and call them up and argue. Finally, remember that the edges in some networks are bidirectional, some are directional. This matters immensely, just keep it in mind. Twitter has directional links – I can follow you but you don’t have to follow me. Facebook has bidirectional links – our friendship is mutual by definition.
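To make node, edge, and directional versus bidirectional concrete, here is a small sketch of one way to store a network in code. The `Graph` class and the account names are invented for illustration. Real tools have their own, richer structures.

```python
from collections import defaultdict

class Graph:
    """A bare-bones network: each node maps to the set of nodes it points at."""

    def __init__(self, directed=False):
        self.directed = directed
        self.neighbors = defaultdict(set)

    def add_edge(self, a, b):
        self.neighbors[a].add(b)
        self.neighbors[b]              # touching the key registers b as a node
        if not self.directed:
            self.neighbors[b].add(a)   # bidirectional: the edge works both ways

# Twitter-style: I can follow you without you following me.
twitter = Graph(directed=True)
twitter.add_edge("me", "you")

# Facebook-style: friendship is mutual by definition.
facebook = Graph(directed=False)
facebook.add_edge("me", "you")

print(dict(twitter.neighbors))
print(dict(facebook.neighbors))
```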
  • So, getting into the research now, the very first modern ‘social networking’ experiment occurred in the 1960s. Stanley Milgram is the controversial psychologist who had people administering fake shocks to other volunteers to test their resistance or acceptance of authority in different situations. He also, in a different experiment, sent a bunch of packages from Omaha to Boston specifying that people had to hand them off to someone they knew directly, and that the goal was simply to get the package to the correct recipient. He found that the average number of handoffs was about 6. His results have since been criticized, but this “Six Degrees of Separation” concept is engrained in lore, and also justified by some later research. This number is not magical. It is a property of human relationships and human dynamics as they exist in the modern world. If there were trillions of people spread across the solar system this number would be higher. In a small town or village, it is smaller. This “average path length” is the first of many examples of the kind of metrics and concepts we use to understand social networks. Why does this matter? Well, if someone brings up six degrees of separation you now know that it is only true for a network of certain properties (human relationships) and of a certain size (a few hundred million to a billion people). This might be seven degrees of separation once everybody on earth is connected and measurable. With a significantly larger or smaller network, the number will be different.
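The “average path length” can be computed directly on any network small enough to fit in memory. The sketch below assumes the network is a plain dictionary mapping each node to its neighbors, and it only counts pairs that can actually reach each other. The four-person chain is made-up data.

```python
from collections import deque

def hops_from(graph, start):
    """Breadth-first search: fewest hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_path_length(graph):
    """Mean number of hops over all pairs of nodes that can reach each other."""
    total = pairs = 0
    for node in graph:
        for other, hops in hops_from(graph, node).items():
            if other != node:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0

# A chain of four people: a knows b, b knows c, c knows d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_path_length(chain))
```

This runs one search per node, so it is fine for thousands of nodes but will take a while on millions.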
  • So, this idea that everybody is connected by a fairly low number of hops is an interesting one, and with a little more refinement it leads to the “Small Worlds” hypothesis of Duncan Watts and Steven Strogatz in 1998. The “Small World Labs” folks if they’re here probably have more stuff to say on the subject. Watts and Strogatz studied networks such as actors – where being in the same film constitutes an edge – power lines and power transmission, and finally the neural network in worm brains. When you have a random network – like, take a bunch of nodes, and randomly connect each node with some of the other nodes in the network – a few properties emerge. One is that the “clustering” is fairly low – the people you’re connected to are no more likely to be connected to each other than any two random nodes are. Another is that the “average path length” or “separation” between nodes is small. They discovered that a certain class of networks they called “small world” networks keeps the short path lengths of a random network even though the clustering is much higher, the way it is in real life where your friends tend to know each other. They also found that the “average path length” scales with the size of the network, but in an interesting way. If you have a network that’s, say, ten times bigger, the path length might be one more. So, if you had 60 billion people on earth we might be talking about seven degrees of separation. Also think about the fact that you can fly to what, 10,000 cities in the world? And have you ever heard of more than three or four connecting flights? David told me not to say logarithm, but that’s what it is. Another interesting finding is that because the network is randomly constructed, some nodes do have a lot more connections than others. Because the network is random, even though most nodes are mostly average, some are decidedly NOT average. Some small number of them will have a very large number of connections – “super connectors” or politicians or CEOs – and some have very few. But, because those large connectors exist, they help the average path length remain low. 
As an aside, this is by far my favorite feature of LinkedIn. I’m connected to maybe 200 people, but from those 200 I can reach almost anybody I’m interested in meeting in about 3 or 4 hops. If you haven’t used the introduction feature there, you really should! So why is this useful? It explains some of the conventional wisdom of networking. It helps explain why some networks look the way they do. It ALSO gives you a clue that if you’re starting to work with a NEW network, you should make sure that it has the same properties as a “small world” network. If it doesn’t, we need to throw most of social media wisdom right out the window.
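The “ten times bigger, one more hop” remark can be written down as a rough formula. This is a back-of-the-envelope approximation for randomly wired networks, not a result taken from the slides: N is the number of nodes, k is the average number of connections per node, and L is the typical path length.

```latex
L(N) \approx \frac{\ln N}{\ln k}
\qquad\Longrightarrow\qquad
L(10N) - L(N) \approx \frac{\ln 10}{\ln k}
```

So multiplying the size of the network by ten adds a fixed amount to the path length. That amount is exactly one hop when k = 10, and less than one hop when people have more than ten connections each.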
  • So, this is another important paper – to be taken with a grain of salt, because it has been challenged by the community a bit. The “Scale Free” networks paper of Barabasi and Albert, which is actually a rehash of de Solla Price’s paper of 1965. Before social network and other data became highly available, and even now when scientists want something more mathematically pure to study with fewer variables – that they could control in a lab – they were having trouble creating networks for study that had the right properties. The principle of “preferential attachment” means that if you have a group of things with some property – like people with wealth, or people with connections, or academic papers with citations, etc – then when you add more of that resource, it is distributed according to how it is already distributed. People with a lot of money accrue more money than everyone else when the economy does well – the rich get richer. People who are already very well connected tend to add more connections at a higher rate than everyone else. You might think it’s common sense, but it’s the principle of preferential attachment that attempts to explain it. People were able to build random networks using preferential attachment (with a few tweaks as produced by the Barabasi Albert model) that mimicked the properties of real networks – notably that they seem to look similar if you zoom all the way in and out. I actually had a chance to work on these two years ago – a company wanted to demonstrate that if only, say, 10% of users adopted a certain feature, it would have enough critical mass to be useful to them and the feature would spread. We built some random networks of different sizes, with different percentages of adoption, using preferential attachment and then generated metrics for how well their feature would do in each of those cases. The company is still around, doing very well, but an NDA covers anything else. So why is this useful? 
“Simulated” networks can be built with preferential attachment, and you can also simulate how a network will grow over time using preferential attachment. So if you’re being asked by a client to “predict what will happen if…” – remember no one can predict the future – but you can give it a good ol’ college try with preferential attachment.
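Here is a rough sketch of growing a simulated network by preferential attachment. It is a simplified take on the idea, not the exact Barabasi Albert recipe and not the code from the client project mentioned above. The function names, the starting core, and the default of two links per newcomer are all assumptions made for illustration.

```python
import random

def preferential_attachment(n_nodes, links_per_node=2, seed=0):
    """Grow a network where well-connected nodes attract more new links."""
    rng = random.Random(seed)
    core = links_per_node + 1
    # Start from a small group in which everybody knows everybody.
    edges = [(i, j) for i in range(core) for j in range(i)]
    # Each node appears in this list once per connection it has, so a random
    # pick from the list favors nodes in proportion to their degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def degrees(edges):
    """Number of connections per node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

edges = preferential_attachment(200)
deg = degrees(edges)
print("most connected node has", max(deg.values()), "links")
print("least connected node has", min(deg.values()), "links")
```

Run it with a few different sizes and seeds and compare the spread of degrees. The early arrivals tend to end up as the super connectors.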
  • Community detection is the difficult practice of identifying significant subgraphs of a network. Cliques in a high school. Specific topic areas in a network of academic papers. This is the step that takes an enormous, tangled social network and begins to decompose it into components that you can understand and individually target. You have concepts of demographic groups, interest groups, and whatnot – but this technique should tell you what the groups actually are rather than what you think they are. I won’t spend too much time on this as the details are a little hairy to me, but the Girvan Newman algorithm for doing this addresses a few longstanding problems with what people were trying before. It starts by determining the “betweenness” of each edge – sounds like BS, but there’s a definition: Betweenness is a measure of the number of shortest paths in the network that traverse the edge between two nodes. So, if you’re one of the only people who attends Social Media Club and the Austin Backyard Poultry association, and if the whole network were only composed of these two groups of people, then your betweenness would be high. The shortest path between most of the members of each group would probably flow through you. So, with this method you take ‘edges’ or relationships with the highest betweenness, remove them, and then recalculate the betweenness of the edges in the graph affected by this removal, and then remove the next highest edge, and recalculate, and so on, until eventually your communities are exposed as separate sub-communities. So why is this useful? Well, to find your actual communities of course! The key difference here is that you have assumed communities – film nerds, web nerds, and music nerds attending SxSW, but until you analyze what’s actually happening in that network you don’t know for sure what the camps are, or how to break them down from there.
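The remove-and-recalculate loop described above can be sketched in a few dozen lines. This is an illustration under some assumptions: the network is undirected, has no self-links, and is stored as a dictionary of neighbor sets. It also stops at the first split, where the full method keeps going. The function names and the six-node example are made up.

```python
from collections import deque

def edge_betweenness(adj):
    """Score each edge by how many shortest paths run across it."""
    score = {}
    for source in adj:
        # Breadth-first search that also counts shortest paths to each node.
        dist, paths, parents, order = {source: 0}, {source: 1.0}, {source: []}, []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    paths[nxt] = 0.0
                    parents[nxt] = []
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    parents[nxt].append(node)
        # Walk back from the farthest nodes, sharing credit among parents.
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                edge = frozenset((parent, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[parent] += share
    return score  # every pair is counted from both ends; the ranking is unaffected

def components(adj):
    """Groups of nodes that can still reach each other."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups

def split_once(adj):
    """Remove the highest-betweenness edge, recalculate, repeat until a split."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:
            break                      # no edges left to remove
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)

# Two tight groups of three joined by a single relationship (2 and 3).
network = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(split_once(network))
```

Calling `split_once` again on each resulting group would keep breaking the network down into smaller camps.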
  • Because social networks are so popular, and because they are now BIG MONEY thanks to YouFace and MyTube and LinkTweet, and because certain academics have become famous or rich by studying it – or copying earlier unrelated research – a lot more academics and people are piling on the bandwagon. None of us in this room are innocent, regarding our relationship with or position on said bandwagon…. But, academically speaking, right now you have a lot of people from a variety of math-heavy disciplines throwing their complex frameworks and theories at network analysis trying to be relevant. Sometimes it works, most of the time it doesn’t. My friend Jake’s main point – that I agree with after reading around and thinking about this stuff – is that people are really good at inventing new terms, new techniques, new little tricks and cute math. But, those things are almost never universally applicable. That’s right, everything you’re hearing tonight probably won’t work perfectly in YOUR network. Each network is different. Each node is different, each relationship is different. So, this is why I’m encouraging you to do more of your own experiments and research. There are lots of things to measure, but knowing which ones are the right ones will depend on the exact application you have in mind. Are you concerned about retweets? About followers? About how many people click on your links? Or, are you concerned about the tone and appropriateness of the content of a network you control? Each one of these problems is very different and is inadequately addressed by science at this point in different ways.
  • So, we just mentioned metrics you can calculate – none of which are superior to the others without knowing what you’re trying to do or find out – and this set of terms goes beyond the first simple set of node, edge, graph… a little bit. These are the things that people try to use to measure the structure of a network in more specific ways. Degree is simple – number of connections. Number of followers, fans, contacts, etc. Centrality asks “how central or important to the network is this node?” There are different ways of determining this – by the degree or number of connections again, or by closeness centrality, meaning how easy it is to reach the rest of the graph from that node (who are the right people to know). The last one, eigenvector centrality, I don’t fully understand, but it’s a way of determining centrality or importance based on the importance of the nodes you are connected to. So, if you aren’t yourself important, but you are connected to the most important people, your eigenvector centrality goes up – there are many different definitions of importance here, remember.
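The three node-level measures above can each be computed in a few lines. The sketch assumes an undirected network stored as a dictionary of neighbor sets. The eigenvector version uses simple repeated averaging (power iteration, with each node keeping its own score in the mix so the numbers settle down), which is one common approach and not necessarily what any particular product does. The node names are invented.

```python
from collections import deque

def hops(adj, start):
    """Fewest hops from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def degree_centrality(adj):
    """Number of connections per node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def closeness_centrality(adj):
    """Reachable nodes divided by total hops to them: higher means closer."""
    result = {}
    for node in adj:
        dist = hops(adj, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result

def eigenvector_centrality(adj, rounds=100):
    """You are important if the nodes you are connected to are important."""
    score = {node: 1.0 for node in adj}
    for _ in range(rounds):
        # New score: own score plus the scores of everyone you are linked to.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(new.values()) or 1.0
        score = {n: value / top for n, value in new.items()}  # rescale to 0..1
    return score

# A hub with three contacts, one of whom (c) also knows d.
adj = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"},
       "c": {"hub", "d"}, "d": {"c"}}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
eigen = eigenvector_centrality(adj)
for node in adj:
    print(node, degree[node], round(closeness[node], 3), round(eigen[node], 3))
```

In the output, d has the same degree as a and b, but a and b score higher on eigenvector centrality because their single connection is to the hub.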
  • So, there are also some commonly used calculations for the whole network. Clustering coefficient – weird name, weird definition. So, in a network whenever you have three nodes you have a possible triangle. I could be connected to Mike and Sue, and they could be connected to me and each other. It’s a triangle if everybody is connected. So, the clustering coefficient of a network – and this is a metric of the network as a whole – is the number of closed triangles divided by the number of possible triangles. This one will make your computer sweat, but that’s what computers are for. Centralization – this measures how much a network is dependent on a few central nodes. I’m not 100% sure how to calculate this – any ideas? I believe it’s an average over all the nodes for the difference between the possible number of connections a node has versus the number of connections it actually has, but that doesn’t seem quite right to me.
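Both whole-network numbers can be sketched in code. The clustering coefficient follows the closed-over-possible triangle definition above. For centralization the notes leave the formula open, so the sketch uses one common textbook definition (Freeman’s degree centralization) as an assumption: add up how far each node’s degree falls short of the best-connected node, then divide by the largest value that sum could take, which happens in a star-shaped network. The names and example networks are made up.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Closed triangles divided by possible triangles, over the whole network."""
    closed = possible = 0
    for node, nbrs in adj.items():
        # Every pair of my connections is a possible triangle through me.
        for a, b in combinations(nbrs, 2):
            possible += 1
            if b in adj[a]:
                closed += 1        # the pair also know each other
    return closed / possible if possible else 0.0

def degree_centralization(adj):
    """0 when everyone is equally connected, 1 for a perfect star."""
    n = len(adj)
    if n < 3:
        return 0.0
    degs = [len(nbrs) for nbrs in adj.values()]
    top = max(degs)
    return sum(top - d for d in degs) / ((n - 1) * (n - 2))

# Me, Mike and Sue all know each other; Sue also knows Ann.
friends = {"me": {"Mike", "Sue"}, "Mike": {"me", "Sue"},
           "Sue": {"me", "Mike", "Ann"}, "Ann": {"Sue"}}
# One hub that four otherwise unconnected people depend on.
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"},
        "c": {"hub"}, "d": {"hub"}}

print(clustering_coefficient(friends), degree_centralization(friends))
print(clustering_coefficient(star), degree_centralization(star))
```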
  • So, a little mid-presentation refresher here. Just summing up what we’ve covered so far and the main points that should be caroming around your head in between tweets to Dan Zarrella. Applying the scientific method in the right way is a challenge, but it can be done by anyone.
  • So, time to talk about guns with which we can shoot ourselves in the feet. First off, emotional content is really more of a text analysis and linguistic tool than one specific to social networks. Emotion extraction, tools that attempt to guess gender, age, and tools that categorize text along different attributes. These tools are really cool – they can tell you things about your tone, aggregate word choice, and other stuff that can be surprising. For instance, I show signs of being both aloof and “in the know” which is something you do want in a presenter, right? But, a caveat – you may be noticing a pattern here – you really need to look at how these numbers are calculated and make sure you think the methods are sound. It’s very possible to be frufru when dealing with emotion and you don’t want to be duped. So, put the analysis of emotion, gender, tone and other text analysis or linguistic analysis metrics in your quiver. Depending on what you’re trying to do, they might be useful. Especially if you’ve just been told by your client that their upper level management has decided that they want their communications to be “edgier” or “more youthful”. If you aren’t yourself edgy or hip, this can come in handy!
  • This slide isn’t really about science, but it is another example of how rigorous analysis of a social network can tell you things you want to know. I’ve been a bit fascinated by @superfakeconvio – a twitter account critical of the Convio company and their products – as I work in that space too and this person drives Jordan Viator crazy. Maybe you have a troublemaker in a client’s network. Maybe you suspect that competitors have invaded the private Ning community of one of your clients. Either way, you can use the scientific method here too, in a way. Hypothesis – @superfakeconvio can be uncovered by analyzing their network and @convio’s and trimming down possibilities. Data – lists of followers, retweets, and maybe a list of people who attended events where @superfakeconvio was active. Don’t forget the time of day that tweets went out, as that might be a clue too. Gather follower lists for @superfakeconvio, anyone who has RT’d them, anyone they have RT’d, and anyone who tweeted about the @Convio conference. Compare lists and remove singletons. Compare time of tweets. Do a linguistic analysis of the tweets @superfakeconvio has written.
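The “compare lists and remove singletons” step is easy to automate once the lists are gathered. The sketch below assumes each list has already been collected as plain account names. All names in it are invented, and `shortlist` is a hypothetical helper, not part of any Twitter tool.

```python
from collections import Counter

def shortlist(*account_lists):
    """Keep accounts that show up in more than one list, with their counts."""
    counts = Counter()
    for accounts in account_lists:
        counts.update(set(accounts))   # count each account once per list
    return {name: n for name, n in counts.items() if n > 1}

# Invented example data, for illustration only.
followers = ["ann", "bob", "cat"]
retweeters = ["bob", "dan"]
conference_tweeters = ["bob", "cat", "eve"]

suspects = shortlist(followers, retweeters, conference_tweeters)
for name, n in sorted(suspects.items(), key=lambda item: -item[1]):
    print(name, "appears in", n, "lists")
```

Appearing in several lists is a correlation, not proof. It narrows the field so the time-of-day and linguistic comparisons have fewer candidates to work through.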
  • If you have more money in your pockets, it might make sense to buy some software to help you do a lot of this stuff. I’m more of a DIY type guy, but I think depending on the scale of the networks you’re dealing with, some of these could really be a time saver. I haven’t used any of these, but I’m sure they have a lot of really cool stuff built in. You all know how to use a search engine though, so moving right along….
  • Hopefully if you’re in the business of networking you already know this – people are potentially far more useful than any software. In fact, unless you’re really comfortable with math or programming or have the money to pay for both, you’ll want to make some friends. Make friends with statisticians. If you can convince them you really do want to do things right, the statistics you need to tell causation from correlation properly, and to figure out if your results are significant or not, are quite simple for statisticians or anybody willing to humble themselves and learn something new. I didn’t want to spend too much time picking on good old Dan Zarrella (remember, two Rs, two Ls) but in reading the comments (and surrounding chatter) of this post of his, he was just too good a mark not to go after again. In this post he does some interesting looking analysis of different attributes of retweets or viral messages vs non-viral messages. And, the retweet column is bigger or smaller as you might expect. But what a few commenters pointed out is that they have no way of telling if the results are statistically significant or not. What does that mean? If you only measure one retweet and one regular tweet, it seems pretty silly to conclude that a tweet containing the word “banana” is more likely to be retweeted. If you study a million retweets, and banana is still significant, well, now that’s something. But where do things cross over from being untrustworthy to being statistically significant? Ask a statistician. Or the internet. You can do it! Second, make friends with programmers. Despite what we’re charging you, using the Twitter, Facebook, LinkedIn, and other APIs is really easy. We love APIs! Helping you collect and analyze data is the kind of thing that any programmer should be able to do, and it would be a great training project for, say, someone at UT or ACC needing some real world experience.
Larger, expensive platforms will of course save you a lot of time here, potentially, but if you really want to get your hands dirty, make friends with a programmer! Or, if you aren’t the making friends type, there is decent and decently priced labor at the ready. Be sure to double check everything. Speaking from personal experience, I find helping out with social media problems and data mining to be very interesting, satisfying work. It’s real world, it’s not as technically challenging, and it leads to some really cool discoveries. So, with the right approach you could collaborate with maybe some math master’s students or undergrads or something, or your nephew or niece who is a programmer, on some projects. Go for it!
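To make the “banana” question above concrete, here is a minimal sketch of one standard way to check significance: a two-proportion z-test, written with only the Python standard library. The counts are invented for illustration, the function name is hypothetical, and this is not the analysis used in the post being discussed. For very small samples the normal approximation behind this test is poor, so an exact test would be the safer choice there.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    hits_a / n_a: e.g. retweets containing "banana" / all retweets sampled
    hits_b / n_b: e.g. regular tweets containing "banana" / all regular tweets
    Returns (z, p_value).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: the same 30% vs 10% split at two sample sizes
z_small, p_small = two_proportion_z_test(3, 10, 1, 10)
z_large, p_large = two_proportion_z_test(3000, 10000, 1000, 10000)
print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.3g}")
```

The same 30% versus 10% gap is unconvincing with ten tweets per group and overwhelming with ten thousand, which is the point the commenters were making.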
  • The Science Of Social Networks

    1. The Science of Social Networks
       Knowing How Little We Know
       (And how to fix it)
       @ehrenfoss | 512-673-7254 |
    2. Takeaways…
       Science is starting from scratch
       Beware the guru
       Trade :: Social media consulting? Art :: social media consulting?
       Accepted, simple ways of looking at users and networks.
       You can do experiments to inform your practice.
    3. Overview
       Background – Terms & Big Ideas
       Metrics
       Available Tools & Resources
    4. Let’s start a conversation!
    5. & Others
       Danah Boyd – Headliner at SxSW
       Node XL – Mark A. Smith
       And others:
       Search for:
       Network analysis
       Social network analysis
       Network visualization
       Community detection
    6. Scientific Method
       Define the question
       Gather information and resources (observe)
       Form hypothesis
       Don’t forget about me!
    7. Scientific Method (Cont’d)
       Perform experiment and collect data – Experimental Group / Control Group
       Analyze data
       Interpret data and draw conclusions that serve as a starting point for new hypothesis
       Publish results
       Retest (frequently done by other scientists)
    8. Causation vs. Correlation
       Extremely common
       Very foolish! Beware the “Because…”
       Maybe people with more followers are just happier people?
       Need to know more…
    9. Social Networks
       “A social network is a social structure made of individuals (or organizations) called "nodes," which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige.” - Wikipedia
       Networks are more than Twitter, Facebook, etc.
       Those are simply the networks with the most measurable and accessible data.
       “Any collection of the activities of entities which can socialize.” – This Guy
    10. More Definitions…
       Node – A thing
       Edge – A connection between two things.
       Graph <-> Network
       Directional / Bi-Directional – Do connections go both ways?
       Degree – Number of Edges a Node has.
       Path Length – Number of Edges between two particular Nodes. (typically “Shortest”)
    11. In The Beginning, there was Darkness
       Stanley Milgram (of dubious fame…)
       Sent 160+ packages from Omaha to Boston, average path was 6 hops.
       Credited/blamed for Six Degrees of separation
       …but it’s true for the human network.
       E.g. Microsoft’s .NET Messenger measured 6.6
    12. Small Worlds
       Watts & Strogatz
       Actors, Power Lines, and Worm Brains
       Ordered Networks << Real networks << Random Networks
    13. Scale Free
       Helps generate random networks similar to real ones.
       Same properties at any size.
       Preferential Attachment
       Barabasi & Albert:
       Derek J. de Solla Price
       1965 (perhaps deserves credit?)
    14. Community Detection
       Finding sub-networks
       “Betweenness”? Seriously?
       Girvan – Newman
       Removes “betweenest” edges & recalculates
       Repeat until communities exposed.
    15. What now?
       Network analysis - So hot right now. But…
       “…the higher level point is that there are a whole variety of features you might calculate for a given network or individuals within that network, but none are superior to the others a priori. So if people are looking to use this in a functional sense, it'll largely depend on the application they have in mind; even then, it's unclear if network information is superior to other, much dumber and easier to calculate features. (e.g. popularity, a.k.a. degree, is likely a more useful feature than centrality in practical settings where you'd like to do something like market products via Twitter.)…” – Jake Hofman
       Dogma might not apply to you or your network.
    16. Toys for Nodes
       Degree - # of connections a node has.
       Centrality – Importance of a node
       Degree centrality (# of connections)
       Closeness centrality (# of hops to other nodes)
       Eigenvector centrality (what?)
       Betweenness – Higher if shortest paths go through this node.
    17. Toys for Networks
       Clustering Coefficient - # Triangles / # of Possible Triangles
       Centralization – Do most edges come from highly connected nodes? Or an even distribution?
    18. Where are we?
       Science is challenging, but you can do it.
       Small Worlds, Scale-Free Networks, Community Detection
       Know your metrics but know each application and each community is different.
       Next…tools!
       Don’t forget about me!
    19. Emotion, Style, Demographics
       Positive / Negative Emotions
       Implied Gender
       Implied Age
       &
    20. Implied Node Properties
       MIT student’s “FacebookGaydar”
       Also:
       Political Bias
       Friendship Prediction:
       Be careful!
    21. Detective Work?
       Who is @superfakeconvio?
       Gather evidence
       Sift Data
    22. Platforms
       Radian6
       Bazaar Voice
       NodeXL
       “Social CRM”
       Don’t forget about me!
    23. REALLY useful tools…
       Statisticians
       Procrastinatory Grad Students
       Programmers
       oDesk
       eLance
       Don’t be afraid to Network into these Networks
    24. Neat
       Quick Twitter Graph
       Info Chimps -
       Wikipedia Article Network -
       Analysis Toolkit -
    25. More! For the price of 1 $lide!
       Jure Leskovec
       Jon Kleinberg & David Easley
       Jake Hofman’s
       Pete Skomoroch’s
    26. Fin -
       Science knows little
       Each network is different
       Get familiar with metrics and tools
       Try stuff out
       Beware Pitfalls
       @ehrenfoss | 512-673-7254 |
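
For anyone who wants to try the metrics from the “More Definitions…”, “Toys for Nodes” and “Toys for Networks” slides, here is a minimal Python sketch of three of them: degree, shortest path length, and clustering coefficient. It assumes an undirected network, the five-node example graph is made up, the function names are hypothetical, and the clustering coefficient shown is the per-node (local) version, which is one of several common definitions.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Turn a list of (a, b) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Degree: number of edges a node has."""
    return len(graph[node])


def path_length(graph, start, goal):
    """Shortest path length in edges (breadth-first search), or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no route between the two nodes


def local_clustering(graph, node):
    """Triangles through `node` divided by the possible triangles through it."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: fewer than two neighbors means no triangles
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Made-up network: a triangle A-B-C with a tail C-D-E
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(degree(g, "C"), path_length(g, "A", "E"), local_clustering(g, "C"))
```

Node C has the highest degree here, yet its clustering coefficient is lower than A’s, because only one of the three possible pairs among C’s neighbors is connected. That is a small version of the slide 15 warning that no single metric is superior a priori.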