THE SWEET SOUND OF BIG DATA
EMC Greenplum’s Scott Yara on the Planet-Altering Power of Data Analytics and Collaboration
by Terry Brown
EMC+

That rumbling sound? Information. It’s a dull roar of taps and keystrokes and spinning disk drives, crowd noise, chaos, until people like EMC’s Scott Yara arrive. Then the annoying buzz starts to sound more like music—a Big Data symphony of telling answers to infinite questions that helps us manage a frantic, data-mad planet.

The maestro Yara is a co-founder of Greenplum, the database software company in San Mateo, California, acquired by EMC in 2010. Greenplum specializes in large-scale data warehousing, analytics, and now, with the introduction of Greenplum Chorus, enterprise-wide collaboration. Chorus, according to Yara, is extremely big news.

“We have found,” says Yara, now Greenplum’s Senior Vice President for Products, “that as companies try to gain insight from their data, people and process challenges are just as vexing as the infrastructure ones. Chorus is really the first system that focuses on building a collaboration and social environment for doing the work of data science. It’s really the first of its kind.

“Chorus is a breakthrough in what typically has been a pretty siloed and scattered process. Now we can integrate all the tasks that a company does to produce an insight from data and bring it all together in a single environment.”

In other words, for the first time the table is set to take full advantage of Big Data—the analytics platform to work the data and the collaboration environment to speed decisions about it. So does that mean that Big Data is ready to do what the pundits say—change the world? Yara weighs his words carefully when asked about that—he’s reluctant (and says so) to boil his viewpoint into sound bites. But he does concede the presence of an inexorable trend.

“We’ve been at this for a while,” says Yara. “These revolutions take a long time. But today there really isn’t an organization on the planet that’s not thinking very deeply about using data, and that just wasn’t the case three or four years ago. It’s moving. And that’s exciting to see.”

GREENPLUM CHORUS: SOCIAL MEETS BIG DATA—IN A BIG WAY

Asked to sum up Greenplum Chorus, the Big Data collaboration software just announced by EMC Greenplum, Scott Yara does not mince words. “I think what Java was to the Internet,” says Yara, “Chorus will be to Big Data.”

Yara, EMC Greenplum’s Senior Vice President for Products, says that Chorus completes the puzzle of how to analyze gigantic volumes of disparate data. To master Big Data you need fast processing, sophisticated analytics, and what wasn’t possible until Chorus—the ability to collaborate.

“Chorus provides an opportunity for a data scientist to see the data assets across a company with a simple search interface. It gives them the freedom to manipulate and analyze that data as they see fit. You’ll have your own private workspace and sandbox where you can manipulate that data as you see fit. And then you have workflow and collaboration tools to share that data or insight or process back to the company, in a very agile way.”

It’s data at your fingertips, light-speed analytics for an extended, enterprise-wide team. “We wanted to make using data inside the enterprise a lot more familiar and friendly,” says Yara. “And so providing social collaboration interfaces using common streams, user profiles, the opportunity to share things, hopefully that lends itself to an organizational dynamic that is a lot more natural.”

•••

“Big Data” is just what it sounds like—huge volumes of data generated by anything and anyone who works or plays or functions online and in computer networks. Smart phones, laptops, PCs, mainframes. Social networks, internet shopping, online banking, surveillance systems, pavement sensors, call records, health care information, and so on and so on. Of course data has always been big, in the context of existing technology—to Ebenezer Scrooge, Big Data was a tall shelf of dusty ledger books.

The recent need to name it in capital letters stems from more than the deluge of electronic data and the inability of conventional systems to make sense of it. It also names the fiendishly clever new processing and analytics technologies built by people like Yara and his Greenplum colleagues. That’s what led EMC, the leader in information infrastructure, to Greenplum—the need to offer ways to analyze the data stored and managed on that infrastructure.

“For the last ten years, as the web has exploded and expanded around us,” says Yara, “the idea of answering questions about customers or buying patterns or whatever by looking through all this tremendous variety of information was technically impossible. So Greenplum developed a way to support multiple data types, using a parallel scale-out computing model that mirrored the internet, with analytics software support that has made some of these really hard things much easier.”

Yara grew up in Minnesota, outside Minneapolis, matriculated to UCLA, studied computer science, and left early to join an internet startup called Sandpiper Networks, an early player in content delivery systems. Internet performance was spotty in the early ’90s, and Sandpiper built caching systems that sped the performance of big websites. Sandpiper merged with the internet services company Digital Island, went public, and was sold to the British telecom Cable & Wireless.

In 2000 Yara started a company called Metapa to capture and analyze information on the Web. In 2002 Metapa merged with a similar startup, called Didera, whose founder Luke Lonergan became Yara’s partner in the new venture they called Greenplum.

(Where did the name come from? As Yara and Lonergan cast about for a name, one of their employees asked his young daughter for her advice. She suggested “Apple.” Told that name was taken, she offered up Greenplum, which stuck. Kids bring a lot of naming help to the Big Data world—the developer of Hadoop, the open source software that Greenplum uses to analyze unstructured data, named his product after his son’s toy elephant.)

“It was a natural evolution for EMC as a business,” says Yara. “It’s a huge opportunity to provide analytic capabilities to customers once they’ve stored all that Big Data on EMC systems.
Here’s the thing: Companies are starting to realize, and consumers are too, that their most valuable asset isn’t necessarily the intellectual property they’ve built, but rather the data that they generate as a consequence of their products, so there is a very aggressive movement to monetize or gain value from that data.

“So what we’re seeing is an economy being built around the data that’s being generated across all industries and how to unlock the value of that data.”

•••

The first movers in the Big Data world were companies with Internet-enabled businesses—search engines, online retailers, social networking sites. Now other organizations—government, universities, offline companies—are learning the potential of Big Data analytics, and the technology is ready to spread to every corner of the marketplace because compute and storage costs have dropped dramatically. Now companies can not only afford to gather and store information—they can also afford to analyze it.

Now Google can detect regional flu outbreaks a week to ten days faster than the Centers for Disease Control and Prevention by monitoring increased search term activity for phrases associated with flu symptoms. Cities are analyzing traffic data in real time and making decisions to manage congestion before it becomes a story in tomorrow’s newspaper. Smart electric grids are helping homeowners monitor and manage their
power use. The Federal government’s USAspending.gov website tracks government spending and charts the data based on queries by anybody who visits the site.

Big Data is woven into the physical fabric of our lives. The “Internet of Things”—the physical assets that become part of the information infrastructure—is changing how companies create business models and people live their lives, giving systems and people the ability to capture, compute, communicate, and collaborate around information. Embedded with sensors, actuators, and communications capabilities, such assets or “things” will soon be able to absorb and transmit information on a massive scale and, in some cases, to adapt and react to changes in the environment automatically.

So how will lives be changed? Ask Scott Yara.

“Let’s say you’re a big retail bank,” he says. “You might have 60 million customers that use a huge number of different products—checking account customers, home loan customers, credit card customers. Some communicate through the website. Some complain on Twitter. As the business owner, you want to know who your top, most loyal customers are, and what kinds of products they’re using and not using. And what makes a great customer?”

Big Data, says Yara, lets a business owner sift through all the data available, answer thorny questions, and know how to create a business that has more loyal customers and keeps the bad ones away.

“Let’s say you’re a young woman who lives in a condo in the suburbs and works in an office downtown,” Yara says. “When it’s time to go to work, you instruct your condo when to wash the dishes, when to start a load of laundry, when to open or shut the windows depending on the weather.”

Then you’re at the office, Yara says. Your washing machine sends you a text that says it’s out of detergent and can’t do the load you requested. The text includes a coupon for detergent at the store where you most often shop (based on a credit card spending pattern algorithm). It beeps you when you’re near the store (based on geolocation data and smart car sensors) so you don’t forget. Your refrigerator texts to say you’re low on lettuce, so if you plan to have a salad with dinner tonight (based on the menu you programmed into the fridge) you should pick up the veggie when you get the laundry detergent. Maybe there is a two-for-one coupon included for your favorite salad dressing.

Or maybe you’re planning a trip or wondering about your bank balance or thinking about phone services. When you call up the airline or your bank or your telephone company you won’t be irritated to learn they don’t have your latest purchase history or account details available. Those simple things will start to become much more commonplace and over time the services themselves will seem more personalized to you.

“That’s a big part of it,” says Yara, “but Big Data also represents the services and business that we provide getting safer and more trustworthy because they can use this information to better trap the bad guys. So the end result of Big Data is that
hopefully things start to naturally work more efficiently, more securely, and more personally in a way that feels very natural.”

•••

Of course for some that metaphorical drone of information background noise feels a bit creepy—the sense of systems behind every wall, in every purse or pants pocket, on every car dashboard, overhead, in the ground, continuously gathering and forwarding and analyzing information on everything we do. Yara is aware, but not wary.

“I think with any new technology,” he says, “there will always be concerns over privacy or security or fraud. People had the same fears about the internet when it first appeared—the idea of putting your credit card number online was pretty scary to a lot of people back in the ’90s. And those fears are understandable. A number of companies are building technologies to help make sure that data analysis has some level of encryption and access control and security. We will see a need for more awareness around the ethics or protection of individual rights, and there are already firms tackling these tough issues—the Electronic Frontier Foundation, Creative Commons, and others.”

The fact is that the Big Data revolution is here to stay. “It’s only going to get bigger,” Yara wrote in the Huffington Post last year. “There’s no turning back the tide, no going back to an era when we knew less.”

“Big Data is about being able to take all this information, from an incredible variety of sources, and answer questions we couldn’t answer before.”
SCOTT YARA, EMC GREENPLUM

How big will Big Data be? “We expect Data Science and data analytics to be pervasive,” says Yara, “with far broader reach and impact even than previous-generation computational science. Big-data computing is perhaps the biggest innovation in computing in the last decade.
We have only begun to see its potential to collect, organize, and process data in all walks of life.

“My simple guess is that the adoption of Big Data technology in the enterprise will be twice as fast and twice as big as the virtualization cloud computing market has been and that’s because while cloud computing is about the bottom line, with more efficient and optimized infrastructure, Big Data is really about the top line, because the information itself helps you generate more revenues. It helps you get more profitable and so I think that the enthusiasm we see growing around Big Data is accelerating at a pace that is faster than cloud computing itself.”

For John and Jane Doe, Yara says, Big Data works because it can capture who we are as individuals.

“I think that in the best ways,” says Yara, “Big Data is not something new. It’s an amplifier for existing human behavior and so when you are looking for things that you like, whether it’s an individual or music or places to eat or someone to manage your retirement savings, you have a set of personal preferences that the technology knows and protects. The serendipity comes when the options available are much more closely correlated with the things that you already like—it’s an extension of yourself.”

That sounds pretty good.