Transcript of "Taking advantage of Big Data analytics"
TAKING ADVANTAGE OF BIG DATA ANALYTICS

Vaults of structured and unstructured data can point the way to higher revenue and competitive advantages. But efforts to capture and analyze big data need careful planning and firm shepherding. BY RICK SHERMAN

UNLOCKING THE BUSINESS BENEFITS IN BIG DATA
1 BIG DATA QUESTION TIME
2 SMALL STEPS BRING BIG REWARDS
3 ARCHITECTING A SUCCESSFUL DEPLOYMENT
4 WHO’S ON THE TEAM?

“Big data” is a hot topic not only in IT circles and technology publications but also in business magazines and other mainstream media outlets. Numerous stories have examined its use in applications from tracking customer sentiment and identifying social media trends to successfully predicting the outcome of the 2012 U.S. presidential election. Based on the amount of attention (and yes, hype) that big data technologies are receiving, one would be forgiven for thinking that their adoption and deployment is already pervasive.

But the fact is that most companies are still trying to get a handle on what big data is, how to effectively manage it and how to get tangible business benefits from their investments in big data tools.

The first of those three questions is easy to answer: Big data environments consist of high-volume pools of information, often including a variety of structured and unstructured data types that are updated frequently. For example, data captured from social media sites, Internet clickstreams, server logs, sensors and mobile networks is commonly found in big data systems. The goal is finding business value in that information: analytical insights that point to new revenue opportunities and ways to improve internal processes and operations.

But managing and using big data isn’t so easy. In order to plan and implement a successful big data analytics project, an organization needs to consider a range of different technologies and determine what kind of architecture it is going to deploy. Resource requirements are another key factor to take into account, as are the scope of the project and how it should be structured and managed. Let’s take a closer look at those four elements and how best to approach them to put deployments of big data analytics tools and applications on the right track.

Initially, many big data projects flew under IT’s radar; they were launched independently by data analysts, programmers and technology-savvy users taking advantage of
the open source nature of Hadoop and other components of the big data technology stack. But now that big data is squarely in the spotlight, projects often start off like the first generation of data warehouse, enterprise reporting and business intelligence (BI) dashboard projects did, with IT saying, “If we build it, they will come.” Whenever a new wave of technology is promoted so extensively, there’s a tendency for enterprises to buy into the hype and assume that the new technology fits their needs. Frequently, the result is expensive projects that fail to meet expectations and set back future efforts to invest in, and benefit from, the technology in question.

1 BIG DATA QUESTION TIME

Before blithely beginning a big data project, get answers to the following questions:

Why is the business interested in big data? What are the long-term business objectives for implementing big data analytics applications? Is it, for example, to track what is trending on social networks? Increase the effectiveness of marketing campaigns? Improve supply chain performance? Knowing the “why” is essential to establishing the business scope and determining the expected return on investment (ROI) for these projects.

Where in the organization is big data going to be used? Once you know why you’re building a big data analytics system, you need to catalog the business processes, applications and data sources that will be involved. That information is essential to assessing the impact not just from a technology perspective but also from the standpoint of people, processes and the corporate culture so you can develop a change management plan up front. Not doing so can imperil efforts to unlock the business value of big data.

What kinds of information need to be included in your big data implementation?
Discussions about big data often concentrate on data from social media sites such as Facebook, LinkedIn and Twitter, but as mentioned above, there’s a lot more to it than that. To begin the process of planning a big data analytics deployment, project managers need to determine which of the various types of data that could be captured are wanted for analysis by business users. Answering that question will also help identify applicable big data
applications designed to handle specific data types.

A critical factor that many organizations ignore at this stage is integrating structured transaction data with unstructured forms of information as part of an overall data warehousing and big data architecture. It’s terrific, for example, to use textual data from social networks and other sources to analyze how well your marketing campaigns are being received by customers and prospective buyers. But even greater business value can be derived by correlating that information with analytical findings on how valuable individual customers are: how much they’ve bought, what the profit margins were, whether they’re repeat buyers and how much it costs to retain them. Big data systems can become big data silos if they’re designed solely for analyzing certain information for its own sake, without a broader focus.

How big does your big data system need to be? Once the required data types have been identified, the anticipated data volumes and update frequency (that is, velocity) need to be factored into your planning. Those two characteristics are often coupled with data variety and referred to as the three V’s of big data. Although rapid updates and significant data volumes are commonly assumed, the reality is that the needs of companies vary widely based on size and the intensity of information usage. Accurately assessing your organization’s requirements will help you determine the architecture and the technology investments needed to effectively capture, manage and analyze big data.

2 SMALL STEPS BRING BIG REWARDS

It’s tempting to believe that big data analytics success is within your grasp provided you buy the right technology and commit enough resources to the project.
In reality, a big data deployment typically requires significant systems and data integration work; introduces new tools and analytics techniques; and calls for new skills on both the systems management and analytics sides. Trying to boil the ocean will result only in doing too much, too fast, which is a recipe for frustration and failure.

For better results, an organization should plan to build its big data environment incrementally and iteratively. An incremental program is the most cost- and resource-effective
approach; it also reduces risks compared with an all-at-once project, and it enables the organization to grow its skills and experience levels and then apply the new capabilities to the next part of the overall project.

An architectural framework still needs to be established early on to help guide the plans for individual elements of a big data program. But because the initial big data efforts likely will be a learning experience, and because technology is rapidly advancing and business requirements are all but sure to change, the architectural framework will need to be adaptive.

3 ARCHITECTING A SUCCESSFUL DEPLOYMENT

Hadoop, MapReduce, NoSQL databases and other big data technologies initially were developed by companies such as Google and Yahoo that were looking to store and analyze large amounts of unstructured and semi-structured data that wasn’t a good fit for mainstream relational databases. The open source technologies have been used successfully by those organizations and other early adopters, and they’re now widely available in commercial versions supported by big data software vendors. But a key issue to consider in designing a big data architecture is how much of your data analysis needs can be met by Hadoop and its cohorts on their own.

As I wrote earlier, combining the unstructured data prevalent in big data systems with structured transaction data provides the most complete view of a company’s business operations, enabling it to deploy analytics applications that can yield valuable insights to aid in improving business processes and increasing revenue.
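
To make the idea of combining the two kinds of data more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended implementation: the customer records, the sample posts, the keyword lists and the function names (sentiment, customer_view) are all hypothetical, and the word-counting sentiment score is a deliberately crude stand-in for real text analytics. It also assumes the posts have already been matched to customer IDs, which in practice is a significant piece of integration work in its own right.

```python
from collections import defaultdict

# Hypothetical structured transaction data, keyed by customer ID.
customers = {
    "c1": {"total_spend": 1200.0, "repeat_buyer": True},
    "c2": {"total_spend": 80.0, "repeat_buyer": False},
}

# Hypothetical unstructured data: free-text posts already matched to customers.
posts = [
    ("c1", "love the new campaign, great offer"),
    ("c1", "great service"),
    ("c2", "terrible ad, hate it"),
    ("c3", "love it"),  # no matching customer record
]

# Toy keyword lists (assumptions); real text analytics is far richer.
POSITIVE = {"love", "great"}
NEGATIVE = {"terrible", "hate"}


def sentiment(text):
    """Return (positive word count) minus (negative word count)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def customer_view(customers, posts):
    """Attach post counts and average sentiment to each customer record."""
    scores = defaultdict(list)
    for customer_id, text in posts:
        scores[customer_id].append(sentiment(text))

    view = {}
    for customer_id, record in customers.items():
        s = scores.get(customer_id, [])
        view[customer_id] = {
            **record,
            "posts": len(s),
            "avg_sentiment": sum(s) / len(s) if s else None,
        }
    return view


if __name__ == "__main__":
    for customer_id, row in customer_view(customers, posts).items():
        print(customer_id, row)
```

The combined view lets an analyst ask a question that neither data set answers alone, such as whether the highest-spending customers are the ones reacting well to a campaign. Posts that cannot be tied to a known customer (c3 here) simply drop out of the view, which hints at why identity matching matters so much in this kind of integration.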
This data integration requirement drives the need to create an enterprisewide architecture that includes both types of data. In such cases, the architectural options include moving all of the relevant data to either a big data platform or a traditional enterprise data warehouse for analysis, or building a hybrid architecture that incorporates and ties together the two kinds of systems.

Ultimately, because of the fundamental differences between
structured and unstructured data, it doesn’t make sense to try to host both types of data on either of the different platforms. The best approach is a mixed architecture that could also include data marts and specialized analytical databases, such as columnar systems. Choosing the hybrid option creates a logical infrastructure that leverages existing IT investments in data warehouses and relational databases while enabling organizations to channel data processing and analytics workloads to the most appropriate platforms.

Preconfigured appliance systems are also emerging from a variety of vendors for use in big data analytics applications. The appliances mix hardware and software components and offer the promise of lower costs and shorter implementation times compared with manually piecing together big data systems; they can also reduce deployment risks and minimize the level of new development and management skills that are needed in organizations.

In addition, database and data integration vendors have added capabilities for exchanging data between big data systems, data warehouses and analytical databases, eliminating the need for extensive amounts of custom integration coding.
For example, connector software for linking Hadoop clusters and relational databases has become widely available.

MIX IT UP
A hybrid architecture for big data analytics can include the following components:
- Hadoop and other big data tools for storing, managing and analyzing unstructured data;
- A data warehouse and data marts for storing transaction data and the aggregated results of unstructured data analysis processes;
- Standalone analytical databases for doing heavy-duty data analysis;
- Data integration technologies (such as extract, transform and load tools, data virtualization software and Hadoop connectors) for tying together information on different platforms and delivering it to data analysts and business users; and
- Business intelligence and analytics tools.

Because of the relative immaturity of big data technology, and the under-the-radar nature of many big data projects, implementations often have been treated as the Wild West of analytics application development and management, with no rules or corporate standards. But as the focus of big data projects shifts to producing tangible and sustainable business value, more discipline is needed. Building a hybrid architecture to support big data analytics processes also makes it easier to apply internal policies and procedures on data management, governance, quality, security and privacy.

4 WHO’S ON THE TEAM?

An often-overlooked aspect of successful big data analytics projects is the importance of getting the right people with the right skills in place, both to develop and manage the systems and to use them. Assembling a project team is complicated by a shortage of technical and analytics professionals with big data experience. As a result, organizations likely will need to train existing employees to handle roles they can’t fill through hiring. That’s another good reason to adopt a strategy of incrementally building a big data environment.

The required IT resources include a mix of architects, developers and business analysts, the latter to help identify relevant data and develop project requirements. On the user side, data scientists and other analytics professionals with skills in realms such as predictive and statistical modeling as well as text analytics are needed to do the heavy lifting on analyzing data.
In addition to their analytics skills, those workers must have extensive business and industry knowledge, or work side by side with business users who can provide that know-how, in order to generate useful insights from big data analytics tools.

In the past, predictive analytics, data mining and statistical analysis applications often were constrained by limited data volumes and an inability to include nontransactional data types. With the advance of big data technologies, analytics
pros have been able to expand the breadth and depth of their work, increasing its potential business value. Data scientists don’t come cheap; if your organization doesn’t already have people who can analyze big data in-house, hiring them can be a big budget item, assuming you’re able to find candidates in the first place. But the ROI they make possible can easily justify their salaries.

There’s no doubt that big data technologies are currently at the peak of hyped expectations. And although there certainly is significant business value to be gained from them, there are also significant risks because of technology immaturity, still-developing deployment and management methodologies, and the shortage of available expertise.

In addition, big data systems run the risk of being the next data silo if they’re developed in isolation from existing BI, analytics and data warehouse systems. Don’t turn a blind eye to the challenges and let your big data analytics initiatives go down the wrong path. With big data now on the radar screens not only of IT managers but also of corporate and business executives, the success or failure of projects surely won’t go unnoticed.

BIG DATA ANALYTICS ROSTER
The project team for a deployment of big data analytics tools should include these members:
- Development manager
- Data and systems architects
- Big data developers (experienced with Hadoop, NoSQL and other big data tools)
- Data integration developers
- BI and analytics developers
- Business analysts
- Data scientists or analytics professionals