Big Data Analytics, Simplified - by Joanna Schloss


Intuitive content mining and analytics can transform a wealth of big data resources into timely strategic insights. The Kitenga Analytics Suite from Dell integrates natural-language processing, search and visualization tools to analyze big data.

Published in: Technology

Special section. Reprinted from Dell Power Solutions, 2013 Issue 2. Copyright © 2013 Dell Inc. All rights reserved.

Competitive differentiators change quickly in a fast-moving world. Organizations of all sizes are looking for a cost-effective way to tap into the plethora of data available in massive repositories and then extract valuable nuggets to help guide strategic planning. For example, comprehensive analyses of structured and unstructured data may help inform new product or services development, heighten understanding of consumer buying habits and preferences, or increase the quality of healthcare and other services.

Even so, a surprising number of thought leaders still wonder how big data fits into their organization’s big picture, and whether they have the in-house skill set to manage it. In this information-rich era, opportunities abound for mining an apparently unending stream of unstructured data that is expanding rapidly in countless directions. At the same time, the capabilities required for extracting, manipulating and analyzing unstructured data to derive meaningful insights traditionally require costly system platforms and data scientists with highly specialized skills.

Meanwhile, technology advances such as social media are swelling big data repositories at an astonishing rate. Facebook, Twitter and other social media outlets have emerged only within the last several years, but they are generating reams of unstructured textual data.
Although much of the data generated by social media may be considered noise because of its often trivial nature, some of that data has the potential to be extremely valuable to specific audiences and organizations.

Mining social media data sources has tremendous potential for organizations to discover information that may help grow their business or sharpen their competitive edge. For example, a tweet about where people ate lunch may not be of much interest in general. However, that post may mention a specific food product that appeals to a certain segment of the audience. The posting may even generate a conversation in which participants comment about the product and express an opinion about whether they like it or dislike it.

These seemingly trivial postings may contain information that is valuable to the manufacturer of the food product. By mining nuggets of data from this posted conversation, the manufacturer may learn whether or not people like the product, discover suggestions for improving it or obtain ideas that spark the creation of a new product. The challenge, however, is finding those valuable informational nuggets within massive repositories of unstructured data, possibly petabytes and even exabytes of data.

Big data is often defined by three fundamental characteristics: volume, variety and velocity. It encompasses data sets that are so extremely large and complex that traditional database tools are often not equipped to store and process the data effectively. And of course, big data is not limited to data sets generated by social media. In healthcare settings, for example, the proliferation of scanned and often handwritten information such as medical records dramatically swells data volumes. Financial services organizations also collect extremely large volumes of varied data for system logs, risk management and market trends. In addition, big data generated at extremely high speeds includes information collected from millions of daily back-end operations that support web retailers and service providers, sensor devices or streaming video from cameras operating 24x7.

Through its recent acquisition of Quest Software, Dell offers the Kitenga Analytics Suite, a comprehensive big data platform for searching and analyzing massive amounts of data through natural-language processing, machine learning, search and visualization capabilities. The platform’s content mining and analytics help organizations transform complex and time-consuming manipulation of large data resources into a rapid and intuitive process.

Bridging the data analytics skill gap

A number of technologies including the Apache™ Hadoop™ framework have emerged to handle the massive storage requirements of big data sets. These technologies are designed for working with very large volumes of unstructured data. The Hadoop framework comprises the MapReduce paradigm and the Hadoop Distributed File System (HDFS).

Amassing huge data sets is just the first step. To derive valuable insights from big data, organizations must be able to extract actionable information.
Preparing massive volumes of collected data for meaningful analysis involves the following steps:

  • Integrating unstructured data with carefully constructed and managed existing data sets, such as enterprise resource planning (ERP), customer relationship management (CRM) databases and data warehouses
  • Sorting, characterizing and classifying data within big data sets (for example, finding product references in social media feeds and tying them to sentiments, opinions or suggestions, or identifying specific health characteristics within a sea of medical records)
  • Presenting information visually in a useful, compelling manner that makes it easy to reveal insights and identify opportunities

The capabilities required to expand an enterprise’s knowledge base often call for a rare individual who can bridge the skills gap within traditional IT organizations. The IT administrator who may have implemented the organization’s Hadoop clustering is often tasked with collecting the data. However, that administrator may not have the expertise, befitting a data scientist, to effectively manipulate and analyze big data once it has been captured.

Theoretically, individuals responsible for gathering, analyzing and formulating the data into meaningful information should possess the characteristics of the following roles: Like IT professionals, they must be data savvy. Like statisticians, they need the knowledge and insight for numbers to be able to understand data at a very granular level. And like business analysts, they need the business know-how to synthesize massive amounts of data into information. These skills, which are frequently attributed to a mythical data scientist, may not be readily available. As a result, many organizations may underutilize the data they collect from repositories such as Hadoop clusters.

The Kitenga Analytics Suite is designed to help organizations distill key data from enormous volumes of data and then analyze it to extract actionable insight. In this way, Kitenga helps bridge the gap in the data scientist skill set and heighten returns on the big data investment. It provides advanced capabilities such as natural language processing and sentiment analysis to sort, characterize and classify data within large data sets. And it offers advanced visualization capabilities (including treemaps, heatmaps and more) to help line-of-business users spot patterns or trends in the data they are analyzing. Moreover, the Kitenga Analytics Suite provides visualizations on a platform that can be easily deployed on cost-effective, industry-standard infrastructure consisting of Dell™ servers, networking, storage, clients and software.

The importance of big data to an organization lies in the analysis and insight derived from the data. Organizations that find the valuable information within extracted data have the opportunity to act on fresh insights, identify emerging opportunities and achieve competitive advantage.

Bolstering existing data sets

While big data analytics holds the promise of unprecedented insights, the amount of useful data from social media, blogs and other unstructured object repositories is often quite small relative to the amount of data stored in them. For example, a particular person may post thousands of tweets. But in 1,000 tweets, the number that may be pertinent to an organization is generally quite small. And of those relevant tweets, the number that may provide meaningful insights may be smaller still.

This potentially low signal-to-noise ratio for unstructured data is often in sharp contrast to the carefully structured data created from traditional, structured systems such as applications and data warehouses.
Conventional data stored in relational databases is often highly governed, highly qualified, carefully managed and costly to maintain. Organizations seeking to combine large volumes of unstructured data with existing structured data can run the risk of inadvertently minimizing the value of that well-maintained data.

Organizations should, therefore, take care to sort, characterize and classify their unstructured data before combining it with existing data to help preserve the quality of their data repository and its capabilities for expanding insight and knowledge. If done skillfully, the marriage of large volumes of unstructured data and existing data stores can significantly enhance the value of an organization’s knowledge base.

Searching big data to identify opportunities

Large data sets, whether they are derived from social media, Hadoop clusters or locally created repositories, offer tremendous opportunities for organizations across a wide range of industries. Big data resources can be searched for applications in consumer products, healthcare, financial services, manufacturing and many other industries. Organizations may face particular, industry-specific challenges, but many share the common challenge of extracting the precise, actionable insights they need from the mining and analysis of big data.

Enhancing healthcare management

The healthcare field is one industry that is actively involved in big data initiatives. A growing number of healthcare organizations, from hospitals, clinics, physician groups and medical offices to specialized practices and other healthcare networks, have digitized their records and are beginning to engage in big data analysis. Many such organizations have scanned vast amounts of unstructured, often handwritten patient records into large data repositories.
In addition, many healthcare organizations are utilizing traditional, structured electronic medical record (EMR) database systems.

A rising challenge that many healthcare organizations face is identifying useful patient information from the enormous data sets of medical records, and then merging that information with existing EMR data. Extracting actionable insights from volumes of patient records helps improve patient care and enhance health practices in communities. For example, correlating patient identity and health information from scanned, handwritten forms and records with EMR data helps foster well-informed patient care decisions.
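
To make the idea of correlating scanned records with EMR data concrete, here is a minimal Python sketch. It is not how the Kitenga Analytics Suite works; the record fields, the `match_key` normalization and the sample patients are all hypothetical, and real record linkage typically needs fuzzy matching and careful privacy controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedRecord:
    # Fields already extracted from a scanned form (hypothetical schema).
    name: str
    birth_date: str  # ISO format, e.g. "1970-03-15"
    note: str


def match_key(name, birth_date):
    """Normalize a name (lowercase, drop commas, sort tokens) and pair it with the birth date."""
    tokens = sorted(name.replace(",", " ").lower().split())
    return (" ".join(tokens), birth_date)


def link_records(scanned, emr_by_key):
    """Attach each scanned note to the EMR patient ID with the same key, if any."""
    linked, unmatched = {}, []
    for rec in scanned:
        patient_id = emr_by_key.get(match_key(rec.name, rec.birth_date))
        if patient_id is None:
            unmatched.append(rec)
        else:
            linked.setdefault(patient_id, []).append(rec.note)
    return linked, unmatched


# Hypothetical EMR index: normalized key -> patient ID.
emr_index = {
    match_key("Jane Doe", "1970-03-15"): "P001",
    match_key("John Smith", "1982-11-02"): "P002",
}

scanned_forms = [
    ScannedRecord("DOE, Jane", "1970-03-15", "elevated fasting glucose"),
    ScannedRecord("john smith", "1982-11-02", "seasonal allergies"),
    ScannedRecord("Alex Roe", "1990-01-01", "no EMR entry yet"),
]

linked, unmatched = link_records(scanned_forms, emr_index)
```

Records that fail to match are kept in a separate list rather than discarded, so they can be reviewed before anything is merged into the well-maintained EMR data.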
Healthcare data also can be mined to identify patients with a predisposition to a particular condition such as diabetes. Equipped with that information, physicians and healthcare professionals can proactively encourage patients to make lifestyle changes that help them mitigate potential health problems.

Understanding consumer sentiment

Consumer research is another area in which the application of big data analytics can be quite helpful to manufacturers, retailers, vendors and financial service providers, among many others. Organizations that manufacture a wide range of products, including clothing, cleaning supplies and other sundries, are mining social media outlets such as Facebook and Twitter to unearth consumer sentiment in regard to products and their use.

Conducting sentiment analyses of unstructured data culled from social media sites can offer valuable information for organizations focused on consumer satisfaction and buying habits. Sentiment data involves capturing text that conveys how consumers feel about a product, such as whether they like it or dislike it; how satisfied or dissatisfied they are with the way it works, the way it feels, how it looks, and how it tastes; or other reactions.

Analyzing sentiment data can provide insight into a specific way in which a product is used that may in some cases spark further innovation. A consumer might blog or tweet about a successful outcome by using a product for a purpose other than the one the manufacturer intended, for example using dishwashing detergent to remove a stain from a wall. Further analysis of conversation stemming from that experience may reveal a consumer need and initiate a research and development project for reformulating an existing product into a new one with a different purpose.
Organizations that can tap into huge stores of unstructured data and derive consumer sentiment data for analysis gain a valuable tool for both improving the functionality of existing offerings and identifying opportunities previously unimagined.

Transforming data into valuable insights

The Kitenga Analytics Suite is designed to help organizations rapidly, easily and cost-effectively transform large data volumes into actionable, insightful information. The suite offers powerful analytics designed to sort, characterize, classify and visualize massive data sets. Kitenga Analytics Suite works with multiple Hadoop distributions. By supporting analysis of high-volume, high-velocity data from a variety of sources, Kitenga Analytics Suite extends big data analytics well beyond the realm of traditional business intelligence (BI) tools and analytic engines (see figure).

Powerful data analysis solutions available in Kitenga help analysts categorize and classify data. Automated indexing of Hadoop clusters and other data sources transforms unstructured, semi-structured and structured data into searchable categories and relationships. Powerful natural language processing capabilities extract meaning from written text beyond traditional tagging methods typically utilized in big data deployments. Advanced sentiment analysis of text harvested from social media sources helps assess the feelings and state of mind an individual may be expressing in a written statement, and then group those sentiments into meaningful categories.
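
As a rough illustration of grouping sentiments by product, the following Python sketch counts positive and negative words in each post. It uses a simple lexicon approach, not the natural language processing in the Kitenga Analytics Suite; the word lists, product names and sample posts are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (assumed); real systems use far richer language models.
POSITIVE = {"love", "great", "tasty", "like", "works"}
NEGATIVE = {"hate", "bland", "awful", "dislike", "broken"}


def classify(post):
    """Label a post positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_by_product(posts, products):
    """Count sentiment labels for each product name mentioned in a post."""
    summary = {p: Counter() for p in products}
    for post in posts:
        label = classify(post)
        lowered = post.lower()
        for product in products:
            if product in lowered:
                summary[product][label] += 1
    return summary


posts = [
    "I love the new granola bar, so tasty!",
    "That granola bar was bland and awful.",
    "Had lunch downtown today.",
    "The dish soap works great on wall stains.",
]
summary = sentiment_by_product(posts, ["granola bar", "dish soap"])
```

Posts that mention no tracked product, such as the lunch post, contribute nothing to the summary, which mirrors the point that only a small share of social media content is relevant to a given organization.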
Additionally, named entities such as people, places and organizations can be projected onto timelines for temporal tracking.

[Figure: Kitenga Analytics Suite: Extending analysis of massive data sets beyond the realm of traditional BI tools. Axes: velocity, volume and variety. Regions: traditional database-derived BI; recent analytics databases; Kitenga Analytics Suite (transform multidimensional big data into actionable intelligence).]

The Kitenga Analytics Suite also helps organizations ensure that insights derived from big data sources enhance precious existing data sets rather than diluting their value. An easy-to-use graphical user interface enables categorized and classified data to be combined with existing data in a relational database management system (RDBMS). Data elements can also be blended with contextual information such as demographic, geographic and econometric data to further enhance the potential of analysis.

The interface provided in Kitenga Analytics Suite is easy to use and includes key visualization capabilities designed to unlock the meaning behind extracted data. Treemaps display hierarchical data in nested rectangles to represent data sets such as sales cycles, manufactured units and supply chain activity, and heatmaps are well suited for sentiment analysis (see figures).

As a result, this intuitive interface enables line-of-business users to effectively visualize and efficiently perform deep analyses of big data that has been rigorously extracted from diverse sources and carefully integrated into their enterprise knowledge base.

Bringing big data analytics into the mainstream

Timely BI that illuminates emerging trends and helps decision makers envision creative new opportunities is key to strategic research and planning efforts, and to crisp competitive differentiation in a fast-changing marketplace.
A cost-effective big data analytics platform allows organizations to extract valuable data from rich, far-reaching information repositories to expand their knowledge base in a meaningful way.

[Figure: Heatmaps: Analyzing sentiments with color-coded visualizations]

[Figure: Treemaps: Representing data from frequent monitoring of complex activity in a hierarchical visualization of nested rectangles]

Analyzing collections of unstructured, semi-structured and structured data may also involve mining massive volumes of data from social media along with huge in-house data stores, extracted from Hadoop clusters and other sources. Though much of the data gleaned from blogs, tweets and Facebook pages may seem trivial, selectively extracted data nuggets can be successfully woven into an organization’s well-maintained, structured information database. This fresh information may contribute to innovative product development, enhanced efficiency of services delivery or imaginative ways for advancing business and organizational outcomes.

Today, the diverse skill sets required to derive insightful information from enormous data repositories are often costly and difficult to find in a single individual. Through its acquisition of Quest Software, Dell offers the Kitenga Analytics Suite, a comprehensive big data analytics platform that runs on cost-effective, industry-standard Dell hardware. The suite enables organizations to bridge the IT skills gap and empower line-of-business users with information modeling, visualization and many other tools that heighten business acumen, along with the return on investment in big data analytics.

Author

Joanna Schloss is a business intelligence and big data subject-matter expert and evangelist in the Dell Software Group at Dell.

Learn more

Kitenga Analytics

Accelerating the return on investment in end-to-end information management

The intensifying volume, variety and velocity of data flowing into and across enterprise networks is reshaping business analytics and business intelligence (BI) strategies. But as the torrent swells, the prospect of deriving value quickly and cost-effectively from all this data becomes ever more challenging. To expedite the process, IT leaders seek best practices, systems and solutions that streamline lengthy and labor-intensive information management projects without costly replacements of prior enterprise investments.

Part of the Dell Software Information Management portfolio, the Dell Toad Business Intelligence Suite helps bridge the gap between IT analysts and business analysts. The Toad Business Intelligence Suite is designed to respect and accommodate the requirements of enterprise compliance and governance mandates. In a single solution, the Toad Business Intelligence Suite allows organizations to develop, build and manage a business analytics framework that meets the needs of both IT and line-of-business users while coexisting with established enterprise BI deployments.

In addition to the Toad Business Intelligence Suite, Dell Software provides a comprehensive tool chain to help simplify end-to-end information management. The tool chain addresses the needs of today’s surging data landscape through a diverse range of capabilities:

  • Toad productivity software for managing environments that support traditional data sources
  • Turnkey appliances that provide tremendous efficiency through the convergence of hardware, software and services that are optimized for fast return on investment (ROI) and low total cost of ownership (TCO)
  • On-site and in-the-cloud data integration or replication that enables data movement to meet the needs of business analysis and performance management
  • Big data analytics to separate the signal from the noise in the mountains of data being generated by myriad systems

The Dell Software Information Management portfolio provides both the breadth and the depth of solutions that enable organizations to advance beyond static enterprise reporting to dynamic business analytics and insight, while complying with governance requirements and integrating with existing BI investments.