Smarter Big Data Strategies


Companies across sectors are experiencing exponential growth in data as social interactions, rich media and a variety of devices generate new content. A tidal wave of digital data is being created through emails, instant messaging, surveys, videos, images, RFID tags, web text, blogs, geo-location devices, collaboration platforms like Twitter and Facebook, and many other sources.

Published in: Technology

  1. 1. Insights: Smarter Big Data Strategies - Girish Khanzode

Companies across sectors are experiencing exponential growth in data, thanks to new content generated by social interactions, rich media and a variety of devices. This vast amount of digital data is being created through emails, instant messaging, surveys, videos, images, RFID tags, web text, blogs, geo-location devices, collaboration platforms like Twitter and Facebook, and many other sources. Combined with in-house legacy data, it is a potential goldmine of opportunity for organizations of all types.
  2. 2. The scientific and creative analysis of this large, complex data in real time can generate deeper insights, offering 360-degree perspectives on customer sentiment and behavior. Companies can respond to market trends dynamically, improve operational efficiencies and gain significant competitive advantage. Smarter analytics, machine learning and intelligent algorithms can help discover new patterns, which in turn can lead to the identification of further patterns, and can replace intuitive management decision-making with decision-making driven by facts. With proper data analysis in place, many companies are now able to answer questions that were never asked before.

Clearly, successful Big Data management can radically transform organizations. A retailer using Big Data can improve operating margin by more than half; US healthcare can save more than $300 billion every year; consumer goods and service companies can create fine-grained customer segments in real time to improve the precision of targeted promotions and advertising; healthcare companies can discover new treatments faster; and investors can predict stock market events with higher accuracy.
  3. 3. Challenges of Big Data Initiatives

Although Big Data has its promise, it has its perils too. Companies trying to leverage it can face significant challenges. Studies indicate that more than 80% of Fortune 500 organizations will fail to take advantage of Big Data by 2015. Failure on this front represents serious risks to a company and can disrupt its business. Conversely, smarter execution can propel the organization onto a new trajectory of growth by attracting more customers, improving sales margins, introducing newer products and services much faster, and achieving higher satisfaction levels and loyalty among existing customers.

The size, speed, complexity and diversity of Big Data can push traditional data management technologies to their limits and, in most cases, cause them to fail. This challenge is further compounded by the need to manage data in context and in real time. The collection, storage, processing, analysis and visualization of this data can overwhelm existing IT infrastructure. Timeliness, privacy and a shortage of relevant skillsets are other impediments to implementation.

Companies could look at the following 10 practical strategies to successfully leverage Big Data:
  4. 4. Strategy 1: Top Business Needs as Primary Drivers

Because data storage costs are falling continuously, companies have a tendency to store excessive data for future use. They need to avoid this practice, since the costs of data collection, storage and analysis can rise quickly given the velocity of data growth.

There is a risk of readily available or easily acquirable data becoming the driver of Big Data strategy. Instead, the strategy should be driven by the data’s potential to add value, for instance by solving major pain points or yielding a healthy return on investment.

The first step should be to identify a set of key questions targeted at areas in which the company wants to grow. For example, an online business might be focused on fine-grained customer segmentation, cross-selling, strengthening multi-channel reach or improving its recommendation intelligence, whereas a manufacturing unit might be looking at improving product designs. These are the types of questions that should drive Big Data implementation strategies.

These questions should then be mapped to a clear set of business requirements, which are critical for identifying timelines and resource needs (such as skillsets). The implementation process should be iterative, to ensure that it meets the needs of continuously evolving key questions, enables the collection of the right data and helps garner the intended insights.

Strategy 2: Criticality of the Right Skillset

Big Data projects require different skillsets from traditional IT work. Companies require data scientists, managers and engineers with expertise in multiple domains such as computing, business operations, machine learning, statistics, analytics, advanced mathematics and visualization tools.

Data scientists should be able to formulate models and perform data mining, spot patterns and associations, and create appropriate logic that can turn data into business decisions.

Data managers should be conversant with business operations and capable of asking the right questions to generate business insights, mapping results to business strategy and creating recommendations.

Data engineers should be able to design, develop and maintain applications in Big Data environments, program visualization tools and dashboards, and maintain the infrastructure used to perform analytics.

Equally important are creativity and the ability to leverage data to improve business growth. Since Big Data is a recent phenomenon, there is a shortage of trained and qualified data professionals. Given the huge demand, this shortage will persist for the next few years. Lack of suitable talent will prove to be a major hindrance to Big Data strategy implementation. Supplementing hiring with appropriate training and redeployment of existing staff can mitigate part of this risk.

Strategy 3: Optimizing Storage Needs

In theory, more data means better analysis. In practice, challenges such as the cost of data storage, manipulation and computational power rise with volume. Big Data, helped by the tendency of data to proliferate quickly, can force traditional data platforms to scale beyond the levels they were designed for. Beyond petabyte-scale datasets, current warehouse infrastructures become uneconomical.

It is important to note that more data does not automatically mean higher accuracy and, in some cases, may even introduce noise that obscures weaker patterns. It also increases the risk of false discoveries, resulting in insights that will not yield positive results.

Growth in computing power and the drop in memory prices are making it possible to build data on the fly and process it in-memory. This strategy reduces the need for very large storage capacity.
When storing Big Data, it is also important to remember that replication systems can introduce security vulnerabilities and that RAID at petabyte scale can lead to data loss.

For efficient processing, data must be split and stored in different segments based on its value, its sensitivity and the costs involved. The most valuable data should be housed inside the corporate data warehouses, less valuable data on cheaper commodity storage such as the Cloud, and the rest within analytical tools. When results are desired, all of this data can be pulled together dynamically and analyzed on the fly. Metadata needs special attention since it is growing at twice the rate of other data.

Companies, especially those in the starting phases of the Big Data drive, should set up a clear set of rules and guidelines detailing which data should be retained or archived, and for how long.
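The tiering and retention guidance above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the original paper: the `Dataset` fields, the 1-to-5 value score, the thresholds, the choice to route sensitive data to the warehouse, and the retention periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    value_score: int  # hypothetical business-value score, 1 (low) to 5 (high)
    sensitive: bool   # True if it holds personal or confidential data


def choose_tier(ds: Dataset) -> str:
    """Pick a storage tier from value and sensitivity (illustrative thresholds)."""
    if ds.sensitive or ds.value_score >= 4:
        return "warehouse"        # most valuable (and, by assumption, sensitive) data
    if ds.value_score >= 2:
        return "commodity_cloud"  # less valuable data on cheaper storage
    return "analytics_tool"       # the rest stays within the analytical tools


# Hypothetical retention rules: days to keep data in a tier before archiving.
RETENTION_DAYS = {"warehouse": 3650, "commodity_cloud": 730, "analytics_tool": 90}

if __name__ == "__main__":
    catalog = [
        Dataset("orders", 5, True),
        Dataset("clickstream", 3, False),
        Dataset("debug_logs", 1, False),
    ]
    for ds in catalog:
        tier = choose_tier(ds)
        print(f"{ds.name}: {tier}, retain {RETENTION_DAYS[tier]} days")
```

Keeping the rules in one small function and one table makes the policy easy to review and change as the business questions evolve; a real deployment would also weigh storage cost, which this sketch leaves out.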
  5. 5. Strategy 4: Scaling the Infrastructure

Due to greater variety and volume, the acquisition of Big Data needs infrastructure capable of supporting flexible data structures, very high transaction volumes and distributed query processing, while delivering predictable latency once a query is fired.

Network performance is critical, and the number of communication paths increases significantly with the number of nodes in a cluster. The transfer of larger datasets requires higher network bandwidth and WAN optimization technologies. A multi-node cluster using HDFS can create high levels of traffic across the network, since Hadoop spreads the data across the member servers of the cluster.

Direct attached storage (DAS) can help create islands of information that can be processed by analytics applications, but it impairs data and resource sharing with other servers. While SANs offer better throughput and scalability, local storage is cheaper and performs better overall. Storage appliances designed for Hadoop and Big Data analytics are another option.

A decision on a Big Data storage solution must take into account space requirements, data growth, frequency of analytics execution and the type of data processed. All these factors, coupled with security, allocated budget and processing time, should drive Big Data investments.

Strategy 5: Ensuring Data Quality

While collecting Big Data, a significant amount of garbage can creep in. Poor-quality data can result in faulty analysis, especially when finding outliers. With massive amounts of data being generated by machines and sensors, the potential for pollution in data goes up exponentially, driven by factors like transmission errors, incorrect device calibration, inaccurate device measurement methods or poor device performance under peak loads. Stringent quality control and inspection mechanisms, along with good data governance, are critical to reduce data ‘obesity’ and derive insights that are correct.

Data typically becomes less valuable to the business as it ages. Data retention policies based on timelines can play a significant role in preserving data quality in analysis. In addition, hygiene techniques like quality maintenance, profiling, standardization, ensuring consistency and integration, along with rules-driven testing, should be part of the Big Data strategy.

Strategy 6: Maximizing User Adoption

At the end of the day, the success of Big Data initiatives will be measured by the widespread usage of analytic applications by business users. It will depend on their ability to easily create data sets that fit their needs and to feed these to the analytical tools developed as part of the Big Data initiative, without the help of corporate IT, to build insights in real time.

Growth and maturity in Cloud and appliances, coupled with the arrival of newer analytics tools, are resulting in users focusing more on business value than on the underlying technologies. Qualities like system performance, scalability, availability, user experience and manageability will be critical to the adoption of Big Data applications within the organization. In addition, making these applications accessible from multiple device types will improve user adoption significantly, considering that the bring-your-own-device trend is gaining traction.
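Returning to the data quality strategy: the rules-driven testing it recommends for machine and sensor data might look like the following Python sketch. The record layout (`value`, `timestamp`), the valid range and the 30-day freshness window are hypothetical choices for illustration, not rules stated in the paper.

```python
from datetime import datetime, timedelta, timezone


def check_reading(reading, now, max_age=timedelta(days=30), valid_range=(-40.0, 85.0)):
    """Return a list of rule violations for one sensor reading (empty if clean)."""
    problems = []

    value = reading.get("value")
    if value is None:
        problems.append("missing value")
    elif not (valid_range[0] <= value <= valid_range[1]):
        # Catches calibration or transmission errors that produce impossible values.
        problems.append("out of range")

    timestamp = reading.get("timestamp")
    if timestamp is None:
        problems.append("missing timestamp")
    elif now - timestamp > max_age:
        # Aged data is flagged so a retention policy can decide what to do with it.
        problems.append("stale")

    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    readings = [
        {"value": 21.5, "timestamp": now - timedelta(hours=2)},
        {"value": 500.0, "timestamp": now},
        {"timestamp": now - timedelta(days=60)},
    ]
    for r in readings:
        print(check_reading(r, now))
```

Passing `now` in explicitly keeps the checks deterministic and easy to test; readings with a non-empty result can be quarantined before they reach analysis.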
  6. 6. Strategy 7: Importance of Data Access

Big Data’s value creation potential depends on users’ ability to seamlessly access data for analysis. Typically, data such as customer records resides across multiple departments, geographies or silos, creating obstacles to its sharing and aggregation. This is a problem when companies want to integrate external data acquired from third parties with their own corporate data pool to create insights. The lack of a centralized, customer-focused view can hinder the organization’s ability to exploit Big Data.

An effective enterprise data access strategy must include interoperable data models, transactional data architectures, interoperability standards, analytical architecture, security and compliance.

Strategy 8: Faster Response to Market Conditions

Successful Big Data processing depends on the speed of data acquisition and analysis. Big Data systems should be quickly adaptable to changing market realities and not constrained by traditional long application development cycles that can run for many months and beyond.

Big Data comes in several forms, such as device, sensor and scientific information, bar codes, vehicle telematics, surgery videos, stock market trades, x-rays, telephone conversations, contracts, advertisements, spreadsheets, audit trails and so on. As the type or source of data changes, it should be easy to adapt the implementation to the new data, and such changes should be delivered in short cycles of two to three months.

Overall, the philosophy of analytic solution implementations should be driven by the fact that Big Data will continue to evolve in all aspects, and Big Data applications must be able to respond in the shortest possible time to reap the rewards and keep the analysis relevant.
Strategy 9: Building Appropriate Technology Ecosystem

The management of Big Data typically involves predictive analysis, natural language processing, image analysis or advanced statistical techniques such as discrete choice modeling and mathematical optimization. This requires technologies that are quite different from the traditional ones. Big Data solutions should focus on processing data in a manner that avoids costly movement of large volumes of data, apart from the need to handle a very high data flow rate and a large variety of formats.

Apache Hadoop is used to deliver analytics solutions in distributed and massively parallel environments running on a cluster of commodity hardware. It can filter and capture high-velocity incoming streams while keeping the data on the original data storage clusters, and it provides fault tolerance and scalability. The Hadoop Distributed File System (HDFS) is commonly deployed for distributed storage of Big Data.

NoSQL databases trade integrity guarantees for high scalability and are well suited to dynamic data structures involving heterogeneous data. These database systems can capture all data without categorizing and parsing it, which is useful in the collection and storage of data such as social media content. NoSQL solutions generally need to be combined with SQL solutions in order to meet the manageability and security requirements of enterprises.

Custom MapReduce programs are required for parallel execution on the distributed data nodes. A tool like Apache Giraph is better suited to specialized needs like social graph analysis, because it can extract insight from complicated social relationships for customer marketing and retention campaigns.

However, deriving insights using these new technologies requires significant programming effort, as well as the skills to interpret the storage logic used and to perform analysis. Specialized needs can create new challenges, such as the lack of support for complex query patterns in NoSQL databases. Further complications could arise from the distributed nature of processing, along with the demand for results in real time with context considerations.

The Big Data strategy must pay careful attention to all these aspects while zeroing in on Big Data products and solutions, along with other important factors like their interoperability and standards.
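To make the idea of a custom MapReduce program more concrete, here is a minimal sketch of the map, shuffle and reduce phases in plain Python. It runs in a single process and does not use Hadoop's API; the function names and the word-count task are illustrative assumptions, chosen only to show the shape of the programming model.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    posts = ["great product great price", "slow delivery", "great delivery"]
    print(run_job(posts))
```

On a real cluster the map and reduce functions would run in parallel on the nodes that hold the data, which is what avoids moving large volumes across the network; only the shuffle step transfers intermediate pairs.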
  7. 7. Strategy 10: Avoiding Security and Privacy Pitfalls

Big Data is breaking traditional barriers of flow, with large amounts of data being digitized and traveling across boundaries. This can create issues for data portability, security, privacy, compliance, intellectual property and liability. With more data being stored on external Cloud services as an inexpensive alternative, concerns around security and privacy are growing.

Since Big Data involves processing customer information, organizations should ensure the confidentiality of personally identifiable and sensitive data. Data protection policies and tools like data masking must be used to protect personal and corporate sensitive data and to avoid costly consequences like loss of customer and stakeholder faith, brand erosion, liabilities and fines. Data privacy laws differ across countries, and Big Data processing efforts should ensure that these privacy regulations are adhered to.

Big Data analytics is becoming so advanced that it can sometimes create insights the customer is not aware of. Companies must be careful when issuing personalized recommendations based on analytics of the vast amount of individual data they possess, because in some cases it can make customers uncomfortable.

Organizations should make sensitive data accessible on a “need to know” basis and ensure adequate data security. Companies should deploy tools and technologies like multifactor authentication, VPNs, intranet firewalls, biometric systems and threat monitoring suites in order to protect valuable data assets. Recent studies indicate that security breaches cost companies $204 per compromised customer record.

Since data can quickly proliferate or combine easily with other data, and can be used by multiple persons, it is necessary to institute policies addressing intellectual property issues and liabilities to safeguard the organization.
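As an illustration of the data masking mentioned above, the following Python sketch hides most of an email address and an account number, and replaces a customer identifier with a keyed hash so records can still be joined without exposing the original value. The function names, the masking formats and the 16-character token length are assumptions for this example, not a prescribed method; the secret key shown is a placeholder that would have to be stored securely in practice.

```python
import hashlib
import hmac


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_account(number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with asterisks."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


def pseudonymize(value: str, key: bytes) -> str:
    """Derive a stable, non-reversible token from a value using HMAC-SHA256."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    key = b"placeholder-secret"  # hypothetical key; never hard-code a real one
    print(mask_email("jane.doe@example.com"))
    print(mask_account("4111 1111 1111 1234"))
    print(pseudonymize("customer-42", key))
```

Masking suits values shown to people, while keyed pseudonyms suit analytics pipelines that need consistent identifiers; which fields require which treatment depends on the privacy regulations that apply.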
About the Author

Girish Khanzode, Products & Platforms Innovator for Futuristic Technologies, Infosys

Girish is a veteran of enterprise software product design and development with more than 20 years of professional experience. He has built and led large product engineering teams to deliver highly complex products in multiple domains, covering the entire product life cycle. Currently, he is engaged in innovating and building next-generation products and platforms in emerging technology areas like Enterprise Data Security and Privacy, Collaboration Technologies, Digital Workplace, Social Analytics, Smart Cities, Big Data and the Internet of Things. Girish holds an M.Tech. degree in Computer Engineering and a bachelor’s degree in Electrical Engineering.
  8. 8. About Infosys

Infosys partners with global enterprises to drive their innovation-led growth. That's why Forbes ranked Infosys 19 among the top 100 most innovative companies. As a leading provider of next-generation consulting, technology and outsourcing solutions, Infosys helps clients in more than 30 countries realize their goals. Visit and see how Infosys (NASDAQ: INFY), with its 150,000+ people, is Building Tomorrow's Enterprise® today.

For more information, contact

© 2012 Infosys Limited, Bangalore, India. Infosys believes the information in this publication is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in this document.