Big data

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Big data Presentation Transcript

  • Big Data
  • “Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.
  • Most analysts and practitioners currently refer to data sets from 30-50 terabytes (1,000 gigabytes per terabyte) to multiple petabytes (1,000 terabytes per petabyte) as big data.
  • Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
  • Sources of the data:
    • Social media and networks (all of us are generating data)
    • Scientific instruments (collecting all sorts of data)
    • Mobile devices (tracking all objects all the time)
    • Sensor technology and networks (measuring all kinds of data)
  • Volume: The massive scale and growth of unstructured data outstrips traditional storage and analytical solutions.
  • Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
  • Variety: Data is generated in many forms, such as relational data, text data, semi-structured data, and graph data.
  • There were 5 billion mobile phones in use in 2010.
  • There is a 40% projected growth in global data generated per year vs. 5% growth in global IT spending.
  • There were 235 terabytes of data collected by the US Library of Congress by April 2011.
  • 15 out of 17 major business sectors in the United States have more data stored per company than the US Library of Congress.
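To make the gap between 40% data growth and 5% IT spending growth concrete, the following is a minimal sketch of the compounding arithmetic. The five-year horizon and the starting index of 100 are assumptions chosen for illustration; only the two growth rates come from the slide.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Value after `years` of compound growth at `annual_rate` (0.40 means 40% per year)."""
    return start * (1.0 + annual_rate) ** years


# Assumptions for illustration only: both quantities start at an index of 100
# and the rates stay constant over a five-year horizon.
BASE_INDEX = 100.0
HORIZON_YEARS = 5

data_volume = compound(BASE_INDEX, 0.40, HORIZON_YEARS)  # global data generated
it_spending = compound(BASE_INDEX, 0.05, HORIZON_YEARS)  # global IT spending
gap = data_volume / it_spending

print(f"data index after {HORIZON_YEARS} years: {data_volume:.1f}")
print(f"IT spending index after {HORIZON_YEARS} years: {it_spending:.1f}")
print(f"data has grown {gap:.1f}x faster than spending")
```

Under these assumptions the data index grows more than fivefold while the spending index grows by roughly a quarter, which is the mismatch the slide points at.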
  • The Problem
    The complex nature of big data is primarily driven by the unstructured nature of much of the data generated by modern technologies: web logs, radio-frequency identification (RFID), sensors embedded in devices, machinery and vehicles, Internet searches, social networks such as Facebook, portable computers, smartphones and other cell phones, GPS devices, and call center records. In most cases, to be used effectively, big data must be combined with structured data (typically from a relational database) from a more conventional business application, such as Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).
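As a rough illustration of combining unstructured data with structured data, the sketch below parses free-text web log lines and joins them to customer records. The log format, the field names, and the `CRM` table are all hypothetical; a real system would read the structured side from a relational database.

```python
import re

# Hypothetical structured data, standing in for a CRM table keyed by customer id.
CRM = {
    "c1": {"name": "Ada", "segment": "premium"},
    "c2": {"name": "Bob", "segment": "standard"},
}

# Hypothetical unstructured input: raw web log lines, one of them malformed.
LOG_LINES = [
    "2012-05-01T10:00:00 customer=c1 GET /products/42",
    "2012-05-01T10:00:05 customer=c9 GET /home",
    "2012-05-01T10:00:09 customer=c2 POST /cart",
    "this line does not follow the format",
]

LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) customer=(?P<customer_id>\S+) "
    r"(?P<method>[A-Z]+) (?P<path>\S+)$"
)


def parse_log(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def enrich(lines, crm):
    """Join parsed log events with CRM records on customer_id (inner join)."""
    rows = []
    for line in lines:
        event = parse_log(line)
        if event is None:
            continue  # skip lines that cannot be parsed
        customer = crm.get(event["customer_id"])
        if customer is None:
            continue  # no structured record for this customer
        rows.append({**event, **customer})
    return rows


for row in enrich(LOG_LINES, CRM):
    print(row["timestamp"], row["name"], row["segment"], row["path"])
```

The inner join drops both the malformed line and the event for the unknown customer `c9`; whether to drop or keep such rows is a design choice that depends on the analysis.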
  • Global market for big data
    Industry size: Today every organisation across the globe is faced with unprecedented growth in data. The digital universe of data was expected to expand to 2.7 zettabytes (ZB) by the end of 2012. It is then predicted to double every two years, reaching 8 ZB by 2015. It's hard to conceptualize this quantity of information. The US Library of Congress holds 462 terabytes (TB) of digital data. On that basis, 8 ZB is equivalent to almost 18 million Libraries of Congress. That translates to a ten-fold increase over the last five years and an astounding 29-fold increase over the next ten years. This year, the world’s digital information is expected to grow by 57%. Within that, internet traffic is growing by 35%, and mobile data traffic by 110%, according to Cisco. The big data industry is worth somewhere between $30bn and $200bn.
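The arithmetic behind the Library of Congress comparison and the doubling projection can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and treats "by 2015" as three years after the end of 2012; both are assumptions, since the slide does not state them.

```python
TB = 10**12  # bytes per terabyte (decimal units, assumed)
ZB = 10**21  # bytes per zettabyte (decimal units, assumed)

LIBRARY_OF_CONGRESS_BYTES = 462 * TB  # figure quoted on the slide


def libraries_of_congress(total_bytes: float) -> float:
    """How many 462 TB Libraries of Congress fit into `total_bytes`."""
    return total_bytes / LIBRARY_OF_CONGRESS_BYTES


def project(start_zb: float, years: float, doubling_period_years: float = 2.0) -> float:
    """Size in ZB after `years`, if the volume doubles every `doubling_period_years`."""
    return start_zb * 2 ** (years / doubling_period_years)


libraries = libraries_of_congress(8 * ZB)
size_2015 = project(2.7, years=3)  # end of 2012 to end of 2015 (assumed)

print(f"8 ZB is about {libraries / 1e6:.1f} million Libraries of Congress")
print(f"2.7 ZB doubling every two years gives about {size_2015:.1f} ZB after three years")
```

With these units the ratio comes out at about 17.3 million, the same order as the slide's "almost 18 million", and the projection lands a little under the quoted 8 ZB.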
  • Smartphones, tablets, sensors, social networks, online games, video streams and mobile payments will all drive big data for many years to come
  • Internet companies: Amazon, Apple, Facebook, Google, Microsoft
    The big Internet companies control where the data comes from and where it goes. Amazon, Baidu, Facebook and Google may one day make a lucrative side business from selling their proprietary distributed database technologies, competing with IBM and Oracle.
  • Data storage, networking and hardware companies: ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo, NetApp, Seagate
    Many hardware makers like Cisco, Dell, Lenovo and HP are investing heavily in big data appliances. Data storage companies are likely to continue to beat earnings expectations as the data deluge goes into overdrive.
  • Enterprise software companies: Adobe, Citrix Systems, IBM, Fujitsu, Informatica, Oracle, Red Hat, SAP, Salesforce.com
    Hadoop is fast becoming the industry-standard enterprise database platform. Cloud database services are likely to be the fastest-growing sector this year within the enterprise software space.
  • A wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data.
  • BIG DATA TECHNIQUES
  • A/B testing: A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site.
  • Association rule learning: A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). It is used for data mining.
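The two techniques above can be sketched in a few lines. For A/B testing, the sketch uses a pooled two-proportion z-test, which is one common way to compare conversion rates and is not named in the slides. For association rules, it computes the standard support, confidence, and lift of a single rule rather than running a full rule-mining algorithm. All counts and baskets are made-up illustration data.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled z statistic and two-sided p-value for variant B vs. control A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Assumes both items occur in at least one basket.
    """
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / n
    confidence = with_both / with_a
    lift = confidence / (with_c / n)
    return support, confidence, lift


# Hypothetical A/B test: 200 of 10,000 control visitors convert, 260 of 10,000 on the new layout.
z, p_value = two_proportion_z(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")

# Hypothetical market baskets for the diapers -> beer rule.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]
support, confidence, lift = rule_metrics(BASKETS, "diapers", "beer")
print(f"support = {support:.2f}, confidence = {confidence:.2f}, lift = {lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than independence would predict; with only five baskets the numbers here are purely illustrative.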
  • Cluster analysis: A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance.
  • Crowdsourcing: A technique for collecting data submitted by a large group of people or community through an open call, usually through networked media such as the Web.
  • Statistics: The science of the collection, organization, and interpretation of data, including the design of surveys and experiments.
  • BIG DATA TECHNOLOGIES
    There is a growing number of technologies used to aggregate, manipulate, manage, and analyze big data.
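Returning to the cluster analysis entry in the techniques list, the sketch below shows k-means, one widely used clustering algorithm. The slides do not name a specific method, so the choice of k-means, the sample points, and the fixed seed are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) after a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Hypothetical 2-D data: two visibly separate groups, with no labels given in advance.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, labels = kmeans(POINTS, k=2)
for point, label in zip(POINTS, labels):
    print(point, "-> cluster", label)
```

The groups are discovered from the distances alone, which matches the definition's point that the characteristics of similarity are not known in advance.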
  • Bigtable: Proprietary distributed database system built on the Google File System. Tables are split into multiple tablets. When the size of the data grows beyond set limits, the tablets are compressed using algorithms such as Snappy.
  • Business intelligence (BI): A type of application software designed to report, analyze, and present data. BI tools are often used to read data that have been previously stored in a data warehouse or data mart. BI tools can also be used to create standard reports that are generated on a periodic basis, or to display information on real-time management dashboards, i.e., integrated displays of metrics that measure the performance of a system.
  • Data warehouse: Specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL (extract, transform, and load) tools from operational data stores, and reports are often generated using business intelligence tools.
  • Extract, transform, and load (ETL): Software tools used to extract data from outside sources, transform them to fit operational needs, and load them into a database or data warehouse.
  • Hadoop: An open source (free) software framework for processing huge datasets on certain kinds of problems on a distributed system. Its development was inspired by Google’s MapReduce and Google File System.
  • HBase: An open source (free), distributed, non-relational database modeled on Google’s Bigtable. It provides a fault-tolerant way of storing large quantities of data.
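The ETL entry above describes three steps that can be sketched with the Python standard library. This is a toy illustration, not how any particular ETL product works: the CSV text, the column names, and the `orders` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical "outside source": a CSV export with messy values.
RAW_CSV = """order_id,customer,amount_usd
1, alice ,19.99
2,BOB,5.00
3,carol,not-a-number
"""


def extract(text):
    """Extract: read the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert amounts to integer cents, drop bad rows."""
    records = []
    for row in rows:
        try:
            amount = float(row["amount_usd"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        customer = row["customer"].strip().lower()
        records.append((int(row["order_id"]), customer, round(amount * 100)))
    return records


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, transform(extract(RAW_CSV)))

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total_cents = conn.execute("SELECT SUM(amount_cents) FROM orders").fetchone()[0]
print(f"loaded {row_count} orders, total {total_cents} cents")
```

Keeping the three steps as separate functions mirrors the definition and makes it easy to swap the source or the target without touching the cleaning rules.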
  • Opportunities:
    • Data intent and capacity: the data revolution; intent in an age of growing volatility
    • Social science and policy applications
  • Challenges:
    • Data: privacy; access and sharing
    • Analysis: defining and detecting anomalies in human ecosystems
  • HP’s Big Data strategy and Vertica
  • CSC Buys Infochimps for Big Data, Analytics Expertise
  • Market Intelligence Provider FirstRain Unveils New Big Data Tool, Market Insights
  • Investment risks: Whilst big data industry revenues are certain to grow, investors face significant risks.
  • Bandwidth risk: Today, internet bandwidth prices are capped, effectively making internet bandwidth a free resource for big data companies. But, without substantial investment by the world’s mobile operators, big data is likely to grow far faster than the ability of the network to carry it. As networks get overloaded, network latency rises, reducing the speed and efficiency of analytical engines, especially those powered through the cloud. The coming mobile bandwidth shortage will shift competitive advantage from technology companies to telecom operators.
  • Open source risk: With the source code free, barriers to entry remain low. In the longer term, this may depress the database industry’s margins.
  • Patent risk: Ever since Apple took on the mobile phone industry, and won, with barely a handful of mobile patents to its name, a patent war has erupted across the technology sector. Were a patent war to break out in the big data space, technological progress could be slowed down. Whilst regulators are unlikely to allow any hoarding of patents on anti-competitive grounds, the risk remains. Oracle, a leader in big data, is well known for filing multi-billion dollar patent infringement lawsuits against its competitors.
  • Cyber risk: Last month Global Payments, a credit card transaction processor, admitted that hackers had stolen the details of 1.5m North American card holders. This is the latest in a string of security breaches that have hit companies dealing in big data. Apple, EMC, Google, Oracle and Sony are all recent hacking victims. As the level of cyber-crime rises, so does the risk of dealing with big data. Just as the Fukushima incident dampened prospects for the nuclear sector, so a large cyber-attack could adversely impact big data industry profits.
  • Big data is often misunderstood and ill-applied.
  • The question is not “how big is your data?”; it is “what are you doing with your data?”
  • The big data industry fails to supply its customers with products that solve business problems.
  • Companies searching for data solutions are often confused by all the big data marketing hype and sometimes end up wasting resources.