Big data ankita1



  1. BIG DATA (E-Commerce) Faculty: Dr. Rakhi Tripathi Presented by: Ankita Sharma (222005), Piyush Pandey (222015)
  2. What is BIG DATA? • Big Data is any data that is too large, complex and dynamic for conventional data tools to capture, store, manage and analyze. • This explosion in data volume, variety and velocity is called Big Data, and if you can harness it, it will revolutionize the way you do business. Big Data platforms, applications, analytics and services can help you dive into that ocean of information and extract real business value in real time. • It can be structured as well as unstructured.
  3. WHY IS BIG DATA BECOMING IMPORTANT NOW? • Rise of smartphones with GPS and Internet connectivity: there are 4.6 billion mobile-phone subscriptions worldwide, and between 1 and 2 billion people access the internet. • Aerial sensors and sensor networks: the NASA Center for Climate Simulation stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster. • Social network adoption: Facebook has 1.06 billion monthly active users, who share 30 billion pieces of content every month. There are roughly 175 million tweets every day, from more than 465 million accounts.
  4. BIG DATA GENERATORS • Social media and networks (all of us are generating data) • Mobile devices (tracking all objects all the time) • Scientific instruments (collecting all sorts of data) • Sensor technology and networks (measuring all kinds of data)
  5. Big Data Characteristics “Big Data” refers to high-volume, high-velocity, high-variety and complex information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
  6. Characteristics of Big Data: 1. VOLUME (Scale) • There has been a considerable increase in the volume of data: around a 40x increase from 2009 to 2013. • In other words, data volume is increasing exponentially. Data volumes at some of the world's largest data holders: – Google processes about 20 PB a day (2008) – Wayback Machine has 3 PB + 100 TB/month (3/2009) – Facebook has 2.5 PB of user data + 15 TB/day (4/2009) – eBay has 6.5 PB of user data + 50 TB/day (5/2009) The amount of data to be analyzed has gone up from a few terabytes to millions of petabytes, that is, into the zettabyte range.
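The growth figures on the slide above can be turned into rough arithmetic. The following is a minimal Python sketch, assuming decimal units (1 PB = 1,000 TB) and a constant daily growth rate; the slide states neither, and the function name `days_to_double` is hypothetical.

```python
TB_PER_PB = 1_000  # assumption: decimal (SI) units, 1 PB = 1,000 TB


def days_to_double(stored_pb: float, growth_tb_per_day: float) -> float:
    """Days until the stored volume doubles, assuming constant daily growth."""
    return stored_pb * TB_PER_PB / growth_tb_per_day


# Figures quoted on the slide (2009)
facebook = days_to_double(2.5, 15)  # 2.5 PB stored, growing by 15 TB/day
ebay = days_to_double(6.5, 50)      # 6.5 PB stored, growing by 50 TB/day

print(f"Facebook: about {facebook:.0f} days, eBay: about {ebay:.0f} days")
```

Under these assumptions, both stores would double in well under a year, which is one way to read the claim that volume is growing quickly.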
  7. Characteristics of Big Data: 2. VELOCITY (Speed) • Data is being generated fast and needs to be processed fast. • Velocity is as critical as the volume of the data. • Late decisions → missed opportunities. • Examples – E-promotions: based on your demographics, your purchase history and what you like → send promotions right now for items relevant to you. – Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction.
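The healthcare-monitoring example can be sketched as a stream that is checked reading by reading instead of in a later batch. This is an illustrative Python sketch only: the heart-rate thresholds, the `alerts` function and the sample readings are all assumptions made up for the example, not values from the slides.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical normal range for a heart-rate sensor (beats per minute)
LOW, HIGH = 50, 120


def alerts(readings: Iterable[Tuple[int, int]]) -> Iterator[str]:
    """Yield an alert as soon as an abnormal reading arrives.

    Each reading is a (timestamp in seconds, beats per minute) pair.
    The generator reacts per reading and never waits for the full stream.
    """
    for timestamp, bpm in readings:
        if bpm < LOW or bpm > HIGH:
            yield f"t={timestamp}s: abnormal heart rate {bpm} bpm"


# Made-up sample stream
stream = [(0, 72), (1, 75), (2, 135), (3, 80), (4, 44)]
for message in alerts(stream):
    print(message)
```

Because `alerts` is a generator, each abnormal value is reported when it is seen, which is the point of treating velocity as a requirement on processing and not only on data generation.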
  8. Characteristics of Big Data: 3. VARIETY (Complexity) • Big data is any type of data, structured or unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when these data types are analyzed together. • Monitor hundreds of live video feeds from surveillance cameras to target points of interest. • Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
  9. BIG DATA MANAGEMENT • We have moved on from an era in which an organization could implement a database to meet a specific project need and be done. Nowadays, data has become the fuel of growth and innovation. Effective data management follows these steps: capture, organize, consolidate (integrate), analyze and act.
  10. BIG DATA TECHNOLOGIES • With the evolution of computing technology, it is now possible to manage immense volumes of data that previously could only have been handled by supercomputers at great expense. • In particular, the innovations MapReduce, Hadoop and Bigtable proved to be the sparks that led to a new generation of data management. These technologies address one of the most fundamental problems: the capability to process massive amounts of data efficiently, cost-effectively and in a timely fashion.
  11. Hadoop • Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. • Hadoop allows applications based on MapReduce (Hadoop's processing model) to run on large clusters of commodity hardware. Hadoop is designed to parallelize data processing across computing nodes to speed up computations and hide latency.
  12. MAPREDUCE MapReduce was designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode. The “map” component distributes the programming problem or tasks across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called “reduce” aggregates all the elements back together to provide a result.
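The map and reduce steps described above can be illustrated with the classic word-count example. This is a single-process Python sketch, not Hadoop or Google code: the function names `map_phase`, `shuffle` and `reduce_phase` and the sample documents are assumptions, and a real framework would run the map calls on many machines and handle load balancing and failure recovery itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values for one key into a single result."""
    return key, sum(values)


# Made-up input; each document could be mapped on a different machine
documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Each `map_phase` call depends only on its own document, which is what lets a cluster run the map step in parallel before the reduce step combines the partial results.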
  13. BIGTABLE Bigtable was developed by Google as a distributed storage system intended to manage highly scalable structured data. Data is organized into tables with rows and columns. Unlike a traditional relational database model, Bigtable is a sparse, distributed, persistent multidimensional sorted map. It is intended to store huge volumes of data across commodity servers.
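The phrase "sparse, multidimensional sorted map" can be made concrete with a toy model in which each cell is addressed by a (row key, column name, timestamp) triple. This is a minimal in-memory Python sketch, not Google's API: the class `TinyTable`, its methods and the sample rows are hypothetical, and distribution and persistence are left out entirely.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (row key, column name, timestamp)


class TinyTable:
    """Toy sparse map: only cells that were written take up space."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the most recent version of a cell, or None if never written."""
        versions = [(ts, value) for (r, c, ts), value in self._cells.items()
                    if r == row and c == column]
        return max(versions, key=lambda tv: tv[0])[1] if versions else None

    def scan(self, start_row: str, end_row: str) -> List[Tuple[Key, str]]:
        """Return cells in sorted key order for rows in [start_row, end_row)."""
        return [(key, self._cells[key]) for key in sorted(self._cells)
                if start_row <= key[0] < end_row]


table = TinyTable()
table.put("com.cnn.www", "contents:", 1, "<html>v1")
table.put("com.cnn.www", "contents:", 2, "<html>v2")
table.put("com.example", "anchor:cnnsi.com", 1, "CNN")

latest = table.get("com.cnn.www", "contents:")
rows = table.scan("com.a", "com.d")
print(latest, [key for key, _ in rows])
```

Unlike a relational table, rows here do not share a fixed set of columns, and keeping keys sorted is what makes range scans over neighbouring rows cheap.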
  14. Advantages of Big Data • As organisations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. In fact, some leading companies are using their ability to collect and analyse big data to conduct controlled experiments to make better management decisions. • Big Data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
  15. Advantages of Big Data (continued) • Sophisticated analytics can substantially improve decision-making, minimise risks and unearth valuable insights that would otherwise remain hidden. • Big Data can be used to develop the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings, such as proactive maintenance to avoid failures in new products.
  16. CONCLUSION Analyzing new and fresh data can reveal new sources of economic value, provide fresh insights into customer behavior and identify market trends early. But this influx of new data creates great challenges for the IT department. To derive real business value from Big Data, you need the right set of tools to capture and organize a wide variety of data types from different sources. By using the technologies mentioned, an enterprise can acquire, organize and analyze all of its data, both structured and unstructured, to make the most informed decisions.
  17. ALWAYS REMEMBER “Today’s Big Data will not stay the same tomorrow.” THANKS