
Big data road map

Big data road map, data growth, why big data, big data myths

Published in: Data & Analytics

  1. 1. WDABT 2016 – BHARATHIAR UNIVERSITY
  2. 2. BIG DATA ROADMAP. Dr. V. Bhuvaneswari, Assistant Professor, Department of Computer Applications, Bharathiar University, Coimbatore. bhuvanes_v@yahoo.com, bhuvana_v@buc.edu.in. Visit www.budca.in/faculty.php
  3. 3. Big Data Roadmap • Timeline – Big Data Predictions • Data Growth in Units • Data Landscape • Data Explosion • Big Data Myths • Big Data • 5Vs of Big Data • Why Big Data • Data as Data Science. Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Computer Applications, Bharathiar University
  4. 4. Timeline – Big Data Predictions • 1944: Yale Library in 2040 will have "approximately 200,000,000 volumes". • 1961: Scientific journals will grow exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century. • 1975: The Ministry of Posts and Telecommunications in Japan introduced words as the unifying unit of measurement. • 1997: Michael Cox and David Ellsworth published the first article in the ACM digital library to use the term "big data". Big Data evolved in 1997, exploded to greater heights in 2010 and became popular in 2012.
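The two figures in the 1961 prediction can be checked against each other. The sketch below assumes plain exponential growth with a fixed doubling period; the function name is illustrative and not from the slides.

```python
DOUBLING_PERIOD_YEARS = 15  # "doubling every fifteen years" (1961 prediction)


def growth_factor(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Factor by which a quantity grows over `years` if it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Growth over half a century under a 15-year doubling period.
half_century = growth_factor(50)
print(f"Growth over 50 years: x{half_century:.2f}")
```

Under that assumption the 50-year factor comes out slightly above ten, so "doubling every fifteen years" and "a factor of ten during every half-century" describe the same growth rate.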
  5. 5. Data Growth – in Units
  6. 6. Data Landscape
  7. 7. BIG DATA FACTS • Every 2 days we create as much information as we did from the beginning of time until 2003. • Over 90% of all the data in the world was created in the past 2 years. • It is expected that by 2020 the amount of digital information in existence will have grown from 3.2 zettabytes today to 40 zettabytes. • Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand Tweets, and upload 200,000
  8. 8. Big Data Explosion • 12+ TBs of tweet data every day • 25+ TBs of log data every day • ? TBs of data every day • 2+ billion people on the Web by end of 2011 • 30 billion RFID tags today (1.3B in 2005) • 4.6 billion camera phones worldwide • 100s of millions of GPS-enabled devices sold annually • 76 million smart meters in 2009, 200M by 2014
  9. 9. Data Deluge
  10. 10. Big Data Market Size
  11. 11. Potential Talent Pool – Big Data. India will require a minimum of 1 lakh data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.
  12. 12. BIG DATA MYTHS. Big Data • Is new • Is only about massive data volume • Means Hadoop • Needs a data warehouse • Means unstructured data • Is for social media & sentiment analysis
  13. 13. Let Us Clarify
  14. 14. Big Data. Big Data is • A complete subject with tools, techniques and frameworks. • A technology that deals with large and complex datasets which vary in data format and structure and do not fit into memory. • Not just about huge volumes of data; it provides an opportunity to find new insights into existing data and guidelines to capture and analyze future data.
  15. 15. Big Data: A Definition • Big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies. Source: Harness the Power of Big Data: The IBM Big Data Platform
  16. 16. BIG DATA as Platform. Source: IBM
  17. 17. 4 V's of Big Data
  18. 18. 5Vs of Big Data • Volume • Velocity • Variety • Veracity • Value
  19. 19. Why Big Data?
  20. 20. The 5 Key Big Data Use Cases • Big Data Exploration: Find, visualize, understand all big data to improve decision making. • Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources. • Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time. • Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency. • Operations Analysis: Analyze a variety of machine data for improved business results.
  22. 22. Data Science • "Data Science" was used by statisticians and economists in the early 1970s and defined by Peter Naur in 1974. • "Data Science" has gained popularity in the last couple of years because of the massive data deposits. • The use of Big Data technology to explore data in large corporations, government and industry made the term data science catchy.
  23. 23. Data Science as Discipline • Data Science has emerged as a new discipline to provide deep insight into large volumes of data. • Data Science is a fusion of major disciplines like Computational Algorithms, Statistics and Visualization. • 90% of the world's data has been created in the last two years, which includes 10% of structured data and 80% of unstructured data. • The digital universe is in a data deluge and is estimated to be larger than the physical universe; the data unit of measurement is predicted to be Geopbytes.
  25. 25. Data Growth in Bytes
  26. 26. Data Classification ◦ Open Data ◦ Closed Data ◦ Hot Data ◦ Warm Data ◦ Cold Data ◦ Thin Data ◦ Thick Data
  27. 27. Data Analytics – Need for today • Data is considered a digital asset, similar to other property. • Organizations believe the data they generate will provide deep insights into their business processes for arriving at strategic decisions. • The earlier limitations of computational storage and processing are overcome by cloud computing and big data techniques.
  28. 28. Data Science Components • Pre-Processing: ETL • Dashboards: Charts (Pie, Bar, Histogram) • Data Models: Linear Regression, Decision Tree, Dimensionality Reduction, Clustering, Outlier Analysis, Association Analysis
  29. 29. Data Science - Big Data Technology • Collect, Load, Transform ◦ ETL, SCRIBE, FLUME • Store ◦ HADOOP, SPARK, STORM • Process, Analyze and Reasoning ◦ Computational Algorithms ◦ Statistical Methods and Models ◦ R, PIG, HIVE, PYTHON, JAVA, SCALA, CLOJURE, MAHOUT • Visualization ◦ DASHBOARD, APP
  30. 30. Data Science Vs Data Analytics • Data Science is a discipline that groups techniques and methods from various domains to study data; data analytics is a component of Data Science. • Data Analytics is the process of analyzing a dataset to find deep insights using computational algorithms and statistical methods. There exists no common procedure to
  31. 31. Data Analytics Vs Big Data Analytics • Data Analytics is used to explore and analyze datasets using statistical methods and models. • Big Data Analytics is used to analyze data with the characteristics of Volume, Velocity and Variety by integrating statistics, mathematics and computational algorithms in a Big Data platform.
  32. 32. Data Science – Emerging Roles • A Data Scientist is responsible for scrubbing data to bring out deep insights. Skills: expert in CS, Mathematics, Statistics; works on open-ended research problems. • A Data Engineer is responsible for managing and administering the infrastructure and storage of data. Skills: strong skills in programming and software engineering; deep knowledge of data warehousing; expertise in Hadoop, NoSQL and SQL technologies. • A Data Analyst is one who views the data from one source and has deep insight into the data based on the organization's guidance. Skills: competency in understanding Statistics.
  33. 33. Data Analytics Use Case Scenario
  34. 34. Data Science Applications • Data Personalization: Logs, Tweets, Likes • Smart Pricing: Air Transportation • Financial Services: Fraud Detection, Insurance • Smart Grids: Energy Management
  35. 35. Air Fare Management – Use case 1. Objectives: Hike airfare based on high-value customers (CRM). Strategic decisions require understanding of data insights: • How are customers divided? • Which customer is a high-value customer? • Who is a frequent flyer? • How to retain customers? Data sources: • Conventional enterprise information • Data from weblogs, social media, competitors' pricing
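The segmentation questions in this use case (how customers are divided, who is high value, who is a frequent flyer) can be illustrated with a small rule-based sketch. Everything here is hypothetical: the `Customer` fields, the thresholds and the segment names are assumptions for illustration and do not come from the slides, which point to statistical models such as decision trees for the real analysis.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    flights_per_year: int   # hypothetical field, e.g. taken from booking data
    annual_spend: float     # hypothetical field, e.g. taken from CRM data


# Illustrative thresholds only; a real system would derive them from data.
FREQUENT_FLYER_MIN_FLIGHTS = 12
HIGH_VALUE_MIN_SPEND = 5000.0


def segment(customer: Customer) -> str:
    """Assign a customer to one of four illustrative segments."""
    frequent = customer.flights_per_year >= FREQUENT_FLYER_MIN_FLIGHTS
    high_value = customer.annual_spend >= HIGH_VALUE_MIN_SPEND
    if frequent and high_value:
        return "high-value frequent flyer"
    if high_value:
        return "high-value"
    if frequent:
        return "frequent flyer"
    return "occasional"


customers = [
    Customer("A", flights_per_year=20, annual_spend=9000.0),
    Customer("B", flights_per_year=2, annual_spend=6500.0),
    Customer("C", flights_per_year=15, annual_spend=1800.0),
    Customer("D", flights_per_year=1, annual_spend=300.0),
]
segments = {c.name: segment(c) for c in customers}
print(segments)
```

Fixed thresholds keep the example readable; they stand in for whatever classification model is actually trained on the enterprise and web data sources.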
  36. 36. Data Engineering • Airfare classification (Economy, Business, First) • Analyse factors (enterprise data sources) using data exploration techniques: passenger booking information; forecasted data (statistics); inventory • Customers' behavioral data: predictive analytics with statistical models (decision tree, classification) • Information has to be gained from websites that provide route information, dining, preferable locations • Holistic analytics: analyzing customer data from social profiles, sales, CRM etc.
  37. 37. Complexities and Challenges • Data is larger than terabytes • Data integration • Variety of data formats. Solution • Big data accelerators • Hadoop ecosystem • Analytic components • Integrated data warehouses. Source: Big data spectrum, Infosys
  38. 38. Insurance Fraud Detection – Use case Scenario. Data Engineering • Verifying customer data • Customer profile analysis • Verification of claims raised • Fraud detection from disparate systems • Exact claim reimbursement. Data Sources • Data about customers and products sold from ERP, CRM • Credit history from other sources • Data from social networking: customer profiles, product ratings, credit ratings from 3rd parties
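One simple way to approach "verification of claims raised" is outlier analysis on claim amounts. The sketch below is not the method from the slides; it uses a modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and cutoff 3.5, and the claim amounts are made-up sample values.

```python
from statistics import median


def flag_outlier_claims(amounts, threshold=3.5):
    """Return the claim amounts whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # No spread around the median, so no basis for flagging anything.
        return []
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical claim amounts; one is far larger than the rest.
claims = [1200, 1350, 1100, 1275, 1180, 9800]
suspicious = flag_outlier_claims(claims)
print(suspicious)
```

A flagged claim is only a candidate for manual review; a real fraud system would combine such signals with the customer profile and third-party data listed on the slide.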
  39. 39. Health Epidemics. Data Engineering • Kinds of epidemics and target users • Causes and effects with respect to locations • Environmental and other related issues of epidemics • Data on awareness. Data Sources • EHR records, medical insurance claims, social media (awareness), ERP systems. Data Analytics • Descriptive analytics • Predictive analytics (model-based analysis)
  40. 40. Big Data Challenges • Privacy protection: all Big Data stages (collect, store, process, knowledge) • Integration with the enterprise landscape: all systems store data in RDBMS, DW; does not support bulk loading to the Big Data store • Limited number of analytics from Mahout • Big data technologies lack visualization support and deliverable methods • Leveraging cloud computing for big data applications • Addressing real-time needs with varied format and volume
  41. 41. PART B: Big Data Use Cases – Scenario
  42. 42. Big Data Applications
  43. 43. Big Data Applications - India • Big Data – Elections • SBI uses big data mining to check defaults • Karnataka Govt – Identify water leakage
  44. 44. Big Data - Election • Data was mined from every Internet user in the country to accurately understand voter sentiments and local issues. • Data-based analysis was used to raise funds and create different models for different regions, targeting local issues. • Elections in India involve more than 800 million voters with different ideologies and expectations. • Innovative usage of Big Data marked a huge change in the way elections were traditionally fought.
  45. 45. Data Analytics • Modak Analytics built electoral data. • This involved processing huge volumes of unstructured data (around 10TB of PDF documents) as well as structured data. • Modak chose Hadoop and self-built a 64-node cluster with 128TB of storage. Apart from Hadoop, the team used PostgreSQL as the front-end database. • They have developed Rapid ETL to
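The cluster figures on this slide lend themselves to a quick capacity check. The sketch below assumes that 128TB is raw capacity spread evenly across the 64 nodes and that HDFS keeps its default replication factor of 3; neither assumption is stated on the slide.

```python
NODES = 64                 # from the slide
TOTAL_STORAGE_TB = 128     # from the slide; assumed here to be raw capacity
UNSTRUCTURED_DATA_TB = 10  # "around 10TB of PDF documents"
REPLICATION_FACTOR = 3     # HDFS default; an assumption for this cluster

# Average raw storage per node, assuming an even spread.
per_node_tb = TOTAL_STORAGE_TB / NODES

# Capacity left for unique data once every block is stored REPLICATION_FACTOR times.
usable_tb = TOTAL_STORAGE_TB / REPLICATION_FACTOR

# Raw space the PDF corpus alone would occupy after replication.
raw_needed_tb = UNSTRUCTURED_DATA_TB * REPLICATION_FACTOR

print(f"Per node: {per_node_tb:.1f} TB")
print(f"Usable with replication: {usable_tb:.1f} TB")
print(f"Raw space for the PDF corpus: {raw_needed_tb} TB")
```

Under these assumptions the PDF corpus occupies well under a quarter of the raw capacity, leaving room for structured data and intermediate processing output.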
  46. 46. SBI • State Bank of India (SBI) recently ran its newly acquired data-mining software to check for purity of data. • It made an interesting find: close to one crore account holders have not provided any nomination for their savings accounts. What is worse, over half of them are senior citizens. • To analyse trends in banks, SBI has hired a whole team of statisticians and economists. • The aim is to identify default patterns and high-value customers.
