Kurukshetra - Big Data


Keynote on Big Data during Kurukshetra 2013 Workshop

Published in: Education

  1. Big Data
     Shankar Radhakrishnan
  2. Topics
     • Data Management Today
     • New Interests, Expectations, Problems
     • Big Data
     • New Approach
     • Big Data Ecosystem
     • Q&A
  3. Data Management Today
     • Relational Databases
       • Oracle, MySQL, MS-SQL Server
     • Data Warehouse Appliances
       • Teradata, IBM-Netezza
     • Legacy Systems
       • Mainframes
  4. New Interests, Expectations
     • Collect More, Data-Mine More
     • Complex Data Integration
     • Advanced Analytics
     • Social Data Analysis
     • Machine Data Analysis
     • Realtime Data Analysis
     • Actionable Insights
     • Extension of Investments
     • Talent Management
     • ROI
     • TCO
     • Business Continuity
  5. How Big is Data? BIG Facts (as of Oct 2012)
     • 90% of the world’s data was created in the last two years
     • $214 is the average amount companies have to spend per compromised customer when a data breach occurs
     • 2.7bn: average number of “likes” and “comments” posted on Facebook daily
     • 247bn e-mail messages are sent each day… about 80% of them are spam
     • It would take 2,000 hours to watch all the YouTube videos uploaded while we’re talking on this panel* (*this is 3x more than just 2 short years ago)
     • 500,000+ data centers across the world are large enough to fill 5,955 football fields
  6. New Problems
     • Unpredictable Volume
     • Data Processing Issues
     • Data Integration Issues
     • Identifying Source-of-Truth
     • Store vs. Analyze
     • Data Retrieval Requirements
     • Computing Limitations
     • Information vs. Insights
     • Business Requirements
     • Regulatory Requirements
     • True Value-of-Data
     • Price to Performance Dilemma
  7. What is Big Data?
     • Volume
       • Very large data sets
       • Sizes from 100 TB to 50 PB of data
       • Larger than “one machine”
       • Whole data set analysis replaces “sampling”
     • Velocity
       • Real-time data streaming
       • High volume / Low latency
       • Write heavy
       • Read heavy
       • Both is common
     • Variety
       • Structured data: OLTP, DW, ODS, Data marts
       • Unstructured data: Text, Audio, Video, Click streams, Log files
     • Complexity
       • Data acquisition
       • Analysis
       • Deriving insights
     Source: Ventana Research
  8. New Approach
     • Commodity Hardware
       • Open Computing Project
     • Open Source Solutions, Frameworks
       • Value Added Products: Cloudera, Datastax, 10gen
     • Research Oriented Product Development
     • Augmented Ecosystem
  9. Big Data: Ecosystem
     • Advanced Analytics: Predictive & Optimization Modeling, Business Processes Analysis, Functional Analysis
       • R, Splunk, SAS, Madlib, Mahout
     • Visual Analytics: Advanced Visualizations; Data Delivery (Dashboards, Scorecard (Strategy Maps), Spatial & Temporal Analysis)
       • Tableau, SpotFire, Datameer
     • BI / Reporting: Data Engineering (Performance Reporting, Enterprise Metrics); Data Agility (Data Mining, OLAP Modeling, etc.)
       • Pig, Hive, other BI tools with Hadoop connectors, Lucene, Karmasphere, Cassandra, Crunch, Pangool
     • Data Storage and Processing: Data Storage, Data Processing
       • HDFS, HBase, MapReduce
     • Data Integration & Management: Data Filtering, Data Consolidation & Warehousing, Data Quality, Metadata Management, Job Scheduling, Data Economics
       • Flume, Scribe, Avro, Sqoop, Chukwa, Zookeeper, Oozie
       • Native Hadoop ETL; traditional ETL with Hadoop connectors
     • Distributed Infrastructure
     • Side labels: Data Analytics, Data Delivery, Data Visualization, Data Engineering, Data Agility, Data Consolidation, Data Economics, Integration
     • Legend: Hadoop components; open source Hadoop platforms; 3rd party Hadoop supporting platforms
  10. What can Big Data do that traditional data warehousing and analytics cannot?
      • Traditional DW: Complete records from known transactional systems.
        Big Data: Data from many different internal & external sources with unknown quality and/or utility.
      • Traditional DW: Data is structured, and data fields have known (and often complex) interrelationships.
        Big Data: Loosely structured data. Flat schemas with few complex interrelationships; connections between data elements have to be probabilistically inferred.
      • Traditional DW: Multi-terabytes of data.
        Big Data: Multi-petabytes of data.
      • Traditional DW: Mostly scale-up architecture.
        Big Data: Scale-out architecture.
      • Traditional DW: Analytics run on a stable data model.
        Big Data: The analytic models are larger and require very large amounts of hardware resources to process them in a timely manner.
      • Traditional DW: Low performance/cost ratio, as most of the software/hardware platforms are proprietary and license based.
        Big Data: High performance/cost ratio, as most of the software/hardware platforms are commodity, free, open source.
  11. What can Big Data do that traditional data warehousing and analytics cannot?
      • Traditional DW: Aggregate data (structured).
        Big Data: Raw data (structured and unstructured).
      • Traditional DW: Aggregate / segment analytics.
        Big Data: Individual level analytics, micro segmentation, individualized offers to customers.
      • Traditional DW: Mainstream analytics (structured analysis, OLAP cubes).
        Big Data: Outlier analytics, pattern discovery, simulation and modeling, machine learning.
      • Traditional DW: Sample data is used for identifying patterns.
        Big Data: Entire population of granular data can be leveraged.
      • Traditional DW: Reports & dashboards are done on a production basis.
        Big Data: Real-time operational analytics and reporting. Intra-day decision making.
      • Traditional DW: Traditional models good for small amounts of data due to time constraints.
        Big Data: Big Models: computationally intensive analyses, simulations, models with many parameters.
  12. Q&A
      Thank You!
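
The ecosystem slide names MapReduce as the processing layer, and the comparison slides contrast analysis of the entire data set with sampling. As a rough illustration only, the sketch below runs the three MapReduce phases (map, shuffle, reduce) in a single Python process over every record of a tiny log. The function names, the log format (status code as the last field), and the sample lines are all hypothetical and not taken from the talk; a real Hadoop job would spread these phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (status_code, 1) for one log line.

    Assumption: the status code is the last whitespace-separated field.
    Blank lines emit nothing.
    """
    fields = line.split()
    return [(fields[-1], 1)] if fields else []


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over every line (no sampling)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]
counts = run_job(sample_log)
```

Because each map call looks at one line and each reduce call looks at one key, the work can in principle be split across machines, which is the scale-out idea from the comparison slides.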