Kurukshetra - Big Data

Keynote on Big Data during Kurukshetra 2013 Workshop


Published in Education

Transcript

  • 1. Big Data, by Shankar Radhakrishnan
  • 2. Topics
      • Data Management Today
      • New Interests, Expectations, Problems
      • Big Data
      • New Approach
      • Big Data Ecosystem
      • Q&A
  • 3. Data Management Today
      • Relational Databases
          • Oracle, MySQL, MS-SQL Server
      • Data Warehouse Appliances
          • Teradata, IBM-Netezza
      • Legacy Systems
          • Mainframes
  • 4. New Interests, Expectations
      • Collect More, Data-Mine More
      • Complex Data Integration
      • Advanced Analytics
      • Social Data Analysis
      • Machine Data Analysis
      • Realtime Data Analysis
      • Actionable Insights
      • Extension of Investments
      • Talent Management
      • ROI
      • TCO
      • Business Continuity
  • 5. How Big is Data? Big Facts (as of Oct 2012)
      • 90% of the world’s data was created in the last two years
      • $214 is the average amount companies have to spend per compromised customer when a data breach occurs
      • 2.7bn: average number of “likes” and “comments” posted on Facebook daily
      • 247bn e-mail messages are sent each day… about 80% of them are spam
      • It would take 2,000 hours to watch all the YouTube videos uploaded while we’re talking on this panel* (*this is 3x more than just 2 short years ago)
      • 500,000+ data centers across the world are large enough to fill 5,955 football fields
  • 6. New Problems
      • Unpredictable Volume
      • Data Processing Issues
      • Data Integration Issues
      • Identifying Source-of-Truth
      • Store vs. Analyze
      • Data Retrieval Requirements
      • Computing Limitations
      • Information vs. Insights
      • Business Requirements
      • Regulatory Requirements
      • True Value-of-Data
      • Price to Performance Dilemma
  • 7. What is Big Data?
      • Volume
          • Very large data sets
          • Sizes from 100 TB to 50 PB of data
          • Larger than “one machine”
          • Whole data set analysis replaces “sampling”
      • Velocity
          • Real-time data streaming
          • High volume / Low latency
          • Write heavy
          • Read heavy
          • Both are common
      • Variety
          • Structured data: OLTP, DW, ODS, Data marts
          • Unstructured data: Text, Audio, Video, Click streams, Log files
      • Complexity
          • Data acquisition
          • Analysis
          • Deriving insights
      • Source: Ventana Research
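The “larger than one machine” point on slide 7 can be made concrete with a back-of-the-envelope sizing sketch. Only the 100 TB and 50 PB figures come from the slide. The 24 TB of usable disk per node is a hypothetical value chosen for illustration, and the replication factor of 3 is assumed because it is the HDFS default; neither is a figure from the talk.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def nodes_needed(dataset_bytes, node_capacity_bytes, replication=3):
    """Return the minimum number of storage nodes for a replicated data set.

    replication=3 mirrors the HDFS default; node_capacity_bytes is a
    hypothetical usable capacity per machine (an assumption).
    """
    raw_bytes = dataset_bytes * replication
    # Integer ceiling division avoids floating-point rounding on large values.
    return (raw_bytes + node_capacity_bytes - 1) // node_capacity_bytes


# Hypothetical node with 24 TB of usable disk (not a figure from the slides).
NODE_CAPACITY = 24 * TB

small = nodes_needed(100 * TB, NODE_CAPACITY)
large = nodes_needed(50 * PB, NODE_CAPACITY)
print(small, large)  # prints: 13 6250
```

Under these assumptions even the low end of the range needs more than a dozen machines for storage alone, before any capacity is set aside for processing.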
  • 8. New Approach
      • Commodity Hardware
          • Open Computing Project
      • Open Source Solutions, Frameworks
          • Value Added Products: Cloudera, Datastax, 10gen
      • Research Oriented Product Development
      • Augmented Ecosystem
  • 9. Big Data: Ecosystem (diagram; layers from top to bottom)
      • Advanced Analytics: Predictive & Optimization Modeling, Business Processes Analysis, Functional Analysis
          • R, Splunk, SAS, Madlib, Mahout
      • Visual Analytics: Advanced Visualizations; Data Delivery - Dashboards, Scorecard (Strategy Maps), Spatial & Temporal Analysis
          • Tableau, SpotFire, Datameer
      • BI / Reporting: Data Engineering - Performance Reporting, Enterprise Metrics; Data Agility - Data Mining, OLAP Modeling etc.
          • Pig, Hive, Lucene, Karmasphere, Cassandra, Crunch, Pangool, other BI tools with Hadoop connectors
      • Data Storage and Processing: Data Storage, Data Processing
          • HDFS, HBase, Mapreduce
      • Data Integration & Management: Data Filtering, Data Consolidation & Warehousing, Data Quality, Metadata Management, Job Scheduling, Data Economics
          • Flume, Scribe, Avro, Sqoop, Chukwa, Zookeeper, Oozie
          • Native Hadoop ETL; traditional ETL with Hadoop connectors
      • Distributed Infrastructure
      • Side labels: Data Analytics, Data Delivery, Data Visualization, Data Engineering, Data Agility, Data Consolidation, Data Economics, Integration
      • Legend: Hadoop components; open source Hadoop platforms; 3rd party Hadoop supporting platforms
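Mapreduce appears in the storage and processing layer of the ecosystem slide without further detail. As a rough illustration of the programming model only, here is a single-process word count in plain Python. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over as many workers as there are input splits or keys, which is what makes the model a fit for commodity clusters.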
  • 10. What can Big Data do that traditional data warehousing and analytics cannot?
      • Data sources
          • Traditional DW: Complete records from known transactional systems.
          • Big Data: Data from many different internal & external sources with unknown quality and/or utility.
      • Structure
          • Traditional DW: Data is structured, and data fields have known (and often complex) interrelationships.
          • Big Data: Loosely structured data. Flat schemas with few complex interrelationships; connections between data elements have to be probabilistically inferred.
      • Volume
          • Traditional DW: Multiple terabytes of data
          • Big Data: Multiple petabytes of data
      • Architecture
          • Traditional DW: Mostly scale-up architecture
          • Big Data: Scale-out architecture
      • Analytic models
          • Traditional DW: Analytics run on a stable data model.
          • Big Data: The analytic models are larger and require very large amounts of hardware resources to process them in a timely manner.
      • Performance/Cost
          • Traditional DW: Low Performance/Cost ratio, as most of the software/hardware platforms are proprietary and license based.
          • Big Data: High Performance/Cost ratio, as most of the software/hardware platforms are commodity, free, open source.
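The scale-up versus scale-out contrast on slide 10 comes down to spreading records across many machines instead of growing one. A minimal sketch of that idea, assuming simple modulo hashing on a record key: the functions `node_for` and `partition` and the sample keys are hypothetical, and this scheme is not taken from the talk or from any particular product (production systems commonly use consistent hashing or range partitioning instead).

```python
import hashlib


def node_for(key, num_nodes):
    """Pick a node index for a record key using a stable hash.

    hashlib is used instead of the built-in hash() because the built-in
    string hash changes between Python processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(keys, num_nodes):
    """Distribute keys into one bucket per node."""
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[node_for(key, num_nodes)].append(key)
    return nodes


# Hypothetical record keys, for illustration only.
keys = [f"customer-{i}" for i in range(1000)]
buckets = partition(keys, 4)
print([len(bucket) for bucket in buckets])
```

Adding capacity then means adding buckets rather than buying a larger server, at the cost of moving some keys when the node count changes.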
  • 11. What can Big Data do that traditional data warehousing and analytics cannot?
      • Data
          • Traditional DW: Aggregate data (structured)
          • Big Data: Raw data (structured and unstructured)
      • Granularity
          • Traditional DW: Aggregate / Segment analytics
          • Big Data: Individual level analytics, micro segmentation, individualized offers to customers
      • Type of analytics
          • Traditional DW: Mainstream analytics (structured analysis, OLAP cubes)
          • Big Data: Outlier analytics, pattern discovery, simulation and modeling, machine learning
      • Coverage
          • Traditional DW: Sample data is used for identifying patterns
          • Big Data: Entire population of granular data can be leveraged
      • Reporting
          • Traditional DW: Reports & Dashboards are done on a production basis
          • Big Data: Real-time operational analytics and reporting. Intra-day decision making.
      • Models
          • Traditional DW: Traditional models, good for small amounts of data due to time constraints
          • Big Data: Big Models: computationally intensive analyses, simulations, models with many parameters
  • 12. Q&A
      • Thank You!