SlideShare a Scribd company logo
1 of 10
Big Data
          in 10
What’s real and what’s fluff
   Abhishek Pamecha
        Mar-2013
What is Big Data
• It is all about data
   – But not about “how much”




   – But about correlations and increased reach
BigData Architecture
It influences or changes your
   • Data source choices
   • Data storing choices
   • Data analyzing/mining approaches



It helps
   • Address highly focused use cases
   • Correlate more data sources
   • address scale and fault tolerance issues
Caution!



BigData is not a “substitute” for existing warehousing practices.
It complements existing practices.
Architectures – Data sources
• Traditional DW           • BigData adds

   – Production DB            – Log files

   – Dictionaries             – Social graphs

   – ETL/ELT pipelines        – Streaming data

   – External Data marts
Architectures – Data Storage
• Traditional DW               • BigData adds

  – Production DB                – Distributed file storage
      • Flatten hierarchies
      • Resolved references      – Distributed hash maps

                                 – Columnar representations
  – ROLAP or MOLAP databases
      •   Star schema
                                 – Graph data bases
      •   Materialized views
      •   Virtual data marts
                                 – Document collections
      •   Partitioned tables

  – Still relational             – Other NoSQL variants
Architectures – Analytic approaches
•   Traditional DW                                  •   BigData adds

     – Production DB                                     – Distributed file storage
         •   Flatten hierarchies                              •   Map reduce frameworks and chaining
         •   Resolved references
                –   Pre-generate results
                                                         – Distributed hash maps
                                                              •   Single key predominant
     – ROLAP databases
         •   Star schema
                –   Multidimensional queries
                                                         – Columnar representations
         •   Materialized views                               •   Extracts select columns per row
                –   adhoc explorations on subsets
         •   Still relational                            – Graph data bases
         •   Virtual data marts                               •   Navigate links
                –   adhoc explorations on subsets
         •   Partitioned tables
                                                         – Document collections
                                                              •   Simplified schemas

                                                         – Other NoSQL approaches
                                                              •   Stream pattern matching and pipelining
Big Data Architectures
                                 Pros and Cons
•   Pros

     –     Incorporate low value and social data in analysis
     –     Increase analysis reach to non-structured data
     –     Correlate across data sources on the same platform
     –     Very strong in their sweet spots.
     –     Efficiency in terms of
                •   data movement volume,
                •   scale
                •   fault tolerance and
                •   responsiveness.

•     Cons

     –          Not relational. Gives up on some of the relational advantages.
            •         Joins
            •         Aggregations etc.
     –          Little standards – Non portable solutions
     –          Less support with end-user tools and applications [ though growing ]
     –          Not a replacement to DW but just an extension to it.
     –          Incompatible with different classes of use-cases. Have sweet spots.
     –          Heterogeneous setup in Development and Operations.
Challenges
•   Architectural
     –   “Big” data management
     –   Data consistency
     –   Read heavy or write heavy
     –   Scaling
     –   Distributed deployment


•   Functional
     –   data quality
     –   Problem set choice


•   Organizational
     –   Data backed decisions
     –   Going overboard
     –   SLAs and operations management
     –   Data Privacy
Thank you!

More Related Content

What's hot

Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
Business Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8amBusiness Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8am
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
Barrett Peterson
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
Andrew Brust
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak
 
Introduction to Microsoft SQL Server 2008 R2 Analysis Service
Introduction to Microsoft SQL Server 2008 R2 Analysis ServiceIntroduction to Microsoft SQL Server 2008 R2 Analysis Service
Introduction to Microsoft SQL Server 2008 R2 Analysis Service
Quang Nguyễn Bá
 

What's hot (20)

Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
Business Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8amBusiness Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8am
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
 
Oracle hyperion essbase
Oracle hyperion essbaseOracle hyperion essbase
Oracle hyperion essbase
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Webinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data LakeWebinar - Introduction to Azure Data Lake
Webinar - Introduction to Azure Data Lake
 
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
2016 SDMX Experts meeting, Implementation of SDMX RI at INS, Kamel Abdellaoui
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Sql Saturday Costa Rica-SSAS Tabular Model
Sql Saturday Costa Rica-SSAS Tabular ModelSql Saturday Costa Rica-SSAS Tabular Model
Sql Saturday Costa Rica-SSAS Tabular Model
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Web mining
Web miningWeb mining
Web mining
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overview
 
Introduction to Microsoft SQL Server 2008 R2 Analysis Service
Introduction to Microsoft SQL Server 2008 R2 Analysis ServiceIntroduction to Microsoft SQL Server 2008 R2 Analysis Service
Introduction to Microsoft SQL Server 2008 R2 Analysis Service
 

Similar to Bigdata

Choosing the Right Big Data Tools for the Job - A Polyglot Approach
Choosing the Right Big Data Tools for the Job - A Polyglot ApproachChoosing the Right Big Data Tools for the Job - A Polyglot Approach
Choosing the Right Big Data Tools for the Job - A Polyglot Approach
DATAVERSITY
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
Qian Lin
 

Similar to Bigdata (20)

Choosing the Right Big Data Tools for the Job - A Polyglot Approach
Choosing the Right Big Data Tools for the Job - A Polyglot ApproachChoosing the Right Big Data Tools for the Job - A Polyglot Approach
Choosing the Right Big Data Tools for the Job - A Polyglot Approach
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big data
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital era
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
Software architecture & design patterns for MS CRM Developers
Software architecture & design patterns for MS CRM  Developers Software architecture & design patterns for MS CRM  Developers
Software architecture & design patterns for MS CRM Developers
 
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
No Sql Movement
No Sql MovementNo Sql Movement
No Sql Movement
 
Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
 
Anti-social Databases
Anti-social DatabasesAnti-social Databases
Anti-social Databases
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Bigdata

  • 1. Big Data in 10 What’s real and what’s fluff Abhishek Pamecha Mar-2013
  • 2. What is Big Data • It is all about data – But not about “how much” – But about correlations and increased reach
  • 3. BigData Architecture It influences or changes your • Data source choices • Data storing choices • Data analyzing/mining approaches It helps • Address highly focused use cases • Correlate more data sources • address scale and fault tolerance issues
  • 4. Caution! BigData is not a “substitute” for existing warehousing practices. It complements existing practices.
  • 5. Architectures – Data sources • Traditional DW • BigData adds – Production DB – Log files – Dictionaries – Social graphs – ETL/ELT pipelines – Streaming data – External Data marts
  • 6. Architectures – Data Storage • Traditional DW • BigData adds – Production DB – Distributed file storage • Flatten hierarchies • Resolved references – Distributed hash maps – Columnar representations – ROLAP or MOLAP databases • Star schema – Graph data bases • Materialized views • Virtual data marts – Document collections • Partitioned tables – Still relational – Other NoSQL variants
  • 7. Architectures – Analytic approaches • Traditional DW • BigData adds – Production DB – Distributed file storage • Flatten hierarchies • Map reduce frameworks and chaining • Resolved references – Pre-generate results – Distributed hash maps • Single key predominant – ROLAP databases • Star schema – Multidimensional queries – Columnar representations • Materialized views • Extracts select columns per row – adhoc explorations on subsets • Still relational – Graph data bases • Virtual data marts • Navigate links – adhoc explorations on subsets • Partitioned tables – Document collections • Simplified schemas – Other NoSQL approaches • Stream pattern matching and pipelining
  • 8. Big Data Architectures Pros and Cons • Pros – Incorporate low value and social data in analysis – Increase analysis reach to non-structured data – Correlate across data sources on the same platform – Very strong in their sweet spots. – Efficiency in terms of • data movement volume, • scale • fault tolerance and • responsiveness. • Cons – Not relational. Gives up on some of the relational advantages. • Joins • Aggregations etc. – Little standards – Non portable solutions – Less support with end-user tools and applications [ though growing ] – Not a replacement to DW but just an extension to it. – Incompatible with different classes of use-cases. Have sweet spots. – Heterogeneous setup in Development and Operations.
  • 9. Challenges • Architectural – “Big” data management – Data consistency – Read heavy or write heavy – Scaling – Distributed deployment • Functional – data quality – Problem set choice • Organizational – Data backed decisions – Going overboard – SLAs and operations management – Data Privacy