By Thanuja Seneviratne
 What is Big Data
› Top Down Approach to the Topic of Big Data
› Data Science and Data Scientists
› Days of Data Past – Part I
› Days of Data Past – Part II
› Days of Data Past – Part III
› Three V’s
› “Data Lake” Architecture
 Who can use Big Data
› Individual Experience vs Collective Experience
› Business Cases
› Use Cases
 How to Use Big Data
› Coming of Hadoop
› Evolution of Hadoop
› “Other than” HDFS
 Data Science and Data Scientists
› Science of mining, extracting, analyzing,
modeling, visualizing large data sets from
multiple sources
› “Data analyst, data artist”
› Knowledge of math, statistics, predictive
modeling, pattern recognition and learning, data
visualization, data warehousing, etc
› From C.F. Jeff Wu to William S. Cleveland to
“Data Science Journal” and beyond
 Days of Data Past - Part I
› Relational databases and their impact
› Write-first schema (schema-on-write)
› ACID compliant
› Row-store technology
› Relationally structured data for smaller data sets
› Relatively cheaper products
 SQL Server, Oracle, etc
› Highly available skill-set
› SQL language
 Data manipulation – Insert, Select, Update and Delete
 Data definition – Create, Alter, Truncate and Drop
› Influenced LINQ (in .NET), JPQL (in Java), etc. in application
programming
› Enterprise ready
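
The data definition and data manipulation statements listed above, and the "A" in ACID, can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `customer` table and its rows are made up, SQLite stands in for products such as SQL Server or Oracle, and TRUNCATE is left out because SQLite does not support it.

```python
import sqlite3

def run_demo():
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Data definition: the schema is fixed before any row is written.
    cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    cur.execute("ALTER TABLE customer ADD COLUMN city TEXT")

    # Data manipulation: insert, update, delete.
    cur.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Ann", "Colombo"))
    cur.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Raj", "Kandy"))
    cur.execute("UPDATE customer SET city = ? WHERE name = ?", ("Galle", "Raj"))
    cur.execute("DELETE FROM customer WHERE name = ?", ("Ann",))
    conn.commit()

    # Atomicity: the second insert violates NOT NULL, so the whole
    # transaction (including the first insert) is rolled back.
    try:
        with conn:
            conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Sam", "Jaffna"))
            conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)", (None, "Matara"))
    except sqlite3.IntegrityError:
        pass

    # Data manipulation: select.
    rows = cur.execute("SELECT name, city FROM customer ORDER BY id").fetchall()
    conn.close()
    return rows

print(run_demo())
```

Only the row for Raj survives: Ann was deleted, and the failed transaction left no trace of Sam.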
 Days of Data Past - Part II
› Enterprise Data Warehouses (EDW) and their impact
› Massively Parallel Processing (MPP) appliances – not all EDWs are
packaged as MPP appliances
› Column-store technology, faster and easier for BI – not all EDWs use
column-store
› Dimensionally structured data for large data sets
› Enterprise storage, not commodity storage
› Expensive premium products
 Teradata, Vertica, SQL Server PDW, etc.
 Some major vendors offer commodity hardware for price-sensitive
customers
 Some major vendors offer services in addition to products
› Demanding skill-set
› Enterprise ready
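
Why a column-store suits BI-style aggregation can be sketched in plain Python. The sales records and function names below are hypothetical, and real column-stores add compression and vectorized execution that this sketch ignores. It only contrasts the two layouts.

```python
# Row-store layout: each record keeps all of its fields together.
ROWS = [
    {"region": "East", "product": "A", "amount": 120.0},
    {"region": "West", "product": "B", "amount": 80.0},
    {"region": "East", "product": "C", "amount": 50.0},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

def total_row_store(rows):
    # Every whole record is visited, even though only one field is needed.
    return sum(row["amount"] for row in rows)

def total_column_store(columns):
    # Only the single column the query needs is scanned.
    return sum(columns["amount"])

COLUMNS = to_columns(ROWS)
print(total_row_store(ROWS), total_column_store(COLUMNS))
```

Both functions return the same total. The difference is how much data each layout has to touch to answer an aggregate query over one column.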
 Days of Data Past - Part III
› NoSQL data stores and their impact
› Not relational and not ACID compliant
› Four types
 Key-value stores (KV)
 Document stores
 Graph database stores
 Wide column stores
› Relatively cheaper products
› Commodity storage, not enterprise storage
› Demanding and scarce skill-set
› Not enterprise ready
› NewSQL data stores as an alternative to NoSQL
 Relational and ACID compliant
 SQL driven, so that existing SQL investments stay intact
 Three V’s
› Volume
 Large volumes, currently 100 TB or more
 This benchmark is expected to rise in future
› Velocity
 How quickly data accumulates
 How quickly your data makes sense
 Batch, near-time, real-time
 Batch vs Interactive
› Variety
 Various data sources
 Structured data – relational, ERP, CRM
 Semi-structured data – click streams, weblogs, geographical,
social
 Unstructured data – sensor, textual, machine generated
 “Data Lake” Architecture
› Modern Data Architecture
 Provides a shared service for broad insight across a
large, diverse data set at efficient scale, according to
Hortonworks
 A unified data architecture that integrates with
end-to-end enterprise solutions, according to Teradata
› Caters to 3V-driven big data opportunities
› Raw data of unrecognized value
› Read-first schema (schema-on-read)
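
The read-first schema idea above can be sketched as follows: raw records are stored untouched, and a schema is applied only when a consumer reads them. The event lines, field names and `read_with_schema` helper are hypothetical, and a real data lake would read from files on HDFS or similar rather than from a Python list.

```python
import json

# Raw data lands in the lake as-is; nothing is validated at write time.
RAW_EVENTS = [
    '{"user": "ann", "action": "play", "song": "Song A"}',
    '{"user": "raj", "action": "call", "duration_sec": 95}',
    'not json at all',
]

def read_with_schema(raw_lines, fields):
    """Project each parseable record onto the fields this reader cares about."""
    records = []
    for line in raw_lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable raw data is skipped at read time, not rejected at write time
        if not isinstance(doc, dict):
            continue
        # Missing fields become None instead of causing a failure.
        records.append({field: doc.get(field) for field in fields})
    return records

print(read_with_schema(RAW_EVENTS, ["user", "action"]))
print(read_with_schema(RAW_EVENTS, ["user", "duration_sec"]))
```

Two readers can apply two different schemas to the same raw data, which is the contrast with the write-first schema of relational databases.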
 Individual Experience vs Collective Experience
› Need to treat customers as individuals instead of a mass collective
› Predictive modeling to recommend an individual’s best
“intent”
› Implementing process communication models (PCM) to
give better individualized customer service
 Listening to a particular song by a particular artist on mobile
 Calling a call center
› Privacy concerns – the main obstacle in the current big data trend
 Business Cases
› Medical or Healthcare
› Entertainment
› Forensics
› Financial
› Retail
 Use Cases
› Medical or Healthcare
 Find a cure for a disease based on an individual’s medical history,
behavior patterns, food and drug consumption, and similar
patients’ data
› Entertainment
 Provide a recommendation engine for IMDB or Netflix based on an
individual’s viewing patterns
› Forensics
 Identify a serial killer from historical murder data, as in CSI.
Similarly, prevent further incidents that follow a similar pattern
› Financial
 Provide a predictive financial model for Wall Street stock market
fluctuations based on historical shareholder patterns
› Retail
 Coming of Hadoop
› Google File System (GFS) and Google’s MapReduce engine, and
the publishing of white papers on both by Google
› Yahoo team was the first to decode the white papers and
create HDFS and an MR engine to scale out Yahoo
search
› Creation of Hadoop (Generation 1) in 2006 and
commitment to production-level Hadoop by Yahoo
› Spawning of the Hortonworks company in 2011 from a
set of Yahoo employees, and the move towards
enterprise hardening
› Spawning of multiple Hadoop distros as products
 Evolution of Hadoop
› Hadoop 1.x (Generation 1)
 Data Management – HDFS for redundant data storage from various sources and MapReduce
to process the data
 Data Access Layer (batch, near-time, real-time) - to access data simultaneously in multiple
ways
› Hadoop 2.x (Generation 2)
 Introducing YARN for the Data Management layer
 Governance and Integration at enterprise level – data loading, executing data policies, data
management – introducing Apache Falcon
 Security – authentication and authorization in a layered and secure way – Apache Knox
 Operations – deploy, monitor and manage the platform as a whole – introducing Apache Ambari
› Enterprise Hadoop
 Deployment choice – physical, virtual, cloud; Windows or Linux; distro product
Hortonworks, Cloudera or other
 Presentation and Applications – enable existing and new applications to generate value from
Hadoop
 Enterprise management and security – empower existing proven enterprise tools to integrate
with Hadoop
 Services or Product choice – YARN-enabling always-on, forever-running services with Apache
Slider
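
The MapReduce processing model named under Hadoop 1.x above can be illustrated with a single-process word count. This is a sketch of the programming model only, not Hadoop's API: the function names are made up, and a real cluster runs the map and reduce phases in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["big data", "Big Hadoop"]))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over many machines, which is what lets the model scale out.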
 Hadoop 2.7 Stack (Hortonworks view)
 “Other than” Hadoop, HDFS
› HDFS-like storage systems with similar
MapReduce engines
› MapR (uses NFS)
 Has cloud support too
› EMC, NetApp, Cleversafe, Symantec
› IBM’s BigInsights (kind of a distro of Cloudera, which
is in turn a distro of Hadoop)
› SAP’s HANA suite
› Of course, the proprietary GFS, on which HDFS was
originally based
Top Down Approach to Understanding Big Data Concepts


Editor's Notes

  1. In November 1997, C.F. Jeff Wu gave an inaugural lecture entitled "Statistics = Data Science?". In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data", in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics". In April 2002, the International Council for Science's Committee on Data for Science and Technology (CODATA) started the Data Science Journal.
  2. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.