The Six Pillars for Building Big
Data Analytics Ecosystems
Big Data and Analytics
What?
◦ Any voluminous amount of structured, semi-structured, or unstructured data
Where?
◦ Large organizations
Why?
◦ Cost reduction
◦ Faster, better decision making
◦ New products and services
Big Data Analytics Ecosystems
◦ Data exploration
◦ Data preparation
◦ Modeling
Pillars of Big Data: Overview
Pillars of Big Data: Storage
RDBMS
◦ Ensures ACID
◦ Performance and scalability
DFS
◦ Client server architecture
◦ Hides information from the user, e.g. location
◦ Concurrency transparency
◦ Failure transparency
◦ Replication and Scalability transparency
◦ E.g. GFS, HDFS, CFS
NoSQL
◦ Sacrifices consistency to gain high availability and scalability
◦ Data stored as key/value pairs
◦ Supports three types
◦ E.g. MVCC, COD, DOD, Graph
Pillars of Big Data: Processing
Batch Processing
◦ Execute a series of jobs without manual intervention
◦ E.g. Hadoop
◦ Real-life example
◦ Credit card
◦ MapReduce
◦ Map
◦ Shuffle
◦ Reduce
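The three MapReduce phases listed above can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the word-count task and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are assumptions chosen for illustration, and a real cluster would run the phases in parallel across nodes.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result (a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(records):
    """Chain the three phases in order, as a batch job would."""
    return reduce_phase(shuffle_phase(map_phase(records)))

counts = word_count(["big data", "big analytics"])
print(counts)
```

The whole input is processed in one pass with no user interaction, which is what makes this a batch job.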
Interactive Processing
◦ Requires human interaction
◦ Real-life example
◦ Spreadsheets
Pillars of Big Data: Processing
Iterative Processing
◦ Machine learning operations
◦ Requires several passes for the algorithm to converge
◦ E.g. HaLoop, Main Memory MapReduce (M3R)
◦ Real-life example
◦ Evaluation of mathematical expression
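The "several passes until convergence" pattern above can be sketched as a generic loop. The example is a hypothetical illustration, not HaLoop or M3R code: it uses Newton's method for a square root as a stand-in for a machine learning update, and the names `iterate_until_converged` and `newton_sqrt_step` are assumptions.

```python
def iterate_until_converged(step, state, tolerance=1e-9, max_passes=100):
    """Apply `step` repeatedly until two successive states differ by less than `tolerance`."""
    for passes in range(1, max_passes + 1):
        new_state = step(state)
        if abs(new_state - state) < tolerance:
            return new_state, passes
        state = new_state
    raise RuntimeError("did not converge within max_passes")

def newton_sqrt_step(target):
    """Return one Newton update for solving x * x = target."""
    return lambda x: 0.5 * (x + target / x)

root, passes = iterate_until_converged(newton_sqrt_step(2.0), 1.0)
print(root, passes)
```

Each pass reuses the output of the previous one, which is why iterative workloads benefit from systems that keep intermediate state in memory between passes.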
Incremental Processing
◦ Analyze data in motion
◦ Requires quick actions
◦ Full data is not required for algorithm
◦ E.g. Apache Storm, Microsoft Trill
◦ Real-life example
◦ Check on incoming data stream for security
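A minimal sketch of incremental processing, assuming a stream of numeric readings: each value is checked as it arrives and only a running summary is kept, so the full data set is never needed. The `RunningMean` class, the `flag_spikes` function, and the threshold factor are hypothetical and do not reflect the Apache Storm or Microsoft Trill APIs.

```python
class RunningMean:
    """Maintain a mean incrementally, without storing past values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

def flag_spikes(stream, factor=3.0):
    """Yield each value that exceeds `factor` times the mean of the values seen before it."""
    stats = RunningMean()
    for value in stream:
        if stats.count > 0 and value > factor * stats.mean:
            yield value
        stats.update(value)

alerts = list(flag_spikes([10, 12, 11, 90, 13]))
print(alerts)
```

Because `flag_spikes` is a generator, an alert is produced as soon as the offending value arrives, which matches the "requires quick actions" point above.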
Pillars of Big Data: Processing
Approximate Processing
◦ Quick retrieval of approximate results from a small sample
◦ E.g. Early Accurate Result Library (EARL), BlinkDB
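The idea of answering from a small sample can be sketched as follows. This is an illustrative assumption, not the EARL or BlinkDB interface: it estimates a mean from a uniform random sample and reports a standard error so the caller can judge the accuracy. The function name `approximate_mean` and the fixed seed are hypothetical.

```python
import random
import statistics

def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample of `sample_size` items."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean: a rough indicator of estimate quality.
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, stderr

population = list(range(1_000_000))
estimate, stderr = approximate_mean(population, sample_size=1_000)
print(f"estimated mean {estimate:.1f} with standard error {stderr:.1f}")
```

Here only 0.1% of the data is read; a larger sample shrinks the standard error at the cost of a slower answer.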
In-Database Processing
◦ In-database machine learning
◦ E.g. Microsoft SQL Server Analysis Services (SSAS)
Pillars of Big Data: Analytics
Orchestration
Orchestrate complex analytic jobs and workflows to achieve the user’s goals
Scheduling
. Resource Utilization
Resources : Memory, CPU, Network and Disk
Use resources effectively so that as few as possible
sit idle
. Hadoop 1.0 Shortcomings
. Apache Hadoop YARN
. Data Locality
Keep data and processing on the same node to avoid network
congestion
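A data-locality-aware scheduler can be sketched as a greedy assignment that prefers a node already holding the data and falls back to any free node otherwise. This is a simplified illustration under assumed inputs, not the YARN scheduler: `assign_tasks`, the block and node names, and the slot model are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per data block, preferring nodes that store a replica.

    block_locations: block id -> list of nodes holding a replica of that block.
    free_slots: node -> number of free task slots.
    Returns block id -> (chosen node, True if the task runs data-local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            # Remote fallback: pick the node with the most free slots.
            node = max(candidates, key=lambda n: slots[n])
        slots[node] -= 1
        assignment[block] = (node, node in replicas)
    return assignment

plan = assign_tasks(
    {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"]},
    {"n1": 2, "n2": 1, "n3": 1},
)
print(plan)
```

In this run the third block cannot be placed on its only replica node, so it runs remotely and its data must cross the network, which is the cost that locality-aware scheduling tries to minimize.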
Pillars of Big Data: Analytics
Orchestration
Provisioning
. Resource Provisioning
Resource allocation to jobs with minimal cost & execution time
. Resource Set (RS) Maximizer
. Conductor
. Data Provisioning
Pillars of Big Data: Analytics
Assistance
Narrows the analytics talent gap by amplifying the internal skill set with in-tool assistance
Static Assistance
. Tooltips
. Help Pages
. Wizards
Intelligent Assistance
. Data Preparation
Identifying irrelevant data/attributes and converting them into meaningful information
Pillars of Big Data: Analytics
Assistance
. Selecting Operations
. Expert Systems (ES)
. Meta-Learning Systems (MLSs)
. Ontology Reasoners (OR)
. Automatic workflow generation
Provide a workflow based on the input data and existing problem
. Fault Detection and Handling
With Big Data, a failure in the middle of a job is catastrophic
Pillars of Big Data: User Interfaces
The full power of analytics solutions is limited to relevant users.
Five approaches for user interfaces:
◦ Scripts
◦ SQL-based interfaces
◦ Graph-based interfaces
◦ Sheets
◦ Visualizations
Pillars of Big Data: User Interfaces
Scripts:
Analytics at programming level
Interfaces can be CLI or API
Low-level coding
Supports data mining
Mostly avoided by normal users
Such as: R for statisticians, MATLAB, and Weka
SQL-based Interfaces
Unified SQL interface – extended SQL
Use of UDFs (user-defined functions)
Further classification:
SQL-on-Hadoop
Machine learning SQL
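The use of UDFs to extend SQL can be illustrated with Python's built-in `sqlite3` module, which lets a Python function be called from a query. This is a small stand-in for the SQL-on-Hadoop and machine learning SQL systems named above, not their actual interfaces; the `zscore` function, the `readings` table, and the sample values are assumptions.

```python
import sqlite3

def zscore(value, mean, std):
    """User-defined function: standardize a value given a mean and standard deviation."""
    return (value - mean) / std if std else 0.0

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as zscore(value, mean, std).
conn.create_function("zscore", 3, zscore)

conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("c", 30.0)],
)

rows = conn.execute(
    "SELECT sensor, zscore(value, 20.0, 10.0) FROM readings ORDER BY sensor"
).fetchall()
print(rows)
conn.close()
```

The analyst keeps writing ordinary SQL while the analytics logic lives in the registered function, which is the appeal of the extended-SQL approach.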
Pillars of Big Data: User Interfaces
Graphs:
No need to code
Drag and drop
Panel (operations) and canvas (processing)
Such as: RapidMiner, IBM SPSS Modeler, WINGS, etc.
Sheets:
Most feasible for business organizations, as it deals with spreadsheets
Focused on data exploration in the easiest way
Compatible with moving data to other solutions
Such as: Power Query, Microsoft Tabular, Google OpenRefine, etc.
Pillars of Big Data: User Interfaces
Visualization
Reduces the high probability of analyzing a wrong or incompatible set of attributes
Suitable for large business firms
Lack of machine learning techniques
Such as: IBM Watson Analytics, SAS Visual Analytics, etc.
Pillars of Big Data: Deployment
Many components need to be integrated together
Deployment challenges include
◦ Complexity
◦ Challenging
◦ Scope beyond in-house IT technicians
Pillars of Big Data: Deployment
Product:
Use of product deployment models to ensure privacy and security
◦ Cost
◦ IT staff
◦ Limited scalability
Most components are open-source platforms, but integration is again the major issue
Pillars of Big Data: Deployment
Service:
Services provided on demand; solution cost is pay per user/data.
Security and privacy are issues, as is the cost of moving data to the provider’s cloud.
Hybrid cloud
Data storage and processing reside on the organization’s infrastructure
Future Directions:
Each solution brings some features not available in the others, but also
adds some limitations and overheads.
While there has been a continuous improvement in analytics solutions to
address different analytics scenarios, there are still some gaps.
Conclusions:
It is difficult to select a suitable analytics solution because a weak component in
the ecosystem can cause the whole ecosystem to function inefficiently.
For each of these pillars, different approaches are discussed and popular
systems are presented.
The pillars form a taxonomy that aims to give an overview of the field, to
guide organizations and researchers to build their Big Data Analytics
Ecosystem, and help to identify challenges and opportunities in the field.

The Six pillars for Building big data analytics ecosystems

  • 1. The Six Pillars for Building Big Data Analytics Ecosystems
  • 2. Big Data and Analytics What? ◦ Any voluminous amount of ◦ Structured ◦ Semi-structured ◦ Unstructured data Where? ◦ Large organizations Why? ◦ Cost reduction ◦ Faster, better decision making ◦ New products and services Big Data Analytics Ecosystems ◦ Data exploration ◦ Data preparation ◦ Modeling
  • 3. Pillars of Big Data: Overview
  • 4. Pillars of Big Data: Storage RDBMS ◦ Ensures ACID ◦ Performance and scalability DFS ◦ Client-server architecture ◦ Hides information from users, e.g. location ◦ Concurrency transparency ◦ Failure transparency ◦ Replication and scalability transparency ◦ E.g. GFS, HDFS, CFS NoSQL ◦ Sacrifices consistency for high availability and scalability ◦ Data stored as key/value pairs ◦ Supports three types ◦ E.g. MVCC, COD, DOD, Graph
  • 5. Pillars of Big Data: Processing Batch Processing ◦ Executes a series of jobs without manual intervention ◦ E.g. Hadoop ◦ Real-life example ◦ Credit card ◦ MapReduce ◦ Map ◦ Shuffle ◦ Reduce Interactive Processing ◦ Requires human interaction ◦ Real-life example ◦ Spreadsheets
  • 6. Pillars of Big Data: Processing Iterative Processing ◦ Machine learning operations ◦ Requires several passes for the algorithm to converge ◦ E.g. HaLoop, Main Memory MapReduce (M3R) ◦ Real-life example ◦ Evaluation of a mathematical expression Incremental Processing ◦ Analyzes data in motion ◦ Requires quick actions ◦ The full data set is not required by the algorithm ◦ E.g. Apache Storm, Microsoft Trill ◦ Real-life example ◦ Checking an incoming data stream for security
  • 7. Pillars of Big Data: Processing Approximate Processing ◦ Quick retrieval of approximate results from a small sample ◦ E.g. Early Accurate Result Library (EARL), BlinkDB In-Database Processing ◦ In-database machine learning ◦ E.g. Microsoft SQL Server Analysis Services (SSAS)
  • 8. Pillars of Big Data: Analytics Orchestration Orchestrate complex analytic jobs and workflows to achieve the user’s goals Scheduling . Resource Utilization: the resources are memory, CPU, network, and disk; the idea is to use them effectively so that none sit idle . Hadoop 1.0 shortcomings . Apache Hadoop YARN . Data Locality: keep data and processing on the same node to avoid network congestion
  • 9. Pillars of Big Data: Analytics Orchestration Provisioning . Resource Provisioning: allocate resources to jobs with minimal cost and execution time . Resource Set (RS) Maximizer . Conductor . Data Provisioning
  • 10. Pillars of Big Data: Analytics Assistance Narrows the analytics talent gap by augmenting the internal skill set with in-tool assistance Static Assistance . Tooltips . Help Pages . Wizards Intelligent Assistance . Data Preparation: identifying irrelevant data/attributes and converting data into meaningful information
  • 11. Pillars of Big Data: Analytics Assistance . Selecting Operations . Expert Systems (ES) . Meta-Learning Systems (MLSs) . Ontology Reasoners (OR) . Automatic Workflow Generation: provide a workflow based on the input data and the problem at hand . Fault Detection and Handling: with Big Data, a failure midway through a job is a catastrophe
  • 12. Pillars of Big Data: User Interfaces The full power of analytics solutions is limited to relevant users. Five approaches for user interfaces: ◦ Scripts ◦ SQL-based interfaces ◦ Graph-based interfaces ◦ Sheets ◦ Visualizations
  • 13. Pillars of Big Data: User Interfaces Scripts: ◦ Analytics at the programming level ◦ Interfaces can be CLI or API ◦ Low-level coding ◦ Supports data mining ◦ Mostly avoided by ordinary users ◦ Such as: R for statisticians, MATLAB, and Weka SQL-based Interfaces: ◦ Unified SQL interface (extended SQL) ◦ Use of UDFs (user-defined functions) ◦ Further classification: SQL-on-Hadoop, machine learning SQL
  • 14. Pillars of Big Data: User Interfaces Graphs: ◦ No need to code ◦ Drag and drop ◦ Panel (operations) and canvas (processing) ◦ Such as: RapidMiner, IBM SPSS Modeler, WINGS, etc. Sheets: ◦ Most feasible for business organizations, as they already work with spreadsheets ◦ Focused on data exploration in the easiest way ◦ Supports moving data to other solutions ◦ Such as: Power Query, Microsoft Tabular, Google OpenRefine, etc.
  • 15. Pillars of Big Data: User Interfaces Visualization: ◦ Reduces the high probability of analyzing the wrong or an incompatible set of attributes ◦ Suitable for large business firms ◦ Lacks machine learning techniques ◦ Such as: IBM Watson Analytics, SAS Visual Analytics, etc.
  • 16. Pillars of Big Data: Deployment Many components need to be integrated together. Deployment challenges include ◦ Complexity ◦ Challenging ◦ Scope beyond the in-house IT technicians
  • 17. Pillars of Big Data: Deployment Product: Use of product deployment models to ensure privacy and security ◦ Cost ◦ IT staff ◦ Limited scalability Most components are open-source platforms, but again integration is the major issue
  • 18. Pillars of Big Data: Deployment Service: Services are provided on demand; the solution is priced per user or per data volume. Security and privacy are issues, as is the cost of moving data to the provider’s cloud. Hybrid cloud: data storage and processing reside on the organization’s infrastructure
  • 19. Future Directions: Each solution brings some features not available in the others, but also adds some limitations and overheads. While there has been a continuous improvement in analytics solutions to address different analytics scenarios, there are still some gaps.
  • 20. Conclusions: It is difficult to select a suitable analytics solution because a weak component in the ecosystem can cause the whole ecosystem to function inefficiently. For each of these pillars, different approaches are discussed and popular systems are presented. The pillars form a taxonomy that aims to give an overview of the field, to guide organizations and researchers in building their Big Data Analytics Ecosystem, and to help identify challenges and opportunities in the field.
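
The Map, Shuffle, and Reduce phases listed on slide 5 can be illustrated with a small single-process word count. This is only a sketch of the idea, not Hadoop's API: the function names `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical, the input lines are made up, and everything runs in memory on one machine rather than across a cluster with intermediate data on disk.

```python
from collections import defaultdict


def mapper(record):
    """Map phase: emit one (key, value) tuple per word in a line of text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine the grouped values for one key."""
    return key, sum(values)


def run_job(records):
    """Run the three phases one after another, as a batch job would."""
    intermediate = [pair for record in records for pair in mapper(record)]
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical input: two short lines of text.
counts = run_job(["big data analytics", "big data storage"])
print(counts)
```

In a real cluster the mapper and reducer would run on many nodes and the shuffle would move data over the network; only the division of work into the three phases carries over from this sketch.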

Editor's Notes

  1. What: Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information quickly. Increasingly, organizations’ success has become dependent on how quickly and efficiently they can turn the petabytes of data they collect into actionable information. Data can be structured, which is generated by applications like Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems and typically stored in rows and columns with well-defined schemas. It can be semi-structured, which is generated by sensors, web feeds, event monitors, stock market feeds, and network and security systems. Where: With almost everything now online, organizations look at the Big Data collected to gain insights for improving their services. Why: Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data – plus they can identify more efficient ways of doing business. Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned. New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers’ needs. Data exploration: Analysts go through the data, using ad-hoc queries and visualizations, to better understand the data. Data preparation: Analysts clean, prepare, and transform the data for modeling, using batch processing to run computational and IO-intensive operations. Modeling: Data models are trained, using iterative processing, on the prepared data, and the trained models are used to score the unlabeled data.
  2. —Storage that handles the data’s huge volume, fast arrival, and multiple formats; —Processing that meets the Big Data Analytics processing needs; —Orchestration that manages available resources to reduce processing time and cost; —Assistance that goes beyond the interface and provides suggestions to help users with decisions when selecting operations and building their analytics process; —User Interface that provides users with a familiar environment to build and run their analytics; —Deployment Method that provides scalability, security, and reliability.
  3. ACID (Atomicity, Consistency, Isolation, and Durability). Recent RDBMS developments promise enhanced performance and scalability. Hadoop Distributed File System (HDFS). Cassandra File System (CFS). Voldemort and Riak use Multi-Version Concurrency Control (MVCC). Column-Oriented Database. Document-Oriented Database.
  4. MapReduce, as presented in Figure 3, consists of Map, Shuffle, and Reduce phases, which are executed sequentially, utilizing all nodes in the cluster. In the Map phase, the programmer-provided Map function (Mapper) processes the input data and outputs intermediate data in the form of <key, value> tuples, which get stored on disk. The Shuffle phase then groups values with the same key together and sends them to the reduce nodes over the network. Finally, the programmer-provided Reduce function (Reducer) reads the intermediate data from disk, processes it, and generates the final output. Batch processing