7 BIG DATA CHALLENGES
AND HOW TO OVERCOME THEM
PRESENTED BY: Qubole

It’s easy to get caught up in the hype and opportunity of big data. However, one reason big data is so underutilized is that big data and its supporting technologies also present many challenges. One survey found that 55% of big data projects are never completed.
So what’s the problem with big data?

7 CHALLENGES:
1. Hadoop is Hard
2. Scalability
3. Lack of Talent
4. Actionable Insights
5. Data Quality
6. Security
7. Cost Management

1. HADOOP IS HARD
While Hadoop and its surrounding ecosystem of tools are lauded for their ability to handle massive volumes of structured and unstructured data, the software isn’t easy to manage or use.
Hadoop frequently requires extensive internal resources to maintain, and many businesses end up devoting most of their resources to the technology rather than to the actual big data problem they are trying to solve.
73% of Hadoop users reported that understanding the big data platform was the most significant challenge of a big data project.

2. SCALABILITY
Many organizations fail to account for how quickly a big data project can grow and evolve.
Big data workloads also tend to be bursty, which makes it difficult to decide how much capacity to allocate.
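
Bursty workloads are one reason fixed-size clusters struggle. As an illustration only, the sketch below shows one common approach: size the cluster from its current backlog and clamp the result between a floor and a ceiling. `ScalingPolicy`, its fields, and `desired_nodes` are hypothetical names, not the API of Hadoop or of any vendor; real autoscalers typically add cooldown periods and smoothing on top of a rule like this.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical policy settings, for illustration only.
    min_nodes: int       # floor kept running between bursts
    max_nodes: int       # ceiling that caps cost during a spike
    tasks_per_node: int  # concurrent tasks a single worker node can run


def desired_nodes(pending_tasks: int, running_tasks: int, policy: ScalingPolicy) -> int:
    """Size the cluster to the current backlog, clamped to the policy's bounds."""
    load = pending_tasks + running_tasks
    needed = math.ceil(load / policy.tasks_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


if __name__ == "__main__":
    policy = ScalingPolicy(min_nodes=2, max_nodes=50, tasks_per_node=8)
    # A bursty load: quiet, a sudden spike, then quiet again.
    for pending, running in [(0, 4), (400, 100), (30, 10)]:
        nodes = desired_nodes(pending, running, policy)
        print(f"{pending + running} tasks -> {nodes} nodes")
```

The floor keeps a small cluster warm between bursts, while the ceiling caps spending during a spike: with the demo policy, the 500-task burst is held at 50 nodes even though 63 would be needed to run every task at once.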

3. LACK OF TALENT
Successfully implementing a big data project requires a sophisticated team of developers, data scientists and analysts who also have enough domain knowledge to identify valuable insights.
Many big data vendors seek to overcome this challenge by providing educational resources or by automating more of the platform management.

4. ACTIONABLE INSIGHTS
A key challenge for data science teams is to identify a clear business objective, along with the appropriate data sources to collect and analyze to meet that objective.
Once key patterns have been identified, businesses must be prepared to act and make the necessary changes to derive business value from them.

5. DATA QUALITY
Dirty data costs companies in the United States $600 billion every year.
Common causes of dirty data include:
1. User Input Errors
2. Duplicate Data
3. Incorrect Data Linking
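
Each of the three causes of dirty data listed above can be caught with simple checks before the data reaches analysts. The sketch below assumes hypothetical customer and order records held as Python dictionaries, with a normalized email address as the deduplication key; the field names and the deliberately loose email pattern are illustrative assumptions, not a recommended validation standard.

```python
import re

# Deliberately loose check: something@domain.tld with no spaces (illustrative only).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_customers(rows: list[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Normalize emails, reject malformed input, and keep only the first row per email."""
    seen: set[str] = set()
    clean: list[dict] = []
    rejected: list[tuple[dict, str]] = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not EMAIL_PATTERN.match(email):
            rejected.append((row, "invalid email"))  # user input error
            continue
        if email in seen:
            rejected.append((row, "duplicate"))      # same normalized key already kept
            continue
        seen.add(email)
        clean.append({**row, "email": email})
    return clean, rejected


def find_orphaned_orders(orders: list[dict], customers: list[dict]) -> list[dict]:
    """Return orders whose customer_id matches no known customer (an incorrect link)."""
    known_ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]


if __name__ == "__main__":
    customers = [
        {"id": 1, "email": " Ana@Example.com"},
        {"id": 2, "email": "ana@example.com"},  # duplicate of id 1 once normalized
        {"id": 3, "email": "not-an-email"},     # user input error
        {"id": 4, "email": "bo@example.org"},
    ]
    orders = [
        {"id": 10, "customer_id": 1},
        {"id": 11, "customer_id": 4},
        {"id": 12, "customer_id": 9},           # points at a customer that does not exist
    ]
    clean, rejected = clean_customers(customers)
    print("kept:", [c["id"] for c in clean])
    print("rejected:", [(r["id"], reason) for r, reason in rejected])
    print("orphaned orders:", [o["id"] for o in find_orphaned_orders(orders, clean)])
```

In a production pipeline, rejected rows would usually be quarantined for review rather than discarded, and any links that pointed at a dropped duplicate would be remapped to the surviving record.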

6. SECURITY
Specific challenges include:
1. Authenticating every team and team member accessing the data
2. Restricting access based on each user’s need
3. Recording data access histories and meeting other compliance requirements
4. Properly encrypting data in transit and at rest
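
Need-based access and access histories, two of the security challenges listed above, can be illustrated with a toy role-based access check that records every attempt, allowed or not. This is a minimal sketch that assumes users have already been authenticated and that grants are expressed as hypothetical (dataset, action) pairs; `AccessController` is an invented name. In practice these controls belong in the data platform’s own access-control and auditing services, and encryption is handled by the storage and network layers rather than by application code like this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessController:
    # Hypothetical structures: role -> permitted (dataset, action) pairs, user -> roles.
    grants: dict[str, set[tuple[str, str]]]
    user_roles: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def check(self, user: str, dataset: str, action: str) -> bool:
        """Allow a request only if one of the user's roles grants it; log every attempt.

        Assumes `user` has already been authenticated upstream.
        """
        roles = self.user_roles.get(user, set())
        allowed = any((dataset, action) in self.grants.get(role, set()) for role in roles)
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        })
        return allowed


if __name__ == "__main__":
    ac = AccessController(
        grants={"analyst": {("sales", "read")},
                "engineer": {("sales", "read"), ("sales", "write")}},
        user_roles={"alice": {"analyst"}, "bob": {"engineer"}},
    )
    print(ac.check("alice", "sales", "read"))   # True: analysts may read sales data
    print(ac.check("alice", "sales", "write"))  # False: no write grant for analysts
    print(ac.check("bob", "sales", "write"))    # True
    print(len(ac.audit_log), "attempts recorded")
```

Logging denied attempts as well as allowed ones is the design choice that matters for compliance: the audit trail shows who tried to reach what, not only who succeeded.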

7. COST MANAGEMENT
The challenge lies in accounting for all of the project’s costs.
Businesses pursuing on-premises projects must remember the cost of training, maintenance and expansion.
Cloud-based big data projects must carefully evaluate the service-level agreement with the provider to determine how usage will be billed and whether there will be any additional fees.
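
Accounting for all costs is easier with an explicit model. The sketch below totals one month of usage-based cloud charges, including extra line items (storage, data egress, support) that a provider’s agreement may add, and compares them with an amortized on-premises figure that includes training, maintenance and expansion. All rates, as well as the `CloudRates` structure and both function names, are hypothetical placeholders; substitute the figures from your provider’s actual agreement and your own hardware budget.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    # Hypothetical placeholder rates; real values come from the provider's price list and SLA.
    per_node_hour: float
    per_gb_month_storage: float
    per_gb_egress: float
    monthly_support_fee: float = 0.0


def monthly_cloud_cost(node_hours: float, stored_gb: float, egress_gb: float,
                       rates: CloudRates) -> dict[str, float]:
    """Break one month of usage-based cloud billing into line items plus a total."""
    items = {
        "compute": node_hours * rates.per_node_hour,
        "storage": stored_gb * rates.per_gb_month_storage,
        "egress": egress_gb * rates.per_gb_egress,  # easy to overlook: data transferred out
        "support": rates.monthly_support_fee,
    }
    items["total"] = sum(items.values())
    return items


def monthly_on_prem_cost(hardware_cost: float, amortization_months: int,
                         maintenance: float, training: float,
                         expansion_reserve: float) -> float:
    """Amortize hardware over its service life and add the recurring monthly costs."""
    return hardware_cost / amortization_months + maintenance + training + expansion_reserve


if __name__ == "__main__":
    rates = CloudRates(per_node_hour=0.5, per_gb_month_storage=0.02,
                       per_gb_egress=0.09, monthly_support_fee=100.0)
    cloud = monthly_cloud_cost(node_hours=2000, stored_gb=1000, egress_gb=500, rates=rates)
    on_prem = monthly_on_prem_cost(hardware_cost=360_000, amortization_months=36,
                                   maintenance=2_000, training=500, expansion_reserve=1_000)
    print(f"cloud: {cloud['total']:.2f} per month, on-prem: {on_prem:.2f} per month")
```

Under usage-based billing, bursty workloads mostly move the compute line, so a trace of actual node-hours gives a more realistic estimate than one sized for peak capacity.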

While the number of big data challenges can be overwhelming, they also present an opportunity. Businesses that identify the right infrastructure for their big data project and follow best practices for implementation will gain a significant competitive advantage.
Ready to learn how you can be successful with big data in the cloud? Download the big data in the cloud success sheet to learn implementation best practices and pitfalls to avoid.

Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideIEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
 

7 Big Data Challenges and How to Overcome Them

  • 1. 7 BIG DATA CHALLENGES AND HOW TO OVERCOME THEM
  • 3. It’s easy to get caught up in the hype and opportunity of big data.
  • 4. However, one reason big data is so underutilized is that big data and its supporting technologies also present many challenges.
  • 5. One survey found that 55% of big data projects are never completed. So what’s the problem with big data?
  • 6. 7 CHALLENGES: 1. Hadoop is Hard 2. Scalability 3. Lack of Talent 4. Actionable Insights 5. Data Quality 6. Security 7. Cost Management
  • 7. Challenge 1: HADOOP IS HARD. While Hadoop and the surrounding ecosystem of tools are lauded for their ability to handle massive volumes of structured and unstructured data, the software isn’t easy to manage or use.
  • 8. Hadoop frequently requires extensive internal resources to maintain, and many businesses are left devoting most of their resources to the technology rather than to the actual big data problem they are trying to solve.
  • 9. 73% of Hadoop users said that understanding the big data platform was the most significant challenge of a big data project.
  • 10. Challenge 2: SCALABILITY. Many organizations fail to account for how quickly a big data project can grow and evolve.
  • 11. Big data workloads also tend to be bursty, which makes it difficult to decide how much capacity to allocate.
  • 12. Challenge 3: LACK OF TALENT. Successfully implementing a big data project requires a sophisticated team of developers, data scientists and analysts who also have enough domain knowledge to identify valuable insights.
  • 13. Many big data vendors try to overcome this challenge by offering educational resources or by automating more of the platform management.
  • 14. Challenge 4: ACTIONABLE INSIGHTS. A key challenge for data science teams is to identify a clear business objective and the appropriate data sources to collect and analyze to meet that objective.
  • 15. Once key patterns have been identified, businesses must be prepared to act and make necessary changes in order to derive business value from them.
  • 16. Challenge 5: DATA QUALITY. Dirty data costs companies in the United States $600 billion every year.
  • 17. Common causes of dirty data include: 1. User Input Errors 2. Duplicate Data 3. Incorrect Data Linking
  • 18. Challenge 6: SECURITY. Specific challenges include: 1. Authenticating every team and team member who accesses the data 2. Restricting access based on each user’s need 3. Recording data access histories and meeting other compliance regulations 4. Properly encrypting data in transit and at rest.
  • 19. Challenge 7: COST MANAGEMENT. The challenge lies in accounting for all costs of the project.
  • 20. Businesses pursuing on-premises projects must remember the cost of training, maintenance and expansion.
  • 21. Teams running big data projects in the cloud must carefully evaluate the service-level agreement with the provider to determine how usage will be billed and whether there will be any additional fees.
  • 22. While the number of big data challenges can be overwhelming, they also present an opportunity. Businesses that identify the right infrastructure for their big data project and follow best practices for implementation will gain a significant competitive advantage.
  • 23. Ready to learn how you can be successful with big data in the cloud? Download the big data in the cloud success sheet to learn implementation best practices and hang-ups to avoid.
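Slide 11 says bursty workloads make capacity hard to allocate. The minimal Python sketch below makes that concrete under stated assumptions: the hourly job counts are invented, and `capacity_tradeoff` is a hypothetical helper, not part of any big data platform. With these numbers, sizing for the average hour leaves work unserved during bursts, while sizing for the busiest hour leaves most capacity idle.

```python
# Hypothetical hourly job counts for one day (invented for illustration).
hourly_jobs = [4, 3, 2, 2, 3, 5, 20, 60, 55, 12, 6, 5,
               4, 5, 6, 40, 70, 30, 8, 5, 4, 3, 3, 2]


def capacity_tradeoff(demand, capacity):
    """Return (idle, unmet) job slots summed over all hours for a fixed hourly capacity."""
    idle = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return idle, unmet


average = sum(hourly_jobs) / len(hourly_jobs)
peak = max(hourly_jobs)
print(f"average={average:.1f} peak={peak} peak/average={peak / average:.1f}")

# Compare sizing for the average hour with sizing for the busiest hour.
for capacity in (round(average), peak):
    idle, unmet = capacity_tradeoff(hourly_jobs, capacity)
    print(f"capacity={capacity}: idle={idle} unmet={unmet}")
```

Elastic or autoscaling capacity is one common response to this trade-off, since it lets capacity track demand instead of being fixed at either extreme.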
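Slide 17 names user input errors and duplicate data as common causes of dirty data. One common way to catch duplicates created by inconsistent input is to compare records on a normalized key. The Python sketch below assumes hypothetical customer records with `name`, `email`, and `phone` fields; its normalization rules are deliberately simple (for example, names keep ASCII letters only) and would need tuning for real data.

```python
import re
from collections import defaultdict


def normalize(record):
    """Build a comparison key that ignores case, extra spaces and punctuation in
    names, surrounding whitespace and case in emails, and phone formatting."""
    name = re.sub(r"[^a-z ]", "", record["name"].lower())  # ASCII letters and spaces only
    name = " ".join(name.split())                          # collapse repeated spaces
    email = record["email"].strip().lower()
    phone = re.sub(r"\D", "", record["phone"])             # keep digits only
    return (name, email, phone)


def find_duplicates(records):
    """Return lists of record indices that share the same normalized key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalize(record)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical records: the first two differ only by input formatting.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "ada  lovelace", "email": " ADA@example.com", "phone": "(555) 0100"},
    {"name": "Alan Turing", "email": "alan@example.com", "phone": "555-0199"},
]
print(find_duplicates(customers))
```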
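Slide 18 lists need-based access restrictions and access histories among the security challenges. The Python sketch below illustrates both with a role-based check that logs every attempt, allowed or denied. It assumes users are already authenticated upstream, and the roles, dataset names, and `AccessController` class are hypothetical; it does not address encryption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from each role to the datasets it needs.
ROLE_PERMISSIONS = {
    "analyst": {"sales_aggregates"},
    "data_engineer": {"sales_aggregates", "raw_events"},
}


@dataclass
class AccessController:
    user_roles: dict                           # authenticated user id -> role
    audit_log: list = field(default_factory=list)

    def can_read(self, user_id, dataset):
        """Allow access only if the user's role needs the dataset; log the attempt."""
        role = self.user_roles.get(user_id)
        allowed = role is not None and dataset in ROLE_PERMISSIONS.get(role, set())
        # Record every attempt so access histories are available for compliance review.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user_id,
            "dataset": dataset,
            "allowed": allowed,
        })
        return allowed


acl = AccessController(user_roles={"u1": "analyst", "u2": "data_engineer"})
print(acl.can_read("u1", "raw_events"), acl.can_read("u2", "raw_events"))
print(len(acl.audit_log), "access attempts recorded")
```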
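Slides 19 to 21 ask teams to account for every cost, whether on-premises or in the cloud. A back-of-the-envelope comparison like the Python sketch below can help structure that exercise. Every figure and parameter name is an invented placeholder; real cloud bills depend on the provider's pricing terms and service-level agreement.

```python
def on_prem_annual_cost(hardware, years_amortized, maintenance, training, expansion):
    """Annualized on-premises cost: hardware spread over its service life
    plus yearly maintenance, training and expected expansion spend."""
    return hardware / years_amortized + maintenance + training + expansion


def cloud_annual_cost(compute_hours, rate_per_hour, storage_tb_months,
                      rate_per_tb_month, extra_fees):
    """Annual cloud cost: metered compute and storage usage plus any
    additional fees in the provider's terms (egress, support and so on)."""
    return (compute_hours * rate_per_hour
            + storage_tb_months * rate_per_tb_month
            + extra_fees)


# Hypothetical inputs, for illustration only.
on_prem = on_prem_annual_cost(hardware=300_000, years_amortized=3,
                              maintenance=40_000, training=15_000, expansion=25_000)
cloud = cloud_annual_cost(compute_hours=20_000, rate_per_hour=4.0,
                          storage_tb_months=600, rate_per_tb_month=25.0,
                          extra_fees=12_000)
print(f"on-premises: ${on_prem:,.0f} per year")
print(f"cloud:       ${cloud:,.0f} per year")
```

The point of the sketch is the list of cost lines, not the totals: leaving out training, expansion, or extra fees is exactly the kind of omission the cost management slides warn about.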