Implementing a big data project is difficult. Hadoop is complex, and data governance is crucial. Learn common big data challenges and how to overcome them.
It’s easy to get caught up in the hype and opportunity of big data. However, one of the reasons big data is so underutilized is that big data and big data technologies also present many challenges. One survey found that 55% of big data projects are never completed. So what’s the problem with big data?
7 CHALLENGES:
1. Hadoop is Hard
2. Scalability
3. Lack of Talent
4. Actionable Insights
5. Data Quality
6. Security
7. Cost Management
1. HADOOP IS HARD
While Hadoop and its surrounding ecosystem of tools are lauded for their ability to handle massive volumes of structured and unstructured data, the software isn’t easy to manage or use. Hadoop frequently requires extensive internal resources to maintain, and many businesses are left devoting most of their resources to the technology rather than to the actual big data problem they are trying to solve. In one survey, 73% of Hadoop users said that understanding the big data platform was the most significant challenge of a big data project.
2. SCALABILITY
Many organizations fail to take into account how quickly a big data project can grow and evolve. Big data workloads also tend to be bursty, making it difficult to plan and allocate resource capacity.
3. LACK OF TALENT
Successfully implementing a big data project requires a sophisticated team of developers, data scientists, and analysts who also have enough domain knowledge to identify valuable insights. Many big data vendors seek to overcome this challenge by providing educational resources or greater automation of platform management.
4. ACTIONABLE INSIGHTS
A key challenge for data science teams is identifying a clear business objective and the appropriate data sources to collect and analyze to meet that objective. Once key patterns have been identified, businesses must be prepared to act and make the necessary changes to derive business value from them.
5. DATA QUALITY
Dirty data costs companies in the United States $600 billion every year. Common causes of dirty data include:
1. User input errors
2. Duplicate data
3. Incorrect data linking
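To make the three causes above concrete, here is a minimal cleaning pass in plain Python. The record fields and validation rules are hypothetical examples, not a prescribed schema — a real pipeline would use a dedicated data-quality tool:

```python
# Minimal data-cleaning sketch covering the three common causes of dirty
# data listed above. Fields ("customer_id", "email", ...) are hypothetical.

def clean(records):
    seen_ids = set()
    clean_rows, rejected = [], []
    for rec in records:
        # 1. User input errors: reject rows with a malformed email.
        if "@" not in rec.get("email", ""):
            rejected.append((rec, "input error"))
            continue
        # 2. Duplicate data: keep only the first row per customer id.
        if rec["customer_id"] in seen_ids:
            rejected.append((rec, "duplicate"))
            continue
        # 3. Incorrect data linking: the order must reference its own customer.
        if rec["order_customer_id"] != rec["customer_id"]:
            rejected.append((rec, "bad link"))
            continue
        seen_ids.add(rec["customer_id"])
        clean_rows.append(rec)
    return clean_rows, rejected

rows = [
    {"customer_id": 1, "order_customer_id": 1, "email": "a@example.com"},
    {"customer_id": 1, "order_customer_id": 1, "email": "a@example.com"},  # duplicate
    {"customer_id": 2, "order_customer_id": 9, "email": "b@example.com"},  # bad link
    {"customer_id": 3, "order_customer_id": 3, "email": "not-an-email"},   # input error
]
good, bad = clean(rows)
print(len(good), len(bad))  # 1 clean row, 3 rejected
```

Running the checks in this order means a row is rejected for the first problem found, which keeps the audit trail in `rejected` easy to read.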
6. SECURITY
Specific challenges include:
1. User authentication for every team and team member accessing the data
2. Restricting access based on a user’s need
3. Recording data access histories and meeting other compliance regulations
4. Proper use of encryption for data in transit and at rest
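The four requirements above can be sketched as a single access check: authenticate the user, restrict access to need, and log every attempt for compliance. This is an illustrative Python sketch with hypothetical users and datasets; a real deployment would use a proper identity provider plus encryption in transit and at rest rather than this toy password store:

```python
import hashlib
import time

# Hypothetical user store: user -> (password hash, datasets they may read).
USERS = {
    "alice": (hashlib.sha256(b"alice-secret").hexdigest(), {"sales"}),
    "bob":   (hashlib.sha256(b"bob-secret").hexdigest(),   {"sales", "hr"}),
}

ACCESS_LOG = []  # append-only access history for compliance audits

def read_dataset(user, password, dataset):
    # 1. Authenticate every user accessing the data.
    pw_hash, allowed = USERS.get(user, (None, set()))
    if pw_hash != hashlib.sha256(password.encode()).hexdigest():
        ACCESS_LOG.append((time.time(), user, dataset, "denied: auth"))
        return None
    # 2. Restrict access based on the user's need.
    if dataset not in allowed:
        ACCESS_LOG.append((time.time(), user, dataset, "denied: scope"))
        return None
    # 3. Record the successful access for compliance reporting.
    ACCESS_LOG.append((time.time(), user, dataset, "granted"))
    # 4. In production the payload would also be encrypted at rest and in transit.
    return f"contents of {dataset}"

print(read_dataset("alice", "alice-secret", "sales"))  # granted
print(read_dataset("alice", "alice-secret", "hr"))     # None: outside her scope
```

Note that denied attempts are logged as well as granted ones — compliance audits typically need to show who tried to reach data, not only who succeeded.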
7. COST MANAGEMENT
The challenge lies in accounting for all costs of the project. Big data in the cloud projects must carefully evaluate the service-level agreement with the provider to determine how usage will be billed and whether there will be any additional fees.
While the number of big data challenges can be overwhelming, they also present an opportunity. Businesses that identify the right infrastructure for their big data project and follow best practices for implementation will see a significant competitive advantage.
Ready to learn how you can be successful with big data in the cloud? Download the big data in the cloud success sheet to learn implementation best practices and hangups to avoid.