In this session you will learn:
Importance of Data
ESG Report on Data Analytics
What is BigData?
Structured vs. Unstructured Data
Challenges of BigData
Why Distributed Processing?
BigData & its Hype
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
Introduction to Big Data and Hadoop
1. Big Data and Hadoop Training
2. Page 2Classification: Restricted
Agenda
• Importance of Data
• ESG Report on Data Analytics
• What is BigData?
• Structured vs. Unstructured Data
• Challenges of BigData
• Why Distributed Processing?
• BigData & its Hype
3. Importance of Data
• “Data is the new oil,” said Andreas Weigend, social data guru and former chief
scientist at Amazon.com. “Oil needs to be refined before it can be useful.”
• To say that data analysis is important to businesses is an understatement. In fact, no
business can survive without analyzing its available data.
4. ESG Report on Data Analytics
• A majority of organizations view data analytics as a top-5 business and IT priority
• Reduced costs and process improvement are the top benefits of data analytics
platforms
• No leading data analytics platform has emerged yet. Nearly one-third of the
organizations surveyed are using a custom-developed solution
• Big data is driving changes in analytics tools, infrastructure, and processes
11. What is BigData?
• Lots of data (on the order of terabytes or petabytes)
• It is a term applied to data sets whose size is beyond the ability of
commonly used software tools to capture, manage, and process within a
tolerable elapsed time.
• Systems and enterprises generate huge amounts of data, from terabytes to even
petabytes.
14. Quiz Time
• For each of the given file formats, identify which category of data it belongs to:
• Word docs, PDFs, text files
• Email body
• XML files
• Data generated by ERPs, CRMs, etc.
17. Why Distributed Processing?
To read 1 TB of data (1 TB = 1,048,576 MB):
Time to process at 100 MB/s:
1 TB / (100 MB/s) ≈ 10,485 sec, or about 175 min.
Time to process at 5 × 100 MB/s in parallel:
1 TB / (5 × 100 MB/s) ≈ 2,097 sec, or about 35 min.
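The read-time arithmetic above can be checked with a short Python sketch. It rests on a few assumptions that the slide does not state outright: 1 TB is taken as 1024 × 1024 MB (which is what reproduces the 10485 sec figure), 100 MB is treated as a per-second read rate, and the factor of 5 is read as five units working in parallel with perfect scaling. The function name `read_time_seconds` is hypothetical and chosen only for illustration.

```python
MB_PER_TB = 1024 * 1024  # assumed binary convention; matches the slide's figures


def read_time_seconds(size_tb: float, rate_mb_per_s: float, readers: int = 1) -> float:
    """Return the seconds needed to read size_tb terabytes.

    Assumes the data is split evenly across `readers`, each reading at
    rate_mb_per_s, with no coordination overhead (an idealization).
    """
    if rate_mb_per_s <= 0 or readers < 1:
        raise ValueError("rate must be positive and readers must be >= 1")
    return size_tb * MB_PER_TB / (rate_mb_per_s * readers)


# Single reader versus five readers in parallel, as on the slide.
single = read_time_seconds(1, 100)
parallel = read_time_seconds(1, 100, readers=5)

print(f"1 reader : {single:.0f} sec ({single / 60:.0f} min)")
print(f"5 readers: {parallel:.0f} sec ({parallel / 60:.0f} min)")
```

In this idealized model the read time falls in direct proportion to the number of readers, which is the core argument for distributing the work. Real clusters add network and coordination costs, so the actual speedup is usually somewhat lower.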
18. BigData & its Hype
• Gartner: Hadoop will be in two-thirds of advanced analytics products by
2015
• Livemint.com: SMAC (social, mobile, analytics, cloud) is the new flavour of IT
companies. SMAC will allow the IT industry to offer more value to clients
• Offshore Insights: Growth of IT companies will be dictated by cloud, mobile,
analytics, big data and social media services, according to a survey of 410
global IT decision-makers by research firm Offshore Insights, released in
February