In this session you will learn:
1. Importance of Data
2. ESG Report on Data Analytics
3. What is Big Data?
4. Structured vs. Unstructured Data
5. Definition
6. Challenges of Big Data
7. Why Distributed Processing?
8. Big Data & Its Hype
9. Case Studies
2. Page 2Classification: Restricted
Importance of Data
• “Data is the new oil,” said Andreas Weigend, social data
guru and former chief scientist at Amazon.com.
• “Oil needs to be refined before it can be useful.”
• To say that data analysis is important to businesses
would be an understatement. In fact, no business can
survive without analyzing its available data.
ESG Report on Data Analytics
• The Hadoop market is forecast to grow at a compound
annual growth rate (CAGR) of 58%, surpassing $1 billion
by 2020.
• A majority of organizations view data analytics as a
top-5 business and IT priority.
• Reduced costs and process improvement are the top
benefits of data analytics platforms.
• No leading data analytics platform has emerged yet.
Nearly one-third of the organizations surveyed are
using a custom-developed solution.
• Big data is driving changes in analytics tools,
infrastructure, and processes.
What is Big Data?
• Huge data (measured in terabytes or petabytes)
• A term applied to data sets whose size is beyond
the ability of commonly used software tools to
capture, manage, and process within a tolerable
elapsed time
Quiz Time
For each of the following, identify which category of
data it belongs to:
A. Word docs, PDFs, text files
B. Email body
C. XML files
D. Data generated by ERPs, CRMs, etc.
Why Distributed Processing?
To read 1 TB of data from a single disk:
Disk transfer rate: 100 MB/sec
Read time: 1 TB / (100 MB/sec) ≈ 175 minutes
Why Distributed Processing?
To read 1 TB of data at 100 MB/sec per disk
(taking 1 TB = 1,048,576 MB):
One disk: 1 TB / (100 MB/sec) = 10485 sec, or about 175 min.
Five disks in parallel: 1 TB / (5 × 100 MB/sec) = 2097 sec,
or about 35 min.
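The arithmetic on this slide can be reproduced with a short Python sketch. It assumes binary units (1 TB = 1,048,576 MB), which is what yields the slide's figures, and it assumes the "5" in the second calculation means five disks reading in parallel with the data split evenly among them. The function name `read_time_seconds` is hypothetical and chosen only for illustration.

```python
# Sketch: estimated sequential read time when data is split evenly across disks.
# Assumption: binary units (1 TB = 1024 * 1024 MB), matching the slide's numbers.
MB_PER_TB = 1024 * 1024


def read_time_seconds(size_tb: float, rate_mb_per_sec: float, disks: int = 1) -> float:
    """Seconds needed to read size_tb when `disks` drives read in parallel."""
    if disks < 1:
        raise ValueError("disks must be at least 1")
    total_mb = size_tb * MB_PER_TB
    # Each disk reads an equal share, so the aggregate rate scales with disk count.
    return total_mb / (rate_mb_per_sec * disks)


if __name__ == "__main__":
    for disks in (1, 5):
        seconds = read_time_seconds(1, 100, disks)
        print(f"{disks} disk(s): {int(seconds)} sec, about {seconds / 60:.0f} min")
```

This is an idealized model: it ignores seek time, network transfer, and coordination overhead, so real distributed reads would be somewhat slower than the linear speed-up suggests.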
Big Data & Its Hype
• Gartner: Hadoop will be in two-thirds of advanced
analytics products by 2015.
• Livemint.com: SMAC (social, mobile, analytics, cloud)
is the new flavour of IT companies. SMAC will allow the
IT industry to offer more value to its clients.
• Offshore Insights: Growth of IT companies will be
dictated by cloud, mobile, analytics, big data and
social media services, according to a survey of 410
global IT decision-makers released in February by
research firm Offshore Insights.