BIG DATA
Delivered by
Seema Navaghare B.
Sheetal Mahagade D.
Guided by
Titare S.I.
 Introduction
 Characteristics
 Architecture
 Challenges
 Types of Big Data
 Applications
 Benefits
 Conclusion
 Definition
Big Data: data that is beyond the storage capacity and
the processing power of conventional systems.
 Big Data includes
• Social media Data
• Stock exchange Data
• Power grid Data
• Transport Data
• Search engine Data
Characteristics
 Volume
 Variety
 Velocity
 Variability
 Value
Architecture
[Figure: Big Data architecture. Labels: Data, Ingest, Staging, Processing, Data workflow management, Access, Insight, Hadoop framework, Physical H/w, Value of data, Data pipeline]
 Data Resource Identification
 Data Ingestion
 Data Staging
 Hadoop framework (Data processing)
 Data pipeline
 Data workflow management
 Physical H/w
Data ingestion
[Figure: data ingestion. Sources: Operational system 1, Operational system 2, Flat files. Modes: Event ingestion, Batch ingestion]
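The two ingestion modes named on this slide can be illustrated with a minimal sketch. It assumes batch ingestion means reading a whole flat file at once and event ingestion means accepting records one at a time as they occur. The names `batch_ingest`, `EventIngestor`, and the sample fields `id` and `amount` are hypothetical and not taken from the slides.

```python
import csv
import io
from collections import deque


def batch_ingest(flat_file_text):
    """Batch ingestion: parse an entire flat file in one pass."""
    reader = csv.DictReader(io.StringIO(flat_file_text))
    return [dict(row) for row in reader]


class EventIngestor:
    """Event ingestion: buffer records as they arrive, one at a time."""

    def __init__(self):
        self.buffer = deque()

    def on_event(self, record):
        # Called once per incoming event from an operational system.
        self.buffer.append(record)

    def drain(self):
        # Hand over everything buffered so far and empty the buffer.
        records = list(self.buffer)
        self.buffer.clear()
        return records


# Hypothetical flat file exported by an operational system.
flat_file = "id,amount\n1,10\n2,25\n"
batch_records = batch_ingest(flat_file)

ingestor = EventIngestor()
ingestor.on_event({"id": "3", "amount": "7"})
ingestor.on_event({"id": "4", "amount": "12"})
event_records = ingestor.drain()

# Both modes end up feeding the same downstream stage.
ingested = batch_records + event_records
print(len(ingested), "records ingested")
```

In a real deployment the in-memory buffer would typically be replaced by a message queue or log, but the split between "all at once" and "one at a time" stays the same.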
Staging Area
[Figure: staging area. Labels: Warehouse, Staging database, Metadata, Aggregate data, Raw data]
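As a rough illustration of the staging labels above, the sketch below keeps raw data unchanged and derives metadata and aggregate data from it. This is one possible reading of the diagram, not a description of a specific product. The record fields `region` and `sales` and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records as they arrive from ingestion.
raw_data = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 80},
    {"region": "north", "sales": 50},
]


def build_metadata(records):
    """Describe the raw data: how many rows and which columns it has."""
    columns = sorted({key for record in records for key in record})
    return {"row_count": len(records), "columns": columns}


def build_aggregates(records, key, value):
    """Sum the `value` field for each distinct `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


# The staging area holds the raw data plus what was derived from it.
staging_area = {
    "raw": raw_data,
    "metadata": build_metadata(raw_data),
    "aggregates": build_aggregates(raw_data, "region", "sales"),
}
print(staging_area["metadata"])
print(staging_area["aggregates"])
```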
[Figure: Hadoop data processing. Labels: HDFS, MapReduce algorithm, Big Data landing zone, Keyword research, Content classification, User segmentation]
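The MapReduce idea can be sketched in plain Python with the classic word count, which is also a simple form of keyword frequency analysis. This is a single-process illustration of the map, shuffle, and reduce phases, not the Hadoop API; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is valuable"])
print(counts)
```

On a cluster, the map and reduce functions run in parallel on different machines over blocks stored in a distributed file system such as HDFS; the framework performs the shuffle between them.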
[Flowchart: Receive data, then Verify data, then Valid? If yes: Transform and Load. If no: Report error.]
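The receive, verify, and transform/load flow above can be sketched as follows. The validation rules (a record needs an `id` and a numeric `amount`) are assumptions chosen only for illustration, as are the function names.

```python
def verify(record):
    """Return an error message, or None when the record is valid."""
    if "id" not in record:
        return "missing id"
    try:
        float(record.get("amount", ""))
    except (TypeError, ValueError):
        return "amount is not numeric"
    return None


def transform(record):
    """Normalize a valid record into the shape the target store expects."""
    return {"id": str(record["id"]), "amount": float(record["amount"])}


def run_pipeline(received):
    loaded, errors = [], []
    for record in received:
        problem = verify(record)
        if problem is None:
            # Valid: transform, then load.
            loaded.append(transform(record))
        else:
            # Invalid: report the error instead of loading.
            errors.append((record, problem))
    return loaded, errors


loaded, errors = run_pipeline([
    {"id": 1, "amount": "10.5"},
    {"amount": "3"},
    {"id": 2, "amount": "abc"},
])
print(loaded)
print([message for _, message in errors])
```

Keeping rejected records together with the reason for rejection makes it possible to correct and resubmit them later rather than silently losing data.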
Challenges
 Data analysis
 Data curation
 Search
 Storage
 Information privacy
 Sharing
Types of Big Data
 Structured
 Semi-structured
 Unstructured
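A small sketch may help distinguish the three types listed above. It assumes CSV as an example of structured data, JSON as semi-structured, and free text as unstructured; all sample values and field names are hypothetical.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same shape.
structured = "id,name,city\n1,Asha,Pune\n"
rows = [dict(row) for row in csv.DictReader(io.StringIO(structured))]

# Semi-structured: self-describing keys, but nesting and optional fields.
semi_structured = (
    '{"id": 2, "name": "Ravi", "tags": ["vip"], "address": {"city": "Mumbai"}}'
)
document = json.loads(semi_structured)

# Unstructured: no schema at all; structure must be extracted by analysis.
unstructured = "Ravi called to say the delivery to Mumbai arrived late."
tokens = unstructured.lower().rstrip(".").split()

print(rows[0]["city"])
print(document["address"]["city"])
print("mumbai" in tokens)
```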
Applications
 Healthcare
 Public sector
 Education
 Banking
 Industry
Benefits
 Fully understanding the potential of data-driven
marketing
 Improving customer engagement and increasing
customer loyalty
 Reevaluating risk portfolios quickly
 Personalizing the customer experience
 Adding value to online and offline customer
interactions
Conclusion
The availability of Big Data, low-cost commodity hardware,
and new information management and analytics software
has produced a unique moment in the history of data
analysis.
The convergence of these trends means that, for the first
time in history, we have the capabilities required to
analyze astonishing data sets quickly and cost-effectively.
Thank you.
