This document provides an overview of big data concepts including definitions of big data, characteristics of big data using the 5Vs model, common big data technologies like Hadoop and MapReduce, and use cases. It discusses how big data has evolved over time through increased data volumes and varieties. Key frameworks like HDFS and MapReduce that enable distributed storage and processing of large datasets are explained. Examples of big data applications in areas such as banking are also provided.
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
This presentation about Big Data will help you understand how Big Data evolved over the years, what is Big Data, applications of Big Data, a case study on Big Data, 3 important challenges of Big Data and how Hadoop solved those challenges. The case study talks about Google File System (GFS), where you’ll learn how Google solved its problem of storing increasing user data in early 2000. We’ll also look at the history of Hadoop, its ecosystem and a brief introduction to HDFS which is a distributed file system designed to store large volumes of data and MapReduce which allows parallel processing of data. In the end, we’ll run through some basic HDFS commands and see how to perform wordcount using MapReduce. Now, let us get started and understand Big Data in detail.
Below topics are explained in this Big Data presentation for beginners:
1. Evolution of Big Data
2. Why Big Data?
3. What is Big Data?
4. Challenges of Big Data
5. Hadoop as a solution
6. MapReduce algorithm
7. Demo on HDFS and MapReduce
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Arvo with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distribution datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
This presentation about Big Data will help you understand how Big Data evolved over the years, what is Big Data, applications of Big Data, a case study on Big Data, 3 important challenges of Big Data and how Hadoop solved those challenges. The case study talks about Google File System (GFS), where you’ll learn how Google solved its problem of storing increasing user data in early 2000. We’ll also look at the history of Hadoop, its ecosystem and a brief introduction to HDFS which is a distributed file system designed to store large volumes of data and MapReduce which allows parallel processing of data. In the end, we’ll run through some basic HDFS commands and see how to perform wordcount using MapReduce. Now, let us get started and understand Big Data in detail.
Below topics are explained in this Big Data presentation for beginners:
1. Evolution of Big Data
2. Why Big Data?
3. What is Big Data?
4. Challenges of Big Data
5. Hadoop as a solution
6. MapReduce algorithm
7. Demo on HDFS and MapReduce
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Arvo with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distribution datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
A Seminar Presentation on Big Data for Students.
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
Big Data may well be the Next Big Thing in the IT world. The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
Data Analytics for R Course: https://www.edureka.co/r-for-analytics
This Edureka Tutorial on Data Analytics for Beginners will help you learn the various parameters you need to consider while performing data analysis.
The following are the topics covered in this session:
Introduction To Data Analytics
Statistics
Data Cleaning and Manipulation
Data Visualization
Machine Learning
Roles, Responsibilities and Salary of Data Analyst
Need of R
Hands-On
Statistics for Data Science: https://youtu.be/oT87O0VQRi8
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
this slide is for brief introduction to the big data with little bit of fun through memes.
it is prepared with the articles from different websites about big data and some of my own words so it would be great if you like it
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
this presentation describes the company from where I did my summer training and what is bigdata why we use big data, big data challenges, the issue in big data, the solution of big data issues, hadoop, docker , Ansible etc.
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
This Edureka Big Data Analytics Tutorial will help you to understand the basics of Big Data domain. Learn how to analyze Big Data in this tutorial. Below are the topics covered in this tutorial:
1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases
Subscribe to our channel to get updates.
Check our complete Hadoop playlist here: https://goo.gl/4OyoTW
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
As part of this session, I will be giving an introduction to Data Engineering and Big Data. It covers up to date trends.
* Introduction to Data Engineering
* Role of Big Data in Data Engineering
* Key Skills related to Data Engineering
* Role of Big Data in Data Engineering
* Overview of Data Engineering Certifications
* Free Content and ITVersity Paid Resources
Don't worry if you miss the video - you can click on the below link to go through the video after the schedule.
https://youtu.be/dj565kgP1Ss
* Upcoming Live Session - Overview of Big Data Certifications (Spark Based) - https://www.meetup.com/itversityin/events/271739702/
Relevant Playlists:
* Apache Spark using Python for Certifications - https://www.youtube.com/playlist?list=PLf0swTFhTI8rMmW7GZv1-z4iu_-TAv3bi
* Free Data Engineering Bootcamp - https://www.youtube.com/playlist?list=PLf0swTFhTI8pBe2Vr2neQV7shh9Rus8rl
* Join our Meetup group - https://www.meetup.com/itversityin/
* Enroll for our labs - https://labs.itversity.com/plans
* Subscribe to our YouTube Channel for Videos - http://youtube.com/itversityin/?sub_confirmation=1
* Access Content via our GitHub - https://github.com/dgadiraju/itversity-books
* Lab and Content Support using Slack
A Seminar Presentation on Big Data for Students.
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
Big Data may well be the Next Big Thing in the IT world. The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
Data Analytics for R Course: https://www.edureka.co/r-for-analytics
This Edureka Tutorial on Data Analytics for Beginners will help you learn the various parameters you need to consider while performing data analysis.
The following are the topics covered in this session:
Introduction To Data Analytics
Statistics
Data Cleaning and Manipulation
Data Visualization
Machine Learning
Roles, Responsibilities and Salary of Data Analyst
Need of R
Hands-On
Statistics for Data Science: https://youtu.be/oT87O0VQRi8
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
this slide is for brief introduction to the big data with little bit of fun through memes.
it is prepared with the articles from different websites about big data and some of my own words so it would be great if you like it
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
this presentation describes the company from where I did my summer training and what is bigdata why we use big data, big data challenges, the issue in big data, the solution of big data issues, hadoop, docker , Ansible etc.
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
This Edureka Big Data Analytics Tutorial will help you to understand the basics of Big Data domain. Learn how to analyze Big Data in this tutorial. Below are the topics covered in this tutorial:
1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases
Subscribe to our channel to get updates.
Check our complete Hadoop playlist here: https://goo.gl/4OyoTW
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
As part of this session, I will be giving an introduction to Data Engineering and Big Data. It covers up to date trends.
* Introduction to Data Engineering
* Role of Big Data in Data Engineering
* Key Skills related to Data Engineering
* Role of Big Data in Data Engineering
* Overview of Data Engineering Certifications
* Free Content and ITVersity Paid Resources
Don't worry if you miss the video - you can click on the below link to go through the video after the schedule.
https://youtu.be/dj565kgP1Ss
* Upcoming Live Session - Overview of Big Data Certifications (Spark Based) - https://www.meetup.com/itversityin/events/271739702/
Relevant Playlists:
* Apache Spark using Python for Certifications - https://www.youtube.com/playlist?list=PLf0swTFhTI8rMmW7GZv1-z4iu_-TAv3bi
* Free Data Engineering Bootcamp - https://www.youtube.com/playlist?list=PLf0swTFhTI8pBe2Vr2neQV7shh9Rus8rl
* Join our Meetup group - https://www.meetup.com/itversityin/
* Enroll for our labs - https://labs.itversity.com/plans
* Subscribe to our YouTube Channel for Videos - http://youtube.com/itversityin/?sub_confirmation=1
* Access Content via our GitHub - https://github.com/dgadiraju/itversity-books
* Lab and Content Support using Slack
Open science can contribute to AI trustworthiness. This talk is a categorization of scientific data platforms, and a framing of AI trustworthiness with pointers to open science contributions.
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
Palestra, em inglês, "Publishing Data on the Web" sobre o documento Data on the Web Best Practices, apresentada na Semana de Metodologia NIC.br, em São Paulo, dia 12 de abril de 2016.
Learn Big data and Hadoop online at Easylearning Guru. We are offer Instructor led online training and Life Time LMS (Learning Management System). Join Our Free Live Demo Classes of Big Data Hadoop .
In 2012, the University of Idaho Library began implementing VIVO, an open-source Semantic Web application, both as a discovery layer for its fledgling institutional repository and as a database to describe, visualize, and report university research activity. The presenters will detail some of the challenges they encountered developing this resource, while discussing the tools and techniques they used for obtaining, editing, and uploading institutional data into the RDF-based VIVO system.
an introductory course for Librarians on using Big Data and Data Science applications on the field of Library Science. The course is a 2 hour course module for basic fundamentals of applying DS work.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks
Delivering Micro-Credentials in Technical and Vocational Education and TrainingAG2 Design
Explore how micro-credentials are transforming Technical and Vocational Education and Training (TVET) with this comprehensive slide deck. Discover what micro-credentials are, their importance in TVET, the advantages they offer, and the insights from industry experts. Additionally, learn about the top software applications available for creating and managing micro-credentials. This presentation also includes valuable resources and a discussion on the future of these specialised certifications.
For more detailed information on delivering micro-credentials in TVET, visit this https://tvettrainer.com/delivering-micro-credentials-in-tvet/
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
How to Build a Module in Odoo 17 Using the Scaffold MethodCeline George
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
3. Big Data Roadmap
Big Data – What?
◦ Timeline – Big Data Predictions
◦ Data Explosion
Big Data Myths
Big Data
5Vs of Big Data
Why Big Data
Data as Data Science
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
5. Timeline – Big Data
Predictions
1944- Yale Library in 2040 will have “approximately
200,000,000 Volumes
1961- Scientific Journals will grow exponentially rather than
linearly, doubling every fifteen years and increasing
by a factor of ten during every half-century.
1975- Ministry of Posts and Telecommunications in Japan
introduced words as unifying unit of measurement
1997- First article published by Michael Cox and David
Ellsworth in in the ACM digital library to the term
“Big data.”
Big Data evolved in 1997 and exploded to greater heights in
2010 and become popular in 2012
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
6. DATA EXPLOSION & ATTENTION
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
7. Big Data Explosion
12+ TBs
of tweet data
every day
25+ TBs
of
log data
every day
?TBsof
dataevery
day
2+
billion
people
on the
Web by
end 2011
30 billion RFID
tags today
(1.3B in 2005)
4.6
billion
camera
phones
world
wide
100s of
millions
of GPS
enabled
devices
sold
annually
76 million smart
meters in 2009…
200M by 2014
8. Data Growth – in Units
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
10. Desktop
Hobbyist
The Future?
Internet
Big Data
Byte : one grain of rice
Kilobyte : cup of rice
Megabyte : 8 bags of rice
Gigabyte : 3 Semi trucks with rice
Terabyte : 2 Container Ships
Petabyte : State full of rice bag
Exabytes : States filled with rice bag
Zettabyte : Fills the Pacific Ocean
Yottabyte : A EARTH SIZE RICE BALL!
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
11. BIG DATA FACTS
Every 2 days we create as much information
as we did from the beginning of time until
2003
Over 90% of all the data in the world was
created in the past 2 years.
It is expected that by 2020 the amount of
digital information in existence will have
grown from 3.2 zettabytes today to 40
zettabytes.
Every minute we send 204 million emails,
generate 1.8 million Facebook likes, send
278 thousand Tweets, and up-load 200,000Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
12. Big Data – Popularity
2010 – Social Networking – Post Internet
era – Facebook – Big Data
2012 – Election of US President – Hot
After 2012 – Industries – Hadoop
Big Data in India
Election process – BJP Govt
Current electioneering scenario
Election Campaign through social media
requires permission - Election
commission
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
13. BIG DATA GOT SO BIG ?
Big data is a commodity as compared
to Gold.
Big data is Hot!Now What?
Businesses Freak Out Over Big Data
– Information Week
Big Data Grows up – Forbes
2012 : The Year of Big
Big Data Powers Revolution in
Decision Making – Wall Street Journal
Business Opportunities in Big Data –
Inc
THE HINDU 2015
www.thehindu.com THE WORLD’S FAVOURITE NEWSPAPER - Since 1879
5
14. BIG DATA MYTHS
Big Data
• New
• Only About Massive Data Volume
• Means Hadoop
• Need A Data Warehouse
• Means Unstructured Data
• for Social Media & Sentiment
Analysis
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
15. What is Big Data?
Lets Us Clarify
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
16. Big Data
Big Data is
A complete subject with tools, techniques
and frameworks.
Technology which deals with large and
complex dataset which are varied in data
format and structures, does not fit into
the memory.
Not about huge volume of data; provide
an opportunity to find new insight into the
existing data and guidelines to capture
and analyze future data
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
17. Big Data : A Definition
Big data is the realization of greater
business intelligence by storing,
processing, and analyzing data that
was previously ignored due to the
limitations of traditional data
management technologies
:Source: Harness the Power of Big Data: The IBM Big Data Platform
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
18. BIG DATA as Platform
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
Source: IBM
20. 4 V‘s of Big Data
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
21. 6Vs of Big Data
Volume
Velocity
Variety
Veracity
Value
Validity
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
22. What is in Big Data ?
Why Big Data Analytics?
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
23. Big Data
Exploration
Find, visualize,
understand all big
data to improve
decision making
Enhanced 360o View
of the Customer
Extend existing customer
views (MDM, CRM, etc) by
incorporating additional
internal and external
information sources
Security/Intelligence
Extension
Lower risk, detect fraud
and monitor cyber security
in real-time
Data Warehouse Augmentation
Integrate big data and data warehouse
capabilities to increase operational
efficiency
Operations Analysis
Analyze a variety of machine
data for improved business results
The 5 Key Big Data Use Cases
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
27. Conventional approaches
RDBMS
OS FILE SYSTEM
SQL QUERIES
CUSTOM FRAMEWORK
* C / C++
* PERL
* PYTHON
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
28. ISSUES IN LEGACY SYSTEMS
Limited Storage Capacity
Limited Processing Capacity
No Scalability
Single point of Failure
Sequential Processing
RDBMSs can handle Structured Data
Requires preprocessing of Data
Information is collected according to
current business needs
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
25
29. Mr. HADOOP says he has a solution to
our BIG problem !
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
27
30. Mr. HADOOP says he has a solution
to our BIG problem !
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
27
31. What is
Apache Hadoop Is A Framework That Allows For The
Distributed Processing Of Large Datasets Across Clusters Of
Commodity Computers Using A Simple Programming Model.
Concept
Moving computation is more efficient than moving
large data
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
31
37. WHAT IS HDFS
Hadoop Distributed File System
Highly Fault tolerant , distributed , reliable , scalable
file system for data storage.
Stores multiple copies of data on different nodes
A File is split up into blocks and stored on multiple
machines
Hadoop cluster typically has a single namenode and no.
of data nodes to form a hadoop cluster.
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
38. HDFS BLOCKS
• Files are broken in to large blocks.
Typically 128 MB block size
Blocks are replicated for reliability
One replica on local node
Another replica on a remote rack
Third replica on local rack,
Additional replicas are randomly placed
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
41
39. HDFS BLOCKS contd.,
ADVANTAGES OF HDFS BLOCKS
Fixed Size
Chunk of file < block size : Only needed space
is used.
Eg : 420 MB file is split as
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
42
58. Potential Talent Pool -Big
Data
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
India will require a minimum of 1 lakh data scientists in the next couple
of years in addition to data analysts and data managers to support the
Big Data space.
63. SBI
State Bank of India (SBI) ran its newly
acquired data-mining software recently to
check for purity of data.
Made an interesting find - close to one crore
accountholders have not provided any
nomination for their savings accounts. What
is worse, over half of them are senior
citizens.
To analyse trends in Banks, SBI has hired a
whole team of statisticians and economists.
Identify default patterns, high value
customers.
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
64. Big Data Applications - India
Big Data – Elections
SBI uses big data mining to check
defaults
Karnataka Govt – Identify water
leakage
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-
65. Big Data Challenges
Privacy Protection
All Big data stages collect, store, process,
knowledge
Integration with enterprise landscape
All systems store data in rdbms,DW
Does not support bulk loading to Big data store
Limited number of analytics from Mahout
Big data technologies lack visualization support
and deliverable methods
Leveraging cloud computing for big data applications
Addressing Real time needs with varied format
and volume Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,-