BIG DATA: ISSUES, CHALLENGES, TOOLS AND GOOD PRACTICES
MOTIVATION
• Data stores are growing by 50% each year, and that rate of increase
is accelerating [8].
• The type of data is also changing: over 80% of it will be
unstructured data, which does not work well with relational
databases [8].
• The main difficulty is that data volume is increasing rapidly in
comparison to computing resources.
DEFINING BIG DATA
• Big data is a large amount of data that requires new
technologies and architectures to make it possible to
extract value from it through capture and analysis.
• It is an emerging technology that can bring huge benefits to
business organizations.
PROPERTIES OF BIG DATA
• Variety: The data being produced is not only traditional but also
semi-structured, and it comes from various sources.
• Volume: Data is expected to grow to the zettabyte scale in the near future.
• Velocity: The speed at which data arrives from various sources.
• Variability: Inconsistencies in the data flow.
• Complexity: It is difficult to link, match, cleanse, and transform
data coming from various sources across systems.
• Value: Queries can be run against the stored data to deduce
important results.
RELATED WORK
• Collaborative research on methodologies for big data analysis and
design [1]
• Databases required for big data [2]
• Architectural considerations for big data [3]
• Concept of big data with market solutions [4]
• Scientific Data Infrastructure (SDI) generic architectural model [5]
• How big data analytics is different from traditional analytics [6]
• Analysis of social media sites like Facebook, Flickr, and Google+ [7]
IMPORTANCE OF BIG DATA
• Log Storage in IT Industries
– IT industries store large amounts of data as logs to deal with
problems that occur rarely.
– Big data analytics is used on the data to pinpoint the points of
failure.
– Traditional systems are not able to handle these logs.
• Sensor Data
– The massive amount of sensor data is also a big challenge for big data.
• Risk Analysis
– It’s important for financial institutions to model data in order to
calculate risk.
– A lot of potentially useful data is underutilized because of its volume;
it should be integrated to determine risk patterns more accurately.
• Social Media
– The largest use of big data is for social media and customer sentiment.
– Keeping an eye on what customers are saying is like getting
feedback.
– The customer feedback can then be used to make decisions and add value
to the business.
BIG DATA CHALLENGES AND ISSUES
• Privacy and Security
– This is the most important issue with big data, with conceptual,
technical, and legal significance.
– Personal information about a person, when combined with external
large data sets, leads to the inference of new private facts about
that person.
– Big data used by law enforcement increases the chances that
certain tagged people suffer adverse consequences.
• Data Access and Sharing of Information
– If data is to be used to make accurate decisions in time, it must be
available in an accurate, complete, and timely manner.
• Storage and Processing Issues
– Many companies are struggling to store the large amount of data they
are producing.
• Outsourcing storage to the cloud may seem like an option, but long
upload times and constant updates to the data preclude it.
– Processing a large amount of data also takes a lot of time.
• Analytical Challenges
– What if data volume gets so large that we don’t know how to
deal with it?
– Does all data need to be stored?
– Does all data need to be analyzed?
– Which data points are really important?
– How can data be used to best advantage?
• Skill Requirement: Being a new and emerging technology, big data needs
to attract organizations and young people with diverse new skill sets.
• Technical Challenges
– Fault Tolerance
– Scalability
– Quality of Data
– Heterogeneous Data
Ravi 12
TOOLS AND TECHNIQUES AVAILABLE
• Hadoop is an open-source project hosted by the Apache Software
Foundation for managing big data.
• Hadoop consists of two main components:
– Hadoop Distributed File System (HDFS), a distributed file
system that stores the data on multiple separate servers
(each of which has its own processor(s))
– MapReduce, the framework that understands and assigns
work to the nodes in a cluster [9]
ADVANTAGES OF HADOOP
• Hadoop provides the following advantages [9]:
– Data read/write performance is increased by distributing the
data across the cluster, allowing each processor to do work in
parallel.
– It’s scalable: new nodes can be added as needed without making
changes to the existing system.
– It’s cost effective because it brings parallel computing to
commodity servers.
– It’s flexible: it can absorb any type of data, structured or not,
from any number of sources.
– It’s fault tolerant: it handles failures intrinsically by always
storing multiple copies of the data and automatically loading a
copy when a fault is detected.
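The fault-tolerance point above can be illustrated with a toy model. The sketch below is not the HDFS API: `ToyCluster`, its methods, the node names, and the placement rule are all hypothetical, chosen only to show how keeping several copies of a block lets a read succeed after some nodes fail.

```python
class ToyCluster:
    """Toy in-memory stand-in for a replicated block store (not the HDFS API)."""

    def __init__(self, node_names, replication=3):
        if replication > len(node_names):
            raise ValueError("replication factor exceeds number of nodes")
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.alive = set(node_names)
        self.replication = replication

    def write(self, block_id, data):
        """Store `replication` copies of the block on distinct nodes."""
        names = sorted(self.nodes)
        # Deterministic starting node; an assumed placement rule for illustration.
        start = sum(block_id.encode()) % len(names)
        targets = [names[(start + i) % len(names)] for i in range(self.replication)]
        for name in targets:
            self.nodes[name][block_id] = data
        return targets

    def fail(self, name):
        """Mark a node as failed; its copies become unreachable."""
        self.alive.discard(name)

    def read(self, block_id):
        """Return the block from any surviving node that holds a copy."""
        for name in sorted(self.alive):
            if block_id in self.nodes[name]:
                return self.nodes[name][block_id]
        raise IOError(f"all replicas of {block_id} are unavailable")


cluster = ToyCluster(["node-1", "node-2", "node-3", "node-4"], replication=3)
replicas = cluster.write("block-0", b"temperature readings")
cluster.fail(replicas[0])
cluster.fail(replicas[1])
recovered = cluster.read("block-0")  # served by the one surviving replica
```

With three copies, the read still succeeds after two of the holding nodes fail; it raises only once every replica is gone.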
HADOOP
• How do you use Hadoop?
– The developer writes a program that conforms to the MapReduce
programming model
– The developer specifies the format of the data to be processed in
their program
HADOOP
• How does MapReduce work?[10]
– Each Hadoop program performs two tasks:
• Map - Breaks all of the data down into key/value pairs
• Reduce - Takes the output from the map step as input and
combines those data key/value pairs into a smaller set of
key/value pairs
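As a rough illustration of the two tasks described above, the sketch below runs a map step and a reduce step in a single Python process. It is not Hadoop code: `run_map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names, and word counting is an assumed example job.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map: break each record down into key/value pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: combine the values for each key into a smaller set of pairs.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


lines = ["big data needs new tools", "big data brings big value"]
counts = run_map_reduce(lines, word_mapper, count_reducer)
```

In a real cluster the map and reduce calls run on different nodes and the grouping step moves data between them; here everything happens in one dictionary to keep the data flow visible.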
MAP REDUCE - EXAMPLE
• MapReduce example [10]: Assume you have five files, and each file
contains two columns that represent a city and the corresponding
temperature recorded in that city on the various measurement days
– Toronto, 20; New York, 22; Rome, 32; Toronto, 4; Rome, 33; New
York, 18
• We want to find the maximum temperature for each city across all of
the data files
• We create five map tasks, where each mapper works on one of the
five files; each mapper task goes through its data and returns the
maximum temperature for each city
– For the file above, this results in: (Toronto, 20) (New York, 22) (Rome, 33)
• Let’s assume the other four mapper tasks (working on the other four
files not shown here) produced the following intermediate results:
– (Toronto, 18) (New York, 32) (Rome, 37)
– (Toronto, 32) (New York, 33) (Rome, 38)
– (Toronto, 22) (New York, 20) (Rome, 31)
– (Toronto, 31) (New York, 19) (Rome, 30)
• All five of these output streams would be fed into the reduce tasks,
which combine the input results and output a single value for each
city, producing a final result set as follows:
– (Toronto, 32) (New York, 33) (Rome, 38)
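The temperature example can be reproduced in a few lines of Python using the numbers from the slides. This is a single-process sketch, not a Hadoop job; the function names `map_max_per_city` and `reduce_max` are hypothetical.

```python
def map_max_per_city(readings):
    """Mapper: return the maximum temperature per city within one file."""
    best = {}
    for city, temp in readings:
        if city not in best or temp > best[city]:
            best[city] = temp
    return list(best.items())


def reduce_max(mapper_outputs):
    """Reducer: combine per-file maxima into one maximum per city."""
    final = {}
    for output in mapper_outputs:
        for city, temp in output:
            final[city] = max(temp, final.get(city, temp))
    return final


# The one file shown in the example.
file_1 = [("Toronto", 20), ("New York", 22), ("Rome", 32),
          ("Toronto", 4), ("Rome", 33), ("New York", 18)]
first_output = map_max_per_city(file_1)

# Intermediate results of the other four mappers, as given in the example.
other_outputs = [
    [("Toronto", 18), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("New York", 19), ("Rome", 30)],
]

result = reduce_max([first_output] + other_outputs)
# result == {"Toronto": 32, "New York": 33, "Rome": 38}
```

Because taking a maximum is associative, each mapper can shrink its file to one value per city before anything is sent to the reducer, which is why the intermediate results are so small.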
BIG DATA – GOOD PRACTICES
• Creating dimensions of all the data being stored is good practice.
• All the dimensions should have durable surrogate keys that can’t be
changed and are unique.
• Expect to integrate structured and unstructured data.
• Generality of technology is needed. Building it around key/value pairs
works.
• As the value of big data becomes more apparent, privacy concerns grow.
• Data quality needs to be better.
• Limit on scalability of records.
• Business and IT leaders should work together to create more value
from data.
• Investment in data quality and metadata reduces processing time.
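The recommendation that dimensions carry durable, unique surrogate keys could be sketched as follows. This is one possible interpretation, not a prescribed design: `SurrogateKeyRegistry`, the `city_key` field, and the use of city names as natural keys are assumptions made for illustration.

```python
import itertools


class SurrogateKeyRegistry:
    """Assigns each natural key a unique integer that is never reassigned."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        """Return the existing surrogate key, or allocate a new one."""
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]


cities = SurrogateKeyRegistry()
toronto_key = cities.key_for("Toronto")
rome_key = cities.key_for("Rome")
toronto_again = cities.key_for("Toronto")  # same key as the first lookup

# Fact rows reference the dimension by surrogate key, not by the raw value.
facts = [{"city_key": cities.key_for(city), "temp": temp}
         for city, temp in [("Toronto", 20), ("Rome", 33)]]
```

Because fact rows store only the surrogate key, a later change to how a city is named in a source system does not require rewriting them.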
CONCLUSIONS
• New concept of big data, its importance and existing projects.
• Many challenges and issues exist that need to be brought up.
• Big data will help businesses grow.
• Hadoop tool
REFERENCES
• [1] Stephen Kaisler, Frank Armour, J. Alberto Espinosa, William
Money, “Big Data: Issues and Challenges Moving Forward”, IEEE, 46th
Hawaii International Conference on System Sciences, 2013.
• [2] Sam Madden, “From Databases to Big Data”, IEEE, Internet
Computing, May-June 2012.
• [3] Kapil Bakshi, “Considerations for Big Data: Architecture and
Approach”, IEEE, Aerospace Conference, 2012.
• [4] Sachchidanand Singh, Nirmala Singh, “Big Data Analytics”,
IEEE, International Conference on Communication, Information &
Computing Technology (ICCICT), Oct. 19-20, 2012.
• [5] Yuri Demchenko, Zhiming Zhao, Paola Grosso, Adianto Wibisono,
Cees de Laat, “Addressing Big Data Challenges for Scientific Data
Infrastructure”, IEEE, 4th International Conference on Cloud
Computing Technology and Science, 2012.
• [6] Martin Courtney, “The Larging-up of Big Data”, IEEE, Engineering
& Technology, September 2012.
• [7] Matthew Smith, Christian Szongott, Benjamin Henne, Gabriele von
Voigt, “Big Data Privacy Issues in Public Social Media”, IEEE, 6th
International Conference on Digital Ecosystems Technologies (DEST),
18-20 June 2012.
• [8] “Why Every Database Must Be Broken Soon”,
https://blogs.vmware.com/vfabric/2013/03/why-every-database-must-be-broken-soon.html
• [9] “What is Hadoop?”,
http://www-01.ibm.com/software/data/infosphere/hadoop/
• [10] “What is MapReduce?”,
http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce
THANK YOU.
25

More Related Content

What's hot

Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data TechnologiesDATAVERSITY
 
Big data
Big dataBig data
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
Rupen Momaya
 
Big Data
Big DataBig Data
Big Data
Neha Mehta
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Yash Raj
 
Big data tools
Big data toolsBig data tools
Big data tools
Novita Sari
 
The importance of data
The importance of dataThe importance of data
The importance of data
APNIC
 
Big data
Big dataBig data
Big data
Pooja Shah
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
itnewsafrica
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
Richard Vidgen
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
Vedanand Singh
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
KARTIKEY TRIPATHI
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
S P Sajjan
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular network
shubham patil
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
James Hendler
 
big data
big data big data
big data
subhakirthi
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
Poonam Kshirsagar
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
Alex Meadows
 

What's hot (20)

Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data Technologies
 
Big data
Big dataBig data
Big data
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
 
Big Data
Big DataBig Data
Big Data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data tools
Big data toolsBig data tools
Big data tools
 
The importance of data
The importance of dataThe importance of data
The importance of data
 
Big data
Big dataBig data
Big data
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Big data
Big dataBig data
Big data
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular network
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
big data
big data big data
big data
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
 

Viewers also liked

Reglamento participación ciudadana
Reglamento participación ciudadanaReglamento participación ciudadana
Reglamento participación ciudadana
Francisco Navarro Garcia
 
Mod convenio
Mod convenioMod convenio
Presupuesto
PresupuestoPresupuesto
Garagino doc
Garagino docGaragino doc
Garagino doc
Arduino Aficionado
 
Stenogr
StenogrStenogr
Stenogr
Nova Gromada
 
Reglamento de Participación Ciudadana
Reglamento de Participación CiudadanaReglamento de Participación Ciudadana
Reglamento de Participación Ciudadana
Francisco Navarro Garcia
 
Xm lquickref
Xm lquickrefXm lquickref
Xm lquickref
Arduino Aficionado
 
exprimer-la-certitude-et-le-doute
 exprimer-la-certitude-et-le-doute exprimer-la-certitude-et-le-doute
exprimer-la-certitude-et-le-doute
antorome
 
L’habitat intermédiaire
L’habitat   intermédiaire L’habitat   intermédiaire
L’habitat intermédiaire
Sami Sahli
 
Dia de la Mà Vermella - Nens Soldats
Dia de la Mà Vermella - Nens SoldatsDia de la Mà Vermella - Nens Soldats
Dia de la Mà Vermella - Nens SoldatsMans Unides ONG
 
Présentation de projet urbain
Présentation de projet urbainPrésentation de projet urbain
Présentation de projet urbain
Sami Sahli
 
Wharton study on_income_annuities (1)
Wharton study on_income_annuities (1)Wharton study on_income_annuities (1)
Wharton study on_income_annuities (1)Bryan Daly
 
Hamma les annasser. au 01
Hamma   les annasser. au 01Hamma   les annasser. au 01
Hamma les annasser. au 01
Sami Sahli
 
Tracking: Cookies vs. cookieless Tracking
Tracking: Cookies vs. cookieless TrackingTracking: Cookies vs. cookieless Tracking
Tracking: Cookies vs. cookieless Tracking
Jan Berens
 

Viewers also liked (18)

Reglamento participación ciudadana
Reglamento participación ciudadanaReglamento participación ciudadana
Reglamento participación ciudadana
 
Mod convenio
Mod convenioMod convenio
Mod convenio
 
Presupuesto
PresupuestoPresupuesto
Presupuesto
 
Garagino doc
Garagino docGaragino doc
Garagino doc
 
Stenogr
StenogrStenogr
Stenogr
 
Reglamento de Participación Ciudadana
Reglamento de Participación CiudadanaReglamento de Participación Ciudadana
Reglamento de Participación Ciudadana
 
Xm lquickref
Xm lquickrefXm lquickref
Xm lquickref
 
virtual resume
virtual resumevirtual resume
virtual resume
 
Convenio
ConvenioConvenio
Convenio
 
Puestos
PuestosPuestos
Puestos
 
exprimer-la-certitude-et-le-doute
 exprimer-la-certitude-et-le-doute exprimer-la-certitude-et-le-doute
exprimer-la-certitude-et-le-doute
 
L’habitat intermédiaire
L’habitat   intermédiaire L’habitat   intermédiaire
L’habitat intermédiaire
 
Dia de la Mà Vermella - Nens Soldats
Dia de la Mà Vermella - Nens SoldatsDia de la Mà Vermella - Nens Soldats
Dia de la Mà Vermella - Nens Soldats
 
Présentation de projet urbain
Présentation de projet urbainPrésentation de projet urbain
Présentation de projet urbain
 
Wharton study on_income_annuities (1)
Wharton study on_income_annuities (1)Wharton study on_income_annuities (1)
Wharton study on_income_annuities (1)
 
Hamma les annasser. au 01
Hamma   les annasser. au 01Hamma   les annasser. au 01
Hamma les annasser. au 01
 
Tracking: Cookies vs. cookieless Tracking
Tracking: Cookies vs. cookieless TrackingTracking: Cookies vs. cookieless Tracking
Tracking: Cookies vs. cookieless Tracking
 
Spuren im Internet
Spuren im Internet Spuren im Internet
Spuren im Internet
 

Similar to Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01

Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
Mithlesh Sadh
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
ANAND PRAKASH
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Roi Blanco
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
TanguturiAvinash
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
dickonsondorris
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
kalai75
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdf
ahmedibrahimghnnam01
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
Ridwan Fadjar
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
Mihai Criveti
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
Anant Corporation
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
Abhishek Roy
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
SpringPeople
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Vladi Vexler
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
Nagarjuna D.N
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Denodo
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
VIJAYAPRABAP
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
infinix8
 

Similar to Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01 (20)

Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdf
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 

More from Soujanya V

Decision tree
Decision treeDecision tree
Decision tree
Soujanya V
 
Asymptotic analysis
Asymptotic analysisAsymptotic analysis
Asymptotic analysis
Soujanya V
 
Implementing java server pages standard tag library v2
Implementing java server pages standard tag library v2Implementing java server pages standard tag library v2
Implementing java server pages standard tag library v2
Soujanya V
 
Filter
FilterFilter
Filter
Soujanya V
 
Load balancing
Load balancingLoad balancing
Load balancing
Soujanya V
 
Implementing jsp tag extensions
Implementing jsp tag extensionsImplementing jsp tag extensions
Implementing jsp tag extensions
Soujanya V
 
Filter
FilterFilter
Filter
Soujanya V
 

More from Soujanya V (7)

Decision tree
Decision treeDecision tree
Decision tree
 
Asymptotic analysis
Asymptotic analysisAsymptotic analysis
Asymptotic analysis
 
Implementing java server pages standard tag library v2
Implementing java server pages standard tag library v2Implementing java server pages standard tag library v2
Implementing java server pages standard tag library v2
 
Filter
FilterFilter
Filter
 
Load balancing
Load balancingLoad balancing
Load balancing
 
Implementing jsp tag extensions
Implementing jsp tag extensionsImplementing jsp tag extensions
Implementing jsp tag extensions
 
Filter
FilterFilter
Filter
 

Recently uploaded

CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 

Recently uploaded (20)

CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Hierarchical Digital Twin of a Naval Power System

  • 1. BIG DATA: ISSUES, CHALLENGES, TOOLS AND GOOD PRACTICES
  • 2. MOTIVATION
      • Data stores are growing by 50% each year, and that rate of increase is accelerating [8].
      • The type of data is also changing. Over 80% of it will be unstructured data, which does not work well with relational databases [8].
      • The main difficulty is that data volume is growing faster than the computing resources available to handle it.
  • 3. DEFINING BIG DATA
      • Big data is a large amount of data that requires new technologies and architectures to make it possible to extract value from it through capture and analysis.
      • It is an emerging technology that can bring large benefits to business organizations.
  • 4. PROPERTIES OF BIG DATA
      • Variety: The data being produced is not only traditional (structured) but also semi-structured, and it comes from various sources.
      • Volume: Data is expected to grow to zettabytes in the near future.
      • Velocity: The speed at which data arrives from various sources.
      • Variability: The inconsistencies in data flow.
      • Complexity: It is difficult to link, match, cleanse, and transform data across systems when it comes from various sources.
      • Value: Queries can be run against the stored data to deduce important results.
  • 5. PROPERTIES OF BIG DATA...
  • 6. RELATED WORK
      • Collaborative research on methodologies for big data analysis and design [1]
      • Databases required for big data [2]
      • Architectural considerations for big data [3]
      • Concept of big data with market solutions [4]
      • Scientific Data Infrastructure (SDI) generic architectural model [5]
      • How big data analytics is different from traditional analytics [6]
      • Analysis of social media sites like Facebook, Flickr, and Google+ [7]
  • 7. IMPORTANCE OF BIG DATA
      • Log Storage in IT Industries
          – IT industries store large amounts of data as logs to deal with problems that occur rarely.
          – Big data analytics is applied to these logs to pinpoint points of failure.
          – Traditional systems are not able to handle these logs.
      • Sensor Data
          – The massive amount of sensor data is also a major challenge for big data.
  • 8.
      • Risk Analysis
          – It is important for financial institutions to model data in order to calculate risk.
          – A lot of potentially useful data is underutilized because of its volume; it should be integrated to determine risk patterns more accurately.
      • Social Media
          – The largest use of big data is for social media and customer sentiment.
          – Keeping an eye on what customers are saying is like getting feedback.
          – That feedback can then be used to make decisions and add value to the business.
  • 9. BIG DATA CHALLENGES AND ISSUES
      • Privacy and Security
          – This is the most important issue with big data, and it has conceptual, technical, and legal significance.
          – Personal information about a person, when combined with large external data sets, allows new private facts about that person to be inferred.
          – Use of big data by law enforcement increases the chance that certain tagged people will suffer adverse consequences.
  • 10.
      • Data Access and Sharing of Information
          – For data to support accurate and timely decisions, it must be available in an accurate, complete, and timely manner.
      • Storage and Processing Issues
          – Many companies are struggling to store the large amount of data they produce.
              • Outsourcing storage to the cloud may seem like an option, but long upload times and constant updates to the data preclude it.
          – Processing a large amount of data also takes a lot of time.
  • 11.
      • Analytical Challenges
          – What if the data volume gets so large that we don't know how to deal with it?
          – Does all data need to be stored?
          – Does all data need to be analyzed?
          – Which data points are really important?
          – How can the data be used to best advantage?
      • Skill Requirement: As a new and emerging technology, big data needs to attract organizations and young people with diverse new skill sets.
  • 12.
      • Technical Challenges
          – Fault Tolerance
          – Scalability
          – Quality of Data
          – Heterogeneous Data
  • 13. TOOLS AND TECHNIQUES AVAILABLE
      • Hadoop is an open source project hosted by the Apache Software Foundation for managing big data.
      • Hadoop consists of two main components:
          – The Hadoop Distributed File System (HDFS), a distributed file system that stores the data on multiple separate servers (each with its own processor(s))
          – MapReduce, the framework that understands and assigns work to the nodes in a cluster [9]
  • 14. ADVANTAGES OF HADOOP
      • Hadoop provides the following advantages [9]:
          – Data read/write performance is increased by distributing the data across the cluster, allowing each processor to do work in parallel.
          – It's scalable: new nodes can be added as needed without making changes to the existing system.
          – It's cost effective because it brings parallel computing to commodity servers.
  • 15. ADVANTAGES OF HADOOP…
          – It's flexible: it can absorb any type of data, structured or not, from any number of sources.
          – It's fault tolerant: it handles failures intrinsically by always storing multiple copies of the data and automatically loading a copy when a fault is detected.
  • 16. HADOOP
      • How do you use Hadoop?
          – The developer writes a program that conforms to the MapReduce programming model.
          – The developer specifies the format of the data to be processed in their program.
  • 17. HADOOP
      • How does MapReduce work? [10]
          – Each Hadoop program performs two tasks:
              • Map: breaks all of the data down into key/value pairs.
              • Reduce: takes the output from the map step as input and combines those key/value pairs into a smaller set of key/value pairs.
  • 18. MAP REDUCE – EXAMPLE
      • MapReduce example [10]: Assume you have five files, and each file contains two columns that represent a city and the temperature recorded in that city on various measurement days.
          – One file contains: Toronto, 20; New York, 22; Rome, 32; Toronto, 4; Rome, 33; New York, 18
      • We want to find the maximum temperature for each city across all of the data files.
      • We create five map tasks, where each mapper works on one of the five files, goes through its data, and returns the maximum temperature for each city.
          – For the file above, this results in: (Toronto, 20) (New York, 22) (Rome, 33)
  • 19. MAP REDUCE – EXAMPLE…
      • Let's assume the other four mapper tasks (working on the other four files, not shown here) produced the following intermediate results:
          – (Toronto, 18) (New York, 32) (Rome, 37)
          – (Toronto, 32) (New York, 33) (Rome, 38)
          – (Toronto, 22) (New York, 20) (Rome, 31)
          – (Toronto, 31) (New York, 19) (Rome, 30)
      • All five of these output streams would be fed into the reduce tasks, which combine the input results and output a single value for each city, producing the following final result set:
          – (Toronto, 32) (New York, 33) (Rome, 38)
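The temperature example above can be sketched in plain Python. This is only an illustration of the map and reduce steps as the slides describe them, not Hadoop's actual API: the function names `map_max_per_city` and `reduce_max_per_city` are hypothetical, everything runs in a single process, and the outputs of the four mappers whose input files are not shown are taken directly from the intermediate results quoted in the example.

```python
from collections import defaultdict

# Records from the one file shown in the example, as (city, temperature) pairs.
file_1 = [
    ("Toronto", 20), ("New York", 22), ("Rome", 32),
    ("Toronto", 4), ("Rome", 33), ("New York", 18),
]

# Intermediate results quoted for the other four mappers
# (their input files are not shown in the example).
other_mapper_outputs = [
    [("Toronto", 18), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("New York", 19), ("Rome", 30)],
]


def map_max_per_city(records):
    """Map task (hypothetical name): emit one (city, local maximum) pair per city in one file."""
    local_max = {}
    for city, temp in records:
        if city not in local_max or temp > local_max[city]:
            local_max[city] = temp
    return list(local_max.items())


def reduce_max_per_city(mapper_outputs):
    """Reduce task (hypothetical name): group pairs by city, then keep the largest value per city."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for city, temp in output:
            grouped[city].append(temp)
    return {city: max(temps) for city, temps in grouped.items()}


first_mapper_output = map_max_per_city(file_1)
final_result = reduce_max_per_city([first_mapper_output] + other_mapper_outputs)

print(first_mapper_output)  # [('Toronto', 20), ('New York', 22), ('Rome', 33)]
print(final_result)         # {'Toronto': 32, 'New York': 33, 'Rome': 38}
```

In a real cluster the five map tasks would run in parallel on different nodes, and the framework, not the program, would handle grouping the pairs by key before the reduce step.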
  • 20. BIG DATA – GOOD PRACTICES
      • Creating dimensions for all the data being stored is good practice.
      • All dimensions should have durable surrogate keys that are unique and cannot be changed.
      • Expect to integrate structured and unstructured data.
      • General-purpose technology is needed. Building it around key/value pairs works.
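As a rough illustration of the surrogate-key practice above, the sketch below assigns each natural (source-system) key an integer surrogate key exactly once and never changes it afterwards. The class name `DimensionKeyRegistry` and the sample keys are hypothetical, and a real system would persist the mapping in a database rather than in memory.

```python
import itertools


class DimensionKeyRegistry:
    """Hypothetical in-memory registry of durable surrogate keys for one dimension."""

    def __init__(self):
        self._next_key = itertools.count(1)  # surrogate keys are plain integers
        self._by_natural_key = {}

    def surrogate_for(self, natural_key):
        """Return the surrogate key for a natural key, assigning one on first sight."""
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]


customers = DimensionKeyRegistry()
key_a = customers.surrogate_for("crm:cust-001")
key_b = customers.surrogate_for("web:user-77")
key_a_again = customers.surrogate_for("crm:cust-001")
print(key_a, key_b, key_a_again)
```

Because the surrogate key is independent of the source identifier, records from structured and unstructured sources can be tied to the same dimension row even if the source systems later rename or reformat their own keys.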
  • 21. BIG DATA – GOOD PRACTICES…
      • As the value of big data becomes more apparent, privacy concerns grow.
      • Data quality needs to be better.
      • Limit on scalability of records.
      • Business and IT leaders should work together to create more value from data.
      • Investment in data quality and metadata reduces processing time.
  • 22. CONCLUSIONS
      • New concept of big data, its importance, and existing projects.
      • Many challenges and issues exist that need to be addressed.
      • Big data will help businesses grow.
      • Hadoop tool
  • 23. REFERENCES
      • [1] Stephen Kaisler, Frank Armour, J. Alberto Espinosa, William Money, "Big Data: Issues and Challenges Moving Forward", IEEE, 46th Hawaii International Conference on System Sciences, 2013.
      • [2] Sam Madden, "From Databases to Big Data", IEEE, Internet Computing, May-June 2012.
      • [3] Kapil Bakshi, "Considerations for Big Data: Architecture and Approach", IEEE, Aerospace Conference, 2012.
      • [4] Sachchidanand Singh, Nirmala Singh, "Big Data Analytics", IEEE, International Conference on Communication, Information & Computing Technology (ICCICT), Oct. 19-20, 2012.
      • [5] Yuri Demchenko, Zhiming Zhao, Paola Grosso, Adianto Wibisono, Cees de Laat, "Addressing Big Data Challenges for Scientific Data Infrastructure", IEEE, 4th International Conference on Cloud Computing Technology and Science, 2012.
  • 24. REFERENCES...
      • [6] Martin Courtney, "The Larging-up of Big Data", IEEE, Engineering & Technology, September 2012.
      • [7] Matthew Smith, Christian Szongott, Benjamin Henne, Gabriele von Voigt, "Big Data Privacy Issues in Public Social Media", IEEE, 6th International Conference on Digital Ecosystems Technologies (DEST), 18-20 June 2012.
      • [8] "Why Every Database Must Be Broken Soon", https://blogs.vmware.com/vfabric/2013/03/why-every-database-must-be-broken-soon.html
      • [9] "What is Hadoop?", http://www-01.ibm.com/software/data/infosphere/hadoop/
      • [10] "What is MapReduce?", http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce