Big Data & Hadoop
This document introduces big data and the Hadoop framework. It defines big data as data whose sheer size puts it beyond ordinary storage and processing capabilities. Most data today is unstructured, coming from sources like social media, sensors, and online activity. Hadoop was developed to address the challenges of storing and processing this large, unstructured data across clusters of commodity servers. It uses HDFS for distributed storage and MapReduce as a processing technique that parallelizes jobs across nodes for faster completion. Hadoop grew out of Google's research and allows petabytes of data to be stored and processed efficiently.
Big Data & Hadoop Introduction
1. Big Data & Hadoop
By: Mohit Shukla
Email: mohit.shukla@walkwel.in
Software Engineer
2. Big Data
❏ In today’s modern world we are surrounded by data.
❏ Whatever we do, we are storing data and processing data.
❏ So what is big data?
❏ Data that is beyond our storage capacity and beyond our processing power is called big data.
❏ This data is generated from different sources such as sensors, CCTV cameras, social networks like Facebook and Twitter, online shopping, airlines, etc.
❏ From all these sources we are accumulating huge amounts of data.
3. Big Data
Of all the data in existence (100%), roughly 90% was generated in the last two years, and only about 10% was generated in all the time before that.
4. Big Data
How systems have changed over the years:

                    1990             2017
HDD capacity:       1 GB - 20 GB     min. 1 TB
RAM:                64 - 128 MB      4 - 16 GB
Read speed:         10 kbps          100 Mbps

Note: from 1990 to 2017 our HDD capacity increased by roughly 100 times, and the RAM and read speed of systems grew by a similar factor.
5. Big Data
Challenges
To understand the challenges, let's take an example: a farmer and his field.
Suppose in the 1st year the farmer produces 10 packets of rice, in the 2nd year 20 packets, and in the 3rd year 1,000 packets, while his storage room can hold only 800 packets.
6. Big Data
Challenges
● The problem is that the farmer has limited storage: if he produces between 10 and 800 rice packets, he can keep them in his storage room.
● But what if he produces 1,000 rice packets?
● Well, he can’t store them in that room.
● It is similar with data: if we have enough storage capacity we can store it, but if we don’t, then what?
● Then we have to store that data somewhere else.
● So in the farmer’s case, if he produces 1,000 rice packets he has no room to store them, and he has to take them to a godown or warehouse for storage.
8. Big Data
Challenges
The same thing happens with data storage: if we don’t have the storage capacity for huge data, we have to turn to data centers to store it.
Data Centers
What are these data centers? They are facilities full of servers that store your data; the servers may be IBM servers, EMC servers, or any other. We can store our data there, and whenever we want to process it, we can fetch it from the data center to our local system and process it locally.
9. Big Data
Challenges
Why are we storing that data?
● Suppose I have the movie “Titanic”; if I never want to watch that film, why should I store it on my system?
● Because maybe some other day I will watch it.
● Likewise, if we never want to process the data, why should we store it?
● We store it because some other day we may want to process it.
10. Big Data
Challenges
How can we process that data?
By writing some code in Java, MySQL, C#, or any other programming language.
Example: a data center has 1,000 TB of storage, and I have saved 100 TB of my data there. I want to process 2 TB out of that 100 TB, and to do so I write about 100 KB of code.
So which is the better way: sending this 100 KB program file to the data center and processing the data there, or pulling the 2 TB of data down to our local system and processing it here?
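To see why this question matters, here is a rough back-of-the-envelope calculation (our own, not from the slides), assuming data moves at the 100 Mbps read speed quoted for the 2017 system above:

Moving the data: \( t_{\text{data}} = \frac{2\,\mathrm{TB} \times 8\,\mathrm{bits/byte}}{100\,\mathrm{Mbps}} = \frac{1.6 \times 10^{13}\,\mathrm{bits}}{10^{8}\,\mathrm{bits/s}} \approx 1.6 \times 10^{5}\,\mathrm{s} \approx 44\,\mathrm{hours} \)

Moving the code: \( t_{\text{code}} = \frac{100\,\mathrm{KB} \times 8\,\mathrm{bits/byte}}{10^{8}\,\mathrm{bits/s}} \approx 0.008\,\mathrm{s} \)

Even before counting network overhead, shipping the program is roughly seven orders of magnitude cheaper than shipping the data.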
11. Big Data
Before Hadoop
Obviously, sending the 100 KB file to the data center is better.
● But even though it is very easy to send a 100 KB file to the data center, we could not send our program to the data center.
● Why is that?
● Before Hadoop, the basic constraint was that computation is processor-bound.
What does “computation is processor-bound” mean?
● A computation is a program that you want to run on your data.
● So what exactly does that imply?
● You have to fetch the data to the system where the program is written and process it on that system. That was the only technique we had before Hadoop, which means we could not send our program to the data center to process the data there.
● All we could do was fetch the 2 TB of data to our local system and then process it with our computation program.
12. Big Data
Data Forms
The three Vs (Volume, Velocity, and Variety)
1. Volume: the size of data (GB, TB, petabytes) is rapidly increasing.
2. Velocity: this huge data arrives at high speed, which creates its own problems.
3. Variety: the data comes in different forms.
Note: Today we have RDBMSs for storing relational/structured data, but with this technique we can only store and process structured data.
We can divide data into three forms:
● Structured data
● Unstructured data
● Semi-structured data
Of all the data we receive, roughly 70-80% is unstructured or semi-structured; the remaining 20-30% is structured.
13. Big Data
Data Forms
Question: How are we generating this unstructured and semi-structured data? Is there an example?
Answer: Yes.
Unstructured data: Facebook videos, images, text messages, audio.
Semi-structured data: log files. Suppose I have two or three mail accounts (Gmail, Yahoo); most people in the world have such accounts. Every account has its log files, stored on the Gmail/Yahoo servers.
4 Gmail accounts × 5 opens a day = 20 log files generated (1 user)
2 Yahoo accounts × 5 opens a day = 10 log files (1 user)
If I also have other accounts like Facebook, Google+, and Instagram, then suppose I generate 70 log files in a day.
Now think of all the other people in the world. These log files contain a lot of data.
14. Big Data
Definition
What is big data, in other terms?
Question: Suppose we have a system with 500 GB capacity. If I want to store 300 GB, is that possible?
Answer: Yes.
Question: If I want to store 400 GB, is that possible?
Answer: Yes.
Question: If I want to store 600 GB, is that possible?
Answer: No.
If we cannot store it, we cannot process it. So: “Data which is beyond the storage capacity and beyond the processing power is called big data.”
15. Big Data
Hadoop
● So, simply put: suppose we have 500 GB of storage capacity, we store 300 GB of data, and we want to process it.
● The question is, how much time does that take?
● It will definitely take some time.
● This is where Hadoop comes into the picture.
Suppose I want to build my house.
One worker takes 1 year to build it.
16. Big Data
Hadoop
Three workers take only 1-2 months to build it.
So the point is: if we split the job among multiple people, we can complete the work in less time.
17. Big Data
Hadoop
● As data is rapidly increasing, suppose we receive 1 TB of data.
● Our storage capacity is now 2 TB.
● We can simply store it, but what if we want to process 700 GB of that data?
● It takes much more time, because we are using a single system.
● So what do we do?
● Instead of giving the whole 700 GB of data to one system,
● we simply divide the data and distribute it across different systems.
● Each system then works in parallel and produces its output in less time.
● Sounds good!
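As a minimal illustration of this divide-and-combine idea (a single-machine sketch, not Hadoop itself), the Java program below uses threads as stand-ins for the separate systems: the data is split into slices, each slice is processed in parallel, and the partial results are combined at the end. All class and variable names here are our own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DivideAndProcess {
    public static void main(String[] args) throws Exception {
        // Pretend this array is the "700 GB" of data to process.
        long[] data = new long[8_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 100;

        int systems = 4;  // stand-ins for separate machines
        ExecutorService pool = Executors.newFixedThreadPool(systems);
        List<Future<Long>> partials = new ArrayList<>();

        int chunk = data.length / systems;
        for (int s = 0; s < systems; s++) {
            final int from = s * chunk;
            final int to = (s == systems - 1) ? data.length : from + chunk;
            // Each "system" works only on its own slice, in parallel.
            Callable<Long> task = () -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            };
            partials.add(pool.submit(task));
        }

        long total = 0;
        for (Future<Long> f : partials) total += f.get();  // combine partial results
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```

Hadoop applies the same pattern across machines rather than threads: HDFS splits the data into blocks stored on different nodes, and MapReduce runs a task on each block in parallel.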
18. Big Data
Hadoop (Dividing work)
We can understand this using an example.
Suppose I am in the office and my attendant brings me 100 files. I can process 50 files in a day, so 50 files are left pending. The next day 100 more files arrive, so now 150 files are pending while I can still only process 50 files a day.
19. Big Data
Hadoop (Dividing work)
So instead of allocating the work to a single person, we should allocate it to two or three people.
What are we trying to achieve here?
We are trying to achieve processing power: if we have huge data, we need methods to process that data in a feasible amount of time.
Hadoop: Hadoop knows very well how to store huge data and how to process that huge data in less time.
20. Hadoop
History of Hadoop
● We have all heard of Google (the search engine). What it does is store data, and when we search, it gives us the top results.
● In the 1990s, Google started working on how to store huge data and how to process it.
● It took Google about 13 years, and in 2003 it came out with an idea.
● It published two things: one is GFS (Google File System) and the other is MapReduce.
● GFS is basically used for storing huge amounts of data.
● MapReduce is a technique for processing the data stored in GFS.
● But these two technologies existed only as white papers; Google did not release an implementation.
If not Google, then who?
● Yahoo!, at the time the largest web search engine after Google.
● Its engineers were also working on how to store and process that huge data.
● They took Google’s white papers, started implementing them, and delivered:
● 2006-2007: HDFS (Hadoop Distributed File System)
● 2007-2008: MapReduce (the processing technique)
● These are the two core concepts in Hadoop (HDFS and MapReduce).
21. Hadoop
Definition
Who is the inventor of Hadoop? Doug Cutting.
“Hadoop is an open-source framework, overseen by the Apache Software Foundation, for storing huge data sets and processing huge data sets with a cluster of commodity hardware.”
HDFS: a technique for storing our huge data with the help of commodity hardware.
MapReduce: a technique for processing the data that is stored in HDFS.
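To make HDFS and MapReduce concrete, below is the classic word-count example in the style of the Apache Hadoop MapReduce tutorial (a sketch added here, not part of the original slides): the mapper emits a (word, 1) pair for every word it sees, and the reducer sums those counts per word. Input and output paths are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: for every line of input, emit (word, 1) for each word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce step: sum all the 1s emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A typical run (paths hypothetical): copy the input into HDFS with `hdfs dfs -put input.txt /input`, package the class into a jar, and launch it with `hadoop jar wordcount.jar WordCount /input /output`. HDFS stores the input in blocks across the cluster, and MapReduce runs one map task per block in parallel, exactly as described above.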