SlideShare a Scribd company logo
1 of 6
Download to read offline
Sysfore Technologies
#117-120, First Floor, 4th Block, 80 Feet Road, Koramangala, Bangalore 560034
HADOOP AND BIG DATA
ANALYTICS
Hadoop and Big Data Analytics
Gartner defines Big Data as “high volume, velocity and variety information
assets that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making”.
Big data is data that, by virtue of its velocity, volume, or variety (the three Vs),
cannot be easily stored or analysed with traditional methods.
The term covers each and every piece of data your organization has stored till
now. It includes all the data stored both on-premises or in the cloud. It could
be papers, digital, structured and non-structured data within your company.
There is a deluge of structured and unstructured data that is generated every
second. This is known as Big Data, which can be analysed to help customers
turn that data into insights. AWS provides a broad platform of managed
services, infrastructure and tools to tackle your next Big Data project. It
enables you to collect, store, process, analyse and visualize Big Data on the
cloud. It provides all the hardware, software, infrastructure to maintain and
scale, so that you focus on building your application.
Some of the common Big Data Customer scenarios include Web & E-Tailing,
Telecommunications, Government, Healthcare & Life Science, Bank & Financial
Services and Retail, where Big Data is continuously generated.
How is Big Data consumed by Businesses: Businesses can gain a lot of insight
into how their product is being consumed, by analysing the Big Data
generated. Big Data analytics is an area of rapidly growing diversity. Analysing
large data sets requires significant compute capacity that can vary in size based
on the amount of input data and the analysis required. This characteristic of
big data workloads is ideally suited to the pay-as-you-go cloud computing
model, where applications can easily scale up and down based on demand.
Using Big Data analytics will give you a clear picture about how your data is
being generated and consumed by the customers. It can be used for predictive
marketing and plans to increase your business. It provides:
 Early key indicators, gives insights into business trends resulting in
business fortunes.
 Analytics results in business advantage.
 Get more precise analysis and results with more data.
Limitations of using the traditional analytics methods: The advancements in
technologies has resulted in huge volume of data being generated every
second. Storing, processing, analysing and getting quality results is time
consuming, costly and ineffective in the current scenario.
 Only a limited amount of high fidelity raw data is available for analysis.
 Storage is limited by the high volume of raw data that is continuously
generated.
 Moving data for computation doesn’t scale accordingly.
 Data is archived regularly to conserve space. This limits the amount of
data that is available for the analytical tools.
 The perception that traditional data warehousing processes are too slow
and limited in scalability.
 The ability to converge data from multiple data sources, both structured
and unstructured.
 The realization that time to information is critical to extract value from
data sources that include mobile devices, RFID, the web and a growing
list of automated sensory technologies.
As requirements change you can easily resize your environment (horizontally
or vertically) on AWS to meet your Amazon Web Services.
In addition, there are at least four major developmental segments that
underline the diversity to be found within Big Data analytics. These segments
are MapReduce, scalable database, real-time stream processing and Big Data
appliance.
Using Hadoop for Big Data Analytics: There is a big difference between Big
Data and Hadoop. The former is an asset, often a complex and ambiguous one,
while the latter is a program that accomplishes a set of goals and objectives for
dealing with that asset.
Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage
for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs.
Hadoop is a framework, which allows processing of large data sets. It
completes the tasks in minutes, while the same done using the RDMS would
take hours.
Hadoop has 2 main components:
 HDFS – Hadoop Distributed File System (for Storage)
 MapReduce (for Processing)
Hadoop Distributed File System works: The Hadoop Distributed File System
(HDFS) is the primary storage system used by Hadoop applications. It consists
of HDFS clusters, which each contain one or more data nodes. Incoming data is
split into segments and distributed across data nodes to support parallel
processing. Each segment is then replicated on multiple data nodes to enable
processing to continue in the event of a node failure.
While HDFS protects against some types of failure, it is not entirely fault
tolerant. A single NameNode located on a single server is required. If this
server fails, the entire file system shuts down. A secondary NameNode
periodically backs up the primary. The backup data is used to restart the
primary but cannot be used to maintain operation.
HDFS is typically used in a Hadoop installation, yet other distributed file
systems are also supported. The Amazon S3 file system can be used but does
not maintain information on the location of data segments, reducing the ability
of Hadoop to survive server or rack failures. Other file systems such as open
source CloudStore and the MapR file system can be used to do maintain
location information.
Distributed processing is handled by MapReduce: The idea behind
MapReduce is that Hadoop can first map a large data set, and then perform a
reduction on that content for specific results. A reduce function can be thought
of as a kind of filter for raw data. The HDFS system then acts to distribute data
across a network or migrate it as necessary.
The MapReduce feature consists of one JobTracker and multiple TaskTrackers.
Client applications submit jobs to the JobTracker, which assigns each job to a
TaskTracker node. When HDFS or another location-aware file system is in use,
JobTracker takes advantage of knowing the location of each data segment. It
attempts to assign processing to the same node on which the required data
has been placed.
Apache Hadoop users typically build their own parallelized computing clusters
from commodity servers, each with dedicated storage in the form of a small
disk array or solid-state drive (SSD) for performance. These are commonly
referred to as “shared-nothing” architectures.
Big Data is getting Big and more important: As more and more data are
collected, the analysis of these data requires scalable, flexible, and high-
performing tools to provide analysis and insight in a timely fashion. Big Data
analytics is a growing field, with the need to parse large data sets from
multiple sources, and to produce information in real-time or near-real-time
gaining importance. IT organizations are exploring various analytics
technologies to parse web-based data sources and extract value from the
social networking boom.

More Related Content

What's hot

Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkLaxmi8
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemmagda3695
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Analysis of big data in pandemic case
Analysis of big data in pandemic case Analysis of big data in pandemic case
Analysis of big data in pandemic case Muh Saleh
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 

What's hot (20)

Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Analysis of big data in pandemic case
Analysis of big data in pandemic case Analysis of big data in pandemic case
Analysis of big data in pandemic case
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 

Similar to Hadoop and Big Data Analytics | Sysfore

Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoptionfaizrashid1995
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
Learn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant ResourceLearn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant ResourceAssignment Help
 
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...Samsung Business USA
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopArchana Gopinath
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Modeinventionjournals
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 

Similar to Hadoop and Big Data Analytics | Sysfore (20)

paper
paperpaper
paper
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data and apache hadoop adoption
Big data and apache hadoop adoptionBig data and apache hadoop adoption
Big data and apache hadoop adoption
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
G017143640
G017143640G017143640
G017143640
 
Learn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant ResourceLearn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant Resource
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Hadoop(Term Paper)
Hadoop(Term Paper)Hadoop(Term Paper)
Hadoop(Term Paper)
 
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
Big Data SSD Architecture: Digging Deep to Discover Where SSD Performance Pay...
 
Hadoop
HadoopHadoop
Hadoop
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and Hadoop
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Hadoop and Big Data Analytics | Sysfore

  • 1. Sysfore Technologies #117-120, First Floor, 4th Block, 80 Feet Road, Koramangala, Bangalore 560034 HADOOP AND BIG DATA ANALYTICS
  • 2. Hadoop and Big Data Analytics Gartner defines Big Data as “high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. Big data is data that, by virtue of its velocity, volume, or variety (the three Vs), cannot be easily stored or analysed with traditional methods. The term covers each and every piece of data your organization has stored till now. It includes all the data stored both on-premises or in the cloud. It could be papers, digital, structured and non-structured data within your company. There is a deluge of structured and unstructured data that is generated every second. This is known as Big Data, which can be analysed to help customers turn that data into insights. AWS provides a broad platform of managed services, infrastructure and tools to tackle your next Big Data project. It enables you to collect, store, process, analyse and visualize Big Data on the cloud. It provides all the hardware, software, infrastructure to maintain and scale, so that you focus on building your application. Some of the common Big Data Customer scenarios include Web & E-Tailing, Telecommunications, Government, Healthcare & Life Science, Bank & Financial Services and Retail, where Big Data is continuously generated. How is Big Data consumed by Businesses: Businesses can gain a lot of insight into how their product is being consumed, by analysing the Big Data
  • 3. generated. Big Data analytics is an area of rapidly growing diversity. Analysing large data sets requires significant compute capacity that can vary in size based on the amount of input data and the analysis required. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud computing model, where applications can easily scale up and down based on demand. Using Big Data analytics will give you a clear picture about how your data is being generated and consumed by the customers. It can be used for predictive marketing and plans to increase your business. It provides:  Early key indicators, gives insights into business trends resulting in business fortunes.  Analytics results in business advantage.  Get more precise analysis and results with more data. Limitations of using the traditional analytics methods: The advancements in technologies has resulted in huge volume of data being generated every second. Storing, processing, analysing and getting quality results is time consuming, costly and ineffective in the current scenario.  Only a limited amount of high fidelity raw data is available for analysis.  Storage is limited by the high volume of raw data that is continuously generated.  Moving data for computation doesn’t scale accordingly.  Data is archived regularly to conserve space. This limits the amount of data that is available for the analytical tools.
  • 4.  The perception that traditional data warehousing processes are too slow and limited in scalability.  The ability to converge data from multiple data sources, both structured and unstructured.  The realization that time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies. As requirements change you can easily resize your environment (horizontally or vertically) on AWS to meet your Amazon Web Services. In addition, there are at least four major developmental segments that underline the diversity to be found within Big Data analytics. These segments are MapReduce, scalable database, real-time stream processing and Big Data appliance. Using Hadoop for Big Data Analytics: There is a big difference between Big Data and Hadoop. The former is an asset, often a complex and ambiguous one, while the latter is a program that accomplishes a set of goals and objectives for dealing with that asset. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
  • 5. Hadoop is a framework, which allows processing of large data sets. It completes the tasks in minutes, while the same done using the RDMS would take hours. Hadoop has 2 main components:  HDFS – Hadoop Distributed File System (for Storage)  MapReduce (for Processing) Hadoop Distributed File System works: The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. It consists of HDFS clusters, which each contain one or more data nodes. Incoming data is split into segments and distributed across data nodes to support parallel processing. Each segment is then replicated on multiple data nodes to enable processing to continue in the event of a node failure. While HDFS protects against some types of failure, it is not entirely fault tolerant. A single NameNode located on a single server is required. If this server fails, the entire file system shuts down. A secondary NameNode periodically backs up the primary. The backup data is used to restart the primary but cannot be used to maintain operation. HDFS is typically used in a Hadoop installation, yet other distributed file systems are also supported. The Amazon S3 file system can be used but does not maintain information on the location of data segments, reducing the ability of Hadoop to survive server or rack failures. Other file systems such as open source CloudStore and the MapR file system can be used to do maintain location information.
  • 6. Distributed processing is handled by MapReduce: The idea behind MapReduce is that Hadoop can first map a large data set, and then perform a reduction on that content for specific results. A reduce function can be thought of as a kind of filter for raw data. The HDFS system then acts to distribute data across a network or migrate it as necessary. The MapReduce feature consists of one JobTracker and multiple TaskTrackers. Client applications submit jobs to the JobTracker, which assigns each job to a TaskTracker node. When HDFS or another location-aware file system is in use, JobTracker takes advantage of knowing the location of each data segment. It attempts to assign processing to the same node on which the required data has been placed. Apache Hadoop users typically build their own parallelized computing clusters from commodity servers, each with dedicated storage in the form of a small disk array or solid-state drive (SSD) for performance. These are commonly referred to as “shared-nothing” architectures. Big Data is getting Big and more important: As more and more data are collected, the analysis of these data requires scalable, flexible, and high- performing tools to provide analysis and insight in a timely fashion. Big Data analytics is a growing field, with the need to parse large data sets from multiple sources, and to produce information in real-time or near-real-time gaining importance. IT organizations are exploring various analytics technologies to parse web-based data sources and extract value from the social networking boom.