The document provides an introduction to the key concepts of Big Data including Hadoop, HDFS, and MapReduce. It defines big data as large volumes of data that are difficult to process using traditional methods. Hadoop is introduced as an open-source framework for distributed storage and processing of large datasets across clusters of computers. HDFS is described as Hadoop's distributed file system that stores data across clusters and replicates files for reliability. MapReduce is a programming model where data is processed in parallel across clusters using mapping and reducing functions.
This document provides an overview of big data concepts and Hadoop. It discusses the four V's of big data - volume, velocity, variety, and veracity. It then describes how Hadoop uses MapReduce and HDFS to process and store large datasets in a distributed, fault-tolerant and scalable manner across commodity hardware. Key components of Hadoop include the HDFS file system and MapReduce framework for distributed processing of large datasets in parallel.
This document provides an overview of Big Data and Hadoop. It defines Big Data as large volumes of structured, semi-structured, and unstructured data that is too large to process using traditional databases and software. It provides examples of the large amounts of data generated daily by organizations. Hadoop is presented as a framework for distributed storage and processing of large datasets across clusters of commodity hardware. Key components of Hadoop including HDFS for distributed storage and fault tolerance, and MapReduce for distributed processing, are described at a high level. Common use cases for Hadoop by large companies are also mentioned.
The document provides an introduction to Hadoop and HDFS (Hadoop Distributed File System). It discusses key concepts such as:
- HDFS stores large datasets across commodity hardware in a fault-tolerant manner and provides scalable storage and access.
- HDFS has a master/slave architecture with a NameNode that manages metadata and DataNodes that store data blocks.
- Data is replicated across DataNodes for reliability, with one replica on a local rack and two on remote racks by default.
- Hadoop allows processing of large datasets in parallel across clusters and is well-suited for massive amounts of structured and unstructured data.
The document discusses big data and key related concepts like the 3 Vs of big data (volume, velocity, and variety), Hadoop, HDFS, and MapReduce. It explains that big data refers to large amounts of data that are too large to process on a single machine. Hadoop is an open-source software framework for distributed storage and processing of big data using the HDFS file system and MapReduce programming model. HDFS stores large files across clusters of machines, providing fault tolerance through data replication. MapReduce allows distributed processing of large datasets across clusters.
This document summarizes key aspects of the Hadoop Distributed File System (HDFS). HDFS is designed for storing very large files across commodity hardware. It uses a master/slave architecture with a single NameNode that manages file system metadata and multiple DataNodes that store application data. HDFS allows for streaming access to this distributed data and can provide higher throughput than a single high-end server by parallelizing reads across nodes.
This document defines and describes big data and Hadoop. It states that big data is large datasets that cannot be processed using traditional techniques due to their volume, velocity and variety. It then describes the different types of data (structured, semi-structured, unstructured), challenges of big data, and Hadoop's use of MapReduce as a solution. It provides details on the Hadoop architecture including HDFS for storage and YARN for resource management. Common applications and users of Hadoop are also listed.
This document provides an introduction to Hadoop and big data concepts. It discusses what big data is, the four V's of big data (volume, velocity, variety, and veracity), different data types (structured, semi-structured, unstructured), how data is generated, and the Apache Hadoop framework. It also covers core Hadoop components like HDFS, YARN, and MapReduce, common Hadoop users, the difference between Hadoop and RDBMS systems, Hadoop cluster modes, the Hadoop ecosystem, HDFS daemons and architecture, and basic Hadoop commands.
The data management industry has matured over the last three decades, primarily based on relational database management system (RDBMS) technology. As the amount of data collected and analyzed in enterprises has increased severalfold in the volume, variety and velocity of its generation and consumption, organisations have started struggling with the architectural limitations of traditional RDBMS technology. As a result, a new class of systems had to be designed and implemented, giving rise to the new phenomenon of "Big Data". In this paper we trace the origin of one such class of system, Hadoop, built to handle Big Data.
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It allows for the reliable, scalable, and distributed processing of large data sets across commodity hardware. The core of Hadoop consists of HDFS for storage and MapReduce for processing data in parallel on multiple nodes. The Hadoop ecosystem includes additional projects that extend the functionality of the core components.
This document provides an introduction and overview of installing Hadoop 2.7.2 in pseudo-distributed mode. It discusses the core components of Hadoop including HDFS for distributed storage and MapReduce for distributed processing. It also covers prerequisites like Java and SSH setup. The document then describes downloading and extracting Hadoop, configuring files, and starting services to run Hadoop in pseudo-distributed mode on a single node.
Big data refers to large and complex datasets that are difficult to process using traditional methods. Key challenges include capturing, storing, searching, sharing, and analyzing large datasets in domains like meteorology, physics simulations, biology, and the internet. Hadoop is an open-source software framework for distributed storage and processing of big data across clusters of computers. It allows for the distributed processing of large data sets in a reliable, fault-tolerant and scalable manner.
This document provides an introduction to big data and Hadoop. It discusses how the volume of data being generated is growing rapidly and exceeding the capabilities of traditional databases. Hadoop is presented as a solution for distributed storage and processing of large datasets across clusters of commodity hardware. Key aspects of Hadoop covered include MapReduce for parallel processing, the Hadoop Distributed File System (HDFS) for reliable storage, and how data is replicated across nodes for fault tolerance.
Hadoop is a framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses HDFS for fault-tolerant storage and MapReduce as a programming model for distributed computing. HDFS stores data across clusters of machines as blocks that are replicated for reliability. The namenode manages filesystem metadata while datanodes store and retrieve blocks. MapReduce allows processing of large datasets in parallel using a map function to distribute work and a reduce function to aggregate results. Hadoop provides reliable and scalable distributed computing on commodity hardware.
Hadoop is a framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses HDFS for fault-tolerant storage and MapReduce as a programming model for distributed computing. HDFS stores data across clusters of machines and replicates it for reliability. MapReduce allows processing of large datasets in parallel by splitting work into independent tasks. Hadoop provides reliable and scalable storage and analysis of very large amounts of data.
- Big data refers to large sets of data that businesses and organizations collect, while Hadoop is a tool designed to handle big data. Hadoop uses MapReduce, which maps large datasets and then reduces the results for specific queries.
- Hadoop jobs run under five main daemons: the NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker.
- HDFS is Hadoop's distributed file system that stores very large amounts of data across clusters. It replicates data blocks for reliability and provides clients high-throughput access to files.
This document provides an overview of big data and Hadoop. It defines big data using the 3Vs - volume, variety, and velocity. It describes Hadoop as an open-source software framework for distributed storage and processing of large datasets. The key components of Hadoop are HDFS for storage and MapReduce for processing. HDFS stores data across clusters of commodity hardware and provides redundancy. MapReduce allows parallel processing of large datasets. Careers in big data involve working with Hadoop and related technologies to extract insights from large and diverse datasets.
We have entered an era of Big Data. Big data is, for the most part, a collection of data sets so large and complex that they are very hard to handle using on-hand database management tools. The main challenges with big databases include creation, curation, storage, sharing, search, analysis and visualization. To deal with these databases we therefore require highly parallel software. First of all, data is acquired from diverse sources, for example social media, traditional enterprise data or sensor data. Flume can be used to acquire data from social media such as Twitter. This data can then be organized using distributed file systems such as the Hadoop File System. These file systems are very efficient when the number of reads is high compared to writes.
This document discusses big data and Hadoop. It defines big data as large amounts of unstructured data that would be too costly to store and analyze in a traditional database. It then describes how Hadoop provides a solution to this challenge through distributed and parallel processing across clusters of commodity hardware. Key aspects of Hadoop covered include HDFS for reliable storage, MapReduce for distributed computing, and how together they allow scalable analysis of very large datasets. Popular users of Hadoop like Amazon, Yahoo and Facebook are also mentioned.
This document provides an overview of the Hadoop Distributed File System (HDFS), including its goals, design, daemons, and processes for reading and writing files. HDFS is designed for storing very large files across commodity servers, and provides high throughput and reliability through replication. The key components are the NameNode, which manages metadata, and DataNodes, which store data blocks. The Secondary NameNode assists the NameNode in checkpointing filesystem state periodically.
This document discusses Hadoop Distributed File System (HDFS) and MapReduce. It begins by explaining HDFS architecture, including the NameNode and DataNodes. It then discusses how HDFS is used to store large files reliably across commodity hardware. The document also provides steps to install Hadoop in single node cluster and describes core Hadoop services like JobTracker and TaskTracker. It concludes by discussing HDFS commands and a quiz about Hadoop components.
Big data and Hadoop are frameworks for processing and storing large datasets. Hadoop uses HDFS for distributed storage and MapReduce for distributed processing. HDFS stores large files across multiple machines for redundancy and parallel access. MapReduce divides jobs into map and reduce tasks that run in parallel across a cluster. Hadoop provides scalable and fault-tolerant solutions to problems like processing terabytes of data from jet engines or scaling to Google's data processing needs.
Hadoop Distributed Filesystem (HDFS) is a distributed filesystem designed for storing very large files across commodity hardware. It is optimized for streaming data access and is a good fit for large files, terabytes or petabytes in size, with streaming write-once and read-many access patterns. HDFS uses a master-slave architecture with a Namenode managing the filesystem metadata and Datanodes storing and retrieving block data. Blocks are replicated across Datanodes for reliability. The Namenode tracks block locations and clients read/write data by communicating with the Namenode and Datanodes in a pipeline.
The document summarizes the Hadoop Distributed File System (HDFS), which is designed to reliably store and stream very large datasets at high bandwidth. It describes the key components of HDFS, including the NameNode which manages the file system metadata and mapping of blocks to DataNodes, and DataNodes which store block replicas. HDFS allows scaling storage and computation across thousands of servers by distributing data storage and processing tasks.
Data Analytics: HDFS with Big Data: Issues and Applications (Dr. Chitra Dhawale)
This document provides information about a course on data analytics. It outlines the course outcomes, which include developing scalable systems using Apache and Hadoop, writing MapReduce applications, differentiating SQL and NoSQL, and analyzing and developing big data solutions using Hive and Pig. The document also describes some of the topics that will be covered in the course, including distributed file systems and their issues, an introduction to big data, characteristics of big data, types of big data, and comparisons between traditional and big data approaches.
2. Big Data
⮚ Big data refers to data that is so large, fast or complex
that it's difficult or impossible to process using traditional methods.
⮚ The act of accessing and storing large amounts of information for
analytics has been around for a long time. But the concept of big data
gained momentum in the early 2000s.
⮚ Big Data is high-volume, high-velocity and/or high-variety information
asset that requires new forms of processing for enhanced decision
making, insight discovery and process optimization (Gartner 2012).
⮚ “Data of a very large size, typically to the extent that its manipulation
and management present significant logistical challenges”.
3. Types of Big Data
⮚ Big data is classified in three ways: Structured Data, Unstructured
Data and Semi-Structured Data.
⮚ Structured data is the easiest to work with. It is highly organized with
dimensions defined by set parameters. Structured data follows schemas:
essentially road maps to specific data points. These schemas outline
where each datum is and what it means. It’s all your quantitative data like
Age, Billing, Address etc.
⮚ Unstructured data is all your unorganized data. The hardest part of
analyzing unstructured data is teaching an application to understand the
information it’s extracting. More often than not, this means translating it
into some form of structured data.
4. ⮚ Semi-structured data toes the line between structured and
unstructured. Most of the time, this translates to unstructured data with
metadata attached to it. Examples of this data include a time, location or device
ID stamp, an email address, or a semantic tag attached to the data
later. Semi-structured data has no set schema.
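As a rough illustration of the difference (a sketch, not taken from the slides; the field names are made up), a structured record can be modelled in Java as a class with a fixed schema, while a semi-structured record is a free-form map whose keys act as metadata tags:

```java
import java.util.Map;

public class DataTypesExample {
    // Structured: every record has the same, predefined fields (a schema). Java 16+ record.
    record Customer(int age, String billingId, String address) {}

    public static void main(String[] args) {
        Customer structured = new Customer(34, "INV-1001", "12 Main St");

        // Semi-structured: no fixed schema; the keys are metadata tags describing each value.
        Map<String, Object> semiStructured = Map.of(
                "timestamp", "2017-03-01T10:15:00Z",
                "deviceId", "sensor-42",
                "email", "user@example.com",
                "payload", "free-form text that would otherwise be unstructured");

        System.out.println(structured);
        System.out.println(semiStructured);
    }
}
```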
7. What is Hadoop?
⮚Hadoop is an Apache open source framework written in Java that
allows distributed processing of large datasets across clusters of
computers using simple programming models.
⮚The Hadoop framework application works in an environment that
provides distributed storage and computation across clusters of
computers.
⮚Hadoop is a framework that uses distributed storage and parallel
processing to store and manage Big Data.
⮚Hadoop is designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
8. Hadoop Applications
Industry | Application | Use Case
Web | Social Network Analysis | Clickstream Sessionization
Media | Content Optimization | Clickstream Sessionization
Telco | Network Analytics | Mediation
Retail | Loyalty & Promotions Analysis | Data Factory
Financial | Fraud Analysis | Trade Reconciliation
Federal | Entity Analysis | SIGINT
Bioinformatics | Genome Mapping | Sequencing Analysis
9. Hadoop Core Principles
⮚Scale-Out rather than Scale-Up
⮚Bring code to data rather than data to code
⮚Deal with failures – they are common
⮚Abstract complexity of distributed and concurrent
applications
10. Scale-Out rather than Scale-Up
1) It is harder and more expensive to Scale-Up
i. Add additional resources to an existing node (CPU, RAM)
ii. Moore's Law can't keep up with data growth
iii. New units must be purchased if the required resources cannot be added
iv. Also known as scaling vertically
2) Scale-Out
i. Add more nodes/machines to an existing distributed application
ii. The software layer is designed for node additions or removal
iii. Hadoop takes this approach - a set of nodes are bonded together as a single
distributed system
iv. Very easy to scale down as well
11. Bring Code to Data rather than Data to Code
◆Hadoop co-locates processors and storage
◆Code is moved to data (size is tiny, usually in KBs)
◆Processors execute code and access underlying local storage
12. Hadoop is designed to cope with node failures
⮚If a node fails, the master will detect that failure and re-assign the work to
a different node on the system.
⮚Restarting a task does not require communication with nodes working on
other portions of the data.
⮚If a failed node restarts, it is automatically added back to the system and
assigned new tasks.
⮚If a node appears to be running slowly, the master can redundantly
execute another instance of the same task
⮚ Results from the first to finish will be used
15. What is File System (FS)?
⮚ File management system is used by the operating system to access the
files and folders stored in a computer or any external storage devices.
⮚ A file system stores and organizes data and can be thought of as a type
of index for all the data contained in a storage device. These devices
can include hard drives, optical drives and flash drives.
⮚ Imagine file management system as a big dictionary that contains
information about file names, locations and types.
⮚ File systems specify conventions for naming files, including the
maximum number of characters in a name, which characters can be
used etc.
⮚ File management system is capable of handling files within a single system only.
16. What is Distributed File System (DFS)?
⮚A Distributed File System (DFS) as the name suggests, is a file system
that is distributed on multiple file servers or multiple locations.
⮚It allows programs to access or store isolated files as they do with the
local ones, allowing programmers to access files from any network or
computer.
⮚The main purpose of the Distributed File System (DFS) is to allow
users of physically distributed systems to share their data and resources
by using a Common File System.
⮚A collection of workstations and mainframes connected by a Local
Area Network (LAN) is a typical configuration of a Distributed File System.
17. How Distributed File System (DFS) works?
Distributed file system works as follows:
a) Distribution: Distribute blocks of data sets across multiple nodes. Each
node has its own computing power, which gives DFS the ability to process
data blocks in parallel.
b) Replication: Distributed file system will also replicate data blocks on
different clusters by copying the same pieces of information into multiple
clusters on different racks. This helps to achieve the following:
c) Fault Tolerance: recover a data block in case of cluster failure or rack
failure.
d) High Concurrency: make the same piece of data available to multiple
clients at the same time, using the computation power of each node
to process data blocks in parallel.
18. DFS Advantages
a) Scalability: You can scale up your infrastructure by adding more racks or
clusters to your system.
b) Fault Tolerance: Data replication will help to achieve fault tolerance in
the following cases: Cluster is down, Rack is down, Rack is disconnected
from the network and Job failure or restart.
c) High Concurrency: utilize the compute power of each node to handle
multiple client requests (in a parallel way) at the same time.
19. DFS Disadvantages
a) In a Distributed File System, nodes and connections need to be secured,
so security is a concern.
b) Messages and data may be lost in the network while moving
from one node to another.
c) Database connection in case of a Distributed File System is complicated.
d) Handling of the database is also not easy in a Distributed File System
compared to a single-user system.
21. HDFS Basics
⮚ The Hadoop Distributed File System (HDFS) is based on the Google File
System (GFS)
⮚ Hadoop Distributed File System is responsible for storing data on the
cluster.
⮚ Data files are split into blocks and distributed across multiple nodes in the
cluster.
⮚ Each block is replicated multiple times
⮚--Default is to replicate each block three times
⮚--Replicas are stored on different nodes
⮚--This ensures both reliability and availability
⮚ A distributed file system that provides high-throughput access to
application data.
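A minimal sketch of writing a file through the HDFS Java API may help make this concrete. The NameNode address and file path below are assumptions for illustration; block splitting, placement and replication are handled by HDFS itself, not by client code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in a real cluster this normally comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/events.log");   // hypothetical path

        // The client only writes a byte stream; HDFS splits it into blocks,
        // places the blocks on DataNodes and replicates each block
        // (three copies by default).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("one line of application data\n");
        }

        // The replication factor of an existing file can also be changed per file.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```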
24. Hadoop Daemons
▪ Hadoop is comprised of five separate daemons
▪ NameNode: Holds the metadata for HDFS
▪ Secondary NameNode
– Performs housekeeping functions for the NameNode
– Is not a backup or hot standby for the NameNode!
▪ DataNode: Stores actual HDFS data blocks
▪ JobTracker: Manages MapReduce jobs, distributes individual tasks
▪ TaskTracker: Responsible for instantiating and monitoring individual Map and
Reduce tasks
25. Functions of Namenode
⮚ It is the master daemon that maintains and manages the DataNodes
(slave nodes)
⮚ It records the metadata of all the files stored in the cluster, e.g.
the location of stored blocks, the size of the files, permissions,
hierarchy, etc. There are two files associated with the metadata:
● FsImage: Complete state of the file system namespace since the start
of the NameNode.
● EditLogs: All the recent modifications made to the file system with
respect to the most recent FsImage.
⮚ It records each change that takes place to the file system metadata.
26. Functions of Namenode (Continued..)
⮚ It regularly receives a Heartbeat and a block report from all the
DataNodes in the cluster to ensure that the DataNodes are live.
⮚ It keeps a record of all the blocks in HDFS and in which nodes these
blocks are located.
⮚ The NameNode is also responsible for maintaining the replication factor.
⮚ In case of a DataNode failure, the NameNode chooses new
DataNodes for new replicas, balances disk usage and manages the
communication traffic to the DataNodes.
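The block-to-DataNode mapping that the NameNode maintains can be observed from a client. Below is a hedged sketch using the Hadoop FileSystem API; the file path is hypothetical and the cluster configuration is assumed to be available on the classpath:

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/events.log");   // hypothetical path

        // File metadata (size, permissions, block placement) is served by the
        // NameNode; no DataNode is contacted for these calls.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + Arrays.toString(block.getHosts()));
        }
    }
}
```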
27. Functions of Datanode
⮚These are slave daemons or processes which run on each slave machine.
⮚The actual data is stored on DataNodes.
⮚The DataNodes perform the low-level read and write requests from the
file system's clients.
⮚They send heartbeats to the NameNode periodically to report the
overall health of HDFS; by default, this frequency is set to 3 seconds.
28. Functions of Secondary NameNode
⮚ The Secondary NameNode is one which constantly reads all the file systems and
metadata from the RAM of the NameNode and writes it into the hard disk or the
file system.
⮚ It is responsible for combining the EditLogs with FsImage from the NameNode.
⮚ It downloads the EditLogs from the NameNode at regular intervals and applies to
FsImage.
⮚ The new FsImage is copied back to the NameNode, which is used whenever the
NameNode is started the next time
30. What is MapReduce?
■MapReduce is a processing technique and a programming model for distributed computing
based on Java.
■The MapReduce algorithm contains two important tasks, namely Map and Reduce.
■Map takes a set of data and converts it into another set of data, where individual
elements are broken down into tuples (key/value pairs).
■The Reduce task takes the output from a map as input and combines those data
tuples into a smaller set of tuples.
■As the sequence of the name MapReduce implies, the reduce task is always performed
after the map job.
31. ■ MapReduce is the system used to process data in the Hadoop cluster.
■ Consists of two phases: Map, and then Reduce.
■ Each Map task operates on a discrete portion (one HDFS Block) of the overall
dataset.
■ MapReduce system distributes the intermediate data to nodes which perform the
Reduce phase.
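To make the two phases concrete, here is a sketch of the classic word-count example written against the Hadoop MapReduce Java API; the job-configuration driver class is omitted, and the class names are illustrative rather than taken from the slides:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: each call sees one record from its HDFS block and emits (word, 1) tuples.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: receives all counts for one word and combines them into a single tuple.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

The framework distributes the intermediate (word, 1) tuples by key, so each reduce call sums the counts for exactly one word, which matches the Map-then-Reduce flow described on the slides above.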