IT6701 – Information Management
Unit I – Database Modelling,
Management and Development
By
Kaviya.P, AP/IT
Kamaraj College of Engineering & Technology
Unit I – Database Modelling,
Management and Development
Database design and modelling - Business Rules and
Relationship; Java database Connectivity (JDBC),
Database connection Manager, Stored Procedures.
Trends in Big Data systems including NoSQL -
Hadoop HDFS, MapReduce, Hive, and
enhancements.
Database Design and Modelling
• Database Design
– Process of producing and representing a database in a particular model.
– Process of defining the structure of a database.
– Data modelling is the first step in database design.
• Levels of Abstraction
– Conceptual database design
– Logical database design
– Physical database design
Database Design and Modelling
• Conceptual database design - (What is represented in the database?)
– An abstract model is created from business rules and user requirements.
– Entity-Relationship (ER) Model is used to represent the conceptual design.
• Entity – Real things in the world
• Relationships – Reflects interactions between entities
• Attributes – Properties of entities and relationships
• Logical database design - (Logical representation and Relational model)
– ER Model is converted into a relational model through logical database
design.
– The data are arranged into logical structures and mapped into DBMS
tables with accompanying constraints.
Database Design and Modelling
• Physical database design
– Actual physical implementation
of the database in a Database
Management Systems.
– It includes the description of data features, data types, indexing, etc.
– It describes how the information is represented in the database and
how data structures are implemented to represent what is modelled.
Database Design and Modelling
Database Modelling – ER Model
– A conceptual design tool which describes data as
entities, relationships, and attributes.
– Diagrammatic representation of the model.
– Entity: Real world thing. (Eg: person, student, car)
– Entity Set: Collection of entities of similar type.
(Eg: the set of all students enrolled in a course)
– Attributes: Properties that describe the entity.
Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Composite Attribute: Combination of
multiple attributes. (Eg: Address includes
street, city, zip_code).
– Simple Attribute: One which cannot be
decomposed into smaller units. (Eg:
Age)
– Single Valued Attributes: Can hold a
single value. (Eg: Rank)
– Multi valued Attributes: Can store
multiple values. (Eg: Mobile No.)
Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Stored Attributes: Attributes
whose values are stored in the
database. (Eg: DoB)
– Derived Attributes: Attributes
whose values are calculated from
one or more attributes in the
database. (Eg: Age can be
calculated from DoB)
Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Null values: Used when the value for certain instances of an entity does not
exist or is not available.
– Complex Attributes: Attributes formed by nesting composite and
multi-valued attributes. (Eg: a person may have more than one residence,
and each residence is composed of street, city, and zip_code)
Database Design and Modelling
Database Modelling – ER Model
Relationship:
– Whenever an attribute of one entity
type refers to another entity type, a
relationship exists.
– Degree of relationship:
• Binary: A relationship of degree
two.
• Ternary: A relationship of degree
three.
• n-ary: A relationship of degree n, in which n entities participate.
Database Design and Modelling
Database Modelling – ER Model
Constraints
• Cardinality ratio – The maximum
number of relationship instances
that an entity can participate in.
– 1:1 relationship
– 1:N relationship
– N:1 relationship
– M:N relationship
Database Design and Modelling
Database Modelling – ER Model
Constraints
• Participation constraint – It specifies whether the existence of an entity
depends on it being related to another entity.
– Total/Mandatory participation: If the existence of an entity is determined
through its participation in a relationship. (Eg: a student must enroll in a
course)
– Partial/Optional participation: If only a part of the set of entities participates
in a relationship. (Eg: not every teaching_staff member will be the HoD of a
department)
Database Design and Modelling
Database Modelling – ER Model - Keys
• Keys: Allow us to identify a particular entity.
• Super key: A super key is a set of one or more
attributes (columns), which can uniquely identify a row
in a table.
• Candidate key: A minimal set of attributes
which can uniquely identify a tuple.
(Minimal subset of a super key)
• Primary key: An attribute (or set of attributes)
chosen to uniquely identify a particular
instance in the database.
• Foreign key (Referential Integrity): An attribute
that refers to the primary key of another table; if
multiple references exist, then any update or
modification should be reflected in all other places.
Database Design and Modelling
Database Modelling – ER Model
[Figures: ER diagram notation for one-to-one, one-to-many, many-to-one,
and many-to-many cardinalities, and for participation constraints]
Database Design and Modelling
Database Modelling – ER Model
Database Design and Modelling
Database Modelling – Extended ER Model
• Specialisation: The result of taking a subset of a higher-level entity set to
form a lower-level entity set. (Eg: Person -> Customer, Employee)
• Generalisation: The result of taking the union of two or more disjoint
entity sets to produce a higher-level entity set. (Eg: Customer, Employee
-> Person)
• Aggregation: An abstraction in which relationship sets are treated as
higher-level entity sets and can participate in relationships.
Database Design and Modelling
Database Modelling – Case Study: Hospital Management System
Business Rules
• Database design is an important phase in the system development life cycle.
• The inputs to design phase will be the business rules and functions identified
in the requirement gathering phase.
• Business rules are used to describe various aspects of the business domain.
(Eg: Students need to be enrolled in a course before appearing for its
examination)
• The following are examples of business rules:
– The explanation of a concept relevant to the application. (A course is evaluated
through theory + practical examinations)
– An integrity constraint on the data of the application. (The minimum mark to pass a
course is 50%)
– A derivation rule, whereby information can be derived from other information.
(The grade of a student is assigned based on the marks obtained)
Business Rules
Identifying Business Rules
• Business rules allow the database designer to develop relationship
rules and constraints and help in the creation of a correct data
model.
• They are a good communication tool between users and designers.
• They give a proper classification of entities, attributes, relationships,
and constraints.
• A noun in a business rule is transformed into an entity in
the model, and a verb (active or passive) is interpreted as a
relationship among entities.
Java Database Connectivity (JDBC)
• The JDBC API (Application Programming Interface) provides a
way for creating database connections from Java programmes.
• It provides methods to execute SQL statements and process the
results obtained from those statements.
• Types of JDBC drivers
– Type 1 – JDBC ODBC Bridge Driver
– Type 2 – Java Native Driver
– Type 3 – Java Network Protocol Driver
– Type 4 – Pure Java Driver
Java Database Connectivity (JDBC)
Type 1 – JDBC ODBC Bridge Driver
• It provides a bridge to access the ODBC drivers installed on each client machine.
• This bridge translates the standard JDBC calls to corresponding ODBC calls and
sends them to the ODBC data source via ODBC libraries.
• This driver requires that native ODBC libraries, drivers and their required support
files be installed and configured on each client machine.
• They are the slowest of all types due to multiple levels of translation.
Java Database Connectivity (JDBC)
Type 2 – Java Native Driver
• It mainly uses the Java Native Interface (JNI) to translate calls to the local database API.
• The JDBC calls are translated into vendor-specific API calls which act as a façade for
forwarding requests between application and database.
• Type 2 drivers are usually faster than Type 1.
• Similar to Type 1 drivers, these drivers also require native libraries to be installed and
configured on each client machine.
Java Database Connectivity (JDBC)
Type 3 – Java Network Protocol Driver
• It uses an intermediate driver listener that acts as a gateway for multiple database servers.
• The Java client sends JDBC requests to the listener, which in turn connects to the database
server using another driver.
• It does not require any installation on the client side, which is why it is preferred over the
first two types of drivers.
Java Database Connectivity (JDBC)
Type 4 – Pure Java Driver
• It is the most commonly used JDBC driver in enterprise applications because it
converts JDBC API calls to direct network calls using vendor-specific implementation
details.
• Type 4 drivers offer better performance compared to the other types and also do not
require any installation or configuration on the client machine.
Java Database Connectivity (JDBC)
Accessing Database using JDBC
• Steps:
– Import JDBC Packages - import java.sql.*;
– Register the JDBC Driver - Class.forName()
Eg: Class.forName("oracle.jdbc.driver.OracleDriver");
– Creating a database connection
Eg: String url = "jdbc:oracle:thin:@localhost:1521:xe";
Connection con = DriverManager.getConnection(url, user, password);
Overloads of DriverManager.getConnection():
getConnection(String url)
getConnection(String url, Properties prop)
getConnection(String url, String user, String password)
Java Database Connectivity (JDBC)
Accessing Database using JDBC
• Steps:
– Executing queries
Eg: Statement st = con.createStatement();
int m = st.executeUpdate(sql);
Interfaces and their recommended use:
– Statement: Use for general-purpose access to your database. Useful when you are using
static SQL statements at runtime. The Statement interface cannot accept parameters.
– PreparedStatement: Use when you plan to use the same SQL statement many times. The
PreparedStatement interface accepts input parameters at runtime.
– CallableStatement: Use when you want to access database stored procedures. The
CallableStatement interface can also accept runtime input parameters.
Statement execution methods:
boolean execute(String SQL)
int executeUpdate(String SQL)
ResultSet executeQuery(String SQL)
Java Database Connectivity (JDBC)
Accessing Database using JDBC
• Steps:
– Processing the results (handling SQL exceptions)
• ResultSet objects are returned by Statement and PreparedStatement; they contain the
query output, which has to be processed.
• CallableStatement returns output values through OUT parameters; this could either be a
single value or a ResultSet.
• An SQLException has to be caught and gracefully transmitted to the calling
programme.
– Closing the database connection
• By closing the connection, the Statement and ResultSet objects will be closed
automatically.
• The close() method of the Connection interface is used to close the connection.
Eg: con.close();
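Putting the steps above together, the following is a minimal sketch of a complete JDBC program. The Oracle URL, the credentials, and the student table and its columns are placeholders used only for illustration; any JDBC-compliant driver and schema can be substituted.

import java.sql.*;

public class JdbcDemo {
    public static void main(String[] args) {
        // Placeholder URL and credentials; substitute your own.
        String url = "jdbc:oracle:thin:@localhost:1521:xe";
        // Explicit Class.forName() registration is optional from JDBC 4.0 onwards,
        // where drivers found on the classpath are loaded automatically.
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            // Statement: static SQL with no parameters (table 'student' is hypothetical).
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT roll_no, name FROM student")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getString(2));
                }
            }
            // PreparedStatement: reusable SQL that accepts input parameters at runtime.
            String sql = "UPDATE student SET mark = ? WHERE roll_no = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, 78);
                ps.setString(2, "19IT042");
                System.out.println(ps.executeUpdate() + " row(s) updated");
            }
        } catch (SQLException e) {
            e.printStackTrace();   // catch and report the SQLException gracefully
        }
        // try-with-resources closes the ResultSet, Statements and Connection automatically.
    }
}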
Stored Procedure
• A stored procedure is a prepared SQL code that you can save and reuse over and over
again.
• A set of SQL statements, written together to form a logical unit, for performing a
specific task.
• It is a subroutine used by applications to access relational databases, and it is stored in
the database data dictionary.
• It can be compiled and executed with different parameters and results, and it can have
any combination of input, output, and input/output parameters.
• Advantages of Stored Procedure:
– Stored procedures are fast.
– Stored procedures are portable.
– Stored procedures are always available as 'source code' in the database itself.
– Stored procedures are migratory.
Stored Procedure – PL/SQL
Example
Function
CREATE [OR REPLACE] FUNCTION function_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
RETURN return_datatype
{IS | AS}
BEGIN
< function_body >
END [function_name];
Procedure
CREATE [OR REPLACE] PROCEDURE procedure_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
{IS | AS}
BEGIN
< procedure_body >
END procedure_name;
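Connecting this back to JDBC, the sketch below invokes a stored procedure through the CallableStatement interface described earlier. The procedure name get_grade and its IN/OUT parameters are hypothetical, as are the URL and credentials.

import java.sql.*;

public class CallProcDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@localhost:1521:xe";   // placeholder
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             // Hypothetical procedure: get_grade(roll_no IN VARCHAR2, grade OUT VARCHAR2)
             CallableStatement cs = con.prepareCall("{call get_grade(?, ?)}")) {
            cs.setString(1, "19IT042");                  // bind the IN parameter
            cs.registerOutParameter(2, Types.VARCHAR);   // declare the OUT parameter
            cs.execute();
            System.out.println("Grade: " + cs.getString(2));   // read the OUT value
        }
    }
}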
Trends in Big Data systems
• Big data is the term used for a collection of data sets so large and complex that it
becomes difficult to process it using on-hand database management tools or
traditional data processing applications.
• Need for Big Data
– A huge amount of data needs to be analyzed for the betterment of the
organization and to improve customer experience.
– The current systems (single server) cannot handle such a huge amount of
data.
– Hence, either the capacity of the single machine needs to be increased, or a
cluster of machines can be used to act like a single system, which works in a
distributed manner.
– Such a solution is provided through Hadoop.
Trends in Big Data systems
Characteristics of Big Data
• Big data can be characterized by specifying three V’s: Volume, Variety, Velocity
• Volume: Specifies the amount of data handled by the application. (Eg: Twitter)
• Velocity: Addresses the rate at which the data flows into the system.
• Variety: Describes the different types of data generated from unstructured text to
structured records, from images to sound and video, from sensor data to
geographic locations, etc., all specifying information needed for processing.
• A fourth V, Value, refers to the return on investment (ROI) of the data and its
processing.
Hadoop
• Hadoop is an open-source framework that allows users to store and process big
data in a distributed environment across clusters of computers using simple
programming models.
Hadoop
Hadoop Architecture
Hadoop
Hadoop 1.x Architecture
Hadoop
Hadoop 2.x Architecture
Hadoop
• Hadoop follows a master-slave architecture for the creation of a cluster.
• It consists of two parts: Storage unit & Processing unit.
• Storage is provided through a Hadoop Distributed File System (HDFS).
• Processing is done through MapReduce.
Storage - HDFS
• HDFS is spread across machines and acts as a single file system.
• The master node has information about the location of the data. The data
are stored in the slave node.
• HDFS runs daemons to handle data storage. They are NameNode,
DataNode, and Secondary NameNode.
Hadoop
• The cluster has a single NameNode running on the server and multiple
DataNodes running on the slaves.
• Every slave machine will run a DataNode daemon.
• The NameNode acts as a single point of availability for the data. If it goes down,
it would be difficult to make sense of the blocks on the DataNodes.
• Thus, the NameNode has to run on dual or triple redundant hardware,
with storage like RAID 1+0.
• For faster access, the NameNode metadata is kept in RAM. If the NameNode
crashes, this metadata will be lost.
Hadoop
• To make this metadata persistent, a secondary NameNode is used.
• The NameNode contacts the secondary NameNode every hour and pushes the
metadata onto it, creating a checkpoint.
• The NameNode can act as a single point of failure, and hence a backup is essential.
• From Hadoop 2.x onwards, a provision for a passive or standby backup is
provided.
• This standby NameNode backup will take control whenever the active
NameNode fails, thereby providing system availability.
• High data availability is achieved through data replication or duplication. The
default replication factor is 3, i.e., every file has three replicas.
Hadoop
Processing – MapReduce
• In Hadoop 1.x, the processing part was handled through MapReduce.
• The daemons running for MapReduce are JobTracker and TaskTracker.
• JobTracker: The master that manages the jobs submitted by the client and the
resources used by them in the cluster.
• The JobTracker splits the job into various tasks that can run in parallel using the
TaskTrackers.
• With Hadoop 2.x, a few changes were made in the processing structure.
• Apache Hadoop 2.0 includes YARN, which separates resource management
from the processing components.
• The YARN daemons, ResourceManager and NodeManager, help in processing the
data.
Hadoop
Characteristics of Hadoop
• Highly scalable: More machines can easily be added to the cluster as needed to
increase the capacity/power of the cluster.
• Commodity hardware-based: Desktops can be used to create a cluster. Specialized
hardware is not required. Therefore scalable and economical.
• Open source: You can look into the code and contribute back to the community.
• Reliable: If a machine crashes, the data are not lost.
Hadoop
Components of Hadoop
• Hadoop is a galaxy of tools.
• Every tool has a specific advantage or purpose.
• The collection of components is known as the Hadoop ecosystem.
• It includes tools for data storage, data manipulation, integration with other systems,
machine learning, cluster management and development, etc.
• Components are:
– Hadoop Distributed File System (HDFS) – Flume, Sqoop
– YARN & HBase
– MapReduce
– Hive & Pig
– Oozie
– Zookeeper
Hadoop
Hadoop components
Hadoop
Components of Hadoop
• Flume and Sqoop are used for data integration. They are used to get data from
external sources into Hadoop. Flume is a service to move a large amount of data in real
time. Sqoop is the integration of SQL and Hadoop.
• YARN and MapReduce are used for data processing.
• YARN stands for Yet Another Resource Negotiator, which is used for resource management.
• HBase is the data storage or Hadoop database, which provides interactive access to the
data stored in HDFS.
• Hive and Pig are used for data analysis. These are high-level languages that allow users to
construct queries, so that data processing can be performed. (Hive – Facebook, Pig – Yahoo)
• Oozie is a workflow scheduler, which is used to manage Hadoop jobs.
• Zookeeper provides operational services for a Hadoop cluster. It provides distributed
configuration services, synchronization services and a naming registry.
HDFS – Hadoop Distributed
File System
HDFS
• HDFS is the file system required by Hadoop.
• It is an atypical file system, which does not format the hard drives in the
cluster.
• Instead it sits on top of the underlying operating system and its file system and
uses it to store and manage data.
• HDFS divides a file into blocks of either 64 MB or 128 MB. Each block is then
replicated three times, or the number of times specified by the user.
• The NameNode maintains the split information and location details.
HDFS – Hadoop Distributed
File System
Features of HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with HDFS.
• The built-in servers of the NameNode and DataNodes help users easily check the
status of the cluster.
• Streaming access to file system data.
• HDFS provides file permissions and authentication.
HDFS – Hadoop Distributed
File System
HDFS Architecture
HDFS – Hadoop Distributed
File System
HDFS
• The storage can sometimes grow so large that the disks are arranged in
different racks and connected through switches.
• If all replicas are stored in the same rack, and the switch accessing that rack
fails, all the replicas will be unavailable, defeating the purpose of having
redundancy.
• HDFS has a feature of rack awareness, through which the NameNode knows
which rack each DataNode is on.
HDFS – Hadoop Distributed
File System
Rack awareness in HDFS
HDFS – Hadoop Distributed
File System
HDFS
• Hadoop also has intelligent, self-healing behaviour: if one
of the DataNodes goes down, the heartbeat (or status message) from that
DataNode to the NameNode will cease.
• After a few minutes, the NameNode will consider that DataNode to be dead;
whatever tasks were running on that DataNode will get respawned, and its blocks
re-replicated, so that the replica count of 3 is restored.
HDFS – Hadoop Distributed
File System
HDFS – Preparing HDFS writes
HDFS – Hadoop Distributed File System
HDFS – Preparing HDFS writes
1. The client creates the file by calling create( ) on Distributed File System (DFS).
2. DFS makes an RPC call to the NameNode to create a new file in the file system's namespace,
with no blocks associated with it.
3. The DFS returns an FSDataOutputStream for the client to start writing data to.
FSDataOutputStream wraps a DFSOutputStream, which handles communication with the
DataNodes and NameNode.
4. The DataStreamer streams the packets to the first DataNode in the pipeline, which stores each
packet and forwards it to the second DataNode in the pipeline.
5. When the client has finished writing data, it calls close( ) on the stream.
6. This action flushes all the remaining packets to the DataNode pipeline and waits for
acknowledgments before contacting the NameNode to signal that the file is complete.
7. The NameNode already knows which blocks the file is made up of (because the DataStreamer
asks for block allocations), so it only has to wait for blocks to be minimally replicated before
returning successfully.
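This create()/write path can be driven directly from Java through the Hadoop FileSystem API. A minimal sketch, assuming a running HDFS reachable at the placeholder address below and the Hadoop client libraries on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // placeholder NameNode address
        FileSystem fs = FileSystem.get(conf);                // DistributedFileSystem instance
        // create() asks the NameNode to add the file to the namespace (steps 1-3)
        try (FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
            out.writeUTF("hello HDFS");   // DataStreamer pipelines packets to DataNodes (step 4)
        }                                 // close() flushes and signals completion (steps 5-7)
        fs.close();
    }
}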
HDFS – Hadoop Distributed
File System
HDFS – Reading Data from HDFS
HDFS – Hadoop Distributed File System
HDFS – Reading Data from HDFS
1. The client opens the file it wishes to read by calling open( ) on the FileSystem object,
which for HDFS is an instance of Distributed File System (DFS).
2. DFS calls the NameNode, using RPCs, to determine the locations of the first few blocks in
the file.
3. The DFS returns an FSDataInputStream to the client for it to read data from.
4. FSDataInputStream in turn wraps a DFSInputStream, which manages the DataNode and
NameNode I/O.
5. The client then calls read() on the stream.
6. During reading, if the DFSInputStream encounters an error while communicating with a
DataNode, it will try the next closest one for that block.
7. If a corrupted block is found, the DFSInputStream attempts to read a replica of the block
from another DataNode; it also reports the corrupted block to the NameNode.
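The corresponding read path, again as a sketch against a placeholder cluster address, and assuming the file written in the earlier example exists:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // placeholder NameNode address
        FileSystem fs = FileSystem.get(conf);                // DFS instance (step 1)
        // open() makes the RPC to the NameNode for block locations (step 2)
        try (FSDataInputStream in = fs.open(new Path("/demo/hello.txt"))) {
            System.out.println(in.readUTF());   // read() streams data from the DataNodes (step 5)
        }
        fs.close();
    }
}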
MapReduce
• MapReduce is a programming model for processing large data sets with a
parallel distributed algorithm on a cluster.
• In traditional systems, data are brought from the data store into the main
memory, where the application is running.
• In MapReduce, the application is transferred to the location where the data are
stored and executed in parallel.
• Thus, multiple instances of a MapReduce job exist at any given time, which
work in parallel on the data stored in HDFS.
MapReduce
MapReduce Framework
• It works on a divide-and-conquer policy.
• The job is divided into multiple tasks known as Map tasks, and then the output is
combined using a task known as the Reducer.
• A MapReduce program comprises two components: Map and Reduce.
• The Mapper part does the processing, while the Reducer aggregates the data.
• There is a third phase, called shuffle and sort, between Map and
Reduce.
• The output of the Map is given to shuffle and sort, which is then passed on to
the Reducer.
MapReduce
MapReduce Framework
• The shuffle and sort phase groups the output so that all the data belonging to the
same group are given to a single machine.
• There can be one or many instances of the Reducer running for a given job, so it
is essential that each group of similar data is given to a single machine.
MapReduce
Reading the Data into the MapReduce Program
• A Map task reads the input from the cluster as a sequence of (key, value) pairs.
• The processing is done on the value, and the output is also provided as a (key,
value) pair.
• These pairs from the Map tasks are combined into groups and then sorted based
on the key through the Shuffle and Sort phase.
• This intermediate output is given to the Reduce task, which combines the results
and provides the final output, which is written onto HDFS.
MapReduce
MapReduce Structure
MapReduce
MapReduce WorkFlow
[Figure: MapReduce workflow — Map workers read the input splits and extract
something you care about from each record; intermediate output is written
locally, then remotely read and sorted; Reduce workers aggregate, summarize,
filter, or transform, and write the output files]
MapReduce
MapReduce - Example
MapReduce
MapReduce - Example
MapReduce
MapReduce – Example
Hive
• Hive started at Facebook.
• Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
• Hive resides on top of Hadoop to summarize Big Data, and makes querying
and analyzing easy.
• Using Hive, one can create tables, create databases, read the data, and create
partitions so that the data set can be restructured for processing.
• Hive has a lot of schema flexibility: tables can be altered,
columns can be moved, or the whole data set can be reloaded.
• It also has JDBC/ODBC connectivity so that it can be used with visualization
tools like Tableau.
Hive
• Limitations of Hive:
– It is not a relational database.
– It is not designed for OnLine Transaction Processing (OLTP).
– It is not a language for real-time queries and row-level updates.
• Features of Hive:
– It stores the schema in a database and the processed data in HDFS.
– It is designed for OLAP.
– It provides SQL type language for querying called HiveQL or HQL.
– It is familiar, fast, scalable, and extensible.
Hive
• The Metastore holds the information stored when you create a table, database, or
view.
• On top of the Metastore lies a Thrift API that enables browsing and querying using
JDBC/ODBC.
• Table definitions, column definitions, and view definitions are stored in the
Metastore.
• For Hive, the default Metastore database is Derby.
Hive
Hive Architecture
Hive
Hive Architecture
• Hive shell: Used to interact with Hive (create tables, submit queries)
• Metastore: Table definitions, view definitions, database definitions
• Execution Engine: For execution
• Compiler: For optimization
• Driver: Takes the code and converts it into Hadoop-understandable terms for
execution.
Hive
Create Database Statement
– CREATE DATABASE [IF NOT EXISTS] <database_name>;
Drop Database Statement
– DROP DATABASE IF EXISTS <database_name>;
Create Table Statement
– CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format];
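Because Hive exposes a JDBC interface (as noted earlier), such HiveQL statements can also be submitted from Java. A minimal sketch, assuming HiveServer2 is listening at the placeholder address below, the hive-jdbc driver is on the classpath, and using a hypothetical employee table:

import java.sql.*;

public class HiveJdbcDemo {
    public static void main(String[] args) throws SQLException {
        // Placeholder HiveServer2 URL; adjust the host, port and database as needed.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement st = con.createStatement()) {
            st.execute("CREATE DATABASE IF NOT EXISTS demo");
            st.execute("CREATE TABLE IF NOT EXISTS demo.employee (id INT, name STRING) "
                     + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE");
            try (ResultSet rs = st.executeQuery("SELECT id, name FROM demo.employee")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " " + rs.getString(2));
                }
            }
        }
    }
}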
NoSQL
• An approach to data management and database design that is useful for
very large sets of distributed data.
• NoSQL was designed to access and analyze massive amounts of
unstructured data or data stored remotely on multiple virtual servers in the
cloud.
• Types of NoSQL databases
– Graph database
– Key-value database
– Column stores (also known as wide-column stores)
– Document database
NoSQL
• Graph database
– It is based on graph theory and is used for representing networks, from a
network of people in a social context to a network of cities in geographical
mapping.
– These databases are designed for data whose relations are well represented
as a graph and whose elements are interconnected, with an
undetermined number of relations between them.
– Ex: Neo4j, Giraph
NoSQL
• Key-value store
– They are the simplest databases and use a key to access a value.
– These types of databases are designed for storing data in a scheme-free
way.
– In a key-value store, all of the data within consists of an indexed key and
a value, hence the name.
– Ex: Cassandra, DyanmoDB
71
NoSQL
• Column stores
– These data stores are designed for storing data tables as sections of
columns of data, rather than as rows of data.
– Wide-column stores offer high performance and a highly scalable
architecture.
– Ex: Hbase, BigTable
NoSQL
• Document database
– These databases expand the idea of key-value stores where “documents”
contain more complex data.
– They contain data and each document is assigned a unique key, which is
used to retrieve the document.
– These are designed for storing, retrieving and managing document-
oriented information, also known as semi-structured data.
– Tree or hierarchical data structures can be directly stored in these
databases.
– Ex: MongoDB, CouchDB
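As a small illustration of key-based document storage, the sketch below uses the MongoDB Java driver; this is one possible choice, not part of the syllabus material. The database, collection and field names are hypothetical, and mongodb-driver-sync must be on the classpath.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class DocumentStoreDemo {
    public static void main(String[] args) {
        // Placeholder connection string for a local MongoDB instance.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> students =
                    client.getDatabase("school").getCollection("students");
            // Each document is assigned a unique key (_id) used to retrieve it later.
            students.insertOne(new Document("_id", "19IT042")
                    .append("name", "Asha")
                    .append("marks", new Document("dbms", 87).append("os", 91)));
            Document doc = students.find(new Document("_id", "19IT042")).first();
            System.out.println(doc.toJson());   // the stored semi-structured document
        }
    }
}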
More Related Content

What's hot

Exploring Social Media with NodeXL
Exploring Social Media with NodeXL Exploring Social Media with NodeXL
Exploring Social Media with NodeXL
Shalin Hai-Jew
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
rathnaarul
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
Premsankar Chakkingal
 
Graph Structure In The Web
Graph Structure In The WebGraph Structure In The Web
Graph Structure In The Web
dailyye
 
Social network analysis
Social network analysisSocial network analysis
Social network analysis
World Agroforestry (ICRAF)
 
Transforming Semantic Web Ideas to Information Architecture
Transforming Semantic Web Ideas to Information ArchitectureTransforming Semantic Web Ideas to Information Architecture
Transforming Semantic Web Ideas to Information Architecture
Vestforsk.no
 
Barcelona Euro Ia Final No Picture
Barcelona Euro Ia Final No PictureBarcelona Euro Ia Final No Picture
Barcelona Euro Ia Final No Pictureanskaar
 
Node XL - features and demo
Node XL - features and demoNode XL - features and demo
Node XL - features and demo
Mayank Mohan
 
Social Network Analysis (SNA) 2018
Social Network Analysis  (SNA) 2018Social Network Analysis  (SNA) 2018
Social Network Analysis (SNA) 2018
Arsalan Khan
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011
guillaume ereteo
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
BAINIDA
 
Social Network Analysis (Part 1)
Social Network Analysis (Part 1)Social Network Analysis (Part 1)
Social Network Analysis (Part 1)
Vala Ali Rohani
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
Traian Rebedea
 
Sylva workshop.gt that camp.2012
Sylva workshop.gt that camp.2012Sylva workshop.gt that camp.2012
Sylva workshop.gt that camp.2012CameliaN
 
It’s a “small world” after all
It’s a “small world” after allIt’s a “small world” after all
It’s a “small world” after allquanmengli
 
992 sms10 social_media_services
992 sms10 social_media_services992 sms10 social_media_services
992 sms10 social_media_services
siyaza
 
LAK13 Tutorial Social Network Analysis 4 Learning Analytics
LAK13 Tutorial Social Network Analysis 4 Learning AnalyticsLAK13 Tutorial Social Network Analysis 4 Learning Analytics
LAK13 Tutorial Social Network Analysis 4 Learning Analytics
goehnert
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
IAESIJEECS
 
Object models and object representation
Object models and object representationObject models and object representation
Object models and object representation
Julie Allinson
 

What's hot (20)

Exploring Social Media with NodeXL
Exploring Social Media with NodeXL Exploring Social Media with NodeXL
Exploring Social Media with NodeXL
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Graph Structure In The Web
Graph Structure In The WebGraph Structure In The Web
Graph Structure In The Web
 
Social network analysis
Social network analysisSocial network analysis
Social network analysis
 
Transforming Semantic Web Ideas to Information Architecture
Transforming Semantic Web Ideas to Information ArchitectureTransforming Semantic Web Ideas to Information Architecture
Transforming Semantic Web Ideas to Information Architecture
 
Barcelona Euro Ia Final No Picture
Barcelona Euro Ia Final No PictureBarcelona Euro Ia Final No Picture
Barcelona Euro Ia Final No Picture
 
Node XL - features and demo
Node XL - features and demoNode XL - features and demo
Node XL - features and demo
 
Social Network Analysis (SNA) 2018
Social Network Analysis  (SNA) 2018Social Network Analysis  (SNA) 2018
Social Network Analysis (SNA) 2018
 
Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011Social network analysis course 2010 - 2011
Social network analysis course 2010 - 2011
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
Social Network Analysis (Part 1)
Social Network Analysis (Part 1)Social Network Analysis (Part 1)
Social Network Analysis (Part 1)
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
Q046049397
Q046049397Q046049397
Q046049397
 
Sylva workshop.gt that camp.2012
Sylva workshop.gt that camp.2012Sylva workshop.gt that camp.2012
Sylva workshop.gt that camp.2012
 
It’s a “small world” after all
It’s a “small world” after allIt’s a “small world” after all
It’s a “small world” after all
 
992 sms10 social_media_services
992 sms10 social_media_services992 sms10 social_media_services
992 sms10 social_media_services
 
LAK13 Tutorial Social Network Analysis 4 Learning Analytics
LAK13 Tutorial Social Network Analysis 4 Learning AnalyticsLAK13 Tutorial Social Network Analysis 4 Learning Analytics
LAK13 Tutorial Social Network Analysis 4 Learning Analytics
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
 
Object models and object representation
Object models and object representationObject models and object representation
Object models and object representation
 

Similar to IT6701 Information Management - Unit I

DBMS
DBMS DBMS
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptx
MaryJoseph79
 
Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02Jotham Gadot
 
(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)
Dr. Mazin Mohamed alkathiri
 
oracle
oracle oracle
DISE - Database Concepts
DISE - Database ConceptsDISE - Database Concepts
DISE - Database Concepts
Rasan Samarasinghe
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
NILESH UCHCHASARE
 
An Introduction To Software Development - Architecture & Detailed Design
An Introduction To Software Development - Architecture & Detailed DesignAn Introduction To Software Development - Architecture & Detailed Design
An Introduction To Software Development - Architecture & Detailed Design
Blue Elephant Consulting
 
Database Management System NOTES for 2nd year
Database Management System NOTES for 2nd yearDatabase Management System NOTES for 2nd year
Database Management System NOTES for 2nd year
dhasamalika
 
02010 ppt ch02
02010 ppt ch0202010 ppt ch02
02010 ppt ch02Hpong Js
 
INTRODUCTION OF DATA BASE
INTRODUCTION OF DATA BASEINTRODUCTION OF DATA BASE
INTRODUCTION OF DATA BASE
AMUTHAG2
 
BAB 7 Pangkalan data new
BAB 7   Pangkalan data newBAB 7   Pangkalan data new
BAB 7 Pangkalan data new
Nur Salsabila Edu
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptx
NurulIzrin
 
Chapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdfChapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdf
TamiratDejene1
 
Database management system.pptx
Database management system.pptxDatabase management system.pptx
Database management system.pptx
AshmitKashyap1
 
Week 1 and 2 Getting started with DBMS.pptx
Week 1 and 2 Getting started with DBMS.pptxWeek 1 and 2 Getting started with DBMS.pptx
Week 1 and 2 Getting started with DBMS.pptx
Riannel Tecson
 
DBMS-Unit-1.pptx
DBMS-Unit-1.pptxDBMS-Unit-1.pptx
DBMS-Unit-1.pptx
Bhavya304221
 
Database management system
Database management systemDatabase management system
Database management system
sangeethachandrabose
 
Unit 1 DBMS
Unit 1 DBMSUnit 1 DBMS
Unit 1 DBMS
DhivyaSubramaniyam
 
ER modeling
ER modelingER modeling
ER modeling
Dabbal Singh Mahara
 

Similar to IT6701 Information Management - Unit I (20)

DBMS
DBMS DBMS
DBMS
 
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptx
 
Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02
 
(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)
 
oracle
oracle oracle
oracle
 
DISE - Database Concepts
DISE - Database ConceptsDISE - Database Concepts
DISE - Database Concepts
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
An Introduction To Software Development - Architecture & Detailed Design
An Introduction To Software Development - Architecture & Detailed DesignAn Introduction To Software Development - Architecture & Detailed Design
An Introduction To Software Development - Architecture & Detailed Design
 
Database Management System NOTES for 2nd year
Database Management System NOTES for 2nd yearDatabase Management System NOTES for 2nd year
Database Management System NOTES for 2nd year
 
02010 ppt ch02
02010 ppt ch0202010 ppt ch02
02010 ppt ch02
 
INTRODUCTION OF DATA BASE
INTRODUCTION OF DATA BASEINTRODUCTION OF DATA BASE
INTRODUCTION OF DATA BASE
 
BAB 7 Pangkalan data new
BAB 7   Pangkalan data newBAB 7   Pangkalan data new
BAB 7 Pangkalan data new
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptx
 
Chapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdfChapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdf
 
Database management system.pptx
Database management system.pptxDatabase management system.pptx
Database management system.pptx
 
Week 1 and 2 Getting started with DBMS.pptx
Week 1 and 2 Getting started with DBMS.pptxWeek 1 and 2 Getting started with DBMS.pptx
Week 1 and 2 Getting started with DBMS.pptx
 
DBMS-Unit-1.pptx
DBMS-Unit-1.pptxDBMS-Unit-1.pptx
DBMS-Unit-1.pptx
 
Database management system
Database management systemDatabase management system
Database management system
 
Unit 1 DBMS
Unit 1 DBMSUnit 1 DBMS
Unit 1 DBMS
 
ER modeling
ER modelingER modeling
ER modeling
 

More from pkaviya

IT2255 Web Essentials - Unit V Servlets and Database Connectivity
IT2255 Web Essentials - Unit V Servlets and Database ConnectivityIT2255 Web Essentials - Unit V Servlets and Database Connectivity
IT2255 Web Essentials - Unit V Servlets and Database Connectivity
pkaviya
 
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdfIT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
pkaviya
 
IT2255 Web Essentials - Unit III Client-Side Processing and Scripting
IT2255 Web Essentials - Unit III Client-Side Processing and ScriptingIT2255 Web Essentials - Unit III Client-Side Processing and Scripting
IT2255 Web Essentials - Unit III Client-Side Processing and Scripting
pkaviya
 
IT2255 Web Essentials - Unit II Web Designing
IT2255 Web Essentials - Unit II  Web DesigningIT2255 Web Essentials - Unit II  Web Designing
IT2255 Web Essentials - Unit II Web Designing
pkaviya
 
IT2255 Web Essentials - Unit I Website Basics
IT2255 Web Essentials - Unit I  Website BasicsIT2255 Web Essentials - Unit I  Website Basics
IT2255 Web Essentials - Unit I Website Basics
pkaviya
 
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdfBT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
pkaviya
 
OIT552 Cloud Computing Material
OIT552 Cloud Computing MaterialOIT552 Cloud Computing Material
OIT552 Cloud Computing Material
pkaviya
 
OIT552 Cloud Computing - Question Bank
OIT552 Cloud Computing - Question BankOIT552 Cloud Computing - Question Bank
OIT552 Cloud Computing - Question Bank
pkaviya
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
pkaviya
 
CS8592 Object Oriented Analysis & Design - UNIT V
CS8592 Object Oriented Analysis & Design - UNIT V CS8592 Object Oriented Analysis & Design - UNIT V
CS8592 Object Oriented Analysis & Design - UNIT V
pkaviya
 
CS8592 Object Oriented Analysis & Design - UNIT IV
CS8592 Object Oriented Analysis & Design - UNIT IV CS8592 Object Oriented Analysis & Design - UNIT IV
CS8592 Object Oriented Analysis & Design - UNIT IV
pkaviya
 
CS8592 Object Oriented Analysis & Design - UNIT III
CS8592 Object Oriented Analysis & Design - UNIT III CS8592 Object Oriented Analysis & Design - UNIT III
CS8592 Object Oriented Analysis & Design - UNIT III
pkaviya
 
CS8592 Object Oriented Analysis & Design - UNIT II
CS8592 Object Oriented Analysis & Design - UNIT IICS8592 Object Oriented Analysis & Design - UNIT II
CS8592 Object Oriented Analysis & Design - UNIT II
pkaviya
 
CS8592 Object Oriented Analysis & Design - UNIT I
CS8592 Object Oriented Analysis & Design - UNIT ICS8592 Object Oriented Analysis & Design - UNIT I
CS8592 Object Oriented Analysis & Design - UNIT I
pkaviya
 
Cs8591 Computer Networks - UNIT V
Cs8591 Computer Networks - UNIT VCs8591 Computer Networks - UNIT V
Cs8591 Computer Networks - UNIT V
pkaviya
 
CS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IVCS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IV
pkaviya
 
CS8591 Computer Networks - Unit III
CS8591 Computer Networks - Unit IIICS8591 Computer Networks - Unit III
CS8591 Computer Networks - Unit III
pkaviya
 
CS8591 Computer Networks - Unit II
CS8591 Computer Networks - Unit II CS8591 Computer Networks - Unit II
CS8591 Computer Networks - Unit II
pkaviya
 
CS8591 Computer Networks - Unit I
CS8591 Computer Networks - Unit ICS8591 Computer Networks - Unit I
CS8591 Computer Networks - Unit I
pkaviya
 
IT8602 Mobile Communication - Unit V
IT8602 Mobile Communication - Unit V IT8602 Mobile Communication - Unit V
IT8602 Mobile Communication - Unit V
pkaviya
 

More from pkaviya (20)

IT2255 Web Essentials - Unit V Servlets and Database Connectivity
IT2255 Web Essentials - Unit V Servlets and Database ConnectivityIT2255 Web Essentials - Unit V Servlets and Database Connectivity
IT2255 Web Essentials - Unit V Servlets and Database Connectivity
 
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdfIT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
IT2255 Web Essentials - Unit IV Server-Side Processing and Scripting - PHP.pdf
 
IT2255 Web Essentials - Unit III Client-Side Processing and Scripting
IT2255 Web Essentials - Unit III Client-Side Processing and ScriptingIT2255 Web Essentials - Unit III Client-Side Processing and Scripting
IT2255 Web Essentials - Unit III Client-Side Processing and Scripting
 
IT2255 Web Essentials - Unit II Web Designing
IT2255 Web Essentials - Unit II  Web DesigningIT2255 Web Essentials - Unit II  Web Designing
IT2255 Web Essentials - Unit II Web Designing
 
IT2255 Web Essentials - Unit I Website Basics
IT2255 Web Essentials - Unit I  Website BasicsIT2255 Web Essentials - Unit I  Website Basics
IT2255 Web Essentials - Unit I Website Basics
 
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdfBT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
BT2252 - ETBT - UNIT 3 - Enzyme Immobilization.pdf
 
OIT552 Cloud Computing Material
OIT552 Cloud Computing MaterialOIT552 Cloud Computing Material
OIT552 Cloud Computing Material
 
OIT552 Cloud Computing - Question Bank
OIT552 Cloud Computing - Question BankOIT552 Cloud Computing - Question Bank
OIT552 Cloud Computing - Question Bank
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
CS8592 Object Oriented Analysis & Design - UNIT V
CS8592 Object Oriented Analysis & Design - UNIT V CS8592 Object Oriented Analysis & Design - UNIT V
CS8592 Object Oriented Analysis & Design - UNIT V
 
CS8592 Object Oriented Analysis & Design - UNIT IV
CS8592 Object Oriented Analysis & Design - UNIT IV CS8592 Object Oriented Analysis & Design - UNIT IV
CS8592 Object Oriented Analysis & Design - UNIT IV
 
CS8592 Object Oriented Analysis & Design - UNIT III
CS8592 Object Oriented Analysis & Design - UNIT III CS8592 Object Oriented Analysis & Design - UNIT III
CS8592 Object Oriented Analysis & Design - UNIT III
 
CS8592 Object Oriented Analysis & Design - UNIT II
CS8592 Object Oriented Analysis & Design - UNIT IICS8592 Object Oriented Analysis & Design - UNIT II
CS8592 Object Oriented Analysis & Design - UNIT II
 
CS8592 Object Oriented Analysis & Design - UNIT I
CS8592 Object Oriented Analysis & Design - UNIT ICS8592 Object Oriented Analysis & Design - UNIT I
CS8592 Object Oriented Analysis & Design - UNIT I
 
Cs8591 Computer Networks - UNIT V
Cs8591 Computer Networks - UNIT VCs8591 Computer Networks - UNIT V
Cs8591 Computer Networks - UNIT V
 
CS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IVCS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IV
 
CS8591 Computer Networks - Unit III
CS8591 Computer Networks - Unit IIICS8591 Computer Networks - Unit III
CS8591 Computer Networks - Unit III
 
CS8591 Computer Networks - Unit II
CS8591 Computer Networks - Unit II CS8591 Computer Networks - Unit II
CS8591 Computer Networks - Unit II
 
CS8591 Computer Networks - Unit I
CS8591 Computer Networks - Unit ICS8591 Computer Networks - Unit I
CS8591 Computer Networks - Unit I
 
IT8602 Mobile Communication - Unit V
IT8602 Mobile Communication - Unit V IT8602 Mobile Communication - Unit V
IT8602 Mobile Communication - Unit V
 

Recently uploaded

MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 

Recently uploaded (20)

MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 

IT6701 Information Management - Unit I

  • 10. Database Design and Modelling Database Modelling – ER Model Relationship: – Whenever an attribute of one entity type refers to another entity type, a relationship exists. – Degree of relationship: • Binary: A relationship of degree two. • Ternary: A relationship of degree three. • n-ary: A relationship among n participating entity types. 10
  • 11. Database Design and Modelling Database Modelling – ER Model Constraints • Cardinality ratio – The maximum number of relationship instances that an entity can participate in. – 1:1 relationship – 1:N relationship – N:1 relationship – M:N relationship 11
  • 12. Database Design and Modelling Database Modelling – ER Model Constraints • Participation constraint – It specifies whether the existence of an entity depends on it being related to another entity. – Total/Mandatory participation: The existence of an entity is determined through its participation in a relationship. (Eg: A student must enroll in a course) – Partial/Optional participation: Only a part of the set of entities participates in a relationship. (Eg: Not every teaching_staff will be the HoD of a department) 12
  • 13. Database Design and Modelling Database Modelling – ER Model - Keys • Keys: Allow us to identify a particular entity. • Super key: A set of one or more attributes (columns) which can uniquely identify a row in a table. • Candidate key: A minimal set of attributes which can uniquely identify a tuple (a minimal subset of a super key). • Primary key: The attribute which allows us to uniquely identify a particular instance in the database. • Foreign key (referential integrity): An attribute that refers to the key of another table; if multiple references exist, then an update or modification of any one must be reflected in all other places. 13
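A small SQL sketch of these keys; the table and column names here are illustrative only, not part of the slides:

    CREATE TABLE department (
        dept_id   INT PRIMARY KEY,       -- primary key: uniquely identifies a row
        dept_name VARCHAR(50)
    );

    CREATE TABLE student (
        roll_no INT PRIMARY KEY,         -- the candidate key chosen as primary key
        email   VARCHAR(50) UNIQUE,      -- another candidate key
        dept_id INT,
        FOREIGN KEY (dept_id) REFERENCES department(dept_id)  -- referential integrity
    );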
  • 14. Database Design and Modelling Database Modelling – ER Model One – to – One Cardinality 14
  • 15. One – to – Many Cardinality Many – to – One Cardinality Many – to – Many Cardinality Participation Database Design and Modelling Database Modelling – ER Model 15
  • 16. Database Design and Modelling Database Modelling – Extended ER Model • Specialisation: The result of taking a subset of a higher level entity set to form a low level entity set. (Eg: Person -> Customer, Employee) • Generalisation: The result of taking the union of two or more disjoint entity sets to produce a higher level entity set. (Eg: Customer, Employee -> Person) • Aggregation: An abstraction in which relationship sets are treated as higher level entity sets and can participate in the relationships. 16
  • 17. Database Design and Modelling Database Modelling – Case Study: Hospital Management System 17
  • 18. Business Rules • Database design is an important phase in the system development life cycle. • The inputs to the design phase are the business rules and functions identified in the requirement-gathering phase. • Business rules are used to describe various aspects of the business domain. (Eg: Students need to be enrolled in a course before appearing for its examination) • Business rules typically take the following forms: – The explanation of a concept relevant to the application. (A course is evaluated through theory + practical examinations) – An integrity constraint on the data of the application. (The minimum mark to pass a course is 50%) – A derivation rule, whereby information can be derived from other information. (The grade of a student is assigned based on the marks obtained) 18
  • 19. Business Rules Identifying Business Rules • Business rules allow the database designer to develop relationship rules and constraints and help in the creation of a correct data model. • They are a good communication tool between users and designers. • They give a proper classification of entities, attributes, relationships, and constraints. • A noun in a business rule is transformed into an entity in the model, and a verb (active or passive) is interpreted as a relationship among entities. 19
  • 20. Java Database Connectivity (JDBC) • The JDBC API (Application Programming Interface) provides a way of creating database connections from Java programs. • It provides methods to execute SQL statements and process the results obtained from those statements. • Types of JDBC drivers – Type 1 – JDBC-ODBC Bridge Driver – Type 2 – Java Native Driver – Type 3 – Java Network Protocol Driver – Type 4 – Pure Java Driver 20
  • 21. Java Database Connectivity (JDBC) Type 1 – JDBC-ODBC Bridge Driver • It provides a bridge to access the ODBC drivers installed on each client machine. • This bridge translates the standard JDBC calls to corresponding ODBC calls and sends them to the ODBC data source via the ODBC libraries. • This driver requires that native ODBC libraries, drivers, and their required support files be installed and configured on each client machine. • These drivers are the slowest of all types due to the multiple levels of translation. 21
  • 22. Java Database Connectivity (JDBC) Type 2 – Java Native Driver • It mainly uses the Java Native Interface (JNI) to translate calls to the local database API. • The JDBC calls are translated into vendor-specific API calls, which act as a façade forwarding requests between the application and the database. • Type 2 drivers are usually faster than Type 1. • Similar to Type 1 drivers, these drivers also require native libraries to be installed and configured on each client machine. 22
  • 23. Java Database Connectivity (JDBC) Type 3 – Java Network Protocol Driver • It uses an intermediate driver listener that acts as a gateway for multiple database servers. • The Java client sends the JDBC request to the listener, which in turn connects to the database server using another driver. • It does not require any installation on the client side, which is why it is preferred over the first two types of drivers. 23
  • 24. Java Database Connectivity (JDBC) Type 4 – Pure Java Driver • It is the most commonly used JDBC driver in enterprise applications because it converts JDBC API calls to direct network calls using vendor-specific implementation details. • Type 4 drivers offer better performance compared to the other types and do not require any installation or configuration on the client machine. 24
  • 25. Java Database Connectivity (JDBC) Accessing Database using JDBC • Steps: – Import JDBC packages - import java.sql.*; – Register the JDBC driver - Class.forName() Eg: Class.forName("oracle.jdbc.driver.OracleDriver"); – Creating a database connection Eg: String url = "jdbc:oracle:thin:@localhost:1521:xe"; Connection con = DriverManager.getConnection(url, user, password); DriverManager.getConnection() variants: getConnection(String url) getConnection(String url, Properties prop) getConnection(String url, String user, String password) 25
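Putting these steps together, a minimal sketch of opening a connection. The URL, user name, and password are placeholders to adapt; the explicit Class.forName() call is optional from JDBC 4.0 onwards, since drivers on the classpath are discovered automatically.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ConnectDemo {
        public static void main(String[] args) {
            String url = "jdbc:oracle:thin:@localhost:1521:xe";   // placeholder URL
            try {
                Class.forName("oracle.jdbc.driver.OracleDriver"); // register the driver
                Connection con = DriverManager.getConnection(url, "user", "password");
                System.out.println("Connected: " + !con.isClosed());
                con.close();                                      // release the connection
            } catch (ClassNotFoundException | SQLException e) {
                e.printStackTrace();
            }
        }
    }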
  • 26. Java Database Connectivity (JDBC) Accessing Database using JDBC • Steps: – Executing queries Eg: Statement st = con.createStatement(); int m = st.executeUpdate(sql); Interfaces and their recommended use: Statement – Use for general-purpose access to your database. Useful when you are using static SQL statements at runtime. The Statement interface cannot accept parameters. PreparedStatement – Use when you plan to use the SQL statements many times. The PreparedStatement interface accepts input parameters at runtime. CallableStatement – Use when you want to access database stored procedures. The CallableStatement interface can also accept runtime input parameters. Execution methods: boolean execute(String SQL) int executeUpdate(String SQL) ResultSet executeQuery(String SQL) 26
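As an illustration of the PreparedStatement interface, a hedged sketch; the students table and its columns are hypothetical and not part of the slides:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InsertDemo {
        static void insertStudent(Connection con, int id, String name) throws SQLException {
            // The SQL is prepared once; the ? parameters are bound at runtime.
            String sql = "INSERT INTO students (id, name) VALUES (?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, id);
                ps.setString(2, name);
                int rows = ps.executeUpdate();   // returns the number of affected rows
                System.out.println(rows + " row(s) inserted");
            }
        }
    }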
  • 27. Java Database Connectivity (JDBC) Accessing Database using JDBC • Steps: – Processing the results (handling SQL exceptions) • ResultSet objects from the Statement and PreparedStatement classes contain the query output, which has to be processed. • The output from a CallableStatement is obtained through OUT parameters; it could be either a single value or a ResultSet. • Any SQLException has to be caught and gracefully transmitted to the calling program. – Closing the database connection • By closing the connection, the Statement and ResultSet objects are closed automatically. • The close() method of the Connection interface is used to close the connection. Eg: con.close(); 27
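A sketch of querying, iterating over the ResultSet, and cleaning up. Try-with-resources (Java 7+) closes the Connection, PreparedStatement, and ResultSet automatically, which subsumes the explicit con.close() above; the URL and the students table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ReadDemo {
        public static void main(String[] args) {
            String url = "jdbc:oracle:thin:@localhost:1521:xe";   // placeholder URL
            String sql = "SELECT id, name FROM students WHERE id > ?";
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, 100);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {   // advance the cursor row by row
                        System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                    }
                }
            } catch (SQLException e) {
                System.err.println("Query failed: " + e.getMessage()); // graceful handling
            }
        }
    }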
  • 28. Stored Procedure • A stored procedure is prepared SQL code that you can save and reuse over and over again. • It is a set of SQL statements, written together to form a logical unit, for performing a specific task. • It is a subroutine used by applications to access relational databases and is stored in the database data dictionary. • It can be compiled and executed with different parameters and results, and it can have any combination of input, output, and input/output parameters. • Advantages of Stored Procedures: – Stored procedures are fast. – Stored procedures are portable. – Stored procedures are always available as 'source code' in the database itself. – Stored procedures are migratory. 28
  • 29. Stored Procedure – PL/SQL Example Function CREATE [OR REPLACE] FUNCTION function_name [(parameter_name [IN | OUT | IN OUT] type [, ...])] RETURN return_datatype {IS | AS} BEGIN < function_body > END [function_name]; Procedure CREATE [OR REPLACE] PROCEDURE procedure_name [(parameter_name [IN | OUT | IN OUT] type [, ...])] {IS | AS} BEGIN < procedure_body > END procedure_name; 29
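To make the template concrete, a hedged sketch of a hypothetical procedure written against this syntax, followed by the JDBC CallableStatement call (see slide 26) that invokes it. The procedure, table, and column names are illustrative only.

    CREATE OR REPLACE PROCEDURE get_student_name (
        p_id   IN  NUMBER,
        p_name OUT VARCHAR2
    ) AS
    BEGIN
        SELECT name INTO p_name FROM students WHERE id = p_id;
    END get_student_name;
    /

Calling it from Java:

    String call = "{call get_student_name(?, ?)}";          // JDBC escape syntax
    try (CallableStatement cs = con.prepareCall(call)) {
        cs.setInt(1, 101);                                  // bind the IN parameter
        cs.registerOutParameter(2, java.sql.Types.VARCHAR); // declare the OUT parameter
        cs.execute();
        System.out.println(cs.getString(2));                // read the OUT value
    }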
  • 30. Trends in Big Data systems • Big data is the term used for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. • Need for Big Data – A huge amount of data needs to be analyzed for the betterment of the organization and to improve the customer experience. – Current single-server systems cannot handle such a huge amount of data. – Hence, either the capacity of the single machine must be increased, or a cluster of machines can be made to act like a single system working in a distributed manner. – Such a solution is provided through Hadoop. 30
  • 31. Trends in Big Data systems Characteristics of Big Data • Big data can be characterized by three V's: Volume, Variety, Velocity. • Volume: Specifies the amount of data handled by the application. (Eg: Twitter) • Velocity: Addresses the rate at which the data flows into the system. • Variety: Describes the different types of data generated, from unstructured text to structured records, from images to sound and video, from sensor data to geographic locations, all carrying information needed for processing. • A fourth V, Value, refers to the return on investment (ROI) of the data and its processing. 31
  • 32. Hadoop • Hadoop is an open-source framework that allows one to store and process big data in a distributed environment across clusters of computers using simple programming models. 32
  • 36. Hadoop • Hadoop follows a master-slave architecture for the creation of a cluster. • It consists of two parts: Storage unit & Processing unit. • Storage is provided through a Hadoop Distributed File System (HDFS). • Processing is done through MapReduce. Storage - HDFS • HDFS is spread across machines and acts as a single file system. • The master node has information about the location of the data. The data are stored in the slave node. • HDFS runs daemons to handle data storage. They are NameNode, DataNode, and Secondary NameNode. 36
  • 37. Hadoop • The cluster has a single NameNode running on the server and multiple DataNodes running on the slaves. • Every slave machine runs a DataNode daemon. • The NameNode acts as a single point of availability for the data; if it goes down, it would be difficult to make sense of the blocks stored on the DataNodes. • Thus, the NameNode has to run on dual- or triple-redundant hardware, with storage such as RAID 1+0. • For faster access, the NameNode keeps its metadata in RAM; if the NameNode crashes, this metadata would be lost. 37
  • 38. Hadoop • To make this data persistent, a Secondary NameNode is used. • The NameNode contacts the Secondary NameNode every hour and pushes the metadata onto it, creating a checkpoint. • The NameNode can act as a single point of failure, and hence a backup is essential. • From Hadoop 2.x onwards, a provision for a passive or standby backup is provided. • This standby NameNode takes control whenever the active NameNode fails, thereby providing system availability. • High data availability is achieved through data replication or duplication. The default replication factor is 3, so every block of a file has three replicas. 38
  • 39. Hadoop Processing – MapReduce • In Hadoop 1.x, the processing part was handled through MapReduce. • The daemons running for MapReduce are the JobTracker and the TaskTracker. • JobTracker: The master that manages the jobs submitted by clients and the resources available in the cluster. • The JobTracker splits a job into various tasks that can run in parallel through the TaskTrackers. • With Hadoop 2.x, a few changes were made to the processing structure. • Apache Hadoop 2.0 includes YARN, which separates the processing layer into resource management and processing components. • The YARN daemons, ResourceManager and NodeManager, help in processing the data. 39
  • 40. Hadoop Characteristics of Hadoop • Highly scalable: More machines can easily be added to the cluster as needed to increase the capacity/power of the cluster. • Commodity hardware-based: Desktops can be used to create a cluster. Specialized hardware is not required. Therefore scalable and economical. • Open source: You can look into the code and contribute back to the community. • Reliable: If a machine crashes, the data are not lost. 40
  • 41. Hadoop Components of Hadoop • Hadoop is a galaxy of tools. • Every tool has a specific advantage or purpose. • The collection of components is known as the Hadoop ecosystem. • It includes tools for data storage, data manipulation, integration with other systems, machine learning, cluster management and development, etc. • Components are: – Hadoop Distributed File System (HDFS) – Flume, Sqoop – YARN & HBase – MapReduce – Hive & Pig – Oozie – Zookeeper 41
  • 43. Hadoop Components of Hadoop • Flume and Sqoop are used for data integration. They are used to get data from external sources into Hadoop. Flume is a service to move large amounts of data in real time. Sqoop is the integration of SQL and Hadoop. • YARN and MapReduce are used for data processing. • YARN stands for Yet Another Resource Negotiator, which handles resource management. • HBase is the data storage or Hadoop database, which provides interactive access to the data stored in HDFS. • Hive and Pig are used for data analysis. These are high-level languages that allow users to construct queries so that data processing can be performed. (Hive – Facebook, Pig – Yahoo) • Oozie is a workflow scheduler, which is used to manage Hadoop jobs. • Zookeeper provides operational services for a Hadoop cluster: distributed configuration services, synchronization services, and a naming registry. 43
  • 44. HDFS – Hadoop Distributed File System HDFS • HDFS is the file system required by Hadoop. • It is an atypical file system in that it does not format the hard drives in the cluster. • Instead, it sits on top of the underlying operating system's file system and uses it to store and manage data. • HDFS divides each file into blocks of either 64 MB or 128 MB. Each block is then replicated three times, or as many times as the user specifies. • The NameNode maintains the split information and location details. 44
  • 45. HDFS – Hadoop Distributed File System Features of HDFS • It is suitable for distributed storage and processing. • Hadoop provides a command interface to interact with HDFS. • The built-in servers of the NameNode and DataNode help users easily check the status of the cluster. • Streaming access to file system data. • HDFS provides file permissions and authentication. 45
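As a quick illustration of that command interface, a few common HDFS shell operations; the paths and file names are placeholders:

    hdfs dfs -mkdir -p /user/demo/input            # create a directory in HDFS
    hdfs dfs -put students.csv /user/demo/input    # copy a local file into HDFS
    hdfs dfs -ls /user/demo/input                  # list the directory contents
    hdfs dfs -cat /user/demo/input/students.csv    # print the file's contents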
  • 46. HDFS – Hadoop Distributed File System HDFS Architecture 46
  • 47. HDFS – Hadoop Distributed File System HDFS • The storage can sometimes grow so huge that the disks are arranged in different racks and connected through switches. • If all replicas are stored in the same rack, and the switch serving that rack fails, all the replicas become unavailable, defeating the purpose of having redundancy. • HDFS has a feature called rack awareness, through which the NameNode knows which rack each DataNode, and hence each replica, is on. 47
  • 48. HDFS – Hadoop Distributed File System Rack awareness in HDFS 48
  • 49. HDFS – Hadoop Distributed File System HDFS • Hadoop also behaves intelligently in terms of self-healing: if one of the DataNodes goes down, the heartbeat (or status message) from that DataNode to the NameNode ceases. • After a few minutes, the NameNode considers that DataNode dead; whatever tasks were running on it get respawned elsewhere, and its blocks are re-replicated so that the replica count of 3 is restored. 49
  • 50. HDFS – Hadoop Distributed File System HDFS – Preparing HDFS writes 50
  • 51. HDFS – Hadoop Distributed File System HDFS – Preparing HDFS writes 1. The client creates the file by calling create() on the Distributed File System (DFS). 2. The DFS makes an RPC call to the NameNode to create a new file in the file system's namespace, with no blocks associated with it. 3. The DFS returns an FSDataOutputStream for the client to start writing data to. FSDataOutputStream wraps a DFSOutputStream, which handles communication with the DataNodes and the NameNode. 4. The DataStreamer streams the packets to the first DataNode in the pipeline, which stores each packet and forwards it to the second DataNode in the pipeline. 5. When the client has finished writing data, it calls close() on the stream. 6. This action flushes all the remaining packets to the DataNode pipeline and waits for acknowledgments before contacting the NameNode to signal that the file is complete. 7. The NameNode already knows which blocks the file is made up of (because the DataStreamer asks for block allocations), so it only has to wait for the blocks to be minimally replicated before returning successfully. 51
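A hedged sketch of this write path from the client's point of view, using the Hadoop Java API; the NameNode address and the file path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000");     // placeholder address
            FileSystem fs = FileSystem.get(conf);                 // the DFS of step 1
            // create() performs the NameNode RPC of step 2.
            try (FSDataOutputStream out = fs.create(new Path("/user/demo/hello.txt"))) {
                out.writeBytes("hello hdfs\n");  // packets flow down the DataNode pipeline
            }                                    // close() flushes and signals completion
            fs.close();
        }
    }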
  • 52. HDFS – Hadoop Distributed File System HDFS – Reading Data from HDFS 52
  • 53. HDFS – Hadoop Distributed File System HDFS – Reading Data from HDFS 1. The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of the Distributed File System (DFS). 2. The DFS calls the NameNode, using RPCs, to determine the locations of the first few blocks in the file. 3. The DFS returns an FSDataInputStream to the client for it to read data from. 4. The FSDataInputStream in turn wraps a DFSInputStream, which manages the DataNode and NameNode I/O. 5. The client then calls read() on the stream. 6. During reading, if the DFSInputStream encounters an error while communicating with a DataNode, it tries the next closest one for that block. 7. If a corrupted block is found, the DFSInputStream attempts to read a replica of the block from another DataNode; it also reports the corrupted block to the NameNode. 53
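The corresponding read path, again as a sketch with placeholder addresses; IOUtils.copyBytes drives the read() loop of step 5:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000");    // placeholder address
            FileSystem fs = FileSystem.get(conf);
            // open() performs the NameNode RPC of step 2 and returns an FSDataInputStream.
            try (FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);  // stream the file to stdout
            }
            fs.close();
        }
    }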
  • 54. MapReduce • MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. • In traditional systems, data are brought from the data store into the main memory of the machine where the application is running. • In MapReduce, the application is transferred to the locations where the data are stored and executed in parallel. • Thus, multiple instances of a MapReduce job exist at any given time, working in parallel on the data stored in HDFS. 54
  • 55. MapReduce MapReduce Framework • It works on a divide-and-conquer policy. • The job is divided into multiple tasks known as Map tasks, and then the output is combined using a task known as the Reducer. • A MapReduce program comprises two components: Map and Reduce. • The Mapper part does the processing, while the Reducer aggregates the data. • There is a third phase, called shuffle and sort, between Map and Reduce. • The output of the Map is given to shuffle and sort, which then passes it on to the Reducer. 55
  • 56. MapReduce MapReduce Framework • Shuffle and sort groups the output so that all the data belonging to the same group are given to a single machine. • There can be one or many instances of the Reducer running for a given job, so it is essential that each group of similar data is given to a single machine. 56
  • 57. MapReduce Reading the Data into the MapReduce Program • The Map task reads the input from the cluster as a sequence of (key, value) pairs. • The processing is done on the value, and the output is also provided as (key, value) pairs. • These pairs from the Map tasks are combined into groups and then sorted based on the key through the Shuffle and Sort phase. • This intermediate output is given to the Reduce task, which combines the results and provides the final output, which is written onto HDFS. 57
  • 59. MapReduce MapReduce Workflow (figure): the input data is split into Split 0, Split 1, Split 2; worker nodes run Map tasks, which extract something you care about from each record and write locally; shuffle and sort performs a remote read and groups the intermediate output; worker nodes then run Reduce tasks, which aggregate, summarize, filter, or transform and write the output files (Output File 0, Output File 1). 59
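The canonical word-count job makes this (key, value) flow concrete; a sketch against the Hadoop MapReduce Java API (the job driver and input/output paths are omitted for brevity):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map: reads (byte offset, line) pairs and emits (word, 1) for every word.
    class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) ctx.write(new Text(word), ONE);
            }
        }
    }

    // Reduce: shuffle and sort has grouped all the 1s for each word; sum them.
    class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));   // final (word, count) pair to HDFS
        }
    }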
  • 63. Hive • Hive started at Facebook. • Hive is a data warehouse infrastructure tool to process structured data in Hadoop. • Hive resides on top of Hadoop to summarize big data, and it makes querying and analyzing easy. • Using Hive, one can create databases and tables, read the data, and create partitions so that the data set can be restructured for processing. • Hive has a lot of schema flexibility: tables can be altered, columns can be moved, or the whole data set can be reloaded. • It also has JDBC/ODBC connectivity, so it can be used with visualization tools like Tableau. 63
  • 64. Hive • Limitations of Hive: – It is not a relational database. – It is not designed for OnLine Transaction Processing (OLTP). – It is not a language for real-time queries and row-level updates. • Features of Hive: – It stores the schema in a database and the processed data in HDFS. – It is designed for OLAP. – It provides an SQL-type language for querying called HiveQL or HQL. – It is familiar, fast, scalable, and extensible. 64
  • 65. Hive • The Metastore holds the information recorded when you create a table, database, or view. • On top of the Metastore lies a Thrift API that enables browsing and querying using JDBC/ODBC. • Table definitions, column definitions, and view definitions are stored in the Metastore. • For Hive, the default Metastore database is Derby. 65
  • 67. Hive Hive Architecture • Hive shell: Interact through commands such as create table and submit query. • Metastore: Table definitions, view definitions, database definitions. • Execution Engine: For execution. • Compiler: For optimization. • Driver: Takes the code and converts it into terms Hadoop can understand for execution. 67
  • 68. Hive Create Database Statement – CREATE DATABASE [IF NOT EXISTS] <database name>; Drop Database Statement – DROP DATABASE IF EXISTS <database name>; Create Table Statement – CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [ROW FORMAT row_format] [STORED AS file_format] 68
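A hedged usage sketch of these statements in HiveQL; the database, table, columns, and file path are illustrative only:

    CREATE DATABASE IF NOT EXISTS college;

    CREATE TABLE IF NOT EXISTS college.students (
        id   INT    COMMENT 'roll number',
        name STRING,
        mark INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- load a local CSV file into the table, then query it
    LOAD DATA LOCAL INPATH '/tmp/students.csv' INTO TABLE college.students;
    SELECT name, mark FROM college.students WHERE mark >= 50;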
  • 69. NoSQL • An approach to data management and database design that is useful for very large sets of distributed data. • NoSQL was designed to access and analyze massive amounts of unstructured data or data stored remotely on multiple virtual servers in the cloud. • Types of NoSQL databases – Graph database – Key-value database – Column stores (also known as wide-column stores) – Document database 69
  • 70. NoSQL • Graph database – It is based on graph theory and is used for representing networks, from a network of people in a social context to a network of cities in geographic mapping. – These databases are designed for data whose relations are well represented as a graph: elements that are interconnected, with an undetermined number of relations between them. – Ex: Neo4j, Giraph 70
  • 71. NoSQL • Key-value store – They are the simplest databases and use a key to access a value. – These databases are designed for storing data in a schema-free way. – In a key-value store, all of the data consists of an indexed key and a value, hence the name. – Ex: Cassandra, DynamoDB 71
  • 72. NoSQL • Column stores – These data stores are designed for storing data tables as sections of columns of data, rather than as rows of data. – Wide-column stores offer high performance and a highly scalable architecture. – Ex: HBase, BigTable 72
  • 73. NoSQL • Document database – These databases expand the idea of key-value stores, where “documents” contain more complex data. – Each document is assigned a unique key, which is used to retrieve the document. – They are designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. – Tree or hierarchical data structures can be stored directly in these databases. – Ex: MongoDB, CouchDB 73
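As an illustration of the key/document idea, a hedged sketch using the MongoDB Java driver; the connection string, database, collection, and field names are placeholders:

    import org.bson.Document;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;

    public class DocStoreDemo {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> col =
                        client.getDatabase("college").getCollection("students");
                // Each document receives a unique _id key, generated if not supplied.
                col.insertOne(new Document("name", "Asha").append("mark", 72));
                Document found = col.find(new Document("name", "Asha")).first();
                System.out.println(found.toJson());   // the stored semi-structured document
            }
        }
    }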