The document discusses database modeling, management, and development. It covers database design and modeling including conceptual, logical, and physical database design. It also discusses entity-relationship modeling including entities, attributes, relationships, keys, and constraints. Additionally, it covers Java database connectivity (JDBC) including the different types of JDBC drivers and how to access a database using JDBC.
1. IT6701 – Information Management
Unit I – Database Modelling,
Management and Development
By
Kaviya.P, AP/IT
Kamaraj College of Engineering & Technology
2. Unit I – Database Modelling,
Management and Development
Database design and modelling - Business Rules and
Relationship; Java database Connectivity (JDBC),
Database connection Manager, Stored Procedures.
Trends in Big Data systems including NoSQL -
Hadoop HDFS, MapReduce, Hive, and
enhancements.
3. Database Design and Modelling
• Database Design
– Process of producing and representing a database in particular model.
– Process of defining the structure of a database.
– Data modelling is the first step in database design.
• Levels of Abstraction
– Conceptual database design
– Logical database design
– Physical database design
4. Database Design and Modelling
• Conceptual database design - (What is represented in the database?)
– An abstract model is created from business rules and user requirements.
– Entity-Relationship (ER) Model is used to represent the conceptual design.
• Entity – Real things in the world
• Relationships – Reflects interactions between entities
• Attributes – Properties of entities and relationships
• Logical database design - (Logical representation and Relational model)
– ER Model is converted into a relational model through logical database
design.
– The data are arranged into the logical structures and mapped into DBMS
tables with accompanying constraints.
5. Database Design and Modelling
• Physical database design
– The actual physical implementation of the database in a database
management system.
– It includes the description of data features, data types, indexing, etc.
– It describes how the information is represented in the database and how
data structures are implemented to represent what is modelled.
6. Database Design and Modelling
Database Modelling – ER Model
– Conceptual designing tool which describes data as
entities, relationships, and attributes.
– Diagrammatic representation of the model.
– Entity: Real world thing. (Eg: person, student, car)
– Entity Set: Collection of entities of similar type.
(Eg: Total No. of Students enrolled in a course)
– Attributes: Properties that describe the entity.
7. Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Composite Attribute: Combination of multiple attributes. (Eg: Address
includes street, city, zip_code.)
– Simple Attribute: One which cannot be decomposed into smaller units.
(Eg: Age)
– Single Valued Attributes: Can hold a single value. (Eg: Rank)
– Multi Valued Attributes: Can store multiple values. (Eg: Mobile No.)
8. Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Stored Attributes: Attributes
whose values are stored in the
database. (Eg: DoB)
– Derived Attributes: Attributes
whose values are calculated from
one or more attributes in the
database. (Eg: Age can be
calculated from DoB)
9. Database Design and Modelling
Database Modelling – ER Model
Types of Attributes:
– Null values: Used when the value for certain instances of an entity does
not exist or is not available.
– Complex Attributes: Attributes in which a relationship can exist between
two or more entities if they are associated with each other, or one entity
refers to one or more entities.
10. Database Design and Modelling
Database Modelling – ER Model
Relationship:
– Whenever an attribute of one entity
type refers to another entity type, a
relationship exists.
– Degree of relationship:
• Binary: A relationship of degree
two.
• Ternary: A relationship of degree
three.
• n-ary: A relationship of degree n (n entities participate).
11. Database Design and Modelling
Database Modelling – ER Model
Constraints
• Cardinality ratio – The maximum number of relationship instances that an
entity can participate in.
– 1:1 relationship
– 1:N relationship
– N:1 relationship
– M:N relationship
12. Database Design and Modelling
Database Modelling – ER Model
Constraints
• Participation constraint – It specifies whether the existence of an entity
depends on it being related to another entity.
– Total/Mandatory participation: If the existence of an entity is determined
through its participation in a relationship. (Eg: a student must enroll in a
course)
– Partial/optional participation: If only a part of the set of entities
participates in a relationship. (Eg: Not every teaching_staff will be the
HoD of a department)
13. Database Design and Modelling
Database Modelling – ER Model - Keys
• Keys: Allow us to identify a particular entity.
• Super key: A super key is a set of one or more
attributes (columns), which can uniquely identify a row
in a table.
• Candidate key: The minimal set of attributes which can uniquely identify
a tuple. (A minimal subset of a super key)
• Primary key: An attribute which allows us to uniquely identify a
particular instance in the database.
• Foreign key (referential integrity): An attribute that references a key in
another relation; if multiple references exist, then an update or
modification of any one must be reflected in all other places.
14. Database Design and Modelling
Database Modelling – ER Model
(Diagram: one-to-one cardinality)
15. Database Design and Modelling
Database Modelling – ER Model
(Diagrams: one-to-many, many-to-one, and many-to-many cardinality;
participation)
16. Database Design and Modelling
Database Modelling – Extended ER Model
• Specialisation: The result of taking a subset of a higher-level entity set
to form a lower-level entity set. (Eg: Person -> Customer, Employee)
• Generalisation: The result of taking the union of two or more disjoint
entity sets to produce a higher-level entity set. (Eg: Customer, Employee
-> Person)
• Aggregation: An abstraction in which relationship sets are treated as
higher-level entity sets and can participate in relationships.
17. Database Design and Modelling
Database Modelling – Case Study: Hospital Management System
18. Business Rules
• Database design is an important phase in the system development life cycle.
• The inputs to design phase will be the business rules and functions identified
in the requirement gathering phase.
• Business rules are used to describe various aspects of the business domain.
(Eg: Students need to be enrolled in the course before appearing for his/her
examination)
• The following are the business rules:
– The explanation of a concept relevant to the application. (Course is evaluated
through theory + practical examination)
– An integrity constraint on the data of the application. (Minimum mark to pass a
course is 50%)
– A derivation rule, whereby information can be derived from other information.
(Grade of the student is assigned based on the marks obtained)
19. Business Rules
Identifying Business Rules
• Business rules allow the database designer to develop relationship
rules and constraints and help in the creation of a correct data
model.
• It is a good communication tool between users and designers.
• It gives the proper classification of entities, attributes, relationships,
and constraints.
• A noun in a business rule will be transformed into an entity in the model,
and a verb (active or passive) will be interpreted as a relationship among
entities.
20. Java Database Connectivity (JDBC)
• The JDBC API (Application Programming Interface) provides a
way for creating database connections from Java programmes.
• It provides methods to execute SQL statements and process the
results obtained from those statements.
• Types of JDBC drivers
– Type 1 – JDBC ODBC Bridge Driver
– Type 2 – Java Native Driver
– Type 3 – Java Network Protocol Driver
– Type 4 – Pure Java Driver
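As a minimal sketch of opening a connection, assuming a Type 4 MySQL driver jar on the classpath (the URL, user and password below are placeholders, not part of the original slides):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectDemo {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials; with a JDBC 4.0+ driver jar on the
        // classpath, DriverManager locates the driver automatically.
        String url = "jdbc:mysql://localhost:3306/sampledb";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected: " + !con.isClosed());
        }
    }
}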
21. Java Database Connectivity (JDBC)
Type 1 – JDBC ODBC Bridge Driver
• It provides a bridge to access the ODBC drivers installed on each client machine.
• This bridge translates the standard JDBC calls to corresponding ODBC calls
and sends them to the ODBC data source via the ODBC libraries.
• This driver requires that native ODBC libraries, drivers and their required
support files be installed and configured on each client machine.
• They are the slowest of all types due to multiple levels of translation.
22. Java Database Connectivity (JDBC)
Type 2 – Java Native Driver
• It mainly uses the Java Native Interface (JNI) to translate calls to the local database API.
• The JDBC calls are translated into vendor-specific API calls which act as a façade for
forwarding requests between application and database.
• Type 2 drivers are usually faster than Type 1.
• Similar to Type 1 drivers, these drivers also require native libraries to be installed and
configured on each client machine.
23. Java Database Connectivity (JDBC)
Type 3 – Java Network Protocol Driver
• It uses an intermediate driver listener that acts as a gateway for multiple
database servers.
• The Java client sends JDBC requests to the listener, which in turn connects
to the database server using another driver.
• It does not require any installation on the client side, which is why it is
preferred over the first two types of driver.
24. Java Database Connectivity (JDBC)
Type 4 – Pure Java Driver
• It is the most commonly used JDBC driver in most enterprise applications
because it converts JDBC API calls to direct network calls using
vendor-specific implementation details.
• Type 4 drivers offer better performance compared to the other types, and
they do not require any installation or configuration on the client machine.
26. Java Database Connectivity (JDBC)
Accessing Database using JDBC
• Steps:
– Executing queries
Eg: Statement st = con.createStatement();
int m = st.executeUpdate(sql);
Interface – Recommended Use
• Statement: Use for general-purpose access to your database. Useful when you
are using static SQL statements at runtime. The Statement interface cannot
accept parameters.
• PreparedStatement: Use when you plan to run the same SQL statement many
times. The PreparedStatement interface accepts input parameters at runtime.
• CallableStatement: Use when you want to access database stored procedures.
The CallableStatement interface can also accept runtime input parameters.
Execution methods:
• boolean execute(String sql)
• int executeUpdate(String sql)
• ResultSet executeQuery(String sql)
27. Java Database Connectivity (JDBC)
Accessing Database using JDBC
• Steps:
– Processing the results (handling SQL exceptions)
• ResultSet objects are returned by Statement and PreparedStatement; they
contain the query output, which has to be processed.
• The output value from CallableStatement comes through an OUT parameter;
this could be either a single value or a ResultSet.
• An SQLException has to be caught and gracefully reported to the calling
programme.
– Closing the database connection
• By closing the connection, the associated Statement and ResultSet objects
are closed automatically.
• The close() method of the Connection interface is used to close the
connection.
Eg: con.close();
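Putting the steps together, a hedged sketch of the execute-process-close cycle using a PreparedStatement; the student table, its columns, and the connection details are hypothetical. Try-with-resources closes the ResultSet, statement, and connection automatically, even when an exception is thrown:

import java.sql.*;

public class QueryDemo {
    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/sampledb";      // placeholder URL
        String sql = "SELECT id, name FROM student WHERE dept = ?"; // hypothetical table
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "IT");                   // bind the runtime parameter
            try (ResultSet rs = ps.executeQuery()) { // execute and obtain the results
                while (rs.next()) {                  // process each row of the output
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        } catch (SQLException e) {
            // catch the SQLException and report it gracefully to the caller
            System.err.println("Database error: " + e.getMessage());
        }
    }
}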
28. Stored Procedure
• A stored procedure is a prepared SQL code that you can save and reuse over and over
again.
• A set of SQL statements, written together to form a logical unit, for performing a
specific task.
• It is a subroutine used by applications to access relational databases, and
it is stored in the database data dictionary.
• It can be compiled once and executed with different parameters and results,
and it can have any combination of input, output, and input/output
parameters.
• Advantages of Stored Procedures:
– Stored procedures are fast.
– Stored procedures are portable.
– Stored procedures are always available as 'source code' in the database
itself.
– Stored procedures are migratory.
29. Stored Procedure – PL/SQL
Example
Function
CREATE [OR REPLACE] FUNCTION function_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
RETURN return_datatype
{IS | AS}
BEGIN
< function_body >
END [function_name];
Procedure
CREATE [OR REPLACE] PROCEDURE procedure_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
{IS | AS}
BEGIN
< procedure_body >
END procedure_name;
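A stored procedure like the one above can be invoked from JDBC through CallableStatement (the interface introduced on slide 26). A sketch, assuming a hypothetical procedure get_grade(IN reg_no, OUT grade) and a placeholder Oracle connection URL:

import java.sql.*;

public class ProcDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//localhost:1521/XE"; // placeholder URL
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             // get_grade(IN reg_no, OUT grade) is a hypothetical procedure
             CallableStatement cs = con.prepareCall("{call get_grade(?, ?)}")) {
            cs.setInt(1, 1001);                        // IN parameter
            cs.registerOutParameter(2, Types.VARCHAR); // OUT parameter
            cs.execute();
            System.out.println("Grade: " + cs.getString(2));
        }
    }
}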
30. Trends in Big Data systems
• Big data is the term used for a collection of data sets so large and complex that it
becomes difficult to process it using on-hand database management tools or
traditional data processing applications.
• Need for Big Data
– A huge amount of data needs to be analyzed for the betterment of the
organization and to improve customer experience.
– The current systems (single server) cannot handle such a huge amount of
data.
– Hence, either the capacity of the single machine needs to be increased, or a
cluster of machines can be used to act like a single system that works in a
distributed manner.
– Such a solution is provided through Hadoop.
31. Trends in Big Data systems
Characteristics of Big Data
• Big data can be characterized by specifying three V’s: Volume, Variety, Velocity
• Volume: Specifies the amount of data handled by the application. (Eg: Twitter)
• Velocity: Addresses the rate at which the data flows into the system.
• Variety: Describes the different types of data generated, from unstructured
text to structured records, from images to sound and video, from sensor data
to geographic locations, etc., all carrying information needed for processing.
• A fourth V, Value, refers to the return on investment (ROI) of the data and
its processing.
32. Hadoop
• Hadoop is an open-source framework that allows users to store and process
big data in a distributed environment across clusters of computers using
simple programming models.
36. Hadoop
• Hadoop follows a master-slave architecture for the creation of a cluster.
• It consists of two parts: Storage unit & Processing unit.
• Storage is provided through a Hadoop Distributed File System (HDFS).
• Processing is done through MapReduce.
Storage - HDFS
• HDFS is spread across machines and acts as a single file system.
• The master node has information about the location of the data. The data
are stored on the slave nodes.
• HDFS runs daemons to handle data storage. They are the NameNode, the
DataNode, and the Secondary NameNode.
37. Hadoop
• The cluster has a single NameNode running on the server and multiple
DataNodes running on the slaves.
• Every slave machine will run a DataNode daemon.
• The NameNode acts as a single point of availability for the data. If it
goes down, it would be difficult to make sense of the blocks on the
DataNodes.
• Thus, the NameNode has to run on dual- or triple-redundant hardware, with
storage such as RAID 1+0.
• For faster access, the NameNode keeps its metadata in RAM. If the NameNode
crashes, this metadata will be lost.
38. Hadoop
• To make this data persistent, a Secondary NameNode is used.
• The NameNode contacts the Secondary NameNode every hour and pushes the
metadata onto it, creating a checkpoint.
• The NameNode can act as a single point of failure, and hence a backup is
essential.
• From Hadoop 2.x onwards, a provision for a passive or standby backup is
provided.
• This standby NameNode backup will take control whenever the active NameNode
fails, thereby providing system availability.
• High data availability is achieved through data replication or duplication.
The default replication factor is 3, so every file has three replicas.
39. Hadoop
Processing – MapReduce
• In Hadoop 1.x, the processing part was handled through MapReduce.
• The daemons running for MapReduce are JobTracker and TaskTracker.
• JobTracker: The master that manages the jobs submitted by the client and the
resources used by them in the cluster.
• The JobTracker will split the job into various tasks that can run in
parallel under the TaskTrackers.
• With Hadoop 2.x, a few changes were made in the processing structure.
• Apache Hadoop 2.0 includes YARN, which separates processing into resource
management and processing components.
• The YARN daemons, the ResourceManager and the NodeManager, help in
processing the data.
40. Hadoop
Characteristics of Hadoop
• Highly scalable: More machines can easily be added to the cluster as needed to
increase the capacity/power of the cluster.
• Commodity hardware-based: Desktops can be used to create a cluster;
specialized hardware is not required. It is therefore scalable and economical.
• Open source: You can look into the code and contribute back to the community.
• Reliable: If a machine crashes, the data are not lost.
41. Hadoop
Components of Hadoop
• Hadoop is a galaxy of tools.
• Every tool has a specific advantage or purpose.
• The collection of components is known as the Hadoop ecosystem.
• It includes tools for data storage, data manipulation, integration with
other systems, machine learning, cluster management and development, etc.
• Components are:
– Hadoop Distributed File System (HDFS) – Flume, Sqoop
– YARN & HBase
– MapReduce
– Hive & Pig
– Oozie
– Zookeeper
43. Hadoop
Components of Hadoop
• Flume and Sqoop are used for data integration. They are used to get data
from external sources into Hadoop. Flume is a service to move a large amount
of data in real time. Sqoop is the integration of SQL and Hadoop.
• YARN and MapReduce are used for data processing.
• YARN stands for Yet Another Resource Negotiator, and is responsible for
resource management.
• HBase is the data storage or Hadoop database, which provides interactive
access to the data stored in HDFS.
• Hive and Pig are used for data analysis. These are high-level languages that
allow users to construct queries so that data processing can be performed.
(Hive – Facebook, Pig – Yahoo)
• Oozie is a workflow scheduler, which is used to manage Hadoop jobs.
• Zookeeper provides operational services for a Hadoop cluster. It provides
distributed configuration services, synchronization services and a naming
registry.
44. HDFS – Hadoop Distributed
File System
HDFS
• HDFS is the file system required by Hadoop.
• It is an atypical file system, which does not format the hard drives in the
cluster.
• Instead it sits on top of the underlying operating system and its file system and
uses it to store and manage data.
• HDFS divides a file into blocks of either 64 MB or 128 MB. Each block is then
replicated three times, or the number of times specified by the user.
• The NameNode maintains the split information and location details.
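As a worked example: with a 64 MB block size, a 200 MB file is split into four blocks (64 + 64 + 64 + 8 MB); with the default replication factor of 3, each block is stored three times, so the cluster holds twelve block replicas (roughly 600 MB of raw storage) for that one file.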
45. HDFS – Hadoop Distributed
File System
Features of HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command-line interface to interact with HDFS.
• The built-in servers of the NameNode and DataNodes help users to easily
check the status of the cluster.
• It offers streaming access to file system data.
• HDFS provides file permissions and authentication.
46. HDFS – Hadoop Distributed
File System
HDFS Architecture
47. HDFS – Hadoop Distributed
File System
HDFS
• The storage can sometimes get very huge such that the disks are arranged in
different racks and connected through switches.
• If all replicas are stored in the same rack, and the switch serving that
rack fails, all the replicas become unavailable, defeating the purpose of
having redundancy.
• HDFS has a feature called rack awareness, through which the NameNode knows
which rack each DataNode, and hence each replica, is on.
48. HDFS – Hadoop Distributed
File System
Rack awareness in HDFS
49. HDFS – Hadoop Distributed
File System
HDFS
• Hadoop also behaves intelligently in terms of self-healing: if one of the
DataNodes goes down, the heartbeat (or status message) from that DataNode to
the NameNode will cease.
• After a few minutes, the NameNode will consider that DataNode to be dead;
whatever tasks were running on that DataNode will be respawned elsewhere,
and its blocks re-replicated so that the replica count of 3 is restored.
51. HDFS – Hadoop Distributed File System
HDFS – Preparing HDFS writes
1. The client creates the file by calling create() on the DistributedFileSystem
(DFS).
2. DFS makes an RPC call to the NameNode to create a new file in the file
system's namespace, with no blocks associated with it.
3. DFS returns an FSDataOutputStream for the client to start writing data to.
FSDataOutputStream wraps a DFSOutputStream, which handles communication with
the DataNodes and NameNode.
4. The DataStreamer streams the packets to the first DataNode in the pipeline,
which stores each packet and forwards it to the second DataNode in the
pipeline.
5. When the client has finished writing data, it calls close() on the stream.
6. This action flushes all the remaining packets to the DataNode pipeline and
waits for acknowledgments before contacting the NameNode to signal that the
file is complete.
7. The NameNode already knows which blocks the file is made up of (because the
DataStreamer asks for block allocations), so it only has to wait for blocks
to be minimally replicated before returning successfully.
52. HDFS – Hadoop Distributed
File System
HDFS – Reading Data from HDFS
53. HDFS – Hadoop Distributed File System
HDFS – Reading Data from HDFS
1. The client opens the file it wishes to read by calling open() on the
FileSystem object, which for HDFS is an instance of DistributedFileSystem
(DFS).
2. DFS calls the NameNode using RPCs to determine the locations of the first
few blocks in the file.
3. DFS returns an FSDataInputStream to the client for it to read data from.
4. FSDataInputStream in turn wraps a DFSInputStream, which manages the DataNode
and NameNode I/O.
5. The client then calls read() on the stream.
6. During reading, if the DFSInputStream encounters an error while
communicating with a DataNode, it will try the next closest one for that
block.
7. If a corrupted block is found, the DFSInputStream attempts to read a replica
of the block from another DataNode; it also reports the corrupted block to
the NameNode.
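The same write and read paths can be driven from Java through Hadoop's FileSystem API. A minimal sketch, assuming a NameNode at hdfs://localhost:9000 and a hypothetical file path (both are placeholders):

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder NameNode address
        Path file = new Path("/user/demo/hello.txt");      // hypothetical path

        try (FileSystem fs = FileSystem.get(conf)) {
            // Write: create() contacts the NameNode, then streams to the DataNode pipeline
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read: open() asks the NameNode for block locations, then reads from DataNodes
            byte[] buf = new byte[32];
            try (FSDataInputStream in = fs.open(file)) {
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}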
54. MapReduce
• MapReduce is a programming model for processing large data sets with a
parallel distributed algorithm on a cluster.
• In traditional systems, data are brought from the data centre into the main
memory of the machine where the application is running.
• In MapReduce, the application is transferred to the location where the data
are stored and executed in parallel.
• Thus, multiple instances of a MapReduce job exist at any given time, working
in parallel on the data stored in HDFS.
55. MapReduce
MapReduce Framework
• It works on a divide-and-conquer policy.
• The job is divided into multiple tasks known as Map tasks, and then the
output is combined using a task known as the Reducer.
• A MapReduce program comprises two components: Map and Reduce.
• The Mapper part does the processing, while the Reducer aggregates the data.
• There is a third phase, called shuffle and sort, between Map and Reduce.
• The output of the Map is given to shuffle and sort, which then passes it on
to the Reducer.
56. MapReduce
MapReduce Framework
• Shuffle and sort groups the output so that all the data belonging to the
same group are given to a single machine.
• There can be one or many instances of the Reducer running for a given job,
so it is essential that each group of similar data is given to a single
machine.
57. MapReduce
Reading the Data into the MapReduce Program
• Map task reads the input from the cluster as a sequence of (key, value) pair.
• The processing is done on the value and the output is also provided as a (key,
value) pair.
• These pairs from Map tasks are combined into groups and then sorted based
on the key through Shuffle and Sort phase.
• This intermediate output is given to the Reduce task, which combines the
results and provides the final output, which is written onto HDFS.
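The classic word-count job illustrates this (key, value) flow: the Mapper emits (word, 1) pairs, shuffle and sort groups them by word, and the Reducer sums each group. A sketch using the Hadoop MapReduce Java API; the input and output HDFS paths are supplied on the command line:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: read a line of text, emit (word, 1) for every word in it
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: shuffle and sort has grouped all counts for a word; add them up
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}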
63. Hive
• Hive started at Facebook.
• Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
• Hive resides on top of Hadoop to summarize Big Data, and makes querying
and analyzing easy.
• Using Hive, one can create tables, create databases, read the data, and
create partitions so that the data set can be restructured for processing.
• Hive has a lot of schema flexibility: tables can be altered, columns can be
moved, or the whole data set can be reloaded.
• It also has JDBC-ODBC connectivity so that it can be used with tools like
Tableau for visualization.
64. Hive
• Limitations of Hive:
– It is not a relational database.
– It is not designed for OnLine Transaction Processing (OLTP).
– It is not a language for real-time queries and row-level updates.
• Features of Hive:
– It stores schema in a database and processed data into HDFS.
– It is designed for OLAP.
– It provides SQL type language for querying called HiveQL or HQL.
– It is familiar, fast, scalable, and extensible.
65. Hive
• The Metastore holds the information stored when you create a table,
database, or view.
• On top of the Metastore lies the Thrift API, which enables browsing and
querying using JDBC-ODBC.
• The table definitions, column definitions, and view definitions are stored
in the Metastore.
• For Hive, the default Metastore database is Derby.
67. Hive
Hive Architecture
• Hive shell: Used to interact with Hive (create tables, submit queries)
• Metastore: Table definitions, view definitions, database definitions
• Execution Engine: For execution
• Compiler: For optimization
• Driver: Takes the code and converts it into Hadoop-understandable terms for
execution.
68. Hive
Create Database Statement
– CREATE DATABASE [IF NOT EXISTS] <database name> ;
Drop Database Statement
– DROP DATABASE IF EXISTS <database name>;
Create Table Statement
– CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]
table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
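Since Hive exposes a JDBC interface (see slide 63), statements like the ones above can also be issued from Java through the HiveServer2 JDBC driver. A sketch, assuming HiveServer2 is listening on localhost:10000 and using a hypothetical demo.employee table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 typically listens on port 10000; host and database are placeholders
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection con = DriverManager.getConnection(url, "user", "");
             Statement st = con.createStatement()) {
            st.execute("CREATE DATABASE IF NOT EXISTS demo");
            st.execute("CREATE TABLE IF NOT EXISTS demo.employee "
                     + "(id INT, name STRING) "
                     + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            try (ResultSet rs = st.executeQuery("SHOW TABLES IN demo")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}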
69. NoSQL
• An approach to data management and database design that is useful for
very large sets of distributed data.
• NoSQL was designed to access and analyze massive amounts of
unstructured data or data stored remotely on multiple virtual servers in the
cloud.
• Types of NoSQL databases
– Graph database
– Key-value database
– Column stores (also known as wide-column stores)
– Document database
70. NoSQL
• Graph database
– It is based on graph theory and used for representing networks, from a
network of people in a social context to a network of cities in geographic
mapping.
– These databases are designed for data whose relations are well represented
as a graph and whose elements are interconnected, with an undetermined
number of relations between them.
– Ex: Neo4j, Giraph
71. NoSQL
• Key-value store
– They are the simplest databases and use a key to access a value.
– These types of databases are designed for storing data in a schema-free
way.
– In a key-value store, all of the data consists of an indexed key and a
value, hence the name.
– Ex: Cassandra, DynamoDB
72. NoSQL
• Column stores
– These data stores are designed for storing data tables as sections of
columns of data, rather than as rows of data.
– Wide-column stores offer high performance and a highly scalable
architecture.
– Ex: HBase, Bigtable
73. NoSQL
• Document database
– These databases expand the idea of key-value stores where “documents”
contain more complex data.
– They contain data and each document is assigned a unique key, which is
used to retrieve the document.
– These are designed for storing, retrieving and managing document-oriented
information, also known as semi-structured data.
– Tree or hierarchical data structures can be directly stored in these
databases.
– Ex: MongoDB, CouchDB
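As an illustration of the document model, a sketch using MongoDB's Java sync driver; the connection string, database, collection, and field names are placeholders. Nested fields are stored without any predefined table layout, and a document is retrieved by a simple field lookup:

import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;

public class MongoDemo {
    public static void main(String[] args) {
        // Connection string is a placeholder for a local MongoDB instance
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users =
                    client.getDatabase("demo").getCollection("users");
            // Documents are schema-free: nested fields need no predefined schema
            users.insertOne(new Document("name", "Asha")
                    .append("contacts", new Document("email", "asha@example.com")));
            Document found = users.find(Filters.eq("name", "Asha")).first();
            System.out.println(found == null ? "not found" : found.toJson());
        }
    }
}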