Topics Covered
▪ What is Data?
▪ What are Databases?
▪ Need of Databases
▪ Types of Databases
▪ How Data is Stored in Relational Databases?
▪ What is SQL?
▪ Why SQL?
▪ What is Data Science?
▪ Why is SQL required for Data Science?
What is Data?
▪ Derived from the Latin word “datum”, meaning a single fact, entity,
or point of matter
▪ Data is a collection of facts, information, or knowledge
▪ Examples
– Twitter tweets
– Company’s financial report
– Newsletter
– Sales data, Employee data, etc.
What are Databases?
A database is an organized collection of structured information, or data,
typically stored electronically in a computer system
[Source - Wikipedia]
Source – Pixabay.com
Features of a Database
▪ Data are raw facts that constitute building blocks of
information.
▪ A database is a collection of information and a means to
manipulate data. It allows:
– Easy and fast access to data
– Easier processing of data
Need of Databases
▪ Data is easier to store
▪ Data is easier to manage
▪ Multiple views of the same data can be supported
▪ Data sharing is improved
▪ Data security is higher
▪ Data quality can be enforced
▪ Data is better integrated
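
Two of the needs listed above, multiple views of data and enforced data quality, can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory database; the employee table, its columns, and the sample rows are hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical table: NOT NULL and CHECK act as data quality rules.
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept    TEXT NOT NULL,
        salary  REAL NOT NULL CHECK (salary > 0)
    )
    """
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000.0), (2, "Ravi", "Purchase", 48000.0)],
)

# A view gives another group a different look at the same data,
# here a directory that leaves out the salary column.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee"
)
directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()

# The CHECK constraint rejects a row with a negative salary.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Meena', 'Sales', -10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(directory)
print(rejected)
```

The view and the constraint live in the database itself, so every application that uses the data sees the same rules.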
Types of Databases
▪ 1960s: Traditional files
– Data is stored in flat files
▪ 1970s: Hierarchical and network-based databases
– Files stored in a parent/child manner
▪ 1980s: Relational databases
– Data stored in a tabular manner
▪ 1990s: Object-oriented and object-relational databases
– Data created as objects
▪ 2000s: Data warehouses, distributed databases and Big Data
– Large data stored across networks
Types of databases
▪ Relational Databases
▪ Hierarchical Databases
▪ Graph-based Databases
▪ Operational Databases
▪ Distributed Databases
▪ Data Warehouse
Database Management System (DBMS)
As per Techopedia,
A database management system (DBMS) is a software package
designed to define, manipulate, retrieve and manage data in a
database.
Relational Database Management System (RDBMS)
▪ A relational database is a digital database based on
the relational model of data, as proposed by E. F. Codd in 1970.
▪ A software system used to maintain relational databases is
a relational database management system (RDBMS).
▪ An RDBMS organizes data into tables, records and fields.
▪ Structured Query Language (SQL) is used for querying and
maintaining the database.
▪ Some popular RDBMSs are MySQL, MS SQL Server, IBM
DB2, Oracle, PostgreSQL, Microsoft Access, etc.
Basic Elements of an RDBMS
▪ Tables – collections of rows and columns, e.g. a Customer table
▪ Records or Tuples – each represents one row of the table
▪ Fields, Column names or Attributes – the columns of the table
▪ Keys – establish relationships between tables, e.g. Cust Number
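
The four elements above can be seen together in a minimal sketch, assuming Python's built-in sqlite3 module. The customer and orders tables, the column names (cust_number stands in for the "Cust Number" key), and the sample rows are hypothetical illustrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Table with three fields; cust_number is its primary key.
conn.execute(
    "CREATE TABLE customer (cust_number INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
# Second table; its cust_number field is a foreign key back to customer.
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        cust_number INTEGER REFERENCES customer (cust_number),
        amount      REAL
    )
    """
)

# Each tuple below becomes one record (row).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(101, "Asha", "Pune"), (102, "Ravi", "Delhi")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 250.0), (2, 101, 75.5), (3, 102, 120.0)],
)

cursor = conn.execute("SELECT * FROM customer ORDER BY cust_number")
fields = [column[0] for column in cursor.description]  # column names
records = cursor.fetchall()                            # rows

# The key relates the two tables in a join.
joined = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customer AS c
    JOIN orders AS o ON o.cust_number = c.cust_number
    ORDER BY o.order_id
    """
).fetchall()

print(fields)
print(records)
print(joined)
```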
What is SQL?
▪ SQL stands for Structured Query Language, pronounced "S-Q-L" or "See-Quel".
▪ Initially developed at IBM by Donald Chamberlin and Raymond Boyce in the early
1970s
▪ Became an ANSI standard in 1986 and an ISO standard in 1987
▪ SQL skills are in high demand in the industry
▪ SQL is the standard language for Relational Databases.
▪ SQL is used to create, insert, search, update and delete database records.
▪ SQL can also perform other operations, including optimizing and maintaining
databases.
▪ Relational databases like MySQL, Oracle, MS SQL Server, Sybase, etc. use SQL.
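
The create, insert, search, update and delete operations mentioned above could look roughly like this. It is a minimal sketch that assumes Python's built-in sqlite3 module; the employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)

# Insert records
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", "Sales"), (2, "Ravi", "Purchase"), (3, "Meena", "Sales")],
)

# Search for records
sales_staff = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY emp_id", ("Sales",)
).fetchall()

# Update a record
conn.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("Finance", 2))

# Delete a record
conn.execute("DELETE FROM employee WHERE emp_id = ?", (3,))

remaining = conn.execute("SELECT * FROM employee ORDER BY emp_id").fetchall()

print(sales_staff)
print(remaining)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when values come from users.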
Why SQL?
▪ Popular, as it is easy to understand
▪ It is a declarative language
▪ Reads and is written much like English
▪ Queries run directly against the stored data, which is often fast
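
To illustrate what "declarative" means here, the sketch below computes the same totals twice: once with a Python loop that spells out how to do it, and once with a SQL query that only states what result is wanted. It assumes Python's built-in sqlite3 module; the sales table and figures are hypothetical.

```python
import sqlite3

sales = [("North", 120.0), ("South", 80.0), ("North", 45.5), ("East", 200.0)]

# Imperative: spell out how to loop over the rows and accumulate totals.
totals_loop = {}
for region, amount in sales:
    totals_loop[region] = totals_loop.get(region, 0.0) + amount

# Declarative: state what is wanted; the database decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
totals_sql = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(totals_loop)
print(totals_sql)
```

The query reads close to an English sentence: select the region and the sum of amounts from sales, grouped by region.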
What is Data Science?
[Source - book: Prediction of Cancer Patient Outcomes Based on Artificial Intelligence, by Suk Lee, Eunbin Ju, et al.]
How Does SQL Fit into Data Science?
▪ Since data is at the core of Data Science, there is a
frequent need to store and access data
▪ Data can be very large, on the order of millions or
billions of data points
▪ Hence SQL is needed
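
A small sketch of why this matters at scale: the database scans every stored data point, and only a short summary is handed back for analysis. It assumes Python's built-in sqlite3 module; the reading table, the ten sensors, and the 100,000 synthetic rows are hypothetical stand-ins for a much larger real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor_id INTEGER, value REAL)")

# 100,000 synthetic data points spread over 10 hypothetical sensors.
rows = ((i % 10, float(i % 100)) for i in range(100_000))
conn.executemany("INSERT INTO reading VALUES (?, ?)", rows)

# The database scans all the rows; only a 10-row summary comes back.
summary = conn.execute(
    """
    SELECT sensor_id, COUNT(*), AVG(value)
    FROM reading
    GROUP BY sensor_id
    ORDER BY sensor_id
    """
).fetchall()

print(len(summary))
print(summary[0])
```

The same pattern applies when the table holds millions or billions of rows on a database server: the heavy lifting stays in the database and the analysis code receives a small result set.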
Key Takeaways
▪ Data is everywhere, hence databases are used to store and
manage data
▪ Databases can be relational or non-relational
▪ The basic elements of an RDBMS are tables, rows, columns
and keys
▪ SQL (Structured Query Language) is a vital tool for
accessing and manipulating data across the entire Data Science
life cycle
#20 Points to cover on these slides –
Data Science
Big Data – stored in various databases, formats, etc.
Data processing – where SQL can be used
Analysis – where SQL can be used
Machine Learning – the extracted data (using SQL, etc.) is used to build models, and models are used for prediction, etc.
#22 Points to explain
Data stored in various data sources is used by different groups, such as purchasing and sales, to perform tasks like analysis, business intelligence and analytics.