Hive
Table of Contents
• Introduction to HIVE
What is HIVE?
• It is a framework for data warehousing on top of
Hadoop.
• Hive grew from a need to manage and learn from the huge
volumes of data that Facebook was producing every day
from its burgeoning social network.
• Hive was created to make it possible for analysts with strong
SQL skills to run queries on the huge volumes of data that
Facebook stored in HDFS.
What is HIVE?
• Hive is a data warehouse infrastructure tool to process structured
data in Hadoop.
• It resides on top of Hadoop to summarize Big Data, and makes
querying and analysing easy.
What is HIVE?
• A system for querying and managing structured data built on top of Hadoop
• Uses Map-Reduce for execution
• Uses HDFS for storage, but works with any system that implements the Hadoop FS API
• Key Building Principles:
• Structured data with rich data types (structs, lists and maps)
• Directly query data from different formats (text/binary) and file formats
(Flat/Sequence)
• SQL as a familiar programming tool and for standard analytics
• Allow embedded scripts for extensibility and for non standard applications
• Rich MetaData to allow data discovery and for optimization
What is HIVE?
• Hive is data warehouse software that facilitates reading, writing, and
managing large datasets residing in distributed storage using
SQL.
• Hive is not
• A relational database
• A design for OnLine Transaction Processing (OLTP)
• A language for real-time queries and row-level updates
Features of Hive
• Features of Hive are:
• Tools to enable easy access to data via SQL, thus enabling
data warehousing tasks such as extract/transform/load (ETL),
reporting, and data analysis.
• Apache Hive supports analysis of large datasets stored in
Hadoop's HDFS and compatible file systems such as Amazon
S3 filesystem.
• Access to files stored either directly in Apache HDFS or in
other data storage systems such as Apache HBase
Features of Hive
• Features of Hive are:
• It provides a SQL-like query language called HiveQL (HQL).
• HiveQL queries are implicitly converted into MapReduce, Tez, or
Spark jobs.
• Using HiveQL doesn't require knowledge of a programming
language; knowledge of basic SQL queries is enough.
Features of Hive
• Features of Hive are:
• Built-in user defined functions (UDFs) to manipulate dates,
strings, and other data-mining tools.
• Hive supports extending the UDF set to handle use-cases not
supported by built-in functions.
• Hive's SQL can also be extended with user code via user
defined functions (UDFs), user defined aggregates (UDAFs),
and user defined table functions (UDTFs).
Features of Hive
• Features of Hive are:
• Hive supports file formats including TextFile, SequenceFile,
ORC, RCFile, Avro, and Parquet, as well as LZO-compressed data.
• Operates on compressed data stored in the Hadoop
ecosystem using algorithms including DEFLATE, BWT, Snappy,
etc.
Features of Hive
• Features of Hive are:
• Supports external tables, which make it possible to
process data in place without moving it into Hive's warehouse directory.
• It stores the schema in a database and the processed data
in HDFS.
• Metadata storage in an RDBMS, significantly
reducing the time to perform semantic checks during
query execution.
Features of Hive
• Features of Hive are:
• It is designed for OLAP.
• It is simple to use (SQL-based), fast, scalable, and extensible.
Advantages
• Hive is designed to
• enable easy data summarization
• ad-hoc querying
• analysis of large volumes of data.
• Hive is built on Hadoop, so it inherits Hadoop's capabilities,
such as reliability, high performance, and tolerance of
node failures.
Advantages
• HiveQL statements are automatically translated into
MapReduce jobs
• Database developers need not learn Java
programming to write MapReduce programs for
retrieving data from a Hadoop system
Advantages
• High level query language - Simplifies working with
large amounts of data
• Lower learning curve than Pig or MapReduce -
HiveQL is much closer to SQL than Pig.
• Less trial and error than Pig
Disadvantages
• Hive is not for OLTP processing.
• Not all ‘standard’ SQL is supported
• No support for INSERTing single rows
• Updating data is complicated
• Mainly because of using HDFS
• Can add records
• Can overwrite partitions
Disadvantages
• Relatively limited number of built-in functions
• Does not support TRANSACTION
• No real time access to data
• Use other means like HBase or Impala
• High latency
Running Hive
• Hive can be executed from:
• Hive web interface
Running Hive
• Hive can be executed from:
• Hive shell
• $HIVE_HOME/bin/hive for interactive shell
• Or you can run queries directly:
• $HIVE_HOME/bin/hive -e 'select a.col from tab1 a'
Running Hive
• Hive can be executed from:
• JDBC - Java Database Connectivity
• "jdbc:hive://host:port/dbname"
Running Hive
• Hive can be executed from:
• Also possible to use hive directly in Python, C, C++, PHP 6
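import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClient {
  public static void main(String[] args) throws Exception {
    // Connect to the default database of a Hive server on localhost:10000.
    Connection con =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    String tableName = "testHiveDriverTable";
    // Drop and recreate the test table.
    stmt.executeQuery("drop table " + tableName);
    ResultSet res = stmt.executeQuery(
        "create table " + tableName + " (key int, value string)");
    // show tables
    String sql = "show tables '" + tableName + "'";
    System.out.println("Running: " + sql);
    res = stmt.executeQuery(sql);
    if (res.next()) {
      System.out.println(res.getString(1));
    }
    con.close();
  }
}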
Pig Vs. Hive
• Hive is a good choice:
• if you are familiar with SQL
• when you want to query the data
• when you need an answer to a specific question
• Pig is a good choice:
• for ETL (Extract -> Transform -> Load)
• preparing your data so that it is easier to analyse
• when you have a long series of steps to perform
• Many businesses use both Pig and Hive together
Hive Vs. RDBMS
• Differences between Hive and RDBMS (traditional relational databases).
• A few examples of traditional relational databases are MySQL, PostgreSQL,
Oracle 11g, MS SQL Server, etc.
• Some of the key features of Hive that differ from RDBMS:
• Hive resembles a traditional database by supporting a SQL interface, but it is
not a full database. Hive is better described as a data warehouse than as a
database.
• Hive enforces the schema at read time, whereas an RDBMS enforces it at write
time.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• In an RDBMS, a table's schema is enforced at data load time. If the
data being loaded doesn't conform to the schema, it is
rejected.
• This design is called schema on write.
• Hive doesn't verify the data when it is loaded, but rather when
it is retrieved.
• This is called schema on read.
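The contrast between the two designs can be sketched in a few lines of Java. This is a minimal illustration, not Hive's implementation: the class SchemaOnReadSketch, its method names, and the comma-separated key,value row layout are all hypothetical, and representing a malformed key as null at read time is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaOnReadSketch {

    // Parse the first comma-separated field as an int key; return null if it is malformed.
    static Integer parseKey(String line) {
        try {
            return Integer.valueOf(line.split(",", 2)[0].trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Schema on write: every row is checked at load time, and a bad row rejects the load.
    static List<String> loadSchemaOnWrite(List<String> rawLines) {
        for (String line : rawLines) {
            if (parseKey(line) == null) {
                throw new IllegalArgumentException("Rejected at load time: " + line);
            }
        }
        return new ArrayList<>(rawLines);
    }

    // Schema on read: loading is a plain copy with no parsing or validation.
    static List<String> loadSchemaOnRead(List<String> rawLines) {
        return new ArrayList<>(rawLines);
    }

    // The schema is applied only when the data is queried (hypothetical null for bad keys).
    static List<Integer> readKeys(List<String> storedLines) {
        List<Integer> keys = new ArrayList<>();
        for (String line : storedLines) {
            keys.add(parseKey(line));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1,apple", "oops,banana", "3,cherry");
        List<String> stored = loadSchemaOnRead(raw); // the load always succeeds
        System.out.println(readKeys(stored));        // the bad key shows up only now
        try {
            loadSchemaOnWrite(raw);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());      // the load itself is rejected
        }
    }
}
```

The schema-on-read load does no work beyond copying, which is why the initial load is fast; the cost of parsing moves to every query instead.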
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• Schema on read makes for a very fast initial load, since the
data does not have to be read, parsed, and serialized to disk in
the database's internal format.
• The load operation is just a file copy or move.
• Schema on write makes query-time performance faster, since
the database can index columns and compress
the data, but it takes longer to load data into the database.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• Hive is based on the notion of Write once, Read
many times.
• RDBMS is designed for Read and Write many times.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• In RDBMS, record level updates, insertions and deletes,
transactions and indexes are possible.
• This is not allowed in Hive because Hive was built to operate
over HDFS data using MapReduce, where full-table scans
are the norm and a table update is achieved by
transforming the data into a new table.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• An RDBMS typically handles data sizes up to
tens of terabytes.
• Hive can handle hundreds of petabytes very easily.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• As Hadoop is a batch-oriented system, Hive doesn't support
OLTP (Online Transaction Processing). It is closer to OLAP
(Online Analytical Processing), but not ideal, since there is
significant latency between issuing a query and receiving a
reply, due both to the overhead of MapReduce jobs and to
the size of the data sets Hadoop was designed to serve.
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• RDBMS is best suited for dynamic data analysis and where fast
responses are expected
• Hive is suited for data warehouse applications, where relatively
static data is analyzed, fast response times are not required, and
when the data is not changing rapidly.
• To overcome the limitations of Hive, HBase is being integrated
with Hive to support record level operations and OLAP.
Hive Vs. RDBMS Exclusive
Hive Vs. RDBMS
• Key features of Hive that differ from RDBMS.
• Hive scales very easily at low cost.
• An RDBMS is much less scalable, and scaling it up is very
costly.
RDBMS Vs. Hive
Traditional databases Vs. Hive
Hive | Traditional Databases
SQL interface | SQL interface
Focus on batch analytics | Mostly online, interactive analytics
No transactions | Transactions are their way of life
No random inserts; updates are not natively supported (but possible) | Random inserts and updates
Distributed processing via MR | Distributed processing capabilities vary
Scales to hundreds of nodes | Seldom scales beyond 20 nodes
Built for commodity hardware | Expensive, proprietary hardware
Low cost per petabyte | Does not support petabyte scale
HiveQL
• Hive query language provides the basic SQL like operations.
• These operations are:
• Ability to filter rows from a table using a where clause.
• Ability to select certain columns from the table using a select
clause.
• Ability to do equi-joins between two tables.
• Ability to evaluate aggregations on multiple "group by"
columns for the data stored in a table.
Equi Join
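To make the equi-join operation concrete, the following Java sketch joins two small in-memory tables on equal keys. It illustrates only the semantics of an inner equi-join; it is not how Hive executes joins, and the class EquiJoinSketch, the method equiJoin, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EquiJoinSketch {

    // Join two lists of {key, value} rows on equal keys (an inner equi-join).
    static List<String> equiJoin(List<String[]> left, List<String[]> right) {
        // Build phase: index the right-hand rows by their join key.
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: emit one output row for every pair whose keys are equal.
        List<String> out = new ArrayList<>();
        for (String[] row : left) {
            List<String> matches = index.getOrDefault(row[0], Collections.emptyList());
            for (String match : matches) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
            new String[] {"1", "alice"},
            new String[] {"2", "bob"},
            new String[] {"3", "carol"});
        List<String[]> orders = Arrays.asList(
            new String[] {"1", "book"},
            new String[] {"1", "pen"},
            new String[] {"3", "lamp"},
            new String[] {"4", "desk"});
        // Rows without a partner on the other side (user 2, order 4) are dropped.
        for (String joined : equiJoin(users, orders)) {
            System.out.println(joined);
        }
    }
}
```

Because the only comparison is key equality, the right-hand table can be indexed by key once and probed per row, which is what makes equality conditions cheaper to evaluate than arbitrary join conditions.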
HiveQL
• These operations are:
• Ability to store the results of a query into another table.
• Ability to download the contents of a table to a local directory.
• Ability to store the results of a query in a hadoop dfs directory.
• Ability to manage tables and partitions (create, drop and
alter).
• Ability to use custom scripts in chosen language (for map /
reduce).
SQL Vs. HiveQL
Operations & Functions | SQL | Hive Query Language
Select | SQL-92 supports it. | Single table or view in FROM clause. For partial ordering, SORT BY is used. To limit the number of rows returned, LIMIT is used. HAVING clause is not supported.
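The "partial ordering" produced by SORT BY can be illustrated with a small Java sketch: rows are spread over several reducers and each reducer sorts only its own rows, so the combined output is not globally ordered. The class SortBySketch and the rule that assigns a value to a reducer (value modulo the reducer count) are assumptions made for the illustration, not Hive's actual distribution logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortBySketch {

    // Spread values over reducers, then sort within each reducer only.
    static List<List<Integer>> sortBy(List<Integer> values, int numReducers) {
        List<List<Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (int v : values) {
            // Hypothetical distribution rule: value modulo the number of reducers.
            reducers.get(Math.floorMod(v, numReducers)).add(v);
        }
        for (List<Integer> reducer : reducers) {
            Collections.sort(reducer); // each reducer's output is sorted on its own
        }
        return reducers;
    }

    public static void main(String[] args) {
        // With 2 reducers, even and odd values are sorted separately.
        System.out.println(sortBy(Arrays.asList(5, 2, 8, 1, 4, 7), 2));
    }
}
```

Each inner list is sorted, but concatenating them does not give a total order; a total order requires all rows to pass through a single sort.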
SQL Vs. HiveQL
Operations & Functions | SQL | Hive Query Language
Updates | UPDATE, INSERT, DELETE | INSERT OVERWRITE TABLE (populates a complete table or partition)
Data types present | Integral, floating point, fixed point, text and binary strings, temporal | Integral, floating point, boolean, string, array, map, struct
SQL Vs. HiveQL
Operations & Functions | SQL | Hive Query Language
Default join types | Inner join | Equi-join
Built-in functions | Hundreds of built-in functions | Dozens of built-in functions
Multiple table inserts | Not supported in SQL | Supported in HiveQL
SQL Vs. HiveQL
Operations & Functions | SQL | Hive Query Language
Create table as select | Not valid in SQL, but may be found in some databases | Supported by HiveQL
Extension points | User-defined functions and stored procedures | User-defined functions and MapReduce scripts
SQL Vs. HiveQL
Operations & Functions | SQL | Hive Query Language
Transactions | Supported | Not supported
Indexes | Supported | Not supported
Latency | Sub-second | Minutes
Hive Data Model
[Figure: Hive data model: DB, Tables (HDFS directory), Partitions (sub-directory), Buckets (files)]
• Hive structures data into well-defined database concepts, i.e.
tables, columns and rows, partitions, buckets, etc.
Hive Data Model
Abstraction Layers in Hive
Hive Data Model
• Tables
• Typed columns (int, float, string, date, boolean)
• Supports array/map/struct for JSON-like data
• Partitions
• e.g., range-partition tables by date
• Buckets
• Hash partitions within ranges
• Useful for sampling and join optimization
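Hash partitioning into buckets can be sketched as follows. This is an illustration under stated assumptions: the class BucketSketch is hypothetical, and the hash used here (the key's own value, reduced modulo the bucket count) stands in for whatever hash Hive applies, which depends on the column type and the Hive version.

```java
public class BucketSketch {

    // Map a hash code to a bucket index in the range [0, numBuckets).
    static int bucketFor(int hashCode, int numBuckets) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Count how many of the keys 0..numKeys-1 land in the given bucket.
    static int countInBucket(int numKeys, int numBuckets, int bucket) {
        int count = 0;
        for (int key = 0; key < numKeys; key++) {
            if (bucketFor(key, numBuckets) == bucket) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The same key always lands in the same bucket, so two tables bucketed
        // the same way on the join key can be joined bucket by bucket.
        System.out.println(bucketFor(10, 4));
        // Reading a single bucket out of 4 touches roughly a quarter of the rows,
        // which is what makes buckets useful for sampling.
        System.out.println(countInBucket(100, 4, 0));
    }
}
```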
Metastore
• Database
• Namespace containing a set of tables
• Table
• Contains a list of columns and their types
• Partition
• Each partition can have its own column and storage info
• Mapping to HDFS directories
• Statistics
• Info about the database
Hive Physical Layout
• Warehouse directory in HDFS
• Table row data is stored in a subdirectory of the warehouse directory
• Each partition creates a subdirectory within its table's directory
• Actual data is stored in flat files
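The directory layout described above can be sketched as simple path construction. The class WarehouseLayoutSketch is hypothetical; the warehouse path /user/hive/warehouse is assumed here as the root, and the table name page_views and partition column dt are invented for the example.

```java
public class WarehouseLayoutSketch {

    // A table maps to a subdirectory of the warehouse directory.
    static String tableDir(String warehouseDir, String table) {
        return warehouseDir + "/" + table;
    }

    // A partition maps to a column=value subdirectory inside the table directory.
    static String partitionDir(String warehouseDir, String table,
                               String partitionColumn, String partitionValue) {
        return tableDir(warehouseDir, table) + "/" + partitionColumn + "=" + partitionValue;
    }

    public static void main(String[] args) {
        String warehouse = "/user/hive/warehouse"; // assumed warehouse root
        System.out.println(tableDir(warehouse, "page_views"));
        System.out.println(partitionDir(warehouse, "page_views", "dt", "2024-01-01"));
    }
}
```

Because a partition is just a directory, a query that filters on the partition column only needs to read the files under the matching subdirectories.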
Hive - A theoretical overview in Detail.pptx


Editor's Notes

  • #4 Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy. Initially Hive was developed by Facebook, later the Apache Software Foundation took it up and developed it further as an open source under the name Apache Hive. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
  • #10 ORC is the Optimized Row Columnar file format; RCFile is the Record Columnar File format. Avro stores the data definition in JSON format, making it easy to read and interpret, while the data itself is stored in binary format, making it compact and efficient. Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
  • #16 Disadvantages • Updating data is complicated - Mainly because of using HDFS - Can add records - Can overwrite partitions • No real time access to data - Use other means like Hbase or Impala • High latency
  • #33 A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. 
  • #38 An equijoin is a join with a join condition containing an equality operator. An equijoin returns only the rows that have equivalent values for the specified columns. An inner join is a join of two or more tables that returns only those rows (compared using a comparison operator) that satisfy the join condition. Almost every join is an equijoin, because the condition for matching rows is based on the equality of two values—one from each of the tables being joined. So that's what makes it an equijoin: the ON condition is equality. This includes inner joins and all three types of outer joins. Inner joins, on the other hand, can be based on equality to match rows, or on some other condition entirely. If it's not an equijoin, then it's usually called a theta join, although to be precise, an equijoin is just one of the possible theta joins; other theta joins use less than, less than or equal, etc., as the comparison operator. As long as the comparison evaluates to TRUE, the matched rows qualify for the join.