Hive is a data warehouse infrastructure tool that allows querying and analyzing large datasets stored in Hadoop Distributed File System (HDFS). It provides SQL-like queries (HiveQL) to summarize big data. Hive stores metadata information about tables, columns, partitions etc. in a metastore database while the actual data resides in HDFS. It supports features like partitioning, bucketing, views, subqueries and joins to optimize querying large datasets.