This paper addresses the problem of integrating heterogeneous data from sources as diverse as web pages, digital libraries, knowledge bases and databases. The ultimate aim of this work is to query such heterogeneous data sources as if their data were held in a single relational database. In pursuit of this aim, we propose a generalisation of joins from the relational database model that enables joins on arbitrarily complex structured data in a higher-order representation. By incorporating kernels and distances for structured data, we further extend this model to support approximate joins of data originating from heterogeneous sources. We have implemented these higher-order relational operators and their associated kernels in Prolog and applied this framework to the CORA data sets. We demonstrate the flexibility of our approach in the publications domain by evaluating example approximate queries on structured data, joining on types ranging from sets of co-authors through to entire publications.