As the heterogeneity of data sources are increasing on the web, and due to the sparsity of data in each of these data sources, cross-domain recommendation is becoming an emerging research topic in the recent years. Cross-domain collaborative filtering aims to transfer the user rating pattern from source (auxiliary) domains to a target domain for the purpose of alleviating the sparsity problem and providing better target recommendations. However, the studies so far have either focused on a limited number of domains that are assumed to be related to each other (such as books and movies), or a division of the same dataset (such as movies) into different domains based on an item characteristic (such as genre). In this paper, we study a broad set of domains and their characteristics to understand the factors that affect the success or failure of cross-domain collaborative filtering, the amount of improvement in cross-domain approaches, and the selection of best source domains for a speficic target domain. We propose to use Canonical Correlation Analysis (CCA) as a significant major factor in finding the most
promising source domains for a target domain, and suggest a cross-domain collaborative filtering based on CCA (CD-CCA) that proves to be successful in using the shared information between domains in the target recommendations.