In today’s interconnected real world, social and informational entities are interconnected, forming gigantic, interconnected, integrated social and information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous social and information networks. Most real world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, medication, and links such as visits, diagnosis, and treatments are intertwined together, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous social and information networks poses an interesting but critical challenge.
In this talk, we present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. However, such mining may raise some serious challenging problems on scalability computation. We identify a set of problems on scalable computation and calls for serious studies on such problems. This includes how to efficiently computation for (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, and (5) topical hierarchies from heterogeneous information networks. We introduce some recent efforts, discuss the trade-offs between query-independent pre-computation vs. query-dependent online computation, and point out some promising research directions.