Graph theory could potentially make a big impact on how we conduct businesses. Imagine the case where you wish to maximize the reach of your promotion via leveraging your customers' influence, to advocate your products and bring their friends on board. The same logic of harnessing one's networks can be applied to purchase recommendation, customer behavior, and fraud detection.
Running analyses on large graphs was not trivial for many companies - until recently. The field has made significant steps in the last five years and scalable graph computations are now the norm. You can now run graph computations out-of-core (no memory constraints) and in parallel (multiple machines), especially in Spark which is spreading like wildfire.
A lot of people are familiar with graphX, a pretty solid implementation of scalable graphs in Spark. GraphX is pretty interesting but the project seems to be orphaned. The good news is, there is now an alternative: Graphframes. They are a new data structure that takes the best parts of dataframes and graphs
In this talk, I will be explaining how to use Graphframes from Python, a new data structure in Spark 2.0 that takes the best parts of dataframes and graphs, with an example using personalized pagerank for recommendations.