Authors: Jingbo Xu, Zhanning Bai, Wenfei Fan, Longbin Lai, Xue Li, Zhao Li, Zhengping Qian, Lei Wang, Yanyan Wang, Wenyuan Yu, Jingren Zhou
Name of Conference: International Conference on Very Large Data Bases (VLDB 2021)
Date of Publication: August 2021
Because graph data and algorithms are so diverse, programming and orchestrating complex computation pipelines have become major challenges in applying graph applications to Web-scale data analysis. GraphScope aims to provide a one-stop, efficient solution for a wide range of graph computations at scale. It extends previous systems by offering a unified, high-level programming interface and by allowing specialized graph engines to be integrated seamlessly into a general data-parallel computing environment. As we will show in this demo, GraphScope enables developers to write sequential graph programs in Python and executes them in parallel on a cluster automatically. This in turn allows GraphScope to integrate seamlessly with existing data processing systems in the PyData ecosystem. To validate GraphScope’s efficiency, we will compare a complex, multi-stage processing pipeline for a real-life fraud detection task against a manually assembled implementation comprising multiple systems. GraphScope achieves a 2.86× speedup on a trillion-scale graph in production at Alibaba.