曾立

Li Zeng, senior algorithm researcher and programmer in computer science and technology.



Email: bookug AT qq DOT com


GitHub, Google Scholar

Biography

I obtained my B.S. degree from PKU in 2016 and then spent two years as a master's student, during which my main work was the development and optimization of the graph database system gStore. I then completed my Ph.D. in three years, researching the optimization of graph algorithms on heterogeneous CPU-GPU platforms. My current research interests include graph computing, vector databases, and LLM acceleration and maintenance.



Publications
 
Li Zeng, et al. WindGP: Efficient Graph Partitioning on Heterogenous Machines. arXiv, 2024.  
Li Zeng (second author). LocMoE: A Low-Overhead MoE for Large Language Model Training. IJCAI, 2024.  
Li Zeng, et al. KBQA: Accelerate Fuzzy Path Query on Knowledge Graph. DEXA, 2023.  
Yu Gao, Meng Qin, Yibin Ding, Li Zeng, et al. RaftGP: Random Fast Graph Partitioning. IEEE High Performance Extreme Computing, 2023.   (GraphChallenge Innovation Award)
Zhijie Sun, Jing Li, Jun Xie, Binfan Zheng, Li Zeng, et al. indexPDT: A High Scalable Distributed Classification Approach with Novel Cache Structure for Geo-location. HPCC, 2023.  
Li Zeng, Lei Zou, M. Tamer Özsu. SGSI: A scalable GPU-friendly Subgraph Isomorphism Algorithm. IEEE Transactions on Knowledge and Data Engineering, 2022.   (CCF A journal)
Li Zeng, et al. HTC: Hybrid vertex-parallel and edge-parallel Triangle Counting. IEEE High Performance Extreme Computing, 2022.   (GraphChallenge Innovation Award)
Li Zeng, et al. SQLG+: Efficient k-hop Query Processing on RDBMS. International Conference on Database Systems for Advanced Applications, 2022.   (CCF B conference)
Li Zeng, Yan Jiang, Weixin Lu, Lei Zou. Deep Analysis on Subgraph Isomorphism. arXiv, 2021. [pdf]  
Li Zeng, Lei Zou, M. Tamer Özsu, et al. GSI: GPU-friendly Subgraph Isomorphism. IEEE International Conference on Data Engineering, 2020. [pdf]   (CCF A conference)
Fan Zhang, Lei Zou, Li Zeng, Xiangyang Gou. Dolha - an efficient and exact data structure for streaming graphs. The journal of World Wide Web, 2020.   (CCF B journal)
Li Zeng, Lei Zou. Redesign of the gStore system. Frontier of Computer Science, 2018. [pdf]   (CCF C journal)
Yu Zhang, Li Zeng, Lei Zou. Regular Path Queries on Large Graph Data. Natural Language Processing and Chinese Computing, 2018.   (CCF C conference)
Yu Zhang, Li Zeng, Lei Zou. Regular Path Queries on Large-Scale Graph Data. Acta Scientiarum Naturalium Universitatis Pekinensis, 2018.
 
 

Projects

Improvement of Graph Machine Learning System
  • Distributed data loading of hybrid features: redesign the architecture of the data loading module to support hybrid features on nodes and edges, and optimize feature-string parsing, yielding >2× speedup.
  • Distributed real-time sampling in data loading: create a memory clip module. When memory runs short during data loading, the graph store is clipped using a user-defined sampling function. Contribution: reduce memory consumption by 31% with only a small decrease in model effect (1%).
Acceleration of Large Scale Graph Algorithms
  • Survey and optimization of subgraph isomorphism on CPU: survey the best solutions of subgraph isomorphism on CPU and propose four general techniques for improvement
  • Acceleration of subgraph isomorphism on GPU: novel data structure and join algorithm, >10× speedup
  • Acceleration of other graph algorithms on GPU: optimize solutions for shortest path and triangle counting, each achieving >2× speedup
Development of Graph Database System
As the primary developer of the graph database gStore, cumulatively update 5,000,000 lines of code, greatly improving performance (100×) and scale (40×). Also lead a team of more than 10 people, training each member to become qualified for their respective module.
  • Optimization of query plans: accelerate SPARQL queries, support predicate variable and property path
  • Improvement of indices: continually optimize the disk-based key-value indices (specially designed for gStore)
  • Others: improve the user interface, create a web server, standardize the development process, add design and usage documentation
 

Recommended GitHub Repositories
 
  • PaperNotes: collect papers, write notes and search quickly based on multiple tags
  • LinuxProgramming: basic knowledge and some advanced topics about Linux Programming
  • GraphBenchmark: a benchmark for generating all kinds of graphs and queries
  • SIEP: the state-of-the-art subgraph matching algorithms on CPU
  • GSI: subgraph isomorphism on GPU; see also my implementations of GunrockSM, GpSM, gutil
  • gStore: graph database system
 
 




Last modified: Jun 1, 2024