Peng Li (李鹏)
Ph.D. Student in Computer Science, Georgia Tech | lipengpublic@gmail.com

I am a Ph.D. student in Computer Science at Georgia Tech, advised by Prof. Xu Chu and Prof. Kexin Rong. I finished my M.S. degree in Computer Science at Georgia Tech. Prior to joining Georgia Tech, I graduated with an M.S. degree in Civil Engineering from Stanford University and a B.Eng. degree in Civil Engineering from Tongji University.

My research interests include: Machine Learning, Data Management, Data Cleaning, Data Integration, Entity Resolution

Recent News:
  • Jan 2024: I started as a Research Scientist at ByteDance Infrastructure System Lab!
  • Dec 2023: Excited to announce my graduation from Georgia Tech!🎉
  • Aug 2023: Exciting news! Our paper Auto-Tables received the Best Paper Award at VLDB 2023!🏆
  • Jul 2023: Our paper Auto-Tables has been accepted to VLDB 2023!
  • May 2023: I started my internship at Microsoft Research.
  • Feb 2023: Our paper DiffPrep has been accepted to SIGMOD 2023!
  • May 2022: I started my internship at Microsoft Research.
  • Dec 2021: Our paper Mislabel has been accepted to ICDE 2022!
  • Jun 2021: Our demo paper Panda has been accepted to VLDB Demo 2021!
  • Dec 2020: Our paper AutoFJ has been accepted to SIGMOD 2021!
  • Nov 2020: Our paper CPClean has been accepted to VLDB 2021!
  • Oct 2020: Our paper CleanML has been accepted to ICDE 2021!
  • Aug 2019: I started as a Ph.D. student in Computer Science at Georgia Tech.

Education

Georgia Institute of Technology

M.S. and Ph.D. in Computer Science
Advisors: Prof. Xu Chu and Prof. Kexin Rong
GPA: 4.0 / 4.0

Aug 2018 - Dec 2023
Stanford University

M.S. in Civil Engineering
GPA: 4.07 / 4.0

Sep 2016 - Jun 2018
Tongji University

B.Eng. in Civil Engineering
GPA: 4.86 / 5.0
Ranking: 1 / 559

Sep 2011 - Jul 2015

Experience

Research Scientist
Jan 2024 - Present
Research Assistant
Chu Data Lab at Georgia Institute of Technology
  • Led research on data cleaning for machine learning.
  • Conducted research on machine learning for data integration.
Jan 2019 - Dec 2023
Research Intern
  • Conducted instruction finetuning to improve the performance of large language models on table-related tasks.
  • Generated training data on 20 table-related tasks (e.g., data cleaning/transformation) to finetune GPT-3 and Llama 2 models.
  • Achieved significant performance gains on seen and unseen tasks and attained new SOTA results on many benchmarks.
May 2023 - Aug 2023
Research Intern
  • Developed efficient algorithms to discover interpretable drivers for KPI analysis.
  • Integrated the algorithms into the KPI intelligence system.
Jan 2023 - May 2023
Research Intern
  • Built a model that can automatically generate data preparation pipelines to relationalize tables.
  • Used inverse operators to generate synthesized training data.
  • Used Sentence-BERT and CNN to build a deep learning model.
May 2022 - Aug 2022
Teaching Assistant
Georgia Institute of Technology
  • Worked as a TA for Software Development Process.
  • Graded assignments and exams.
Aug 2021 - Dec 2021
Teaching Assistant
Georgia Institute of Technology
  • Worked as the head TA for Introduction to Database Systems.
  • Designed assignments, led discussion sections, and graded assignments and exams.
Jan 2020 - May 2020

Publications

2023
2022
2021

Awards

  • Best Paper Award at VLDB 2023

  • Second Prize in SIGMOD Programming Contest, 2021

  • Best Seismic Noise Detection Prize in Stanford Big Earth Hackathon Competition, 2018

  • Outstanding Undergraduate Graduation Project, Tongji University, 2015

  • China National Scholarship, 2013

  • First Prize Academic Scholarship, Tongji University, 2012, 2013

  • Third Prize in Shanghai College Students Math Contest, 2012