I am a Ph.D. student in Computer Science at Georgia Tech, advised by Prof. Xu Chu and Prof. Kexin Rong. I finished my M.S. degree in Computer Science at Georgia Tech. Before joining Georgia Tech, I received an M.S. degree in Civil Engineering from Stanford University and a B.Eng. degree in Civil Engineering from Tongji University.
My research interests include machine learning, data management, data cleaning, data integration, and entity resolution.
- Jan 2024: I started as a Research Scientist at ByteDance Infrastructure System Lab!
- Dec 2023: Excited to announce my graduation from Georgia Tech!🎉
- Aug 2023: Exciting news! Our paper Auto-Tables received the Best Paper Award at VLDB 2023!🏆
- Jul 2023: Our paper Auto-Tables has been accepted to VLDB 2023!
- May 2023: I started my internship at Microsoft Research.
- Feb 2023: Our paper DiffPrep has been accepted to SIGMOD 2023!
- May 2022: I started my internship at Microsoft Research.
- Dec 2021: Our paper Mislabel has been accepted to ICDE 2022!
- Jun 2021: Our demo paper Panda has been accepted to VLDB Demo 2021!
- Dec 2020: Our paper AutoFJ has been accepted to SIGMOD 2021!
- Nov 2020: Our paper CPClean has been accepted to VLDB 2021!
- Oct 2020: Our paper CleanML has been accepted to ICDE 2021!
- Aug 2019: I started as a Ph.D. student in Computer Science at Georgia Tech.
Education
M.S. and Ph.D. in Computer Science
Advisor: Prof. Xu Chu and Prof. Kexin Rong
GPA: 4.0 / 4.0
Experience
- Lead the research on data cleaning for machine learning.
- Conduct research on machine learning for data integration.
- Conducted instruction finetuning to improve the performance of large language models on table-related tasks.
- Generated training data for 20 table-related tasks (e.g., data cleaning and transformation) to finetune GPT-3 and Llama 2 models.
- Achieved significant performance gains on both seen and unseen tasks and set new state-of-the-art results on many benchmarks.
- Developed efficient algorithms to discover interpretable drivers for KPI analysis.
- Integrated the algorithms into the KPI intelligence system.
- Built a model that can automatically generate data preparation pipelines to relationalize tables.
- Used inverse operators to generate synthetic training data.
- Used Sentence-BERT and CNN to build a deep learning model.
- Work as a TA for Software Development Process.
- Grade assignments and exams.
- Worked as the head TA for Introduction to Database Systems.
- Designed assignments, led discussion sections, graded assignments and exams.
Publications
- Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples. VLDB 2023
- DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data. SIGMOD 2023
Awards
- Best Paper Award at VLDB 2023
- Second Prize in SIGMOD Programming Contest, 2021
- Best Seismic Noise Detection Prize in Stanford Big Earth Hackathon Competition, 2018
- Outstanding Undergraduate Graduation Project, Tongji University, 2015
- China National Scholarship, 2013
- First Prize Academic Scholarship, Tongji University, 2012, 2013
- Third Prize in Shanghai College Students Math Contest, 2012