Yoonsang Kim

Yoonsang Kim is a Senior Research Scientist in the public health department at NORC at the University of Chicago, and the lead Biostatistician for the Social Data Collaboratory (SDCollab), an interdisciplinary social data research team at NORC. She oversees the analysis and filtering of social media data.

Kim conducts methodological research on filtering social media data and quantitatively assessing its quality. She is interested in developing measures from social media data to understand public sentiment and the social environment around topics such as specific health behaviors, health policies, and the marketing of health-related products. She also leads the analyses for projects that evaluate televised anti-smoking media campaigns and electronic cigarette advertising, and for projects that examine the relationship between exposure to tobacco-related social media content and individuals' smoking behavior.

Before joining NORC, Kim was a Biostatistician at the Institute for Health Research and Policy and an instructor in Biostatistics at the University of Illinois at Chicago (UIC). She has served as a co-investigator and Biostatistician on several NIH-funded projects and has provided statistical consulting. She has extensive experience and expertise in the analysis of longitudinal data, randomized controlled trials, and complex survey data. Prior to UIC, she was a Research Assistant Professor in the Department of Biostatistics and Epidemiology at the University of Oklahoma Health Sciences Center.

Kim and SDCollab published an article that proposed a framework for collecting and assessing social media data under both common and challenging conditions, along with a checklist for reporting data preparation. The goal is to promote transparency and replicability in social media research and to move toward a reporting standard that researchers and reviewers can use to compare the quality of social media data analyzed across different studies. In another notable article, she and her colleagues provided a comprehensive review of the estimation methods for logistic regression with multiple random effects as implemented in commonly used statistical packages, and conducted a simulation study to identify the method and package best suited to producing unbiased and efficient estimates. The study offers practical guidance for applied statisticians and analysts who want to use this type of model.