[R] Matching when each subject has multiple records, but each subject should be used only once in the match

Jorgen Harmse JH@rm@e @end|ng |rom roku@com
Fri Sep 19 16:55:48 CEST 2025


Your propensity model (mentioned in your follow-up message) is presumably at the subject level, so you need a function that accepts all the records for one subject and returns a single record containing the features for the propensity model. (The function might average values over the individual records, compute slopes, or use a sequence model to produce an embedding.) Something like plyr::ddply can then apply that function to produce a new data frame with one row per subject. Use that data frame for your matching, weighting, or other propensity-score method, and if necessary use something like base::merge to broadcast the result back to the original data frame.
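
A minimal sketch of that workflow follows, under stated assumptions. The data, the column names (id, visit, bmi, treated), the helper summarise_subject, and the greedy matching loop are all hypothetical illustrations, not part of your data or of any package. Base R's split/lapply/rbind is used so the example has no dependencies; plyr::ddply(records, "id", summarise_subject) should give the same one-row-per-subject result. A dedicated matching package would normally replace the loop.

```r
# Hypothetical longitudinal data: several records per subject.
records <- data.frame(
  id      = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6),
  visit   = c(1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 2),
  bmi     = c(24, 25, 26, 31, 30, 22, 22, 23, 28, 29, 30, 30, 24, 24),
  treated = c(1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0)
)

# Collapse all records for one subject into a single row of features
# (here a mean and a least-squares slope; the choice is an assumption).
summarise_subject <- function(d) {
  data.frame(
    id        = d$id[1],
    treated   = d$treated[1],
    mean_bmi  = mean(d$bmi),
    bmi_slope = cov(d$visit, d$bmi) / var(d$visit),
    n_records = nrow(d)
  )
}

# One row per subject.
subjects <- do.call(rbind, lapply(split(records, records$id), summarise_subject))
rownames(subjects) <- NULL

# Subject-level propensity score from a logistic regression.
ps_model <- glm(treated ~ mean_bmi, family = binomial, data = subjects)
subjects$pscore <- fitted(ps_model)

# Greedy 1:1 nearest-neighbour matching without replacement:
# a control is removed from the pool as soon as it is used.
cases    <- subjects[subjects$treated == 1, ]
controls <- subjects[subjects$treated == 0, ]
pairs    <- data.frame(case_id = numeric(0), control_id = numeric(0))
for (i in seq_len(nrow(cases))) {
  if (nrow(controls) == 0) break
  j <- which.min(abs(controls$pscore - cases$pscore[i]))
  pairs <- rbind(pairs,
                 data.frame(case_id = cases$id[i], control_id = controls$id[j]))
  controls <- controls[-j, ]
}

# Broadcast the subject-level score back to the original records.
records_ps <- merge(records, subjects[, c("id", "pscore")], by = "id")
```

Because the matching runs on subjects rather than on records, no subject can appear in more than one pair, however many records that subject has.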

Other answers refer to new entrants to the study and similar complications. Presumably the first step in your feature-extraction function for propensity scores is to discard all information recorded after the treatment selection. You want confounders that might have influenced the treatment, not pseudo-confounders that may have been influenced by the treatment.
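
As a rough illustration of that first step, the sketch below keeps only records dated before each subject's treatment selection. The data frames, the column names (date, sbp, decision_date), and the strict inequality (a record on the decision date itself is treated as possibly affected) are assumptions for the example.

```r
# Hypothetical longitudinal records.
records <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  date = as.Date(c("2020-01-15", "2020-06-10", "2021-02-01",
                   "2020-03-01", "2020-09-20", "2021-05-05")),
  sbp  = c(130, 128, 118, 140, 142, 141)
)

# Hypothetical date on which treatment was selected for each subject.
decisions <- data.frame(
  id            = c(1, 2),
  decision_date = as.Date(c("2020-12-01", "2020-09-20"))
)

# Attach the decision date to every record, then drop anything
# recorded on or after it.
with_decision <- merge(records, decisions, by = "id")
baseline <- with_decision[with_decision$date < with_decision$decision_date, ]
```

Only baseline would then be passed to the feature-extraction function, so nothing measured after the treatment selection can enter the propensity model.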

Regards,
Jorgen Harmse.


Date: Thu, 18 Sep 2025 11:08:23 +0000
From: "Sorkin, John" <jsorkin using som.umaryland.edu>
To: Leo Mada via R-help <r-help using r-project.org>
Subject: [R] Matching when each subject has multiple records, but each
        subject should be used only once in the match

I have a file that contains longitudinal data for each subject. As a result, each subject can have multiple records. For example, a given subject might have a record in Jan 2020, another in June 2020, another in Feb 2021, another in May 2021, another in Sept 2022, etc. At each time for which a subject has a record, the subject is identified as a case or a control.

Over the course of the longitudinal data, I want to match a given case to a given control. Once a subject is matched, I don't want the subject to be eligible for being matched again.

If each subject had a single record, matching could easily be accomplished. How can I accomplish the match in my file having repeated measures for each subject?

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



