
I am trying to categorize observations with a variety of names by grouping them by new variables

I am new to coding and have been trying to use R to simplify the management of mice for the research lab I work at.

To use mtcars as an example:

I want to group different observations in mtcars by new variables, for example by country of origin, manufacturer, year of manufacture, or standard tire size.

More specifically, in my case I have a bunch of mice of different genotypes. There are different breeding schemes for the mice based on the genotype construct, genetic background and other factors, and I want to group the mice by those factors.

The problem I currently have is that mice that should share one strain name are recorded under a range of names. For example, a TSLP.KO mouse comes in the variations TSLP-KO, TSKP.KO.B6, TSLP;KO.B6(N12F1) etc.
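One way to shrink the list of variants before any matching might be to build a normalized key from each strain name. The sketch below is only an assumption about how that could look in base R: `strain` and `strain_key` are illustrative names, not columns from my real data. It removes differences in case and punctuation only, so suffixes such as B6 or (N12F1) still produce distinct keys that would need to be listed or handled separately.

```r
# Strain spellings taken from the examples in this question.
strain <- c("TSLP.KO", "TSLP-KO", "TSLP;KO", "TSLP:KO", "TSLPKO",
            "TSLP.KO.B6", "TSLP;KO.B6(N12F1)")

# Upper-case each name and drop every character that is not a letter or digit.
strain_key <- gsub("[^[:alnum:]]", "", toupper(strain))

# Show each original spelling next to its normalized key.
print(data.frame(strain, strain_key))
```

With this key, the punctuation-only variants collapse to a single value, which leaves fewer alternate names to maintain by hand.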

Let's call this DF1:

   Mouse_ID Strain     Sex   Age_wk Genotype listgenobox DOB   Cage_ID Litter_ID Mice_Room_ID
   <fct>    <fct>      <fct>  <dbl> <fct>    <fct>       <fct> <fct>   <fct>     <fct>       
 1 ZDM862   TSLP.KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118599 B23235-2  SZ8         
 2 ZDM863   TSLP.KO.B6  M        6.7 ""       "_/_  _/_ ~ 12/1~ H118599 B23235-2  SZ8         
 3 ZDM864   TSLP;KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8         
 4 ZDM865   TSLP-KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8         
 5 ZDM866   TSLP:KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8         
 6 ZDM867   TSLPKO      F        6.7 ""       "_/_  _/_ ~ 12/1~ H118601 B23235-2  SZ8   

My instinct was to make an Excel file listing the different naming variations (there is a finite number of them) together with the preferred nicknames and breeding scheme groups, and then combine that with my larger data frame that contains the Mouse IDs, Strains, Ages, Sex, Genotype etc.

Let's call this DF2:

   Breeding_Group Preferred Name Alternate_Name Alternate_Name2 Alternate_Name3
   <fct>          <fct>          <fct>          <fct>           <fct>
 1 1a             TSLP Knockout  "TSLP.KO"      "TSLP.KO.B6"    ""
 2 2a             C57BL~         "C57BL/6"      ""              ""
 3 1b             CCR2.~         "CCR2.CreERT2" "CCR2-CreERT2-" ""
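A wide table like DF2 is awkward to match against, because the strain could sit in any of the Alternate_Name columns. One possible preparation step, sketched here in base R, is to stack those columns into a single `Strain` column with one row per known spelling. The toy `DF2` below is an assumption: it writes `Preferred_Name` with an underscore, and the second preferred name is a placeholder because the real value is truncated in the printout. `lookup` and `alt_cols` are illustrative names.

```r
# Toy version of DF2 (assumed column names; "CCR2 placeholder" is NOT a real value).
DF2 <- data.frame(
  Breeding_Group  = c("1a", "1b"),
  Preferred_Name  = c("TSLP Knockout", "CCR2 placeholder"),
  Alternate_Name  = c("TSLP.KO", "CCR2.CreERT2"),
  Alternate_Name2 = c("TSLP.KO.B6", "CCR2-CreERT2-"),
  Alternate_Name3 = c("", ""),
  stringsAsFactors = FALSE
)

# Find every Alternate_Name* column, however many there are.
alt_cols <- grep("^Alternate_Name", names(DF2), value = TRUE)

# Stack the alternate-name columns into one long table.
lookup <- do.call(rbind, lapply(alt_cols, function(col) {
  data.frame(
    Strain         = DF2[[col]],
    Breeding_Group = DF2$Breeding_Group,
    Preferred_Name = DF2$Preferred_Name,
    stringsAsFactors = FALSE
  )
}))

# Drop empty or missing spellings and any exact duplicates.
lookup <- lookup[!is.na(lookup$Strain) & lookup$Strain != "", ]
lookup <- unique(lookup)
print(lookup)
```

In this long form every spelling appears exactly once in `Strain`, so it can serve as a single join key against DF1.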

The result I'm hoping for is as follows:

   Mouse_ID Strain     Sex   Age_wk Genotype listgenobox DOB   Cage_ID Litter_ID Mice_Room_ID Breeding_Group Preferred Name
   <fct>    <fct>      <fct>  <dbl> <fct>    <fct>       <fct> <fct>   <fct>     <fct>        <fct>          <fct>
 1 ZDM862   TSLP.KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118599 B23235-2  SZ8          1a             TSLP Knockout
 2 ZDM863   TSLP.KO.B6  M        6.7 ""       "_/_  _/_ ~ 12/1~ H118599 B23235-2  SZ8          1a             TSLP Knockout
 3 ZDM864   TSLP;KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8          1a             TSLP Knockout
 4 ZDM865   TSLP-KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8          1a             TSLP Knockout
 5 ZDM866   TSLP:KO     M        6.7 ""       "_/_  _/_ ~ 12/1~ H118600 B23235-2  SZ8          1a             TSLP Knockout
 6 ZDM867   TSLPKO      F        6.7 ""       "_/_  _/_ ~ 12/1~ H118601 B23235-2  SZ8          1a             TSLP Knockout

TL;DR
I want to add two new variables (Preferred Name and Breeding_Group) to DF1 by matching each Strain name to one of the Alternate_Name variables in DF2.

I have tried different combinations of merge() and rbind.fill() with little success.
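For reference, this is roughly the kind of join I am after, written as a minimal base R sketch under some assumptions: the toy `DF1` keeps only two of the real columns, and `lookup` is a hypothetical long table with one row per known spelling (it is not my actual DF2, and `Preferred_Name` is written with an underscore). `merge()` with `all.x = TRUE` keeps every mouse and fills the new columns with `NA` when a spelling is not listed.

```r
# Toy slice of DF1 with only the columns needed for the join.
DF1 <- data.frame(
  Mouse_ID = c("ZDM862", "ZDM863", "ZDM864"),
  Strain   = c("TSLP.KO", "TSLP.KO.B6", "TSLP;KO"),
  stringsAsFactors = FALSE
)

# Hypothetical long lookup: one row per known spelling of a strain.
lookup <- data.frame(
  Strain         = c("TSLP.KO", "TSLP.KO.B6"),
  Breeding_Group = c("1a", "1a"),
  Preferred_Name = c("TSLP Knockout", "TSLP Knockout"),
  stringsAsFactors = FALSE
)

# Left join: keep all rows of DF1, add the two new columns where Strain matches.
# Note that merge() sorts the result by the key column by default.
result <- merge(DF1, lookup, by = "Strain", all.x = TRUE)

# Spellings that found no match and still need a row in the lookup table.
unmatched <- unique(result$Strain[is.na(result$Preferred_Name)])

print(result)
print(unmatched)
```

Here "TSLP;KO" ends up in `unmatched` because the toy lookup does not list it, which is a quick way to find spellings that still need to be added.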

I hope those tables are readable. I'm sorry I'm not better at formatting them… yet.

Thank you in advance if you stuck with me till the end of this question. I appreciate any advice.
