Hi, I am currently trying to perform a t-test comparing BMI between the two Outcome groups in a diabetes dataset I am working on.
My goal is to determine whether there is a significant difference between the means of two groups.
So far I’ve checked these variables for missing (NA) values using sum(is.na(diabetes$BMI)), and cannot find any.
So far my code is:
diabetes <- t.test(diabetes$BMI ~ diabetes$Outcome)
Any help is greatly appreciated, thank you.
EDIT: Okay, so I realized what I was doing wrong. I was assigning the t.test result back to the name of my main data set (diabetes), which overwrote the data. When I print(diabetes) on its own, a Welch two-sample t-test actually comes out for the variables I selected! I am also using the Pima Indians diabetes dataset from Kaggle (https://www.kaggle.com/uciml/pima-indians-diabetes-database).
Also, I believe running
diabetes <- t.test(diabetes$BMI ~ diabetes$Outcome) a second time
was the reason the error:
Error in model.frame.default(formula = diabetes$BMI ~ diabetes$Outcome) :
invalid type (NULL) for variable ‘diabetes$BMI’
was appearing. The first time I ran diabetes <- t.test(diabetes$BMI ~ diabetes$Outcome) it actually completed with no error, but I forgot to print(diabetes), which is where I had stored the result. After that first run, diabetes held the test result instead of the data frame, so diabetes$BMI was NULL on the second run.
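
In case it helps anyone who hits the same error, here is a minimal sketch of the fix and of how the mistake reproduces. The data frame below is simulated as a stand-in for the Kaggle file (only the column names BMI and Outcome come from my data), and the names bmi_test, overwritten and rerun are just ones I made up for illustration.

```r
# Simulated stand-in for the real data set (assumption: BMI is numeric
# and Outcome is coded 0/1, as in the Kaggle file).
set.seed(1)
diabetes <- data.frame(
  BMI     = c(rnorm(50, mean = 30, sd = 6), rnorm(50, mean = 35, sd = 7)),
  Outcome = rep(c(0, 1), each = 50)
)

# Fix: store the test under its own name so the data frame stays intact.
bmi_test <- t.test(BMI ~ Outcome, data = diabetes)
print(bmi_test)  # Welch two-sample t-test by default

# Reproduce the mistake on a copy: assigning the result over the data
# replaces the data frame with an "htest" object that has no BMI element.
overwritten <- diabetes
overwritten <- t.test(overwritten$BMI ~ overwritten$Outcome)
is.null(overwritten$BMI)  # TRUE

# Running the same line again now fails with the "invalid type (NULL)" error.
rerun <- try(t.test(overwritten$BMI ~ overwritten$Outcome), silent = TRUE)
inherits(rerun, "try-error")  # TRUE
```

Using the data = argument with bare column names also keeps the formula shorter, and the original diabetes data frame can still be used afterwards.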