Ghement Statistical Consulting Company Ltd.


Isabella R. Ghement 2019

How can we convert a numerical variable into a categorical variable in R?


If you need to create a categorical variable from a numeric variable in R, the function case_when() from the dplyr package will come in handy.

To illustrate the use of case_when(), let's consider the airquality data set that comes with R:

data(airquality)

str(airquality)

This data set has a Temp variable (the maximum daily temperature in degrees Fahrenheit), whose values range from 56 to 97, as the R command below shows:

range(airquality$Temp, na.rm=TRUE)

Using the Temp variable, we can create a categorical variable, TempCat, whose categories are defined by the following conditions on Temp:

  • Temp < 72 (category will be named <72)
  • Temp >= 72 & Temp < 79 (category will be named [72,79))
  • Temp >= 79 & Temp < 85 (category will be named [79,85))
  • Temp >= 85 (category will be named >=85)

The R commands we need to create these categories and store them into a new variable called TempCat are as follows:

library(dplyr)

library(magrittr)

airquality <- airquality %>%
  mutate(TempCat = case_when(Temp < 72 ~ "<72",
                             Temp >= 72 & Temp < 79 ~ "[72,79)",
                             Temp >= 79 & Temp < 85 ~ "[79,85)",
                             Temp >= 85 ~ ">=85"))

Because the category names are character strings, case_when() will return the new variable, TempCat, as a character variable:

str(airquality)

To save TempCat as a factor, we can use this command:

airquality$TempCat <- factor(airquality$TempCat, levels = c("<72", "[72,79)", "[79,85)", ">=85"))
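As a quick sanity check on the steps above, the sketch below rebuilds TempCat on a copy of the data and confirms that every day landed in the intended category. The object name aq is an assumption made for illustration, so that the original airquality data frame is left untouched.

```r
library(dplyr)

data(airquality)

# Rebuild TempCat on a copy of the data (aq is a hypothetical name)
aq <- airquality %>%
  mutate(TempCat = case_when(Temp < 72 ~ "<72",
                             Temp >= 72 & Temp < 79 ~ "[72,79)",
                             Temp >= 79 & Temp < 85 ~ "[79,85)",
                             Temp >= 85 ~ ">=85"))

# Convert to a factor, listing the levels from coldest to warmest
aq$TempCat <- factor(aq$TempCat,
                     levels = c("<72", "[72,79)", "[79,85)", ">=85"))

# The levels keep the order we specified rather than alphabetical order
levels(aq$TempCat)

# Number of days per category; useNA = "ifany" would reveal unmatched rows
table(aq$TempCat, useNA = "ifany")

# Smallest and largest Temp within each category
tapply(aq$Temp, aq$TempCat, range)
```

Listing the levels explicitly matters because factor() would otherwise sort them alphabetically, which would not follow the temperature ordering in tables and plots.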

Notice how case_when() has the following general syntax for the above example:

case_when(condition1 ~ "category1",
          condition2 ~ "category2",
          condition3 ~ "category3",
          condition4 ~ "category4")

In this syntax, the expression to the left of each ~ specifies a condition involving the original numeric variable (in this case, Temp), and the string to the right of the ~ is the name we give to the category created by that condition.
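Two details of case_when() are worth keeping in mind when writing these conditions: the conditions are checked from top to bottom, with each value receiving the category of the first condition it satisfies, and any value that satisfies none of them is returned as NA. The minimal sketch below illustrates both points on a small made-up vector; the values in temp and the "missing" label are assumptions chosen for illustration, not part of the airquality example.

```r
library(dplyr)

# Hypothetical temperatures, including one missing value
temp <- c(60, 75, 80, 90, NA)

# Conditions are evaluated in order and the first match wins, so the
# lower bounds can be left out of the later conditions.
cat_ordered <- case_when(temp < 72 ~ "<72",
                         temp < 79 ~ "[72,79)",
                         temp < 85 ~ "[79,85)",
                         temp >= 85 ~ ">=85")

# The missing value matches no condition, so it stays NA
cat_ordered

# A final TRUE ~ ... condition acts as a catch-all for unmatched values
cat_fallback <- case_when(temp < 72 ~ "<72",
                          temp >= 72 ~ ">=72",
                          TRUE ~ "missing")
cat_fallback
```

Writing both bounds explicitly, as in the main example, is more verbose but keeps each condition readable on its own and independent of the order in which the conditions are listed.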

For specifying conditions using a numeric or integer variable, you can use operators such as:

< strictly less than
> strictly greater than
<= less than or equal to
>= greater than or equal to
== equal to

To combine conditions, you can use operators like:

& and
| or
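To show the | operator in use, here is a small sketch that flags unusually cold or unusually hot days in airquality. The cut-offs of 60 and 90, the variable name TempExtreme, and the category names are all arbitrary choices made for this illustration.

```r
library(dplyr)

data(airquality)

# aq is a hypothetical copy, so the original data frame is left untouched
aq <- airquality %>%
  mutate(TempExtreme = case_when(Temp < 60 | Temp > 90 ~ "extreme",
                                 Temp >= 60 & Temp <= 90 ~ "typical"))

# Number of days in each of the two categories
table(aq$TempExtreme)
```

The first condition uses | because a day qualifies as extreme if either inequality holds, while the second uses & because a typical day must satisfy both.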

If you wanted to create a variable with just two categories from Temp, you would use something like this instead:

airquality <- airquality %>%
  mutate(TempCat = case_when(Temp < 79 ~ "<79",
                             Temp >= 79 ~ ">=79"))

airquality$TempCat <- factor(airquality$TempCat, levels = c("<79", ">=79"))
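For comparison, the same two-category variable can also be produced with the base R function cut(), which returns a factor directly. This is a different technique from case_when(), shown here only as a cross-check; the object names via_case_when and via_cut are assumptions made for this sketch.

```r
library(dplyr)

data(airquality)

# Two categories via case_when(), then converted to a factor
via_case_when <- factor(case_when(airquality$Temp < 79 ~ "<79",
                                  airquality$Temp >= 79 ~ ">=79"),
                        levels = c("<79", ">=79"))

# The same split via cut(): right = FALSE makes the intervals closed on
# the left, i.e. [-Inf, 79) and [79, Inf)
via_cut <- cut(airquality$Temp,
               breaks = c(-Inf, 79, Inf),
               right = FALSE,
               labels = c("<79", ">=79"))

# Cross-tabulate the two versions; all counts should fall on the diagonal
table(via_case_when, via_cut)
```

cut() is compact when the categories are consecutive intervals of a single variable, whereas case_when() is more flexible when the conditions involve several variables or irregular rules.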

To learn more about the case_when() function, you can consult its help page:

help(case_when, package="dplyr")




 
