srakaless.blogg.se - Dplyr rename

Dplyr rename

With dplyr, it's super easy to rename columns within your dataframe. While there are numerous ways to rename columns in R, I've found that dplyr's approach is arguably one of the most intuitive. This is particularly handy if you're sharing your work with others, or if you're in an environment where multiple people work on the same data, because clarity is key. Renaming is also handy when you want to join two dataframes on a key: it's often easier to just rename the column than to spell out the mapping in the join. Alternatively, from a data munging perspective, you sometimes end up with unhelpful column names like x1, x2 and x3, and cleaning these up makes your dataframes and your work more legible.

To convert existing code from the _if(), _at() or _all() functions to across():

1. Strip the _if(), _at() or _all() suffix off the function.
2. Call across(). Its first argument will be:
   - for _if(), the old second argument wrapped in where();
   - for _at(), the old second argument with the call to vars() removed: if there was a single element in vars() you can remove vars(), otherwise replace it with c();
   - for _all(), everything().
3. Copy the subsequent arguments as is.
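Here is a minimal sketch of both ideas. It assumes dplyr 1.0.0 or later and uses the built-in mtcars data purely for illustration. rename() takes new_name = old_name pairs. Each superseded call (mutate_at(), summarise_if(), mutate_all()) is shown next to its across() equivalent. The new column names are arbitrary choices for this example, not anything the post prescribes.

```r
library(dplyr, warn.conflicts = FALSE)

# rename() takes new_name = old_name pairs; all other columns keep their
# names and positions (these new names are arbitrary, for illustration only)
cars <- mtcars %>%
  rename(miles_per_gallon = mpg, cylinders = cyl)

# _at(): drop the suffix, call across(), and replace vars() with c()
old_at <- cars %>% mutate_at(vars(miles_per_gallon, disp), round)
new_at <- cars %>% mutate(across(c(miles_per_gallon, disp), round))

# _if(): the predicate moves into across(), wrapped in where()
old_if <- cars %>% summarise_if(is.numeric, mean)
new_if <- cars %>% summarise(across(where(is.numeric), mean))

# _all(): everything() selects every column
old_all <- cars %>% mutate_all(as.character)
new_all <- cars %>% mutate(across(everything(), as.character))
```

In each pair, the old and new forms should produce the same data frame. You can check this with all.equal(old_at, new_at) and the other pairs.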


If you've tackled the problem of applying the same operation to multiple columns with an older version of dplyr, you might've used one of the functions with an _if, _at, or _all suffix. These functions solved a pressing need and are used by many people, but are now superseded. This means that they'll stay around, but will only receive critical bug fixes.

Why did we decide to move away from these functions in favour of across()?

1. across() makes it possible to compute useful summaries that were previously impossible. For example, it's now easy to summarise numeric vectors with one function, factors with another, and still compute the number of rows in each group:

   df %>%
     group_by(g1, g2) %>%
     summarise(
       across(where(is.numeric), mean),
       across(where(is.factor), nlevels),
       n = n()
     )

2. across() reduces the number of functions that dplyr needs to provide. This makes dplyr easier for you to use (because there are fewer functions to remember) and easier for us to develop (since we only need to implement one function for each new verb, not four).
3. With the where() helper, across() unifies _if and _at semantics, allowing combinations that used to be impossible. For example, you can now transform all numeric columns whose name begins with "x": across(where(is.numeric) & starts_with("x")).
4. across() doesn't need vars(). The _at() functions are the only place in dplyr where you have to use vars(), which makes them unusual, and hence harder to learn and remember.

Why did it take so long to discover across()? Surprisingly, the key idea that makes across() work came out of our low-level work on the vctrs package, where we learnt that you can have a column of a data frame that is itself a data frame. It's a bummer that we had a few false starts before we discovered across(), but even with hindsight, I don't see how we could've skipped the intermediate steps.

In practice, updating existing code to use across() instead of the _if(), _at() or _all() functions is generally straightforward.
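Here is a runnable sketch of the grouped summary and the combined selection. The tibbles df and df2 and their columns (g1, g2, value, level, x1, x2, y1) are hypothetical stand-ins for real data. Setting .groups = "drop" is an extra choice made here only to return an ungrouped result and avoid the grouping message.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: two grouping columns, one numeric and one factor column
df <- tibble(
  g1    = c("a", "a", "b", "b"),
  g2    = c("x", "x", "x", "y"),
  value = c(1, 3, 5, 7),
  level = factor(c("lo", "hi", "lo", "lo"))
)

# One function for numeric columns, another for factors, plus a row count
summary_tbl <- df %>%
  group_by(g1, g2) %>%
  summarise(
    across(where(is.numeric), mean),
    across(where(is.factor), nlevels),
    n = n(),
    .groups = "drop"
  )

# Hypothetical data for combining a type predicate with a name pattern
df2 <- tibble(x1 = c(1.26, 2.71), x2 = c("a", "b"), y1 = c(3.14, 0.5))

# Only x1 is both numeric and named "x..."; x2 and y1 are left unchanged
rounded <- df2 %>%
  mutate(across(where(is.numeric) & starts_with("x"), round))
```

Note that nlevels() reports the levels of the whole factor, not only those present in the group, so every group gets 2 here.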

Here are a few examples of across() used with summarise(), based on the starwars dataset that ships with dplyr:

library(dplyr, warn.conflicts = FALSE)

starwars %>%
  summarise(across(where(is.character), n_distinct))
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#> 1    87         13         31        15     5      3        49      38

starwars %>%
  group_by(species) %>%
  filter(n() > 1) %>%
  summarise(across(c(sex, gender, homeworld), n_distinct))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 9 x 4
#>   species    sex gender homeworld
#> 1 Droid        1      2         3
#> 2 Gungan       1      1         1
#> 3 Human        2      2        16
#> 4 Kaminoan     2      2         1
#> 5 Mirialan     1      1         1
#> 6 Twi'lek      2      2         1
#> 7 Wookiee      1      1         1
#> 8 Zabrak       1      1         2
#> 9 <NA>         1      1         3

starwars %>%
  group_by(homeworld) %>%
  filter(n() > 1) %>%
  summarise(across(where(is.numeric), mean, na.rm = TRUE), n = n())
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 10 x 5
#>    homeworld height mass birth_year     n
#>  1 Alderaan     176.

There are three cool features you might be particularly interested in:

If needed, you can access the name of the column currently being processed with
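dplyr 1.0.0 and later provide cur_column() to get the name of the column currently being processed inside across(). The sketch below relies on that. The multipliers lookup vector is purely hypothetical. The sketch also uses across()'s .names argument to name the output columns, and passes na.rm through an anonymous function instead of through across()'s `...`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical per-column multipliers, looked up by the current column's name
multipliers <- c(height = 0.01, mass = 1)

scaled <- starwars %>%
  select(name, height, mass) %>%
  mutate(across(c(height, mass), ~ .x * multipliers[[cur_column()]]))

# .names controls the output column names; {.col} is the input column's name
means <- starwars %>%
  summarise(across(c(height, mass), ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"))
```

Passing extra arguments to the function through an anonymous function (~ mean(.x, na.rm = TRUE)) rather than through across()'s `...` avoids the deprecation warning that dplyr 1.1.0 and later give for the latter.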