How to aggregate categorical data in R?





I have a data frame with two columns of categorical variables (Better, Similar, Worse). I would like to produce a table that counts how many times each category appears in each of the two columns.
The data frame I am using is as follows:

      Category.x Category.y
    1     Better     Better
    2     Better     Better
    3    Similar    Similar
    4      Worse    Similar

I would like to come up with a table like this:

            Category.x Category.y
    Better           2          2
    Similar          1          2
    Worse            1          0

How would you go about it?

Tags: r, aggregate






asked Apr 2 at 16:26 by Daniel








  • Looks like you need table(df1)
    – akrun, Apr 2 at 16:27

  • Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?
    – Daniel, Apr 2 at 16:29

  • I would convert to factor with common levels lvls <- unique(unlist(df1)); df1 <- lapply(df1, factor, levels = lvls) and then do the table(df1)
    – akrun, Apr 2 at 16:43
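
The common-levels idea from the last comment can be sketched as follows. Note that table(df1) on a two-column data frame cross-tabulates the columns against each other, so this sketch tabulates each column separately with sapply(df1, table) to get the 3x2 layout asked for. The data frame is a hypothetical reconstruction of the one in the question (character columns assumed), and sort() and the df1[] assignment are small additions that are not in the comment.

```r
# Hypothetical reconstruction of the data shown in the question
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Collect the set of categories that occur in either column
lvls <- sort(unique(unlist(df1)))

# Recode every column as a factor with the shared levels;
# assigning into df1[] keeps the data frame instead of returning a plain list
df1[] <- lapply(df1, factor, levels = lvls)

# Tabulate each column on its own: one row per level, one column per variable.
# Levels that never occur in a column are counted as 0.
counts <- sapply(df1, table)
counts
```

Because both columns share the same levels, every per-column table has the same length and sapply() can bind them into a single matrix.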














3 Answers
As mentioned in the comments, table() is the standard tool for this. For example:

    table(stack(DT))

             ind
    values    Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    table(value = unlist(DT), cat = names(DT)[col(DT)])

             cat
    value     Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0

or

    with(reshape(DT, direction = "long", varying = 1:2),
      table(value = Category, cat = time)
    )

             cat
    value     x y
      Better  2 2
      Similar 1 2
      Worse   1 0
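
To see why the first variant works, it can help to look at the intermediate long format. The sketch below assumes DT is a plain data frame with character columns, reconstructed here from the question's data (an assumption, since the answer does not show how DT was built). The final as.data.frame.matrix() step is an optional extra, in case an ordinary data frame is preferred over a table object.

```r
# Hypothetical reconstruction of the question's data, named DT as in the answer
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() turns the wide data into two columns:
#   values = the category, ind = the name of the column it came from
long <- stack(DT)

# table() on a two-column data frame cross-tabulates values against ind
tab <- table(long)

# Optional: convert the 2-d table into a regular data frame,
# with categories as row names and one column per original variable
res <- as.data.frame.matrix(tab)
res
```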






answered Apr 2 at 16:48 by Frank

























    # For each column, count how often each distinct value occurs
    sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
    #         Category.x Category.y
    # Better           2          2
    # Similar          1          2
    # Worse            1          0






answered Apr 2 at 16:33 by d.b























One dplyr and tidyr possibility could be:

    library(dplyr)
    library(tidyr)

    df %>%
      gather(var, val) %>%
      count(var, val) %>%
      spread(var, n, fill = 0)

      val     Category.x Category.y
      <chr>        <dbl>      <dbl>
    1 Better           2          2
    2 Similar          1          2
    3 Worse            1          0

First, gather() reshapes the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, count() tallies each combination of "var" and "val". Finally, spread() pivots the counts back into the desired wide format, filling combinations that never occur with 0.

Or with dplyr and reshape2 you can do:

    library(reshape2)

    df %>%
      mutate(rowid = row_number()) %>%
      melt(., id.vars = "rowid") %>%
      count(variable, value) %>%
      dcast(value ~ variable, value.var = "n", fill = 0)

        value Category.x Category.y
    1  Better          2          2
    2 Similar          1          2
    3   Worse          1          0
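
Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the first pipeline, assuming a recent tidyr release (one that accepts a scalar values_fill) and a hypothetical reconstruction of the question's data as df, might look like this:

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the data shown in the question
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # wide -> long: "var" holds the column name, "val" the category
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # count each (var, val) combination
  count(var, val) %>%
  # long -> wide: one column per original variable, 0 where a category is absent
  pivot_wider(names_from = var, values_from = n, values_fill = 0)

res
```

The three steps mirror the gather(), count(), spread() sequence above; only the reshaping verbs differ.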






answered Apr 2 at 16:41 by tmfmnk (edited Apr 2 at 17:58)













  • Is var = Category.x and val = c('Better', 'Similar', 'Worse')?
    – Daniel, Apr 2 at 16:56

  • Please see the updated post for commentary.
    – tmfmnk, Apr 2 at 17:04


















