In this post I’ll discuss the aggregate and merge functions. The reason I’ve selected these two functions together is because of their contrasting nature. While merge function is useful in expanding the data frame by combining multiple data frames, the aggregate function is used to get a concise view of the data frame. Let’s see how they are used.
I’ll start with creating a simple table called soccer with 3 columns and 6 rows that looks like below:
soccer <- data.frame(player= c("A", "B", "A", "B", "C","C"), goals=c(1,2,2,1,3,1), match=c(1,2,3,3,1,2))
player goals match
A 1 1
B 2 2
A 2 3
B 1 3
C 3 1
C 1 2
In the above table player column is for player name, goals column indicates number of goals scored by the player in a particular match. The match column denotes the serial number of the match. Now, if in the above table I want to find out the summary of total goals scored by a player I’ll use aggregate. Below is an simple example of aggregate function applied to the soccer table:
aggregate (goals~ player, data=soccer, sum)
In the above function we are passing 3 parameters- the first indicating a relation, the second is source of data and the third is a function name. The function we mention will be applied to the variable on the left side of tilde(~). So, in this case we will get a sum of goals. The sum will be found for each unique value of the variable(s) on the right side of tilde(~). Here, we have only player on right. So for each player the sum of goals will be calculated.
Now, imagine there is a second table with name of players and their teams. We want to combine these two tables to see the goals scored by each team in all these matches. For that we need to use the merge function.
teams<- data.frame(player=c("A", "B", "C"), teams=c("team1", "team2","team1"))
The merge function can be used as below
df3<- merge(soccer, teams, by="player")
player goals match teams
A 1 1 team1
A 2 3 team1
B 2 2 team2
B 1 3 team2
C 3 1 team1
C 1 2 team1
Now if we want to find out the team wise aggregate we have to apply aggregate function on the table df3.
aggregate (goals~ teams, data=df3, sum)
1 team1 7
2 team2 3