Abstract
This is the third instalment of our tutorial series devoted to a place-based study of American University Men in China. In this tutorial, we rely on community detection (Louvain algorithm) to detect, visualize and analyze subgroups of alumni within place-based networks.
In the previous tutorial, we learnt how to build and analyze the network of places and its transposed network of colleges using various centrality measures. In this new installment, we will see how we can detect communities of more densely connected places/colleges within the two networks.
The purpose is twofold :
Before we proceed to community detection, let’s recapitulate the successive operations for building place-based networks.
Summary of previous steps
# load packages
library(Places)
library(tidyverse)
library(igraph)
# load original data
aucplaces <- read_delim("Data/aucdata.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces)
# retrieve places
Result1 <- places(aucplaces, "Name_eng", "University")
result1df <- as.data.frame(Result1$PlacesData)
# load annotated data (places manually labeled wit qualitative attributes)
place_attributes <- read_csv("Data/place_attributes.csv",
col_types = cols(...1 = col_skip())) # manually labeled places
# load university data (region)
univ_region <- read_delim("Data/univ_region.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
# create network of places
bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set)
PlacesMat<-bimod %*% t(bimod)
diag(PlacesMat)<-0
Pla1Net<-graph_from_adjacency_matrix(PlacesMat, mode="undirected", weighted = TRUE)
# create network of colleges
bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places)
PlacesMat2<-bimod2 %*% t(bimod2)
diag(PlacesMat2)<-0
Pla2Net<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE)
# extract main components
Pla1NetMC <- induced.subgraph(Pla1Net,vids=clusters(Pla1Net)$membership==1)
Pla2NetMC <- induced.subgraph(Pla2Net,vids=clusters(Pla2Net)$membership==3)
We proceed in four steps:
We tested four clustering methods that are available in igraph: Louvain (lvc), spin glass (sg), fast greedy (fg), and edge betweeness (Girvan-Newman) (eb). The two first methods are non-hierarchical, whereas the two last ones are hierarchical clustering methods. The results are summarized in the table below:
Code | Method | Network | ModScore | NbrClusters | MinSize | MaxSize |
---|---|---|---|---|---|---|
lvc1 | Louvain | Places | 0.52 | 7 | 10 | 46 |
lvc2 | Louvain | Colleges | 0.49 | 8 | 6 | 21 |
sg1 | Spin glass | Places | 0.52 | 8 | 10 | 36 |
sg2 | Spin glass | Colleges | 0.48 | 11 | 3 | 21 |
fg1 | Fast greedy | Places | 0.49 | 5 | 31 | 41 |
fg2 | Fast greedy | Colleges | 0.49 | 8 | 3 | 21 |
eb1 | Girvan-Newman (edge betweeness) | Places | 0.48 | 43 | 1 | 33 |
eb2 | Girvan-Newman (edge betweeness) | Colleges | 0.46 | 13 | 2 | 20 |
The following lines of code were used to cluster the networks
using the various methods and evaluate their respective performance:
# Louvain
## detect communities
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)
## inspect results
print(lvc1) # 7 groups, modularity score (mod): 0.52
print(lvc2) # 8 groups, modularity score (mod): 0.49
# communities sizes
sizes(lvc1) # from 10 to 46
sizes(lvc2) # from 6 to 21
# Spin glass
## detect communities
sg1 <- cluster_spinglass(Pla1NetMC)
sg2 <- cluster_spinglass(Pla2NetMC)
## inspect results
print(sg1) # 8 groups, mod: 0.52
print(sg2) # 11 groups, mod: 0.48 # larger number of smaller groups
sizes(sg1) # size of clusters ranges from 10 to 36 nodes
sizes(sg2) # size of clusters ranges from 3 to 21 nodes
# Fast greedy
## detect communities
fg1 <- cluster_fast_greedy(Pla1NetMC)
fg2 <- cluster_fast_greedy(Pla2NetMC)
## inspect results
print(fg1) # 5, mod: 0.49 # # smaller number of larger groups (less skewed, more egalitarian)
print(fg2) # 8, mod: 0.49 # larger number of smaller groups (more skewed, less egalitarian)
sizes(fg1) # size of clusters : ranges from 31 to 41 nodes
sizes(fg2) # size of clusters : ranges from 3 to 21 nodes
# Girvan-Newman (Edge Betweenness)
## detect communities
eb1 <- cluster_edge_betweenness(Pla1NetMC)
eb2 <- cluster_edge_betweenness(Pla2NetMC)
## inspect results
print(eb1) # 43, mod: 0.48 # larger number of communities (more dispersed)
print(eb2) # 13, mod: 0.46 # larger number of smaller groups
sizes(eb1) # size of clusters : ranges from 10 to 36 nodes
sizes(eb2) # size of clusters : ranges from 3 to 21 nodes
In the next steps, we will rely on the Louvain
algorithm because it presents the highest modularity scores on both
networks and it provides the most meaningful results from a humanistic
perspective.
Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections. Modularity measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. The modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
The Louvain algorithm is known for being one of the fastest modularity-based algorithms and for working well with large graphs (hundreds or thousands of nodes and edges). It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.
More details about the Louvain method can be found here.
## detect communities with Louvain
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)
Network of places
V(Pla1NetMC)$group <- lvc1$membership # create a group for each community
V(Pla1NetMC)$color <- lvc1$membership # node color reflects group membership
plot(lvc1, Pla1NetMC, vertex.label=V(Pla1NetMC)$id,
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size=1.8,
main="Communities of places (Louvain method)")