This is the third instalment of our tutorial series devoted to a place-based study of American University Men in China. In this tutorial, we rely on community detection (Louvain algorithm) to detect, visualize and analyze subgroups of alumni within place-based networks.
In the previous tutorial, we learnt how to build and analyze the network of places and its transposed network of colleges using various centrality measures. In this new installment, we will see how we can detect communities of more densely connected places/colleges within the two networks.
The purpose is twofold :
Before we proceed to community detection, let’s recapitulate the successive operations for building place-based networks.
Summary of previous steps
# load packages library(Places) library(tidyverse) library(igraph) # load original data aucplaces <- read_delim("Data/aucdata.csv", delim = ";", escape_double = FALSE, trim_ws = TRUE) aucplaces <- as.data.frame(aucplaces) # retrieve places Result1 <- places(aucplaces, "Name_eng", "University") result1df <- as.data.frame(Result1$PlacesData) # load annotated data (places manually labeled wit qualitative attributes) place_attributes <- read_csv("Data/place_attributes.csv", col_types = cols(...1 = col_skip())) # manually labeled places # load university data (region) univ_region <- read_delim("Data/univ_region.csv", delim = ";", escape_double = FALSE, trim_ws = TRUE) # create network of places bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set) PlacesMat<-bimod %*% t(bimod) diag(PlacesMat)<-0 Pla1Net<-graph_from_adjacency_matrix(PlacesMat, mode="undirected", weighted = TRUE) # create network of colleges bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places) PlacesMat2<-bimod2 %*% t(bimod2) diag(PlacesMat2)<-0 Pla2Net<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE) # extract main components Pla1NetMC <- induced.subgraph(Pla1Net,vids=clusters(Pla1Net)$membership==1) Pla2NetMC <- induced.subgraph(Pla2Net,vids=clusters(Pla2Net)$membership==3)
We proceed in four steps:
We tested four clustering methods that are available in igraph: Louvain (lvc), spin glass (sg), fast greedy (fg), and edge betweeness (Girvan-Newman) (eb). The two first methods are non-hierarchical, whereas the two last ones are hierarchical clustering methods. The results are summarized in the table below:
|eb1||Girvan-Newman (edge betweeness)||Places||0.48||43||1||33|
|eb2||Girvan-Newman (edge betweeness)||Colleges||0.46||13||2||20|
The following lines of code were used to cluster the networks using the various methods and evaluate their respective performance:
# Louvain ## detect communities lvc1 <- cluster_louvain(Pla1NetMC) lvc2 <- cluster_louvain(Pla2NetMC) ## inspect results print(lvc1) # 7 groups, modularity score (mod): 0.52 print(lvc2) # 8 groups, modularity score (mod): 0.49 # communities sizes sizes(lvc1) # from 10 to 46 sizes(lvc2) # from 6 to 21 # Spin glass ## detect communities sg1 <- cluster_spinglass(Pla1NetMC) sg2 <- cluster_spinglass(Pla2NetMC) ## inspect results print(sg1) # 8 groups, mod: 0.52 print(sg2) # 11 groups, mod: 0.48 # larger number of smaller groups sizes(sg1) # size of clusters ranges from 10 to 36 nodes sizes(sg2) # size of clusters ranges from 3 to 21 nodes # Fast greedy ## detect communities fg1 <- cluster_fast_greedy(Pla1NetMC) fg2 <- cluster_fast_greedy(Pla2NetMC) ## inspect results print(fg1) # 5, mod: 0.49 # # smaller number of larger groups (less skewed, more egalitarian) print(fg2) # 8, mod: 0.49 # larger number of smaller groups (more skewed, less egalitarian) sizes(fg1) # size of clusters : ranges from 31 to 41 nodes sizes(fg2) # size of clusters : ranges from 3 to 21 nodes # Girvan-Newman (Edge Betweenness) ## detect communities eb1 <- cluster_edge_betweenness(Pla1NetMC) eb2 <- cluster_edge_betweenness(Pla2NetMC) ## inspect results print(eb1) # 43, mod: 0.48 # larger number of communities (more dispersed) print(eb2) # 13, mod: 0.46 # larger number of smaller groups sizes(eb1) # size of clusters : ranges from 10 to 36 nodes sizes(eb2) # size of clusters : ranges from 3 to 21 nodes
In the next steps, we will rely on the Louvain algorithm because it presents the highest modularity scores on both networks and it provides the most meaningful results from a humanistic perspective.
Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections. Modularity measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. The modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
The Louvain algorithm is known for being one of the fastest modularity-based algorithms and for working well with large graphs (hundreds or thousands of nodes and edges). It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.
More details about the Louvain method can be found here.
## detect communities with Louvain lvc1 <- cluster_louvain(Pla1NetMC) lvc2 <- cluster_louvain(Pla2NetMC)
Network of places
V(Pla1NetMC)$group <- lvc1$membership # create a group for each community V(Pla1NetMC)$color <- lvc1$membership # node color reflects group membership plot(lvc1, Pla1NetMC, vertex.label=V(Pla1NetMC)$id, vertex.label.color = "black", vertex.label.cex = 0.5, vertex.size=1.8, main="Communities of places (Louvain method)")