In the previous tutorial, we learnt how to build and analyze the network of places and its transposed network of colleges using various centrality measures. In this new installment, we will see how we can detect communities of more densely connected places/colleges within the two networks.

The purpose is twofold :

  • Substantively, to understand how academic communities took shape through the interconnection of students’ trajectories
  • Methodologically, to illustrate the duality of place-based networks (which we emphasized in the previous tutorial) and to demonstrate the value of jointly analyzing the network of places and its transposed network of colleges.

Before we proceed to community detection, let’s recapitulate the successive operations for building place-based networks.

Recap

Summary of previous steps

# load packages

library(Places)
library(tidyverse)
library(igraph)

# load original data 

aucplaces <- read_delim("Data/aucdata.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces) 

# retrieve places 

Result1 <- places(aucplaces, "Name_eng", "University")
result1df <- as.data.frame(Result1$PlacesData) 

# load annotated data (places manually labeled wit qualitative attributes)

place_attributes <- read_csv("Data/place_attributes.csv",
col_types = cols(...1 = col_skip())) # manually labeled places

# load university data (region)

univ_region <- read_delim("Data/univ_region.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)

# create network of places 

bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set) 
PlacesMat<-bimod %*% t(bimod)
diag(PlacesMat)<-0 
Pla1Net<-graph_from_adjacency_matrix(PlacesMat, mode="undirected", weighted = TRUE)

# create network of colleges 

bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places)
PlacesMat2<-bimod2 %*% t(bimod2)
diag(PlacesMat2)<-0
Pla2Net<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE)

# extract main components
Pla1NetMC <- induced.subgraph(Pla1Net,vids=clusters(Pla1Net)$membership==1)
Pla2NetMC <- induced.subgraph(Pla2Net,vids=clusters(Pla2Net)$membership==3)

Workflow

We proceed in four steps:

  1. We compare various clustering methods and select the most appropriate.
  2. We analyze the size of communities and their membership
  3. We extract, visualize and compare the largest communities (their global features)
  4. We extract global measures for each cluster and we apply Principal Component Analysis (PCA) to group communities based on their structural attributes.

Method selection

We tested four clustering methods that are available in igraph: Louvain (lvc), spin glass (sg), fast greedy (fg), and edge betweeness (Girvan-Newman) (eb). The two first methods are non-hierarchical, whereas the two last ones are hierarchical clustering methods. The results are summarized in the table below:

Code Method Network ModScore NbrClusters MinSize MaxSize
lvc1 Louvain Places 0.52 7 10 46
lvc2 Louvain Colleges 0.49 8 6 21
sg1 Spin glass Places 0.52 8 10 36
sg2 Spin glass Colleges 0.48 11 3 21
fg1 Fast greedy Places 0.49 5 31 41
fg2 Fast greedy Colleges 0.49 8 3 21
eb1 Girvan-Newman (edge betweeness) Places 0.48 43 1 33
eb2 Girvan-Newman (edge betweeness) Colleges 0.46 13 2 20


The following lines of code were used to cluster the networks using the various methods and evaluate their respective performance:

# Louvain 

## detect communities
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)
## inspect results 
print(lvc1) # 7 groups, modularity score (mod): 0.52
print(lvc2) # 8 groups, modularity score (mod): 0.49
# communities sizes
sizes(lvc1) # from 10 to 46
sizes(lvc2) # from 6 to 21

# Spin glass 

## detect communities
sg1 <- cluster_spinglass(Pla1NetMC)
sg2 <- cluster_spinglass(Pla2NetMC)
## inspect results
print(sg1) #  8 groups, mod: 0.52 
print(sg2) #  11 groups, mod: 0.48 # larger number of smaller groups
sizes(sg1) # size of clusters ranges from 10 to 36 nodes
sizes(sg2) # size of clusters ranges from 3 to 21 nodes

# Fast greedy 

## detect communities
fg1 <- cluster_fast_greedy(Pla1NetMC)
fg2 <- cluster_fast_greedy(Pla2NetMC)
## inspect results
print(fg1) #  5, mod: 0.49 # # smaller number of larger groups (less skewed, more egalitarian)
print(fg2) #  8, mod: 0.49 # larger number of smaller groups (more skewed, less egalitarian)
sizes(fg1) # size of clusters : ranges from 31 to 41 nodes
sizes(fg2) # size of clusters : ranges from 3 to 21 nodes

# Girvan-Newman (Edge Betweenness) 

## detect communities
eb1 <- cluster_edge_betweenness(Pla1NetMC)
eb2 <- cluster_edge_betweenness(Pla2NetMC)
## inspect results
print(eb1) #  43, mod: 0.48 # larger number of communities (more dispersed)
print(eb2) #  13, mod: 0.46 # larger number of smaller groups
sizes(eb1) # size of clusters : ranges from 10 to 36 nodes
sizes(eb2) # size of clusters : ranges from 3 to 21 nodes


In the next steps, we will rely on the Louvain algorithm because it presents the highest modularity scores on both networks and it provides the most meaningful results from a humanistic perspective.

Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections. Modularity measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. The modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is known for being one of the fastest modularity-based algorithms and for working well with large graphs (hundreds or thousands of nodes and edges). It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.

More details about the Louvain method can be found here.

Community detection

## detect communities with Louvain
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)

Plot communities

Network of places

V(Pla1NetMC)$group <- lvc1$membership # create a group for each community
V(Pla1NetMC)$color <- lvc1$membership # node color reflects group membership 

plot(lvc1, Pla1NetMC, vertex.label=V(Pla1NetMC)$id,
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size=1.8,
     main="Communities of places (Louvain method)")