Abstract

This is the third instalment of our tutorial series devoted to a place-based study of American University Men in China. In this tutorial, we rely on community detection (Louvain algorithm) to detect, visualize and analyze subgroups of alumni within place-based networks.

In the previous tutorial, we learnt how to build and analyze the network of places and its transposed network of colleges using various centrality measures. In this new installment, we will see how we can detect communities of more densely connected places/colleges within the two networks.

The purpose is twofold :

- Substantively, to understand how academic communities took shape through the interconnection of students’ trajectories
- Methodologically, to illustrate the duality of place-based networks (which we emphasized in the previous tutorial) and to demonstrate the value of jointly analyzing the network of places and its transposed network of colleges.

Before we proceed to community detection, let’s recapitulate the successive operations for building place-based networks.

**Summary of previous steps**

```
# load packages
library(Places)
library(tidyverse)
library(igraph)
# load original data
aucplaces <- read_delim("Data/aucdata.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces)
# retrieve places
Result1 <- places(aucplaces, "Name_eng", "University")
result1df <- as.data.frame(Result1$PlacesData)
# load annotated data (places manually labeled wit qualitative attributes)
place_attributes <- read_csv("Data/place_attributes.csv",
col_types = cols(...1 = col_skip())) # manually labeled places
# load university data (region)
univ_region <- read_delim("Data/univ_region.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
# create network of places
bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set)
PlacesMat<-bimod %*% t(bimod)
diag(PlacesMat)<-0
Pla1Net<-graph_from_adjacency_matrix(PlacesMat, mode="undirected", weighted = TRUE)
# create network of colleges
bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places)
PlacesMat2<-bimod2 %*% t(bimod2)
diag(PlacesMat2)<-0
Pla2Net<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE)
# extract main components
Pla1NetMC <- induced.subgraph(Pla1Net,vids=clusters(Pla1Net)$membership==1)
Pla2NetMC <- induced.subgraph(Pla2Net,vids=clusters(Pla2Net)$membership==3)
```

We proceed in four steps:

- We compare various clustering methods and select the most appropriate.
- We analyze the size of communities and their membership
- We extract, visualize and compare the largest communities (their global features)
- We extract global measures for each cluster and we apply Principal Component Analysis (PCA) to group communities based on their structural attributes.

We tested four clustering methods that are available in igraph: Louvain (lvc), spin glass (sg), fast greedy (fg), and edge betweeness (Girvan-Newman) (eb). The two first methods are non-hierarchical, whereas the two last ones are hierarchical clustering methods. The results are summarized in the table below:

Code | Method | Network | ModScore | NbrClusters | MinSize | MaxSize |
---|---|---|---|---|---|---|

lvc1 | Louvain | Places | 0.52 | 7 | 10 | 46 |

lvc2 | Louvain | Colleges | 0.49 | 8 | 6 | 21 |

sg1 | Spin glass | Places | 0.52 | 8 | 10 | 36 |

sg2 | Spin glass | Colleges | 0.48 | 11 | 3 | 21 |

fg1 | Fast greedy | Places | 0.49 | 5 | 31 | 41 |

fg2 | Fast greedy | Colleges | 0.49 | 8 | 3 | 21 |

eb1 | Girvan-Newman (edge betweeness) | Places | 0.48 | 43 | 1 | 33 |

eb2 | Girvan-Newman (edge betweeness) | Colleges | 0.46 | 13 | 2 | 20 |

The following lines of code were used to cluster the networks
using the various methods and evaluate their respective performance:

```
# Louvain
## detect communities
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)
## inspect results
print(lvc1) # 7 groups, modularity score (mod): 0.52
print(lvc2) # 8 groups, modularity score (mod): 0.49
# communities sizes
sizes(lvc1) # from 10 to 46
sizes(lvc2) # from 6 to 21
# Spin glass
## detect communities
sg1 <- cluster_spinglass(Pla1NetMC)
sg2 <- cluster_spinglass(Pla2NetMC)
## inspect results
print(sg1) # 8 groups, mod: 0.52
print(sg2) # 11 groups, mod: 0.48 # larger number of smaller groups
sizes(sg1) # size of clusters ranges from 10 to 36 nodes
sizes(sg2) # size of clusters ranges from 3 to 21 nodes
# Fast greedy
## detect communities
fg1 <- cluster_fast_greedy(Pla1NetMC)
fg2 <- cluster_fast_greedy(Pla2NetMC)
## inspect results
print(fg1) # 5, mod: 0.49 # # smaller number of larger groups (less skewed, more egalitarian)
print(fg2) # 8, mod: 0.49 # larger number of smaller groups (more skewed, less egalitarian)
sizes(fg1) # size of clusters : ranges from 31 to 41 nodes
sizes(fg2) # size of clusters : ranges from 3 to 21 nodes
# Girvan-Newman (Edge Betweenness)
## detect communities
eb1 <- cluster_edge_betweenness(Pla1NetMC)
eb2 <- cluster_edge_betweenness(Pla2NetMC)
## inspect results
print(eb1) # 43, mod: 0.48 # larger number of communities (more dispersed)
print(eb2) # 13, mod: 0.46 # larger number of smaller groups
sizes(eb1) # size of clusters : ranges from 10 to 36 nodes
sizes(eb2) # size of clusters : ranges from 3 to 21 nodes
```

In the next steps, we will rely on the Louvain
algorithm because it presents the highest modularity scores on both
networks and it provides the most meaningful results from a humanistic
perspective.

**Modularity** is a measure of how well groups have been
partitioned into clusters. It compares the relationships in a cluster
compared to what would be expected for a random (or other baseline)
number of connections. Modularity measures the quality (i.e., presumed
accuracy) of a community grouping by comparing its relationship density
to a suitably defined random network. The modularity quantifies the
quality of an assignment of nodes to communities by evaluating how much
more densely connected the nodes within a community are, compared to how
connected they would be in a random network.

The **Louvain algorithm** is known for being one of the
fastest modularity-based algorithms and for working well with large
graphs (hundreds or thousands of nodes and edges). It also reveals a
hierarchy of communities at different scales, which is useful for
understanding the global functioning of a network. The method consists
of repeated application of two steps. The first step is a “greedy”
assignment of nodes to communities, favoring local optimizations of
modularity. The second step is the definition of a new coarse-grained
network based on the communities found in the first step. These two
steps are repeated until no further modularity-increasing reassignments
of communities are possible.

More details about the Louvain method can be found here.

```
## detect communities with Louvain
lvc1 <- cluster_louvain(Pla1NetMC)
lvc2 <- cluster_louvain(Pla2NetMC)
```

Network of places

```
V(Pla1NetMC)$group <- lvc1$membership # create a group for each community
V(Pla1NetMC)$color <- lvc1$membership # node color reflects group membership
plot(lvc1, Pla1NetMC, vertex.label=V(Pla1NetMC)$id,
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size=1.8,
main="Communities of places (Louvain method)")
```