---
title: "A place-based study of alumni networks in modern China"
subtitle: "Based on the directory of American University Men in Shanghai (1936)"
author: "Cécile Armand"
affiliation: Aix-Marseille University
date: "`r lubridate::today()`"
tags: [directory, newspaper, circulation, periodical, press, publisher]
abstract: |
  This tutorial series applies a place-based methodology to study Sino-American alumni networks in modern China, based on a directory of the American University Club of Shanghai published in 1936. In this first instalment, we show how to find and analyze places in two-mode relational data using the R package "Places" (Delio Lucena).
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
    number_sections: false
    code_folding: show # hide
    fig_caption: true
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(Places)
library(tidyverse)
library(knitr)
library(kableExtra)
```
# Research context
This research originates in a directory of the American University Club (AUC) of Shanghai published in 1936^[American University Club of Shanghai. American University Men in China. Shanghai: Comacrib Press, 1936. We are most grateful to Dr. Jiang Jie (Shanghai Normal University) who kindly provided us with a digital copy of the directory.]. The AUC was one of the earliest and most important organizations of American college alumni in pre-1949 China. It was established around 1902 by American expatriates in Shanghai. Membership was initially restricted to foreigners, but the club began to admit Chinese in 1908. Thereafter, the number of Chinese members steadily increased, and they became the majority in the early 1930s (rising from 97 out of 196 in 1930 to 207 out of 383 in 1933 and 204 out of 396 in 1935). The main goal of the club was to provide former graduates of American universities with a common meeting ground in China. It held an annual dinner, monthly tiffins, garden parties, barbecues, dinner dances and many other social gatherings. The club granted scholarships to prospective students in the United States and organized conferences to disseminate useful information about study abroad and other issues related to education.
The directory provides the list of members with their academic curricula (degree, college and year of graduation) (Fig.1). The complete dataset can be downloaded from [Zenodo](https://zenodo.org/record/6370085#.YjXnyJrMKkg).

Our assumption is that we can process this list systematically in order to reconstruct alumni networks within this population. The first step consists in reconstructing the individuals’ academic trajectories and identifying the individuals who attended the same colleges. The underlying hypothesis is that having attended the same college:
* either created a potential for collaboration later in their career (maximal hypothesis)
* or, at the very least, created a shared cultural background and experience, which eventually contributed to building a common identity for international alumni and returned students as a new social group in post-imperial China (minimal hypothesis)
The main challenge is that, more often than not, individuals had attended more than one college (2 on average, with a maximum of 6). I argue that a place-based approach is a suitable solution to address this challenge.
# What are places?
The concept of *place* should not be understood in the geographical sense. First conceptualized by sociologist N. Pizarro (2000, 2002, 2007), it is more akin to the notion of "structural equivalence" developed in network analysis^[Pizarro, Narciso. “Structural Identity and Equivalence of Individuals in Social Networks.” *International Sociology* 22, no. 6 (2007): 767–92 ; “Appartenances, places et réseaux de places. La reproduction des processus sociaux et la génération d’un espace homogène pour la définition des structures sociales.” *Sociologie et sociétés* 31, no. 1 (2002): 143–61; “Regularidad Relacional, Redes de Lugares y Reproduccion Social.” *Politica y Sociedad* 33 (2000)]. To put it simply, two (or more) individuals belong to the same place if they are related to exactly the same institutions. In our example, two students belong to the same place if they attended exactly the same college(s).
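
To make this definition concrete, here is a minimal sketch that derives places by hand from a small invented edge list. The object and column names (`toy`, `student`, `college`) are hypothetical and do not come from the AUC dataset, and the grouping logic only illustrates the concept; it is not necessarily how the Places package proceeds internally.

```r
library(dplyr)

# Hypothetical edge list: one row per student-college link
toy <- data.frame(
  student = c("A", "A", "B", "B", "C", "D", "E"),
  college = c("Columbia", "Harvard", "Harvard", "Columbia",
              "Columbia", "Yale", "Columbia")
)

# 1. Build each student's "signature": the sorted set of colleges attended
signatures <- toy %>%
  distinct(student, college) %>%
  group_by(student) %>%
  summarise(signature = paste(sort(college), collapse = " + "),
            NbSets = n(), .groups = "drop")

# 2. Students with identical signatures belong to the same place
toy_places <- signatures %>%
  group_by(signature, NbSets) %>%
  summarise(NbElements = n(),
            members = paste(sort(student), collapse = ", "),
            .groups = "drop")

toy_places
```

In this toy example, A and B form one place (both attended Columbia and Harvard), C and E form another (Columbia only), and D stands alone (Yale only).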
We can rely on places when the following conditions are met:
* Our data consists of two-mode relational data (club membership, interlocking boards, participation in events, etc.).
* It involves multiple membership: Individuals often belong to/are related to more than one institution.
* The range of membership per individual should not be too wide.
* The distribution of members across institutions should not be too skewed.
Note: In order to alleviate the impact of long-tail distributions, we can adopt a more flexible approach based on *k-places* or "regular equivalence". This approach consists in applying a tolerance threshold (k) during the process of place detection. For instance, if we set k = 1, we admit that two (or more) individuals may differ by one institution; if k = 2, two (or more) individuals may differ by 2 institutions, and so on.
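
The tolerance threshold can be illustrated with a short sketch. We assume here that "differing by k institutions" means that the symmetric difference between two membership sets contains at most k institutions; this is one possible reading of the note above, and the actual rule applied by the Places package may differ. All names and data below are invented.

```r
# Hypothetical membership sets (student -> colleges attended)
sets <- list(
  A = c("Columbia", "Harvard"),
  B = c("Columbia", "Harvard", "Yale"),
  C = c("Columbia"),
  D = c("Stanford")
)

# Number of institutions by which two membership sets differ
# (size of the symmetric difference; an assumption, see above)
set_diff <- function(x, y) length(union(setdiff(x, y), setdiff(y, x)))

# Pairwise difference matrix between all individuals
ids <- names(sets)
d <- outer(seq_along(ids), seq_along(ids),
           Vectorize(function(i, j) set_diff(sets[[i]], sets[[j]])))
dimnames(d) <- list(ids, ids)

k <- 1
within_k <- d <= k  # TRUE where two individuals differ by at most k institutions

d
within_k
```

With k = 1, A is within tolerance of both B and C, whereas B and C differ by two institutions. A tolerance relation of this kind is therefore not transitive, which is one reason why k-places need careful interpretation.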
In this tutorial, we aim to demonstrate that this place-based methodology is a powerful alternative to the usual reliance on bipartite networks or one-mode projections.

As shown in the diagram above (Fig.2), places present two major advantages:
* they make it possible to reduce the network without losing too much information;
* they retain the same duality property that we find in two-mode networks.
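
These two properties can be sketched on a small invented example by comparing a one-mode projection with the place-based reduction of the same incidence matrix. The data and object names are hypothetical, and the code uses base R only.

```r
# Hypothetical two-mode edge list (students x colleges)
toy <- data.frame(
  student = c("A", "A", "B", "B", "C", "D", "E"),
  college = c("Columbia", "Harvard", "Columbia", "Harvard",
              "Columbia", "Yale", "Columbia")
)

# Incidence matrix: rows = students, columns = colleges
inc <- unclass(table(toy$student, toy$college))

# One-mode projection: students linked by the number of colleges they share.
# The identity of the colleges is lost in the process.
proj <- tcrossprod(inc)

# Places: unique rows of the incidence matrix.
# The result is smaller but still two-mode (places x colleges).
place_matrix <- unique(inc)

dim(inc)           # 5 students x 3 colleges
dim(place_matrix)  # 3 places x 3 colleges
```

The projection yields a 5 x 5 student matrix that no longer tells us which colleges connect two students, whereas the place matrix shrinks the data while keeping the colleges as columns.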
# Objectives
The purpose of this tutorial is twofold:
1. *Substantively*, we aim to identify shared patterns of college affiliation and academic trajectories among our population of American University Men. Ultimately, we seek to better understand the role of alumni networks in shaping the collective identity of American-educated elites in modern China. The underlying hypothesis is twofold: (a) American-educated Chinese played a crucial role in the formation of transnational alumni networks in the early 20th century. (b) These alumni networks, in turn, were essential to individuals' professional career and to strengthening Sino-American relationships at a broader level.
2. *Methodologically*, we aim to devise a standard [workflow](https://www.xmind.net/m/YX2g4H) for detecting, analyzing and interpreting places in two-mode data. We hope that any scholar working with two-mode data can further reuse and adapt this workflow to her own data and research questions (Fig.3).

For the interactive version, see [Xmind](https://www.xmind.net/m/YX2g4H).
In the following, we will focus on:
1. Detecting and analyzing places (steps 2 and 3)
2. Building and analyzing place-based networks (steps 4, 5, 6)
3. Filtering the dataset to analyze place formation over time (optional step)
# Data
We load the data and inspect its class (the dataset must be a data frame so that the *places()* function can be applied):
```{r warning = FALSE, message = FALSE}
aucplaces <- read_delim("Data/aucdata.csv",
                        delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces)
class(aucplaces) # inspect its class
```
The dataset consists of an edge list linking individuals (students) and the colleges they attended. It also includes various attributes related to:
* the individuals (elements): nationality, employer/professional affiliation in 1936;
* the colleges (sets): state/region;
* the links (curricula): degree (level and field of study), year of graduation.
The data includes 418 unique students: 234 Chinese (56%) and 184 non-Chinese, most of them Americans (43%), along with 4 Japanese.
```{r warning = FALSE, message = FALSE}
aucplaces %>%
  distinct(Name_eng, Nationality) %>%
  count(Nationality) %>%
  mutate(ptg = paste0(round(n / sum(n) * 100, 0), "%")) %>%
  arrange(desc(n))
```
Altogether, these students attended 147 colleges, each of which accounts for between 1 and 61 curricula (the maximum being Columbia):
```{r warning = FALSE, message = FALSE}
aucplaces %>%
  drop_na(University) %>%
  count(University) %>%   # number of curricula per university
  filter(n > 3) %>%       # keep universities with more than 3 curricula
  ggplot(aes(x = reorder(University, n), y = n, fill = University)) +
  geom_col(show.legend = FALSE) +   # show.legend expects a logical value
  coord_flip() +
  labs(title = "American University Men in China",
       subtitle = "Most attended universities (more than 3 curricula)",
       x = NULL,
       y = "Number of curricula",
       fill = NULL,
       caption = "Based on 'American University Men in China' (1936)")
```
The 418 students and 147 universities represent a total of 682 curricula.
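
Totals of this kind can be checked with a few lines of dplyr. The sketch below runs on an invented mini edge list that reuses the column names `Name_eng` and `University` from the AUC data; it assumes that each row of the edge list stands for one curriculum, so that a student may appear several times for the same university.

```r
library(dplyr)

# Hypothetical mini edge list using the same column names as the AUC data
toy <- data.frame(
  Name_eng   = c("A", "A", "B", "C", "C", "C"),
  University = c("Columbia", "Harvard", "Columbia", "Yale", "Yale", "Harvard")
)

toy_counts <- toy %>%
  summarise(students  = n_distinct(Name_eng),    # unique individuals
            colleges  = n_distinct(University),  # unique institutions
            curricula = n())                     # one row per curriculum

toy_counts
```

Applied to the real data, rows with a missing university would need to be handled first, for instance with `drop_na(University)`.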
# Places detection
To find places in our population of American University Men, we rely on the R package ["Places"](http://lereps.sciencespo-toulouse.fr/new-r-package-places-structural-equivalence-analysis-for-two-mode-networks) developed by Delio Lucena (LEREPS, Sciences Po Toulouse).
First, we install and load the package:
```{r warning = FALSE, message = FALSE}
install.packages("http://lereps.sciencespo-toulouse.fr/IMG/gz/places_0.2.3.tar.gz", repos = NULL, type = "source")
library(Places)
```
We can now apply the function *places()*. The function takes three arguments: the first refers to the dataset (edge list of curricula), the second selects the elements (here, the students), and the last indicates the sets (i.e., the colleges).
```{r warning = FALSE, message = FALSE}
Result1 <- places(data = aucplaces, col.elements = "Name_eng", col.sets = "University")
Result1 <- places(aucplaces, "Name_eng", "University") # equivalent call with positional arguments
```
Based on our dataset of 418 students and 147 universities, 223 unique places are found. These places refer to academic trajectories. Two students belong to the same place if they attended the exact same set of colleges.
The resulting table contains four main variables. Each row corresponds to a unique place:
* PlaceNumber/PlaceLabel (unique identifier for each place)
* Number of elements (NbElements): number of students in each place
* Number of sets (NbSets): number of universities in each place
* PlaceDetail: names of students and universities included in each place
We create a dataframe from the list of results for further examination:
```{r warning = FALSE, message = FALSE}
result1df <- as.data.frame(Result1$PlacesData)
kable(head(result1df), caption = "First 6 places") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE, position = "left")
```
# Places attributes (quantitative)
Most places (179, 80%) consist of unique trajectories followed by a single student. These places are perfectly identified with their students:
```{r warning = FALSE, message = FALSE}
hist(result1df$NbElements, main = "Students per place (distribution)")
table(Result1$PlacesData$NbElements)
round(prop.table(table(Result1$PlacesData$NbElements))*100,2)
```
Similarly, most places contain a maximum of two universities. This reflects the fact that most students attended no more than two different colleges. Very few individuals attended more than two universities during their studies:
```{r warning = FALSE, message = FALSE}
hist(result1df$NbSets, main = "Colleges per place (distribution)")
table(Result1$PlacesData$NbSets)
round(prop.table(table(Result1$PlacesData$NbSets))*100,2)
```
We can represent the number of sets and elements simultaneously as a scatter plot, a barplot or a boxplot:
```{r warning = FALSE, message = FALSE}
ggplot(data = result1df) +
  geom_point(mapping = aes(x = NbSets, y = NbElements),
             position = "jitter", alpha = 0.5) +
  geom_abline(alpha = 0.5) +   # reference line y = x
  labs(x = "Colleges per place", y = "Students per place",
       title = "Places: Quantitative attributes",
       caption = "American University Men of China (1936)")
```
As a barplot:
```{r warning = FALSE, message = FALSE}
ggplot(data = result1df) +
  geom_col(mapping = aes(x = NbSets, y = NbElements)) +   # same as geom_bar(stat = "identity")
  labs(x = "Colleges per place", y = "Students per place",
       title = "Places: Quantitative attributes",
       caption = "American University Men of China (1936)")
```
Or alternatively, as a boxplot:
```{r warning = FALSE, message = FALSE}
result1df %>%
  ggplot(aes(as.factor(NbSets), NbElements)) +
  geom_boxplot(alpha = 0.4, show.legend = FALSE) +
  labs(x = "Colleges per place", y = "Students per place",
       title = "Places: Quantitative attributes",
       caption = "American University Men of China (1936)")
```
All these visualizations reveal an inverse relationship between the number of students and the number of colleges attended. The majority of places contain just one university attended by many students. This reflects the fact that our dataset contains a handful of prestigious universities which attracted large numbers of students, whereas most universities were attended by just one or a few students. The number of students naturally decreases as the number of colleges increases, which supports our previous observation that most students attended only one university. Very few places include more than 2 colleges. We find only one place with 4 universities and 2 individuals, and 5 places that include 3 universities with 2 individuals.
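
One way to complement the visual reading is to compute a rank correlation between the two counts. The sketch below uses an invented place table with the same two columns as `result1df`; the values are hypothetical and only show the mechanics of the check.

```r
# Hypothetical place table with the same two columns as result1df
toy_places <- data.frame(
  NbSets     = c(1, 1, 1, 2, 2, 3, 4),
  NbElements = c(30, 12, 5, 4, 2, 2, 1)
)

# Spearman's rank correlation: a negative value indicates that places with
# more colleges tend to contain fewer students
rho <- cor(toy_places$NbSets, toy_places$NbElements, method = "spearman")
rho
```

A rank-based measure is used here because the counts are discrete and heavily skewed, so it makes no assumption that the relationship is linear.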
Next, we want to know more about the students and the universities which defined each place. Since it would be time-consuming to examine the 223 places one by one, we first focus on the 13 largest places that include a minimum of 2 students and 2 colleges:
```{r warning = FALSE, message = FALSE}
nn2 <- result1df %>%
  filter(NbElements > 1 & NbSets > 1) # 13 places contain at least 2 individuals and 2 universities
kable(nn2, caption = "The 13 most populated places") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE, position = "left")
```
In order to facilitate the exploration, we can label each place with its corresponding quantitative attributes, as described below:
```{r warning = FALSE, message = FALSE}
# Label each place by its number of students (E) and universities (S)
# E3S2: more than 2 students, 2 universities
E3S2 <- result1df %>% filter(NbElements > 2) %>% filter(NbSets == 2) %>% mutate(Type = "E3S2")
# E3S1: more than 2 students, 1 university
E3S1 <- result1df %>% filter(NbElements > 2) %>% filter(NbSets == 1) %>% mutate(Type = "E3S1")
# E2S2: 2 students, 2 universities
E2S2 <- result1df %>% filter(NbElements == 2) %>% filter(NbSets == 2) %>% mutate(Type = "E2S2")
# E2S1: 2 students, 1 university
E2S1 <- result1df %>% filter(NbElements == 2) %>% filter(NbSets == 1) %>% mutate(Type = "E2S1")
# E1S3: one student, more than 2 universities
E1S3 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets > 2) %>% mutate(Type = "E1S3")
# E1S2: one student, 2 universities
E1S2 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets == 2) %>% mutate(Type = "E1S2")
# E1S1: one student, one university
E1S1 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets == 1) %>% mutate(Type = "E1S1")
# Places with 2 or more students and more than 2 universities match none of the filters above
ESlist <- bind_rows(E1S1, E1S2, E1S3, E2S1, E2S2, E3S1, E3S2)
kable(head(ESlist), caption = "First 6 places, labeled with their quantitative attributes") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE, position = "left")
```
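
As a more compact alternative, the same kind of label can be computed in a single step by capping both counts at 3, so that "3" stands for "more than 2". This is a sketch on an invented place table (`toy_places` and its values are hypothetical); unlike the separate filters, it also labels places that combine several students with more than 2 universities.

```r
library(dplyr)

# Hypothetical place table with the two count columns used above
toy_places <- data.frame(
  PlaceLabel = paste0("P", 1:6),
  NbElements = c(1, 1, 1, 2, 2, 5),
  NbSets     = c(1, 2, 3, 1, 2, 1)
)

# Cap both counts at 3 and paste them into a single label
toy_labeled <- toy_places %>%
  mutate(Type = paste0("E", pmin(NbElements, 3), "S", pmin(NbSets, 3)))

toy_labeled
```

The cap can be adjusted to the structure of the data, in line with the thresholds discussed in this section.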
Note: You may adjust the threshold to the particular structure of your data. Here, we set the number of sets to 2 because it refers to the average number of curricula, and we set elements to 1 because we are interested in places that involved more than one student, beyond singular trajectories.
At this stage, it is recommended to carefully examine the list of places, starting from the most important, and gradually expanding the selection to include less populated places. In the next step, we will see how we can use the students' and colleges' attributes to further categorize the places, especially the 44 places (20%) that involve two or more students.
# Places attributes (qualitative)
For this analysis, we rely on a manually annotated dataset:
```{r warning = FALSE, message = FALSE}
place_attributes <- read_csv("Data/place_attributes.csv",
col_types = cols(...1 = col_skip()))
place_attributes
```
After a careful examination, each place has been labeled with the following attributes:
* **National profile** (Nationality): Chinese only, non-Chinese only, multinational
* **Academic profile**: range of disciplines (field_nbr), nature of disciplines (field_group)
* **Level of qualification**: range of degrees (LevelNbr), highest degree obtained in each place (degree_high)
* **Geographical coverage**: geographical diversity (Region_nbr), regions covered (Region_code)
* **Period of study**: same period or not (period_nbr), period of study (period_group)
Regarding the **geographical coverage**, we adopt the following code:

Region_nbr | Region_code | Description |
---|---|---|
Monoregion | EAST | East Coast |
Monoregion | MID | Midwest |
Monoregion | OTHER - US | Other regions in the United States |
Monoregion | OTHER - NON US | Outside the United States |
Multiregion | EM | East Coast-Midwest |
Multiregion | EO | East Coast-Other |
Multiregion | MO | Midwest-Other |
Multiregion | OTHER | Other |

Field of study / Period of study | SAME TIME | DIFFERENT TIME |
---|---|---|
SAME DISCIPLINE | TYPE A: Strong potential for regular interaction (4 places, 9%) | TYPE C: Potential for later collaboration (7 places, 16%) |
DIFFERENT DISCIPLINE | TYPE B: Potential for extracurricular interaction (8 places, 18%) | TYPE D: Shared academic experience and cultural identity (25 places, 57%) |
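
The typology above can be assigned programmatically once each place has been annotated. The sketch below is only an illustration: the column names `period` and `field` and the values "same" and "different" are invented, and may not match the coding actually used in `place_attributes.csv`.

```r
library(dplyr)

# Hypothetical annotations, one row per place
toy_attr <- data.frame(
  PlaceLabel = paste0("P", 1:4),
  period     = c("same", "same", "different", "different"),
  field      = c("same", "different", "same", "different")
)

# Cross period and field of study to obtain the four types
toy_typed <- toy_attr %>%
  mutate(Type = case_when(
    field == "same"      & period == "same"      ~ "A",  # regular interaction
    field == "different" & period == "same"      ~ "B",  # extracurricular interaction
    field == "same"      & period == "different" ~ "C",  # later collaboration
    field == "different" & period == "different" ~ "D"   # shared experience and identity
  ))

count(toy_typed, Type)
```

Counting the places per type in this way gives the figures needed to fill in a cross-table of period by field of study.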