Abstract
This tutorial series applies a place-based methodology to study Sino-American alumni networks in modern China, based on a directory of the American University Club of Shanghai published in 1936. In this first instalment, we show how to find and analyze places in two-mode relational data using the R package “Places” (Delio Lucena).
This research originates in a directory of the American University Club (AUC) of Shanghai published in 1936 [1]. The AUC was one of the earliest and most important organizations of American college alumni in pre-1949 China. It was established around 1902 by American expatriates in Shanghai. Membership was initially restricted to foreigners, but the club began to admit Chinese in 1908. Thereafter, the number of Chinese members steadily increased and they became the majority in the early 1930s, rising from 97 (out of 196) in 1930 to 207 (out of 383) in 1933 and 204 (out of 396) in 1935. The main goal of the club was to provide former graduates of American universities with a common meeting ground in China. It held an annual dinner, monthly tiffins, garden parties, barbecues, dinner dances and many other social gatherings. The club granted scholarships to prospective students in the United States and organized conferences to disseminate useful information about study abroad and other issues related to education.
The directory provides the list of members with their academic curricula (degree, college and year of graduation) (Fig.1). The complete dataset can be downloaded from Zenodo.
Our assumption is that we can process this list in a systematic way in order to reconstruct alumni networks within this population. The first step consists in reconstructing the individuals’ academic trajectories and identifying the individuals who attended the same colleges. The underlying hypothesis is that having attended the same college
The main challenge is that, more often than not, individuals had attended more than one college (2 on average, with a maximum of 6). We argue that a place-based approach is a suitable solution to address this challenge.
The concept of place should not be understood in the geographical sense. First conceptualized by sociologist N. Pizarro (2000, 2002, 2007), it is more akin to the notion of “structural equivalence” developed in network analysis [2]. To put it simply, two (or more) individuals belong to the same place if they are related to exactly the same institutions. In our example, two students belong to the same place if they attended exactly the same college(s).
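To make the definition concrete, here is a minimal base R sketch on a toy edge list. The student and college values are hypothetical, and the code only illustrates the idea of grouping individuals by identical sets of institutions; it is not the implementation used by the “Places” package.

```r
# Toy edge list (hypothetical students A-D, not taken from the directory)
edges <- data.frame(
  student = c("A", "A", "B", "B", "C", "D"),
  college = c("Columbia", "Yale", "Yale", "Columbia", "Columbia", "Harvard"),
  stringsAsFactors = FALSE
)

# One signature per student: the sorted, de-duplicated set of colleges attended
signature <- sapply(
  split(edges$college, edges$student),
  function(x) paste(sort(unique(x)), collapse = ";")
)

# Students sharing exactly the same signature belong to the same place
toy_places <- split(names(signature), signature)
toy_places
```

Here, students A and B fall into the same place because they attended exactly the same two colleges, whereas C, who attended only one of them, forms a place of its own.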
We can rely on places when the following conditions are met:
Note: In order to alleviate the impact of long-tail distributions, we can adopt a more flexible approach based on k-places or “regular equivalence”. This approach consists in applying a tolerance threshold (k) during the process of place detection. For instance, if we set k = 1, we admit that two (or more) individuals may differ by one institution; if k = 2, two (or more) individuals may differ by 2 institutions, and so on.
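The tolerance threshold can be illustrated with a short base R sketch. We assume here that “differing by k institutions” means that the symmetric difference between two students’ sets of colleges has at most k elements; this reading, the toy data and the helper `n_diff()` are our own assumptions, and the package may define k-places differently.

```r
# Hypothetical memberships: each student is mapped to the set of colleges attended
sets <- list(
  A = c("Columbia", "Yale"),
  B = c("Columbia", "Yale", "Harvard"),
  C = c("Stanford")
)

# Number of institutions by which two students differ
# (assumption: size of the symmetric difference between the two sets)
n_diff <- function(x, y) length(union(setdiff(x, y), setdiff(y, x)))

k <- 1                                   # tolerance threshold
pair_idx <- t(combn(names(sets), 2))     # all pairs of students
d <- apply(pair_idx, 1, function(p) n_diff(sets[[p[1]]], sets[[p[2]]]))

data.frame(student1 = pair_idx[, 1], student2 = pair_idx[, 2],
           n_diff = d, within_k = d <= k)
```

With k = 1, only A and B would be grouped together, since they differ by a single college.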
In this tutorial, we aim to demonstrate that this place-based methodology is a powerful alternative to the usual reliance on bipartite networks or one-mode projections.
As shown in the above diagram (Fig.2), places present two major advantages:
The purpose of this tutorial is twofold:
For the interactive version, see Xmind.
In the following, we will focus on:
We load the data and inspect its class (the dataset must be a data frame so that the places() function can be applied):
library(tidyverse) # provides read_delim(), the pipe and ggplot2 used below
aucplaces <- read_delim("Data/aucdata.csv",
                        delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces) # convert the tibble into a plain data frame
class(aucplaces) # inspect its class
## [1] "data.frame"
The dataset consists of an edge list linking individuals (students) and the colleges they attended. It also includes various attributes related to:
The data includes 418 unique students, of whom 234 are Chinese (56%) and 184 are non-Chinese: mostly Americans (43% of all students), along with 4 Japanese.
aucplaces %>%
distinct(Name_eng, Nationality) %>%
count(Nationality) %>%
mutate(ptg = paste0(round(n / sum(n) * 100, 0), "%")) %>%
arrange(desc(n))
Altogether, these students attended 147 colleges, which individually totalled from 1 to 61 curricula (Columbia):
aucplaces %>%
  drop_na(University) %>%
  count(University) %>% # number of curricula per university
  filter(n > 3) %>%
  ggplot(aes(x = reorder(University, n), y = n, fill = University)) +
  geom_col(show.legend = FALSE) + # logical FALSE, not the string "FALSE"
  coord_flip() +
  labs(title = "American University Men in China",
       subtitle = "Most attended universities (more than 3 curricula)",
       x = NULL,
       y = "Number of curricula",
       fill = NULL,
       caption = "Based on 'American University Men in China' (1936)")
The 418 students and 147 universities represent a total of 682 curricula:
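Such totals can be derived directly from the edge list. The following is a minimal sketch on a toy data frame that reuses the column names `Name_eng` and `University`; the rows are hypothetical, and we assume that one row corresponds to one curriculum.

```r
# Toy edge list with the same column names as the AUC dataset (hypothetical rows)
toy <- data.frame(
  Name_eng   = c("A", "A", "B", "C", "C", "C"),
  University = c("Columbia", "Yale", "Columbia", "Harvard", "Yale", "Chicago"),
  stringsAsFactors = FALSE
)

n_students     <- length(unique(toy$Name_eng))    # distinct students
n_universities <- length(unique(toy$University))  # distinct colleges
n_curricula    <- nrow(toy)                       # assumption: one row per curriculum

c(students = n_students, universities = n_universities, curricula = n_curricula)
```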
To find places in our population of American University Men, we rely on the R package “Places” developed by Delio Lucena (LEREPS, Sciences Po Toulouse).
First, we install and load the package:
install.packages("http://lereps.sciencespo-toulouse.fr/IMG/gz/places_0.2.3.tar.gz", repos = NULL, type = "source")
library(Places)
We can now apply the function places(). The function takes three arguments: the first refers to the dataset (edge list of curricula), the second selects the elements (here, the students), and the last indicates the sets (i.e., the colleges).
Result1 <- places(data = aucplaces, col.elements = "Name_eng", col.sets = "University")
Result1 <- places(aucplaces, "Name_eng", "University") # equivalent call with positional arguments
Based on our dataset of 418 students and 147 universities, 223 unique places are found. These places refer to academic trajectories. Two students belong to the same place if they attended the exact same set of colleges.
The resulting table contains four main variables; each row corresponds to a unique place:
We create a dataframe from the list of results for further examination:
library(knitr)      # provides kable()
library(kableExtra) # provides kable_styling()
result1df <- as.data.frame(Result1$PlacesData)
kable(head(result1df), caption = "First 6 places") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P001(1-4) | 1 | 4 | {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan} |
2 | P002(1-4) | 1 | 4 | {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster} |
3 | P003(1-4) | 1 | 4 | {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania} |
4 | P004(1-4) | 1 | 4 | {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh} |
5 | P005(1-3) | 1 | 3 | {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College} |
6 | P006(1-3) | 1 | 3 | {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California} |
Most places (179, 80%) consist of unique trajectories followed by a single student. These places are perfectly identified with their students:
hist(result1df$NbElements, main = "Students per places (distribution)")
table(Result1$PlacesData$NbElements)
##
## 1 2 3 4 5 7 10 11 12 15 16 18
## 179 15 12 4 1 3 1 1 1 2 2 2
round(prop.table(table(Result1$PlacesData$NbElements))*100,2)
##
## 1 2 3 4 5 7 10 11 12 15 16 18
## 80.27 6.73 5.38 1.79 0.45 1.35 0.45 0.45 0.45 0.90 0.90 0.90
Similarly, most places contain a maximum of two universities. This reflects the fact that most students attended a maximum of two different colleges. Very few individuals attended more than two universities during their studies:
hist(result1df$NbSets, main = "Colleges per place (distribution)")
table(Result1$PlacesData$NbSets)
##
## 1 2 3 4
## 79 119 21 4
round(prop.table(table(Result1$PlacesData$NbSets))*100,2)
##
## 1 2 3 4
## 35.43 53.36 9.42 1.79
We can represent the number of sets and elements simultaneously as scatter plots, barplots or boxplots:
library(tidyverse)
ggplot(data = result1df) +
geom_point(mapping = aes(x = NbSets, y = NbElements),
position = "jitter", alpha = 0.5) +
geom_abline(alpha = 0.5) +
labs(x = "Colleges per place", y = "Students per place")+
labs(title = "Places: Quantitative attributes",
caption = "American University Men of China (1936)")
As a barplot:
ggplot(data = result1df) +
geom_bar(mapping = aes(x = NbSets, y = NbElements), stat = "identity") +
labs(x = "Colleges per place", y = "Students per place")+
labs(title = "Places: Quantitative attributes",
caption = "American University Men of China (1936)")
Or alternatively, as a boxplot:
result1df %>%
ggplot(aes(as.factor(NbSets), NbElements)) +
geom_boxplot(alpha = 0.4, show.legend = FALSE) +
labs(x = "Colleges per place", y = "Students per place",
title = "Places: Quantitative attributes",
caption = "American University Men of China (1936)")
All these visualizations reveal an inverse relationship between the number of students and the number of colleges per place. The most populated places are centered on a single university attended by many students. This reflects the fact that our dataset contains a handful of prestigious universities which attracted large numbers of students, whereas most universities were attended by just one or a few students. The number of students naturally decreases as the number of colleges increases, which is consistent with our previous observation that few students attended more than two colleges. Very few places include more than 2 colleges (25 out of 223), and all of them are single-student places: none of the places that combine at least 2 students and 2 colleges involves more than 2 colleges.
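The relationship described above can also be checked numerically with a cross-tabulation. This is a minimal sketch on a toy table of places: the column names `NbElements` and `NbSets` follow the output of places(), but the values are hypothetical.

```r
# Toy table of places (hypothetical values)
toy_places <- data.frame(
  NbElements = c(1, 1, 1, 2, 3, 16, 1, 1),
  NbSets     = c(1, 2, 2, 2, 2, 1, 3, 4)
)

# Cross-tabulate colleges per place against students per place
xtab <- table(NbSets = toy_places$NbSets, NbElements = toy_places$NbElements)
xtab

# Size of the most populated place for each number of colleges
max_by_sets <- tapply(toy_places$NbElements, toy_places$NbSets, max)
max_by_sets
```

Applied to `result1df`, the same two calls would show at a glance which combinations of students and colleges actually occur.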
Next, we want to know more about the students and the universities which define each place. Since it would be time-consuming to examine the 223 places one by one, we first focus on the 13 places that include at least 2 students and 2 colleges:
nn2 <- result1df %>%
filter(NbElements >1 & NbSets>1) # 13 places contain at least 2 individuals and 2 universities
kable(nn2, caption = "The 13 most populated places") %>%
kable_styling(bootstrap_options = "striped", full_width = T, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
26 | P026(4-2) | 4 | 2 | {Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University} |
27 | P027(3-2) | 3 | 2 | {Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan} |
28 | P028(3-2) | 3 | 2 | {Chang_Ting-Chin;Hsueh_Wei Fan;Wong_Tse-Kong} - {Ohio State;Pennsylvania} |
29 | P029(3-2) | 3 | 2 | {Ho_Teh-Kuei;Sze_F.C.;Tsai_Thomas Wen-hsi} - {Harvard;Wisconsin} |
30 | P030(3-2) | 3 | 2 | {Huang_H.L.;Wang_K.P.;Welles_Henry H.} - {Columbia;Princeton} |
31 | P031(2-2) | 2 | 2 | {Chen_Kwan-Pu;Wong_I.K.} - {Pennsylvania;St. John’s University} |
32 | P032(2-2) | 2 | 2 | {Jen_Lemuel C.C.;West_Eric Ralph} - {California;George Washington} |
33 | P033(2-2) | 2 | 2 | {Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology} |
34 | P034(2-2) | 2 | 2 | {Lin_Peter Wei;Ma_Y.C.} - {Columbia;Yale} |
35 | P035(2-2) | 2 | 2 | {Lum_Joe W.;Wu_Jack Foy} - {Columbia;Stanford} |
36 | P036(2-2) | 2 | 2 | {Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan} |
37 | P037(2-2) | 2 | 2 | {Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology} |
38 | P038(2-2) | 2 | 2 | {Tsao_Y.S.;Yen_Fu-ching} - {Harvard;Yale} |
In order to facilitate the exploration, we can label each place with its corresponding quantitative attributes, as described below:
# find examples for each case
# E3S2 : more than 2 students, 2 universities
E3S2 <- result1df %>% filter(NbElements > 2) %>% filter(NbSets == 2) %>% mutate(Type = "E3S2")
# E3S1 : more than 2 students, 1 university
E3S1 <- result1df %>% filter(NbElements > 2) %>% filter(NbSets == 1) %>% mutate(Type = "E3S1")
# E2S2 : 2 students, 2 universities
E2S2 <- result1df %>% filter(NbElements == 2) %>% filter(NbSets == 2) %>% mutate(Type = "E2S2")
# E2S1 : 2 students, 1 university
E2S1 <- result1df %>% filter(NbElements == 2) %>% filter(NbSets == 1) %>% mutate(Type = "E2S1")
# E1S3 : one student, more than 2 universities
E1S3 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets > 2) %>% mutate(Type = "E1S3")
# E1S2 : one student, 2 universities
E1S2 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets == 2) %>% mutate(Type = "E1S2")
# E1S1 : one student, one university
E1S1 <- result1df %>% filter(NbElements == 1) %>% filter(NbSets == 1) %>% mutate(Type = "E1S1")
# combine all labeled subsets into a single table
ESlist <- bind_rows(E1S1, E1S2, E1S3, E2S1, E2S2, E3S1, E3S2)
kable(head(ESlist), caption = "First 6 places, labeled with their quantitative attributes") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail | Type |
---|---|---|---|---|---|
176 | P176(1-1) | 1 | 1 | {Barnett_E.E.} - {Emory} | E1S1 |
177 | P177(1-1) | 1 | 1 | {Bau_C.L.} - {Minnesota} | E1S1 |
178 | P178(1-1) | 1 | 1 | {Bixby_Harold M.} - {Amherst} | E1S1 |
179 | P179(1-1) | 1 | 1 | {Bowen_Frederick A.} - {Middlebury} | E1S1 |
180 | P180(1-1) | 1 | 1 | {Boynton_C.L.} - {Pomona} | E1S1 |
181 | P181(1-1) | 1 | 1 | {Brown_Irving S.} - {Drake} | E1S1 |
Note: You may adjust the thresholds to the particular structure of your data. Here, we set the number of sets to 2 because it corresponds to the average number of curricula, and we set elements to 1 because we are interested in places that involve more than one student, beyond singular trajectories.
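As an alternative to filtering each case separately, the same labels can be built in a single pass. This is only a sketch on a toy table (hypothetical values): it follows the naming convention used above, where E3 stands for more than 2 students and S3 for more than 2 universities.

```r
# Toy table of places (hypothetical values)
toy <- data.frame(
  NbElements = c(1, 1, 1, 2, 2, 5, 5),
  NbSets     = c(1, 2, 4, 1, 2, 1, 2)
)

# Cap both counts at 3: "E3" means more than 2 students, "S3" more than 2 universities
e_code <- ifelse(toy$NbElements > 2, "E3", paste0("E", toy$NbElements))
s_code <- ifelse(toy$NbSets > 2, "S3", paste0("S", toy$NbSets))
toy$Type <- paste0(e_code, s_code)
toy
```

Unlike the step-by-step version, this approach also labels combinations that have no dedicated filter (for instance E2S3 or E3S3), so no place is silently dropped from the resulting table.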
At this stage, it is recommended to carefully examine the list of places, starting from the most important, and gradually expanding the selection to include less populated places. In the next step, we will see how we can use the students’ and colleges’ attributes to further categorize the places, especially the 44 places (20%) that involve two or more students.
For this analysis, we rely on a manually annotated dataset:
place_attributes <- read_csv("Data/place_attributes.csv",
col_types = cols(...1 = col_skip()))
place_attributes
After a careful examination, each place has been labeled with the following attributes:
Regarding the geographical coverage, we adopt the following code:
Region_nbr | Region_code | Description |
---|---|---|
Monoregion | EAST | East Coast |
Monoregion | MID | Midwest |
Monoregion | OTHER - US | Other regions in the United States |
Monoregion | OTHER - NON US | Outside the United States |
Multiregion | EM | East Coast-Midwest |
Multiregion | EO | East Coast-Other |
Multiregion | MO | Midwest-Other |
Multiregion | OTHER | Other |
Based on the distribution of data and our knowledge of the historical context, we defined three main periods:
Based on the above categories, we found 23 multinational places (52% of places involving more than one student), 12 non-Chinese (27%), and only 9 strictly Chinese places (20%). The Chinese students showed a strong propensity to mingle with non-Chinese students.
place_attributes %>% filter(NbElements > 1) %>%
count(Nationality) %>%
mutate(ptg = paste0(round(n / sum(n) * 100, 0), "%")) %>%
arrange(desc(n))
From a geographical perspective, American University Men generally showed a strong degree of mobility during their studies. 65% of places involve transfers between two or more colleges: 60% of these cross regional boundaries (39% of all places), while 40% remain within the same region (East, Midwest, other) (26% of all places).
place_attributes %>%
count(Mobility) %>%
mutate(ptg = paste0(round(n / sum(n) * 100, 0), "%")) %>%
arrange(desc(n))
By combining the field of study with the period of study, we can
further distinguish between four categories of places. This allows us to
measure the relative strength of each place:
Field of study / Period of study | SAME TIME | DIFFERENT TIME |
---|---|---|
SAME DISCIPLINE | TYPE A: Strong potential for regular interaction (4 places, 9%) | TYPE C: Potential for later collaboration (7 places, 16%) |
DIFFERENT DISCIPLINE | TYPE B: Potential for extracurricular interaction (8 places, 18%) | TYPE D: Shared academic experience and cultural identity (25 places, 32%) |
Find examples for each type:
typeA <- place_attributes %>% filter(NbElements > 1) %>%
filter(field_nbr == "Monofield") %>%
filter(period_nbr == "SYNC") %>%
mutate(TypeQuali = "TypeA")
typeB <- place_attributes %>% filter(NbElements > 1) %>%
filter(period_nbr == "SYNC") %>%
filter(field_nbr == "Multifield") %>%
mutate(TypeQuali = "TypeB")
typeC <- place_attributes %>% filter(NbElements > 1) %>%
filter(field_nbr == "Monofield") %>%
filter(period_nbr == "DIAC") %>%
mutate(TypeQuali = "TypeC")
typeD <- place_attributes %>% filter(NbElements > 1) %>%
filter(period_nbr == "DIAC") %>%
filter(field_nbr == "Multifield") %>%
mutate(TypeQuali = "TypeD")
Typelist <- bind_rows(typeA, typeB, typeC, typeD)
Typelist
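The size of each type can be verified with a simple cross-tabulation of the two variables. The sketch below runs on a toy data frame: the column names and category labels (`Monofield`, `Multifield`, `SYNC`, `DIAC`) follow the annotated dataset, but the rows are hypothetical.

```r
# Toy annotated places (hypothetical rows)
toy <- data.frame(
  field_nbr  = c("Monofield", "Monofield", "Multifield", "Multifield", "Multifield"),
  period_nbr = c("SYNC", "DIAC", "SYNC", "DIAC", "DIAC"),
  stringsAsFactors = FALSE
)

# Number of places for each combination of field and period
counts <- table(field = toy$field_nbr, period = toy$period_nbr)
counts

# Share of each combination, as a percentage of all places in the table
shares <- round(prop.table(counts) * 100)
shares
```

On the real data, the same calls applied to `place_attributes %>% filter(NbElements > 1)` would give the counts and percentages of types A to D, and would reveal any place left unclassified.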
Type A principally includes graduates in sciences from MIT, Columbia, Harvard, Michigan, Nebraska (non-Chinese):
typeA
Type B includes New York-trained financiers and scientists (1919-1924), Michigan-Chicago-trained lawyers and scientists (1921-1933), or Yale-Harvard-trained professionals and businessmen (1909-1914):
typeB
Type C includes Columbia graduates in economics
or medicine in combination with another (preceding) university, as well
as more narrowly specialized curricula centered on a single
university:
typeC
Type D usually involves a large number of students (more than 10) and presents a wide range of profiles, as in P161(3-1), which involved Stanford graduates of different national origins, who graduated in different fields (engineering, humanities) and at different times (between 1905 and 1922).
typeD
In the last step, we use Multiple Correspondence Analysis (MCA) and hierarchical clustering (HCPC) to group places according to their combinations of attributes.
First, we need to prepare the data for multiple correspondence analysis (MCA). This implies converting quantitative variables (NbElements, NbSets) into categorical ones, selecting the relevant variables, and setting place labels as row names.
Based on the distribution of data, we categorize the number of students per place (NbElements) into two categories (1, +1):
# Categorize the number of students per place (NbElements) into two categories (1, +1)
place_attributes_mca <- within(place_attributes, {
NbElements.cat <- NA # need to initialize variable
NbElements.cat[NbElements == 1] <- "1"
NbElements.cat[NbElements > 1] <- "+1"
} )
place_attributes_mca$NbElements.cat <- factor(place_attributes_mca$NbElements.cat, levels = c("1", "+1"))
summary(place_attributes_mca$NbElements.cat) # check that the operation went well
## 1 +1
## 179 44
Similarly, we categorize the number of universities per place
(NbSets) into 3 categories (1, 2, +2):
place_attributes_mca <- within(place_attributes_mca, {
NbSets.cat <- NA # need to initialize variable
NbSets.cat[NbSets == 1] <- "1"
NbSets.cat[NbSets == 2] <- "2"
NbSets.cat[NbSets > 2] <- "+2"
} )
place_attributes_mca$NbSets.cat <- factor(place_attributes_mca$NbSets.cat, levels = c("1", "2", "+2"))
summary(place_attributes_mca$NbSets.cat) # check
## 1 2 +2
## 79 119 25
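The same categorization can be written more compactly with base R's cut(). This is a sketch on a hypothetical vector of counts, not a replacement for the code above; the breaks reproduce the three categories (1, 2, +2).

```r
# Hypothetical numbers of universities per place
nb_sets <- c(1, 2, 2, 3, 4, 1)

# Intervals are right-closed by default: (0,1], (1,2], (2,Inf]
nb_sets_cat <- cut(nb_sets, breaks = c(0, 1, 2, Inf), labels = c("1", "2", "+2"))

table(nb_sets_cat) # check that the operation went well
```

cut() returns a factor whose levels are already in the intended order, so no separate call to factor() is needed.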
Finally, we select the relevant variables and we set places
labels as row names:
place_attributes_mca <- place_attributes_mca %>% select(-c(PlaceNumber, PlaceDetail, NbElements, NbSets))
place_attributes_mca_rowname <- tibble::column_to_rownames(place_attributes_mca, "PlaceLabel")
We can now perform multiple correspondence analysis (MCA). In this tutorial, we rely on the package FactoMineR and its companion packages Factoshiny and factoextra:
# load packages
library(FactoMineR)
library(Factoshiny)
library(factoextra)
library(explor) # for interactive exploration
We apply the function MCA():
res.MCA<-MCA(place_attributes_mca_rowname,graph=FALSE)
plot.MCA(res.MCA, choix='var',title="Graph of Variables",col.var=c(1,2,3,4,5,6,7,8,9,10,11,12))
Note: On the above graph, each variable is represented by a
distinct color.
As shown on the graph of variables above, the first dimension captures 16% of the information (variance), and the second 8%. Together, the first two dimensions capture 24%. Nine dimensions are necessary to capture 50%, and 36 dimensions to capture 100%. The first dimension is strongly associated with students’ attributes (nationality, field and period of study), whereas the second is more strongly associated with the colleges’ (geographical) attributes.
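The number of dimensions and the cumulative percentages follow from simple arithmetic. In MCA, the maximum number of dimensions equals the total number of categories minus the number of variables. The sketch below uses hypothetical category counts and eigenvalues to show how the cumulative share is obtained; with FactoMineR, the actual eigenvalues are stored in `res.MCA$eig`.

```r
# Hypothetical numbers of categories for five categorical variables
n_levels <- c(3, 2, 4, 2, 3)
n_dims <- sum(n_levels) - length(n_levels) # 14 categories - 5 variables = 9 dimensions

# Hypothetical eigenvalues, one per dimension, in decreasing order
eig <- c(0.45, 0.32, 0.25, 0.20, 0.17, 0.14, 0.12, 0.09, 0.06)

pct     <- eig / sum(eig) * 100 # share of variance captured by each dimension
cum_pct <- cumsum(pct)          # cumulative share

# Smallest number of dimensions needed to capture at least 50%
n_half <- which(cum_pct >= 50)[1]
n_half
```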
MCA Graphs colored by quantitative variables:
grp1 <- as.factor(place_attributes_mca_rowname[, "NbElements.cat"])
fviz_mca_ind(res.MCA, habillage = grp1, label = FALSE,
addEllipses = TRUE, repel = TRUE, title = "MCA Graph: Number of students per place")
grp2 <- as.factor(place_attributes_mca_rowname[, "NbSets.cat"])
fviz_mca_ind(res.MCA, habillage = grp2, label = FALSE,
addEllipses = FALSE, repel = TRUE, title = "MCA Graph: Number of colleges per place")
The two graphs above clearly separate multi-student places on the right from singular trajectories on the left, and multi-college places on the left from places centered on a single university on the right.
MCA Graphs colored by qualitative variables:
# Note: par(mfrow = ...) only affects base graphics; fviz_mca_ind() returns ggplot2 objects, so each graph is drawn separately
grp3 <- as.factor(place_attributes_mca_rowname[, "Nationality"])
fviz_mca_ind(res.MCA, habillage = grp3, label = FALSE,
             addEllipses = FALSE, repel = TRUE, title = "Graph of Places: Nationality")
grp4 <- as.factor(place_attributes_mca_rowname[, "Mobility"])
fviz_mca_ind(res.MCA, habillage = grp4, label = FALSE,
             addEllipses = TRUE, repel = TRUE, title = "Graph of Places: Mobility")
grp5 <- as.factor(place_attributes_mca_rowname[, "field_nbr"])
fviz_mca_ind(res.MCA, habillage = grp5, label = FALSE,
             addEllipses = TRUE, repel = TRUE, title = "Graph of Places: Academic specialization (range)")
grp6 <- as.factor(place_attributes_mca_rowname[, "field_group"])
fviz_mca_ind(res.MCA, habillage = grp6, label = FALSE,
             addEllipses = FALSE, repel = TRUE, title = "Graph of Places: Academic specialization (field)")
The graphs clearly separate multinational, multidisciplinary,
multiregional and diachronic places on the right, from narrowly
specialized places involving less mobile students on the left.
Next, we perform a hierarchical clustering (HCPC) on all 36 dimensions in order to further classify places:
res.MCA<-MCA(place_attributes_mca_rowname,ncp=36,graph=FALSE)
res.HCPC<-HCPC(res.MCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factor map')
plot.HCPC(res.HCPC,choice='tree',title='Tree map')
plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='3D tree on factor map')
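To clarify what this step does, here is a rough base R analogue. As far as we understand, HCPC() builds by default a hierarchical tree with Ward's criterion on the coordinates retained from the MCA and then cuts it into the requested number of clusters. The sketch below applies hclust() and cutree() to simulated coordinates (hypothetical data); it illustrates the principle and is not a substitute for HCPC().

```r
set.seed(1)

# Simulated coordinates of 15 places on two dimensions (three well-separated groups)
coords <- rbind(
  matrix(rnorm(10, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(10, mean = 3, sd = 0.1), ncol = 2),
  matrix(rnorm(10, mean = 6, sd = 0.1), ncol = 2)
)
rownames(coords) <- paste0("P", 1:15)

# Hierarchical clustering with Ward's criterion on Euclidean distances
tree <- hclust(dist(coords), method = "ward.D2")

# Cut the tree into three clusters, as with nb.clust = 3
clusters <- cutree(tree, k = 3)
table(clusters)
```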
The partition is strongly characterized by academic specialization
(field_nbr, field_group) and geographical mobility (mobility,
region_nbr, region_code), and to a lesser extent, by the period of study
and the students’ nationality.
The algorithm identifies three classes of places:
We can finally store the results and examine more closely the
paragons, i.e. representative places for each class:
# store results
clusters <- res.HCPC$data.clust
placesclusters <- tibble::rownames_to_column(clusters, "PlaceLabel") %>% rename(MCAcluster = clust)
# identify paragons
res.HCPC$desc.ind$para
## Cluster: 1
## P043(1-2) P049(1-2) P051(1-2) P133(1-2) P122(1-2)
## 0.8504655 0.8504655 0.8504655 0.8504655 0.8759060
## ------------------------------------------------------------
## Cluster: 2
## P201(1-1) P179(1-1) P212(1-1) P217(1-1) P113(1-2)
## 0.9918145 1.0045793 1.0045793 1.0051068 1.0196690
## ------------------------------------------------------------
## Cluster: 3
## P148(16-1) P167(3-1) P156(7-1) P157(5-1) P030(3-2)
## 1.746776 1.746776 1.806594 1.806594 1.910119
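In the output above, the value under each place label is its distance to the center of its cluster: paragons are the places closest to that center. The following sketch reproduces this logic in base R on hypothetical coordinates and cluster assignments.

```r
# Hypothetical coordinates of six places on two dimensions
coords <- rbind(P1 = c(0, 0), P2 = c(1, 0), P3 = c(0.4, 0.1),
                P4 = c(5, 5), P5 = c(6, 5), P6 = c(5.4, 5.1))
clust <- c(1, 1, 1, 2, 2, 2) # hypothetical cluster assignment

# For each cluster, find the place closest to the cluster centroid
paragon <- sapply(split(seq_len(nrow(coords)), clust), function(idx) {
  pts    <- coords[idx, , drop = FALSE]
  centre <- colMeans(pts)                        # cluster centroid
  d      <- sqrt(rowSums(sweep(pts, 2, centre)^2)) # Euclidean distance to centroid
  names(which.min(d))
})
paragon
```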
Paragons for class 1: Single-student places: Post-WWI
highly mobile Chinese professionals
para1 <- place_attributes %>%
filter(PlaceLabel %in% c("P043(1-2)", "P049(1-2)", "P051(1-2)", "P133(1-2)", "P122(1-2)"))
para1
Paragons for class 2: Single-college places: American
undergraduates
para2 <- place_attributes %>%
filter(PlaceLabel %in% c("P201(1-1)", "P179(1-1)", "P212(1-1)", "P217(1-1)", "P113(1-2)"))
para2
Paragons for class 3: Multi-student places
para3 <- place_attributes %>%
filter(PlaceLabel %in% c("P148(16-1)", "P167(3-1)", "P156(7-1)", "P157(5-1)", "P030(3-2)"))
para3
You can use factoextra to improve the above visualizations. Running the lines of code below will produce a more readable dendrogram:
# Note: par(mfrow = ...) has no effect here, since fviz_dend() returns a ggplot2 object
fviz_dend(res.HCPC, show_labels = FALSE,
          main = "Cluster dendrogram of Academic Places",
          caption = "Based on 'American University Men of China' (1936)")
You can also create interactive graphs with the package explor:
library(explor)
res <- explor::prepare_results(res.MCA)
explor::MCA_var_plot(res, xax = 1, yax = 2, var_sup = FALSE, var_sup_choice = ,
var_lab_min_contrib = 0, col_var = "Variable", symbol_var = NULL, size_var = NULL,
size_range = c(10, 300), labels_size = 10, point_size = 40, transitions = TRUE,
labels_positions = "auto", labels_prepend_var = FALSE, xlim = c(-1.48, 3.54),
ylim = c(-2.59, 2.43))
Point cloud of places, colored by number of students (elements), with confidence ellipses:
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = "Lab",
ind_lab_min_contrib = 0, col_var = "NbElements.cat", labels_size = 7, point_opacity = 0.5,
opacity_var = NULL, point_size = 64, ellipses = TRUE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
Point cloud of places, colored by number of universities (sets), with confidence ellipses:
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = NULL,
ind_lab_min_contrib = 0, col_var = "NbSets.cat", labels_size = 9, point_opacity = 0.5,
opacity_var = NULL, point_size = 64, ellipses = TRUE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
Point cloud of places, colored by nationality, with confidence ellipses:
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = "Lab",
ind_lab_min_contrib = 0, col_var = "Nationality", labels_size = 9, point_opacity = 0.5,
opacity_var = NULL, point_size = 64, ellipses = TRUE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
Point cloud of places, colored by mobility:
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = "Lab",
ind_lab_min_contrib = 0, col_var = "Mobility", labels_size = 8, point_opacity = 0.44,
opacity_var = NULL, point_size = 46, ellipses = FALSE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
Point cloud of places, colored by academic specialization (range):
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = "Lab",
ind_lab_min_contrib = 0, col_var = "field_nbr", labels_size = 9, point_opacity = 0.5,
opacity_var = NULL, point_size = 64, ellipses = FALSE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
Point cloud of places, colored by academic specialization (field):
explor::MCA_ind_plot(res, xax = 1, yax = 2, ind_sup = FALSE, lab_var = "Lab",
ind_lab_min_contrib = 0, col_var = "field_group", labels_size = 8, point_opacity = 0.44,
opacity_var = NULL, point_size = 46, ellipses = FALSE, transitions = TRUE,
labels_positions = NULL, xlim = c(-1.19, 2.81), ylim = c(-1.76, 2.24))
In this tutorial, we learnt how to find places in a two-mode relational dataset linking elements (e.g. students) and sets (e.g. universities). We also saw how to use elements’ and sets’ attributes to further categorize and interpret the resulting places.
Using this method, it appears that alumni places formed on the basis of academic prestige and specialization, rather than on the students’ nationality. From a historical perspective, we found that the Chinese students showed a greater propensity for international mixing than their foreign counterparts. These American-educated Chinese actively contributed to the emergence of transnational alumni networks in the late 19th and early 20th centuries. We also found that only a minority of places implied a potential for actual interaction, while the majority created opportunities for professional collaboration and shared cultural values after the students returned to China. Eventually, these academic places contributed to the formation of the American-returned students as a new, self-conscious social group in modern China.
In the next tutorial, we will see how we can build and analyze networks of places in order to further investigate the structure and dynamics of Sino-American alumni networks.
American University Club of Shanghai. American University Men in China. Shanghai: Comacrib Press, 1936.
Armand, Cécile. American University Men of China (1936) (2.0.0) [Data set]. Zenodo, 2022. https://doi.org/10.5281/zenodo.6370085
Hunt, Michael H. “The American Remission of the Boxer Indemnity: A Reappraisal.” The Journal of Asian Studies 31, no. 3 (1972): 539–59.
Pizarro, Narciso. “Appartenances, places et réseaux de places. La reproduction des processus sociaux et la génération d’un espace homogène pour la définition des structures sociales.” Sociologie et sociétés 31, no. 1 (2002): 143–61.
Pizarro, Narciso. “Regularidad Relacional, Redes de Lugares y Reproduccion Social.” Politica y Sociedad 33 (2000).
Pizarro, Narciso. “Structural Identity and Equivalence of Individuals in Social Networks.” International Sociology 22, no. 6 (2007): 767–92.
[1] American University Club of Shanghai. American University Men in China. Shanghai: Comacrib Press, 1936. We are most grateful to Dr. Jiang Jie (Shanghai Normal University) who kindly provided us with a digital copy of the directory.
[2] Pizarro, Narciso. “Structural Identity and Equivalence of Individuals in Social Networks.” International Sociology 22, no. 6 (2007): 767–92; “Appartenances, places et réseaux de places. La reproduction des processus sociaux et la génération d’un espace homogène pour la définition des structures sociales.” Sociologie et sociétés 31, no. 1 (2002): 143–61; “Regularidad Relacional, Redes de Lugares y Reproduccion Social.” Politica y Sociedad 33 (2000).