Abstract
This is the last instalment of our tutorial series devoted to the study of American University Men in China using a place-based methodology. In this tutorial, we show how to filter place-based networks in order to trace the formation of alumni networks over time.
In the previous tutorial, we learnt how to detect, visualize and analyze sub-communities of academic places and colleges. Here, we split the data by period and rebuild the place-based networks for each period in order to trace the formation of alumni networks over time.
This tutorial proceeds in four steps:
# load packages
library(tidyverse)
library(igraph)
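library(Places)
library(knitr) # provides kable(), used below to print tables
library(kableExtra) # provides kable_styling()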
# load data
aucplaces <- read_delim("Data/aucdata.csv",
delim = ";", escape_double = FALSE, trim_ws = TRUE)
aucplaces <- as.data.frame(aucplaces)
Based on the periodization defined in the first tutorial,
we split the original dataset into three period-based datasets:
# Filter the data by period
aucp1 <- aucplaces %>% filter(period=="1883-1908") # 94 curricula
aucp2 <- aucplaces %>% filter(period=="1909-1918") # 251 curricula
aucp3 <- aucplaces %>% filter(period=="1919-1935") # 337 curricula
# Inspect the distribution of students' nationalities in each period
aucp1 %>% distinct(Name_eng, Nationality) %>% group_by(Nationality) %>% count(sort = TRUE) # 5 Chinese, 1 Japanese, 43 Westerners
aucp2 %>% distinct(Name_eng, Nationality) %>% group_by(Nationality) %>% count(sort = TRUE) # 81 Chinese, 1 Japanese, 66 Westerners
aucp3 %>% distinct(Name_eng, Nationality) %>% group_by(Nationality) %>% count(sort = TRUE) # 148 Chinese, 2 Japanese, 71 Westerners
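The three per-period inspections above can also be obtained in a single pass. The following is a minimal sketch on a toy data frame: the rows are invented for illustration, and only the column names (`Name_eng`, `Nationality`, `University`, `period`) are taken from the tutorial's data.

```r
library(dplyr)

# Toy data (hypothetical rows): one row per curriculum (student x university)
toy <- data.frame(
  Name_eng    = c("A", "A", "B", "C", "C", "D", "E"),
  Nationality = c("Chinese", "Chinese", "Western", "Chinese", "Chinese",
                  "Western", "Chinese"),
  University  = c("Yale", "Harvard", "Yale", "Cornell", "MIT", "Yale", "MIT"),
  period      = c("1883-1908", "1883-1908", "1883-1908",
                  "1909-1918", "1909-1918", "1909-1918", "1909-1918")
)

# Number of curricula (rows) per period
curricula <- toy %>% count(period, name = "curricula")

# Number of distinct students per period and nationality
by_period <- toy %>%
  distinct(period, Name_eng, Nationality) %>%
  count(period, Nationality, name = "students")

print(curricula)
print(by_period)
```

Calling `distinct()` before `count()` matters because a student who attended several universities appears on several rows and would otherwise be counted more than once.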
Next, we apply the function “places” to each time period:
resultp1 <- places(aucp1, "Name_eng", "University")
From the original population of 49 students and 49 universities,
the algorithm found 42 places (academic trajectories). As in the first
tutorial, we create a dataframe in order to examine the resulting places
in more detail:
resultp1df <- as.data.frame(resultp1$PlacesData) # create dataframe from list of results
kable(head(resultp1df), caption = "The 6 first places during the first period (1883-1908)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P01(1-4) | 1 | 4 | {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh} |
2 | P02(1-3) | 1 | 3 | {Fong_F. Sec} - {California;Columbia;Pomona} |
3 | P03(1-3) | 1 | 3 | {Helmick_Milton J.} - {Colorado;Denver;Stanford} |
4 | P04(1-3) | 1 | 3 | {Hylbert_L.C.} - {Bucknell;Columbia;Crozen Theological Seminary} |
5 | P05(1-2) | 1 | 2 | {Arnold_Julean} - {California;St. John’s University} |
6 | P06(1-2) | 1 | 2 | {Bassett_Arthur} - {Missouri State;Washington, St. Louis} |
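To make the notion of a place more concrete, here is a minimal sketch of the underlying idea: students who attended exactly the same set of universities fall into the same place. This is an illustration written with dplyr on invented toy data. It is not the implementation used by the `places()` function, and the column names `Sets` and `Elements` are assumptions chosen for readability.

```r
library(dplyr)

# Toy data (hypothetical): one row per student-university attendance
toy <- data.frame(
  Name_eng   = c("A", "A", "B", "B", "C", "D"),
  University = c("Yale", "Harvard", "Harvard", "Yale", "Yale", "Cornell")
)

# Signature of each student: the sorted, de-duplicated set of universities
signatures <- toy %>%
  distinct(Name_eng, University) %>%
  group_by(Name_eng) %>%
  summarise(Sets = paste(sort(University), collapse = ";"),
            NbSets = n(), .groups = "drop")

# A place gathers all students who share the same signature
toy_places <- signatures %>%
  group_by(Sets, NbSets) %>%
  summarise(Elements = paste(sort(Name_eng), collapse = ";"),
            NbElements = n(), .groups = "drop") %>%
  arrange(desc(NbSets), desc(NbElements))

print(toy_places)
```

In this toy example, students A and B share the set {Harvard;Yale} and therefore form a single place with two elements and two sets, while C and D each form a single-student place.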
Before 1908, only 4 places involved more than one student, with a
maximum of four students (Princeton). Foreign students clearly
dominated. We found only one place, P26(3-1), which included a Chinese
student (Yung Wing’s son, Barlett Yung). According to the classification
we devised in the first tutorial,
these places presented a potential for shared academic experience and
culture (TYPE D). None of them implied actual interaction, since the
students attended the same colleges but at different times:
np1el <- resultp1df %>% filter(NbElements >1)
kable(head(np1el), caption = "The 4 multi-student places during the first period (1883-1908)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
25 | P25(4-1) | 4 | 1 | {Belknap_W.C.;Daub_W.H.;Ely_J.A.;Hoyt_Lansing} - {Princeton} |
26 | P26(3-1) | 3 | 1 | {Murphy_H.K.;Throop_Montgomery;Yung_Barlett G.} - {Yale} |
27 | P27(2-1) | 2 | 1 | {Baker_John Earl;Hager_A.R.} - {Wisconsin} |
28 | P28(2-1) | 2 | 1 | {Sawyer_John B.;Service_Robert R.} - {California} |
Twenty-four students attended more than one university, with a
maximum of four (the missionary Francis L. Hawks Pott, P01(1-4)):
np1set <- resultp1df %>% filter(NbSets>1)
kable(head(np1set), caption = "The 6 first multi-college places during the first period (1883-1908)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P01(1-4) | 1 | 4 | {Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh} |
2 | P02(1-3) | 1 | 3 | {Fong_F. Sec} - {California;Columbia;Pomona} |
3 | P03(1-3) | 1 | 3 | {Helmick_Milton J.} - {Colorado;Denver;Stanford} |
4 | P04(1-3) | 1 | 3 | {Hylbert_L.C.} - {Bucknell;Columbia;Crozen Theological Seminary} |
5 | P05(1-2) | 1 | 2 | {Arnold_Julean} - {California;St. John’s University} |
6 | P06(1-2) | 1 | 2 | {Bassett_Arthur} - {Missouri State;Washington, St. Louis} |
Kuang Fuzhuo 鄺富灼 (Fong Foo Sec), defined as P02(1-3), was the
Chinese student who attended the largest number of universities (California,
Columbia, Pomona). Next, we find two Chinese students who each attended two
universities: Kong Xiangxi (P16(1-2)) studied at Yale and Oberlin, and Lo
Panhui 羅泮輝 (Pan H. Lo, P17(1-2)) studied at the University of Chicago
and Harvard. The other Chinese students attended just one university.
In conclusion, in this early period preceding the Boxer Indemnity Program, our group of American University Men did not really form a “network”. Singular trajectories dominated. Very few places included more than one student, and those that did implied no actual interaction between the students. Alumni networks began to take shape during the second period, after the enactment of the Boxer Indemnity Program.
resultp2 <- places(aucp2, "Name_eng", "University")
During the second period, 105 places were identified from a
total population of 148 students and 82 universities.
resultp2df <- as.data.frame(resultp2$PlacesData) # create dataframe from list of results
kable(head(resultp2df), caption = "The 6 first places during the second period (1909-1918)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P001(1-4) | 1 | 4 | {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan} |
2 | P002(1-4) | 1 | 4 | {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster} |
3 | P003(1-4) | 1 | 4 | {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania} |
4 | P004(1-3) | 1 | 3 | {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College} |
5 | P005(1-3) | 1 | 3 | {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California} |
6 | P006(1-3) | 1 | 3 | {Hayes_Ernest M.} - {Princeton;Washington & Jefferson;Wooster} |
Four places involved two or more students who attended more than
one college, all located on the East Coast (Columbia, Princeton,
Harvard, Yale, MIT):
np2 <- resultp2df %>% filter(NbElements >1 & NbSets>1)
kable(head(np2), caption = "The 4 most important places during the second period (1909-1918)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
13 | P013(2-2) | 2 | 2 | {Huang_H.L.;Wang_K.P.} - {Columbia;Princeton} |
14 | P014(2-2) | 2 | 2 | {Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology} |
15 | P015(2-2) | 2 | 2 | {Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology} |
16 | P016(2-2) | 2 | 2 | {Tsao_Y.S.;Yen_Fu-ching} - {Harvard;Yale} |
23 places involved more than one student and 61 included more than
one college:
np2el <- resultp2df %>% filter(NbElements >1)
kable(head(np2el), caption = "The 6 first multi-student places during the second phase (1909-1918)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
13 | P013(2-2) | 2 | 2 | {Huang_H.L.;Wang_K.P.} - {Columbia;Princeton} |
14 | P014(2-2) | 2 | 2 | {Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology} |
15 | P015(2-2) | 2 | 2 | {Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology} |
16 | P016(2-2) | 2 | 2 | {Tsao_Y.S.;Yen_Fu-ching} - {Harvard;Yale} |
62 | P062(9-1) | 9 | 1 | {Conant_Harold A.R.;Fistere_Joseph Jr.;Kuo_Cheng Chih;Lau_Waan Wai;Li_Kien Yo;Mead_L.J.;Palmer_Walter;Tsou_P.W.;Yu_T.M.} - {Cornell} |
63 | P063(5-1) | 5 | 1 | {Armour_Wendell;Chen_L.T.;Chung_Daniel M.;Sheridan_H.J.;Tan_W.H.} - {Yale} |
np2set <- resultp2df %>% filter(NbSets>1)
kable(head(np2set), caption = "The 6 first multi-college places during the second phase (1909-1918)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P001(1-4) | 1 | 4 | {Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan} |
2 | P002(1-4) | 1 | 4 | {Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster} |
3 | P003(1-4) | 1 | 4 | {Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania} |
4 | P004(1-3) | 1 | 3 | {Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College} |
5 | P005(1-3) | 1 | 3 | {Chung_Elbert} - {Georgetown;Pennsylvania;Southern California} |
6 | P006(1-3) | 1 | 3 | {Hayes_Ernest M.} - {Princeton;Washington & Jefferson;Wooster} |
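The counts of multi-student and multi-college places quoted above can be computed directly rather than read off the tables. The sketch below uses an invented toy table; `count_places()` is a hypothetical helper, and only the column names `NbElements` and `NbSets` come from the `PlacesData` output shown in this tutorial.

```r
# Toy places table (hypothetical values) mimicking the NbElements / NbSets
# columns of resultp2df
toy_places <- data.frame(
  PlaceLabel = c("P1(1-3)", "P2(2-2)", "P3(4-1)", "P4(1-1)"),
  NbElements = c(1, 2, 4, 1),
  NbSets     = c(3, 2, 1, 1)
)

# Hypothetical helper: summary counts for one period
count_places <- function(df) {
  c(places        = nrow(df),
    multi_student = sum(df$NbElements > 1),
    multi_college = sum(df$NbSets > 1),
    both          = sum(df$NbElements > 1 & df$NbSets > 1))
}

place_counts <- count_places(toy_places)
print(place_counts)
```

Assuming the real data frames have the same columns, the same function could be applied to `resultp1df`, `resultp2df` and `resultp3df` to compare the three periods side by side.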
resultp3 <- places(aucp3, "Name_eng", "University")
During the last phase, 116 places were identified from a total
population of 221 students and 82 universities.
resultp3df <- as.data.frame(resultp3$PlacesData) # create dataframe from list of results
kable(head(resultp3df), caption = "The 6 first places during the last period (1919-1935)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P001(1-3) | 1 | 3 | {Ho_Paul Hsu} - {Illinois;Pennsylvania;Temple} |
2 | P002(1-3) | 1 | 3 | {Hu_Stephen} - {California;Cornell;Johns Hopkins} |
3 | P003(1-3) | 1 | 3 | {Huang_James Chiomin} - {Columbia;National (Manila);Philippine} |
4 | P004(1-3) | 1 | 3 | {Huang_Tsefang} - {Chicago;Johns Hopkins;Rush Medical College} |
5 | P005(1-3) | 1 | 3 | {Kam_Edwin} - {Hawaii;Pennsylvania;St. John’s University} |
6 | P006(1-3) | 1 | 3 | {Liu_H.C.E.} - {Chicago;Columbia;Denison} |
Five places involved two or more students who attended more than
one college. Most of them remained based on the East Coast (Columbia,
Harvard, NYU, Pennsylvania) but we also notice a shift toward the
Midwest (Chicago, Michigan, Ohio State, Wisconsin):
np3 <- resultp3df %>% filter(NbElements >1 & NbSets>1)
kable(head(np3), caption = "The 5 most important places during the last phase (1919-1935)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
10 | P010(4-2) | 4 | 2 | {Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University} |
11 | P011(3-2) | 3 | 2 | {Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan} |
12 | P012(2-2) | 2 | 2 | {Ho_Teh-Kuei;Tsai_Thomas Wen-hsi} - {Harvard;Wisconsin} |
13 | P013(2-2) | 2 | 2 | {Hsueh_Wei Fan;Wong_Tse-Kong} - {Ohio State;Pennsylvania} |
14 | P014(2-2) | 2 | 2 | {Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan} |
25 places involved more than one student and 66 included more than
one college:
np3el <- resultp3df %>% filter(NbElements >1)
kable(head(np3el), caption = "The 6 first multi-student places during the last phase (1919-1935)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
10 | P010(4-2) | 4 | 2 | {Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University} |
11 | P011(3-2) | 3 | 2 | {Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan} |
12 | P012(2-2) | 2 | 2 | {Ho_Teh-Kuei;Tsai_Thomas Wen-hsi} - {Harvard;Wisconsin} |
13 | P013(2-2) | 2 | 2 | {Hsueh_Wei Fan;Wong_Tse-Kong} - {Ohio State;Pennsylvania} |
14 | P014(2-2) | 2 | 2 | {Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan} |
67 | P067(13-1) | 13 | 1 | {Chang_Kin-fang;Fu_W.S.;Ho_Philip L.;MacKinnon_Joseph A.;Shen_Pao Guay;Soong_T.A.;Sung_I-chung;Sung_Jess;Wadsworth_Julius;Woo_L.S.;Woo_S.T.;Yu_Zu Shung;Zau_Z.D.} - {Harvard} |
np3set <- resultp3df %>% filter(NbSets>1)
kable(head(np3set), caption = "The 6 first multi-college places during the last phase (1919-1935)") %>%
kable_styling(full_width = F, position = "left")
PlaceNumber | PlaceLabel | NbElements | NbSets | PlaceDetail |
---|---|---|---|---|
1 | P001(1-3) | 1 | 3 | {Ho_Paul Hsu} - {Illinois;Pennsylvania;Temple} |
2 | P002(1-3) | 1 | 3 | {Hu_Stephen} - {California;Cornell;Johns Hopkins} |
3 | P003(1-3) | 1 | 3 | {Huang_James Chiomin} - {Columbia;National (Manila);Philippine} |
4 | P004(1-3) | 1 | 3 | {Huang_Tsefang} - {Chicago;Johns Hopkins;Rush Medical College} |
5 | P005(1-3) | 1 | 3 | {Kam_Edwin} - {Hawaii;Pennsylvania;St. John’s University} |
6 | P006(1-3) | 1 | 3 | {Liu_H.C.E.} - {Chicago;Columbia;Denison} |
As the network densified, more complex patterns of
academic specialization emerged during this period, such as New
York-trained bankers and businessmen (P010), Michigan-Chicago lawyers
(P011), and Ohio/Pennsylvania graduates in business administration
(insurance, railway) (P013).
Finally, we can export the results (list of places) for further analysis in Excel or SNA software:
write.csv(resultp1df, "placesp1.csv")
write.csv(resultp2df, "placesp2.csv")
write.csv(resultp3df, "placesp3.csv")
For each period, we will create the corresponding network of places and its transposed network of colleges. The successive visualizations reveal the progressive formation of alumni networks over time.
Create network of places linked by colleges:
# Network of Places (academic trajectories) linked by universities (P)
bimodp1p<-table(resultp1$Edgelist$Places, resultp1$Edgelist$Set) # create adjacency matrix from Edgelist
PlacesMatp1p<-bimodp1p %*% t(bimodp1p)
diag(PlacesMatp1p)<-0
Create network of colleges linked by places
# Network of universities linked by places (academic trajectories) (P* = transposed network of P; cf. Pizarro 2009)
bimodp1u<-table(resultp1$Edgelist$Set, resultp1$Edgelist$Places)
PlacesMatp1u<-bimodp1u %*% t(bimodp1u)
diag(PlacesMatp1u)<-0
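The two matrix products above are one-mode projections of a two-mode (place by college) incidence matrix. The following sketch works through the arithmetic on a toy edge list: the labels are invented, and only the column names `Places` and `Set` follow the `Edgelist` object used in this tutorial.

```r
# Toy edge list (hypothetical labels): which place contains which college
el <- data.frame(
  Places = c("P1", "P1", "P2", "P2", "P3"),
  Set    = c("Columbia", "Yale", "Yale", "Harvard", "Harvard")
)

# Incidence matrix B: rows = places, columns = colleges (1 if the place includes the college)
B <- unclass(table(el$Places, el$Set))

# Places projection: cell (i, j) = number of colleges shared by places i and j
places_proj <- B %*% t(B)
diag(places_proj) <- 0  # drop self-ties

# Colleges projection: cell (i, j) = number of places containing both colleges
colleges_proj <- t(B) %*% B
diag(colleges_proj) <- 0

print(places_proj)
print(colleges_proj)
```

Here P1 and P2 are tied with weight 1 because they share Yale, whereas P1 and P3 share no college and remain unconnected. Building the second table with the arguments swapped, as the tutorial does, is equivalent to transposing `B`.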
Build networks from adjacency matrices with igraph
library(igraph)
PlacesMatp1pNet<-graph_from_adjacency_matrix(PlacesMatp1p, mode="undirected", weighted = TRUE)
PlacesMatp1uNet<-graph_from_adjacency_matrix(PlacesMatp1u, mode="undirected", weighted = TRUE)
Export the edge list for re-use in SNA software:
# convert igraph object into edge list
edgelistp1p <- as_edgelist(PlacesMatp1pNet)
edgelistp1u <- as_edgelist(PlacesMatp1uNet)
# export edge lists
write.csv(edgelistp1p, "edgelistp1p.csv")
write.csv(edgelistp1u, "edgelistp1u.csv")
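Note that `as_edgelist()` returns only the two endpoint columns, so the tie weights are not written to the exported files. If the weights are needed in the SNA software, one option is igraph's `as_data_frame()`, sketched here on a toy matrix. The matrix values and the output file name are invented for illustration, and the file is written to a temporary directory.

```r
library(igraph)

# Toy weighted adjacency matrix (hypothetical values)
m <- matrix(c(0, 2, 0,
              2, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"), c("P1", "P2", "P3")))
g <- graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE)

# Edge list with columns from, to, weight.
# The igraph:: prefix avoids a clash with the as_data_frame() exported by the tidyverse.
edges_w <- igraph::as_data_frame(g, what = "edges")
print(edges_w)

# Export without row numbers
out_file <- file.path(tempdir(), "edgelist_weighted.csv")
write.csv(edges_w, out_file, row.names = FALSE)
```

Setting `row.names = FALSE` keeps the CSV free of an extra index column, which some network tools would otherwise read as a node attribute.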
Visualize the networks
plot(PlacesMatp1pNet,
vertex.color = "red",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp1pNet), # node size proportionate to degree centrality
edge.width=E(PlacesMatp1pNet)$weight, # edge width proportionate to ties weight
main="Network of places: Phase 1 (1883-1908)",
sub = "Node size proportionate to degree centrality")
plot(PlacesMatp1uNet,
vertex.color = "steel blue",
vertex.shape="square",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp1uNet), # node size proportionate to degree centrality
edge.width=E(PlacesMatp1uNet)$weight, # edge width proportionate to ties weight
main="Network of colleges: Phase 1 (1883-1908)",
sub = "Node size proportionate to degree centrality")
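Because node size is set to the raw degree, isolated nodes (degree 0) are drawn with size 0 and highly connected nodes can become very large, which is presumably why the later plots divide the degree by 2.5. An alternative is to rescale the degree to a fixed size range. The sketch below assumes a hypothetical helper `rescale_size()` and a toy graph; the size bounds are arbitrary choices.

```r
library(igraph)

# Hypothetical helper: map a numeric vector linearly onto [min_size, max_size]
rescale_size <- function(x, min_size = 3, max_size = 15) {
  rng <- range(x)
  if (rng[1] == rng[2]) {
    # all values equal: use the midpoint of the size range
    return(rep((min_size + max_size) / 2, length(x)))
  }
  min_size + (x - rng[1]) / (rng[2] - rng[1]) * (max_size - min_size)
}

# Toy graph: one hub (A), three leaves (B, C, D) and one isolate (E)
g <- graph_from_literal(A-B, A-C, A-D, E)
sizes <- rescale_size(degree(g))
print(sizes)

plot(g,
     vertex.size = sizes,          # rescaled degree
     vertex.label.cex = 0.8,
     main = "Toy network with rescaled node sizes")
```

With this rescaling the isolate E remains visible at the minimum size, and the same bounds can be reused for all three periods so that the plots stay visually comparable.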
Create networks
# Network of Places (academic trajectories) linked by universities (P)
bimodp2p<-table(resultp2$Edgelist$Places, resultp2$Edgelist$Set) # create adjacency matrix from Edgelist
PlacesMatp2p<-bimodp2p %*% t(bimodp2p)
diag(PlacesMatp2p)<-0
# Network of universities linked by places (academic trajectories) (P* = transposed network of P; cf. Pizarro 2009)
bimodp2u<-table(resultp2$Edgelist$Set, resultp2$Edgelist$Places)
PlacesMatp2u<-bimodp2u %*% t(bimodp2u)
diag(PlacesMatp2u)<-0
# build network from adjacency matrix with igraph
library(igraph)
PlacesMatp2pNet<-graph_from_adjacency_matrix(PlacesMatp2p, mode="undirected", weighted = TRUE)
PlacesMatp2uNet<-graph_from_adjacency_matrix(PlacesMatp2u, mode="undirected", weighted = TRUE)
# convert into edge list for re-use in SNA software
edgelistp2p <- as_edgelist(PlacesMatp2pNet)
edgelistp2u <- as_edgelist(PlacesMatp2uNet)
# export edge lists
write.csv(edgelistp2p, "edgelistp2p.csv")
write.csv(edgelistp2u, "edgelistp2u.csv")
Visualize graphs
plot(PlacesMatp2pNet,
vertex.color = "red",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp2pNet)/2.5, # size of node proportionate to degree centrality
edge.width=E(PlacesMatp2pNet)$weight, # edge width proportionate to number of ties
main="Network of places: Phase 2 (1909-1918)",
sub = "Node size proportionate to degree centrality")
plot(PlacesMatp2uNet,
vertex.color = "steel blue",
vertex.shape="square",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp2uNet), # node size proportionate to degree centrality
edge.width=E(PlacesMatp2uNet)$weight,
main="Network of colleges: Phase 2 (1909-1918)",
sub = "Node size proportionate to degree centrality")
Create networks
# Network of Places (academic trajectories) linked by universities (P)
bimodp3p<-table(resultp3$Edgelist$Places, resultp3$Edgelist$Set) # create adjacency matrix from Edgelist
PlacesMatp3p<-bimodp3p %*% t(bimodp3p)
diag(PlacesMatp3p)<-0
# Network of universities linked by places (academic trajectories) (P* = transposed network of P; cf. Pizarro 2009)
bimodp3u<-table(resultp3$Edgelist$Set, resultp3$Edgelist$Places)
PlacesMatp3u<-bimodp3u %*% t(bimodp3u)
diag(PlacesMatp3u)<-0
# build network from adjacency matrix with igraph
library(igraph)
PlacesMatp3pNet<-graph_from_adjacency_matrix(PlacesMatp3p, mode="undirected", weighted = TRUE)
PlacesMatp3uNet<-graph_from_adjacency_matrix(PlacesMatp3u, mode="undirected", weighted = TRUE)
# convert into edge list for re-use in SNA software
edgelistp3p <- as_edgelist(PlacesMatp3pNet)
edgelistp3u <- as_edgelist(PlacesMatp3uNet)
# export edge lists
write.csv(edgelistp3p, "edgelistp3p.csv")
write.csv(edgelistp3u, "edgelistp3u.csv")
Visualize graphs
plot(PlacesMatp3pNet,
vertex.color = "red",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp3pNet)/2.5, # node size proportionate to degree centrality
edge.width=E(PlacesMatp3pNet)$weight,
main="Network of places: Phase 3 (1919-1935)",
sub = "Node size proportionate to degree centrality")
plot(PlacesMatp3uNet,
vertex.color = "steel blue",
vertex.shape="square",
vertex.label.color = "black",
vertex.label.cex = 0.5,
vertex.size = degree(PlacesMatp3uNet), # node size proportionate to degree centrality
edge.width=E(PlacesMatp3uNet)$weight,
main="Network of colleges: Phase 3 (1919-1935)",
sub = "Node size proportionate to degree centrality")
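The visual comparison of the three periods can be complemented with a few summary measures. The following sketch defines a hypothetical helper, `summarise_net()`, and applies it to two invented toy graphs. Assuming the period networks built above are available, the same function could be applied to `PlacesMatp1pNet`, `PlacesMatp2pNet` and `PlacesMatp3pNet`.

```r
library(igraph)

# Hypothetical helper: basic structural indicators for one network
summarise_net <- function(g) {
  data.frame(
    nodes      = vcount(g),
    edges      = ecount(g),
    density    = edge_density(g),       # share of possible ties that exist
    components = components(g)$no,      # number of connected components
    isolates   = sum(degree(g) == 0)    # nodes without any tie
  )
}

# Toy graphs: a sparse network with isolates and a denser, connected one
g_sparse <- graph_from_literal(A-B, C, D)
g_dense  <- graph_from_literal(A-B, B-C, C-D, A-C)

cmp <- rbind(sparse = summarise_net(g_sparse),
             dense  = summarise_net(g_dense))
print(cmp)
```

A falling number of isolates and components, together with a rising density, would be consistent with the progressive formation of alumni networks described in this tutorial.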
In the early phase (1883-1908), before the Boxer Indemnity Program was started, American University Men did not really form a network. Singular trajectories and geographical dispersion dominated during this period. We identified only four places (out of 42) that involved more than one student, and in none of them did the students attend the same college at the same time.
The second phase opened with the enactment of the Boxer Indemnity Program in 1908. During this ten-year period, the first alumni networks emerged around three core colleges based on the East Coast (Columbia, Harvard, MIT). Geographical proximity and academic prestige formed the basis of these emerging networks. Multi-student places grew in importance and implied actual interaction between the students.
After World War I, East Coast-based colleges maintained their prominence but new academic poles emerged in the Midwest (Michigan, Chicago, Ohio). This shift away from the East Coast was driven by academic specialization (law, finance, railway administration) and the search for lower costs of living during the Great Depression.