---
title: "A Practical Guide to the 'enpchina' package"
subtitle: 'Case 1: The Rotary Club in the Chinese Press'
author: "Cécile Armand"
affiliation: Aix-Marseille University
date: "`r lubridate::today()`"
tags: [americanization, elite, press, NER, network]
abstract: |
  This guide aims to demonstrate how China historians can take advantage of the "enpchina" package to explore massive corpora of historical newspapers, focusing on a major Chinese newspaper - *Shenbao* 申報 - and through a concrete case study - the Rotary Club of Shanghai 上海扶輪社 (Shanghai fulunshe).
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
    number_sections: false
    code_folding: show # hide
    theme: readable # all themes -> https://bootswatch.com/3/
    #fig_width: 10
    #fig_height: 10
    #fig_caption: true
    df_print: paged
---

```{r setup, include=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(enpchina)
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
library(tidygraph)
library(igraph)
library(tidyr)
library(stringr)
library(scales)
library(webshot)
```

# Prologue

This guide presents the various functions available in the "enpchina" R package. The package was developed by the European Research Council (ERC) project ["Elites, Networks and Power in modern China" (ENP-China)](https://www.enpchina.eu/), especially by our computational linguist Pierre Magistry. It was designed to enable China historians to explore massive corpora of historical newspapers.

Ultimately, we hope to demonstrate how historians can harness digital techniques to explore historical sources at an unprecedented scale and develop alternative approaches to historical research. Such practices do not displace qualitative analyses; they complement and contextualize the close reading of historical documents.

The "enpchina" package is an R package, designed to be used within RStudio.
We chose R because the language is relatively simple to handle for novice programmers (compared to Python, for instance), and because RStudio, its widely used integrated development environment (IDE), makes it possible to perform the complete chain of operations - from extraction to analysis - within a single framework.

## Who is this guide for?

This guide addresses historians with no *a priori* knowledge of programming. The code is provided for the sake of traceability, but displaying it is optional: each code block can be shown or hidden. It is easy to skip the code and focus on the results and analyses.

For technical details on the development of the package, please refer to the dedicated [GitHub page](https://github.com/enp-china) (forthcoming).

## Goals

This tutorial demonstrates how historians can use the functions included in the "enpchina" package to master the complete chain of operations, from the extraction of historical information to the analysis and interpretation of the extracted data. In the following, we provide only basic analyses and interpretations of the results.

For this demonstration, we rely on a concrete case study: the Rotary Club of Shanghai (Shanghai fulunshe 上海扶輪社). Moreover, this guide applies specifically to the Chinese-language press (*Shenbao* 申報). A future tutorial will deal with the English-language press (ProQuest Historical Newspapers).

## Outline

In this guide, we use the functions available in the "enpchina" package to perform the complete chain of operations, from extracting historical information to analyzing and interpreting the extracted data:

1. Search for the Rotary Club of Shanghai in the *Shenbao*.
2. Analyze and visualize the results of our query (number of documents, occurrences, distribution over time).
3. Build a subcorpus based on our query.
4. Perform named entity recognition (NER) on the corpus, in order to identify the persons, institutions and places associated with the club.
5. Search for a list of members in the corpus. The list of members was drawn from external sources (rosters found in the archives of Rotary International and processed separately).
6. Build two-mode and one-mode network graphs using Padagraph in order to explore the relations between the named entities mentioned in connection with the Rotary Club.

## Getting started

First, we need to load the "enpchina" package (and the other necessary packages) in RStudio:

```{r warning=FALSE, message=FALSE}
library(enpchina)
library(dplyr)
library(lubridate)
library(ggplot2)
library(tidygraph)
library(igraph)
library(tidyr)
library(stringr)
```

The basic command below lists the corpora available with the "enpchina" package:

```{r message=FALSE, warning=FALSE}
enpchina::list_corpora()
```

The corpus we are interested in for this tutorial is "shunpao" (*Shenbao*).

# Research context

## Sources

Established in Shanghai in 1872, the *Shenbao* 申報 was one of the largest, most pioneering, and most enduring Chinese-language daily newspapers in modern China. Its circulation reached 150,000 copies per day in the 1930s. Although it was based in Shanghai, its readership was national in scope. The last issue appeared on [May 27, 1949](https://madspace.org/Press/?ID=513).

## Research Object (Rotary)

Founded in Chicago in 1905, the Rotary Club was introduced in China after World War I. The organization aimed to enable business and professional men to socialize and legitimize their position by promoting higher standards of business and devising a new ethics of public service.

The first club was established in Shanghai in 1919 - *Shanghai fulunshe* 上海扶輪社 - with some twenty members, all of them foreigners. Its membership increased to over 150, and Chinese members became the majority before the Sino-Japanese War (1937-1945).
In 1948, a Chinese-speaking club - the Rotary Club of Shanghai West (Huxi fulunshe 滬西扶輪社) - was established in parallel with the original *fulunshe* in order to incorporate non-English speakers. After the Communist takeover and the founding of the People's Republic of China (PRC), the organization was terminated in 1952.

# 1. Overview of Rotary's presence in the *Shenbao*

Let's start with a broad picture of the presence of the Rotary Club of Shanghai in the Chinese press. The following graph plots the number of documents mentioning the Rotary Club of Shanghai (上海扶輪社 Shanghai fulunshe) in the *Shenbao* between 1920 and 1948.

```{r message=FALSE, warning=FALSE}
enpchina::count_documents('"上海扶輪社"', "shunpao") %>%
  mutate(Date = lubridate::as_date(Date, "%y%m%d")) %>%
  mutate(Year = year(Date)) %>%
  # aggregate the counts per year
  group_by(Year) %>%
  summarise(N = sum(N)) %>%
  filter(Year >= 1920) %>%
  ggplot(aes(Year, N)) +
  geom_col(alpha = 0.8) +
  labs(title = "The Rotary Club of Shanghai in the Shenbao",
       subtitle = "Number of articles mentioning 上海扶輪社",
       x = "Year",
       y = "Number of articles")
```

**Analysis**

The presence of the Rotary Club of Shanghai is rather discreet and sparse across the period. Interestingly, the *Shenbao* did not report the creation of the club in 1919. The graph highlights two major waves. The first one occurred in 1929-1930 and coincided with the election of Kuang Fuzhuo 鄺富灼 (Fong Foo Sec), a major figure and the second Chinese president of the club. The second peak, in 1939-1940, probably reflects the club's commitment to war relief. A close reading of the articles will help test these hypotheses.

Moreover, from 1931 onward, the club included three representatives of the *Shenbao*, which may have increased the organization's publicity in the Chinese newspaper. The manager Wang Yingbin 汪英賓 (Y.P. Wang) was the first to join, in 1931. He was succeeded by Ma Yulin 馬玉麟 (Y.L. Ma) in 1934.
Wang Xianting 王顯庭 (Wang Hsien-ting), manager of the *Shenbao* after the war, joined the Rotary Club of Shanghai West in 1948.

# 2. Build the corpus

Next, we build a corpus containing all the articles that mention the Rotary Club of Shanghai. We proceed in two steps:

1. We list all the articles mentioning the club, with their metadata.
2. We retrieve the full text of the articles.

## Documents

The function "search_documents()" produces the list of unique documents (articles) mentioning the club, regardless of the number of occurrences in each document. The command below produces the table of results (docs). For the sake of legibility, only the first ten results are displayed:

```{r message=FALSE, warning=FALSE}
docs <- search_documents('"上海扶輪社"', "shunpao")
docs
```

The table of documents contains four columns:

* Id = unique identifier of the article
* Date = date of publication (year, month, day)
* Title = title of the article
* Source = name of the source (newspaper)

## Concordance table (occurrences in context)

The function "search_concordance()" produces the concordance table (conc), which contains all mentions of the club and helps situate each occurrence in its context. In this example, we chose to retain a context of 30 characters around the keyword, but one can easily increase or decrease the length of the segment by modifying the number after "context_size =".
```{r message=FALSE, warning=FALSE}
conc <- search_concordance('"上海扶輪社"', corpus = "shunpao", context_size = 30)
conc
```

Compared to the table of documents, the concordance table contains three additional columns (Matched, Before and After):

* Id = unique identifier of the article
* Date = date of publication (year, month, day)
* Title = title of the article
* Source = name of the source (newspaper)
* Matched = the object of our search (上海扶輪社)
* Before = the 15 characters preceding the object of our search
* After = the 15 characters following the object of our search

The table of documents contains 118 results (unique articles), whereas the concordance table contains a few more (132), which means that the club may be mentioned several times in the same article.

Since we are also interested in the Chinese-speaking Rotary Club of Shanghai West (滬西扶輪社), established in 1948, we performed a combined search relying on basic Boolean operators ("&" for AND, "|" for OR):

```{r message=FALSE, warning=FALSE}
# Documents and occurrences mentioning either club ("|" = OR)
docs2 <- search_documents('"上海扶輪社" | "滬西扶輪社"', "shunpao")
conc2 <- search_concordance('"上海扶輪社" | "滬西扶輪社"', corpus = "shunpao", context_size = 30)
```
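To make the structure of a concordance table more concrete, here is a minimal base-R sketch of keyword-in-context (KWIC) extraction. It makes no claim about how "search_concordance()" works internally; the helper kwic() and the romanized toy texts are hypothetical. The regular expression uses "|" to match either club, mirroring the OR query above, and the context argument plays the role of the characters kept on each side of the match.

```r
# Hypothetical helper, for illustration only: keyword-in-context extraction
# in base R. Returns one row per match, like a concordance table.
kwic <- function(texts, ids, pattern, context = 15) {
  rows <- list()
  for (i in seq_along(texts)) {
    hits <- gregexpr(pattern, texts[i], perl = TRUE)[[1]]
    if (hits[1] == -1) next                      # no match in this document
    lens <- attr(hits, "match.length")
    for (k in seq_along(hits)) {
      start <- hits[k]
      end <- start + lens[k] - 1
      rows[[length(rows) + 1]] <- data.frame(
        Id      = ids[i],
        Before  = substr(texts[i], max(1, start - context), start - 1),
        Matched = substr(texts[i], start, end),
        After   = substr(texts[i], end + 1, end + context),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

# Toy, romanized stand-ins for Shenbao articles
texts <- c("the Shanghai fulunshe held a tiffin",
           "no club is mentioned here",
           "Huxi fulunshe and Shanghai fulunshe met")
conc_toy <- kwic(texts, c("a", "b", "c"),
                 "Shanghai fulunshe|Huxi fulunshe", context = 5)
conc_toy
```

Counting the rows gives the number of occurrences (3 here), while counting distinct Ids gives the number of documents (2): the same distinction as between the concordance table and the table of documents.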
**Table of documents:**

```{r message=FALSE, warning=FALSE}
docs2
```
**Concordance table:**

```{r message=FALSE, warning=FALSE}
conc2
```

The results reveal that the Rotary Club of Shanghai West received hardly any attention in the *Shenbao* (only 4 articles). It was probably founded too late and was too short-lived to leave its mark in the Chinese press.

The graph below visualizes the uneven distribution of articles between the two clubs over time:

```{r message=FALSE, warning=FALSE}
conc2 %>%
  distinct(Id, Date, Title, Matched) %>%        # one row per article and club
  mutate(Year = stringr::str_sub(Date, 0, 4)) %>%
  group_by(Year, Matched) %>%
  count() %>%
  filter(Matched != "上海") %>%
  ggplot(aes(x = Year, y = n, fill = Matched)) +
  geom_col(alpha = 0.8) +
  labs(title = "The Rotary Clubs of Shanghai and Shanghai West in the Shenbao (1925-1948)",
       subtitle = "Number of documents (articles) mentioning each club",
       y = "Number of articles",
       fill = "Club (Queried term)")
```

The original Rotary Club of Shanghai is represented by red bars, whereas the Rotary Club of Shanghai West appears in green.

## Retrieve full texts

Let's have a closer look at the content of the articles. For this purpose, we retrieve the full text of the documents and create a subcorpus on this basis. The table below displays only the first ten documents:

```{r message=FALSE, warning=FALSE}
corpus_with_fulltexts <- enpchina::get_documents(docs2, "shunpao")
corpus_with_fulltexts
```

The table above contains the same columns as the list of documents, plus an additional column ("Text") containing the full text of each article.

## Save results

At this stage, it is possible to save and export the results as *csv* files.

```{r message=FALSE, warning=FALSE}
write.csv(docs, "documents.csv")                  # list of documents
write.csv(conc, "concordance.csv")                # concordance table
write.csv(corpus_with_fulltexts, "fulltext.csv")  # full texts
```

# 3. Reconstructing the network of the Rotary Club using named entity recognition (NER)

Who was related to the Rotary Club of Shanghai?
Named entity recognition (NER) is one tool that can help answer this question. The function "ner_on_corpus()" included in the package extracts the names of the persons, organizations and places mentioned in any corpus of documents.

The algorithm used for detecting named entities in the Chinese language relies on [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/). It identifies four main types of entities: geopolitical entities (GPE) (countries, administrative regions...), locations (LOC), organizations (ORG) and persons (PER).

In order to identify the actors associated with the Rotary Club of Shanghai, we apply this function to the corpus of articles mentioning the Rotary Club in the *Shenbao*.

```{r message=FALSE, warning=FALSE}
docs_sb <- search_documents('"上海扶輪社"', "shunpao")
full_text_sb <- enpchina::get_documents(docs_sb, "shunpao") %>%
  rename(Id = "DocID")
ner_results_sb <- enpchina::ner_on_corpus(full_text_sb, "shunpao") %>%
  # keep only the entity type, without its confidence index
  mutate(Type2 = str_extract(Type, "GPE|PER|ORG|LOC"))
ner_results_sb
```

The table above lists all the entities recognized in our "Rotary" corpus (only the first ten results are displayed). In addition to the document identifier (DocID), the table of results contains four new columns:

* Text = the exact name of the entity (as given in the source)
* Type = the type of the entity, with its confidence index (measuring how confident the algorithm is in its classification)
* Start = the position of the character immediately preceding the entity in the article
* End = the position of the character immediately following the entity

With the *Shenbao*, the types of entities come with a confidence index that makes grouping by type impossible. Therefore, we created an additional column, "Type2", that contains only the type name.

In our corpus, organizations represent the largest category (230, 36%), followed by geopolitical entities (212, 33%), persons (153, 24%) and, finally, locations (52, 8%).
```{r message=FALSE, warning=FALSE}
ner_count <- ner_results_sb %>%
  group_by(Type2) %>%
  count() %>%
  arrange(desc(n))
ner_count
```

```{r message=FALSE, warning=FALSE}
ner_results_sb %>%
  group_by(Type2) %>%
  count() %>%
  ggplot(aes(reorder(Type2, n), n)) +
  geom_col(alpha = 0.8) +
  labs(title = "The network of the Rotary Club of Shanghai in the Shenbao",
       subtitle = "Named entities associated with the club, per category",
       x = "Type of entity",
       y = "Number of entities")
```
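The "Type2" column and the percentages quoted above rest on two simple operations: keeping only the type label and dividing each count by the total. The base-R sketch below reproduces them on made-up values. The format of the Type strings (a label followed by a score) is an assumption for illustration only, not the documented output of "ner_on_corpus()"; regmatches() and regexpr() stand in here for str_extract().

```r
# Made-up Type values: the real format of the Type column may differ
types <- c("ORG 0.97", "GPE 0.88", "ORG 0.71", "PER 0.93", "LOC 0.55", "ORG 0.80")

# Keep only the entity label, dropping the confidence score
type2 <- regmatches(types, regexpr("GPE|PER|ORG|LOC", types))

# Number of entities per type and share of the total, in percent
counts <- table(type2)
shares <- round(100 * counts / sum(counts))
counts
shares
```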
If we retain only distinct entities, we obtain a different distribution. Organizations still rank first, but persons now outnumber geopolitical entities. The Shanghai Rotary Club is associated with 144 unique organizations (35%), 132 persons (32%), 85 geopolitical entities (21%) and 46 distinct locations (11%). Individual persons tend to be mentioned only a few times each, whereas a given GPE (such as Shanghai) may be repeated many times.

```{r message=FALSE, warning=FALSE}
ner_u_count <- ner_results_sb %>%
  group_by(Type2) %>%
  distinct(Text) %>%
  count() %>%
  arrange(desc(n))
ner_u_count
```

```{r message=FALSE, warning=FALSE}
ner_results_sb %>%
  group_by(Type2) %>%
  distinct(Text) %>%
  count() %>%
  ggplot(aes(reorder(Type2, n), n)) +
  geom_col(alpha = 0.8) +
  labs(title = "The network of the Rotary Club of Shanghai in the Shenbao",
       subtitle = "Unique named entities, per category",
       x = "Type of entity",
       y = "Number of entities")
```

We can also visualize the distribution over time:

```{r message=FALSE, warning=FALSE}
ner_results_y <- ner_results_sb %>%
  rename(Id = DocID)
# add the publication date of each article, then extract the year
ner_results_y <- full_join(docs, ner_results_y) %>%
  mutate(Year = stringr::str_sub(Date, 0, 4)) %>%
  na.omit()
ggplot(data = ner_results_y) +
  geom_bar(mapping = aes(x = Year, fill = Type2), alpha = 0.6, position = "dodge") +
  labs(title = "The Rotary Club of Shanghai in the Shenbao",
       subtitle = "Distribution of named entities over time (1925-1948)",
       x = "Year",
       y = "Number of entities",
       fill = "Type of Entity")
```

## Explore entities

Who (which persons) is most frequently associated with the club? In what position? Which organizations are related to the club? Of what nature are they (private companies, schools or universities, local or national administrations and governments, philanthropic or voluntary associations)? What does this suggest about the activities of the club and its cooperation with other institutions?
Which locations are most often mentioned in connection with the club: local meeting places or more abstract locations?

First, we build an index based on the number of occurrences of each entity (only the first ten results are displayed):

```{r message=FALSE, warning=FALSE}
ner_index_sb <- ner_results_sb %>%
  group_by(Type2, Text) %>%
  tally() %>%
  arrange(desc(n))
ner_index_sb
```
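The index above counts mentions per (type, entity) pair, while the unique counts discussed earlier count distinct names per type. A base-R sketch of both on hypothetical, romanized toy data shows why the two rankings can differ: a place such as Shanghai is mentioned repeatedly but counts only once as a distinct entity.

```r
# Hypothetical toy entities (romanized for readability)
ents <- data.frame(
  Type2 = c("ORG", "ORG", "GPE", "GPE", "GPE", "PER"),
  Text  = c("Rotary Club", "SMC", "Shanghai", "Shanghai", "Hu", "Roxby"),
  stringsAsFactors = FALSE
)

# Frequency index: number of mentions per (type, entity), most frequent first
index <- aggregate(n ~ Type2 + Text, data = transform(ents, n = 1), FUN = sum)
index <- index[order(-index$n), ]

# Mentions per type versus distinct entities per type
mentions_per_type <- table(ents$Type2)
unique_per_type <- tapply(ents$Text, ents$Type2, function(x) length(unique(x)))
index
```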
We can then filter by type and examine each category of entity separately.
### Persons

```{r message=FALSE, warning=FALSE}
ner_pers_sb <- ner_index_sb %>%
  filter(Type2 == "PER")
ner_pers_sb
```
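Single-character surnames such as 蔣, 吳 or 徐 cannot be attributed to a specific person on their own. One simple way to set them aside before reading the list is to keep only names longer than one character, as in this base-R sketch on hypothetical counts. The characters are written as Unicode escapes so that the example runs in any locale; the comments give the characters. With dplyr, the equivalent would be `ner_pers_sb %>% filter(nchar(Text) > 1)`, though such a rule would also drop any genuine single-character name.

```r
# Hypothetical person index (counts are made up)
pers <- data.frame(
  Text = c("\u5433",              # 吳 (isolated surname)
           "\u7f85\u65af\u798f",  # 羅斯福 (Roosevelt)
           "\u5f90",              # 徐 (isolated surname)
           "\u7f85\u58eb\u57f9"), # 羅士培 (Roxby)
  n = c(9, 4, 6, 3),
  stringsAsFactors = FALSE
)

# Keep names of two characters or more; nchar() counts characters, not bytes
pers_kept <- pers[nchar(pers$Text) > 1, ]
pers_kept <- pers_kept[order(-pers_kept$n), ]
pers_kept
```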
If we discard isolated patronyms such as 蔣, 吳 or 徐 (which are impossible to disambiguate as such), the 10 most frequently mentioned persons are mostly foreign personalities:

- Percy Maude Roxby (1880-1947) - 羅士培 - a British geographer who specialized in Chinese geography and promoted mutual understanding between China and Britain through geography ([Zhang, 2015](http://www.geog.com.cn/EN/10.11821/dlxb201510012)). He gave a lecture before the Rotary Club of Shanghai in December 1946.
- Hiram Bingham III (1875-1956) - 平海姆 - an American explorer and politician who discovered the Inca citadel of Machu Picchu in 1911, invited as a guest speaker at a Rotary tiffin in 1927.
- Joseph Avenol (1879-1952) - 愛文諾 - a French diplomat who served as Secretary General of the League of Nations (1933-1940), guest speaker at a Rotary tiffin in 1929.
- Henri Cosme (1885-1952) - 戈斯默 - French Minister to China (1939-1940), mentioned in 1940 in connection with his radio talk during the war.
- P.N.H. Jones - 瓊斯東 - Director of the Public Works Department in Hong Kong, mentioned in connection with his petition to the S.M.C. (Shanghai Municipal Council) about sanitary conditions in the International Settlement in Shanghai in 1930.
- U.S. President Franklin D. Roosevelt (1882-1945) - 羅斯福 - mentioned in 1941 in connection with his critical role during WWII.

It is surprising that neither Chinese personalities nor Rotary leaders appear at the top of the list. On the one hand, this reflects certain limitations of the algorithm. The CoreNLP model was initially trained on an English-language press corpus from the 1980s, which probably makes it better at detecting Chinese transliterations of foreign celebrities' names. Our computational linguists in the ERC project are currently working on a new, [improved model](https://enepchina.hypotheses.org/3453) for detecting historical named entities in the Chinese language.
On the other hand, the results also reflect certain realities of the club itself and the editorial choices made in reporting on Rotary activities in the *Shenbao*. Very few Chinese experts were invited to lecture at Rotary tiffins. Since the organizers mostly relied on their network of foreign contacts, foreign speakers represented the majority (Armand, 2021). Moreover, the *Shenbao* gave very little information on the participants: its reports of meetings are poorly detailed compared to those in the English-language press (ProQuest) (Case 2).

*Note: Rotary "tiffins" (or luncheons) consisted of having lunch together while attending a lecture on a topical subject or on a subject related to the Rotarians' fields of specialty (classification talks). In Shanghai, tiffins took place every Thursday at noon and were restricted to Rotarians and their guests. In addition to regular meetings, the club held special events (dinner dances, garden parties, Valentine's Day, Christmas, tennis or golf competitions) open to Rotarians' families and special guests. Closed meetings took place once a year for the election of officers and new members, and more occasionally in critical circumstances, such as the "Lincheng Incident" in 1923, which involved the kidnapping of several foreigners on the Pukow-Tientsin train, or the Japanese invasion of Manchuria in September 1931.*

### Organizations

```{r message=FALSE, warning=FALSE}
ner_org_sb <- ner_index_sb %>%
  filter(Type2 == "ORG")
ner_org_sb
```
Apart from the Rotary Club itself (上海扶輪社, 扶輪社), the 10 most frequent organizations include the Shanghai Municipal Council (工部局), banking institutions (a private bank, Youbang yinhang 友邦銀行, and the Central Bank of China 中央銀行), the American Club (美國總會), a newspaper (the Shanghai Evening Post and Mercury, 大美晚報: 3), an entertainment center (世界游歷團), and a hospital (中山醫院, whose construction in 1937 was sponsored by the Rotary Club).
### GPE

```{r message=FALSE, warning=FALSE}
ner_gpe_sb <- ner_index_sb %>%
  filter(Type2 == "GPE")
ner_gpe_sb
```
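Because the same place can appear under several names (上海 and its alternative name 滬 for Shanghai, 美國 and 美 for the United States), comparing places requires combining aliases before counting. A minimal base-R sketch of that merging step, using a hypothetical alias table and romanized stand-ins for the Chinese forms:

```r
# Romanized stand-ins: "Hu" = 滬, "Mei" = 美, "Meiguo" = 美國, "Riben" = 日本
# (hypothetical mentions)
gpe <- c("Shanghai", "Hu", "Meiguo", "Mei", "Shanghai", "Riben")

# Hypothetical alias table: variant -> canonical name
aliases <- c(Hu = "Shanghai", Mei = "Meiguo")

# Replace each variant by its canonical name; leave other names unchanged
canonical <- ifelse(gpe %in% names(aliases), aliases[gpe], gpe)
gpe_counts <- sort(table(canonical), decreasing = TRUE)
gpe_counts
```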
Among the 10 most frequent GPE, Shanghai appears under two names (上海: 19; 滬 Hu, an alternative name for Shanghai: 12; total: 31). The list also includes countries related to the Republic of China, Shanghai and the Rotary Club: besides China (中國/中: 20), the United States (美國/美: 32), Japan (日本/日: 16) and France (法國: 8, owing to the strong French community and the French Concession in Shanghai) are the best represented countries. The occurrence of Vietnam (越南: 8) calls for closer examination.

### Locations

```{r message=FALSE, warning=FALSE}
ner_loc_sb <- ner_index_sb %>%
  filter(Type2 == "LOC")
ner_loc_sb
```
The 10 most frequent locations fall into four main categories:

1. Street names in Shanghai (Kiangse Road 江西路 and Great Western Road 大西路 in the International Settlement, Rue Petit 吉祥街 in the French Concession). They often refer to the club's meeting places (such as the Metropole Hotel on Kiangse Road).
2. Provinces or parts of provinces (浙西: western Zhejiang).
3. Broader, more abstract geographical entities, such as the Far East (遠東: 2) or the Middle East (中東). The latter points to possible overlaps with the GPE category.
4. More anecdotally, natural features such as the Beishan mountain in Henan (北山), the mouth of the Wusong river (吳淞口: 1) or Qililong (七里泷), known as "the Little Three Gorges", probably mentioned in connection with river floods or dredging works (one occurrence each).

### Concluding remarks

NER raises questions as to why certain entities appear in connection with the club and how to interpret their presence. Some are obvious (Rotary, China, the United States, members of the club), but others are more intriguing. They call for a more in-depth inquiry, which may in turn open unexpected research paths.
Don't forget to save the results!

```{r message=FALSE, warning=FALSE}
write.csv(ner_results_y, "ner_results_sb.csv")  # list of all entities
write.csv(ner_index_sb, "ner_index_sb.csv")     # entities with index
```

## From tabular data to network graphs

While simple lists of entities are useful for basic statistical analyses, they obscure the links between entities: textual co-occurrences that may point to actual social relations. Network graphs, by contrast, provide a powerful tool for exploring these relations. In the next sections, we build two types of networks based on the list of entities extracted above: two-mode networks linking entities to the documents in which they appear, and one-mode networks linking entities that co-occur in the same documents. In the last step, we also create a network linking entities of different types (e.g. persons and organizations).

### Build a two-mode network linking persons to documents

First, we select only the persons in the list of named entities:

```{r warning=FALSE, message=FALSE}
ner_results_pers_sb <- ner_results_sb %>%
  filter(Type2 == "PER")
```
We prepare an edge list linking persons with documents:

```{r warning=FALSE, message=FALSE}
persondata_sb <- ner_results_pers_sb %>%
  select(DocID, Text)
persondata_sb
```
We prepare the edges and nodes of the network and build the two-mode Persons-Documents network using [igraph](https://igraph.org/) and [tidygraph](https://www.data-imaginist.com/2017/introducing-tidygraph/):

```{r warning=FALSE, message=FALSE}
# one edge per (document, person) mention
edges_sb1 <- persondata_sb %>%
  transmute(from = DocID, to = Text)
# nodes are inferred from the edge list (vertices = NULL)
ig_sb1 <- graph_from_data_frame(d = edges_sb1, vertices = NULL, directed = FALSE)
tg_sb1 <- tidygraph::as_tbl_graph(ig_sb1)
```
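A two-mode network can equivalently be stored as an incidence matrix, with one row per document and one column per person. The base-R sketch below (hypothetical data, toy identifiers) builds it with table() and reads off each node's degree: the number of documents mentioning a person, and the number of persons named in a document. Repeated mentions of the same person in the same article are collapsed first with unique(); without that step, as in the edge list above, which keeps every mention, the igraph object contains parallel edges.

```r
# Hypothetical person-document pairs; "A" is mentioned twice in d1
persondata <- data.frame(
  DocID = c("d1", "d1", "d1", "d2", "d2", "d3"),
  Text  = c("A", "B", "A", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# Collapse repeated mentions, then cross-tabulate documents x persons
pd <- unique(persondata)
incidence <- table(pd$DocID, pd$Text)

docs_per_person <- colSums(incidence)   # degree of each person node
persons_per_doc <- rowSums(incidence)   # degree of each document node
incidence
```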
Finally, we load the network into Padagraph, a powerful network visualization tool developed by our computational linguist Pierre Magistry:

```{r eval = FALSE, warning=FALSE, message=FALSE}
# use each node's name as its label in Padagraph
tg_sb1 %N>%
  mutate(label = name) %>%
  enpchina::in_padagraph("sb-PersDoc")
```
The graph is accessible [here](https://pdg.enpchina.eu/rstudio?gid=sb-PersDoc#+10). To display it, first click on "global" or "+10".

### Build a one-mode network linking persons through documents

Now we want to project this two-mode network into a one-mode network linking persons to persons through documents. First, we create an edge list in the form of a table linking the source person (from) to the target person (to), which is the standard format for igraph objects:

```{r warning=FALSE, message=FALSE}
edges_sb2 <- inner_join(persondata_sb, persondata_sb, by = "DocID") %>%
  filter(Text.x < Text.y) %>%        # drop self-pairs, keep one direction per pair
  transmute(from = Text.x, to = Text.y) %>%
  distinct()                         # remove duplicate pairs
edges_sb2 %>% arrange(from, to)
```
The inner_join() function joins the table with itself on DocID, producing one row for every pair of persons mentioned in the same document. The filter Text.x < Text.y drops self-pairs and keeps each pair in only one direction, and "distinct()" removes duplicate pairs (for instance, when two persons co-occur in several documents).

Next, we create the one-mode Person-to-Person network using igraph and tidygraph:

```{r warning=FALSE, message=FALSE}
edges_pers_sb_tg <- edges_sb2 %>%
  transmute(from = from, to = to)
ig_sb2 <- graph_from_data_frame(d = edges_pers_sb_tg, vertices = NULL, directed = FALSE)
tg_sb2 <- tidygraph::as_tbl_graph(ig_sb2)
```
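The self-join can be reproduced in base R with merge(), which is a convenient way to check what it does on a toy example (hypothetical data). Counting the distinct documents behind each pair, rather than only deduplicating the pairs, gives each edge a weight; this weight column is an illustrative addition, not something the pipeline above computes. Note also that a person who never shares a document with anyone (D below) disappears from the edge list, and therefore from the one-mode graph, unless added separately as a vertex.

```r
# Hypothetical person-document pairs; "A" is mentioned twice in d1
persondata <- data.frame(
  DocID = c("d1", "d1", "d1", "d1", "d2", "d2", "d3"),
  Text  = c("A", "B", "C", "A", "A", "B", "D"),
  stringsAsFactors = FALSE
)

# Self-join on the document identifier: every pair of co-mentioned persons
pairs <- merge(persondata, persondata, by = "DocID")

# Drop self-pairs and keep each unordered pair in one direction only
pairs <- pairs[pairs$Text.x < pairs$Text.y, ]

# One row per (document, pair), then count documents per pair as a weight
pairs <- unique(data.frame(DocID = pairs$DocID, from = pairs$Text.x,
                           to = pairs$Text.y, stringsAsFactors = FALSE))
edges <- aggregate(DocID ~ from + to, data = pairs, FUN = length)
names(edges)[names(edges) == "DocID"] <- "weight"
edges
```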
Finally, we load the one-mode Person-Person network into Padagraph:

```{r eval = FALSE, warning=FALSE, message=FALSE}
tg_sb2 %N>%
  mutate(label = name) %>%
  enpchina::in_padagraph("sb-PersPers")
```
The graph is accessible [here](https://pdg.enpchina.eu/rstudio?gid=sb-PersPers#+10). To display it, first click on "global" or "+10".

Depending on your research questions, you can replace persons by organizations, locations or geopolitical entities. You may also build a multimodal network linking entities of different types (e.g. persons and organizations).

### Build a two-mode network person-organization

Let's say we want to create a two-mode network linking persons and organizations using Padagraph. First, we select only the persons and organizations in the original list of named entities:

```{r message=FALSE, warning=FALSE}
ner_results_sb_pers_org <- ner_results_sb %>%
  filter(Type2 %in% c("PER", "ORG")) %>%
  select(Type2, Text, DocID)
ner_results_sb_pers_org
```
We create the edge list linking persons and organizations by joining the table of entities with itself through DocID:

```{r}
edges_sb3 <- ner_results_sb_pers_org %>%
  inner_join(ner_results_sb_pers_org, by = "DocID") %>%
  filter(Text.x < Text.y) %>%
  transmute(from = Text.x, to = Text.y) %>%
  distinct()
edges_sb3 %>% arrange(from, to)
```

Note that this self-join links every pair of entities that co-occur in a document, so the edge list contains person-person and organization-organization ties alongside person-organization ties.

We create the Person-Organization network using igraph and tidygraph:

```{r warning=FALSE, message=FALSE}
edges_pers_org_sb_tg <- edges_sb3 %>%
  transmute(from = from, to = to)
ig_sb3 <- graph_from_data_frame(d = edges_pers_org_sb_tg, vertices = NULL, directed = FALSE)
tg_sb3 <- tidygraph::as_tbl_graph(ig_sb3)
```
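If a strictly two-mode network is needed, with ties only between a person and an organization, one option is to keep only the pairs whose types differ after the self-join. Here is a base-R sketch on hypothetical data; with dplyr, the same idea would replace filter(Text.x < Text.y) with filter(Type2.x == "PER", Type2.y == "ORG").

```r
# Hypothetical entities: two persons and two organizations in two documents
ents <- data.frame(
  DocID = c(1, 1, 1, 2, 2),
  Type2 = c("PER", "PER", "ORG", "PER", "ORG"),
  Text  = c("A", "B", "Rotary", "A", "SMC"),
  stringsAsFactors = FALSE
)

# Self-join: merge() suffixes the shared columns with .x and .y
pairs <- merge(ents, ents, by = "DocID")

# Keep only person -> organization pairs (drops person-person and org-org ties)
po <- pairs[pairs$Type2.x == "PER" & pairs$Type2.y == "ORG", ]
edges_po <- unique(data.frame(from = po$Text.x, to = po$Text.y,
                              stringsAsFactors = FALSE))
edges_po
```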
Finally, we load the Person-Organization network into Padagraph:

```{r eval = FALSE, warning=FALSE, message=FALSE}
tg_sb3 %N>%
  mutate(label = name) %>%
  enpchina::in_padagraph("sb-PersOrg")
```
The graph is accessible [here](https://pdg.enpchina.eu/rstudio?gid=sb-PersOrg#+10). To display it, first click on "global" or "+10".

### Export edge lists

You may also save the edge lists to rebuild the networks of entities with any social network analysis software (Gephi, Cytoscape...):

```{r message=FALSE, warning=FALSE}
write.csv(edges_sb1, "edgelist_pers_doc_sb.csv")   # two-mode network person-document
write.csv(edges_sb2, "edgelist_pers_pers_sb.csv")  # one-mode network person-person
write.csv(edges_sb3, "edgelist_pers_org_sb.csv")   # network person-organization
```

# 4. Search external data (list of members) in the corpus

Relying on the rosters of members available in the archives of Rotary International, we established a list of 131 Chinese Rotarians in Shanghai from 1919 to 1951. This list is available on [Zenodo ENP-China Community](https://zenodo.org/record/4283499#.X--WCOlKh0s). We were able to identify 84 of them (with their Chinese names).

We search for these individuals in our corpus in order to determine how active they were in the club. We take the number of times they appear in the Rotary subcorpus (containing only documents related to the Rotary Club) as an indicator of the degree of their commitment to the club.

The first thing we need to do is import the list of members and format their names so that we can perform the query. Only the first 10 members are displayed below, with their names in Chinese (Name_zh) and their transliterations in pinyin (Name_py) and Wade-Giles (Name_eng):

```{r message=FALSE, warning=FALSE}
library(readr)
rotarian_zh <- read_delim("Rotary_input/rotarian.csv", ";",
                          escape_double = FALSE,
                          col_types = cols(Name_full = col_skip()),
                          trim_ws = TRUE) %>%
  na.omit() %>%
  arrange(Name_eng)
rotarian_zh
```

In order to identify the Chinese Rotarians in the press and examine their relation with the club, we proceed in two steps:

1. We search for them in the entire corpus (*Shenbao*).
2. We join the results with the Rotary subcorpus based on the identifier ("Id") of the documents.

We identified all 84 unique individuals from the original list. In the Rotary subcorpus, these individuals total 119 occurrences, i.e. an average of 1.4 per person, with great variations from one individual to another. In the whole corpus, the 84 Rotarians represent 36,306 occurrences (432 per person on average), with even greater discrepancies between them. The following sections describe in detail the method for obtaining these results and offer a preliminary interpretation.

## Index of Commitment

The table below lists all the occurrences of Rotarians mentioned in the *Shenbao* in connection with the club (Rotary subcorpus). It contains five columns: the individual's name in Chinese (Name), the identifier of the article in which they appear (Id), the date of publication (Date), the title of the article (Title), and the source (Source). Only the first ten results are displayed:

```{r message=FALSE, warning=FALSE}
# Wrap each Chinese name in double quotes to search for the exact string
rotarianzh <- rotarian_zh %>%
  select(Name_zh) %>%
  na.omit() %>%
  mutate(Queries = str_glue('"{Name_zh}"'))

# Run one search per query and stack the results, tagging each row
# with the query (Q) that produced it
multiple_search <- function(queries, corpus) {
  results <- enpchina::search_documents(queries[1], corpus) %>%
    mutate(Q = queries[1])
  for (q in queries) {
    new_result <- enpchina::search_documents(q, corpus) %>%
      mutate(Q = q)
    results <- dplyr::bind_rows(results, new_result)
  }
  distinct(results)  # the first query is searched twice; distinct() removes duplicates
}

rotarians_in_shunpao <- multiple_search(rotarianzh$Queries, "shunpao")

# Keep only the hits that belong to the Rotary subcorpus (docs2)
rotarians_in_sb_subcorpus <- inner_join(docs2, rotarians_in_shunpao, by = "Id") %>%
  select(Q, Id, Date.x, Title.x, Source.x) %>%
  rename(Date = Date.x, Title = Title.x, Source = Source.x, Queries = Q)

# Attach the Chinese name of each member, then tidy the columns
rotarians_in_sb_subcorpus <- full_join(rotarians_in_sb_subcorpus, rotarianzh) %>%
  select(-Queries) %>%
  select(Name_zh, Id, Date, Title, Source) %>%
  rename(Name = Name_zh)
rotarians_in_sb_subcorpus
```

Based on these results, we measure the strength of each individual's connection with the club by counting the number of times they appear in the subcorpus. The table below displays the first ten individuals, in decreasing order of importance (number of occurrences):

```{r, message=FALSE, warning=FALSE}
rotarians_in_sb_subcorpus_uniq <- rotarians_in_sb_subcorpus %>%
  distinct(Name, Id) %>%   # count each article only once per person
  group_by(Name) %>%
  count() %>%
  arrange(desc(n))
rotarians_in_sb_subcorpus_uniq
```

The table reveals discrepancies between members, though not as dramatic as in the English-language press (Case 2). Wang Zhengting 王正廷 tops the list with 7 occurrences, followed by Kuang Fuzhuo (Fong Foo Sec) (5), Wu Tiecheng (Wu Te-cheng) (4), and a group of four Rotarians who were mentioned three times. The next 15 Rotarians appeared twice and the remaining 60 (71%) only once:

```{r message=FALSE, warning=FALSE}
rotarians_in_sb_subcorpus_uniq %>%
  ungroup() %>%
  ggplot(aes(x = n)) +
  geom_histogram(binwidth = 1, alpha = 0.8) +
  labs(title = "The Chinese Rotarians in the Shenbao",
       subtitle = "Index of commitment to the club",
       x = "Number of occurrences",
       y = "Number of individuals")
```

The distribution partially reflects the Rotarians' positions in the club. In Wang's case, for instance, it reflects his appointment to several international positions in the organization: commissioner, district governor, special advisor, board member and even vice-president of Rotary International. Kuang Fuzhuo 鄺富灼 was elected president of the club in 1932 and appointed to prominent positions in Rotary International as well. He actively campaigned for the sinicization of the organization and sponsored the creation of the first Chinese-speaking club in Shanghai in 1936. Although he did not hold any position in the club, the Mayor of Shanghai, Wu Tiecheng 吳鐵城, was often invited as a guest of honor at its special events.
Li Yuanxin 李元信 (William Yinson Lee) sat on the board of directors, chaired the program committee for two consecutive years (1934-1936) and served on other committees. He Dekui 何德奎 served on various committees. Xu Jianping 許建屏 (Jabin Hsu) was a member of the board of directors (1924-1925) and of the Fellowship committee (1933-1934).

In several cases, however, the index does not accurately reflect the actual importance of members in the club. Although he never held any position in the club, Gu Bingyuan, for instance, ranks relatively high compared to other Rotarians who held prominent positions, such as Zhu Boquan 朱博泉, who served as treasurer of the club for five consecutive years (1928-1935) and was eventually elected president in 1934-1935, and Guo Baoshu 郭寶樹 (Percy Kwok), who was elected vice-president of the club in 1930 and chaired several important committees (Club Service, Fellowship). They are placed at the same level as second-rank and barely active members like Xia Peng 夏鵬, who, according to the attendance records of Rotary International, missed many regular meetings without excuse, and Hu Hongji 胡鸿基, who served only once on the Public Affairs committee. From a comparative perspective, the ranking in ProQuest offers a better reflection of their actual influence in the club (Case 2).

## Index of Reputation

We are also interested in measuring the Rotarians' reputation and social importance more generally, based on the number of times they appear in the *Shenbao*. To this end, we extend the query to the entire corpus. We obtain a table with 36,306 results, which corresponds to the total number of occurrences for all members. In this table, a person may be mentioned several times in the same article. The table includes the name of the person (Name), the unique identifier of the article (Id), its date (Date), its title (Title), and the name of the newspaper (Source).
Only the first ten results are displayed:

```{r message=FALSE, warning=FALSE}
# Reuse the whole-corpus search results (rotarians_in_shunpao) obtained above
# instead of querying the Shenbao a second time
rotarians_in_shunpao <- rotarians_in_shunpao %>%
  rename(Queries = Q) %>%
  full_join(rotarianzh, by = "Queries") %>%
  select(Name_zh, Id, Date, Title, Source) %>%
  rename(Name = Name_zh)

rotarians_in_shunpao
```

Based on these results, we build an index of reputation for each individual by counting the number of distinct articles in which they appear in the *Shenbao*. Only the first ten individuals are displayed:

```{r message=FALSE, warning=FALSE}
# Index of reputation: number of distinct Shenbao articles mentioning each Rotarian
rotarians_in_shunpao_uniq <- rotarians_in_shunpao %>%
  distinct(Name, Id) %>%
  group_by(Name) %>%
  count() %>%
  arrange(desc(n))

rotarians_in_shunpao_uniq
```

Wang Zhengting 王正廷 again shows the highest score, with more than 7,000 occurrences. At the other end, four Rotarians appear only once (e.g. Cao Diqiu 曹荻秋). The reputation index follows an even more pronounced heavy-tailed distribution than the index of commitment examined earlier.
In sum, the higher the reputation, the fewer the individuals:

```{r message=FALSE, warning=FALSE}
# One observation per individual (not per distinct value of n)
rotarians_in_shunpao_uniq %>%
  ungroup() %>%
  ggplot(aes(x = n)) +
  geom_histogram(alpha = 0.8) +
  labs(title = "The Chinese Rotarians in the Shenbao",
       subtitle = "Index of reputation",
       x = "Number of occurrences",
       y = "Number of individuals")
```

Except for the two most frequently mentioned names (Wang Zhengting 王正廷 and Wu Tiecheng 吳鐵城), the ranking based on the entire corpus differs greatly from the ranking based on the Rotary subcorpus. In general, the reputation index reflects individuals' prominence on the local, national or international stage. It is not surprising that the diplomats Wang Zhengting (7,189 occurrences), Gu Weijun 顾维钧 (Wellington Koo) (4,176) and Shi Zhaoji 施肇基 (Alfred Sze) (1,516) appear among the five most mentioned names. Other prominent names include the Mayor of Shanghai Wu Tiecheng (5,220), industrialists such as Liu Hongsheng 劉鴻生 (1,757) and professionals such as the New York-trained lawyer Liu Zhan'en 劉湛恩 (1,383), the technocrat Guo Bingwen 郭秉文 (1,344) or the educator Shen Siliang 沈嗣良 (1,253). They illustrate the diversity of sectors in which American-trained elites were represented in Chinese society.

From these differences, three main profiles may be distinguished:

1. Rotarians with an equivalent level of reputation and commitment (e.g. Wang Zhengting, Wu Tiecheng);
2. Rotarians with a high level of commitment but a comparatively lower reputation (e.g. Kuang Fuzhuo);
3. Rotarians with a high reputation but a comparatively lower level of commitment to the club (e.g. Gu Weijun, Liu Hongsheng).

Comparing more systematically the general distribution and the relative position of each individual in the two corpora will help determine whether the ranking reflects their importance strictly within the club and/or their social influence beyond it.
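Such a systematic comparison can be sketched in a few lines of base R. The minimal example below runs on a small hypothetical table of hits (persons "A", "B" and "C" and the article identifiers are invented, not drawn from the *Shenbao*). It computes both indices as numbers of distinct articles, merges them while keeping individuals absent from the Rotary subcorpus (their count is set to 0), and compares each person's rank in the two indices. Labelling profiles by the sign of the rank gap is an illustrative assumption, not the method used above.

```r
# Hypothetical hits: one row per (person, article) match; "a5" is matched twice for A
shenbao_hits <- data.frame(
  Name = c(rep("A", 6), rep("B", 4), rep("C", 2)),
  Id   = c("a1", "a2", "a3", "a4", "a5", "a5", "b1", "b2", "b3", "b4", "c1", "c2"),
  stringsAsFactors = FALSE
)
# Hypothetical Rotary subcorpus hits (a subset of the articles above)
rotary_hits <- data.frame(
  Name = c("A", "A", "A", "C", "C"),
  Id   = c("a1", "a2", "a3", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Index = number of distinct articles per person
count_distinct_articles <- function(hits) {
  uniq <- unique(hits[, c("Name", "Id")])
  counts <- as.data.frame(table(uniq$Name), stringsAsFactors = FALSE)
  names(counts) <- c("Name", "n")
  counts
}

reputation <- count_distinct_articles(shenbao_hits)
commitment <- count_distinct_articles(rotary_hits)
names(reputation)[2] <- "Shenbao"
names(commitment)[2] <- "Rotary"

# Outer join: keep people who never appear in the Rotary subcorpus
index <- merge(reputation, commitment, by = "Name", all = TRUE)
index$Rotary[is.na(index$Rotary)] <- 0

# Rank 1 = most mentioned; tied values share the best rank
index$rank_shenbao <- rank(-index$Shenbao, ties.method = "min")
index$rank_rotary  <- rank(-index$Rotary,  ties.method = "min")

# Positive gap: better ranked in the club than in the press at large
gap <- index$rank_shenbao - index$rank_rotary
index$profile <- ifelse(gap == 0, "balanced",
                 ifelse(gap > 0, "commitment > reputation",
                                 "reputation > commitment"))
index
```

With real data, ties and small rank differences are frequent, so a tolerance (for instance, ignoring gaps of one or two positions) would probably be needed before reading the labels as profiles.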
From a comparative perspective, the ranking is very different in the ProQuest corpus (Case 2). The English-language press relied on different criteria for evaluating Chinese elites' social importance and reputation, placing professionals, financiers and entrepreneurs above diplomats and local politicians. The journalist Xu Jianping 許建屏 (Jabin Hsu) ranks first with 476 occurrences, the banker Zhu Boquan 朱博泉 (Percy Chu) second (322), and the entrepreneur Guo Baoshu 郭寶樹 (Percy Kwok) third (200). The architect Fan Wenzhao 范文照 (Robert Fan) ranks 4th out of 37 with 183 occurrences in ProQuest, whereas he ranks only 32nd out of 84 with 174 occurrences in the *Shenbao*. Conversely, the first diplomat on the list, Shi Siming 施思明 (Szeming Sze), appears only in 10th position, while the Mayor of Shanghai, Wu Tiecheng, ranks 21st.

## Save the results

At this stage, it is recommended to save and export the results. You can export the complete lists of occurrences for each individual in the two corpora and the synthetic indices visualized above. To facilitate the comparison between the two corpora, we combine the two indices (general reputation in the *Shenbao* and commitment to the Rotary Club) into a single table:

```{r message=FALSE, warning=FALSE}
# Full lists of occurrences (UTF-8 keeps the Chinese names readable on every system)
write.csv(rotarians_in_shunpao, "rotarians_in_shunpao.csv", fileEncoding = "UTF-8")            # occurrences in the Shenbao
write.csv(rotarians_in_sb_subcorpus, "rotarians_in_sb_subcorpus.csv", fileEncoding = "UTF-8")  # occurrences in the "Rotary" subcorpus

# Combine both indices: Shenbao = reputation, Rotary = commitment
rotarians_sb_index <- full_join(rotarians_in_shunpao_uniq, rotarians_in_sb_subcorpus_uniq, by = "Name") %>%
  rename(Shenbao = n.x, Rotary = n.y, Name_zh = Name)

rotarians_sb_index <- inner_join(rotarianzh, rotarians_sb_index, by = "Name_zh") %>%
  rename(Name = Name_zh)

write.csv(rotarians_sb_index, "rotarians_sb_index.csv", fileEncoding = "UTF-8")  # synthetic indices for each individual
```

How and why did these indices evolve over time?
We propose to examine the variations in their reputation and commitment to the club over the course of their lives. The following analyses focus on the 15 most popular Rotarians.

## Reputation over time

```{r message=FALSE, warning=FALSE}
# Attach the two indices (Shenbao, Rotary) to each occurrence
rotarians_in_shunpao <- full_join(rotarians_in_shunpao, rotarians_sb_index, by = "Name")

# Axis breaks restricted to whole numbers (counts cannot be fractional);
# unique() removes the duplicates that floor() creates from fractional breaks
integer_breaks <- function(n = 5, ...) {
  function(x) {
    unique(floor(pretty(x, n, ...)))
  }
}

rotarians_in_shunpao %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Shenbao > 470) %>%   # the 15 most frequently mentioned Rotarians
  group_by(Year) %>%
  count(Name) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free_y", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks(10)) +
  scale_y_continuous(breaks = pretty_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The 15 most popular Chinese Rotarians in the Shenbao (> 470 occ.)",
       subtitle = "Index of reputation over time (1873-1949)")
```

For most Rotarians, the period of greatest visibility spans the 1920s to the 1940s. Their appearances in the *Shenbao* follow a similar pattern: a few occurrences in the 1920s, a peak in the 1930s, and a slow decline by the end of the war. This general pattern reflects a generational effect. The majority of the Chinese Rotarians in Shanghai were born during the waning years of the Qing dynasty, between 1880 and 1896 (31 individuals, 53%), with two peaks in 1884 and 1896. They joined the club in their thirties or forties, i.e. at the height of their professional careers (Armand 2021; Liu 2012). The peaks on the graphs therefore coincide with the most active phase of their lives. There are specific deviations from this general pattern, however.
The reputation index of diplomats (Shi Zhaoji, Wang Zhengting and Gu Weijun) generally covers a longer period, extending from the early years of the Republic to the early People's Republic of China (PRC). Notice that Gu's appearance is much less continuous in the *Shenbao* than in the English-language press (Case 2).

```{r message=FALSE, warning=FALSE}
rotarians_in_shunpao %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Shenbao > 470) %>%
  group_by(Year) %>%
  count(Name) %>%
  filter(Name %in% c("王正廷", "施肇基", "顾维钧")) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The most popular Chinese Rotarians in the Shenbao",
       subtitle = "Diplomats' Reputational Profile")
```

Intellectual professionals, such as the educator Shen Siliang and the journalist Xu Jianping, followed a more discontinuous trajectory, each with three distinct peaks occurring at about the same time. Li Ximou (Director of the Bureau of Education of the Municipality of Greater Shanghai, and member of the Rotary Club of Shanghai West) presents another intriguing case. Except for two spectacular appearances in the late 1920s and early 1930s, he is almost absent from the *Shenbao*.
```{r message=FALSE, warning=FALSE}
rotarians_in_shunpao %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Shenbao > 470) %>%
  group_by(Year) %>%
  count(Name) %>%
  filter(Name %in% c("沈嗣良", "許建屏", "李熙謀")) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free_y", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The most popular Chinese Rotarians in the Shenbao",
       subtitle = "Intellectual Professionals")
```

Local politicians and bureaucrats display a different profile. Most of them rose in popularity towards the end of the period. The youngest, Wu Guozhen (b. 1903), Mayor of Shanghai in 1945-1949, and Li Ximou (b. 1894), member of the postwar Rotary Club of Shanghai West, appeared essentially after the war, whereas senior Rotarians (He Dekui, the Chinese member of the Shanghai Municipal Council (S.M.C.); the Mayor of Shanghai Wu Tiecheng; and the industrialist Liu Hongsheng) had experienced an earlier peak in the 1930s.

```{r message=FALSE, warning=FALSE}
rotarians_in_shunpao %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Shenbao > 470) %>%
  group_by(Year) %>%
  count(Name) %>%
  filter(Name %in% c("吳鐵城", "吳國楨", "何德奎")) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free_y", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks()) +
  scale_y_continuous(breaks = pretty_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The most popular Chinese Rotarians in the Shenbao",
       subtitle = "Local Politicians and Bureaucrats")
```

The graphs also highlight anomalous cases such as Wu Guozhen, whose reputational timespan extends over almost a century, and the banker Xu Zhendong, whose name appears in 1904, when he was barely two years old.
These cases possibly point to distinct homonyms that need to be disambiguated. One should also question the apparently aberrant profile of the orthopedist Niu Huisheng, whose name appears in the late 1940s even though he died in 1937.

```{r message=FALSE, warning=FALSE}
rotarians_in_shunpao %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Shenbao > 470) %>%
  group_by(Year) %>%
  count(Name) %>%
  filter(Name %in% c("吳國楨", "徐振東", "牛惠生")) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free_y", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks()) +
  scale_y_continuous(breaks = integer_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The most popular Chinese Rotarians in the Shenbao",
       subtitle = "Anomalous profiles")
```

## Club commitment over time

Similarly, we examine how the commitment of the most popular Rotarians changed over time. We focus on the Rotarians who are mentioned more than twice in the subcorpus, leaving aside Wu Tiecheng:

```{r message=FALSE, warning=FALSE}
# Attach the two indices (Shenbao, Rotary) to each occurrence in the subcorpus
rotarians_in_sb_subcorpus <- full_join(rotarians_in_sb_subcorpus, rotarians_sb_index, by = "Name")

p <- rotarians_in_sb_subcorpus %>%
  mutate(Year = stringr::str_sub(Date, 1, 4)) %>%
  filter(Rotary > 2, Name != "吳鐵城") %>%
  group_by(Year) %>%
  count(Name) %>%
  ggplot(aes(Year, n, fill = as.factor(Name))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ Name, scales = "free_y", ncol = 3) +
  scale_x_discrete(breaks = pretty_breaks()) +
  scale_y_continuous(breaks = integer_breaks()) +
  labs(x = "Year", y = "Number of occurrences",
       title = "The most committed Chinese Rotarians in the Shenbao (> 2 occ.)",
       subtitle = "Index of commitment (1926-1948)")

p
```

The same graph can be turned into an interactive figure with plotly:

```{r message=FALSE, warning=FALSE}
fig <- ggplotly(p)
fig
```

Three main profiles may be identified:

* Early, lifelong members: Wang Zhengting (1920-1948), among the founding members of both the original Rotary Club of Shanghai and the postwar Rotary Club of Shanghai West, to which he chose to transfer in 1948.
* Early but interrupted membership: Li Yuanxin (1924-1938), who left during the war; Kuang Fuzhuo (1922-1938), who died in 1938; and Xu Jianping (1923-1933). Notice that the latter shows a similarly discontinuous pattern of commitment in the English-language press (Case 2).
* Late membership: this profile covers either the Rotarians who joined the original club in the 1930s and had left it by the end of the war, such as He Dekui (1931-1939), or the charter members of the postwar *Huxi fulunshe* (Rotary Club of Shanghai West), such as Gu Bingyuan.

## Mapping the Chinese Rotarians' network in the *Shenbao*

To further analyze the interactions between Rotarians within and outside the club, we build a two-mode network linking individuals with the documents in which they are mentioned. "Two-mode" means that the network contains two categories of nodes: individuals (represented by triangles on the graph) and documents (square nodes). We chose to retain only the documents that mention at least three Rotarians (1,128 articles).
First, we create the corpus of *Shenbao* documents that mention at least three Rotarians:

```{r message=FALSE, warning=FALSE}
# Count the Rotarian mentions (rows) in each article and keep articles with at least three
subcorpus_sb <- rotarians_in_shunpao %>%
  group_by(Id) %>%
  count() %>%
  filter(n > 2)

# Retrieve the full documents (title and text) for these articles
subcorpus_sb <- enpchina::get_documents(subcorpus_sb, "shunpao")
```
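The filtering rule can be illustrated on toy data. The minimal base-R sketch below uses a hypothetical table of (Name, Id) matches. Unlike the chunk above, which counts rows per article, it first drops duplicate (Name, Id) pairs, so that a Rotarian matched twice in the same article counts once; whether such duplicates actually occur in the search results is an assumption worth checking.

```r
# Hypothetical matches: one row per (person, article); B is matched twice in d2
hits <- data.frame(
  Name = c("A", "B", "C", "A", "B", "B", "A", "C", "D"),
  Id   = c("d1", "d1", "d1", "d2", "d2", "d2", "d3", "d3", "d3"),
  stringsAsFactors = FALSE
)

# Drop repeated matches of the same person in the same article
pairs <- unique(hits)

# Number of distinct Rotarians per article
per_doc <- table(pairs$Id)

# Keep the articles that mention at least three distinct Rotarians (n > 2)
keep <- names(per_doc)[per_doc > 2]
keep
```

Here article d2 has three rows but only two distinct Rotarians, so it is excluded, whereas a plain row count would have kept it.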
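We create the list of patterns used to detect each Rotarian's name (these will become the person nodes):

```{r message=FALSE, warning=FALSE}
# One pattern per Chinese name, tagged as a person (PER)
regexps_zh <- rotarianzh %>%
  transmute(Regexp = Name_zh, Type = "PER")
```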
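We create the edge list, linking each document to the names it mentions:

```{r message=FALSE, warning=FALSE}
# Find the occurrences of the name patterns in the documents of the subcorpus
rotarianzh_mentions <- enpchina::extract_regexps_from_subcorpus(subcorpus_sb, regexps_zh)

# Edge list: document (from) -> person (to)
edges_sb <- rotarianzh_mentions %>%
  transmute(from = DocID, to = Match)
```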
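For readers curious about what this step does conceptually, here is a rough base-R analogue. It is *not* the implementation of `extract_regexps_from_subcorpus`, which is not reproduced here; it assumes a hypothetical table of documents (`DocID`, `Text`) and a list of literal names (placeholders in Latin script), and returns one edge per document-name pair in which the name occurs.

```r
# Hypothetical documents and names (placeholders, not Shenbao data)
docs <- data.frame(
  DocID = c("doc1", "doc2", "doc3"),
  Text  = c("Wang and Kuang attended the Rotary tiffin",
            "Kuang spoke at the Y.M.C.A.",
            "No Rotarian here"),
  stringsAsFactors = FALSE
)
names_to_find <- c("Wang", "Kuang", "Xu")

# For each name, list the documents whose text contains it (literal, not regex, match)
edges <- do.call(rbind, lapply(names_to_find, function(nm) {
  hit <- grepl(nm, docs$Text, fixed = TRUE)
  data.frame(from = docs$DocID[hit], to = rep(nm, sum(hit)),
             stringsAsFactors = FALSE)
}))
edges
```

This sketch yields at most one edge per document-name pair; an extractor that returns one row per match could produce repeated pairs when a name occurs several times in the same text.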
We create the node list of individuals, drawn as triangles:

```{r message=FALSE, warning=FALSE}
# One node per distinct matched name
rotarianzh_nodes <- rotarianzh_mentions %>%
  select(Match) %>%
  distinct() %>%
  transmute(name = Match, label = Match, Text = "", shape = "triangle")
```
Similarly, we create the node list of documents, drawn as squares:

```{r message=FALSE, warning=FALSE}
docs_nodes_sb <- subcorpus_sb %>%
  distinct() %>%
  transmute(
    name  = DocID,
    label = str_sub(str_replace_all(Title, "[^[:alnum:]]", ""), 1, 10),  # first 10 alphanumeric characters of the title
    Text  = str_replace_all(Text, "[^[:alnum:]]", ""),                   # full text without punctuation
    shape = "square"
  )
```
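Before building the graph, it can be useful to check that every edge points to an existing node, since `igraph::graph_from_data_frame()` stops with an error when an edge refers to a vertex missing from `vertices`. The base-R sketch below uses small hypothetical node and edge tables that mirror the columns created above (`name`, `label`, `shape`; `from`, `to`); the values are placeholders.

```r
# Hypothetical node tables mirroring the person and document nodes
person_nodes <- data.frame(name = c("Wang", "Kuang"), label = c("Wang", "Kuang"),
                           shape = "triangle", stringsAsFactors = FALSE)
doc_nodes <- data.frame(name = c("doc1", "doc2"), label = c("Tiffin", "YMCA talk"),
                        shape = "square", stringsAsFactors = FALSE)
edges <- data.frame(from = c("doc1", "doc1", "doc2", "doc3"),
                    to   = c("Wang", "Kuang", "Kuang", "Wang"),
                    stringsAsFactors = FALSE)

# Stack the two modes into a single vertex table (same columns in both)
nodes <- rbind(doc_nodes, person_nodes)

# Node names must be unique across both modes
stopifnot(!anyDuplicated(nodes$name))

# Edge endpoints that are not declared as nodes
dangling <- setdiff(c(edges$from, edges$to), nodes$name)
dangling

# Drop dangling edges before building the graph
edges_ok <- edges[edges$from %in% nodes$name & edges$to %in% nodes$name, ]
```

Whether to drop dangling edges or to add the missing nodes depends on why they are missing; here "doc3" would simply be a document absent from the node table.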
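Finally, we build the graph and export it to Padagraph:

```{r eval = FALSE, message=FALSE, warning=FALSE}
# Undirected two-mode graph: documents and persons stacked in one vertex table
ig_sb4 <- graph_from_data_frame(d = edges_sb, directed = FALSE,
                                vertices = bind_rows(docs_nodes_sb, rotarianzh_nodes))
tg_sb4 <- tidygraph::as_tbl_graph(ig_sb4)

# Activate the nodes and upload the graph to Padagraph under the identifier "SBRotarians"
tg_sb4 %N>% enpchina::in_padagraph("SBRotarians")
```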
The [interactive version](https://pdg.enpchina.eu/rstudio?gid=SBRotarians#+10) of the graph can be explored in Padagraph. You first need to click on "global" or "+10" to display the graph. The graph contains a total of 1,199 nodes (1,126 documents and 73 individuals) and 3,595 edges. Square nodes represent documents, whereas triangular nodes represent individuals. Distinct colors represent clusters, i.e. groups of more densely connected nodes. Click on any node or edge to display its properties.

The interactive panel on the right contains two tabs: the "details" tab displays the attributes of any selected node or edge, whereas the "list" tab lists all the nodes that constitute the network. You can sort them by label (name), type (individual or document), cluster, or degree centrality (number of neighbors). The panel on the left allows users to zoom in and out, change the layout, the label size and the general settings. One can navigate the graph by focusing on a particular node and exploring its ego-network by adding or removing a selected number of its neighbors. Alternatively, one can use the list of linked entities in the right-side panel. The panel also provides access to the full text of each document node. You can use the search engine at the bottom to search for individual names. Click on "global" to return to the original network.

This is a powerful tool for examining patterns of co-occurrence in large corpora. Such a visualization makes it possible to explore entities in their broader context (individuals in documents, documents in corpora). It helps researchers identify the most relevant documents for close reading (the nodes with the highest degree), with no *a priori* knowledge of their content. In addition, the colored clusters raise intriguing questions about why certain individuals were more likely to meet than others: in what context, and based on which shared activities and common interests?
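The notions of degree and co-occurrence used above can also be computed directly from an edge list, outside Padagraph. The base-R sketch below, on a hypothetical document-person edge list, builds the incidence matrix of the two-mode network, reads the degree of documents and persons from its margins, and projects it onto persons: entry (i, j) of `t(M) %*% M` counts the documents that mention both i and j. This one-mode projection is a standard network-analysis technique, not a feature of Padagraph or of the enpchina package.

```r
# Hypothetical two-mode edge list: document (from) -> person (to)
edges <- data.frame(from = c("doc1", "doc1", "doc1", "doc2", "doc2", "doc3"),
                    to   = c("Wang", "Kuang", "Xu", "Wang", "Kuang", "Wang"),
                    stringsAsFactors = FALSE)

# Incidence matrix: rows = documents, columns = persons
M <- unclass(table(edges$from, edges$to))
M[M > 1] <- 1   # count a person once per document, even if matched several times

# Degree in the two-mode network
doc_degree    <- rowSums(M)   # number of Rotarians mentioned in each document
person_degree <- colSums(M)   # number of documents mentioning each Rotarian

# One-mode projection onto persons: number of shared documents for each pair
co <- crossprod(M)            # equivalent to t(M) %*% M
diag(co) <- 0                 # ignore each person's co-occurrence with themselves
co
```

In this toy example, Wang and Kuang co-occur in two documents, which would make their tie the strongest in the projected person-to-person network.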
This graph more specifically allows us to examine the Rotarians' co-participation in social events. The largest nodes refer either to the most attended meetings, i.e. the events that involved the highest number of participants, or to the most active individuals, who took part in a large number of meetings. The latter echoes the index of commitment built earlier. For instance, no fewer than seven Rotarians attended the meeting held at the Y.M.C.A. in April 1936 (doc. n°SPSP193604261217). The graph also helps detect possible errors in the OCR or segmentation of the original text, as in doc. n°SPSP194801130501. Finally, it is possible to export the edge and node lists so as to reuse them with any social network analysis software.

```{r message=FALSE, warning=FALSE}
write.csv(edges_sb, "edgelist_sb.csv", fileEncoding = "UTF-8")          # document -> person edges
write.csv(docs_nodes_sb, "docs_nodes_sb.csv", fileEncoding = "UTF-8")   # document nodes
write.csv(regexps_zh, "pers_nodes_zh.csv", fileEncoding = "UTF-8")      # person nodes (name patterns)
```

# References

Cécile Armand, "Foreign Clubs with Chinese Flavor: The Rotary Club of Shanghai and the Politics of Language," in Cécile Armand, Christian Henriot and Huei-min Sung (eds.), *Knowledge, Power, and Networks: Elites in Transition in Modern China* (forthcoming).

Guan Yuting and Chen Yunqian, "Minguoshiqi Zhongguo Fulunshe Fazhan Chutan (The Development of Rotary Clubs in Republican China)," *Jiangxi Shehuikexue*, no. 6 (2009): 156.

Jiang Pei and Geng Keyan, "Minguoshiqi Tianjin Zujie Waiqiao Jingying Shetuan -- Fulunshe Shu Lun (Foreign Elite Organization in Republican Tianjin: Tianjin Foreign Settlements: The Tianjin Rotary Club)," *Lishi Jiaoxue* (Xiaban Yuekan) 12, no. 673 (2013): 7.

Liu Bensen, "Jindai Shanghai Shangye Jingying Yu Fulunshe (The Shanghai Rotary Club and Business Elites in Republican Shanghai)," *Suzhou Keji Xueyuan Xuebao* (Shehuikexue Ban) 29, no. 5 (2012): 64–69.