1 Introduction


The Modern China Biographical Database (MCBD) constitutes a core initiative to establish a long-term publicly accessible resource for historical research in the China field. The database is the major instrument developed by the ENP-China project. ENP-China stands for Elites, Networks, and Power in Modern Urban China. It is an ERC-funded Advanced Research Grant (ERC no. 788476). From the start, however, we designed the database to serve a much broader purpose. Its object is not limited to elites; it means to include all historical actors. Its usage is not limited to the ENP-China project. It is a platform made available to the whole scholarly community of historians of modern China. Our hope is to make it the most essential biographical resource for the study of modern Chinese history.


1.1 Aim and Scope


MCBD was conceived to lay the groundwork for collecting massive amounts of biographical data and information on historical actors in modern Chinese history. The structure of the database reflects this intellectual ambition. This user manual describes the objectives, structure, and operation of the Modern China Biographical Database. MCBD is based on Heurist, a web database interface that provides both a backend and a frontend for relational research data in the humanities. The ENP-China project is very grateful for the support of the Heurist team, and especially to Ian Johnson for his unrelenting assistance in the development phase of MCBD.

MCBD sits at the heart of the project of breaking through the current constraints of historical research. The massive transformation of historical documentation into digital full-text format presents a formidable opportunity for historians. Except for manuscript archives, almost anything else can be turned into a searchable digital document: newspapers, periodicals, books, academic literature, not to mention the millions of pages on the Internet. The challenge is precisely to design the methods and the tools to make profitable use of the vast quantity of information at the historians’ fingertips. The digital transformation of historical sources opens the way to exploring and exploiting documents on a scale unimaginable in the past. For historical research, it means establishing new practices that will enable the production of data-rich history.

MCBD is a general-purpose database that collects a wide range of biographical information. It revolves primarily around individuals, who are at the center of the database, but it also covers institutions (any kind of organization: ministry, club, company, etc.), locations (any named human settlement), and events (any form of individual or collective action). In the database, we make the distinction between historical information and historical data. The latter is the refined and trimmed-down form of the former. In MCBD, most of the information is transformed into data that find their way into specific data fields. Each field contains only a single piece of data. Yet not all historical information lends itself to such reduction. Sometimes it is desirable to retain not just the one word or digit that qualifies an actor or an action, but to preserve this word in its context, in the form of a short sentence. This is also methodologically necessary for the kind of raw information that can be extracted from the press.

To address this particular challenge, MCBD includes both “data” fields that contain just a single item and “observations” that receive strings of unstructured text that document more substantially the action of an individual. This structure is designed to meet the needs of historical research on the modern period when mass printing, especially newspaper printing, became the norm. It is not conceivable to transform all the available or collected information into data. Our “observations” constitute an intermediary stage of data collection that records actions attached to a given actor to produce a more substantial biographical record. Ultimately, whenever necessary, such “observations” can be filtered down and turned into “cooked data.”
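
The distinction between single-item “data” fields and free-text “observations” can be illustrated with a minimal Python sketch. The class names, attribute names, and sample values below are hypothetical and chosen for illustration only; they do not reflect the actual Heurist record types used by MCBD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """Free-text statement about an actor, kept in its original context."""
    text: str       # a sentence (or part of one) taken from a source
    source_id: str  # hypothetical identifier of the source document


@dataclass
class PersonRecord:
    """Hypothetical, heavily simplified person record."""
    full_name: str
    birth_year: Optional[int] = None  # "data" field: exactly one item
    observations: List[Observation] = field(default_factory=list)


person = PersonRecord(full_name="Example Person")
person.observations.append(
    Observation(
        text="Example Person was elected chairman of the club.",
        source_id="SRC-0001",
    )
)

# At a later stage, an observation may be reduced to structured data,
# for instance a position title stored in its own single-item field.
print(len(person.observations))
```

The point of the sketch is only that a record can hold both kinds of content side by side, so that unstructured statements are preserved until they are curated.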

The database covers the period from 1830 to 1949. It aims to include all the individuals born between 1800 and 1930 who were active in China during this period, regardless of their origin, nationality, and the duration of their presence in China. The year 1949 represents the current terminus ad quem of MCBD because our primary purpose is to document the history of modern China before the establishment of the People’s Republic of China. Yet the database documents the selected individuals, as much as possible, from their birth to the time of their death, beyond the 1949 divide. Furthermore, the current temporal coverage is not final. It is linked to the timeline of the ENP-China Project, but the database can and will extend incrementally to later periods through future research projects or contributions by scholars involved in the study of contemporary China.

MCBD is meant to serve as a tool for both qualitative and quantitative analysis. The collection and compilation of rich biographical data creates the conditions for the systematic exploration and processing of data from various perspectives and through different methods. The life of individuals is made up of the successive and finely detailed events that marked their existence, including education, life events, career, social activities, etc. These events are related to other individuals and to other institutions, and they are grounded in various locations. The data can be used for prosopography, which precisely relies on individual biographical data to study different groups in society. But the data lends itself to a wide variety of approaches and methods as well.

Because the database connects the elements of information that two or more individuals have in common (birthplace, degree from the same university, membership in the same association, participation in the same event, etc.), there are virtually unlimited possibilities for exploring the connections between individuals, institutions, locations, and events. The range of methodological approaches, used separately or in combination, includes social network analysis, spatial analysis (GIS), graphs, and all types of visualization. It may even include textual analysis with NLP (natural language processing) tools since, in the case of newspapers, MCBD maintains the connection to the full text of articles whenever each document in the source database carries its own identifiable ID (e.g. ProQuest Historical Newspapers).
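
As an illustration of how shared elements of information can be turned into connections between individuals, here is a minimal Python sketch. It assumes the data has already been exported as simple (person, attribute) pairs; the names, the attribute labels, and the function `shared_attribute_edges` are hypothetical and are not part of MCBD or Heurist.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical export: one (person, attribute) pair per shared element.
affiliations = [
    ("Person A", "university:X"),
    ("Person B", "university:X"),
    ("Person C", "university:X"),
    ("Person A", "club:Y"),
    ("Person C", "club:Y"),
]


def shared_attribute_edges(pairs):
    """Map each pair of persons to the sorted list of attributes they share."""
    members = defaultdict(set)
    for person, attribute in pairs:
        members[attribute].add(person)

    edges = defaultdict(set)
    for attribute, people in members.items():
        # Every two persons holding the same attribute are connected.
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)].add(attribute)
    return {pair: sorted(attrs) for pair, attrs in edges.items()}


edges = shared_attribute_edges(affiliations)
for pair, attrs in sorted(edges.items()):
    print(pair, attrs)
```

The resulting edge list could then be passed to a network analysis or visualization tool of one's choice; the number of shared attributes can serve as a simple edge weight.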

In the initial stage, the source of the data has come mostly from the documents that the ENP-China team has been processing, especially newspapers and biographical dictionaries. The Institute of Modern History (Academia Sinica) generously shared the data from its 近現代人物資訊整合系統 database. The database also collects the data produced in the course of the case studies of the ENP-China project.

The development of the database also relies on an on-going process of curating and mining biographical-data publications from before and after 1949 (biographical dictionaries, directories, etc.), as well as the incorporation of large datasets produced by the members of the team in previous research. We are also incorporating the rich biographical data that we have extracted from Wikipedia and Baidu, and similar resources.

Whenever relevant, we hope to incorporate external datasets or data produced by external research groups or scholars, each with a clear and manifest identification of provenance. We believe in the benefits and virtues of building a collective database that includes the broadest spectrum of biographical data for modern Chinese history. No small dataset — it can be a simple spreadsheet with data on a few individuals — is negligible. We believe that “Small streams make a big river” and we genuinely welcome contributions from the scholarly community. We can provide guidelines for a smooth and seamless preparation of datasets.

MCBD is in its development stage. We plan to enrich it in the coming years until the ENP-China project completes its course. Thereafter, the database will be entrusted to the care of a major academic institution for maintenance and development.

1.2 Sources


Data in MCBD comes from six main categories of sources. Most printed sources have been digitized and processed with OCR so that data can be extracted and processed automatically or semi-automatically, even though there may be variation in the quality of OCR and the results of extraction. Cross-checking documents against one another makes it possible to resolve pending issues of imprecise data (dates) or contradictory information.

1.2.1 Biographical dictionaries


Biographical dictionaries trace the entire life of individuals. Although the quality and quantity of information may vary considerably between dictionaries and between the biographies they contain, they basically provide information on the date and place of birth and death, kinship and family relations, educational background and professional life (positions, works). While some dictionaries were carefully crafted narratives, others were just collections of factual information, chronologically arranged. The construction of MCBD relies on three major reference dictionaries that constitute the backbone of the database: Hummel’s Eminent Chinese of the Qing Period, Boorman’s Biographical Dictionary of Republican China (BDRC) and Klein & Clark’s Biographic Dictionary of Chinese Communism.1 The BDRC in particular has provided a test bench for processing biographical dictionaries.2 These reference books have been supplemented by more specific dictionaries that focused on particular groups of elites, such as Guomindang generals, overseas students or foreign businessmen in China.3


1.2.2 Directories and Who’s who


Directories and Who’s who focused on living individuals only. Basic information included their date and place of birth, their educational background, and the past and current positions they held in various institutions. Since the individuals were still living at the time of publication, such sources did not cover their entire life. Despite their incompleteness, their added value compared to dictionaries lies in the fact that they provide a highly detailed list of positions, often with their exact date, including participation in social clubs and associations. Like biographical dictionaries, some consisted of literary narratives, whereas others were just collections of factual information arranged in chronological order. Each volume amounts to hundreds of pages of dense printed words that contain very rich and complete information on the individuals that lived and worked in China and Asia. MCBD relies primarily on the rich collection of Who’s who held at the Institute of Modern History, Academia Sinica, Taipei.4 This collection contains more than fifty titles in Chinese, English and Japanese, covering about 133,000 characters from the mid-Qing to the early years of the People’s Republic of China (PRC). For foreign elites specifically, the series of Asia Directory and Chronicle documents with precision who was doing what (position, activity), where (location, movement), and when (exact date, fuzzy dating).5

These two major corpora were supplemented by more focused Who’s who including professional minglu (registers) and directories of returned students published by Qinghua University and students’ associations in the United States.6

1.2.3 Newspapers and periodicals


The inclusion of newspaper data is the major innovation of MCBD. The press represented the most complete set of “observations” on everyday life in modern China. In contrast to dictionaries and directories, newspapers and periodicals provided information on a daily basis, allowing for a more finely grained level of observation. They constitute irreplaceable material for the study of the elites in the making. Moreover, they documented a wider range of actors - not only the personal and social life of individuals, but also institutions, locations, and events. They reported on individuals’ social or political activities, association and company meetings, ceremonies, sports competitions, the coming and going of ships and passengers, etc. They documented with high precision the place, time, and nature of actions at the most elementary level. They caught actors in their tracks, from the moment they appeared “in print” to the time when they vanished or the medium itself disappeared.

Newspaper data in MCBD comes from two main corpora. The Shenbao 申報 (1872-1949) forms the backbone of the Chinese-language press. Established in Shanghai, it was one of the largest and most enduring Chinese-language newspapers in modern China, with a daily circulation of 150,000 copies in the early 1930s. Although based in Shanghai, its readership was truly national in scope. The last issue appeared on May 27, 1949. For the English-language press, MCBD relies on ProQuest’s Chinese Newspapers Collection (CNC). This collection consists of twelve English-language periodicals published in China from 1832 to 1953, with variations in their temporal and spatial coverage.7 While most of them were published in Shanghai, the collection also includes one newspaper from Guangzhou - Canton Times - and three from Beijing - Peking Daily News, Peking Gazette, Peking Leader. The three most important were the British North-China Herald and Daily News (1850-1941), the American China Weekly Review (1917-1953) - formerly Millard’s Review of the Far East - and the Sino-American China Press (Dalubao 大陸報) (1925-1938). Although they were based in Shanghai, they reached a national readership as well. The North-China Herald enjoyed a circulation of 10,000 in the early 1930s, with an increasing number of Chinese readers.8 The corpus of English-language periodicals also comprises the South China Morning Post published in Hong Kong (1903-2001).9

1.2.4 Wikipedia


Another major contribution of MCBD is the incorporation of data from the web, especially the Chinese Wikipedia. The web represents a precious source for reconstructing the full trajectory of individuals, especially those who survived after 1949, who did not appear in the Republican Who’s who and in more recent biographical dictionaries. Some 200,000 biographies in Chinese, English and Japanese have been identified. The biographical information they contain has been downloaded, along with its metadata (date, etc.), which makes it possible to trace the history of each entry and its modifications over time. The data extracted from Wikipedia provides a unique resource by the massive character of its population, its multilingual perspective and its unprecedented temporal and geographical scope.10

1.2.5 Miscellaneous printed sources


Memoirs and personal diaries: Memoirs and personal diaries offer a more subjective approach to biographical events, as they were experienced or remembered by the individuals themselves or their relatives. Moreover, these sources often contain details that editors of dictionaries and Who’s who may have missed or discarded. An increasing number of diaries have been published, digitized and processed with OCR since the 1980s, opening new perspectives on the personal life of the Republican elites and their social environment. The selection of diaries has been guided by ERC members’ research concerns so far (e.g. Zhang Gang’s diary used by Guo Weiting, Zhou Fohai’s diary studied by David Serfass, or American returned students’ autobiographies studied by Cécile Armand).11 Others will be added to the collection as historian colleagues join the project with their own materials.

Professional journals: The journals published by professional, academic or students’ associations (e.g. Science Society of China, Society of Chinese Engineers), learned societies (e.g. Royal Asiatic Society), and technical bureaus (e.g. Bureau of Economic Information/Foreign Trade) provide a wealth of information on their members, staff and activities (list of members and contributors, contributions, reports of meetings, related institutions). In addition, specialized journals like the Chinese Economic Journal/Bulletin (Zhongguo jingji yuekan 中國經濟月刊) contain lists of companies and related corporate data.12

Other printed sources: This category includes miscellaneous directories, such as the lists of doctoral dissertations by returned students, and other materials often discovered by accident.13

1.2.6 Archival data


In addition, the database welcomes data from non-digitized, unpublished sources (especially archives) that have been processed manually by individual researchers. Archival materials fall under two main categories: institutional archives and personal papers.

Institutional archives include primarily the National Archives of the Republic of China (Historical Archives No.2 in Nanjing, Guoshi guan in Taipei), which among other materials have preserved lists of government officials and staff of technical bureaus. The archives of clubs and associations (e.g. Rotary International) provide additional information on the social life of individuals. Their collections include rosters of members, official publications, correspondence and unpublished reports. In addition, the lists of industrial companies found at the Shanghai Municipal Archives provide an excellent starting point for the study of modern entrepreneurs in China. Although they are rare, scattered and generally difficult to locate, personal papers constitute irreplaceable sources as they contain first-hand materials such as individuals’ correspondence and unpublished diaries. As for the leading personalities of the Republican period specifically, their personal papers have been preserved in archival centers and libraries such as the Hoover Institution and Columbia University, along with transcriptions of interviews.14

1.3 Structure


MCBD is composed of eleven tables, with the Person table as the connecting hub for all tables except “Company statistics,” which connects only to the “Company” table. The structure and content of each table are described in the text below.

Fig. 1. MCBD structure schematic


2 Tables


In the following section, we present the eleven tables that form the main data container of MCBD. We describe the purpose and organization of each table and provide a description of all their fields.

2.1 Person


The Person table constitutes the core element that connects all the other tables in the database. It is composed of nine sections:

  • Naming
  • Additional information
  • Birth
  • Death
  • Professional and social life
  • Personal life
  • Events
  • Metadata
  • Visual data

2.1.1 Naming and Additional Information


The “Naming” section provides the basic information for the identification of individuals, namely the canonical name (Surname + Given Name) and the different variations of the names under which the individual was known in the course of his/her life and in the various sources that document that life. Chinese individuals may have several Chinese names, usually alternative given names, courtesy names (字), and style names (號), not to mention pen names, stage names, etc. They could also be known through their name in transliteration that, before the advent of pinyin, could take many forms.

The “Additional Information” section provides supplementary elements about the quality of the person such as nationality, ethnicity, ancestral native place, language skills, and titles.


Fig. 2. The Naming and Additional Information sections


2.1.2 Fields for Naming


Field | Description
ID_db_source | This is an identifier created for the import
Fullname | Full name in vernacular
Surname | Family name of person
Surname - vernacular | Surname in vernacular language
Given name(s) | The given name or names of the person, placed in their normal order
Given name(s) - vernacular | Given name in vernacular language
Alternative name(s) | Individual’s alternative name in different sources
Transliteration Name | Transliteration of non-Chinese name (foreigner) in Chinese
Zi 字 | Courtesy name (字)
Hao 號 | Style name (號)
Gender | Male/Female


2.1.3 Fields for Additional Information


Field | Description
Jiguan 籍貫 | Ancestral native province (not where someone was born, but where his/her ancestors were born)
Ethnicity | Ethnic group (Mongol, Miao, Kejia, Han, etc.)
Language skill | Any language skill of the person (non-Chinese languages, Chinese dialects)
Nationality | Nationality (official national citizenship)
Honorific | Any title or grade - Prof, Dr, Sir, M.C.G.D., etc., but we recommend omitting Mr., Mrs. or Ms.
Representative picture | Thumbnail image used to represent the person (e.g. portrait photo), up to 400 pixels wide, for display normally 100 - 200 pixels wide in search results, lists and reports
Short description | Short description of the person for use in annotated lists/web pages (100 - 200 words)
Related Person(s) | People related to the person by birth, marriage or other relationships


2.1.4 Birth and Death


The “Birth” and “Death” sections indicate the dates and locations of birth and death. The location can be recorded as a textual reference in the following format: “Beijing – 北京”. Preferably, however, it should be recorded as a geolocation linked to the Modern China Geospatial Database (MCGD). The linkage between MCGD and MCBD is done through a system of unique IDs. To retrieve the location IDs (LID), we provide a dedicated interface with the possibility to upload a list of place names and obtain all the related data.


Fig. 3. The Birth and Death sections


2.1.5 Fields for Birth and Death

Birth
Field | Description
Date of birth | Date of birth (year or year-month or year-month-day)
Place of birth | Place of birth expressed as a location (place) record
Birth place (textual) | Name of the location where the person was born, in free text (rule: “New York,” “Beijing - 北京,” “Jiangjin xian, Zhejiang - 浙江鄞縣”)
Province of birth | Name of the province of birth as indicated in the source, in the following format: pinyin - Chinese (e.g., Hunan - 湖南)
Country of birth | The list is prepopulated, but can be extended, and includes two- and three-letter country codes
Note on Date of birth | To be used if an alternative date of birth or further precision is needed

Death
Field | Description
Date of death | Date of death (year or year-month or year-month-day)
Death Place | Select the location in the Geospatial database
Death place (textual) | Name of the location where the person died, in free text (rule: “New York,” “Beijing - 北京,” “Jiangjin xian, Zhejiang - 浙江鄞縣”)
Country of death | The country in which the person died
Cause of death | The cause of death, if known/applicable
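
Since dates of birth and death may be given as a year, a year-month, or a year-month-day, a dataset can be checked before import with a small validator. The following Python sketch assumes that dates are written as ISO-like strings (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`); this notation and the function name `parse_partial_date` are assumptions made for illustration, not a documented MCBD import rule.

```python
import re
from datetime import date

# Year, optionally followed by a month, optionally followed by a day.
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(value):
    """Return (year, month, day), with None for the parts that are absent.

    Raises ValueError if the string is not YYYY, YYYY-MM or YYYY-MM-DD,
    or if the month or day is out of range.
    """
    match = _PARTIAL_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised date format: {value!r}")
    year, month, day = (int(p) if p else None for p in match.groups())
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value!r}")
    if day is not None:
        date(year, month, day)  # raises ValueError for impossible days
    return year, month, day


print(parse_partial_date("1893"))
print(parse_partial_date("1893-12"))
print(parse_partial_date("1893-12-26"))
```

Keeping the missing parts as `None` rather than padding them with a default month or day avoids suggesting a precision that the source does not have.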


2.2 Professional and social life


The “Professional and social life” section links to three distinct tables described further below. The three tables cover three types of events: education (institution, degree, discipline, date(s)), positions (position, institution, date(s)), and life events (any event to which the person is related) as recorded in the Events table.


Fig. 4. The Professional and social life sections


2.2.1 Fields for Professional and social life


Field | Description
Education events | Information about the events that document the education of the person (school, degree, discipline)
Position(s) | Information about the positions held during the social-professional-political life of the person
Life events | Information about the events in the life of the person (social, political, cultural events, etc.)


2.2.2 Child records: Education events


The following table records each episode of education in an individual’s life. The episode is described by ten variables: the name of the educational institution, the start date and end date of the studies, the year of graduation, the title of the degree received, the discipline, the title of the thesis, the name of the PhD adviser(s), an additional comment, and the sources of all the information. To create such a record, one should provide at least the name of the educational institution and the person’s name.
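
The minimal requirement stated above (a person’s name and an institution) can be expressed as a simple check on a record. The Python sketch below is only illustrative: the class `EducationEvent` and its attribute names are hypothetical and loosely mirror the fields of the table, not the actual Heurist schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EducationEvent:
    """Hypothetical in-memory version of an education event record."""
    person: str                      # required
    institution: str                 # required
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    graduation_year: Optional[int] = None
    degree: Optional[str] = None
    discipline: Optional[str] = None
    thesis: Optional[str] = None
    phd_advisers: Optional[str] = None
    comment: Optional[str] = None

    def __post_init__(self):
        # Reject records that lack either of the two mandatory values.
        missing = [
            name for name in ("person", "institution")
            if not getattr(self, name).strip()
        ]
        if missing:
            raise ValueError("missing required field(s): " + ", ".join(missing))


event = EducationEvent(person="Example Person", institution="Example University")
print(event.institution)
```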


Fig. 5. The Education Events sections


2.2.3 Fields for Education Events


Field name | Description
Institution | Select or create the name of the institution
Start date | Date or year when the person began his/her education in this institution
End date | Date or year when the person completed his/her education in this institution
Graduation year | Graduation year
Degree | Name of degree
Discipline | Field of study
Thesis | Title of the thesis
PhD Adviser(s) | Full name of the adviser(s)
Comment | Free comment text field


2.2.4 Child records: Positions


The “Positions” record provides information about all the positions – paid or unpaid – that the person held in any institution (government, company, university, association, etc.) during the course of his/her life. Institutions are divided into two types: Company (any sort of private or public business venture) and Institution (any form of collective entity: school, club, ministry, etc.).


Fig. 6. The Positions section


2.2.5 Fields for Positions


Field | Description
Institution/Company | Select or create the name of the institution
Sublevel 1 | Institution main sublevel
Sublevel 2 | Institution lower sublevel
Position: source | The name of the position, as given in the original source and in the language of the source
Position category | The category of the position occupied, as defined by historians (first level). Select a category in the drop-down menu or add a new category
Position category 2 | The category of the position occupied, as defined by historians (lower sublevel). Select a category in the drop-down menu or add a new category
Start date | Date when he/she took his/her position
End date | Date when he/she ended his/her position
Position start mode | Way of taking his/her position
Position end mode | Way of ending his/her position

Metadata
Field | Description
Sources for positions | Sources of data for position information. There are hidden fields in this form. Modify structure to enable them.


2.2.6 Child records: Life Events


The “Life Events” record captures all major episodes in the course of an individual’s life. Each event is described under three major categories, each with several sub-fields: Primary Information (person’s basic information, type of life event, and places), Connections (description of life event and the persons/organizations involved), and Dating (start and end date, end place, and connecting line pixels).


Fig. 7. The Life Events section


2.2.7 Fields for Life Events


Field | Description
Person | The person whose life event is represented (exit form and save the person first if the person does not show in the list)
Type of life event | The type of life event
Place(s) | Place(s) where the life event took place
Description of life event | Short summary, typically used in annotated listings, information popups and so forth. Aim for 100-200 words.
Other persons involved | Other people involved in an organisation or event
Related organisations | Organisations related to the event, e.g. through the individual being a member, owner, etc.
Date of event | Enter a date either as a simple calendar date or through the temporal object popup (for complex/uncertain dates)
Range - Start date | Start date for a range of dates – leave the date field above blank
Range - End date | End date – leave the Date field blank if using a range
End place (if different from start) | The place where the event ended if different from the start (essentially for migrations, voyages and other types of travel)
Connecting line pixels | The thickness of connecting lines drawn between places listed in separate fields of a record (e.g. between birth place, residential addresses and death place). 0 or missing = no line
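
The rule that the single date field should be left blank when a date range is used can be verified before data entry or import. The following Python sketch shows one possible check. It assumes that dates are ISO strings (`YYYY-MM-DD`), which compare correctly as plain text; the function name `check_event_dating` and the extra test on the order of the range are illustrative additions, not documented MCBD rules.

```python
def check_event_dating(single_date=None, range_start=None, range_end=None):
    """Return a list of problems; an empty list means the dating is consistent."""
    problems = []
    uses_range = range_start is not None or range_end is not None

    # Rule from the field descriptions: a single date and a range exclude each other.
    if single_date is not None and uses_range:
        problems.append("single date and date range are both filled in")

    # Additional sanity check (an assumption): a range must not run backwards.
    if (range_start is not None and range_end is not None
            and range_start > range_end):
        problems.append("range start is later than range end")

    return problems


print(check_event_dating(single_date="1927-04-12"))
print(check_event_dating(single_date="1927-04-12", range_start="1927-04-01"))
```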


2.3 Personal life


The “Personal life” section documents two major elements: religious affiliation(s) and kinship (marriage, spouse(s), and other relatives).


Fig. 8. The Personal Life section


2.3.1 Fields for Personal Life


Field | Description
Religious affiliation(s) | Religious creed of the person (Christian, Catholic, Buddhist, etc.) (click to add a religious creed)
Marriage(s) | Information on any form of formal marital relationship (click to connect to the “Marriage” table)
Other spouse | Other spouse, concubines
Relative(s) | Relative’s name


2.3.2 Child records : Religious affiliation


The rapid transformation of religious groups (Buddhist, Daoist, and Christian associations, as well as redemptive societies) and the evolution of the concept of superstition in Republican and Communist China not only shaped the daily life of modern Chinese elites, but also profoundly influenced their engagement with value systems of identity, culture, and politics.


Fig. 9. The Religious affiliation section


2.3.3 Fields for Religious affiliation


Field name | Description
Religion | Religious creed (e.g. Buddhism, Daoism, Christianity, redemptive societies)
Rite/ Ritual performed | The type of rite performed (temple festivals, spirit-writing, sacraments, philanthropic activities, publication, build schools with temple property movement)
Rite/ Ritual: date | Date when the rite was received (1927, 1932, 1933)
Rite/ Ritual: location | Location where the rite was received (Beijing, Tianjin, Shanghai, Guangzhou, Hongkong)

Metadata
Source | Original source (source_id., keyword, abbreviation)


2.3.4 Child records : Marriage(s)


The Marriage(s) section records the formal marital relations between individuals. Marriage is understood here as a formal bond between men and women, regardless of the various forms that such a bond can take (“first wife,” concubine, etc.). This does not include liaisons, affairs and all other forms of romantic relationships.


Fig. 10. The Marriage(s) section


2.3.5 Fields for Marriage(s)


Field | Description
Marital status | Status: married, single
Spouse | Main spouse
Wedding: location | Location where the marriage took place
Start date | Marriage date
End date | Date when the marriage ended
End mode | Mode: divorce, separation, death, etc.
Marriage type | Type of marriage: religious (Catholic, Protestant) or not (secular)
Wedding | General event associated with this marriage

Metadata
Sources | Bibliographical references


2.3.6 Relationship: Other spouse(s)


This is a pointer field from which to select a person in the Person table. The person to be selected must already have been created in the Person table.


Fig. 11. The Other spouse(s) section


2.3.7 Relationship: Relative(s)


This is a pointer field from which to select a person in the Person table. The person to be selected must already have been created in the Person table.


Fig. 12. The Relative(s) section


2.4 Events


The “Events” section records all the events that happened in the life of the individual, with basic information such as name of the event, dates, location, and participants.


Fig. 13. The Events section


2.4.1 Fields for Events


Event Identification
Field name | Description
Label | Name of event (as given in the original source)
Type | Type of event, based on controlled vocabulary


2.4.2 Event Identification and Datation


The “Events” table records all sorts of social or personal events (meetings, weddings, funerals, etc.) in which individuals or organizations were involved.


Fig. 14a. The Event information section


2.4.3 Fields for Event Identification and Datation


Event Identification
Field name | Description
Label | Name of event (as given in the original source)
Type | Type of event, based on controlled vocabulary

Event Datation
Field name | Description | Additional information
Start Date | Date when the event started (YYYY-MM-DD) | Select a date in the calendar or enter the date manually (YYYY-MM-DD)
End Date | Date when the event ended (YYYY-MM-DD) | Select a date in the calendar or enter the date manually (YYYY-MM-DD)
Start Time | Time when the event started (hour, minute) |
End Time | Time when the event ended (hour, minute) |
Temporal marker | Moment during the day (morning, noon/lunch, afternoon, evening, night…) |


2.4.4 Event Location and Actors


Fig. 14b. The Event information section


In the Location section, the type of road must be selected from the drop-down menu (street, avenue, boulevard, square, 路, 廣場, etc.). The Street Name field should contain only the proper name of the road (Adam Smith, Confucius, Château, 南京, 吉祥, 共和, etc.), without any suffix. For buildings, enter the name as given in the source.
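The separation between road type and proper name can be illustrated with a minimal Python sketch. It is not part of MCBD or Heurist: the `ROAD_TYPES` list and the `split_street` helper are hypothetical, the real controlled vocabulary is the one offered in the menu, and the sketch only handles road types written after the proper name.

```python
from typing import Optional, Tuple

# Hypothetical sample of road types; the actual controlled vocabulary
# is defined in the database menu and may differ.
ROAD_TYPES = ["Boulevard", "Avenue", "Street", "Square", "Road", "廣場", "路"]


def split_street(raw: str) -> Tuple[str, Optional[str]]:
    """Split a street string from a source into (proper name, road type).

    Returns (raw, None) when no known road type is found at the end.
    """
    text = raw.strip()
    # Try longer road types first so that "廣場" wins over a shorter suffix.
    for road_type in sorted(ROAD_TYPES, key=len, reverse=True):
        if road_type.isascii():
            # Latin-script road types follow the name after a space.
            if text.lower().endswith(" " + road_type.lower()):
                return text[: -len(road_type)].strip(), road_type
        elif text.endswith(road_type):
            # Chinese road types are attached directly to the name.
            return text[: -len(road_type)], road_type
    return text, None


print(split_street("南京路"))
print(split_street("Adam Smith Avenue"))
```

Strings in which the road type precedes the name (as in many French street names) would need an additional rule and are left out of this sketch.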


2.4.5 Fields for Event Location and Actors


Event Location
Field name Description Additional information
Location information Location (place -> building name -> street address -> city -> province/state -> country) Select a place in the geospatial database
Street Name Main street name
Street Number Main street number
Substreet Name Secondary lane, such as lilong, hutong, etc.
Substreet Number Number in the substreet
Building name Name of the building or open space (park, cemetery, golf…)
Event Actors
Field name Description Additional information
Actors participating in this event Select a person or organization in the database
Role Actor’s role during the event, his/her position in connection with the event, based on controlled vocabulary Add a relationship, select the role in the list
Event Metadata
Field name Description Additional information
Source Original source for the event Select the id of the source
Related event Other events related to this event Select the event in the database
Observation Observations about an event Select a record in the “Observation” table


2.4.6 Observation(s)


The Observations field is an essential component of MCBD. Whereas all other fields contain a single and unique item of data (date, name, etc.), the Observations field collects whole sentences (or part of a sentence). The reason for this field is to enable the collection of information extracted from sources from two perspectives: first, it provides the elements of data in context (a whole sentence); second, it collects information that may be curated at a later stage, either as split-up data that goes in a specific field or as a curated and validated informative sentence. This is designed to address two distinct issues:

First, as “information in context,” we drew our inspiration from Jean-Pierre Dedieu’s Fichoz database system that records “actions,” namely any event associated with an individual in the form of text. Historical information does not always lend itself to being boiled down to tabular data. Historians need to record such information in a form that stays as close as possible to the source. It can be seen as a form of note-taking, but it actually sits in between note-taking and tabular data. This is information already transformed and streamlined by the historian. In MCBD, such streamlined and curated information is moved from the Observations_Raw field to the Observations_Validated field.

Second, thanks to the use of digital tools, especially Natural Language Processing algorithms, it is possible to retrieve a lot of information on any individual in a source. This produces a volume of information that is beyond the ability of anyone to process systematically. Such processing into data or validated information will make sense only in relation to research on a case study or to extract specific data. The collection of such “raw information,” however, presents us with two major benefits: first, all the information that relates to an individual is gathered in one place, it becomes associated with this individual, and it expands the biographical information on the individual; second, the information collected in the Observations Raw field constitutes a subset of information that can easily be extracted and re-processed for data extraction.


Fig. 15. The Observations section


2.4.7 Fields for Observation(s)


Field Description
Observation: raw Additional action/event information (non atomized data)
Observation: validated Qualitative curated content
Link to source Direct link to text, preferably to line in paragraph in original source (e.g. SolR, Freizo)
Source URL URL of source document
Date Date of observation
Related Data Source Link to other databases (e.g. 近現代人物資訊整合系統)
Crowdsourcing
Field Description
Crowdsourcing comments Information provided through crowdsourcing on a given observation
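The relation between raw and validated observations can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Observation` class, its attribute names, and the `validate` helper are hypothetical and do not reproduce the Heurist record structure. The point it shows is that curation adds a validated text while the raw sentence from the source stays untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical in-memory view of one Observation record."""

    raw: str                          # sentence (or part of one) as collected
    validated: Optional[str] = None   # curated text, filled in at a later stage
    source_url: Optional[str] = None  # URL of the source document
    date: Optional[str] = None        # date of the observation

    @property
    def is_validated(self) -> bool:
        return self.validated is not None


def validate(obs: Observation, curated_text: str) -> Observation:
    """Return a copy carrying the curated text; the raw text is preserved."""
    cleaned = curated_text.strip()
    if not cleaned:
        raise ValueError("curated text must not be empty")
    return replace(obs, validated=cleaned)
```

Keeping both texts side by side means that raw observations can still be re-processed later, for instance for data extraction, even after a curated version exists.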


2.4.8 Metadata and Visual data


The “Metadata” section serves to record strictly the documentary source(s) from which the information was drawn. The metadata field is the only field to be found in all the tables. All the other fields are unique to each table and do not overlap.

If the data originate from a third party (person, group, institution, project, resource, etc.), this is recorded in the Provenance field, along with the URL of the third party’s web page whenever relevant. This field records only the Provenance, not the Source.

The Source(s) and Provenance fields are repeated fields. They do not point to a single resource, but to all the resources that contributed to the information in the Person table.

The “Visual data” section provides the possibility to associate and document any element of visual data (any type of fixed or moving image).


Fig. 16. The Metadata and Visual data sections


2.4.9 Fields for Metadata and Visual data


Metadata
Field Description
Sources for Person Source of the data (full bibliographical reference and page number when relevant)
Source URL URL to source document
Provenance Name of the person, group, resource that provided the data
Provenance URL URL of the person, group, resource that provided the data
Visual data
Field Description
Multimedia Points to a multimedia record, image etc.


2.5 Secondary records


2.5.1 Child records : Institutions – Institutions


The Institution table is a secondary table for Persons. It records information on all types of institutions, except companies, for which a distinct table exists. Although the term “Institution” could include companies, we chose to have two separate tables because the nature and degree of completeness of the information vary greatly. Moreover, very specific data that have no relevance for other institutions (such as capital, workforce, products, etc.) are often associated with companies.


Fig. 17. The Institutions section


2.5.2 Fields for Institutions


Field Description
Name (English) Name of the institution (as given in the source, but in English or transliteration if not available in English)
Name (Vernacular) Name of the institution in Chinese, Japanese, Korean
Name (Transliteration) Name of the institution in standard transliteration applied to the vernacular name
Name (Source) Name of the institution as in the source
Type Basic typology based on controlled vocabulary
Sector 1 Institution Primary sector of activity for institutions (Level 1)
Sector 2 institution Sub-sector of activity for institutions (Level 2)
Sector 3 institution Sub-sector of activity for institutions (Level 3)
Datation
Start date Date (year or year-month or year-month-day) of creation of the institution
End date Date (year or year-month or year-month-day) when the institution ceased to exist
End mode Reason for the disappearance of the institution
Institution Location
Main location Location of head office (city)
Branch location Location of branch offices
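Since the Start date and End date fields accept a year, a year-month, or a full year-month-day, a value can be checked before entry with a short Python sketch such as the one below. It assumes hyphen-separated, zero-padded values (1927, 1927-04, 1927-04-12), in line with the YYYY-MM-DD pattern used for events; the `parse_partial_date` helper is hypothetical and not a Heurist function.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Year, optionally followed by -MM, optionally followed by -DD.
PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_partial_date(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (year, month, day).

    Missing parts are returned as None; impossible dates raise ValueError.
    """
    match = PARTIAL_DATE.match(text.strip())
    if match is None:
        raise ValueError(f"not a year, year-month or year-month-day: {text!r}")
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None
    # Let the standard library reject values such as month 13 or 30 February.
    date(year, month or 1, day or 1)
    return year, month, day
```

Returning the missing parts as `None` rather than defaulting them to 1 keeps the precision of the source visible: a date known only to the year is not silently turned into 1 January.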


2.5.3 Child records : Institutions – Company


The Company table is designed to record data on public or private business companies.


Fig. 18. The Company section


2.5.4 Fields for Company 1


Identification
Field Description
Name (English) Name of the institution (as given in the source, but in English or transliteration if not available in English)
Name (Vernacular) Name of the institution in Chinese, Japanese, Korean
Name (Transliteration) Name of the institution in standard transliteration applied to the vernacular name
Name (Source) Name of the institution as in the source
Related company List of names of twin companies: companies that changed name and/or status (e.g. Tsinghua College, Tsinghua University)
Typology
Type Basic typology based on controlled vocabulary
Sector 1 Primary sector of activity for companies (Level 1)
Sector 2 Sub-sector of activity for companies (Level 2)
Sector 3 Sub-sector of activity for companies (Level 3)
Datation
Start date Date (year or year-month or year-month-day) of creation of the institution
End date Date (year or year-month or year-month-day) when the institution ceased to exist
End mode Reason for the disappearance of the institution


Fig. 19. The Company section


2.5.5 Fields for Company 2


Main location
Main location Location of head office (city) Select location from geodatabase
Branch location Location of branch offices (city) Select location from geodatabase
Actors
Actor(s) People associated with the company, including their roles
Metadata
Observation Observations about a company
Summary Free text from source
Source of data The name of the source where data were found
Source of data (pages) Page(s) or page number in the source.


2.5.6 Child records : Company statistics


This table is devoted to recording operational data about the company, such as workforce, capital, production, etc.


Fig. 20 The Company Statistics section


2.5.7 Fields for Company statistics


Identification
Date of company info Date (year or year-month or year-month-day) the organisation was established
Name of company - vernacular The name of the company
Staff
Workforce Integer
Company info
Capital Integer
Currency (English) Name of currency
Currency (Vernacular) Name of currency in vernacular language
Currency (Source) Name of currency as in source
Inventory and Operations
Inventory Integer
Unit (Inventory) Name of unit of count
Production Integer
Unit (Production) Name of unit of count
Profit/Loss Integer (+ or -)
Turnover Integer
Metadata
Observation Observations about a company
Summary Free text from source
Source of data The name of the source where data were found
Source of data (pages) Page(s) or page number in the source.


3 Queries


MCBD offers multiple ways to search biographical data. The first level is the pre-defined facet search in “Search the database”. Terms can be entered in several search fields at the same time; they are then combined as in a Boolean “AND” search.


The initial list of searchable fields includes:

  • FullName:
    • for Chinese or Japanese names, enter the full name without any separator: 蔣介石; 歐陽山
    • for names in Latin characters, enter the following: Surname, Given name (Smith, John; Crow, Carl)
  • Surname: surname in Latin characters, including pinyin
  • Surname – Vernacular: surname in Chinese or Japanese characters
  • Given name(s): given name in Latin characters, including pinyin
  • Given name(s) – Vernacular: given name in Chinese or Japanese characters
  • Institution Name (English): name of the institution in Latin characters
  • Institution Name (Vernacular): the institution in Chinese or Japanese characters
  • Observations: any term or sequence of terms in any language
  • Source: any term or sequence of terms in any language
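The way several facets combine can be pictured with a minimal Python sketch. It is not the Heurist implementation: the record keys (`surname`, `institution`, etc.), the sample records, and the case-insensitive substring matching are all assumptions made for illustration. What it shows is the “AND” logic: a record is kept only if it satisfies every field that has been filled in.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Illustrative records only; the keys are hypothetical, not MCBD field names.
SAMPLE: List[Record] = [
    {"surname": "Crow", "given_name": "Carl", "institution": "Carl Crow, Inc."},
    {"surname": "Smith", "given_name": "John", "institution": ""},
    {"full_name_vernacular": "蔣介石", "surname_vernacular": "蔣"},
]


def facet_search(records: Iterable[Record], **criteria: str) -> List[Record]:
    """Keep the records that match every non-empty criterion (Boolean AND)."""
    # Empty search fields are ignored, as in a search form left partly blank.
    active = {
        field: term.strip().lower()
        for field, term in criteria.items()
        if term and term.strip()
    }
    return [
        record
        for record in records
        if all(term in record.get(field, "").lower() for field, term in active.items())
    ]


# Both conditions must hold, so only the first sample record is returned.
print(facet_search(SAMPLE, surname="Crow", institution="Carl Crow"))
```

Filling in more fields can therefore only narrow the result list, never widen it.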


Advanced search queries require a basic knowledge of the structure of the database. For an overview of the advanced search functions, please refer to the Heurist “Search filter” page. The MCBD User manual provides a detailed presentation of the structure, tables, and fields of the database.


3.1 Chinese characters

The online database distinguishes between simplified and traditional Chinese characters and only uses the latter. Using simplified characters in the search will not yield correct results.
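A query can be checked for simplified characters before it is submitted. The Python sketch below is purely illustrative: the mapping contains only a handful of hand-picked character pairs and the `find_simplified` helper is hypothetical. A real check would need a complete conversion table, for example from a dedicated conversion tool such as OpenCC.

```python
from typing import List, Tuple

# Tiny hand-picked sample of simplified -> traditional pairs (illustration only).
SIMPLIFIED_TO_TRADITIONAL = {
    "蒋": "蔣",
    "欧": "歐",
    "阳": "陽",
    "张": "張",
    "华": "華",
    "国": "國",
}


def find_simplified(query: str) -> List[Tuple[str, str]]:
    """List the (simplified, traditional) pairs for known simplified characters."""
    return [
        (char, SIMPLIFIED_TO_TRADITIONAL[char])
        for char in query
        if char in SIMPLIFIED_TO_TRADITIONAL
    ]


def to_traditional(query: str) -> str:
    """Replace the known simplified characters; other characters are kept as-is."""
    return "".join(SIMPLIFIED_TO_TRADITIONAL.get(char, char) for char in query)
```

With this sample table, a query typed as 蒋介石 would be flagged and rewritten as 蔣介石, the form expected by the database.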

4 Exports


There is no export of data at this stage. The database is made public so as to make the information contained therein available to the scholarly community. Yet, the database still requires curation before we release datasets. As soon as we reach a point when we can release the data, the datasets will be exported on a regular basis as XML files available on the data repository of ENP-China on Zenodo.

5 References

Armand, Cécile. “X-perts (1): Digging up the Chinese Economic Journal.” Advertising History (blog). Accessed January 17, 2021. https://advertisinghistory.hypotheses.org/3372.

Armand, Cécile, Christian Henriot, and Pierre Magistry. “X-Boorman. The Biographical Dictionary of Republican China in the Digital Age.” In Knowledge, Power, and Networks. Elites in Transition in Modern China, edited by Cécile Armand, Christian Henriot, and Huei-min Sun, forthcoming.

Boorman, Howard L., Richard C. Howard, and Joseph K. H. Cheng. Biographical Dictionary of Republican China. New York, Columbia University Press, 1967.

Bosch, Nora Van den. “On Chinese Wikipedia Biographies.” Elites, Networks and Power in Modern China (blog). Accessed December 28, 2020. https://enepchina.hypotheses.org/3125.

Carl Crow, Inc., and advertising agents firm Shanghai. Newspaper Directory of China (Including Hongkong) with Check List of Newspapers and Periodicals Published in Japan, Chosen, Java, Sumatra, Borneo, Siam, Singapore and Federated Malay States. Shanghai: Carl Crow, Inc., 1931.

Chinese Student Christian Association in North America. Directory of Chinese Students in America. New York, N.Y.: Chinese Student Christian Association in North America., 1935. http://www.columbia.edu/cgi-bin/cul/resolve?clio11381017.

Chinese Students’ Alliance in the United States of America. Who’s Who of the Chinese Students in America. Lederer, Street & Zeus company, 1921.

Cornwell, Peter J, and Madeleine Herren-Oesch. “Asian Directories: Foreign Residents Benchmark Dataset,” August 30, 2019. https://doi.org/10.5281/ZENODO.2580998.

Herren, Madeleine. “Global Information at a Glance: Power, Law, and Commerce through the Lens of Asia Directories,” 2017. https://europa.unibas.ch/de/forschung/forschungsfeld-gesellschaft-und-geschichte/global-information-at-a-glance/.

Huang Guangyu 黃光域. Waiguo zai Hua gongshang qiye cidian 外国在华工商企业辞典 (The universal dictionary of foreign business in Modern China). Chengdu: Sichuan renmin chubanshe 四川人民出版社, 1995. http://catalog.hathitrust.org/api/volumes/oclc/36348713.html.

Hummel, Arthur W and Library of Congress. Eminent Chinese of the Ch’ing Period, 1644-1912. Washington: U.S. Govt. Print. Off., 1943.

Klein, Donald W, and Anne Bolling Clark. Biographic Dictionary of Chinese Communism, 1921-1965 [by] Donald W. Klein [and] Anne B. Clark. Cambridge, Mass: Harvard University Press, 1971.

Meng, Zhi. Chinese American Understanding: A Sixty-Year Search. New York, N.Y.: China Institute in America, 1981.

The Directory & Chronicle for China, Japan, Corea, Indo-China, Straits Settlements, Malay States, Siam, Netherlands India, Borneo, the Philippines, Etc. Hongkong: Hongkong Daily Press Office, 1865.

Tsing Hua college, Peking. Who’s Who of American Returned Students. Peking: Tsing Hua college, 1917.

Yuan, Tong-li. A Guide to Doctoral Dissertations by Chinese Students in America, 1905-1960. Washington: Sino-American Cultural Society, 1961.

Yuan, Tongli. A Guide to Doctoral Dissertations by Chinese Students in Continental Europe, 1907-1962. Taipei: Chinese Culture Quarterly, 1964.

———. Doctoral Dissertations by Chinese Students in Great Britain and Northern Ireland, 1916-1961. Taipei: Chinese Cultural Research Institute, 1963.

Yung Wing. My Life In China And America. New York: Henry Holt And Company, 1909.

Zhang, Gang. Zhang Gang ri ji 张棡日记. 1st ed. 10 vols. Beijing: Zhonghua shu ju, 2019.

Zhou Mian 周棉. Zhongguo liuxuesheng da cidian 中国留学生大辞典. Nanjing: Nanjing daxue chubanshe, 1999.


  1. Howard L. Boorman, Richard C. Howard, and Joseph K. H. Cheng, Biographical Dictionary of Republican China (New York, Columbia University Press, 1967); Arthur W Hummel and Library of Congress, Eminent Chinese of the Ch’ing Period, 1644-1912. (Washington: U.S. Govt. Print. Off., 1943); Donald W. Klein and Anne Bolling Clark, Biographic Dictionary of Chinese Communism, 1921-1965 (Cambridge, Mass: Harvard University Press, 1971).↩︎

  2. Cécile Armand, Christian Henriot, and Pierre Magistry, “X-Boorman. The Biographical Dictionary of Republican China in the Digital Age,” in Knowledge, Power, and Networks. Elites in Transition in Modern China, ed. Cécile Armand, Christian Henriot, and Huei-min Sun, forthcoming.↩︎

  3. Huang Guangyu 黃光域, Waiguo zai Hua gongshang qiye cidian 外国在华工商企业辞典 (The Universal Dictionary of Foreign Business in Modern China) (Chengdu: Sichuan renmin chubanshe 四川人民出版社, 1995). Zhou Mian 周棉, Zhongguo liuxuesheng da cidian 中国留学生大辞典 (Nanjing: Nanjing daxue chubanshe, 1999). ↩︎

  4. Zhongyang Yanjiuyuan Jindaishi yanjiusuo 中央研究院近代史研究所 [Institute of Modern History, Academia Sinica], « Jin xiandai renwu zixun zhenghe xitong 近現代人物資訊整合系統 [The Integrated Information System on Modern and Contemporary Characters (IISMCC)] » http://mhdb.mh.sinica.edu.tw/mhpeople/index.php (Last accessed: January 17, 2021). ↩︎

  5. The Directory & Chronicle for China, Japan, Corea, Indo-China, Straits Settlements, Malay States, Siam, Netherlands India, Borneo, the Philippines, Etc. (Hongkong: Hongkong Daily Press Office, 1865). The data contained in the directories was extracted during a pilot project conducted at the Basel Institute for European Global Studies, Westminster University (Data Futures) and the Institute of Asian Studies (IrAsia) in 2017. This project has established a successful proof of concept through a process of intelligent digitization to automatically extract data on hundreds of thousands of actors, while keeping full traceability and connection to the original source, down to the relevant fragment of information on page, and export as structured data into a processing platform. Processed through Geographical Information System (GIS), this pilot project has made it possible to reconstitute in its entirety the recorded foreign presence in China and in the whole Asian region over a century. Madeleine Herren, “Global Information at a Glance: Power, Law, and Commerce through the Lens of Asia Directories,” 2017, https://europa.unibas.ch/de/forschung/forschungsfeld-gesellschaft-und-geschichte/global-information-at-a-glance/; Peter J Cornwell and Madeleine Herren-Oesch, “Asian Directories: Foreign Residents Benchmark Dataset,” August 30, 2019.↩︎

  6. Peking Tsing Hua college, Who’s Who of American Returned Students (Peking: Tsing Hua college, 1917); Chinese Students’ Alliance in the United States of America, Who’s Who of the Chinese Students in America (Lederer, Street & Zeus company, 1921); Chinese Student Christian Association in North America, Directory of Chinese Students in America (New York, N.Y.: Chinese Student Christian Association in North America, 1935).↩︎

  7. ProQuest Chinese Newspapers Collection (CNC), https://about.proquest.com/products-services/hnp_cnc.html (Last accessed: January 17, 2021). ↩︎

  8. “A Circulation Census,” North-China Herald, May 24, 1933, 317. ↩︎

  9. https://about.proquest.com/blog/2014/South-China-Morning-Post-added-to-ProQuest-Historical-Newspapers.html (Last accessed: January 17, 2021). ↩︎

  10. Nora Van den Bosch, “On Chinese Wikipedia Biographies,” Elites, Networks and Power in Modern China (blog), https://enepchina.hypotheses.org/3125 (Last accessed: January 17, 2021); Nora Van den Bosch, “Update on the Wikipedia Biographies,” Elites, Networks and Power in Modern China (blog), https://enepchina.hypotheses.org/3533 (Last accessed: April 12, 2021). ↩︎

  11. Zhang Gang, Zhang Gang riji 张棡日记, 1st ed., 10 vols. (Beijing: Zhonghua shu ju, 2019); Zhou Fohai, Zhou Fohai riji quanbian 周佛海日記全編 (Zhou Fohai’s Diary), ed. Cai Dejin 蔡德金 (Beijing: Zhongguo wenlian chubanshe, 2003); Yung Wing (Rong Hong), My Life In China And America (New York: Henry Holt And Company, 1909); Zhi Meng, Chinese-American Understanding: A Sixty-Year Search (New York, N.Y.: China Institute in America, 1981). ↩︎

  12. Cécile Armand, “X-perts (1): Digging up the Chinese Economic Journal,” Advertising History (blog), https://advertisinghistory.hypotheses.org/3372 (Last accessed: January 17, 2021).↩︎

  13. Tong-li Yuan, A Guide to Doctoral Dissertations by Chinese Students in America, 1905-1960 (Washington: Sino-American Cultural Society, 1961); Tongli Yuan, Doctoral Dissertations by Chinese Students in Great Britain and Northern Ireland, 1916-1961 (Taipei: Chinese Cultural Research Institute, 1963); Tongli Yuan, A Guide to Doctoral Dissertations by Chinese Students in Continental Europe, 1907-1962 (Taipei: Chinese Culture Quarterly, 1964).↩︎

  14. Chinese Oral History Project, Columbia University Libraries : https://library.columbia.edu/libraries/eastasian/chinese/oral_history.html ; Hoover Institution, Stanford: https://www.hoover.org/library-archives (Last accessed: January 17, 2021).↩︎