- `search_documents_ex` and `search_concordance_ex`
- `list_search_fields` and `accepts_date_queries`
- `ner_on_df`
- `load_pdf_as_df`
- `get_padagraph_url`
- `list_corpora` now only displays collections that can be queried

search_concordance_ex <- function(q,
                                  corpus="imh",
                                  search_fields=c(),
                                  context_size=30,
                                  start=0,
                                  dates=c())
- `q`: the search query (same as in `search_concordance`).
- `corpus`: name of the corpus to search in (use `list_corpora` to get the possible names).
- `search_fields`: name(s) of the field(s) to search in (use `list_search_fields` to get the available search fields). If no field is given explicitly, all available fields are searched.
- `context_size`: size of the context window (same as in `search_concordance`).
- `start`: index of the first result row to retrieve (0 by default). Only useful when the search query returns more than 100,000 results.
- `dates`: date(s) used to filter the search results; date ranges are also accepted. Only works with corpora that include dates (see `accepts_date_queries`).

List the available search fields in a corpus:
# For the shunpao corpus
enpchina::list_search_fields("shunpao")
## [1] "text" "title"
# For the imh corpus
enpchina::list_search_fields("imh")
## [1] "book" "page" "date" "story" "bookno"
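Because `search_concordance_ex` searches all fields when `search_fields` is left empty, it can help to check explicitly requested field names against `list_search_fields` before a query is sent. The sketch below is a hypothetical helper, not part of enpchina; its `list_fields` argument exists only so the check can be exercised with a stub instead of the live service.

```r
# Hypothetical helper (not part of enpchina): stop early if any requested
# search field is not reported by list_search_fields() for the corpus.
# `list_fields` defaults to enpchina::list_search_fields; R evaluates default
# arguments lazily, so enpchina is only needed when the default is used.
check_search_fields <- function(corpus, search_fields,
                                list_fields = enpchina::list_search_fields) {
  available <- list_fields(corpus)
  # An empty `search_fields` (c() is NULL) yields no unknown fields.
  unknown <- setdiff(search_fields, available)
  if (length(unknown) > 0) {
    stop(sprintf("Unknown search field(s) for corpus '%s': %s",
                 corpus, paste(unknown, collapse = ", ")))
  }
  invisible(TRUE)
}
```

For example, `check_search_fields("imh", c("story", "date"))` passes silently, while asking for `"date"` on the shunpao corpus would stop with an error naming the unknown field.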
Check whether a corpus supports date filters:
# On the shunpao corpus
enpchina::accepts_date_queries("shunpao")
## [1] TRUE
# On the wikibio corpus
enpchina::accepts_date_queries("wikibio-zh")
## [1] FALSE
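Since the `dates` argument only works with corpora that include dates, a wrapper can consult `accepts_date_queries` before searching. This is a minimal sketch: `search_with_dates` and its `accepts_dates` and `search_fun` arguments are hypothetical, and dropping the filter with a warning (rather than stopping) is an arbitrary design choice.

```r
# Hypothetical wrapper (not part of enpchina): ignore the `dates` filter, with a
# warning, when the corpus does not support date queries.
# `accepts_dates` and `search_fun` default to the enpchina functions but can be
# replaced by stubs; extra arguments (e.g. search_fields) are forwarded via `...`.
search_with_dates <- function(q, corpus, dates = c(), ...,
                              accepts_dates = enpchina::accepts_date_queries,
                              search_fun = enpchina::search_concordance_ex) {
  # Only ask the service when a date filter was actually requested.
  if (length(dates) > 0 && !isTRUE(accepts_dates(corpus))) {
    warning(sprintf("Corpus '%s' does not accept date queries; ignoring `dates`.",
                    corpus))
    dates <- c()
  }
  search_fun(q, corpus = corpus, dates = dates, ...)
}
```

Switching the `warning()` to `stop()` would make the wrapper strict instead of lenient.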
Two types of dates can be used:
Examples:
Search for 蔣介石 in the titles of the shunpao collection:
search_concordance_ex("\"蔣介石\"", corpus="shunpao", search_fields="title")
Search for 蔣介石 in the titles OR in the content of the shunpao collection:
search_concordance_ex("\"蔣介石\"", corpus="shunpao", search_fields=c("title", "text"))
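For queries that return more than 100,000 results, the `start` argument allows the results to be retrieved in chunks. The sketch below assumes that each call to `search_concordance_ex` returns a data frame of at most 100,000 rows and that a shorter chunk means the results are exhausted; `fetch_all_concordances` and its `page_size` and `search_fun` arguments are hypothetical.

```r
# Hypothetical helper (not part of enpchina): collect every hit of a query by
# calling the search function repeatedly with an increasing `start` row index.
# Assumption: each call returns a data frame with at most `page_size` rows,
# and a chunk with fewer rows than `page_size` is the last one.
fetch_all_concordances <- function(q, corpus, ..., page_size = 100000,
                                   search_fun = enpchina::search_concordance_ex) {
  pages <- list()
  start <- 0
  repeat {
    page <- search_fun(q, corpus = corpus, start = start, ...)
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < page_size) break
    start <- start + page_size
  }
  # Stack all chunks into a single data frame.
  do.call(rbind, pages)
}
```

Usage, reusing the title search above: `hits <- fetch_all_concordances("\"蔣介石\"", corpus = "shunpao", search_fields = "title")`.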