| Title: | Data and Tools for Analyzing the Pali Canon |
|---|---|
| Description: | Provides access to the complete Pali Canon, or Tipitaka, the canonical scripture for Theravadin Buddhists worldwide. Based on the Chattha Sangayana Tipitaka version 4 (Vipassana Research Institute, 1990). Includes word frequency data and tools for Pali string sorting. For a lemmatized critical edition with sutta-level granularity, see the companion package 'tipitaka.critical'. |
| Authors: | Dan Zigmond [aut, cre] |
| Maintainer: | Dan Zigmond <[email protected]> |
| License: | CC0 |
| Version: | 1.0.1 |
| Built: | 2026-05-20 09:41:55 UTC |
| Source: | https://github.com/dangerzig/tipitaka |
A subset of tipitaka_names consisting of only the books of
the Abhidhamma Pitaka. These are easier to read if you call
stringi::stri_unescape_unicode() first.
abhidhamma_pitaka
A tibble with the variables:
Abbreviated title
Full title
# Clean up the Unicode characters to make things more readable:
abhidhamma_pitaka$name <- stringi::stri_unescape_unicode(abhidhamma_pitaka$name)
Pali alphabet in order
pali_alphabet
The Pali alphabet in traditional order.
# Returns TRUE because a comes before b in Pali:
match("a", pali_alphabet) < match("b", pali_alphabet)
# Returns FALSE because c comes before b in Pali:
match("b", pali_alphabet) < match("c", pali_alphabet)
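Some Pali letters, such as the aspirates kh and dh, are written with two Roman characters, so a word has to be split into letters before each letter can be looked up with match() as above. The sketch below does this greedily, preferring any two-character entry of the alphabet over a single character. It assumes, as is standard for Pali, that aspirates count as single letters; toy_alphabet is a hypothetical ASCII-only subset standing in for pali_alphabet, and pali_letters() is an illustrative helper, not a function exported by the package.

```r
# Hypothetical ASCII-only subset of the Pali alphabet, in traditional order.
# The package's pali_alphabet also contains letters with diacritics.
toy_alphabet <- c("a", "i", "u", "e", "o",
                  "k", "kh", "g", "gh", "c", "ch", "j", "jh",
                  "t", "th", "d", "dh", "n", "p", "ph", "b", "bh", "m",
                  "y", "r", "l", "v", "s", "h")

# Split a word into Pali letters. Two-character letters listed in the
# alphabet (aspirates such as "kh" or "dh") win over single characters.
pali_letters <- function(word, alphabet = toy_alphabet) {
  chars <- strsplit(tolower(word), "")[[1]]
  out <- character(0)
  i <- 1
  while (i <= length(chars)) {
    pair <- if (i < length(chars)) paste0(chars[i], chars[i + 1]) else ""
    if (pair %in% alphabet) {
      out <- c(out, pair)
      i <- i + 2
    } else {
      out <- c(out, chars[i])
      i <- i + 1
    }
  }
  out
}

# Alphabet positions of each letter, the same idea as match(x, pali_alphabet):
match(pali_letters("Dhamma"), toy_alphabet)
# 17 1 23 23 1
```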
Note that all Pali string comparisons are case-insensitive.
pali_eq(word1, word2)
| Argument | Description |
|---|---|
| word1 | A first Pali word as a string |
| word2 | A second Pali word as a string |
TRUE if word1 and word2 are the same, ignoring case
Note that all Pali string comparisons are case-insensitive. Also, non-Pali characters are placed at the end of the alphabet and are considered equivalent to each other.
pali_gt(word1, word2)
| Argument | Description |
|---|---|
| word1 | A first Pali word as a string |
| word2 | A second Pali word as a string |
TRUE if word1 comes after word2 alphabetically
Note that all Pali string comparisons are case-insensitive. Also non-Pali characters are placed at the end of the alphabet and are considered equivalent to each other. This has been implemented in C++ for speed.
pali_lt(word1, word2)
| Argument | Description |
|---|---|
| word1 | A first Pali word as a string |
| word2 | A second Pali word as a string |
TRUE if word1 comes before word2 alphabetically
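The comparison rules described for pali_lt() (case-insensitive, letters ranked by their position in the alphabet, non-Pali characters ranked last and treated as equal) can be sketched in plain R. This is not the package's C++ implementation: toy_alphabet, letter_ranks() and toy_pali_lt() are hypothetical names, the alphabet omits letters with diacritics, and treating a proper prefix as the smaller word is an assumption.

```r
# Hypothetical ASCII-only subset of the Pali alphabet in traditional order;
# the package's pali_alphabet also includes letters with diacritics.
toy_alphabet <- c("a", "i", "u", "e", "o",
                  "k", "kh", "g", "gh", "c", "ch", "j", "jh",
                  "t", "th", "d", "dh", "n", "p", "ph", "b", "bh", "m",
                  "y", "r", "l", "v", "s", "h")

# Rank each letter of a word; aspirates such as "dh" count as one letter.
# Characters outside the alphabet all share one rank, after every Pali letter.
letter_ranks <- function(word, alphabet = toy_alphabet) {
  w <- tolower(word)
  found <- regmatches(w, gregexpr("[kgcjtdpb]h|.", w, perl = TRUE))[[1]]
  ranks <- match(found, alphabet)
  ranks[is.na(ranks)] <- length(alphabet) + 1L
  ranks
}

# Case-insensitive Pali "less than" for two single words.
toy_pali_lt <- function(word1, word2) {
  r1 <- letter_ranks(word1)
  r2 <- letter_ranks(word2)
  for (i in seq_len(min(length(r1), length(r2)))) {
    if (r1[i] != r2[i]) return(r1[i] < r2[i])
  }
  length(r1) < length(r2)  # a proper prefix sorts first
}

toy_pali_lt("dukkha", "Dhamma")
```

Unlike ordinary English sorting, the call above returns TRUE, because d precedes the single letter dh in the Pali order.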
Note that all Pali string comparisons are case-insensitive. This algorithm is based on Quicksort, but it creates many intermediate data structures instead of swapping elements in place. It has been implemented in C++ because the original R version was about 500x slower.
pali_sort(word_list)
| Argument | Description |
|---|---|
| word_list | A vector of Pali words |
A new vector of Pali words in Pali alphabetical order
# Sort some Pali words into traditional alphabetical order:
pali_sort(c("dhamma", "buddha", "sangha", "nibbana", "sutta"))
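The pali_sort() description (a Quicksort that builds intermediate structures rather than swapping in place) matches the classic functional formulation: pick a pivot, partition the remaining words into two new vectors, sort each recursively, and concatenate. The R sketch below is not the package's C++ code; it assumes a toy ASCII-only alphabet and a hypothetical comparator toy_less(), and the expected output shown holds for this sketch only.

```r
# Hypothetical ASCII-only subset of the Pali alphabet in traditional order.
toy_alphabet <- c("a", "i", "u", "e", "o",
                  "k", "kh", "g", "gh", "c", "ch", "j", "jh",
                  "t", "th", "d", "dh", "n", "p", "ph", "b", "bh", "m",
                  "y", "r", "l", "v", "s", "h")

# Case-insensitive "less than" on letter ranks (aspirates count as one
# letter; anything outside the alphabet ranks last).
toy_less <- function(a, b) {
  key <- function(w) {
    w <- tolower(w)
    r <- match(regmatches(w, gregexpr("[kgcjtdpb]h|.", w, perl = TRUE))[[1]],
               toy_alphabet)
    r[is.na(r)] <- length(toy_alphabet) + 1L
    r
  }
  ka <- key(a)
  kb <- key(b)
  n <- min(length(ka), length(kb))
  diff_at <- which(ka[seq_len(n)] != kb[seq_len(n)])
  if (length(diff_at) > 0) ka[diff_at[1]] < kb[diff_at[1]] else length(ka) < length(kb)
}

# Quicksort that returns new vectors instead of swapping in place:
# partition around the first element, sort both halves, then concatenate.
toy_pali_sort <- function(words, less = toy_less) {
  if (length(words) <= 1) return(words)
  pivot <- words[1]
  rest <- words[-1]
  goes_left <- vapply(rest, function(w) less(w, pivot), logical(1),
                      USE.NAMES = FALSE)
  c(toy_pali_sort(rest[goes_left], less), pivot,
    toy_pali_sort(rest[!goes_left], less))
}

toy_pali_sort(c("dhamma", "buddha", "sangha", "nibbana", "sutta"))
# "dhamma" "nibbana" "buddha" "sangha" "sutta"
```

Always taking the first element as the pivot keeps the sketch short, but it makes the sort quadratic, with deep recursion, on input that is already sorted.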
A list of all declinables and particles from the PTS Pali-English Dictionary.
pali_stop_words
An object of class tbl_df (inherits from tbl, data.frame) with 245 rows and 1 column.
https://dsal.uchicago.edu/dictionaries/pali/
head(pali_stop_words)
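A common use of a stop-word table is to drop those words from a tokenized text before counting frequencies. A minimal base-R sketch follows, with a made-up toy_stop_words table standing in for pali_stop_words and a few indeclinable particles chosen only for illustration. The column name word is an assumption, so check names(pali_stop_words) before adapting it.

```r
# Toy stand-in for pali_stop_words. The real tibble has 245 rows and one
# column; its column name is assumed to be `word` here.
toy_stop_words <- data.frame(word = c("ca", "pi", "va"),
                             stringsAsFactors = FALSE)

# Made-up tokens; keep only those not found in the stop-word list.
tokens <- c("buddho", "ca", "dhammo", "ca", "sangho", "pi")
content <- tokens[!(tolower(tokens) %in% toy_stop_words$word)]
content
# "buddho" "dhammo" "sangho"
```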
A subset of tipitaka_names consisting of only the books of
the Sutta Pitaka. These are easier to read if you call
stringi::stri_unescape_unicode first.
sutta_pitaka
A tibble with the variables:
Abbreviated title
Full title
# Clean up the Unicode characters to make things more readable:
sutta_pitaka$name <- stringi::stri_unescape_unicode(sutta_pitaka$name)
The tipitaka package provides access to the complete Pali Canon, or Tipitaka, from R. The Tipitaka is the canonical scripture for Theravadin Buddhists worldwide. This package includes the VRI (Vipassana Research Institute) Chattha Sangayana edition along with tools for working with Pali text.
tipitaka_raw: the complete text of the Tipitaka (VRI)
tipitaka_names: the names of each book of the Tipitaka
sutta_pitaka: the names of each volume of the Sutta Pitaka
vinaya_pitaka: the names of each volume of the Vinaya Pitaka
abhidhamma_pitaka: the names of each volume of the Abhidhamma Pitaka
pali_alphabet: the complete Pali alphabet in traditional order
pali_stop_words: a set of "stop words" for Pali
These are computed on demand from tipitaka_raw on first access:
tipitaka_long: word frequencies per volume
tipitaka_wide: word frequency matrix (volumes x words)
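One way to get "computed on first access" behaviour in base R is delayedAssign(), which binds a name to a promise that is evaluated once, the first time the name is used, and then reused. This sketch illustrates the general technique only and is an assumption, not a description of how the package implements it; toy_raw, toy_long and the calls counter are hypothetical.

```r
# Hypothetical stand-in for tipitaka_raw: one row per volume.
toy_raw <- data.frame(text = c("a b a", "b c"), book = c("v1", "v2"),
                      stringsAsFactors = FALSE)
calls <- new.env()
calls$n <- 0L

# `toy_long` is bound to a promise: the block runs the first time the name
# is used, and its value is stored for later uses.
delayedAssign("toy_long", local({
  calls$n <- calls$n + 1L
  words <- strsplit(toy_raw$text, " ")
  data.frame(word = unlist(words), book = rep(toy_raw$book, lengths(words)),
             stringsAsFactors = FALSE)
}))

calls$n          # 0: nothing computed yet
nrow(toy_long)   # 5: first access forces the computation
nrow(toy_long)   # 5: reuses the stored value
calls$n          # 1
```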
Functions for working with Pali text:
pali_lt: less-than function for Pali strings
pali_gt: greater-than function for Pali strings
pali_eq: equals function for Pali strings
pali_sort: sorting function for vectors of Pali strings
The companion package tipitaka.critical provides a lemmatized critical edition of the complete Tipitaka based on a five-witness collation with sutta-level granularity.
Maintainer: Dan Zigmond [email protected]
Every word of every volume of the Tipitaka, with one word per
volume per line. Computed from tipitaka_raw on first access.
tipitaka_long
A data frame with the variables:
Pali word
Number of times this word appears in this book
Total number of words in this book
Frequency with which this word appears in this book
Abbreviated book name
Vipassana Research Institute, CST4, April 2020
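The variables listed for tipitaka_long suggest how each row can be derived from tipitaka_raw: count each word within a volume, record the volume's total word count, and divide the two to get a frequency. A base-R sketch under those assumptions, using a made-up toy_raw table; the column names (word, n, total, freq, book) and the rule of splitting on runs of non-letters are illustrative guesses, not the package's documented internals.

```r
# Hypothetical stand-in for tipitaka_raw: volume text plus abbreviated name.
toy_raw <- data.frame(text = c("dhamma dhamma sutta", "buddha sangha"),
                      book = c("v1", "v2"), stringsAsFactors = FALSE)

# One row per word per volume, with count, volume total and frequency.
to_long <- function(raw) {
  per_book <- lapply(seq_len(nrow(raw)), function(i) {
    words <- strsplit(tolower(raw$text[i]), "[^[:alpha:]]+")[[1]]
    words <- words[words != ""]
    counts <- table(words)
    data.frame(word = names(counts),
               n = as.integer(counts),
               total = length(words),
               freq = as.integer(counts) / length(words),
               book = raw$book[i],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_book)
}

toy_long <- to_long(toy_raw)
toy_long
```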
Names of each book of the Tipitaka, both abbreviated and
in full. These are easier to read if you call stringi::stri_unescape_unicode() first.
tipitaka_names
A tibble with the variables:
Abbreviated title
Full title
# Clean up the Unicode characters to make things more readable:
tipitaka_names$name <- stringi::stri_unescape_unicode(tipitaka_names$name)
The unprocessed text of the Tipitaka, with one row per volume.
tipitaka_raw
A tibble with the variables:
Text of each Tipitaka volume
Abbreviated book name of each volume
Vipassana Research Institute, CST4, April 2020
Every word of every volume of the Tipitaka, with one word per
column and one book per line. Each cell is the frequency at
which that word appears in that book. Computed from
tipitaka_raw on first access.
tipitaka_wide
An object of class data.frame with 46 rows and 140433 columns.
Vipassana Research Institute, CST4, April 2020
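Given a long table of per-volume frequencies, the wide layout described for tipitaka_wide is a pivot with volumes as rows and words as columns. One base-R way to sketch it is xtabs(), which fills word and volume combinations that never occur with 0. The toy_long input and its column names are hypothetical, and whether the real table also carries a book-name column is not assumed here.

```r
# Hypothetical long-format input: one row per word per volume.
toy_long <- data.frame(
  word = c("dhamma", "sutta", "buddha", "sangha"),
  freq = c(2/3, 1/3, 1/2, 1/2),
  book = c("v1", "v1", "v2", "v2"),
  stringsAsFactors = FALSE)

# Volumes as rows, words as columns; absent words get frequency 0.
toy_wide <- as.data.frame.matrix(xtabs(freq ~ book + word, data = toy_long))
toy_wide
```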
A subset of tipitaka_names consisting of only the books of
the Vinaya Pitaka. These are easier to read if you call
stringi::stri_unescape_unicode first.
vinaya_pitaka
A tibble with the variables:
Abbreviated title
Full title
# Clean up the Unicode characters to make things more readable:
vinaya_pitaka$name <- stringi::stri_unescape_unicode(vinaya_pitaka$name)