Title: | Data and Tools for Analyzing the Pali Canon |
---|---|
Description: | Provides access to the complete Pali Canon, or Tipitaka, the canonical scripture for Theravadin Buddhists worldwide. Based on the Chattha Sangayana Tipitaka version 4 (Vipassana Research Institute, 1990). |
Authors: | Dan Zigmond [aut, cre] |
Maintainer: | Dan Zigmond <[email protected]> |
License: | CC0 |
Version: | 0.1.2 |
Built: | 2024-10-26 04:10:28 UTC |
Source: | https://github.com/dangerzig/tipitaka |
A subset of tipitaka_names consisting of only the books of the Abhidhamma Pitaka. These are easier to read if you call pali_string_fix() first.
abhidhamma_pitaka
A tibble with the variables:
book |
Abbreviated title |
name |
Full title |
# Clean up the Unicode characters to make things more readable:
abhidhamma_pitaka$name <- stringi::stri_unescape_unicode(abhidhamma_pitaka$name)
# Count all the words in the Abhidhamma Pitaka:
sum(tipitaka_long[tipitaka_long$book %in% abhidhamma_pitaka$book, "n"])
Pali alphabet in order
pali_alphabet
The Pali alphabet in traditional order.
# Returns TRUE because a comes before b in Pali:
match("a", pali_alphabet) < match("b", pali_alphabet)
# Returns FALSE because c comes before b in Pali:
match("b", pali_alphabet) < match("c", pali_alphabet)
Note that all Pali string comparisons are case-insensitive.
pali_eq(word1, word2)
word1 |
A first Pali word as a string |
word2 |
A second Pali word as a string |
TRUE if word1 and word2 are the same
Note that all Pali string comparisons are case-insensitive. Also, non-Pali characters are placed at the end of the alphabet and are considered equivalent to each other.
pali_gt(word1, word2)
word1 |
A first Pali word as a string |
word2 |
A second Pali word as a string |
TRUE if word1 comes after word2 alphabetically
Note that all Pali string comparisons are case-insensitive. Also non-Pali characters are placed at the end of the alphabet and are considered equivalent to each other. This has been implemented in C++ for speed.
pali_lt(word1, word2)
word1 |
A first Pali word as a string |
word2 |
A second Pali word as a string |
TRUE if word1 comes before word2 alphabetically
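The comparison rules described above (case-insensitive, with non-Pali characters sorted last and treated as equal to one another) can be illustrated with a minimal base R sketch. This is not the package's C++ implementation: toy_alphabet, toy_ranks and toy_lt are hypothetical names, and the alphabet is a simplified single-character subset that leaves out the long vowels, digraphs and diacritics found in the real pali_alphabet.

```r
# Hypothetical, simplified alphabet (an assumption for illustration only);
# the letters appear in the same relative order as in the Pali alphabet.
toy_alphabet <- c("a", "i", "u", "e", "o", "k", "g", "c", "j",
                  "t", "d", "n", "p", "b", "m")

# Map each character of a word to its position in the alphabet.
# Characters not in the alphabet all share one rank past the end.
toy_ranks <- function(word, alphabet = toy_alphabet) {
  chars <- strsplit(tolower(word), "")[[1]]
  ranks <- match(chars, alphabet)
  ranks[is.na(ranks)] <- length(alphabet) + 1L
  ranks
}

# Lexicographic less-than on the two rank vectors.
toy_lt <- function(word1, word2, alphabet = toy_alphabet) {
  r1 <- toy_ranks(word1, alphabet)
  r2 <- toy_ranks(word2, alphabet)
  shared <- seq_len(min(length(r1), length(r2)))
  differs <- which(r1[shared] != r2[shared])
  if (length(differs) > 0) {
    first <- differs[1]
    return(r1[first] < r2[first])
  }
  # No difference in the shared part: a proper prefix sorts first.
  length(r1) < length(r2)
}

toy_lt("ca", "ba")  # c precedes b in this ordering
toy_lt("CA", "ba")  # same comparison, upper case is ignored
toy_lt("a?", "a!")  # the two unknown characters share one rank
```

Giving every unknown character the same rank is what makes such characters compare as equivalent while still sorting after every letter of the alphabet.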
Note that all Pali string comparisons are case-insensitive. This algorithm is based on Quicksort, but creates lots of intermediate data structures instead of doing swaps in place. This has been implemented in C++ as the original R version was about 500x slower.
pali_sort(word_list)
word_list |
A vector of Pali words |
A new vector of Pali words in Pali alphabetical order
# Every unique word of the Mahāsatipatthāna Sutta in
# Pali alphabetical order:
pali_sort(sati_sutta_long$word)
# A sorted list of 100 random words from the Tipitaka:
library(dplyr)
pali_sort(sample(tipitaka_long$word, 100))
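The description of pali_sort mentions a Quicksort variant that builds intermediate data structures rather than swapping in place. A minimal base R sketch of that general idea follows. It is an illustration only, not the package's C++ code: quicksort_copy is a hypothetical name, and the default comparison is base R's `<`, standing in for a Pali-aware predicate such as pali_lt.

```r
# Out-of-place Quicksort: every call allocates new vectors for the
# elements below, equal to, and above the pivot instead of swapping.
# `lt` is any less-than predicate on two strings (assumption: base `<`
# by default; a Pali-aware comparison could be passed instead).
quicksort_copy <- function(words, lt = function(a, b) a < b) {
  if (length(words) <= 1) {
    return(words)
  }
  pivot <- words[1]
  rest <- words[-1]
  is_less <- vapply(rest, function(w) lt(w, pivot), logical(1),
                    USE.NAMES = FALSE)
  is_more <- vapply(rest, function(w) lt(pivot, w), logical(1),
                    USE.NAMES = FALSE)
  c(
    quicksort_copy(rest[is_less], lt),
    pivot,
    rest[!is_less & !is_more],  # elements equal to the pivot
    quicksort_copy(rest[is_more], lt)
  )
}

quicksort_copy(c("b", "a", "c", "a"))
```

With the package loaded, a call such as quicksort_copy(words, lt = pali_lt) would presumably give Pali ordering, though far more slowly than pali_sort.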
A list of all declinables and particles from the PTS Pali-English Dictionary.
pali_stop_words
An object of class tbl_df (inherits from tbl, data.frame) with 245 rows and 1 column.
https://dsalsrv04.uchicago.edu/dictionaries/pali/
# Find most common words in the Mahāsatipatthāna Sutta excluding stop words
library(dplyr)
sati_sutta_long %>%
  anti_join(pali_stop_words, by = "word") %>%
  arrange(desc(freq))
The Mahāsatipatthāna Sutta or Discourse on the Establishing of Mindfulness in "long" form.
sati_sutta_long
An object of class data.frame with 832 rows and 4 columns.
Vipassana Research Institute, CST4, April 2020
The unprocessed text of the Mahāsatipatthāna Sutta
sati_sutta_raw
A tibble with the variable:
Complete text
Vipassana Research Institute, CST4, April 2020
A subset of tipitaka_names consisting of only the books of the Sutta Pitaka. These are easier to read if you call stringi::stri_unescape_unicode first.
sutta_pitaka
A tibble with the variables:
book |
Abbreviated title |
name |
Full title |
# Clean up the Unicode characters to make things more readable:
sutta_pitaka$name <- stringi::stri_unescape_unicode(sutta_pitaka$name)
# Count all the words in the Suttas:
sum(unique(tipitaka_long[tipitaka_long$book %in% sutta_pitaka$book, "total"]))
# Count another way:
sum(tipitaka_long[tipitaka_long$book %in% sutta_pitaka$book, "n"])
# Create a data frame of just the Suttas:
sutta_wide <- tipitaka_wide[row.names(tipitaka_wide) %in% sutta_pitaka$book, ]
The package tipitaka provides access to the complete Pali Canon, or Tipitaka, from R. The Tipitaka is the canonical scripture for Theravadin Buddhists worldwide. This version is largely taken from the Chattha Sangāyana Tipitaka version 4.0 compiled by the Vipassana Research Institute, although edits have been made to conform to the numbering used by the Pali Text Society. This package provides both data and tools to facilitate the analysis of these ancient Pali texts.
Several data sets are included:
tipitaka_raw: the complete text of the Tipitaka
tipitaka_long: the complete Tipitaka in "long" form
tipitaka_wide: the complete Tipitaka in "wide" form
tipitaka_names: the names of each book of the Tipitaka
sutta_pitaka: the names of each volume of the Sutta Pitaka
vinaya_pitaka: the names of each volume of the Vinaya Pitaka
abhidhamma_pitaka: the names of each volume of the Abhidhamma Pitaka
sati_sutta_raw: the Mahāsatipatthāna Sutta text
sati_sutta_long: the Mahāsatipatthāna Sutta in "long" form
pali_alphabet: the complete Pali alphabet in traditional order
pali_stop_words: a set of "stop words" for Pali
A few useful functions are provided for working with Pali text:
pali_lt: less-than function for Pali strings
pali_gt: greater-than function for Pali strings
pali_eq: equals function for Pali strings
pali_sort: sorting function for vectors of Pali strings
Every word of every volume of the Tipitaka, with one word per volume per line.
tipitaka_long
A tibble with the variables:
word |
Pali word |
n |
Number of times this word appears in this book |
total |
Total number of words in this book |
freq |
Frequency with which this word appears in this book |
book |
Abbreviated book name |
Vipassana Research Institute, CST4, April 2020
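The relationship between the count, total and frequency columns can be sketched on toy data. The example assumes that the frequency is simply the count divided by the book's total word count, which the documentation does not state explicitly. The data frame toy_counts and the book abbreviations "A" and "B" are hypothetical.

```r
# Toy word counts for two hypothetical books "A" and "B".
toy_counts <- data.frame(
  word = c("dhamma", "sati", "dhamma"),
  book = c("A", "A", "B"),
  n    = c(3L, 1L, 2L)
)

# Total number of words per book, repeated on every row of that book.
toy_counts$total <- ave(toy_counts$n, toy_counts$book, FUN = sum)

# Assumed definition: relative frequency of the word within its book.
toy_counts$freq <- toy_counts$n / toy_counts$total

toy_counts
```

Under this assumption the frequencies within each book sum to 1, which makes books of very different lengths comparable.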
Names of each book of the Tipitaka, both abbreviated and in full. These are easier to read if you call pali_string_fix() first.
tipitaka_names
A tibble with the variables:
book |
Abbreviated title |
name |
Full title |
# Clean up the Unicode characters to make things more readable:
tipitaka_names$name <- stringi::stri_unescape_unicode(tipitaka_names$name)
The unprocessed text of the Tipitaka, with one row per volume.
tipitaka_raw
A tibble with the variables:
Text of each Tipitaka volume
Abbreviated book name of each volume
Vipassana Research Institute, CST4, April 2020
Every word of every volume of the Tipitaka, with one word per column and one book per line. Each cell is the frequency at which that word appears in that book.
tipitaka_wide
An object of class data.frame with 46 rows and 141360 columns.
Vipassana Research Institute, CST4, April 2020
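The "wide" layout (one row per book, one column per word, frequencies in the cells) can be derived from a "long" table. The following base R sketch shows one way to do so on toy data. It is not necessarily how tipitaka_wide was built: toy_long, toy_wide and the book abbreviations "A" and "B" are hypothetical.

```r
# Toy "long" table: one row per word per (hypothetical) book.
toy_long <- data.frame(
  word = c("dhamma", "sati", "dhamma"),
  book = c("A", "A", "B"),
  freq = c(0.5, 0.5, 1.0)
)

# Cross-tabulate: books become rows, words become columns, and a
# word that never occurs in a book gets a frequency of 0.
toy_wide <- as.data.frame.matrix(
  xtabs(freq ~ book + word, data = toy_long)
)

toy_wide

# Rows can then be selected by book, as in the sutta_wide example:
toy_wide[row.names(toy_wide) %in% "A", ]
```

Because every word in the corpus gets its own column, the wide form grows very large and is mostly zeros, which is consistent with the 46 by 141360 shape reported above.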
A subset of tipitaka_names consisting of only the books of the Vinaya Pitaka. These are easier to read if you call stringi::stri_unescape_unicode first.
vinaya_pitaka
A tibble with the variables:
book |
Abbreviated title |
name |
Full title |
# Clean up the Unicode characters to make things more readable:
vinaya_pitaka$name <- stringi::stri_unescape_unicode(vinaya_pitaka$name)
# Count all the words in the Vinaya Pitaka:
sum(tipitaka_long[tipitaka_long$book %in% vinaya_pitaka$book, "n"])