The convention by which names are now assigned to Tipitaka files at Access to Insight was first implemented in June 2006. It emerged from more than two years of discussions between jtb, Michael Olds, Alex Genaud, and Hugo Gayosso. Several others contributed helpful ideas on the now-defunct ATI technical blog, including John Gill, E.M., Donnovan Knight, Ben N, and Robert.
Addendum (2009.06.05): When this article was written (2007), I was mainly concerned with the naming of static HTML files. For the most part, the arguments presented below apply equally well to the naming of permalinks in any online content management system. Thoughtful and consistent naming of permalinks is always a good idea.
The naming and organization of Tipitaka files on websites across the Internet tend to be haphazard. To illustrate, consider the following suttas from the Majjhima Nikaya that turned up in a recent Google search for majjhima sutta:
MN 28 (English) | /canon/sutta/majjhima/mn-028-tb0.html (From Access to Insight in 2005. In 2006 the file was moved to /tipitaka/mn/mn.028.than.html.) |
MN 28 (English) | /e-tipitaka/mn-28.htm |
MN 28 (English) | /028-mahahatthipadopama-sutta-e1.htm |
MN 23 (English) | /dhamma-vinaya/mo/mn/mn023_mo.htm |
MN 28 (Czech) | /sloni-stopa.htm |
MN 28 (German) | /majjhima/m028n.htm |
MN 28 (Italian) | /tipitaka/mn28.html |
MN 28 (Portuguese) | /sutta/MN28.htm |
MN 107 (Russian) | /dhamma/canon/mn107.htm |
MN 28 (Serbian) | http://www.geocities.com/budizam/canon/majjhima/mn28.html |
MN 16 (Swedish) | /buddha/95.htm |
This simple experiment reveals two striking facts:
Filenames are inconsistent.
Some sites name a sutta file using a Pali name (028-mahahatthipadopama-sutta-e1.htm); some use the local language's translation of the Pali title (sloni-stopa.htm); some use a local index number (95.htm); some use uppercase nikaya abbreviations (e.g., MN); some use lowercase; some use hyphens and underscores. And so on.
Directory hierarchies are inconsistent.
Moreover, they are rarely laid out in a way that reflects the structure of the Tipitaka itself. For example: some sites pour all their suttas into one big directory called "tipitaka" (/tipitaka/mn28.html); others place them in a "sutta" directory (/sutta/MN28.htm); others refine this a little further, by placing each nikaya inside a "canon" directory (/budizam/canon/majjhima/mn28.html); while others place the nikayas under a "sutta pitaka" directory (/canon/sutta/majjhima/mn-028-tb0.html).
It is quite natural for a busy webmaster to improvise a filing system that fulfills a site's immediate needs and quickly makes the site's files accessible to its users. This is the strategy that Buddhist sites (Access to Insight included) have usually followed. But, as any experienced webmaster knows, as the number of files in a site's collection grows, the problem of intelligently managing all the hyperlinks between files can escalate at an alarming rate. [1] Any webmaster hoping to host a substantial collection of Tipitaka files must be prepared to provide hyperlinks between many thousands of files. Adding new files to a growing collection is only practical if the site adheres to a consistent and logical file-naming system.
There are other compelling reasons for consistent Tipitaka filenames. As Buddhist students, practitioners, and scholars study the online Tipitaka texts, they naturally exchange Tipitaka files with each other via e-mail or via postings on their websites. If you were to send me a file named 95.htm along with an e-mail that said, "Here's that sutta we talked about two weeks ago," what am I to make of it? I would have to open it and read it before I could know where to file it. If the file instead had a more meaningful name like mn-095-thanissaro.html then I could tell at a glance that this was Thanissaro's translation of MN 95 and I would know instantly where to drop it in my Tipitaka collection. Moreover, if I am trying to locate MN 95 on a website that is poorly indexed and whose filenames are poorly chosen, there is no way to know a priori where to find that sutta. This makes it extremely difficult to install useful hyperlinks between sites, as the webmaster must first decode the target site's opaque filing system.
Large sites and sites that serve as distribution source points for other sites should be concerned about these issues if they hope to continue to provide well organized and richly cross-referenced collections of Tipitaka texts.
Filenames shall be constructed from a subset of the World Wide Web Consortium's "unreserved characters" for URIs. In particular, our character set is this:
a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - (hyphen) . (dot) |
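Please note that we exclude the remaining unreserved characters: uppercase letters (A-Z), the underscore (_), and the tilde (~).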
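The character set above lends itself to a mechanical check. The following is a minimal sketch in Python, not part of the convention itself; the function names `invalid_chars` and `is_valid_filename` are hypothetical and chosen only for illustration.

```python
import string

# Allowed characters, as listed above: a-z, 0-9, hyphen, and dot.
ALLOWED = set(string.ascii_lowercase + string.digits + "-.")


def invalid_chars(filename: str) -> list[str]:
    """Return the distinct characters in `filename` that are outside the allowed set."""
    return sorted({ch for ch in filename if ch not in ALLOWED})


def is_valid_filename(filename: str) -> bool:
    """True if `filename` is non-empty and uses only the allowed characters."""
    return bool(filename) and not invalid_chars(filename)
```

For instance, `is_valid_filename("mn.028.than.html")` passes, while `invalid_chars("MN28_mo.htm")` reports the uppercase letters and the underscore.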
To make filenames machine-readable, they are structured to consist of three, four, or five dot-separated data fields, terminated by an extension (EXT). The possible fields are:
VOLUME:
dn
mn
sn01 sn02 sn03 ... sn55 sn56 (one volume per samyutta)
an01 an02 an03 ... an10 an11 (one volume per nipata)
khp dhp ud iti snp vv pv thag thig nm miln
mv cv (Vinaya)
Files from a given VOLUME are stored in a directory with the same name as the VOLUME. Thus, files from VOLUME dn are stored in the directory named dn; files from VOLUME an06 are stored in the directory named an06; etc.
The following are not yet implemented on ATI, and will be added on an as-needed basis: pk jat nc ps ap bv cp nett petk (Sutta Pitaka); sv (Vinaya Pitaka); dhs vbh kvu pug dhk yam pat (Abhidhamma Pitaka).
The following CHAPTERs are recognized:
VOLUME | CHAPTER |
cv | 01 02 03 ... 11 12 |
mv | 01 02 03 ... 09 10 |
ud | 1 2 3 4 5 6 7 8 |
iti | 1 2 3 4 |
snp | 1 2 3 4 5 |
vv | 1 2 3 4 5 6 7 |
pv | 1 2 3 4 |
thag | 01 02 03 ... 20 21 |
thig | 01 02 03 ... 15 16 |
miln | 1 2 3 4 5 6 7 |
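The TEXT field is present in all files. In most cases it corresponds to the basic unit of text in the Sutta Pitaka: the sutta. It consists of a one-, two-, or three-digit zero-padded subfield, optionally followed by a hyphen and a second number of the same width (marking a range of TEXTs, e.g., 001-010), by an x (marking an excerpt, e.g., 001x), or by both (e.g., 001-010x).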
The possible values in the first subfield of the TEXT field are as follows:
VOLUME | Possible TEXT values |
cv | 01 02 03 ... |
mv | 01 02 03 ... |
dn | 01 02 ... 33 34 |
mn | 001 002 003 ... 151 152 |
sn (all) | 001 002 003 ... |
an (all) | 001 002 003 ... |
khp | 1 2 3 ... 8 9 |
dhp | 01 02 03 ... 25 26 |
ud | 01 02 03 ... |
iti* | 001 002 003 ... 111 112 |
snp | 01 02 03 ... |
thag | 01 02 03 ... |
thig | 01 02 03 ... |
*TEXT numbering runs consecutively across the chapters.
The optional part of the TEXT contains more specific information about the contents of the file, as described above. For example:
Filename | What it contains |
an10.001x.abcd.html | An excerpt from AN 10.1 |
an01.001-010.abcd.html | Suttas AN 1.1-10 (complete) |
an01.001-010x.abcd.html | Suttas AN 1.1-10 (excerpts) |
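The three examples in the table suggest how a TEXT field might be decoded by a script. The sketch below is an illustration only: it assumes the optional part is limited to a hyphenated range and a trailing x, as in the examples, and the names `TextField` and `parse_text_field` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TextField(NamedTuple):
    first: int            # first (or only) TEXT number
    last: Optional[int]   # end of the range, or None for a single TEXT
    excerpt: bool         # True when the field ends in "x"


# One to three digits, an optional "-NNN" range end, an optional trailing "x".
_TEXT_RE = re.compile(r"(\d{1,3})(?:-(\d{1,3}))?(x)?")


def parse_text_field(field: str) -> TextField:
    """Decode a TEXT field such as '001', '001x', '001-010', or '001-010x'."""
    m = _TEXT_RE.fullmatch(field)
    if m is None:
        raise ValueError(f"not a TEXT field: {field!r}")
    first, last, x = m.groups()
    return TextField(int(first), int(last) if last else None, x is not None)
```

Applied to the table above, `parse_text_field("001-010x")` gives a range from 1 to 10 flagged as an excerpt.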
SECTION is a one- or two-digit field that is used to further subdivide certain TEXTs: the Mv and Cv in the Vinaya (following I.B. Horner's numbering), and the longer suttas in DN.
The SOURCE field is present in all files. It contains a four-byte translator code that stands for the name of the translator(s) of the given text. For example: nymo=Ñanamoli Thera; than=Thanissaro Bhikkhu; irel=John D. Ireland. Librarians and webmasters are encouraged to use translator codes from Access to Insight's pool of Reserved Translator Codes. Eventually, the need may emerge for some organizational structure to oversee and coordinate assignment of these codes — especially when managing translations into other (non-English) languages. But this goes far beyond the scope of what I can do.
Additional SOURCE sub-fields may be needed in the future. All such sub-fields shall be appended after the translator code. If specific languages are to be incorporated in the future, language codes should observe the three-byte ISO 639-2 standard.
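Because every filename is a run of three to five dot-separated fields followed by an extension, with VOLUME first and SOURCE last, a script can take one apart without knowing which VOLUME it belongs to. Here is a rough Python sketch; `split_fields` and the `"middle"` key are hypothetical names, and the sketch makes no attempt to check the fields against the tables above.

```python
def split_fields(filename: str) -> dict:
    """Split a filename into its dot-separated data fields and its extension."""
    *fields, ext = filename.split(".")
    if not 3 <= len(fields) <= 5:
        raise ValueError(
            f"expected 3 to 5 data fields, got {len(fields)}: {filename!r}"
        )
    return {
        "volume": fields[0],     # the first field is always VOLUME
        "middle": fields[1:-1],  # CHAPTER, TEXT, SECTION, depending on the VOLUME
        "source": fields[-1],    # the last data field is SOURCE
        "ext": ext,
    }
```

A name such as 95.htm fails immediately, since it carries only one data field.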
tipitaka/
    vin/
        cv/ mv/ sv/
    dn/
    mn/
    sn/
        sn01/ sn02/ sn03/ ... sn55/ sn56/
    an/
        an01/ an02/ an03/ ... an10/ an11/
    kn/
        khp/ dhp/ ud/ iti/ snp/ vv/ pv/ thag/ thig/ jat/ nm/ nc/ ps/ ap/ bv/ cp/ nett/ petk/ miln/
    abhi/
        dhs/ vbh/ kvu/ pug/ dhk/ yam/ pat/
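Given a VOLUME, the directory that holds its files follows from the listing above. The sketch below assumes that the listing nests as it appears to: cv, mv, and sv under vin/; the snNN volumes under sn/; the anNN volumes under an/; the Khuddaka volumes under kn/; and the Abhidhamma volumes under abhi/. The function name `directory_for` is hypothetical.

```python
import re

VINAYA = {"cv", "mv", "sv"}
KHUDDAKA = {
    "khp", "dhp", "ud", "iti", "snp", "vv", "pv", "thag", "thig", "jat",
    "nm", "nc", "ps", "ap", "bv", "cp", "nett", "petk", "miln",
}
ABHIDHAMMA = {"dhs", "vbh", "kvu", "pug", "dhk", "yam", "pat"}


def directory_for(volume: str) -> str:
    """Return the directory (relative to the site root) for a given VOLUME."""
    m = re.fullmatch(r"(sn|an)(\d\d)", volume)
    if m:
        # sn01..sn56 (one per samyutta), an01..an11 (one per nipata)
        limit = 56 if m.group(1) == "sn" else 11
        if not 1 <= int(m.group(2)) <= limit:
            raise ValueError(f"volume number out of range: {volume!r}")
        parent = m.group(1) + "/"
    elif volume in ("dn", "mn"):
        parent = ""
    elif volume in VINAYA:
        parent = "vin/"
    elif volume in KHUDDAKA:
        parent = "kn/"
    elif volume in ABHIDHAMMA:
        parent = "abhi/"
    else:
        raise ValueError(f"unknown VOLUME: {volume!r}")
    return f"tipitaka/{parent}{volume}/"
```

Under these assumptions, `directory_for("sn56")` yields `tipitaka/sn/sn56/` and `directory_for("dhp")` yields `tipitaka/kn/dhp/`.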
Form: | VOLUME.CHAPTER.TEXT.SECTION.SOURCE.EXT | |
VOLUME: | mv | |
CHAPTER: | 01 02 03 ... 09 10 | |
TEXT: | The range of possible TEXTs depends on the CHAPTER:
|
|
SECTION: | The range of possible SECTIONs varies from TEXT to TEXT: 01 02 03 ... . TEXTs for which Horner has not enumerated any SECTIONs are assigned a SECTION of 01. | |
Examples: | mv.08.26.01-08.than.html (Mv 8.26.1-8 {Horner, Part IV, p.431}; the story of the monk with dysentery) |
Form: | VOLUME.CHAPTER.TEXT.SECTION.SOURCE.EXT | |
VOLUME: | cv | |
CHAPTER: | 01 02 03 ... 11 12 | |
TEXT: | The range of possible TEXTs depends on the CHAPTER:
|
|
SECTION: | The range of possible SECTIONs varies from TEXT to TEXT: 01 02 03 ... . TEXTs for which Horner has not enumerated any SECTIONs are assigned a SECTION of 01. | |
Examples: | cv.05.06.01x.olen.html (Cv 5.6, excerpt. Horner does not give any section numbers in Cv 5.6, so we call it 01.) |
Form: | VOLUME.TEXT.SECTION.SOURCE.EXT | |
VOLUME: | dn | |
TEXT: | 01 02 03 ... 33 34 | |
SECTION: | The range of possible SECTIONs depends on the TEXT:
|
|
Examples: | dn.01.2.abcd.html (DN 1, section 2) dn.16.5-6.than.html (DN 16, sections 5-6, complete) dn.16.1-6.vaji.html (DN 16, complete (i.e., sections 1-6)) dn.16.1-3x.abcd.html (DN 16, sections 1-3, excerpts) dn.22.0.than.html (DN 22, complete) |
Form: | VOLUME.TEXT.SOURCE.EXT |
VOLUME: | mn |
TEXT: | 001 002 003 ... 151 152 |
Examples: | mn.001.than.html (MN 1, complete) mn.021x.budd.html (MN 21, excerpt) |
Form: | VOLUME.TEXT.SOURCE.EXT |
VOLUME: | sn01 sn02 sn03 ... sn55 sn56 |
TEXT: | 001 002 003 ... |
Examples: | sn01.001.than.html (SN 1.1) sn36.010.nypo.html (SN 36.10) sn56.011.piya.html (SN 56.11) |
Form: | VOLUME.TEXT[.SECTION].SOURCE.EXT |
VOLUME: | an01 an02 an03 ... an10 an11 |
TEXT: | 001 002 003 ... |
SECTION: | 01 02 03 ...
A few suttas contain enumerated sections, which in other editions of the Tipitaka are treated as separate suttas. For example, AN 3.100 (PTS) contains 15 sections; in the Thai edition the first 10 are counted as one sutta and the last 5 as another. To avoid ambiguity, we therefore number the first as an03.100.01-10 and the second as an03.100.11-15. |
Examples: | an01.021-040.than.html (AN 1.21 through AN 1.40, complete) an01.031-040x.wood.html (AN 1.31 through AN 1.40, excerpts) an03.100.01-10.than.html (AN 3.100, first part; see above re: SECTION) an05.161.nymo.html (AN 5.161) |
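The optional SECTION field in AN filenames can be recognized with a single pattern. The following Python sketch is one possible reading of the form VOLUME.TEXT[.SECTION].SOURCE.EXT, based only on the examples above; the function name `parse_an_filename` is hypothetical, and the pattern does not check that the nipata number falls between 01 and 11.

```python
import re

# VOLUME . TEXT [. SECTION] . SOURCE . EXT
_AN_RE = re.compile(
    r"(?P<volume>an\d\d)"
    r"\.(?P<text>\d{3}(?:-\d{3})?x?)"         # e.g. 100, 021-040, 031-040x
    r"(?:\.(?P<section>\d{2}(?:-\d{2})?))?"   # optional, e.g. 01-10
    r"\.(?P<source>[a-z0-9]{4})"              # four-byte translator code
    r"\.(?P<ext>[a-z0-9]+)"
)


def parse_an_filename(filename: str) -> dict:
    """Return the fields of an AN filename; 'section' is None when absent."""
    m = _AN_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an AN filename: {filename!r}")
    return m.groupdict()
```

With this pattern, an03.100.01-10.than.html yields a SECTION of 01-10, while an01.021-040.than.html yields none.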
Form: | VOLUME.TEXT.SOURCE.EXT |
VOLUME: | khp |
TEXT: | 1 2 3 ... 8 9 |
Examples: | khp.9.amar.html (Khp 9) khp.1-9x.piya.html (Khp 1-9, excerpts) |
Form: | VOLUME.TEXT.SOURCE.EXT |
VOLUME: | dhp |
TEXT: | 01 02 03 ... 25 26 |
Examples: | dhp.06.than.html (Dhp, Pandita vagga) dhp.23.budd.html (Dhp, Naga vagga) dhp.14.than.html#dhp-183 (Dhp 183; see note) |
Note: | Individual verses are referenced by href anchors, which are numbered consecutively across vaggas. Thus, Dhp 183 would be referenced as <a href="dhp.14.abcd.html#dhp-183">.... |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | ud |
CHAPTER: | 1 2 3 4 5 6 7 8 |
TEXT: | 01 02 03 ... 09 10 |
Examples: | ud.1.02.abcd.html (Ud 1.2) |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | iti |
CHAPTER: | 1 2 3 4 |
TEXT: | 001 002 003 ... 111 112 |
Examples: | iti.1.001.abcd.html (Iti 1) iti.1.002.abcd.html (Iti 2) iti.2.028-030.abcd.html (Iti 28-30) iti.4.106-112x.irel.html (excerpts from Iti 106-112) |
Note: | The numbering of suttas (TEXTs) runs consecutively across chapters (it does not restart at the beginning of each chapter). |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | snp |
CHAPTER: | 1 2 3 4 5 |
TEXT: | 01 02 03 ... 15 16 |
Examples: | snp.1.01.abcd.html (Sn 1.1) snp.1.02.abcd.html (Sn 1.2) snp.5.16.abcd.html (Sn 5.16) |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | vv |
CHAPTER: | 1 2 3 4 5 6 7 |
TEXT: | 01 02 03 ... 83 84 85 |
Example: | vv.1.16.irel.html (Vv 1.16) |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | pv |
CHAPTER: | 1 2 3 4 |
TEXT: | 01 02 03 ... 49 50 51 |
Example: | pv.1.05.than.html (Pv 1.5) |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | thag |
CHAPTER: | 01 02 03 ... 20 21 |
TEXT: | 01 02 03 ... 48 49 |
Examples: | thag.01.00.abcd.html#sutta001 (Thag 1.1; see Note) thag.01.00.abcd.html#sutta120 (Thag 1.120; see Note) thag.02.01.abcd.html (Thag 2.1) thag.02.49x.abcd.html (Thag 2.49 (excerpt)) thag.21.01.abcd.html (Thag 21.1) |
Note: |
Chapter 1 contains 120 one-liner "suttas". Rather than put each of these in individual files, Chapter 1 is treated as a monolithic entity, with TEXT assigned the value 00 and individual verses referenced by href anchors. Chapter 2 has the next-highest number of verses (49), so 49 is the maximum possible TEXT number.
Some chapters (e.g., 21) contain only one sutta, which is numbered 1. |
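The special handling of Chapter 1 can be captured in a few lines. This is a sketch only, following the examples above; the function name `thag_href` is hypothetical, and `abcd` stands in for a translator code as it does in the examples.

```python
def thag_href(chapter: int, sutta: int, translator: str = "abcd") -> str:
    """Build a link target for Thag <chapter>.<sutta>."""
    if chapter == 1:
        # Chapter 1 is one file (TEXT = 00); each sutta is an anchor within it.
        return f"thag.01.00.{translator}.html#sutta{sutta:03d}"
    # All other chapters: one file per sutta, zero-padded to two digits.
    return f"thag.{chapter:02d}.{sutta:02d}.{translator}.html"
```

So Thag 1.120 resolves to an anchor inside thag.01.00.abcd.html, while Thag 21.1 resolves to its own file, thag.21.01.abcd.html.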
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | thig |
CHAPTER: | 01 02 03 ... 15 16 |
TEXT: | 01 02 03 ... 09 10 |
Examples: | thig.01.00.abcd.html#sutta01 (Thig 1.1; see Note) thig.01.00.abcd.html#sutta18 (Thig 1.18; see Note) thig.02.03.abcd.html (Thig 2.3) thig.12.01x.abcd.html (Thig 12.1 (excerpt)) |
Note: | Chapter 1 contains 18 one-liner "suttas". Rather than put each of these in individual files, Chapter 1 is treated as a monolithic entity, with TEXT assigned the value 00 and individual verses referenced by href anchors. Chapter 2 has the next-highest number of verses (10), so 10 is the maximum possible TEXT number.
Some chapters (e.g., 12) contain only one sutta, which is numbered 1. |
Form: | VOLUME.CHAPTER.TEXT.SOURCE.EXT |
VOLUME: | nm |
CHAPTER: | 1 2 3 ... 15 16 |
TEXT: | 01 02 03 ... 20 21 |
Example: | nm.2.04.olen.html (Nm 2.4) |
Note: | CHAPTER and TEXT are taken from the BJT edition. |
Horner's strange enumeration of sections in Miln frustrates attempts at building logical file names. Suttas are grouped in "divisions," each of which contains several questions posed to Ven. Nagasena. Her chapter II contains divisions 1-3, while chapter III contains divisions 4-7. Thus, if the last question in chapter II were referenced as "Miln 2.3.16" (Horner p. 87), the next question, in chapter III, would be referenced as "Miln 3.4.1" (p. 89). In other words, there is no "Miln 3.1.1". So be it.
Form: | VOLUME.CHAPTER.[DIVISION.]TEXT.SOURCE.EXT | |
VOLUME: | miln | |
CHAPTER: | 1 2 3 4 5 6 7 | |
DIVISION: | depends on the CHAPTER. | |
TEXT: | depends on the DIVISION. | |
Examples: | miln.5x.olen.html (Miln 5, excerpts) miln.2.3.12.kell.html (Miln 2.3.12; Horner vol 1 p.85; PTS p.62) miln.2x.kell.html (Miln 2, excerpts) |
These four-byte codes are used in the translator code sub-field of a filename.
Code | Translator | Identifying info |
---|---|---|
agga | Aggacitta Bhikkhu | Received higher ordination at Mahasi Meditation Centre, Rangoon, Myanmar, in 1979. |
agku | Aggacitta Bhikkhu & Kumara Bhikkhu | Translations done jointly by these two individuals. |
amar | Amaravati Sangha | Any of the monks or nuns affiliated with the Amaravati Buddhist Monastery, England. |
bodh | Bodhi, Bhikkhu | b. Jeffrey Block in New York City, 1944. |
bpit | Burmese Pitaka Association | Ven. Dr. Ashin Candasiri, Mahaganthavacakapandita (Asst. Rector, Sitagu International Buddhist Academy, Sagaing Hills, Sagaing, Myanmar), by way of Vilāsa Bhikkhu. |
budd | Buddharakkhita, Ven. Acharya | Founder of the Maha Bodhi Society in Bangalore, India. |
chlm | Chalmers, Robert | b. 18 August 1858 (where?). |
edmn | Edmunds, Albert J. | ??? |
hare | Hare, E.M. | b. 4 March 1893 (where?). |
harv | Harvey, Peter | Now at the School of Social and International Studies, University of Sunderland, Sunderland SR2 7EE, UK. (jtb 070328). |
heck | Hecker, Hellmuth | ??? |
hekh | Hecker, Hellmuth & Khema, Ayya (Group) | Translations done jointly by both individuals. (Hecker: Pali to German; Khema: German to English). |
horn | Horner, I.B. | b. in Walthamstow, England, 1896; d. 1981. |
irel | Ireland, John D. | b. in London, 1932; d. 1998. |
jnss | Johansson, Rune E.A. | b. Sweden (when?); d. (when?). Author: The Dynamic Psychology of Early Buddhism; Pali Buddhist Texts: An Introductory Reader and Grammar (1998). |
kant | Kantasilo Bhikkhu | ??? |
kell | John Kelly | b. 1952. |
khan | Khantipalo Bhikkhu | b. Laurence Mills, 1932 (where?). |
khem | Khema, Ayya | b. Ilse Kussel in Berlin, 1923; d. 1997. |
ksw0 | John Kelly, Sue Sawyer, & Victoria Wareham (Group) | Translations done jointly by these three individuals. |
kuma | Kumara Bhikkhu | b. Liew Chin Leag in Malaysia, 1972. |
lupt | Lupton, Walter James | b. 1871; d. 1955 (?dates unconfirmed?). |
mend | Dr. N.K.G. Mendis | b. (when?) Sri Lanka. |
msyd | Mahasi Sayadaw | b. Aug 14, 1904 at Seikkhun, Burma; d. Aug 14, 1982. |
nana | Ñanananda, Bhikkhu | b. in Sri Lanka (year?). |
nara | Narada Thera | b. in Kotahena, Sri Lanka, 1898; d. 1983. |
niza | Nizamis, Khristos | b. Adelaide, South Australia 1961. Independent philosopher, meditator, and scholar. |
norm | Norman, K.R. | Vice-president, Pali Text Society. |
ntbb | Ñanamoli Thera & Bhikkhu Bodhi (Group) | Translations done jointly by these two individuals. (Ñanamoli: Pali to English; Bodhi: Pali to English & editor.) |
nymo | Ñanamoli Thera | b. Osbert Moore in England, 1905; d. 1960. |
nypo | Nyanaponika Thera | b. Siegmund Feniger in Germany, 1901; d. 1994. |
nysa | Nyanasatta Thera | b. in Czechoslovakia (year?). |
nyva | Ñanavara Thera | a.k.a. Somdet Phra Buddhaghosacariya (where? year?) |
olds | Olds, Michael | b. Indiana, USA 1941; writing from New York and Los Altos. AKA 'Obo'. |
olen | Olendzki, Andrew | Executive director of the Barre Center for Buddhist Studies. |
piya | Piyadassi Thera | b. Colombo, Sri Lanka, in 1914; d. 1998. |
pnji | Punnaji, Mahathera Madewela | b. Sri Lanka (when?); recommended by M.O. |
rhyc | Rhys Davids, C.A.F. | b. Caroline Augusta Foley in Wadhurst, England, 1857; d. 1942 |
rhyt | Rhys Davids, T.W. | b. in Colchester, England, 1843; d. Chipstead, England 1922. |
soma | Soma Thera | b. Victor Emmanuel Perera Pulle in Sri Lanka, 1898; d. 1960. |
soni | Soni, R.L. | Resided in Mandalay, Burma in the 1950's. |
stor | Story, Francis | b. in England 1910; d. 1972. |
than | Thanissaro Bhikkhu | b. Geoffrey DeGraff in USA, 1949. |
upal | Upalavanna, Sister | ??? |
vaji | Vajira, Sister | b. in Germany. |
vaka | Ñanavara Thera & Kantasilo Bhikkhu (Group) | Translations done jointly by these two individuals. |
wlsh | Walshe, Maurice O'Connor | b. in London 1911; d. 1998. |
wood | Woodward, F.L. | b. 1871; d. 1952. |
wrrn | Warren, Henry Clarke | b. in Boston 1854; d. 1899. (Recommended by M.O. 060719.) |
yaho | Yahoo! Pali Group (Group) | Translations done jointly by participants of the Yahoo! Pali Group, an online forum. |
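Since SOURCE is always the last data field before the extension, a script can recover the translator from a filename alone. The sketch below uses only a handful of entries copied from the table above; the dictionary `TRANSLATORS` and the function `translator_of` are hypothetical names, and a real tool would load the full table.

```python
# A small excerpt of the table above, for illustration only.
TRANSLATORS = {
    "bodh": "Bodhi, Bhikkhu",
    "horn": "Horner, I.B.",
    "irel": "Ireland, John D.",
    "nymo": "Ñanamoli Thera",
    "than": "Thanissaro Bhikkhu",
}


def translator_of(filename: str):
    """Return the translator for a filename, or None if the code is not in the excerpt."""
    name = filename.split("#")[0]  # drop any href anchor, e.g. "#dhp-183"
    fields = name.split(".")
    if len(fields) < 2:
        return None
    return TRANSLATORS.get(fields[-2])  # SOURCE sits just before the extension
```

Thus mn.001.than.html resolves to Thanissaro Bhikkhu, which is the at-a-glance identification that the naming convention is meant to allow.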
Dividing by vagga (1 and 3) has compelling didactic appeal, as the thematic grouping of suttas is more tightly bound to vagga than to samyutta. But if the goal is to simplify file management for a website librarian, who spends much of his or her time clicking and scrolling through directories to edit files, method 2 makes most sense. Consider a website containing the complete SN (2,889 files). Method 1 would yield about 500 files/dir (f/d); method 2 about 50 f/d; method 3 about 50 f/d, plus an additional layer of 5 dirs. From a librarian's point of view, the best choice is the simplest, cleanest, quickest, and most intuitive one. To me, that's 2. Although 1 and 3 have theoretical appeal, from a practical file-management point of view they add additional keystrokes and mouse-clicks that I don't want. Librarians and webmasters can, of course, design their index.html files to present SN files as being subdivided any way they like. But that is more a matter of display, not of organization.
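The files-per-directory figures can be checked with simple arithmetic. The sketch below assumes that method 1 means one directory for each of the 5 vaggas, method 2 one directory for each of the 56 samyuttas, and method 3 samyutta directories nested inside vagga directories. These are averages only, since actual samyuttas vary widely in size, and the function name is hypothetical.

```python
TOTAL_FILES = 2889  # complete SN, as stated above
VAGGAS = 5
SAMYUTTAS = 56


def files_per_dir(total_files: int, directories: int) -> float:
    """Average number of files per directory."""
    return total_files / directories


method_1 = files_per_dir(TOTAL_FILES, VAGGAS)     # one directory per vagga
method_2 = files_per_dir(TOTAL_FILES, SAMYUTTAS)  # one directory per samyutta
method_3 = method_2                               # same leaf directories, one level deeper

print(f"method 1: about {round(method_1)} files per directory")
print(f"method 2: about {round(method_2)} files per directory")
print(f"method 3: about {round(method_3)} files per directory, inside {VAGGAS} parent directories")
```

On these assumptions the averages come to roughly 578 and 52, in line with the rounded figures of 500 and 50 quoted above.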