Various scripts to convert editions of the Tipiṭaka into Markdown and HTML. Use at your own risk.
python scan_sc_html.py | sed "s/id='[^']*'/id='X'/g" | sed "s/data-counter='[^']*'/data-counter='X'/g" | sed "s/value='[^']*'/value='X'/g" | sort | uniq

None of the following scripts take any arguments on the command line:
wt_genlinks.py: Create links.py, containing the semantic mapping from file numbers to semantic paths
tipitaka2500.py: Create tipitaka2500.github.io/tipitaka from World Tipitaka
wt2md.py: Create Markdown files in tipitaka2500 from World Tipitaka
Experimental translation and summarization using an LLM. These read folders/files from the command line:
wt2eng.py: Translate Markdown files into English using Llama 3.3 70b.
These are mainly used for experimental/testing purposes:
wt2html.py: Convert World Tipitaka from XML to HTML body fragments (excludes everything outside <body>), with no further processing (e.g. fixing inline JavaScript links); writes out to wt-html
wt2semantic.py: Convert World Tipitaka from XML to HTML body fragments (excludes everything outside <body>), but in a semantic tree structure; writes out to wt-semantic
bs4.py: Jupyter notebook for playing with BeautifulSoup
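
To illustrate what "HTML body fragment" means here, the following is a minimal sketch, assuming the excluded part is everything outside the `<body>` element. It is not taken from wt2html.py or wt2semantic.py; the function name `body_fragment` is hypothetical, and a regular expression stands in for a real parser such as BeautifulSoup, which would be more robust on irregular markup.

```python
import re

# Matches the first <body ...> ... </body> pair, case-insensitively, across lines.
# Assumption: the document has at most one <body> element.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def body_fragment(document: str) -> str:
    """Return the markup inside <body>, or the input unchanged if there is none."""
    match = _BODY_RE.search(document)
    return match.group(1).strip() if match else document
```

A fragment produced this way has no `<html>` or `<head>` wrapper, so it can be dropped into a separate page template.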
html_template.py: HTML template used by tipitaka2500.py
get_data.py: Convert a single XML file to HTML (raw); used by various utilities
links.py: Generated by wt_genlinks.py; used by tipitaka2500.py
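
The scan_sc_html.py shell pipeline near the top of this README masks the values of the `id`, `data-counter` and `value` attributes and then lists the distinct lines that remain. For anyone without sed at hand, a rough Python equivalent might look like the sketch below. The function names are hypothetical and not part of this repository, and the ordering can differ from `sort`, which follows the current locale while Python's `sorted` compares code points.

```python
import re

# Attribute names taken from the three sed expressions in the pipeline.
_VOLATILE_ATTRS = ("id", "data-counter", "value")


def normalize_line(line: str) -> str:
    """Replace each single-quoted value of the volatile attributes with X."""
    for attr in _VOLATILE_ATTRS:
        # Same pattern as sed's s/attr='[^']*'/attr='X'/g
        line = re.sub(rf"{re.escape(attr)}='[^']*'", f"{attr}='X'", line)
    return line


def unique_normalized(lines):
    """Return the distinct normalized lines in sorted order (like sort | uniq)."""
    return sorted({normalize_line(line.rstrip("\n")) for line in lines})
```

Passing `sys.stdin` to `unique_normalized` and printing each returned line would let this stand in for the `sed | sort | uniq` stages after `python scan_sc_html.py |`.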