tidipa/convert-scripts

convert-scripts

Various scripts to convert editions of the Tipiṭaka into Markdown and HTML. Use at your own risk.

python scan_sc_html.py | sed "s/id='[^']*'/id='X'/g" | sed "s/data-counter='[^']*'/data-counter='X'/g" | sed "s/value='[^']*'/value='X'/g" | sort | uniq
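
The pipeline above appears to mask the values of the id, data-counter, and value attributes in the output of scan_sc_html.py and then list each distinct line once. A rough Python equivalent of the sed, sort, and uniq stages is sketched below. The function name normalize_lines is hypothetical and not part of this repository, and Python sorts by code point, so the ordering may differ from a locale-aware sort.

```python
import re

# Attribute names taken from the sed expressions in the pipeline above.
_ATTRS = ("id", "data-counter", "value")


def normalize_lines(lines):
    """Mask single-quoted id/data-counter/value values, then sort and de-duplicate."""
    seen = set()
    for line in lines:
        for attr in _ATTRS:
            # Mirrors sed "s/attr='[^']*'/attr='X'/g" for each attribute in turn.
            line = re.sub(rf"{attr}='[^']*'", f"{attr}='X'", line)
        seen.add(line.rstrip("\n"))
    # sorted() over a set plays the role of `sort | uniq`.
    return sorted(seen)
```

Passing sys.stdin (or any iterable of lines) to normalize_lines and printing the result should give roughly what the shell pipeline prints.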

Main Scripts

None of these take any command-line arguments.

  • wt_genlinks.py: Creates links.py, which maps file numbers to semantic paths
  • tipitaka2500.py: Creates tipitaka2500.github.io/tipitaka from the World Tipitaka
  • wt2md.py: Creates Markdown files in tipitaka2500 from the World Tipitaka
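
As an illustration of the kind of mapping links.py holds, the sketch below pairs file numbers with semantic paths and looks one up. The variable name LINKS, the helper semantic_path, the keys, and the paths are all hypothetical placeholders; the real generated file may be structured differently.

```python
# Hypothetical stand-in for the generated links.py: file number -> semantic path.
# The keys and paths here are placeholders, not real World Tipitaka data.
LINKS = {
    "1": "section-a/chapter-1",
    "2": "section-a/chapter-2",
    "3": "section-b/chapter-1",
}


def semantic_path(file_number, links=LINKS):
    """Return the semantic path for a file number, or None if it is unmapped."""
    return links.get(str(file_number))
```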

Experimental translation and summarization using an LLM. This script reads folders/files from the command line.

  • wt2eng.py: Translates Markdown files into English using Llama 3.3 70B

Utilities

These are mainly used for experimentation and testing.

  • wt2html.py: Converts World Tipitaka XML to HTML body fragments (everything outside <body> is excluded) with no further processing (e.g. inline JavaScript links are not fixed); writes to wt-html
  • wt2semantic.py: Converts World Tipitaka XML to HTML body fragments (everything outside <body> is excluded), arranged in a semantic tree structure; writes to wt-semantic
  • bs4.py: Jupyter notebook for playing with BeautifulSoup
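
To make "body fragment" concrete, here is a minimal sketch that keeps only the markup inside the body element. It is not the repository's implementation: it uses the standard-library xml.etree.ElementTree rather than BeautifulSoup, the function name body_fragment is hypothetical, and it assumes well-formed input with an un-namespaced body element.

```python
import xml.etree.ElementTree as ET


def body_fragment(xhtml_text):
    """Return the inner markup of the first <body> element, or '' if there is none."""
    root = ET.fromstring(xhtml_text)
    body = root if root.tag == "body" else root.find(".//body")
    if body is None:
        return ""
    # Text directly inside <body>, then each child serialized with its tail text.
    parts = [body.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in body)
    return "".join(parts).strip()
```

Input that is not well-formed XML raises xml.etree.ElementTree.ParseError, which is one reason a more lenient parser such as BeautifulSoup may be preferable in practice.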

Support files

  • html_template.py: HTML template used by tipitaka2500.py
  • get_data.py: Converts a single XML file to raw HTML; used by various utilities
  • links.py: Generated by wt_genlinks.py; used by tipitaka2500.py
