Commit fd0b6d2

Merge pull request #720 from Systems-Modeling/ST6RI-846
ST6RI-846 Create EBNF / GBNF extractor tool for KerML and SysML specs
2 parents c4c3ef4 + 87b804b commit fd0b6d2

337 files changed

Lines changed: 392323 additions & 0 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -35,5 +35,8 @@ dependency-reduced-pom.xml
 # MacOS Finder
 .DS_Store
 
+# Generated Python files
+*.pyc
+
 # Built libraries
 *.kpar
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
build/
dist/
scratch/
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
= bnf_grammar_tools

Tools that process the KerML and SysML2 concrete language grammars from their respective specifications, check them for correctness, and generate two kinds of grammar listings: (1) machine-readable plain text BNF files and (2) human-readable hyperlinked BNF files in HTML format.

== Usage of the Tools

=== Obtain a Complete KerML or SysML Specification in HTML Source Format

Follow these instructions to export a selected version of the specification from View Editor to a self-standing HTML file.

> Note: These instructions were tested with View Editor version 4 and the Firefox browser v130.0.1 on Windows 11. In other browsers the menu or key to inspect the source code may be different.

1. Open the selected spec in View Editor.
2. Load the full document by clicking the "Full Document" icon (image:VE-full-document-icon.png[]) at the top of the left panel. This may take several minutes.
3. Click the print icon (image:VE-print-icon.png[]), immediately to the left of the EXPORT icon at the top of the document panel.
4. Wait for a new browser tab to appear with the complete HTML document and a popup print dialog. Again, be patient; this may take several minutes.
5. Cancel the popup print dialog.
6. In the complete HTML document tab, open a Developer / Inspector panel via menu *More tools > Web Developer Tools* or by hitting `Ctrl+Shift+I`. (Note: a direct *Save page as ...* or `Ctrl+S` does not work, as it saves a script.)
7. In the Inspector tab of the Developer panel, right-click the top level `<html ...>` element, and select *Copy > Outer HTML* from the context menu.
8. Open a new, appropriately named `.html` file in a text editor, paste the contents and save.
9. Close the complete HTML document tab.

=== Install the Python Environment

Ensure that Python version 3.9 or higher is installed on your machine. The most convenient way is to use the https://www.jetbrains.com/pycharm/[PyCharm] tool. Create a dedicated `conda` or `venv` development environment and activate it. The best tool to install a Python base environment is https://github.com/conda-forge/miniforge[miniforge].

After installation, check the active Python version, e.g.:

[source,shell]
----
$ python --version
Python 3.12.12
----

Also ensure that the latest versions of the following packages are installed. You can use `pip`, `conda` or `mamba`.

* Package https://pypi.org/project/beautifulsoup4/[beautifulsoup4] is used to parse the HTML input file.
* Package https://pypi.org/project/lxml/[lxml] is used to assist beautifulsoup4 with processing SVG/XML files.
* Package https://pypi.org/project/lark/[lark] is used to parse and verify the extracted BNF source.
* Package https://pypi.org/project/pytest/[pytest] is used to run unit tests.

For example, install them with the following commands:

[source,shell]
----
$ pip install beautifulsoup4
$ pip install lxml
$ pip install lark
$ pip install pytest
----

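As a quick sanity check of the environment described above, a minimal sketch like the following can report which required packages are not importable. It is not part of the tools: the helper names `find_missing_packages` and `python_version_ok` are hypothetical, and the import names are assumed to be the usual ones (beautifulsoup4 is imported as `bs4`).

```python
import importlib.util
import sys

# Assumed import names of the packages listed above (beautifulsoup4 -> "bs4").
REQUIRED_MODULES = ["bs4", "lxml", "lark", "pytest"]


def find_missing_packages(module_names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 9)):
    """Check that the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    missing = find_missing_packages(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `conda` or `venv` environment; any package reported as missing can then be installed with `pip`, `conda` or `mamba`.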
=== Run the bnf_grammar_processor

The usage information for the `bnf_grammar_processor` is shown below. To obtain it, go to the `tool-support/bnf_grammar_tools` directory and run `python .\bnf_grammar\bnf_grammar_processor.py -h` (or use `--help`).

[source,shell]
----
usage: python bnf_grammar_processor [-h] [-i [INPUT_DIR]] [-o [OUTPUT_DIR]] SOURCE_DATA

Extract or parse textual and/or graphical grammars from given KerML or SysML specifications and generate plain text and html BNF grammar files.

positional arguments:
  SOURCE_DATA           JSON file defining source data - see examples under
                        the tests directory

options:
  -h, --help            show this help message and exit
  -i [INPUT_DIR], --input-dir [INPUT_DIR]
                        input directory path
  -o [OUTPUT_DIR], --output-dir [OUTPUT_DIR]
                        output directory path

The processor supports to two main capabilities:
1) Extract the KerML or SysML grammar(s)
   from provided raw .html file(s) exported from KerML and SysML specifications in the View Editor tool,
   and then validate the extracted grammars, report possible errors, and generate these outputs:
   - .json dumps of the processed intermediate data model(s),
   - .kebnf and/or .kgbnf plain text files,
   - -marked_up.kebnf and/or -marked_up.kgbnf marked up text files, that can be used as a basis for corrected grammars,
   - .html files with hyperlinked, human-readable versions of the grammars.
2) Validate corrected -marked_up.kebnf or -marked_up.kgbnf input files
   and generate the same files as under 1), but now for the corrected grammars.

Option 2) is selected when the input filename(s) end on '-marked_up.kebnf' or '-marked_up.kgbnf', otherwise option 1).
Both options will produce a log file named 'bnf_grammar_processor.log' in the working directory.
In SOURCE_DATA the input files should be given in reverse dependendy order, i.e., first KerML textual, then SysML textual, then SysML graphical notation.
Via diff'ing of the extracted and corrected .kebnf and/or .kgbnf files a list of corrections to be fed into the OMG issue trackers can be compiled.
----

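The filename rule that chooses between the two options can be illustrated with a small sketch. This is not the processor's actual code: `select_option` is a hypothetical helper that only mirrors the rule quoted in the usage text, and `KerML-spec.html` is a made-up input name.

```python
# Suffixes that, per the usage text, mark an input as a corrected marked-up grammar.
MARKED_UP_SUFFIXES = ("-marked_up.kebnf", "-marked_up.kgbnf")


def select_option(input_filename: str) -> int:
    """Return 2 for marked-up grammar inputs, otherwise 1 (extract from exported HTML)."""
    return 2 if input_filename.endswith(MARKED_UP_SUFFIXES) else 1


if __name__ == "__main__":
    # "KerML-spec.html" is a hypothetical name for an exported specification file.
    for name in ("KerML-spec.html", "KerML-textual-bnf-corrected-marked_up.kebnf"):
        print(name, "-> option", select_option(name))
```

Note that a plain `.kebnf` file without the `-marked_up` suffix does not select option 2) under this rule.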
Note. The file extensions `.kebnf` and `.kgbnf` are inspired by the `.kpar` extension for the KerML archive files.

The syntax of the `.kebnf` and `.kgbnf` BNF files is itself defined in the grammar format of the https://github.com/lark-parser/lark[Lark] parsing toolkit for Python. The definitions are in:

* `bnf_grammar/kebnf_textual_grammar.lark`, and
* `bnf_grammar/kgbnf_graphical_grammar.lark`.

Inside the `bnf_grammar_processor`, Lark is used to check each production individually. Some additional heuristic validation is also performed to permit processing of incorrect grammar or note fragments. All diagnostics are reported in the `bnf_grammar_processor.log` file. For the graphical grammar this includes a mapping table between the existing (PNG) images in the specs from the View Editor source and the new SVG images in the `images` subdirectory.

.Example command line arguments
For the time being, example input and output directories with `SOURCE_DATA` files can be found under the `tests` folder.

For option 1):

- `INPUT_DIR` = `tests/KerML_and_SysML_spec_sources`
- `OUTPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `SOURCE_DATA` = `source_specs.json` (in `INPUT_DIR`)

The style information for the generated HTML outputs resides in `tests/KerML_and_SysML_grammars/bnf_styles.css`.

For option 2):

- `INPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `OUTPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `SOURCE_DATA` = `source_marked_ups.json` (in `INPUT_DIR`)

.Example generated outputs for option 1) (extract)
With option 1), the `bnf_grammar_processor` produces the following outputs (see directory `tests/KerML_and_SysML_grammars`):

[cols="2,3"]
|===
| `KerML-textual-bnf-elements.json` | dump of the processed intermediate data model(s)
| `KerML-textual-bnf.kebnf` | generated plain text KerML textual grammar file
| `KerML-textual-bnf-marked_up.kebnf` | generated editable marked up KerML textual grammar file
| `KerML-textual-bnf.html` | generated browsable, hyperlinked HTML KerML textual grammar file
| `SysML-textual-bnf-elements.json` | dump of the processed intermediate data model(s)
| `SysML-textual-bnf.kebnf` | generated plain text SysML textual grammar file
| `SysML-textual-bnf-marked_up.kebnf` | generated editable marked up SysML textual grammar file
| `SysML-textual-bnf.html` | generated browsable, hyperlinked HTML SysML textual grammar file
| `SysML-graphical-bnf-elements.json` | dump of the processed intermediate data model(s)
| `SysML-graphical-bnf.kgbnf` | generated plain text SysML graphical grammar file
| `SysML-graphical-bnf-marked_up.kgbnf` | generated editable marked up SysML graphical grammar file
| `SysML-graphical-bnf.html` | generated browsable, hyperlinked HTML SysML graphical grammar file (see the note below)
|===

Note. The SVG images for the graphical BNF productions reside in `tests/KerML_and_SysML_grammars/images`. They are copied from the source in the https://github.com/Systems-Modeling/Graphical-Specification-WG/tree/main/src/Graphical-BNF/_svg[Graphical-Specification-WG GitHub repo].

Each run of the `bnf_grammar_processor` produces a log on the console and in the file `bnf_grammar_processor.log`. The log of the previous run is saved in `bnf_grammar_processor.log.backup`, which can be used to detect differences between runs.

=== Correct the Extracted Grammar Files and Reprocess with the bnf_grammar_processor

If there are errors in the grammar files, the following workflow can be used to apply bulk corrections.

. Copy `KerML-textual-bnf-marked_up.kebnf` to `KerML-textual-bnf-corrected-marked_up.kebnf`.
. Copy `SysML-textual-bnf-marked_up.kebnf` to `SysML-textual-bnf-corrected-marked_up.kebnf`.
. Copy `SysML-graphical-bnf-marked_up.kgbnf` to `SysML-graphical-bnf-corrected-marked_up.kgbnf`.
. Check the errors in the log files, and modify the `...-corrected-marked_up.k*bnf` files in a text editor to correct the errors.
. After every couple of corrections, run the `bnf_grammar_processor` with the option 2) arguments. This validates the corrected `.kebnf` and `.kgbnf` files and generates the set of files described in the table below, similar to option 1).
. Repeat steps 4 and 5 until satisfied.
. Make a diff between each pair of original (`...-bnf-marked_up.k*bnf`) and corrected (`...-bnf-corrected-marked_up.k*bnf`) files to systematically compile the changes that need to be raised as OMG issues.

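The diff in the last step can be made with any diff tool. As one possibility, the following minimal sketch uses Python's standard `difflib` module; the function name `diff_grammar_files` is hypothetical and not part of the tools.

```python
import difflib
import sys
from pathlib import Path


def diff_grammar_files(original_path: str, corrected_path: str) -> str:
    """Return a unified diff between an original and a corrected marked-up grammar file."""
    original = Path(original_path).read_text(encoding="utf-8").splitlines(keepends=True)
    corrected = Path(corrected_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(original, corrected, fromfile=original_path, tofile=corrected_path)
    )


if __name__ == "__main__":
    # Expects two arguments: the original file and the corrected file.
    if len(sys.argv) == 3:
        print(diff_grammar_files(sys.argv[1], sys.argv[2]), end="")
```

For example, pass `KerML-textual-bnf-marked_up.kebnf` and `KerML-textual-bnf-corrected-marked_up.kebnf` as the two arguments; an empty result means the files are identical.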
.Example generated files for option 2)
[cols="2,3"]
|===
| `KerML-textual-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `KerML-textual-bnf-corrected.kebnf` | generated corrected plain text KerML textual grammar file
| `KerML-textual-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML KerML textual grammar file
| `SysML-textual-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `SysML-textual-bnf-corrected.kebnf` | generated corrected plain text SysML textual grammar file
| `SysML-textual-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML SysML textual grammar file
| `SysML-graphical-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `SysML-graphical-bnf-corrected.kgbnf` | generated corrected plain text SysML graphical grammar file
| `SysML-graphical-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML SysML graphical grammar file
|===

=== Use the bnf_file_parser for Final Checks

As a final check, the `bnf_file_parser` can be used to validate complete, corrected BNF grammar files.

The usage information for the `bnf_file_parser` is shown below. To obtain it, go to the `tool-support/bnf_grammar_tools` directory and run `python .\bnf_grammar\bnf_file_parser.py -h` (or use `--help`).

[source,shell]
----
usage: bnf_file_parser [-h] BNF_PATH

Parse KerML or SysML grammar files in textual or graphical BNF format.

positional arguments:
  BNF_PATH    Path to plain text BNF file with extension .kebnf or .kgbnf

options:
  -h, --help  show this help message and exit
----

Run `bnf_file_parser` on the following files:

* `KerML-textual-bnf-corrected.kebnf`
* `SysML-textual-bnf-corrected.kebnf`
* `SysML-graphical-bnf-corrected.kgbnf`

The console and the log file `bnf_file_parser.log` will list any errors still present. Otherwise, if the parse is completely successful, a dump of the resulting abstract syntax tree (in Lark's pretty print format) will be listed.
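To run the final check on all three files in one go, a small driver along the following lines could be used. It is only a sketch: `build_command` and `run_final_checks` are hypothetical helpers, the script path is assumed to be relative to `tool-support/bnf_grammar_tools`, and the grammar files are assumed to be reachable from the working directory.

```python
import subprocess
import sys

# Assumed location of the parser script, relative to tool-support/bnf_grammar_tools.
PARSER_SCRIPT = "bnf_grammar/bnf_file_parser.py"

CORRECTED_FILES = [
    "KerML-textual-bnf-corrected.kebnf",
    "SysML-textual-bnf-corrected.kebnf",
    "SysML-graphical-bnf-corrected.kgbnf",
]


def build_command(bnf_path):
    """Build the argument list that invokes bnf_file_parser on one BNF file."""
    return [sys.executable, PARSER_SCRIPT, bnf_path]


def run_final_checks(bnf_paths, runner=subprocess.run):
    """Invoke the parser once per file; results appear on the console and in the log."""
    for path in bnf_paths:
        runner(build_command(path), check=False)
```

Calling `run_final_checks(CORRECTED_FILES)` runs the parser three times in sequence. Because each run rewrites `bnf_file_parser.log` (keeping only the previous run as `.backup`), inspect the console output or the log after each file instead of relying on the log from the last run alone.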

tool-support/bnf_grammar_tools/__init__.py

Whitespace-only changes.

tool-support/bnf_grammar_tools/bnf_file_parser.log

Whitespace-only changes.

tool-support/bnf_grammar_tools/bnf_file_parser.log.backup

Whitespace-only changes.

tool-support/bnf_grammar_tools/bnf_grammar/__init__.py

Whitespace-only changes.
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
#!python

"""
bnf_file_parser is a command line tool that parses a KerML or SysML plain text grammar file.

The supported file formats are:
- .kebnf for a KerML or SysML textual notation grammar
- .kgbnf for a SysML graphical notation grammar

Its usage is described below in the main() function.

@author: Hans Peter de Koning (DEKonsult)

Requirements:

This tool requires installation of the following packages:
- lark (See https://pypi.org/project/lark)

"""

import sys
import os
import shutil
import argparse
from datetime import datetime, timezone
from typing import Optional
from lark import Lark, UnexpectedInput

# Create logger for debug, info, warning, error, critical messages
import logging
LOGGER = logging.getLogger()


class BnfParser:
    def __init__(self) -> None:
        # ISO 8601 UTC timestamp string, set when parse() starts
        self.start_timestamp: Optional[str] = None
        self.bnf_filepath: Optional[str] = None
        self.parser: Optional[Lark] = None

    def parse(self, bnf_filepath: str) -> None:
        self.start_timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")
        self.bnf_filepath = bnf_filepath

        LOGGER.info(f"Started parsing {self.bnf_filepath} at {self.start_timestamp}")

        # Select the Lark grammar that matches the file extension
        _, ext = os.path.splitext(bnf_filepath)
        if ext == ".kebnf":
            grammar_file = "kebnf_textual_grammar.lark"
        elif ext == ".kgbnf":
            grammar_file = "kgbnf_graphical_grammar.lark"
        else:
            LOGGER.critical(f"Unrecognized file extension for BNF_PATH {bnf_filepath}, terminating ...")
            sys.exit(1)

        # The grammar files reside next to this module
        self.parser = Lark.open(grammar_file, rel_to=__file__, parser="lalr")

        with open(bnf_filepath, "r", encoding="utf-8") as bnf_file:
            bnf_input = bnf_file.read()

        try:
            parse_tree = self.parser.parse(bnf_input)
        except UnexpectedInput as e:
            LOGGER.error(f"Parse error in {self.bnf_filepath}:\n{e}")
        else:
            LOGGER.info("Parse completed successfully")
            LOGGER.info(f"The resulting (AST) parse tree is:\n\n{parse_tree.pretty()}")


def main() -> None:
    # Initialize logging
    LOGGER.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)-8s: %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.set_name("console")
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)
    LOGGER.addHandler(console_handler)

    log_file_name = "bnf_file_parser.log"
    if os.path.exists(log_file_name):
        # Create backup copy of the log-file to inspect differences between runs
        shutil.copy2(log_file_name, log_file_name + ".backup")

    file_handler = logging.FileHandler(log_file_name, mode="w", encoding="utf-8")
    file_handler.set_name("logfile")
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)
    LOGGER.addHandler(file_handler)

    LOGGER.debug(f"bnf_file_parser started in {os.getcwd()}")

    # Parse command line
    parser = argparse.ArgumentParser(
        prog="bnf_file_parser",
        allow_abbrev=False,
        description="Parse KerML or SysML grammar files in textual or graphical BNF format.")
    parser.add_argument("bnf_path", metavar="BNF_PATH", type=str, help="Path to plain text BNF file with extension .kebnf or .kgbnf")
    args = parser.parse_args()
    LOGGER.debug(f"args={args}")

    # Run the parser
    bnf_parser = BnfParser()
    bnf_parser.parse(args.bnf_path)


if __name__ == "__main__":
    main()
