diff --git a/duui-gnfinder-v2/README.md b/duui-gnfinder-v2/README.md
new file mode 100644
index 00000000..ce5d7a5f
--- /dev/null
+++ b/duui-gnfinder-v2/README.md
@@ -0,0 +1,77 @@
+# How to Use
+To use GNfinder as a DUUI image, you need the Docker Unified UIMA Interface (DUUI).
+
+## Use as a Stand-Alone Image
+```bash
+docker run docker.texttechnologylab.org/duui-gnfinder-v2:latest
+```
+
+## Run with a specific port
+```bash
+docker run -p 1000:9714 docker.texttechnologylab.org/duui-gnfinder-v2:latest
+```
+
+## Run within DUUI
+```java
+// Assumes `composer` is an existing DUUIComposer and `iWorkers` is the number of parallel workers.
+composer.add(
+    new DUUIDockerDriver.Component("docker.texttechnologylab.org/duui-gnfinder-v2:latest")
+        .withScale(iWorkers)
+        .withImageFetching());
+```
+
+## Existing Parameters
+
+| Parameter | Description | Datatype | Default |
+| --- | --- | --- | --- |
+| language | Language of the text for better detection accuracy. | String | detect |
+| ambiguousNames | Include ambiguous name matches. | Boolean | False |
+| noBayes | Disable Bayesian odds calculation. | Boolean | False |
+| oddsDetails | Include detailed odds calculation information. | Boolean | False |
+| verification | Enable verification of found names against data sources. | Boolean | True |
+| sources | List of data source IDs to use for verification. | String | [11] |
+| allMatches | Return all matches instead of only the best match. | Boolean | False |
+
+# Cite
+If you use the DUUI image, please cite it as follows:
+
+
+Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz and Alexander Mehler. (2023). "Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI." Findings of the Association for Computational Linguistics: EMNLP 2023, 385–399. [[LINK](https://aclanthology.org/2023.findings-emnlp.29)] [[PDF](https://aclanthology.org/2023.findings-emnlp.29.pdf)]
+
+## BibTeX
+```
+@inproceedings{Leonhardt:et:al:2023,
+  title     = {Unlocking the Heterogeneous Landscape of Big Data {NLP} with {DUUI}},
+  author    = {Leonhardt, Alexander and Abrami, Giuseppe and Baumartz, Daniel and Mehler, Alexander},
+  editor    = {Bouamor, Houda and Pino, Juan and Bali, Kalika},
+  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
+  year      = {2023},
+  address   = {Singapore},
+  publisher = {Association for Computational Linguistics},
+  url       = {https://aclanthology.org/2023.findings-emnlp.29},
+  pages     = {385--399},
+  pdf       = {https://aclanthology.org/2023.findings-emnlp.29.pdf},
+  abstract  = {Automatic analysis of large corpora is a complex task, especially
+               in terms of time efficiency. This complexity is increased by the
+               fact that flexible, extensible text analysis requires the continuous
+               integration of ever new tools. Since there are no adequate frameworks
+               for these purposes in the field of NLP, and especially in the
+               context of UIMA, that are not outdated or unusable for security
+               reasons, we present a new approach to address the latter task:
+               Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight,
+               and feature-rich framework for automatic distributed analysis
+               of text corpora that leverages Big Data experience and virtualization
+               with Docker. We evaluate DUUI{'}s communication approach against
+               a state-of-the-art approach and demonstrate its outstanding behavior
+               in terms of time efficiency, enabling the analysis of big text
+               data.}
+}
+
+@misc{Abrami:2022,
+  author       = {Abrami, Giuseppe},
+  title        = {GNfinder as DUUI-Komponent},
+  year         = {2022},
+  howpublished = {https://github.com/texttechnologylab/duui-uima/edit/main/duui-GNFinder}
+}
+
+```