Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 77 additions & 0 deletions duui-gnfinder-v2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# HoToUse
For using GNfinder as a DUUI image it is necessary to use the Docker Unified UIMA Interface.

## Use as Stand-Alone-Image
```bash
docker run docker.texttechnologylab.org/duui-gnfinder-v2:latest
```

## Run with a specific port
```bash
docker run -p 1000:9714 docker.texttechnologylab.org/duui-gnfinder-v2:latest
```

## Run within DUUI
```java
composer.add(new DUUIDockerDriver.
Component("docker.texttechnologylab.org/duui-gnfinder-v2:latest")
.withScale(iWorkers)
.withImageFetching());
```

## Existing Parameters

| Parameter | Description | Datatype | Default |
| --- | --- | --- | --- |
| language | Language of the text for better detection accuracy. | String | detect |
| ambiguousNames | Include ambiguous name matches. | Boolean | False |
| noBayes | Disable Bayesian odds calculation. | Boolean | False |
| oddsDetails | Include detailed odds calculation information. | Boolean | False |
| verification | Enable verification of found names against data sources. | Boolean | True |
| sources | List of data source IDs to use for verification. | String | [11] |
| allMatches | Return all matches instead of only the best match. | Boolean | False |


# Cite
If you want to use the DUUI image please quote this as follows:


Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz and Alexander Mehler. (2023). "Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI." Findings of the Association for Computational Linguistics: EMNLP 2023, 385–399. [[LINK](https://aclanthology.org/2023.findings-emnlp.29)] [[PDF](https://aclanthology.org/2023.findings-emnlp.29.pdf)]

## BibTeX
```
@inproceedings{Leonhardt:et:al:2023,
title = {Unlocking the Heterogeneous Landscape of Big Data {NLP} with {DUUI}},
author = {Leonhardt, Alexander and Abrami, Giuseppe and Baumartz, Daniel and Mehler, Alexander},
editor = {Bouamor, Houda and Pino, Juan and Bali, Kalika},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
year = {2023},
address = {Singapore},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2023.findings-emnlp.29},
pages = {385--399},
pdf = {https://aclanthology.org/2023.findings-emnlp.29.pdf},
abstract = {Automatic analysis of large corpora is a complex task, especially
in terms of time efficiency. This complexity is increased by the
fact that flexible, extensible text analysis requires the continuous
integration of ever new tools. Since there are no adequate frameworks
for these purposes in the field of NLP, and especially in the
context of UIMA, that are not outdated or unusable for security
reasons, we present a new approach to address the latter task:
Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight,
and feature-rich framework for automatic distributed analysis
of text corpora that leverages Big Data experience and virtualization
with Docker. We evaluate DUUI{'}s communication approach against
a state-of-the-art approach and demonstrate its outstanding behavior
in terms of time efficiency, enabling the analysis of big text
data.}
}

@misc{Abrami:2022,
author = {Abrami, Giuseppe},
title = {GNfinder as DUUI-Komponent},
year = {2022},
howpublished = {https://github.com/texttechnologylab/duui-uima/edit/main/duui-GNFinder}
}

```