Commit 5be736d

Merge pull request #104 from Materials-Data-Science-and-Informatics/feature/entity_integration
Feature/entity integration
2 parents b294b8c + 5483143 commit 5be736d

23 files changed

Lines changed: 1016 additions & 313 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -74,7 +74,7 @@ jobs:
 fail-fast: true
 matrix:
 os: ["ubuntu-latest", "macos-latest", "windows-latest"]
-python-version: ["3.8", "3.9", "3.10", "3.11"]
+python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
 runs-on: ${{ matrix.os }}
 
 steps:
```
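With the wider matrix, every listed OS is paired with every listed Python version. The following minimal Python sketch computes that cross product to show how many jobs one CI run now starts. It ignores `include`/`exclude` keys, which this workflow excerpt does not show.

```python
from itertools import product

# Values copied from the updated `matrix` in .github/workflows/ci.yml.
OS_LIST = ["ubuntu-latest", "macos-latest", "windows-latest"]
PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]


def expand_matrix(oses, versions):
    """Return one {os, python-version} combination per job (plain cross product)."""
    return [
        {"os": os_name, "python-version": version}
        for os_name, version in product(oses, versions)
    ]


jobs = expand_matrix(OS_LIST, PYTHON_VERSIONS)
print(len(jobs))  # 18
```

Because `fail-fast: true` is set, GitHub Actions cancels the remaining in-progress and queued matrix jobs as soon as one of them fails. The versions are also deliberately quoted: unquoted `3.10` would be parsed by YAML as the float `3.1`.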

CHANGELOG.md

Lines changed: 6 additions & 1 deletion

```diff
@@ -4,7 +4,12 @@ Here we provide notes that summarize the most important changes in each released
 
 Please consult the changelog to inform yourself about breaking changes and security issues.
 
-## [v0.5.0](https://github.com/Materials-Data-Science-and-Informatics/somesy/tree/v0.4.3) <small>(2025-01-15)</small> { id="0.5.0" }
+## [v0.6.0](https://github.com/Materials-Data-Science-and-Informatics/somesy/tree/v0.6.0) <small>(2025-xx-xx)</small> { id="0.6.0" }
+
+- implement CFF Entity model for author/maintainer/contributor
+- fix SomesyBaseModel kwargs being overwritten
+
+## [v0.5.0](https://github.com/Materials-Data-Science-and-Informatics/somesy/tree/v0.5.0) <small>(2025-01-15)</small> { id="0.5.0" }
 
 - make person argument email optional
 
```
README.md

Lines changed: 21 additions & 14 deletions

````diff
@@ -107,6 +107,13 @@ orcid = "https://orcid.org/0000-0000-0000-0002"
 # ... but for scientific publications, this contributor should be listed as author:
 publication_author = true
 
+# add an organization as a maintainer
+[[project.entities]]
+name = "My Super Organization"
+email = "info@my-super-org.com"
+website = "https://my-super-org.com"
+rorid = "https://ror.org/02nv7yv05" # highly recommended set a ror id for your organization
+
 [config]
 verbose = true # show detailed information about what somesy is doing
 ```
````

````diff
@@ -135,13 +142,13 @@ formats further below.
 By default, `somesy` will create (if they did not exist) or update `CITATION.cff` and `codemeta.json` files in your repository.
 If you happen to use
 
-- `pyproject.toml` (in Python projects),
-- `package.json` (in JavaScript projects),
-- `Project.toml` (in Julia projects),
-- `fpm.toml` (in Fortran projects),
-- `pom.xml` (in Java projects),
-- `mkdocs.yml` (in projects using MkDocs),
-- `Cargo.toml` (in Rust projects)
+- `pyproject.toml` (in Python projects),
+- `package.json` (in JavaScript projects),
+- `Project.toml` (in Julia projects),
+- `fpm.toml` (in Fortran projects),
+- `pom.xml` (in Java projects),
+- `mkdocs.yml` (in projects using MkDocs),
+- `Cargo.toml` (in Rust projects)
 
 then somesy would also update the respective information there.
 
````

````diff
@@ -163,11 +170,11 @@ file in the root folder of your repository:
 
 ```yaml
 repos:
-# ... (your other hooks) ...
-- repo: https://github.com/Materials-Data-Science-and-Informatics/somesy
-rev: "v0.5.0"
-hooks:
-- id: somesy
+# ... (your other hooks) ...
+- repo: https://github.com/Materials-Data-Science-and-Informatics/somesy
+rev: 'v0.6.0'
+hooks:
+- id: somesy
 ```
 
 > **Note**
````

````diff
@@ -177,8 +184,8 @@ repos:
 Note that `pre-commit` gives `somesy` the [staged](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F) version of files,
 so when using `somesy` with pre-commit, keep in mind that
 
-- if `somesy` changed some files, you need to `git add` them again (and rerun pre-commit)
-- if you explicitly run `pre-commit`, make sure to `git add` all changed files (just like before a commit)
+- if `somesy` changed some files, you need to `git add` them again (and rerun pre-commit)
+- if you explicitly run `pre-commit`, make sure to `git add` all changed files (just like before a commit)
 
 <!-- --8<-- [end:precommit] -->
 
````
codemeta.json

Lines changed: 12 additions & 6 deletions

```diff
@@ -13,14 +13,16 @@
 "givenName": "Mustafa",
 "familyName": "Soylu",
 "email": "m.soylu@fz-juelich.de",
-"@id": "https://orcid.org/0000-0003-2637-0432"
+"@id": "https://orcid.org/0000-0003-2637-0432",
+"identifier": "https://orcid.org/0000-0003-2637-0432"
 },
 {
 "@type": "Person",
 "givenName": "Anton",
 "familyName": "Pirogov",
 "email": "a.pirogov@fz-juelich.de",
-"@id": "https://orcid.org/0000-0002-5077-7497"
+"@id": "https://orcid.org/0000-0002-5077-7497",
+"identifier": "https://orcid.org/0000-0002-5077-7497"
 }
 ],
 "name": "somesy",
```

```diff
@@ -36,7 +38,8 @@
 "givenName": "Mustafa",
 "familyName": "Soylu",
 "email": "m.soylu@fz-juelich.de",
-"@id": "https://orcid.org/0000-0003-2637-0432"
+"@id": "https://orcid.org/0000-0003-2637-0432",
+"identifier": "https://orcid.org/0000-0003-2637-0432"
 }
 ],
 "license": [
```

```diff
@@ -51,21 +54,24 @@
 "givenName": "Jens",
 "familyName": "Bröder",
 "email": "j.broeder@fz-juelich.de",
-"@id": "https://orcid.org/0000-0001-7939-226X"
+"@id": "https://orcid.org/0000-0001-7939-226X",
+"identifier": "https://orcid.org/0000-0001-7939-226X"
 },
 {
 "@type": "Person",
 "givenName": "Volker",
 "familyName": "Hofmann",
 "email": "v.hofmann@fz-juelich.de",
-"@id": "https://orcid.org/0000-0002-5149-603X"
+"@id": "https://orcid.org/0000-0002-5149-603X",
+"identifier": "https://orcid.org/0000-0002-5149-603X"
 },
 {
 "@type": "Person",
 "givenName": "Stefan",
 "familyName": "Sandfeld",
 "email": "s.sandfeld@fz-juelich.de",
-"@id": "https://orcid.org/0000-0001-9560-4728"
+"@id": "https://orcid.org/0000-0001-9560-4728",
+"identifier": "https://orcid.org/0000-0001-9560-4728"
 }
 ],
 "url": "https://materials-data-science-and-informatics.github.io/somesy"
```

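In the regenerated `codemeta.json`, each ORCID now appears under `identifier` as well as `@id`. The hypothetical helper below is not somesy's code. It reproduces the resulting Person node shape, including the key order seen in the diff.

```python
import json
from typing import Any, Dict, Optional


def person_to_codemeta(given: str, family: str, email: Optional[str] = None,
                       orcid: Optional[str] = None) -> Dict[str, Any]:
    """Build a CodeMeta Person node; an ORCID is written to both "@id" and "identifier"."""
    node: Dict[str, Any] = {"@type": "Person", "givenName": given, "familyName": family}
    if email is not None:
        node["email"] = email
    if orcid is not None:
        node["@id"] = orcid         # JSON-LD node identifier
        node["identifier"] = orcid  # schema.org property carrying the same ORCID
    return node


node = person_to_codemeta("Mustafa", "Soylu", "m.soylu@fz-juelich.de",
                          "https://orcid.org/0000-0003-2637-0432")
print(json.dumps(node, indent=2, ensure_ascii=False))
```
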
docs/manual.md

Lines changed: 112 additions & 27 deletions

````diff
@@ -65,7 +65,7 @@ Here is an overview of the schemas used in somesy.
 # Just make sure to check docs when changing the models!
 import json
 from io import StringIO
-from somesy.core.models import SomesyInput, ProjectMetadata, Person, SomesyConfig
+from somesy.core.models import SomesyInput, ProjectMetadata, Person, SomesyConfig, Entity
 from pydantic_core import PydanticUndefined
 from typing_extensions import get_args
 
````

````diff
@@ -106,6 +106,7 @@ def model2md(m, out = None):
 print(model2md(SomesyInput).getvalue())
 print(model2md(ProjectMetadata).getvalue())
 print(model2md(Person).getvalue())
+print(model2md(Entity).getvalue())
 print(model2md(SomesyConfig).getvalue())
 ```
 
````

````diff
@@ -120,12 +121,23 @@ some of the currently supported formats. Bold field names are mandatory, the oth
 | Somesy Field | Poetry Config | SetupTools Config | Java POM | Julia Config | Fortran Config | package.json | mkdocs.yml | Rust Config | CITATION.cff | CodeMeta |
 | ---------------- | ------------- | ----------------- | ------------ | ------------ | -------------- | ------------ | ---------- | -------------- | --------------- | -------------- |
 | | | | | | | | | | | |
-| **given-names** | name+email | name | name | name+email | name+email | name | name+email | name+email | givenName | name+email |
-| **family-names** | name+email | name | name | name+email | name+email | name | name+email | name+email | familyName | name+email |
-| email | name+email | email | email | name+email | name+email | email | name+email | name+email | email | name+email |
+| **given-names** | name(+email) | name | name | name(+email) | name(+email) | name | name(+email) | name(+email) | givenName | name(+email) |
+| **family-names** | name(+email) | name | name | name(+email) | name(+email) | name | name(+email) | name(+email) | familyName | name(+email) |
+| email | name(+email) | email | email | name(+email) | name(+email) | email | name(+email) | name(+email) | email | name(+email) |
 | orcid | - | - | url | - | - | url | - | - | id | - |
 | *(many others)* | - | - | - | - | - | - | - | - | *(same)* | - |
 
+=== "Entity Metadata"
+
+| Somesy Field | Poetry Config | SetupTools Config | Java POM | Julia Config | Fortran Config | package.json | mkdocs.yml | Rust Config | CITATION.cff | CodeMeta |
+| ---------------- | ------------- | ----------------- | ------------ | ------------ | -------------- | ------------ | ---------- | -------------- | --------------- | -------------- |
+| | | | | | | | | | | |
+| **name** | name(+email) | name | name | name(+email) | name(+email) | name | name(+email) | name(+email) | givenName | name(+email) |
+| email | name(+email) | email | email | name(+email) | name(+email) | email | name(+email) | name(+email) | email | name(+email) |
+| rorid | - | - | url | - | - | url | - | - | id | - |
+| website | - | - | url | - | - | url | - | - | id | - |
+| *(many others)* | - | - | - | - | - | - | - | - | *(same)* | - |
+
 === "Project Metadata"
 
 | Somesy Field | Poetry Config | SetupTools Config | Java POM | Julia Config | Fortran Config | package.json | mkdocs.yml | Rust Config | CITATION.cff | CodeMeta |
````
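In the table, `name+email` becomes `name(+email)` because email is now optional, as recorded in the v0.5.0 changelog entry "make person argument email optional". The following sketch shows how a target that stores a git-commit-style contributor string could be formatted. The helper name is hypothetical and is not part of somesy's API.

```python
from typing import Optional


def git_style_contributor(name: str, email: Optional[str] = None) -> str:
    """Return "Name <email>" when an email is known, otherwise just "Name"."""
    return f"{name} <{email}>" if email else name


print(git_style_contributor("Jane Doe", "jane@example.com"))  # person with email
print(git_style_contributor("My Super Organization"))         # entity without email
```
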

````diff
@@ -139,13 +151,13 @@ some of the currently supported formats. Bold field names are mandatory, the oth
 | ***author=true*** | authors | authors | developers | authors | author | author | site_author | authors | authors | author |
 | *maintainer=true* | maintainers | maintainers | - | - | maintainer | maintainers | - | - | contact | maintainer |
 | *people* | - | - | - | - | - | contributors | - | - | - | contributor |
+| *entities* | - | - | - | - | - | contributors | - | - | - | contributor |
 | | | | | | | | | | | |
 | keywords | keywords | keywords | - | - | keywords | keywords | - | keywords | keywords | keywords |
 | homepage | homepage | urls.homepage | urls | - | homepage | homepage | site_url | homepage | url | url |
 | repository | repository | urls.repository | scm.url | - | - | repository | repo_url | repository | repository_code | codeRepository |
 | documentation | documentation | urls.documentation | distributionManagement.site.url | - | - | - | - | documentation | - | buildInstructions |
 
-
 Note that the mapping is often not 1-to-1. For example, CITATION.cff allows rich
 specification of author contact information and complex names. In contrast,
 poetry only supports a simple string with a name and email (like in git commits)
````

````diff
@@ -154,6 +166,13 @@ than just move or rename fields**. This means that giving a clean and complete
 mapping overview is not feasible. In case of doubt or confusion, please open an
 issue or consult the `somesy` code.
 
+**people** and **entities** are mapped to authors/maintainers/contributors depending
+on the output format. Both fields are marked as necessary but what `somesy` need is an
+author either in **people** or **entities**.
+
+When an **entity** has a `ror id` but no `website` set, url related fields will be
+filled with `ror id`.
+
 ## The somesy CLI tool
 
 You can see all supported somesy CLI command options using `somesy --help`:
````
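This hunk adds two rules to the manual: at least one author must exist across **people** and **entities**, and an entity's ROR ID fills URL-related fields when no website is set. Both can be sketched as plain functions. The `author` flag on entities is an assumption carried over from the `author=true` field in the project metadata table. These functions are illustrations, not somesy's validators.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def entity_url(entity: Record) -> Optional[str]:
    """URL for URL-type target fields: the website if set, else the ROR ID (if any)."""
    return entity.get("website") or entity.get("rorid")


def has_author(people: List[Record], entities: List[Record]) -> bool:
    """True if at least one person or entity carries an (assumed) `author` flag."""
    return any(item.get("author", False) for item in [*people, *entities])


org = {"name": "My Super Organization", "rorid": "https://ror.org/02nv7yv05", "author": True}
print(entity_url(org))        # https://ror.org/02nv7yv05
print(has_author([], [org]))  # True
```
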

````diff
@@ -168,13 +187,13 @@ defaults, while options passed as CLI arguments override the configuration.
 
 Without an input file specifically provided, somesy will check if it can find a valid
 
-* `.somesy.toml`
-* `somesy.toml`
-* `pyproject.toml` (in `tool.somesy` section)
-* `Project.toml` (in `tool.somesy` section)
-* `fpm.toml` (in `tool.somesy` section)
-* `package.json` (in `somesy` section)
-* `Cargo.toml` (in `package.metadata.somesy` section)
+- `.somesy.toml`
+- `somesy.toml`
+- `pyproject.toml` (in `tool.somesy` section)
+- `Project.toml` (in `tool.somesy` section)
+- `fpm.toml` (in `tool.somesy` section)
+- `package.json` (in `somesy` section)
+- `Cargo.toml` (in `package.metadata.somesy` section)
 
 which is located in the current working directory. If you want to provide
 the somesy input file from a different location, you can pass it with the `-i` option.
````

````diff
@@ -221,15 +240,23 @@ one of the supported input formats:
 email = "a.contributor@example.com"
 orcid = "https://orcid.org/0000-0000-0000-0002"
 
+# add an organization as a maintainer
+[[tool.somesy.project.entities]]
+name = "My Super Organization"
+email = "info@my-super-org.com"
+website = "https://my-super-org.com"
+rorid = "https://ror.org/02nv7yv05" # highly recommended set a ror id for your organization
+
 [tool.somesy.config]
 verbose = true # show detailed information about what somesy is doing
 ```
 
 === "Project.toml"
-```toml
-name = "my-amazing-project"
-version = "0.1.0"
-uuid = "c7e460c6-3f3e-11ec-8d3d-0242ac130003"
+
+````toml
+name = "my-amazing-project"
+version = "0.1.0"
+uuid = "c7e460c6-3f3e-11ec-8d3d-0242ac130003"
 
 [deps]
 ...
````

````diff
@@ -259,14 +286,21 @@ one of the supported input formats:
 email = "a.contributor@example.com"
 orcid = "https://orcid.org/0000-0000-0000-0002"
 
+# add an organization as a maintainer
+[[tool.somesy.project.entities]]
+name = "My Super Organization"
+email = "info@my-super-org.com"
+website = "https://my-super-org.com"
+rorid = "https://ror.org/02nv7yv05" # highly recommended set a ror id for your organization
+
 [tool.somesy.config]
 verbose = true # show detailed information about what somesy is doing
 ```
 
 === "fpm.toml"
-```toml
-name = "my-amazing-project"
-version = "0.1.0"
+```toml
+name = "my-amazing-project"
+version = "0.1.0"
 
 [tool.somesy.project]
 name = "my-amazing-project"
````

````diff
@@ -293,6 +327,13 @@ one of the supported input formats:
 email = "a.contributor@example.com"
 orcid = "https://orcid.org/0000-0000-0000-0002"
 
+# add an organization as a maintainer
+[[tool.somesy.project.entities]]
+name = "My Super Organization"
+email = "info@my-super-org.com"
+website = "https://my-super-org.com"
+rorid = "https://ror.org/02nv7yv05" # highly recommended set a ror id for your organization
+
 [tool.somesy.config]
 verbose = true # show detailed information about what somesy is doing
 ```
````

````diff
@@ -330,6 +371,14 @@ one of the supported input formats:
 }
 ]
 },
+"entities":[
+{
+"name": "My Super Organization",
+"email": "info@my-super-org.com",
+"website": "https://my-super-org.com",
+"rorid": "https://ror.org/02nv7yv05"
+}
+],
 "config": {
 "verbose": true
 }
````

````diff
@@ -453,6 +502,46 @@ after running somesy (to remove the duplicate entries with the incorrect ORCID).
 
 Person identification and merging is not applied to standards with free text fields for authors or maintainers, such as `fpm.toml`.
 
+When somesy compares two metadata records about an entity, it will proceed as follows:
+
+1. If both records contain a ROR ID, then the entity is the same if the ROR IDs are equal, and different if they are not.
+2. Otherwise, if both records contain a website URL, and it is the same URL, then they are the same entity.
+3. Otherwise, if both records have an attached email address, and it is the same email, then they are the same entity.
+4. Otherwise, the records are considered to be about the same entity if they agree on the name.
+
+!!! tip
+
+State ROR IDs for entities whenever possible to ensure reliable identification!
+
+!!! tip
+
+If a ROR ID is not available, state website URLs for entities to help with identification!
+
+Somesy will usually correctly understand cases such as:
+
+1. A ROR ID being added to an entity (i.e. if it was not present before)
+2. A website URL being added to an entity (if no ROR ID is present)
+3. A changed email address (if the name stays the same)
+4. A changed name (if the email address stays the same)
+5. Any other relevant metadata attached to the entity
+
+Nevertheless, you should **check the changes somesy does** before committing them to your repository,
+especially **after you significantly modified your project metadata**.
+
+!!! warning
+
+Note that changing the ROR ID will not be recognized,
+because ROR IDs are assumed to be unique per entity.
+
+If you initially have stated an incorrect ROR ID for an entity and then change it, **somesy will think that this is a new entity**.
+Therefore, **in such a case you will need to fix the ROR ID in all configured somesy targets** either
+before running somesy (so somesy will not create new entity entries), or
+after running somesy (to remove the duplicate entries with the incorrect ROR ID).
+
+!!! warning
+
+Entity identification and merging is not applied to standards with free text fields for entities, such as `fpm.toml`.
+
 ### Codemeta
 
 While `somesy` is modifying existing files for most supported formats and implements
````
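The four entity-matching rules added to the manual translate naturally into a short-circuiting function. This sketch uses a hypothetical `EntityRecord` dataclass with the field names from the examples and compares values verbatim. somesy's real implementation may normalize URLs or email addresses differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityRecord:
    """Hypothetical minimal entity record using the field names from the somesy examples."""

    name: str
    email: Optional[str] = None
    website: Optional[str] = None
    rorid: Optional[str] = None


def same_entity(a: EntityRecord, b: EntityRecord) -> bool:
    """Apply the documented matching rules in order; values are compared verbatim."""
    # Rule 1: if both records carry a ROR ID, it alone decides.
    if a.rorid and b.rorid:
        return a.rorid == b.rorid
    # Rule 2: both have a website URL and it is the same URL.
    if a.website and b.website and a.website == b.website:
        return True
    # Rule 3: both have an email address and it is the same address.
    if a.email and b.email and a.email == b.email:
        return True
    # Rule 4: otherwise, fall back to comparing names.
    return a.name == b.name


old = EntityRecord("My Super Organization", email="info@my-super-org.com")
new = EntityRecord("My Super Organization", email="info@my-super-org.com",
                   rorid="https://ror.org/02nv7yv05")
print(same_entity(old, new))  # True: ROR ID added, matched via the shared email
```

Giving two otherwise identical records different ROR IDs makes rule 1 return `False`. This is the duplicate-entry situation the warning describes.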

````diff
@@ -493,17 +582,13 @@ file in the somesy repository, which is also shown as the
 
 ```shell
 somesy fill docs/_template_authors.md -o AUTHORS.md
-```
+````
 
-??? example "_template_authors.md"
-```
---8<-- "docs/_template_authors.md"
-```
+??? example "\_template_authors.md"
+`--8<-- "docs/_template_authors.md"`
 
 ??? example "AUTHORS.md"
-```
---8<-- "AUTHORS.md"
-```
+`--8<-- "AUTHORS.md"`
 
 The template gets the complete
 [ProjectMetadata](reference/somesy/core/models.md#somesy.core.models.ProjectMetadata) as its context, so it is possible to access all included project and contributor information.
````
