[obsolete] DF/data4es: 'update' mode for safe dataset metadata update. by mgolosova · Pull Request #247 · PanDAWMS/dkb

mgolosova · 2019-04-28T13:37:59Z

Closed, since #253 + #262 + #263 together do the trick: we can run data4es with `-i 91_in,91_out,95` and be sure that data already present in ES won't be spoiled.

#320 is the next step in this direction: it will allow getting data from Rucio while falling back to the "update" scenario in case of an error.
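The fallback idea can be sketched roughly as follows. This is only an illustration under assumptions, not the code from #320: `fetch_primary` and `fetch_backup` are hypothetical callables standing in for "ask Rucio" and "read what is already stored in ES".

```python
def get_dataset_metadata(name, fetch_primary, fetch_backup):
    """Try the primary source first; on any error use the backup one.

    Both fetchers are hypothetical callables taking a dataset name and
    returning a dict. Returns a (metadata, source_label) tuple.
    """
    try:
        return fetch_primary(name), 'primary'
    except Exception:
        # Any failure of the primary source (assumed: Rucio) triggers
        # the fallback to the backup source (assumed: ES).
        return fetch_backup(name), 'backup'
```

Returning the source label along with the data is a design choice of this sketch: it lets the caller log how often the fallback was actually used.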

However, it does not provide functionality like "query Rucio only if we don't have these data in ES". It'd be nice, but in fact it should be done by introducing another stage that extracts all available information from ES at the beginning of the data4es process, instead of asking every stage to query ES for its own data.
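A minimal sketch of how such a "prefetch" stage might look, assuming records are plain dicts. Every name here (`prefetch_stage`, `rucio_stage`, `load_from_es`, `query_rucio`, the `id` and `bytes` keys) is hypothetical and only illustrates the idea; none of it is taken from the data4es code.

```python
def prefetch_stage(records, load_from_es):
    """Hypothetical first stage: attach what ES already knows to a record."""
    for rec in records:
        known = load_from_es(rec['id']) or {}
        merged = dict(known)
        merged.update(rec)  # fields from the fresh input override stored ones
        yield merged


def rucio_stage(records, query_rucio, fields=('bytes',)):
    """Hypothetical later stage: query Rucio only for fields still missing."""
    for rec in records:
        missing = [f for f in fields if rec.get(f) is None]
        if missing:
            # query_rucio is assumed to return a dict with the missing fields
            rec = dict(rec, **query_rucio(rec['id'], missing))
        yield rec
```

With this layout only the first stage talks to ES, and later stages just check whether the fields they are responsible for are already filled in.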

Original description

Applies the functionality added in #245 and #246 to the data4es process, allowing it to be started in a normal (basic integration) or a 'safe' (archive update) mode; the latter is turned on with the `--update` option.

[WIP] is due to the pyDKB-related changes: they clearly do not belong here. By the way, even after this change I have still seen the ConnectionTimeout exception; but as we are talking about an 'archived metadata update', it seems quite OK to me to be interrupted when ES is overloaded and to restart after an hour or so.

Waits for #245, #246.

Sometimes we see ConnectionTimeout even with a simple `get` request; it only means that for some reason ES just can't do anything about the request, not that the request is too heavy. Now it is possible to set the number of timeout retries when the client is created; by default the number is 3. The ES client itself (`elasticsearch.Elasticsearch()`) has 'retry on timeout' turned off by default, so we have to turn it on 'by hand'; the retry number 3 is the same as the client's own default.
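As an illustration, client creation with retries enabled could look like the sketch below. The helper names `es_client_kwargs` and `make_client` are made up for this example and are not the pyDKB functions; `retry_on_timeout` and `max_retries` are keyword arguments accepted by `elasticsearch.Elasticsearch()` in the elasticsearch-py client.

```python
def es_client_kwargs(timeout_retries=3):
    """Keyword arguments that make the ES client retry timed-out requests."""
    return {
        'retry_on_timeout': True,        # off by default in the client
        'max_retries': timeout_retries,  # 3 matches the client's default
    }


def make_client(hosts, timeout_retries=3):
    """Create an ES client that retries on timeout (hypothetical helper)."""
    # Imported lazily, so the rest of the sketch works without the package.
    import elasticsearch
    return elasticsearch.Elasticsearch(hosts,
                                       **es_client_kwargs(timeout_retries))
```

Retries only help with short hiccups; if ES stays overloaded, the request will still fail with ConnectionTimeout after the last attempt.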

In this mode all the stages that can use ES as a "backup" storage are configured to do so. It takes more time than a direct integration, yet it can be run for archived data without worrying that some information will be lost.

mgolosova added 2 commits April 27, 2019 22:26

mgolosova self-assigned this Apr 28, 2019

mgolosova changed the title ~~[WIP] DF/data4es: 'update' mode for safe dataset metadata update.~~ [obsolete] DF/data4es: 'update' mode for safe dataset metadata update. Aug 9, 2019

mgolosova closed this Feb 20, 2020

mgolosova wants to merge 2 commits into 95-ds-safe from data4es-ds-safe
