[obsolete] DF/data4es: 'update' mode for safe dataset metadata update. #247
Closed
mgolosova wants to merge 2 commits into
Sometimes we see `ConnectionTimeout` even with a simple `get` request. This only means that, for some reason, ES cannot handle the request at the moment, not that the request is too heavy. The number of retries on timeout can now be set when the client is created; by default it is 3. The ES client itself (`elasticsearch.Elasticsearch()`) has the 'retry on timeout' option turned off by default, so we have to turn it on explicitly. The retry count of 3 is the same as the client's own default.
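A minimal sketch of how the retry setting might be passed to the client. It assumes the elasticsearch-py `Elasticsearch` constructor, which accepts `max_retries` (default 3) and `retry_on_timeout` (default `False`); the helper name `es_client_kwargs` and its `timeout_retries` parameter are hypothetical and not the PR's actual code.

```python
# Hypothetical helper (not the PR's code): options for elasticsearch.Elasticsearch()
DEFAULT_TIMEOUT_RETRIES = 3  # equals the client's own max_retries default


def es_client_kwargs(timeout_retries=DEFAULT_TIMEOUT_RETRIES):
    """Return client options that make ConnectionTimeout retriable.

    elasticsearch-py leaves `retry_on_timeout` False by default, so it is
    switched on explicitly; `max_retries` caps how many retries are made.
    """
    if timeout_retries < 0:
        raise ValueError("timeout_retries must be non-negative")
    return {
        # Retrying on timeout only makes sense if at least one retry is allowed.
        "retry_on_timeout": timeout_retries > 0,
        "max_retries": timeout_retries,
    }
```

Usage would then be something like `Elasticsearch(hosts, **es_client_kwargs())`. Keeping `max_retries` at 3 changes nothing by itself; it is `retry_on_timeout=True` that makes a timed-out `get` be retried instead of failing on the first timeout.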
In this mode, all the stages that can use ES as a "backup" storage are configured to do so. It takes more time than a direct integration, but it allows running the process on archived data without worrying that some information will be lost.
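One way to picture a stage that uses ES as a "backup" storage: values obtained from the primary source win, and ES only fills the gaps, so a field that the archive source no longer provides is not wiped out. This is a hypothetical sketch; the function name, the dict-based records and the "missing means `None`" convention are assumptions, not how the stages are actually implemented.

```python
def fill_from_backup(record, es_doc):
    """Fill gaps in a freshly collected `record` with values stored in ES.

    Hypothetical 'update'-mode policy: a value from the primary source is
    always kept; a key that is absent or None is taken from `es_doc`, so
    metadata already present in ES is never replaced by "nothing".
    """
    result = dict(record)  # do not mutate the caller's record
    for key, stored in (es_doc or {}).items():
        if result.get(key) is None:
            result[key] = stored
    return result
```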
Closed in favor of #253, #262 and #263, which together do the trick: we can run `data4es` with `-i 91_in,91_out,95` and be sure that data already present in ES won't be spoiled. #320 is the next step in this direction: it will allow getting data from Rucio but, in case of an error, falling back to the "update" scenario. However, it does not provide functionality like "query Rucio only if we don't have these data in ES". That would be nice, but it should rather be done by introducing another stage that extracts all available information from ES at the beginning of the `data4es` process, instead of asking every stage to query ES for its own data.
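The stage suggested above could look roughly like this: ES is queried once for all datasets, and Rucio is consulted only for datasets that ES knows nothing about, with a Rucio error yielding "no new data" instead of aborting the run. Everything here is hypothetical: `es_lookup` stands for some bulk ES read (for example a multi-get), `rucio_query` for a per-dataset Rucio call, and `RucioError` for whatever exception that call raises.

```python
from typing import Callable, Dict, Iterable, List, Optional

Doc = Dict[str, object]


class RucioError(Exception):
    """Hypothetical stand-in for a failed Rucio request."""


def prefetch_from_es(dataset_ids: Iterable[str],
                     es_lookup: Callable[[List[str]], Dict[str, Doc]]) -> Dict[str, Doc]:
    """Hypothetical first stage: one ES query for all datasets at once."""
    ids = list(dataset_ids)
    return es_lookup(ids) if ids else {}


def get_metadata(dataset_id: str,
                 cache: Dict[str, Doc],
                 rucio_query: Callable[[str], Doc]) -> Optional[Doc]:
    """Query Rucio only if ES had nothing for this dataset."""
    if dataset_id in cache:
        return cache[dataset_id]  # already in ES: do not ask Rucio
    try:
        return rucio_query(dataset_id)
    except RucioError:
        return None  # nothing in ES and Rucio failed: no data this time
```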
Original description
Applies the functionality added in #245 and #246 to the `data4es` process, allowing it to be started either in the normal (basic integration) mode or in the 'safe' (archive update) mode, which is turned on with the `--update` option.

[WIP] is due to the pyDKB-related changes: they clearly do not belong here. By the way, even after this change I have seen that `ConnectionTimeout` exception; but since we are talking about 'archived metadata update', it seems quite OK to me to be interrupted in case of an overloaded ES and to restart after an hour or so.

Waits for #245, #246.
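For illustration, the mode switch could be wired to the command line roughly as follows. Only the `--update` option name comes from the description; the use of `argparse`, the `parse_mode` function, the `prog` name and the 'basic'/'update' mode labels are assumptions.

```python
import argparse


def parse_mode(argv=None):
    """Return 'update' when --update is given, otherwise 'basic' (hypothetical)."""
    parser = argparse.ArgumentParser(prog="data4es")
    parser.add_argument(
        "--update",
        action="store_true",
        help="safe (archive update) mode: stages may use ES as a backup "
             "source of metadata",
    )
    args = parser.parse_args(argv)
    return "update" if args.update else "basic"
```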