Generic skeleton integration test#379

Open
ShoroukRamzy wants to merge 5 commits into
eclipse-score:mainfrom
Valeo-S-CORE-Organization:generic-skeleton-integration-test
Conversation

@ShoroukRamzy
Contributor

@ShoroukRamzy ShoroukRamzy commented May 4, 2026

Overview

This PR significantly expands the integration testing suite by introducing tests for both Generic-Typed (type-erased Skeleton Provider with Strongly-Typed Proxy Consumer) and Generic-Generic (type-erased Skeleton Provider with type-erased Proxy Consumer) interactions.

This addresses issues #261 and #311 by rigorously validating the underlying shared memory (SHM) communication mechanisms with various payload sizes (64, 32, 16, and 8 bytes) and configurations.

The tests strictly evaluate type-erased memory striding, boundary enforcements, and data integrity by spinning up a provider that sends 30 consecutive samples into a heavily constrained 5-slot ring buffer, forcing multiple buffer wrap-arounds.
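
The wrap-around scenario described above can be sketched roughly as follows. This is a simplified, single-process illustration and not the test code from this PR: the names `kSlots`, `kSamples`, `MakePattern`, `CountIntactSamples` and `WrapArounds` are hypothetical, and a plain `std::vector` stands in for the shared memory region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constants mirroring the numbers in the description above.
constexpr std::size_t kSlots = 5;     // slots in the ring buffer
constexpr std::size_t kSamples = 30;  // consecutive samples sent by the provider

// Byte b of sample i carries the value (i + b) mod 256, so neighbouring samples differ.
std::vector<std::uint8_t> MakePattern(std::size_t sample, std::size_t payload_size) {
    std::vector<std::uint8_t> bytes(payload_size);
    for (std::size_t b = 0; b < payload_size; ++b) {
        bytes[b] = static_cast<std::uint8_t>(sample + b);
    }
    return bytes;
}

// Writes kSamples payloads into a type-erased ring of kSlots slots using a
// byte stride of payload_size. After each write, the current slot and the
// previously written slot are checked, so an overlapping stride would show up.
std::size_t CountIntactSamples(std::size_t payload_size) {
    assert(payload_size > 0);
    std::vector<std::uint8_t> ring(kSlots * payload_size, 0);
    std::size_t intact = 0;
    for (std::size_t i = 0; i < kSamples; ++i) {
        const std::size_t slot = i % kSlots;
        const auto sent = MakePattern(i, payload_size);
        std::uint8_t* dst = ring.data() + slot * payload_size;
        std::memcpy(dst, sent.data(), payload_size);

        bool ok = std::memcmp(dst, sent.data(), payload_size) == 0;
        if (i > 0) {
            const std::size_t prev_slot = (i - 1) % kSlots;
            const auto prev = MakePattern(i - 1, payload_size);
            ok = ok && std::memcmp(ring.data() + prev_slot * payload_size,
                                   prev.data(), payload_size) == 0;
        }
        if (ok) {
            ++intact;
        }
    }
    return intact;
}

// Number of times the slot index wraps from the last slot back to slot 0.
constexpr std::size_t WrapArounds() { return (kSamples - 1) / kSlots; }
```

With a correct stride and base pointer, every sample should survive for each payload size; the failures described below are deviations from this baseline.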

Test Scope & Root Causes of Current Failures

Currently, all introduced test variants fail by design, exposing critical bugs within the Generic skeleton implementation related to shared memory allocation and addressing. The failure mode depends on the payload size relative to sizeof(std::max_align_t) (assumed to be 32 bytes on the current architecture).

The identified root causes for these failures are:

1. Data Corruption (Expected: X, got: 0 or Y)

Affected Tests: All Generic-Generic interaction tests (64, 32, 8-byte payloads) and Generic-Typed tests with payloads ≥ sizeof(std::max_align_t) (64-byte, 32-byte payloads).
Root Cause: The fundamental issue was an incorrect base pointer being returned for event data. The SkeletonMemoryManager::CreateGenericEventDataInCreatedSharedMemory and Skeleton::RegisterGeneric functions were erroneously returning a pointer to the EventDataStorage object itself, instead of the actual data buffer managed within that object (EventDataStorage::data()). This led both providers and consumers to miscalculate offsets, resulting in writes to unintended (often zero-initialized) memory locations and reads from those incorrect, zero-filled locations.

2. Boundary Check Crashes (Exit Code 134 - SIGABRT)

Affected Tests: Generic-Typed interaction tests with payloads < sizeof(std::max_align_t) (16-byte, 8-byte payloads).
Root Cause: This failure was caused by an incorrect capacity calculation for the shared memory array. The EventDataStorage's internal DynamicArray was being initialized with a capacity based on num_max_align_elements. For smaller payloads, this calculated capacity was often less than the numberOfSampleSlots specified in the configuration. When the consumer attempted to retrieve a slot beyond this physically undersized capacity, it resulted in an out-of-bounds memory access, triggering assertions and process crashes.
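
The arithmetic behind this failure can be illustrated with a small sketch. The formula below is an assumption (a plain round-up of the total byte count to whole backing elements), not the actual computation of num_max_align_elements, and the names `BackingElements` and `CapacityTooSmall` are hypothetical. The element size is passed in explicitly because sizeof(std::max_align_t) is platform-dependent; 32 is the value assumed in this description.

```cpp
#include <cassert>
#include <cstddef>

// Assumed formula: number of backing elements of elem_size bytes needed to
// hold `slots` samples of payload_size bytes each (rounded up).
constexpr std::size_t BackingElements(std::size_t slots,
                                      std::size_t payload_size,
                                      std::size_t elem_size) {
    return (slots * payload_size + elem_size - 1) / elem_size;
}

// True if a consumer that treats the backing-element count as the slot
// capacity would see fewer slots than were configured.
constexpr bool CapacityTooSmall(std::size_t slots,
                                std::size_t payload_size,
                                std::size_t elem_size) {
    return BackingElements(slots, payload_size, elem_size) < slots;
}

// 5 slots with 32-byte backing elements, as assumed in the description:
static_assert(BackingElements(5, 8, 32) == 2, "40 bytes fit in 2 elements");
static_assert(BackingElements(5, 16, 32) == 3, "80 bytes fit in 3 elements");
static_assert(CapacityTooSmall(5, 8, 32), "8-byte payload: capacity 2 < 5 slots");
static_assert(CapacityTooSmall(5, 16, 32), "16-byte payload: capacity 3 < 5 slots");
static_assert(!CapacityTooSmall(5, 32, 32), "32-byte payload: capacity 5");
static_assert(!CapacityTooSmall(5, 64, 32), "64-byte payload: capacity 10");
```

Under these assumptions, only the 8-byte and 16-byte payloads end up with a capacity below the 5 configured slots, which matches the set of tests that abort.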

@ShoroukRamzy
Contributor Author

Hi @crimson11,
I will confirm the main reasons for the above two issues soon.

@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch from 2d0976f to fe72098 Compare May 5, 2026 08:06
@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch from fe72098 to 6dac63e Compare May 5, 2026 12:37
@ShoroukRamzy
Contributor Author

Hi @crimson11, I will confirm the main reasons for the above two issues soon.

@crimson11, Done

@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch 2 times, most recently from b229be8 to 7b4392b Compare May 6, 2026 16:14
@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch from 7b4392b to 244d45c Compare May 6, 2026 16:52
@crimson11
Contributor

I will have a look at it tomorrow!

@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch from aaa8d6a to e051958 Compare May 10, 2026 08:56
@ShoroukRamzy ShoroukRamzy force-pushed the generic-skeleton-integration-test branch from e051958 to d6ffbd3 Compare May 10, 2026 13:45
@crimson11
Contributor

@ShoroukRamzy
I have now had a first look and have the following question:
To me it is unexpected that the Generic-Generic interaction fails.
Because what you have written above:

The fundamental issue was an incorrect base pointer being returned for event data. The SkeletonMemoryManager::CreateGenericEventDataInCreatedSharedMemory and Skeleton::RegisterGeneric functions were erroneously returning a pointer to the EventDataStorage object itself, ...

should have no effect in this case, because a generic proxy/proxy event uses the DataStoragePointer here: https://github.com/eclipse-score/communication/blob/main/score/mw/com/impl/bindings/lola/generic_proxy_event.cpp#L156 , which the GenericSkeleton has placed into the events_metainfo_ here:
https://github.com/eclipse-score/communication/blob/main/score/mw/com/impl/bindings/lola/skeleton_memory_manager.cpp#L228
... and this is the "correct" raw-data-storage pointer (without the enclosing DynamicArray control structure).
So imho the Generic-Generic interaction should work, because:

  • the GenericSkeleton/provider stores the raw pointer (not the pointer to a DynamicArray) in the "meta-info-map"
  • the GenericProxyEvent/consumer gets this raw pointer from the "meta-info-map"

Am I missing something here? I have not yet re-run/used your tests, as picking them from the fork is "tedious" ;)

@ShoroukRamzy
Contributor Author

ShoroukRamzy commented May 18, 2026

@crimson11, you are absolutely correct: the consumer is looking at the correct memory location (data_storage->data()). However, the failure in the Generic-Generic interaction is caused by the provider side (GenericSkeletonEvent).
If we trace what gets returned to the provider during initialization:

https://github.com/eclipse-score/communication/blob/main/score/mw/com/impl/bindings/lola/generic_skeleton_event.cpp#L38

  • Then, during GenericSkeletonEvent::Allocate(), the provider computes the memory address to write to by adding the offset directly to event_data_storage_ (the object pointer)

https://github.com/eclipse-score/communication/blob/main/score/mw/com/impl/bindings/lola/generic_skeleton_event.cpp#L69

Because of this, the provider and consumer are using two different base addresses:

  • Consumer: Reads from data_storage->data() (the actual allocated data buffer).

  • Provider: Writes to data_storage + offset (overwriting the DynamicArray control block and corrupting shared memory).
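
The effect of the two different base addresses can be reproduced in a small simulation. This is only a sketch under assumptions: a flat byte vector stands in for the shared memory region, the first `header` bytes play the role of the DynamicArray control block, and the function name `MatchingSlots` and the 16-byte header size used below are hypothetical rather than taken from the LoLa binding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulated layout: [ header bytes (control block) | slots * size data bytes ].
// The provider writes sample i (all bytes set to i + 1) into slot i, starting
// either from the object base (the bug described above) or from the data
// base. The consumer always reads from the data base.
// Returns how many slots the consumer sees with the expected content.
std::size_t MatchingSlots(bool provider_uses_object_base,
                          std::size_t header,
                          std::size_t slots,
                          std::size_t size) {
    assert(size > 0);
    std::vector<std::uint8_t> region(header + slots * size, 0);
    std::uint8_t* object_base = region.data();         // address of the container object
    std::uint8_t* data_base = region.data() + header;  // address of the first slot
    std::uint8_t* provider_base = provider_uses_object_base ? object_base : data_base;

    // Provider side: base + offset. Writes stay inside `region` in both cases.
    for (std::size_t i = 0; i < slots; ++i) {
        std::memset(provider_base + i * size, static_cast<int>(i + 1), size);
    }

    // Consumer side: always data_base + offset.
    std::size_t matching = 0;
    for (std::size_t i = 0; i < slots; ++i) {
        const std::uint8_t* slot = data_base + i * size;
        bool ok = true;
        for (std::size_t b = 0; b < size; ++b) {
            ok = ok && (slot[b] == static_cast<std::uint8_t>(i + 1));
        }
        if (ok) {
            ++matching;
        }
    }
    return matching;
}
```

In this model, a provider that starts from the object base shifts every sample by `header` bytes, so the consumer sees either a neighbouring sample or zero-initialized memory, which is consistent with the "Expected: X, got: 0 or Y" symptom.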

@crimson11
Contributor

@ShoroukRamzy
Thank you for the clarification! So right now we have these two issues:

  1. GenericSkeleton / GenericSkeletonEvent is "completely broken", regardless of whether the event data gets consumed by a GenericProxy or a "typed" Proxy:
  • GenericSkeletonEvent::Allocate() allocates memory starting from a "base pointer" pointing at the EventDataStorage type aka DynamicArray<T>, but expects that its "base pointer" points to the underlying raw array stored within DynamicArray<T>. Thus, it potentially overwrites the control-data part of the DynamicArray<T> and generally writes data to the wrong locations, since it uses the wrong base pointer.
  2. Even if the general problem under (1) did NOT exist, a typed Proxy / ProxyEvent consuming the Event provided by the GenericSkeleton would currently access the event slots by interpreting the EventDataStorage / DynamicArray<T>. Since the GenericSkeletonEvent and the typed ProxyEvent have a different interpretation of T, and therefore of the number of slots, the access of the typed ProxyEvent to the underlying data is "off".

@ShoroukRamzy
Contributor Author

yes exactly @crimson11

}

} // namespace score::mw::com::impl::lola
} // namespace score::mw::com::impl::lola No newline at end of file
Contributor

Remove this file from the commit ... you only provide tests in this PR.

Contributor Author

Done

Contributor

Hm - I guess we were cross-talking? You have now removed skeleton_memory_manager.cpp from the repo in your commit? I meant: this file shouldn't show up as changed in your commit ;) - there is no need to touch/change this file.

Comment thread score/mw/com/test/generic_skeleton/mw_com_config_generic_generic.json Outdated
Comment thread score/mw/com/test/generic_skeleton/generic_typed_interaction_32_byte_app.cpp Outdated
Comment thread score/mw/com/test/generic_skeleton/generic_typed_interaction_app.cpp Outdated
}

template <typename Trait>
class MyTestService : public Trait::Base
Contributor

See my initial comment! If you feel the need to show correct interaction with different event data sizes, then you just need this test, and you add additional event types (16 and 32 byte) to your service interface MyTestService.

Contributor Author

@crimson11, this was my first version of the test. I then separated the variants into different processes to be able to see the full behavior in terms of data integrity and the number-of-slots check. The separation is needed because crashes caused by the capacity (number of slots) issue prevent the other events from being processed as well. We want to test different event sizes and see the full behavior for each. Sorry, I had to remove this file from the commit.

Contributor

OK - I trust you, that you need different/separate apps ;)
But you have immense code duplication here! All the files generic_typed_interaction_XX_byte_app.cpp are almost IDENTICAL! Just do a diff ...
And you already have a mechanism to bring in the PAYLOAD_SIZE via a define!

So:

  • just provide ONE generic_typed_interaction_app.cpp - it may just need some additional #ifdef PAYLOAD_SIZE
  • The variation you want/need you then get with the bazel instantiations of the app.
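
A rough sketch of what such a single, size-parameterized source file could look like is shown below. It is an illustration only: the type `TestPayload` and the helpers `MakePayload` and `IsIntact` are hypothetical and not taken from the PR, and it assumes that each bazel target passes the size as a preprocessor define (for example via the `local_defines` or `copts` attribute of `cc_binary`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the build passes -DPAYLOAD_SIZE=<n> per target.
// The default below only exists so the file also builds standalone.
#ifndef PAYLOAD_SIZE
#define PAYLOAD_SIZE 64
#endif

static_assert(PAYLOAD_SIZE == 8 || PAYLOAD_SIZE == 16 ||
              PAYLOAD_SIZE == 32 || PAYLOAD_SIZE == 64,
              "unsupported PAYLOAD_SIZE");

// Hypothetical event payload whose size follows the define.
struct TestPayload {
    std::array<std::uint8_t, PAYLOAD_SIZE> bytes;
};
static_assert(sizeof(TestPayload) == PAYLOAD_SIZE,
              "payload must occupy exactly PAYLOAD_SIZE bytes");

// Fills the payload with a pattern derived from a seed (e.g. the sample index).
TestPayload MakePayload(std::uint8_t seed) {
    TestPayload payload{};
    for (std::size_t b = 0; b < payload.bytes.size(); ++b) {
        payload.bytes[b] = static_cast<std::uint8_t>(seed + b);
    }
    return payload;
}

// Checks that a received payload still carries the pattern for the given seed.
bool IsIntact(const TestPayload& payload, std::uint8_t seed) {
    for (std::size_t b = 0; b < payload.bytes.size(); ++b) {
        if (payload.bytes[b] != static_cast<std::uint8_t>(seed + b)) {
            return false;
        }
    }
    return true;
}
```

Each size variant would then be one more build target over the same source file, so a fix or new check lands in all variants at once.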
