Disable parallelism for profiling #475

@kxygk

Maybe this is a misguided question..

But I've been profiling with clj-async-profiler, and I have some sections of TMD code called from within a complex Pathom resolver setup. My problem is that, as far as I understand the call stacks, TMD seems to trigger execution on separate threads. My whole app is spending 2/3rds of the wall time parked and I'm trying to narrow down the cause.

In terms of profiling, the issue is that I have huge stacks on top of ForkJoinTask, and hence I have no idea who triggered these calls :))

I also don't want to disable parallelism globally (b/c I think it's necessary for the Pathom setup I have - where it uses Promesa VThreads).

I think TMD doesn't use VThreads? So I'm implicitly using two threading models here, which is probably not good. While TMD parallelism is a sensible default, my very-poorly-informed impression is that if you process multiple tables on multiple threads you're gonna end up with a soup of threads.

I'll be honest, I'm a little out of my depth here. It's likely I'm "holding it wrong" or reading the stack-trace tea leaves wrong.

I'm guessing you guys profile TMD all the time so there must be some best practices :)

EDIT:

To the best of my understanding, ds/row-map and ds/row-mapcat will call pmap-ds under the hood, which calls indexed-map-reduce (in dtype-next/for.clj). The docstring for indexed-map-reduce reads:

" [...] If the current thread is already in the common pool, this function executes in the current thread"

At the top of indexed-map-reduce there is a check:

     (if (or (< num-iters (* 2 parallelism))
             (ForkJoinTask/inForkJoinPool))
       (reduce-fn [(indexed-map-fn 0 num-iters)])
       ;; otherwise parallelize things

I don't really know enough to understand inForkJoinPool, but it's part of the Java API.
My best guess is that if you use Promesa's vthread then inForkJoinPool comes back false, so TMD goes ahead and parallelizes.
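
To make that guess a bit more concrete, here is a minimal sketch of what `ForkJoinTask/inForkJoinPool` reports on different threads. The helper names (`on-plain-thread`, `on-fj-worker`, `in-pool?`) are made up for illustration and are not part of TMD, dtype-next, or Promesa. I left out a virtual-thread case because it needs JDK 21+; my assumption is that it behaves like the plain thread, since the Javadoc says the method is true only when the current thread is a ForkJoinWorkerThread, but that is worth verifying.

```clojure
(import '(java.util.concurrent Callable ForkJoinPool ForkJoinTask))

(defn in-pool?
  "True when the calling thread is a ForkJoinPool worker."
  []
  (ForkJoinTask/inForkJoinPool))

(defn on-plain-thread
  "Hypothetical helper: runs f on a fresh platform thread, returns its result."
  [f]
  (let [result (promise)
        t      (Thread. ^Runnable (fn [] (deliver result (f))))]
    (.start t)
    @result))

(defn on-fj-worker
  "Hypothetical helper: runs f on a worker of a dedicated one-thread
  ForkJoinPool, returns its result."
  [f]
  (let [pool (ForkJoinPool. 1)]
    (try
      (.get (.submit pool ^Callable f))
      (finally (.shutdown pool)))))

(def observed
  {:plain-thread (on-plain-thread in-pool?)
   :fj-worker    (on-fj-worker in-pool?)})
```

Here `(:plain-thread observed)` is false and `(:fj-worker observed)` is true, which lines up with the reading that only callers already running on a ForkJoinPool worker hit the serial branch of the check.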

There is no serial row-map, so I think the only solution for the time being is to use Promesa's thread (instead of vthread, i.e. virtual threads) in Pathom. I think that would use the same thread pool and stop TMD from threading.
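
Another option I can think of, going purely by the check quoted above, is to run the TMD call from inside a ForkJoinPool task so that the `inForkJoinPool` half of the condition is true. This is only a sketch under that assumption: `takes-serial-branch?` is my stand-in for the quoted condition (not the dtype-next source), and `call-in-fj-pool` is a hypothetical helper, not an existing TMD or Promesa function.

```clojure
(import '(java.util.concurrent Callable ForkJoinPool ForkJoinTask))

(defn takes-serial-branch?
  "Stand-in for the condition quoted from indexed-map-reduce."
  [num-iters parallelism]
  (boolean (or (< num-iters (* 2 parallelism))
               (ForkJoinTask/inForkJoinPool))))

(defn call-in-fj-pool
  "Hypothetical helper: runs thunk on a worker of a dedicated one-thread
  ForkJoinPool and returns its result. Exceptions thrown by thunk surface
  wrapped in an ExecutionException."
  [thunk]
  (let [pool (ForkJoinPool. 1)]
    (try
      (.get (.submit pool ^Callable thunk))
      (finally (.shutdown pool)))))

;; Intended use (my-ds and my-row-fn are placeholders):
;;   (call-in-fj-pool #(ds/row-map my-ds my-row-fn))
```

With a large iteration count, `(call-in-fj-pool #(takes-serial-branch? 1000 8))` is true, so the serial branch would be chosen. Two caveats: creating a pool per call is wasteful, so real code would keep one long-lived pool; and if the result is lazy, it would have to be realized inside the thunk. The profiler stack would still be rooted at a ForkJoin worker, but the thunk and everything it calls would at least sit in a single stack.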
