Spearman cor test by mapi1 · Pull Request #304 · JuliaStats/HypothesisTests.jl

mapi1 · 2023-07-17T13:36:46Z

This PR adds the SpearmanCorrelationTest as suggested in #236.
For the confidence interval I took inspiration from this StackExchange thread and used the suggested variance estimator to counter the non-normal distribution of the ranks.
Unfortunately, I could not add meaningful tests for it, as R's cor.test does not report confidence intervals for the Spearman correlation and also uses a different algorithm to calculate the p-value. Maybe someone has an idea here, or knows a tool that already calculates this correctly.
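
For readers following the thread, here is a minimal sketch of the statistic under discussion, not the code in this PR. It assumes the usual definition of Spearman's ρ as the Pearson correlation of average ranks, and the standard t statistic `r * sqrt(dof / (1 - r^2))` with `dof = n - 2`. The names `average_ranks`, `spearman_rho` and `spearman_tstat` are hypothetical; StatsBase's `tiedrank` and `corspearman` provide the ranking and the coefficient in practice.

```julia
using Statistics

# Average ranks: tied values share the mean of the positions they occupy.
function average_ranks(x::AbstractVector{<:Real})
    p = sortperm(x)
    ranks = zeros(Float64, length(x))
    i = 1
    while i <= length(x)
        j = i
        # Extend j to the end of the current run of equal values.
        while j < length(x) && x[p[j + 1]] == x[p[i]]
            j += 1
        end
        ranks[p[i:j]] .= (i + j) / 2
        i = j + 1
    end
    return ranks
end

# Spearman's rho as the Pearson correlation of the ranks.
spearman_rho(x, y) = cor(average_ranks(x), average_ranks(y))

# t statistic for H0: rho = 0 (assumption: dof = n - 2).
function spearman_tstat(x, y)
    r = spearman_rho(x, y)
    dof = length(x) - 2
    return r * sqrt(dof / (1 - r^2))
end
```

The p-value would then come from a Student t distribution with `dof` degrees of freedom, which is why it can differ from an exact algorithm on small samples.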

codecov-commenter · 2023-07-17T13:40:33Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.06 🎉

Comparison is base (932eaac) 93.75% compared to head (8869900) 93.81%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
+ Coverage   93.75%   93.81%   +0.06%     
==========================================
  Files          28       28              
  Lines        1729     1746      +17     
==========================================
+ Hits         1621     1638      +17     
  Misses        108      108

Impacted Files	Coverage Δ
src/correlation.jl	`100.00% <100.00%> (ø)`


nalimilan

Thanks, looks mostly good!

I'm not sure which existing implementations could be used to check results. Have you looked at those mentioned by Wikipedia, as there are several which return p-values?

nalimilan · 2023-07-30T10:04:07Z

+"""
+    SpearmanCorrelationTest(x, y)
+
+Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the rank-based Spearman correlation 


Break all lines at 92 chars like in the CorrelationTest docstring. Also I would say "Spearman rank correlation" rather than "rank-based".

nalimilan · 2023-07-30T10:05:26Z

+    end
+end
+
+testname(p::SpearmanCorrelationTest) =  "Spearman correlation"


Suggested change

testname(p::SpearmanCorrelationTest) =  "Spearman correlation"

testname(p::SpearmanCorrelationTest) = "Spearman correlation"

nalimilan · 2023-07-30T10:07:26Z

+    let out = sprint(show, w)
+        @test occursin("reject h_0", out) && !occursin("fail to", out)
+    end
+    # let ci = confint(w)


Why is this commented out?

nalimilan · 2023-07-30T10:10:46Z

+    #     @test first(ci) ≈ -0.1105478 atol=1e-6
+    #     @test last(ci) ≈ 0.0336730 atol=1e-6
+    # end
+    @test pvalue(x) ≈ 0.09275 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 


Can we find a more precise value to test against? This way of writing the test is misleading as even 0.09 would pass.

In the worst case, we should test with a lower tolerance against the value we return, and just note in a comment the value returned by R.
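
The tolerance concern can be checked directly with `isapprox`, which is what `≈` with an `atol` keyword calls. A small sketch, using the rounded R value from the test comment; the variable names are illustrative only:

```julia
r_value = 0.09275  # value quoted from R's cor.test in the test comment

# With atol=1e-2, any result within 0.01 of the reference passes.
loose_low  = isapprox(0.09, r_value; atol=1e-2)
loose_high = isapprox(0.1, r_value; atol=1e-2)

# With a tight tolerance the same candidate is rejected.
tight = isapprox(0.09, r_value; atol=1e-6)

println((loose_low, loose_high, tight))
```

This is why the review suggests testing with a tight tolerance against the value the package returns, and recording the R value only in a comment.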

nalimilan · 2023-07-30T10:12:41Z

+Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:
+
+* small sample sizes n < 25


Suggested change

Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:

* small sample sizes n < 25

Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation, which performs insufficiently in the case of:

* sample sizes below 25

nalimilan · 2023-07-30T10:13:01Z

+[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.
+
+[2] A. J. Bishara and J. B. Hittner, “Confidence intervals for correlations when data are not normal,” Behav Res, vol. 49, no. 1, pp. 294–309, Feb. 2017, doi: 10.3758/s13428-016-0702-8.
+


Suggested change

nalimilan · 2023-07-30T10:18:00Z

+Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:
+
+* small sample sizes n < 25
+* a high true population Spearman correlation


According to the StackExchange thread this is more precisely:

Suggested change

* a high true population Spearman correlation

* a true population Spearman correlation above 0.95

It also mentions ordinal data. Did you omit it on purpose? I admit it's not super explicit.

nalimilan · 2023-07-30T10:23:47Z

+In these cases a bootstrap confidence interval can perform better [2].
+
+# External resources
+[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.


Suggested change

[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.

[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating Pearson, Kendall and Spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.

mapi1 · 2023-08-01T11:52:08Z

Thanks for your detailed review! I tried to incorporate it as suggested.

I used the spearmanCI R library to get CI values for testing. They suffer from the same problem as the p-value: only a small number of significant digits match. I still wrote the tests as a form of documentation, and also added tests that compare against the values we return, to catch feature changes, bugs, etc.

I now also mention ordinal data, as it is the main message of the Ruscio paper (now added to the docstring).

nalimilan

Sorry for the delay. Looks almost ready. I've just made a few more comments.

Regarding comparison of CIs against R, I hadn't realized spearmanCI doesn't implement the same CI method. In that case I don't think it makes sense to test against these values, as there are mathematically legitimate reasons to get different results. Maybe you could check against code that isn't included in a package such as this one instead? Then if that matches we can just test against the exact values we return to prevent regressions.

nalimilan · 2023-09-09T08:48:30Z

+    dof(test) > 1 || return (-one(T), one(T))  # Otherwise we can get NaNs
+    q = quantile(Normal(), 1 - (1-level) / 2)
+    fisher = atanh(test.r)
+    bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)


Suggested change

bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)

# Estimates variance as in Bonett and Wright (2000)

bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q
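
As a standalone illustration of the interval computed in this hunk, here is a sketch under stated assumptions, not the PR's implementation. It assumes `dof = n - 2` and that the bounds are mapped back with `tanh`, as in the standard Fisher interval; neither is shown in the diff above. The function name `spearman_confint` is hypothetical, and `q` defaults to the 0.975 standard normal quantile instead of calling `quantile(Normal(), ...)`, so that the block needs no packages.

```julia
# Hypothetical helper; q is the standard normal quantile for the chosen level
# (default: 0.975 quantile, i.e. a 95% interval).
function spearman_confint(r::Real, n::Integer; q::Real = 1.959963984540054)
    dof = n - 2                     # assumption: dof as in a correlation t-test
    dof > 1 || return (-1.0, 1.0)   # same guard as in the diff
    fisher = atanh(r)
    # Estimates variance as in Bonett and Wright (2000)
    bound = sqrt((1 + r^2 / 2) / (dof - 1)) * q
    # Assumption: back-transform the Fisher-scale bounds with tanh.
    return (tanh(fisher - bound), tanh(fisher + bound))
end

lo, hi = spearman_confint(0.5, 50)
println((lo, hi))
```

The `1 + r^2 / 2` factor is what widens the interval relative to the Pearson case, where the Fisher-scale variance is `1 / (n - 3)`.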

nalimilan · 2023-09-09T08:54:54Z


 function population_param_of_interest(p::CorrelationTest)
-    param = p.k != 0 ? "Partial correlation" : "Correlation"
+    param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"


It seems that nobody says "partial Pearson correlation" even if that would sound more explicit.

Suggested change

param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"

param = p.k != 0 ? "Partial correlation" : "Pearson correlation"

nalimilan · 2023-09-09T08:59:24Z

+end
+
+function StatsAPI.confint(test::SpearmanCorrelationTest{T}, level::Float64=0.95) where T
+    dof(test) > 1 || return (-one(T), one(T))  # Otherwise we can get NaNs


Can you add a test for this case? Maybe also for other corner cases like having NaNs or Inf in the input.
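
To make the corner case concrete, the arithmetic below reproduces how a NaN can arise without the guard. It is a sketch that reuses the expressions from the diff with hand-picked inputs (`r = 1.0`, `dof = 1`) and assumes the lower bound is obtained with `tanh`, which the hunk does not show:

```julia
r = 1.0                  # perfectly monotone sample
dof = 1                  # the guard in the diff returns early for this value
q = 1.959963984540054    # 0.975 standard normal quantile

fisher = atanh(r)                              # Inf
bound = sqrt((1 + r^2 / 2) / (dof - 1)) * q    # division by zero gives Inf
lower = tanh(fisher - bound)                   # Inf - Inf is NaN

println((fisher, bound, lower))
```

With `abs(r) < 1` the same `dof` gives an infinite bound but finite `atanh(r)`, so the interval degenerates to `(-1, 1)` rather than NaN.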

nalimilan · 2023-09-09T09:51:25Z

+Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the Spearman rank
+correlation ρₛ of vectors `x` and `y` is zero.
+
+Implements `pvalue` for the t-test.


Suggested change

Implements `pvalue` for the t-test.

Implements `pvalue` for the t-test using the Fisher transformation.

nalimilan · 2023-09-09T09:57:05Z

+        @test first(ci) ≈ -0.1333692 atol=1e-6
+        @test last(ci) ≈ 0.01065576 atol=1e-6
+    end
+    @test pvalue(x) ≈ 0.09274721 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 


AFAICT R will use AS 89 if you pass exact=TRUE, right? It would be good to test against an implementation which uses AS 89, even if we need to use another software.

3f6a · 2024-06-12T11:40:52Z

Hello, how is this different from #53 ?

dpinol · 2024-08-26T16:31:50Z

@mapi1 thanks for the PR. Do you foresee you'll have the opportunity to move it forward? cheers

AuRobinson · 2025-03-17T07:44:26Z

I mentioned a temporary solution in #213.

Forgot to mention that I ignore the confidence interval CorrelationTest gives.

Validated by comparing results from corspearman and SPSS.

MariusPille added 2 commits July 17, 2023 15:25

Add SpearmanCorrelationTest

8508cb1

Typo

8869900

nalimilan reviewed Jul 30, 2023

incorporate review

ed3313b

nalimilan reviewed Sep 9, 2023

nalimilan mentioned this pull request Sep 9, 2023

Making CorrelationTest nonparametric #236

Closed

