Parallelized calculation of PCAEmbedding #39

JonasIsensee · 2018-09-20T11:17:15Z

The only limiting factor for very large (>1000) embedding dimensions,
which can then be reduced by PCA, is the
calculation of PCAEmbedding.

This could be parallelized by splitting the dataset into different subsets
and computing the covariance matrix for each of them.
All covariance matrices are then averaged.

Datseris · 2018-09-21T08:51:25Z

Is this scientifically correct, to average the covariance matrices? I wouldn't say it is obviously true (I don't think it is).

P.S.:
The average of averages is in general not the overall average: https://math.stackexchange.com/questions/95909/why-is-an-average-of-an-average-usually-incorrect and one has to use a weighted average instead.

JonasIsensee · 2018-09-21T09:06:27Z

AFAICT it is in this case.

The entries of the covariance matrix are C[i,j] == ⟨ x[i] * x[j] ⟩,
the arithmetic means of the products of x[i] and x[j].

Therefore each matrix has to be weighted by the relative size of its subset (ideally, all subsets are the same size).

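using Statistics
x = rand(100)
# the full mean splits into partial sums over the two halves (≈ because of floating-point round-off)
mean(x) ≈ sum(x)/100 ≈ sum(x[1:50])/100 + sum(x[51:100])/100
# the halves are the same size, so each half's mean gets weight 1/2
mean(x) ≈ mean(x[1:50])/2 + mean(x[51:100])/2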
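
The same argument can be sketched for the matrix case. This is only a minimal illustration, not the PCAEmbedding implementation: `second_moment` and `chunked_second_moment` are hypothetical helper names, and the data is assumed to be centered with the global mean beforehand, so that C[i,j] == ⟨ x[i] * x[j] ⟩ really is the covariance. If each subset were centered with its own mean instead, the weighted average would presumably also need a correction for the differences between the subset means.

```julia
using Statistics

# Second-moment matrix ⟨x xᵀ⟩ over the rows of X.
# Assumption: the columns of X have already been centered.
second_moment(X) = (X' * X) ./ size(X, 1)

# Hypothetical helper: combine the per-subset matrices,
# weighting each one by the relative size of its subset.
function chunked_second_moment(X, chunks)
    N = size(X, 1)
    C = zeros(size(X, 2), size(X, 2))
    for r in chunks
        C .+= (length(r) / N) .* second_moment(X[r, :])
    end
    return C
end

X = randn(1000, 5)
X .-= mean(X, dims=1)        # center with the global mean first
chunks = [1:300, 301:1000]   # deliberately unequal subset sizes

C_full = second_moment(X)
C_chunked = chunked_second_moment(X, chunks)

# Both should agree up to floating-point round-off,
# and match the uncorrected covariance of the centered data.
C_full ≈ C_chunked
C_full ≈ cov(X, corrected=false)
```

The loop body only reads its own subset, so each iteration could be handed to a separate thread or worker and the weighted matrices summed afterwards.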

Datseris · 2018-09-21T09:11:41Z

OK, if this ever happens, there will be a test that the result coincides with the non-parallelized version, juuuuuust to be sure!
