pca sklearn failure in latest development version

Brent;
We've been trying to use the latest development in bcbio (to get the initial hg38 support) and running into a failure during pca. Is there a minimum number of input samples needed to run pca, or am I doing something else wrong?
```
2018-05-30 23:07:51 r3467 peddy.pca[22570] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 1)) in 0.8 seconds
Traceback (most recent call last):
  File "/g/data3/gx8/local/development/bcbio/anaconda/bin/peddy", line 11, in <module>
    load_entry_point('peddy==0.4.0', 'console_scripts', 'peddy')()
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/peddy/cli.py", line 204, in peddy
    in ("ped_check", "het_check", "sex_check")]):
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/peddy/cli.py", line 43, in run
    prefix=prefix, **kwargs)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/peddy/peddy.py", line 871, in het_check
    pca_df, background_pca_df = pca(pca_plot, sitesfile, gt_types, sites)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/peddy/pca.py", line 61, in pca
    clf.fit(genos1kg, background_target)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/pipeline.py", line 213, in _fit
    **fit_params_steps[name])
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 348, in fit_transform
    U, S, V = self._fit(X)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 394, in _fit
    return self._fit_truncated(X, n_components, svd_solver)
  File "/g/data/gx8/local/development/bcbio/anaconda/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 468, in _fit_truncated
    % (n_components, n_features, svd_solver))
ValueError: n_components=4 must be between 1 and n_features=1 with svd_solver='randomized'
```
Ideally we'd love to be able to get ancestry prediction for single sample inputs. Thanks for any pointers/suggestions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

pca sklearn failure in latest development version #45

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Participants

pca sklearn failure in latest development version #45

Description

Activity

brentp commented on May 30, 2018

brentp commented on May 30, 2018

brentp commented on May 30, 2018

brentp commented on May 30, 2018

brentp commented on May 30, 2018

chapmanb commented on May 31, 2018

brentp commented on May 31, 2018

chapmanb commented on May 31, 2018

brentp commented on May 31, 2018

chapmanb commented on May 31, 2018

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Participants

Issue actions