Model does not initialize reliably #7

Open
@scott-uses-git

Hello again,

I've been having trouble getting my model to initialize reliably. The problem arises when the initial f-est is "nan": the model gets completely stuck and won't make any progress.

Initial f-est:           nan, fit: -3.669e-01, tensor norm:  8.726e+02
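Here is one plausible way an initial f-est of nan can arise, sketched in Python under stated assumptions. The GCP Poisson loss is usually written f(x, m) = m - x*log(m + eps), and the reported f-est is estimated from a stratified sample of nonzero and zero entries, with each stratum reweighted by its size. If even one sampled model value overflows to inf, that sample evaluates to inf - inf = nan, and the whole estimate becomes nan no matter how many other samples are healthy. The eps value, the sample values, and the exact reweighting are assumptions for illustration, not a description of GenTen's internals.

```python
import math

def poisson_loss(x, m, eps=1e-10):
    """Elementwise GCP Poisson loss f(x, m) = m - x * log(m + eps).

    eps is an illustrative safeguard value (an assumption, not
    necessarily GenTen's default).
    """
    arg = m + eps
    if arg > 0:
        log_term = math.log(arg)      # log(inf) is inf
    elif arg == 0:
        log_term = -math.inf          # IEEE-754: log(0) = -inf
    else:
        log_term = math.nan           # IEEE-754: log(negative) = NaN (math.log would raise)
    return m - x * log_term

def stratified_estimate(nz_samples, z_samples, nnz, nzeros, eps=1e-10):
    """Stratified estimate of the total loss over all tensor entries.

    nz_samples: (x, m) pairs drawn from the nonzero entries.
    z_samples:  model values m drawn from the zero entries (x = 0).
    Each stratum's sample sum is reweighted by (stratum size / sample count).
    """
    nz_sum = sum(poisson_loss(x, m, eps) for x, m in nz_samples)
    z_sum = sum(poisson_loss(0.0, m, eps) for m in z_samples)
    return (nnz / len(nz_samples)) * nz_sum + (nzeros / len(z_samples)) * z_sum

# Stratum sizes taken from the tensor summary in this issue.
NNZ, NZEROS = 761452, 178527228

healthy = stratified_estimate([(1.0, 0.5), (3.0, 2.0)], [0.1, 0.2], NNZ, NZEROS)
print(healthy)  # a finite number

# A single overflowed model value (1e200 * 1e200 == inf) poisons the estimate:
# inf - x * log(inf) = inf - inf = NaN.
overflowed = stratified_estimate([(1.0, 1e200 * 1e200), (3.0, 2.0)], [0.1, 0.2], NNZ, NZEROS)
print(overflowed)  # nan
```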

The only workaround I have found is to restart the model over and over until it finds a good random starting point. Even then, the initial f-est is extremely large (on the order of 1e+199), but the algorithm is able to converge quickly to a more reasonable parameter space.

Initial f-est: 1.045070e+199, fit: -3.669e-01, tensor norm:  8.726e+02
Epoch   1: f-est =  5.316580e+06, fit =  1.938e-02, step =  1.0e-03, time = 1.24e+01 sec
Epoch   2: f-est =  4.930706e+06, fit =  2.244e-02, step =  1.0e-03, time = 2.50e+01 sec
Epoch   3: f-est =  4.551186e+06, fit =  2.439e-02, step =  1.0e-03, time = 3.80e+01 sec
Epoch   4: f-est =  4.210776e+06, fit =  2.598e-02, step =  1.0e-03, time = 5.22e+01 sec
Epoch   5: f-est =  3.899706e+06, fit =  2.746e-02, step =  1.0e-03, time = 6.45e+01 sec
Epoch   6: f-est =  3.723946e+06, fit =  2.853e-02, step =  1.0e-03, time = 7.74e+01 sec
Epoch   7: f-est =  3.554303e+06, fit =  2.937e-02, step =  1.0e-03, time = 9.13e+01 sec
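The restart workaround described above can be automated. A minimal sketch, assuming you can wrap one run in a function that takes a random seed and returns its initial f-est (`toy_initial_f_est` below is a hypothetical stand-in, not a GenTen API): try seeds in order and keep the first one whose initial f-est is finite and, optionally, below a threshold `max_initial`, which would also reject starts like the 1.045070e+199 one shown in the log.

```python
import math
import random
from typing import Callable, Iterable, Optional, Tuple

def first_good_start(objective: Callable[[int], float],
                     seeds: Iterable[int],
                     max_initial: float = math.inf) -> Optional[Tuple[int, float]]:
    """Return (seed, initial objective) for the first seed whose initial
    objective is finite and no larger than max_initial, or None if none is."""
    for seed in seeds:
        f0 = objective(seed)
        if math.isfinite(f0) and f0 <= max_initial:
            return seed, f0
    return None

# Hypothetical stand-in: in practice this would launch a GenTen run with the
# given random seed and return its reported initial f-est.
def toy_initial_f_est(seed: int) -> float:
    rng = random.Random(seed)
    return math.nan if rng.random() < 0.5 else 10 ** rng.uniform(5, 200)

print(first_good_start(toy_initial_f_est, range(100), max_initial=1e100))
```

Returning `None` rather than raising keeps the caller in control of what to do when every seed fails (for example, widening the seed range or changing other settings).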

Can you recommend any setting adjustments I can make to get a better model initialization?
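One commonly used adjustment, offered here only as a hedged suggestion, is to rescale a random nonnegative initial guess so that the CP model's Frobenius norm matches the data norm (8.7e+02 in the summary below), instead of letting random factor magnitudes set the scale. The sketch assumes factors stored as plain Python lists of rows and unit component weights; the function names are hypothetical, and it does not assume that any particular GenTen option accepts a user-supplied initial guess. The model norm is computed from the Hadamard product of the factor Gram matrices, so the dense tensor never has to be formed.

```python
import math
import random

def gram(A):
    """A^T A for a factor matrix stored as a list of I rows of length R."""
    R = len(A[0])
    return [[sum(row[r] * row[s] for row in A) for s in range(R)] for r in range(R)]

def ktensor_norm(factors):
    """Frobenius norm of the CP model sum_r a_r o b_r o c_r ... (unit weights),
    via the Hadamard product of the factor Gram matrices."""
    R = len(factors[0][0])
    had = [[1.0] * R for _ in range(R)]
    for A in factors:
        G = gram(A)
        had = [[had[r][s] * G[r][s] for s in range(R)] for r in range(R)]
    return math.sqrt(sum(sum(row) for row in had))

def scaled_random_init(shape, rank, target_norm, seed=0):
    """Nonnegative uniform random factors, rescaled so the model's Frobenius
    norm equals target_norm (e.g. the data norm reported in the log)."""
    rng = random.Random(seed)
    factors = [[[rng.random() for _ in range(rank)] for _ in range(n)] for n in shape]
    # Scaling each of the N factors by c**(1/N) scales the whole model by c.
    c = target_norm / ktensor_norm(factors)
    per_mode = c ** (1.0 / len(shape))
    return [[[v * per_mode for v in row] for row in A] for A in factors]

def dense_norm3(A, B, C):
    """Brute-force check for small 3-way models: build every entry explicitly."""
    R = len(A[0])
    total = 0.0
    for a in A:
        for b in B:
            for c in C:
                m = sum(a[r] * b[r] * c[r] for r in range(R))
                total += m * m
    return math.sqrt(total)
```

The `dense_norm3` helper is only there to check the Gram-matrix formula on a tiny example; for a shape like 8473 x 92 x 230, only `ktensor_norm` is practical.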

Here is some more information about my data and model settings. Unfortunately, my data is proprietary, so I am not able to share it.

Sparse tensor: 
  8473 x 92 x 230 (1.79289e+08 total entries)
  761452 (0.4%) Nonzeros and 178527228 (99.6%) Zeros
  8.7e+02 Frobenius norm
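The tensor summary figures are internally consistent; a quick check in Python using only the dimensions and nonzero count reported above:

```python
import math

dims = (8473, 92, 230)
nnz = 761452

total = math.prod(dims)           # number of entries in the dense tensor
zeros = total - nnz
pct_nonzero = 100.0 * nnz / total
pct_zero = 100.0 * zeros / total

print(f"{total:.5e} total entries")   # 1.79289e+08 total entries
print(f"{nnz} ({pct_nonzero:.1f}%) Nonzeros and {zeros} ({pct_zero:.1f}%) Zeros")
# 761452 (0.4%) Nonzeros and 178527228 (99.6%) Zeros
```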

Execution environment:
  MPI grid: 1 x 1 x 1 processes (1 total)
  Execution space: serial

GCP-SGD (Generalized CP Tensor Decomposition):
Generalized function type: Poisson (count)
Optimization method: adam
Max iterations (epochs): 100
Iterations per epoch: 500
Traditional annealer, learning rate: 1.0e-03, decay: 1.0e-01
  Function sampler:  stratified with 100000 nonzero and 100000 zero samples
  Gradient sampler:  stratified with 22844 nonzero and 22844 zero samples
  Gradient nonzero samples per epoch: 11422000 (1500.0%)
Gradient method: single MTTKRP
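The sampler numbers in these settings follow from simple arithmetic: 22844 gradient nonzero samples per iteration times 500 iterations is 11,422,000 per epoch, about 15 times the 761,452 nonzeros (hence 1500.0%), while the f-est is computed from a sample of 100,000 nonzeros, roughly 13% of them. A small Python check, with variable names chosen here for illustration:

```python
nnz = 761452                 # nonzeros in the tensor
iters_per_epoch = 500
grad_nz_per_iter = 22844     # gradient sampler: nonzero samples per iteration
func_nz = 100000             # function sampler: nonzero samples used for f-est

grad_nz_per_epoch = grad_nz_per_iter * iters_per_epoch
print(grad_nz_per_epoch)                                   # 11422000
print(f"{100 * grad_nz_per_epoch / nnz:.1f}%")             # 1500.0%
print(f"{100 * func_nz / nnz:.1f}% of nonzeros in f-est")  # 13.1% of nonzeros in f-est
```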

I would greatly appreciate any help or advice!

Thanks,
Scott
