Model does not initialize reliably

Hello again,

I've been having trouble getting my model to initialize reliably. The problem arises when the initial f-est is "nan": the model gets completely stuck and won't make any progress.
```
Initial f-est:           nan, fit: -3.669e-01, tensor norm:  8.726e+02
```

The only workaround I have found is to restart the model over and over until it finds a good random starting point. Even then, the initial f-est is extremely large (on the order of 1e+199), but the algorithm is able to quickly converge to a more reasonable region of parameter space.

```
Initial f-est: 1.045070e+199, fit: -3.669e-01, tensor norm:  8.726e+02
Epoch   1: f-est =  5.316580e+06, fit =  1.938e-02, step =  1.0e-03, time = 1.24e+01 sec
Epoch   2: f-est =  4.930706e+06, fit =  2.244e-02, step =  1.0e-03, time = 2.50e+01 sec
Epoch   3: f-est =  4.551186e+06, fit =  2.439e-02, step =  1.0e-03, time = 3.80e+01 sec
Epoch   4: f-est =  4.210776e+06, fit =  2.598e-02, step =  1.0e-03, time = 5.22e+01 sec
Epoch   5: f-est =  3.899706e+06, fit =  2.746e-02, step =  1.0e-03, time = 6.45e+01 sec
Epoch   6: f-est =  3.723946e+06, fit =  2.853e-02, step =  1.0e-03, time = 7.74e+01 sec
Epoch   7: f-est =  3.554303e+06, fit =  2.937e-02, step =  1.0e-03, time = 9.13e+01 sec
```
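For reference, the restart workaround amounts to roughly the loop below. This is only a sketch: `fit_once` is a hypothetical stand-in for whatever call launches a single GCP-SGD run with a given seed and returns its initial f-est. It is not part of any library API, and the seed handling is an assumption.

```python
import math
import random


def fit_with_restarts(fit_once, max_restarts=20, seed=0):
    """Retry with fresh random seeds until the initial f-est is finite.

    `fit_once(run_seed)` is a hypothetical callable that starts one run
    and returns the initial objective estimate as a float.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_restarts + 1):
        run_seed = rng.randrange(2**31)
        f_est = fit_once(run_seed)
        # nan and +/-inf both count as a failed initialization
        if math.isfinite(f_est):
            return run_seed, f_est, attempt
    raise RuntimeError(f"no finite initial f-est after {max_restarts} restarts")
```

Note that a value such as 1e+199 still passes the finiteness check, so this loop only filters out the "nan" starts; it does nothing about the very large but finite ones.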

Can you recommend any settings I could adjust to get a better model initialization?

Here is some more info on my data and model settings. Unfortunately, my data is proprietary and I am not able to share it.
```
Sparse tensor: 
  8473 x 92 x 230 (1.79289e+08 total entries)
  761452 (0.4%) Nonzeros and 178527228 (99.6%) Zeros
  8.7e+02 Frobenius norm

Execution environment:
  MPI grid: 1 x 1 x 1 processes (1 total)
  Execution space: serial

GCP-SGD (Generalized CP Tensor Decomposition):
Generalized function type: Poisson (count)
Optimization method: adam
Max iterations (epochs): 100
Iterations per epoch: 500
Traditional annealer, learning rate: 1.0e-03, decay: 1.0e-01
  Function sampler:  stratified with 100000 nonzero and 100000 zero samples
  Gradient sampler:  stratified with 22844 nonzero and 22844 zero samples
  Gradient nonzero samples per epoch: 11422000 (1500.0%)
Gradient method: single MTTKRP
```
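As a quick sanity check, the figures in the log above are internally consistent. The sketch below recomputes them from the reported dimensions and sampler settings. The variable names are illustrative only and do not correspond to any solver option.

```python
# Values copied from the log output above
dims = (8473, 92, 230)
nonzeros = 761_452
grad_nonzero_samples = 22_844   # gradient sampler, nonzeros per iteration
iters_per_epoch = 500

total_entries = dims[0] * dims[1] * dims[2]
zeros = total_entries - nonzeros
density_pct = 100 * nonzeros / total_entries

# Nonzero gradient samples drawn over one epoch, and as a share of all nonzeros
grad_samples_per_epoch = grad_nonzero_samples * iters_per_epoch
grad_samples_pct = 100 * grad_samples_per_epoch / nonzeros

print(f"total entries: {total_entries}")
print(f"zeros: {zeros}")
print(f"density: {density_pct:.1f}%")
print(f"gradient nonzero samples per epoch: {grad_samples_per_epoch} ({grad_samples_pct:.1f}%)")
```

So each epoch draws about 15 times as many nonzero gradient samples as there are nonzeros in the tensor.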

I would greatly appreciate any help or advice!

Thanks,
Scott
