Background images for training #4

Open
twehrbein opened this issue Aug 17, 2023 · 4 comments
Comments

@twehrbein

Hi! Thanks for releasing the code!
I'm trying to reproduce the training and thus need to gather all the training and validation backgrounds. Following your description, I used the LSUN repo to download and extract the backgrounds. However, I'm now struggling with 1) selecting the "correct" background images and 2) converting them to the right format and moving them to the right location.
The provided script data/copy_lsun_images_to_train_files_dir.py doesn't work for me, I guess because the directory structure after extracting the images isn't what the script expects. E.g. "bedroom_train_lmdb" extracts images to paths like ./f/8/8/1/2/2/*.webp, which isn't compatible with your script. Your script also only looks for .jpg files. Furthermore, I don't know how to select the mentioned 397582 training backgrounds, since "bedroom_train_lmdb" alone has over 3 million images. Would be grateful for any help!

@akashsengupta1997
Owner

Hey!

That's odd, IIRC the script used to work with the dataset as extracted. I will take a look this weekend and get back to you.

@twehrbein
Author

Hey, any update?

@Fly-Pluche

Hello, may I ask if there is any progress?

@noahcao

noahcao commented Apr 10, 2024

Hey, is there any update?

One approach that may work: change the function for exporting images in data.py as follows [see issue], so that it decodes each image and writes it as a .jpg:

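import os
from os.path import join

import cv2
import lmdb
import numpy


def export_images(db_path, out_dir, flat=True, limit=-1):
    print('Exporting', db_path, 'to', out_dir)
    env = lmdb.open(db_path, map_size=1099511627776,
                    max_readers=100, readonly=True)
    count = 0
    with env.begin(write=False) as txn:
        cursor = txn.cursor()
        for key, val in cursor:
            if not flat:
                # Nested layout: one directory per character of the key prefix
                image_out_dir = join(out_dir, '/'.join(key[:6].decode()))
            else:
                # Flat layout: every image goes directly into out_dir
                image_out_dir = out_dir
            os.makedirs(image_out_dir, exist_ok=True)
            print('Current key:', key)
            image_out_path = join(image_out_dir, key.decode() + '.jpg')
            # Decode the stored image bytes and re-encode them as JPEG.
            # numpy.fromstring is deprecated for binary data, so use frombuffer.
            img = cv2.imdecode(
                numpy.frombuffer(val, dtype=numpy.uint8), 1)
            if img is None:
                # Skip entries that OpenCV cannot decode instead of crashing
                print('Could not decode image for key:', key)
                continue
            cv2.imwrite(image_out_path, img)
            count += 1
            if count == limit:
                break
            if count % 1000 == 0:
                print('Finished', count, 'images')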

Then extract the images with the --flat flag, so that all images end up directly in the output directory:

python3 data.py export *_val_lmdb --out_dir val --flat
python3 data.py export *_train_lmdb --out_dir train --flat
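
To make the difference between the two layouts concrete, here is a minimal standard-library sketch of how an LMDB key maps to an output path in the nested and the flat case, plus a small helper to count the exported files. The names `export_path` and `count_images` and the example key are hypothetical and only for illustration; they are not part of the LSUN repo or of this project. The sketch assumes the keys are plain ASCII strings, as the `key.decode()` call in the function above suggests.

```python
from pathlib import Path


def export_path(key: bytes, out_dir: str, flat: bool = True,
                ext: str = ".jpg") -> Path:
    """Return the path an image with this key would be written to."""
    name = key.decode()
    if flat:
        # Flat layout: <out_dir>/<key><ext>
        return Path(out_dir) / (name + ext)
    # Nested layout: one directory level per character of the
    # first six key characters, e.g. f/8/8/1/2/2/<key><ext>
    return Path(out_dir).joinpath(*name[:6]) / (name + ext)


def count_images(root: str, ext: str = ".jpg", recursive: bool = False) -> int:
    """Count files with the given extension in root (optionally below it)."""
    pattern = f"**/*{ext}" if recursive else f"*{ext}"
    return sum(1 for p in Path(root).glob(pattern) if p.is_file())


if __name__ == "__main__":
    key = b"f88122aabbccddeeff"  # made-up key, for illustration only
    print(export_path(key, "train", flat=False))
    print(export_path(key, "train", flat=True))
```

After a flat export, something like `count_images("train")` could be used to check how many .jpg files were actually written and to compare that number against the 397582 training backgrounds mentioned above. This only counts files; it does not answer which subset of images was originally selected.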
