
Commit 9fe1276

luke-han and rom1504 authored
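Fix parquet-to-arrow script failing when the number of samples is small (#301)

key_format became negative for small datasets. When number_samples is at most about 10**8, number_samples / 10**10 is 0.01 or less, so int(math.log10(...)) is -2 or lower and key_format = int(...) + 1 is negative.

Co-authored-by: Romain Beaumont <[email protected]>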
Parent: c4e6615
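To see why key_format goes negative, the pre-fix expression can be evaluated for a few dataset sizes. This is a minimal sketch: `old_key_format` is a hypothetical helper that wraps the original line, and the sample counts are illustrative values, not data from the commit.

```python
import math


def old_key_format(number_samples: int) -> int:
    """Pre-fix expression from parquet_to_arrow.py, wrapped in a hypothetical helper."""
    # number_samples / 10**10 is below 1 for any dataset smaller than 10**10 rows,
    # so log10 is negative; int() truncates toward zero.
    return int(math.log10(number_samples / 10**10)) + 1


# Illustrative dataset sizes (not taken from the commit).
for n in (3_000, 2_000_000, 500_000_000, 5_000_000_000, 3 * 10**11):
    print(f"{n:>17,d} samples -> key_format = {old_key_format(n)}")
```

For 3,000 samples the expression gives -5 and for 2,000,000 it gives -2; it only reaches 1 once the dataset exceeds about 10**9 samples.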


1 file changed (+1, -1 lines)


clip_retrieval/clip_back_prepro/parquet_to_arrow.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,7 +36,7 @@ def parquet_to_arrow(parquet_folder, output_arrow_folder, columns_to_return):
     sink = None
     current_batch_count = 0
     batch_counter = 0
-    key_format = int(math.log10(number_samples / 10**10)) + 1
+    key_format = max(0, int(math.log10(number_samples / 10**10))) + 1
     for parquet_files in tqdm(files):
         if sink is None or current_batch_count > 10**10:
             if sink is not None:
```
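The patched line clamps the logarithm term at 0 before adding 1, so key_format is always at least 1. The sketch below assumes key_format is used as the zero-padding width of each output file's batch key; that use is an assumption, since it is not visible in this hunk. `new_key_format` and `file_key` are hypothetical helpers, not functions from clip_retrieval.

```python
import math


def new_key_format(number_samples: int) -> int:
    """Patched expression: the log term is clamped at 0, so the result is >= 1."""
    return max(0, int(math.log10(number_samples / 10**10))) + 1


def file_key(batch_counter: int, key_format: int) -> str:
    """Hypothetical: zero-pad a batch index to key_format digits.

    With a negative width the format spec becomes e.g. "0-5d", which is
    not a valid format specifier, so formatting raises ValueError.
    """
    return f"{batch_counter:0{key_format}d}"


for n in (3_000, 5_000_000_000, 3 * 10**11):
    width = new_key_format(n)
    print(f"{n:,d} samples -> key_format={width}, first key={file_key(0, width)!r}")
```

Reading the hunk, number_samples / 10**10 looks like an estimate of how many batches of 10**10 rows the loop will open (the same 10**10 appears in the `current_batch_count > 10**10` check), so key_format is presumably the digit count for the batch index. Clamping at 0 keeps at least one digit when there is a single batch, rather than producing an invalid negative width.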
