MaskBit Rebuttal Visualizations

We provide qualitative results for the following experiments:

1. COCO 0-shot bit flipping.

2. Nearest Neighbor Analysis with LPIPS and Hamming distance.

3. VQGAN+ with corrupted latent tokens.

1. COCO 0-shot bit flipping

In each figure, the top-left image is the original and the bottom-left image is its reconstruction. Each remaining image is obtained by flipping the i-th bit of every bit token and decoding the modified tokens into an image.
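The flipping step can be sketched as follows. This is a minimal illustration, not the code used for the figures: it assumes the bit tokens are stored as a {0, 1} array of shape (num_tokens, num_bits), and the function name `flip_bit` and the sizes 256 and 12 are hypothetical. Decoding the flipped tokens back into an image requires the trained decoder and is outside this sketch.

```python
import numpy as np


def flip_bit(bit_tokens: np.ndarray, i: int) -> np.ndarray:
    """Return a copy of bit_tokens with the i-th bit of every token inverted.

    bit_tokens: integer array of shape (num_tokens, num_bits), entries in {0, 1}
    (an assumed storage layout, chosen for illustration).
    """
    num_bits = bit_tokens.shape[-1]
    if not 0 <= i < num_bits:
        raise IndexError(f"bit index {i} out of range for {num_bits} bits")
    flipped = bit_tokens.copy()
    flipped[..., i] ^= 1  # XOR with 1 inverts a {0, 1} bit
    return flipped


# Hypothetical example: 256 tokens with 12 bits each.
rng = np.random.default_rng(0)
tokens = rng.integers(0, 2, size=(256, 12), dtype=np.uint8)

# One variant per bit position; each would be decoded into one image.
variants = [flip_bit(tokens, i) for i in range(tokens.shape[-1])]
```

Each entry of `variants` differs from `tokens` in exactly one bit position across all tokens, which corresponds to one panel in the figure.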

2. Nearest Neighbor Analysis with LPIPS and Hamming distance

The left image is the generated image. The 10 samples at the top are its nearest neighbors in the training set under LPIPS distance, and the 10 samples at the bottom are its nearest neighbors in the training set under Hamming distance.
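The Hamming-distance search can be sketched as follows. This is a minimal illustration under stated assumptions: each image is represented by a flattened {0, 1} bit vector, the training set fits in memory as one array, and the name `hamming_nearest_neighbors` and the toy sizes are hypothetical. The LPIPS search needs a pretrained perceptual network and is not covered here.

```python
import numpy as np


def hamming_nearest_neighbors(query: np.ndarray, train: np.ndarray, k: int = 10):
    """Return indices and distances of the k training vectors closest to query.

    query: array of shape (D,) with entries in {0, 1}
    train: array of shape (N, D) with entries in {0, 1}
    """
    # Hamming distance = number of positions where the bits differ.
    distances = np.count_nonzero(train != query, axis=1)
    # Stable sort keeps the lower index first when distances tie.
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]


# Toy data: 100 training vectors of 3072 bits each (hypothetical sizes).
rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(100, 3072))

# Build a query that is training sample 3 with its first 5 bits flipped.
query = train[3].copy()
query[:5] ^= 1

neighbor_ids, neighbor_dists = hamming_nearest_neighbors(query, train, k=10)
```

In this toy setup the closest neighbor is training sample 3 at distance 5, while unrelated random vectors sit near half the vector length.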





3. VQGAN+ with 256 corruptions in latent space on ImageNet

We provide VQGAN+ corruption visualizations below. The effect of corruption is similar across all other images, so we show three randomly selected examples.
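One way to corrupt latent tokens can be sketched as follows. The exact corruption procedure is not described on this page, so everything here is an assumption for illustration: tokens are integer codebook indices on a 2D grid, a corruption replaces the index at a randomly chosen position with a different random index, and the name `corrupt_tokens`, the grid size, the codebook size, and the number of corruptions are all hypothetical.

```python
import numpy as np


def corrupt_tokens(token_ids, num_corruptions, codebook_size, rng):
    """Replace num_corruptions randomly chosen tokens with different indices.

    token_ids: integer array of codebook indices in [0, codebook_size)
    """
    if codebook_size < 2:
        raise ValueError("need at least two codebook entries to change a token")
    flat = token_ids.reshape(-1).copy()
    if num_corruptions > flat.size:
        raise ValueError("cannot corrupt more tokens than exist")
    # Choose distinct positions to corrupt.
    positions = rng.choice(flat.size, size=num_corruptions, replace=False)
    # A nonzero offset modulo codebook_size guarantees a changed index.
    offsets = rng.integers(1, codebook_size, size=num_corruptions)
    flat[positions] = (flat[positions] + offsets) % codebook_size
    return flat.reshape(token_ids.shape)


# Hypothetical setup: 16x16 token grid, 1024 codebook entries, 32 corruptions.
rng = np.random.default_rng(0)
token_ids = rng.integers(0, 1024, size=(16, 16))
corrupted = corrupt_tokens(token_ids, num_corruptions=32, codebook_size=1024, rng=rng)
```

The corrupted grid would then be passed through the tokenizer's decoder to produce an image like those shown below.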