IM2LATEX-100K

Processed files (ready for training and testing)

im2latex_formulas.norm.lst (normalized formulas, 103559 lines)
im2latex_formulas.tok.lst (tokenized formulas, 103559 lines)
formula_images_processed.tar.gz (processed images, cropped, downsampled and padded to facilitate batching)
im2latex_train_filter.lst (filtered training set)
im2latex_validate_filter.lst (filtered validation set)
im2latex_test_filter.lst (test set; not actually filtered, only converted to the same format)
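
A quick sanity check after downloading is to confirm that the formula files have the line counts listed above. The following is a minimal sketch, not part of the dataset tooling: the helper names and the `data_dir` argument are hypothetical, and it assumes the files sit in the current directory. Files are read in binary mode so that lines are split only on "\n" and unusual bytes in a formula cannot raise a decoding error.

```python
from pathlib import Path

EXPECTED_LINES = 103559  # line count stated in this README

def count_lines(path):
    """Count newline-delimited lines in a file, reading raw bytes."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)

def check_formula_files(data_dir="."):
    """Return {file name: line count, or None if the file is missing}.

    data_dir is a hypothetical location; adjust it to wherever
    the files were downloaded.
    """
    names = ["im2latex_formulas.norm.lst", "im2latex_formulas.tok.lst"]
    results = {}
    for name in names:
        path = Path(data_dir) / name
        results[name] = count_lines(path) if path.exists() else None
    return results

if __name__ == "__main__":
    for name, n in check_formula_files().items():
        status = "missing" if n is None else ("ok" if n == EXPECTED_LINES else "unexpected")
        print(name, n, status)
```

A count that differs from 103559 most likely points to an incomplete download or to a tool that rewrote line endings.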

Raw files

im2latex_formulas.lst (untokenized formulas, 103559 lines)
im2latex_train.lst (training set)
im2latex_validate.lst (validation set)
im2latex_test.lst (test set)
formula_images.tar.gz (whole images)
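
The internal layout of the image archives is not described here, so it can help to look at a few member names before extracting anything. A minimal sketch using Python's standard `tarfile` module follows; the function name `peek_archive` and the `limit` parameter are hypothetical, and the archive is assumed to be in the current directory.

```python
import tarfile
from itertools import islice

def peek_archive(path, limit=5):
    """Return up to `limit` member names from a gzip-compressed tar
    archive without extracting it."""
    with tarfile.open(path, "r:gz") as tar:
        # Iterating a TarFile yields TarInfo objects in archive order.
        return [member.name for member in islice(tar, limit)]

if __name__ == "__main__":
    # Assumes formula_images.tar.gz has been downloaded to this directory.
    for name in peek_archive("formula_images.tar.gz"):
        print(name)
```

The same call should also work on the .tgz archive listed below, since .tgz is a gzip-compressed tar file as well.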

IM2LATEX-100K-HANDWRITTEN

IM2LATEX-100K-HANDWRITTEN.tgz (processed images, unprocessed formulas, training, validation and test set)