This article describes my understanding of how to annotate your own dataset when working with https://github.com/microsoft/table-transformer.

Table Detection (PubTables-1M only)

Here are some example annotated images. The images sent to training are the same full document pages. The output is one or more tightly annotated bounding boxes (bboxes) around the tables on the page.

PMC4670118_6_ANNOTATIONS.jpg

PMC6385404_3_ANNOTATIONS.jpg

PMC3289074_3_ANNOTATIONS.jpg

Comments for annotating

Tables can be wrapped very tightly around the content, with a padding of 1 pixel or less around the words. The drawn table structure itself is not considered in the wrapping: even if the table is clearly a ruled grid, as in Excel, the annotation still wraps the content inside that grid. Additionally, the box wraps only the table itself, excluding table titles, captions and other content.
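As a minimal sketch of the tight-wrapping rule above, the snippet below derives a table box from the boxes of the words that belong to the table. It assumes you already have per-word boxes in pixel coordinates (for example from OCR or PDF text extraction); the function name `tight_table_bbox` and the `(xmin, ymin, xmax, ymax)` tuple layout are my own choices for illustration, not part of table-transformer.

```python
from typing import Iterable, Tuple

# (xmin, ymin, xmax, ymax) in pixels; this layout is an assumption of the sketch
Box = Tuple[float, float, float, float]


def tight_table_bbox(word_boxes: Iterable[Box], pad: float = 1.0) -> Box:
    """Return the smallest box that encloses all word boxes, grown by `pad` pixels.

    Only the words inside the table should be passed in. Titles, captions and
    ruling lines are deliberately left out, matching the annotation style
    described above.
    """
    boxes = list(word_boxes)
    if not boxes:
        raise ValueError("a table needs at least one word box")
    xmin = min(b[0] for b in boxes) - pad
    ymin = min(b[1] for b in boxes) - pad
    xmax = max(b[2] for b in boxes) + pad
    ymax = max(b[3] for b in boxes) + pad
    return (xmin, ymin, xmax, ymax)


# Two words of a hypothetical table
words = [(10, 20, 50, 30), (60, 25, 120, 40)]
table_box = tight_table_bbox(words, pad=1)
```

The padding defaults to 1 pixel because that is the upper end of what the example annotations show; set it to 0 for the tightest fit.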

Table Structure Recognition (PubTables-1M + FinTabNet)

Annotations on top of images

These are made with a slightly modified view_annotations.py script. They are tightly cropped, so they do not show what the images look like before being sent to training; they only show what the annotations look like for a given image.

PubTables-1M


PMC3666898_table_2_ANNOTATIONS.jpg

PMC4944165_table_1_ANNOTATIONS.jpg

FinTabNet


MS_2013_page_265_table_1_ANNOTATIONS.jpg

VNO_2015_page_88_table_0_ANNOTATIONS.jpg


Images in training dataset

These are what the images look like before they are sent to TSR training. Each image has substantial "padding" around the table of interest. In some documents the table is isolated, so the area around the table is empty, but in more crowded examples there is text visible around the table.

Preparing your own TSR dataset for training requires a tight-fitting table label around each table. The table is then cropped out of the original document (the full PDF page) with a padding of 30 pixels.
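The cropping step can be sketched roughly as follows. This is not code from the table-transformer repository: the helper names `padded_crop_box` and `shift_box` are hypothetical, and clamping the crop to the page bounds is my own assumption about how to handle tables that sit close to the page edge.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def padded_crop_box(table_box: Box, page_size: Tuple[float, float], pad: float = 30) -> Box:
    """Grow the tight table box by `pad` pixels on every side.

    The result is clamped to the page (assumption of this sketch), so the
    padding can be smaller than `pad` near a page edge.
    """
    xmin, ymin, xmax, ymax = table_box
    width, height = page_size
    return (
        max(0, xmin - pad),
        max(0, ymin - pad),
        min(width, xmax + pad),
        min(height, ymax + pad),
    )


def shift_box(box: Box, crop_box: Box) -> Box:
    """Translate a page-coordinate box into the coordinates of the cropped image."""
    ox, oy = crop_box[0], crop_box[1]
    return (box[0] - ox, box[1] - oy, box[2] - ox, box[3] - oy)


# Hypothetical table on a 612 x 792 pixel page
table = (100, 200, 500, 400)
crop = padded_crop_box(table, page_size=(612, 792))
table_in_crop = shift_box(table, crop)
```

The crop tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to it directly if you use Pillow for the actual image cropping. Every annotation inside the table (rows, columns, cells) needs the same `shift_box` translation so that it lines up with the cropped image.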


PubTables-1M


Cropped table that happens to have nothing around it



PMC3251512_table_0.jpg


Cropped table with other tables, captions and other info of the document visible


FinTabNet


VMC_2011_page_41_table_0.jpg


VMC_2011_page_90_table_0.jpg


PEAK_2007_page_117_table_0.jpg


Comments for annotating

From these images we can infer the following: