This article describes my understanding of how one should annotate their own dataset when working with https://github.com/microsoft/table-transformer.
Here are some example annotated images. The images sent to training are exactly these - just the full page of the document. The labels are one or more tightly fitted bounding boxes around the tables within the page.
Tables are wrapped very tightly around their content, with one pixel or less of padding around the words. The table structure itself is not considered at all in the wrapping: even if the table is clearly a grid, as in Excel, the annotation is still wrapped around the content within that grid. Additionally, the wrapping covers only the table itself - it does not include table titles, captions, or other surrounding content.
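A tight table annotation like this can be expressed in the Pascal VOC-style XML that the table-transformer datasets use. The sketch below is my own illustration, not code from the repo; the helper name and the example file name and coordinates are made up for demonstration.

```python
import xml.etree.ElementTree as ET

def make_voc_annotation(filename, width, height, table_bboxes):
    """Build a minimal Pascal VOC-style XML annotation with one
    tightly fitted 'table' object per (xmin, ymin, xmax, ymax) bbox.
    Coordinates are in pixels of the full page image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for xmin, ymin, xmax, ymax in table_bboxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "table"
        bb = ET.SubElement(obj, "bndbox")
        ET.SubElement(bb, "xmin").text = str(xmin)
        ET.SubElement(bb, "ymin").text = str(ymin)
        ET.SubElement(bb, "xmax").text = str(xmax)
        ET.SubElement(bb, "ymax").text = str(ymax)
    return ET.tostring(root, encoding="unicode")

# One full-page image with a single tight table bbox (hypothetical values):
xml = make_voc_annotation("page_0001.jpg", 1700, 2200, [(120, 340, 1580, 1210)])
```

The key point mirrored here is that the bbox hugs the words inside the table, and nothing outside the table (titles, captions) is part of the object.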
These were made with a slightly modified view_annotations.py script. They are tightly cropped, so they do not represent what the images look like before being sent to training, but they do reflect what the annotations look like for a given image.
These are what the images look like before they are sent to TSR training. We can see that there is substantial "padding" around the table of interest in each image. In some documents the table is isolated and the area around it is empty, while in other, more crowded examples there is text around the table.
Preparing your own TSR dataset for training requires a tight-fitting table label around the table, which is then cropped out of the original document (the full PDF page) with a padding of 30 pixels.
Cropped table that happens to have nothing around it
Cropped table with other tables, captions, and other parts of the document visible
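The 30-pixel crop described above amounts to simple bbox arithmetic: expand the tight table label on every side, clamp to the page, and re-express the table coordinates relative to the crop. This is a minimal sketch under those assumptions; the function name is mine, not from the repo.

```python
def crop_box_with_padding(table_bbox, page_w, page_h, pad=30):
    """Expand a tight table bbox by `pad` pixels on every side,
    clamped to the page bounds, and return both the crop box and
    the table's coordinates relative to that crop."""
    xmin, ymin, xmax, ymax = table_bbox
    cx0 = max(0, xmin - pad)
    cy0 = max(0, ymin - pad)
    cx1 = min(page_w, xmax + pad)
    cy1 = min(page_h, ymax + pad)
    crop = (cx0, cy0, cx1, cy1)
    # Shift the table bbox into the crop's coordinate system.
    table_in_crop = (xmin - cx0, ymin - cy0, xmax - cx0, ymax - cy0)
    return crop, table_in_crop

# Hypothetical tight label on a 1700x2200 page:
crop, rel = crop_box_with_padding((120, 340, 1580, 1210), 1700, 2200)
# crop -> (90, 310, 1610, 1240); rel -> (30, 30, 1490, 900)
```

The resulting `crop` tuple can be passed directly to something like Pillow's `Image.crop`; the clamping explains why a table near the page edge ends up with less than 30 pixels of visible padding on that side.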
From these images we can infer the following things: