Table extraction #164

@kba

From OCR-D/ocrd_fileformat#46

@kba:

It would be very useful to have a transformation that extracts any tables from PAGE-XML to CSV.

@bertsky:

Thoughts:

  • each TableRegion needs its own CSV, so it's not immediately clear how this fits with the page→page converter paradigm
    (e.g. for page→text, one could simply paste the CSV in the middle of the plaintext, but maybe creating a multitude of output files is usually better)
  • CSV may already be too coarse (no multi-span, no header distinction)
  • perhaps better transfer to ocr-fileformat subrepo?
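
To make the first two points concrete, here is a minimal sketch of a per-region extraction in Python. It is not an existing OCR-D processor or ocr-fileformat transformation. It assumes that table cells are encoded as TextRegion children of the TableRegion, each carrying a Roles/TableCellRole element with rowIndex and columnIndex attributes (as in the 2019 PAGE schema), and that the cell text sits in the region-level TextEquiv/Unicode. The function name tables_to_csv is hypothetical. It needs Python 3.8+ for the `{*}` namespace wildcard.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tables_to_csv(page_xml):
    """Return {TableRegion id: CSV text}, one entry per TableRegion.

    Assumption: cells are direct TextRegion children with a
    Roles/TableCellRole element; regions without one are skipped.
    """
    root = ET.fromstring(page_xml)
    result = {}
    # "{*}" matches any namespace, so the PAGE schema version does not matter.
    for n, table in enumerate(root.findall(".//{*}TableRegion")):
        cells = {}
        for region in table.findall("{*}TextRegion"):
            role = region.find("{*}Roles/{*}TableCellRole")
            if role is None:
                continue
            row = int(role.get("rowIndex"))
            col = int(role.get("columnIndex"))
            # Region-level text only; line-level TextEquiv is ignored here.
            unicode_el = region.find("{*}TextEquiv/{*}Unicode")
            text = unicode_el.text if unicode_el is not None else None
            cells[(row, col)] = text or ""
        n_rows = max((r for r, _ in cells), default=-1) + 1
        n_cols = max((c for _, c in cells), default=-1) + 1
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for r in range(n_rows):
            writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
        # Fallback key for regions without an id (hypothetical naming).
        result[table.get("id", f"table{n}")] = buf.getvalue()
    return result
```

Each returned value could be written to its own output file, which is the "multitude of output files" option. The rowSpan, colSpan and header attributes of TableCellRole are deliberately dropped here, which illustrates the concern that CSV is too coarse: a spanning cell simply appears once at its top-left position and the other covered positions stay empty.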
