Scandata parsing

internetarchivepdf.scandata.scandata_xml_get_toc(xml_file)

Returns the table of contents from a parsed scandata.xml.

Args:

  • scandata: Parsed scandata as returned by scandata_parse

Returns:

  • List of dicts describing the table of contents, one dict per entry: [{'title': 'The beginning', 'level': 1, 'label': None, 'leaf': 2}, …] (list of dict)

May raise KeyError if the scandata is invalid.
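
The returned list can be turned into an indented outline. The sketch below is only illustrative: format_toc is a hypothetical helper that is not part of internetarchivepdf, and the second sample entry is made up. It assumes that 'level' starts at 1 and that 'label', when present, is a printable page label.

```python
def format_toc(toc):
    """Render TOC entries as indented text lines.

    Hypothetical helper for illustration; not part of internetarchivepdf.
    """
    lines = []
    for entry in toc:
        # Assumption: level 1 is the top level, so it gets no indentation.
        indent = "  " * (entry["level"] - 1)
        # Fall back to the leaf index when no page label is available.
        if entry["label"] is not None:
            page = str(entry["label"])
        else:
            page = "leaf %d" % entry["leaf"]
        lines.append("%s%s (%s)" % (indent, entry["title"], page))
    return lines


# Sample data in the documented shape; the second entry is invented.
sample_toc = [
    {"title": "The beginning", "level": 1, "label": None, "leaf": 2},
    {"title": "A subsection", "level": 2, "label": "5", "leaf": 9},
]

for line in format_toc(sample_toc):
    print(line)
```

Indexing the keys directly, as above, raises KeyError for an entry that lacks one of the four documented keys.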