Parltrack publishes complete dumps of its database on a daily basis. These dumps are in JSON format, and they are compressed with the lzip[1] tool.

Because most of the dumps are between 400 and 800 megabytes (at the time of writing, in mid 2019), loading one all at once may not be practical: once parsed into RAM, the data can take up significantly more memory than the file itself. To allow record-by-record stream processing, the dumps are formatted as follows: each line holds exactly one record, prefixed with either:

  • '[' for the first record,
  • ',' for the other records,
  • ']' on its own for the last line

This means you can read the uncompressed JSON line by line, strip off the first character, and parse the rest of the line as JSON. If only an empty string remains after stripping the first character, you have reached the end of the JSON stream.
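The line-by-line procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, not Parltrack's own tooling: the function names `iter_records` and `iter_dump` are hypothetical, and `iter_dump` assumes the `lzip` command-line tool is installed and on the PATH, so that the dump can be decompressed on the fly with `lzip -dc` instead of being unpacked to disk first.

```python
import json
import subprocess
from typing import Any, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed record per line of an uncompressed dump.

    Hypothetical helper: each line is expected to start with '[', ',' or ']'.
    """
    for line in lines:
        # Drop the one-character prefix, then the trailing newline/whitespace.
        payload = line[1:].strip()
        if not payload:
            # Only ']' was on this line: the JSON array (and the dump) ends here.
            return
        yield json.loads(payload)


def iter_dump(path: str) -> Iterator[Any]:
    """Stream records from a .json.lz dump without unpacking it to disk.

    Assumption: the `lzip` binary is available; `-d` decompresses and
    `-c` writes the result to stdout, which we read line by line.
    """
    with subprocess.Popen(
        ["lzip", "-dc", path], stdout=subprocess.PIPE, encoding="utf-8"
    ) as proc:
        yield from iter_records(proc.stdout)
    # Only reached if the whole stream was consumed; leaving the `with`
    # block has already waited for lzip, so returncode is set.
    if proc.returncode:
        raise RuntimeError(f"lzip exited with status {proc.returncode}")
```

For example, `sum(1 for _ in iter_dump("ep_meps.json.lz"))` would count the records in the MEPs dump while holding only one record in memory at a time. Keeping the parsing in `iter_records`, separate from the decompression, means it can be reused on any iterable of lines, such as an already uncompressed file opened with `open()`.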

| Table | Description | Dump | Size | Last updated | Schema |
|---|---|---|---|---|---|
| Scraper Log | This contains all the logs from the last scraping. | 2024-03-02.log.lz (summary) | 658.2 KiB | 2024-03-02 | |
| MEPs | This dump contains all the basic information about the MEPs. | ep_meps.json.lz | 6.1 MiB | 2024-03-02 | |
| Dossiers | This dump contains all the basic information about the dossiers in the EP. | ep_dossiers.json.lz | 44.3 MiB | 2024-03-02 | |
| Amendments | This dump contains most of the committee amendments. | ep_amendments.json.lz | 92.3 MiB | 2024-03-02 | schema |
| MEP Activities | This dump contains most of the activities of MEPs (such as plenary speeches, questions, interpellations, etc.). | ep_mep_activities.json.lz | 36.5 MiB | 2024-03-02 | schema |
| MEP Plenary Votes | This dump contains most of the plenary roll-call votes. | ep_votes.json.lz | 7.2 MiB | 2024-03-02 | schema |
| Committee Agendas | | The ep_comagendas dump seems to be missing. | | | schema |

[1] Parltrack previously used xz, but because of claims that xz is inadequate for long-term archiving, we switched to lzip.