Parltrack publishes complete dumps of its database on a daily basis. These dumps are in JSON format, and they are compressed with the lzip tool.
Most of the dumps are between 400 and 800 megabytes (at the time of writing, in mid-2019), and they can use significantly more memory than that once loaded into RAM, so loading a whole dump at once may not be practical. To facilitate record-by-record stream processing, the dumps are formatted so that each record sits on its own line, prefixed with one of the following characters:
- '[' for the first record,
- ',' for the other records,
- ']' on its own for the last line
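The line format above can be consumed one record at a time. The following is a minimal Python sketch, not an official Parltrack tool: the function names `iter_records` and `stream_dump` are made up for illustration, each record is assumed to be a single JSON value on its line, and `stream_dump` assumes the `lzip` command-line tool is installed and on the PATH.

```python
import json
import subprocess


def iter_records(lines):
    """Yield one parsed record per line of a dump (hypothetical helper).

    `lines` is any iterable of text lines in the format described above.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and the closing ']' that sits on its own line.
        if not line or line == "]":
            continue
        # Drop the one-character prefix: '[' on the first record,
        # ',' on every other record.
        if line[0] in "[,":
            line = line[1:]
        if line:
            yield json.loads(line)


def stream_dump(path):
    """Decompress `path` with the lzip CLI and yield records lazily.

    Assumes `lzip` is available on the PATH; `-d` decompresses and
    `-c` writes the result to standard output.
    """
    with subprocess.Popen(
        ["lzip", "-dc", path],
        stdout=subprocess.PIPE,
        encoding="utf-8",
    ) as proc:
        yield from iter_records(proc.stdout)
```

With a downloaded dump, iterating over `stream_dump("ep_meps.json.lz")` would then hand back one record at a time, so only the current record needs to be held in memory rather than the whole file.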
|Table||Description||Dump||Size||Last Updated||Schema||Previous dumps|
|Scraper Log||This contains all the logs from the last scraping.||2020-01-24.log.lz (summary)||780.6 KiB||2020-01-24|
|MEPs||This dump contains all the basic information about the MEPs||ep_meps.json.lz||5.3 MiB||2020-01-22|
|Dossiers||This dump contains all the basic information about the dossiers in the EP||ep_dossiers.json.lz||38.0 MiB||2020-01-24|
|Amendments||This dump contains most of the committee amendments||ep_amendments.json.lz||61.3 MiB||2020-01-24||schema|
|MEP Activities||This dump contains most of the activities of MEPs (such as plenary speeches, questions, interpellations, etc)||ep_mep_activities.json.lz||21.4 MiB||2020-01-24||schema|
|MEP Plenary Votes||This dump contains most of the plenary roll-call votes||ep_votes.json.lz||4.4 MiB||2020-01-24||schema|
|Committee Agendas||This dump contains most of the committee agendas||ep_comagendas.json.lz||1.6 MiB||2020-01-22||schema|
Previously, Parltrack used xz, but due to claims that xz is inadequate for long-term archiving, we switched to lzip.