Missing data in JSON dumps

I usually pre-filter with grep and GNU parallel before piping the results to jq.
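The pre-filter idea, sketched in Python rather than as a grep/jq pipeline: run a cheap substring test on each line first and only pay for full JSON parsing on the lines that survive. This is a rough illustration, not the actual pipeline. The `prefilter` helper, the sample lines, and the `"type": "Group"` needle are all made up for the example, and it assumes one JSON document per line.

```python
import json
from typing import Iterable, Iterator


def prefilter(lines: Iterable[str], needle: str) -> Iterator[dict]:
    """Yield parsed JSON objects only for lines that contain `needle`."""
    for line in lines:
        # Cheap substring test first: most lines are rejected without parsing.
        if needle not in line:
            continue
        # Full JSON parsing is reserved for the candidates that survive.
        yield json.loads(line)


# Hypothetical sample lines, one JSON document per line (not real dump data).
sample = [
    '{"id": "a1", "type": "Group", "name": "Example Band"}',
    '{"id": "b2", "type": "Person", "name": "Example Singer"}',
]

groups = [
    obj
    for obj in prefilter(sample, '"type": "Group"')
    # Exact check on the parsed object: a substring hit can be a false
    # positive, e.g. when the needle appears inside a nested value.
    if obj.get("type") == "Group"
]
```

The substring test has to match the exact spacing used in the files, so the needle needs adjusting for compact JSON. The second, exact check is what keeps the result correct. The substring test is only there for speed.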

I’m actually doing all of this to maintain my own copy of the database with a much reduced dataset. The tables are pretty simple, with only a few columns for the keys, and most of the data is stored as a JSON blob. All the data I need (for now anyway… artists, release groups, and release → release group maps) weighs in at less than 3GB. Running a full replica isn’t really an option for me.
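A table layout along those lines (a key column plus a JSON blob) could look roughly like this. It is a minimal sketch using SQLite through Python's standard library. The database engine, the table and column names, and the `upsert_release_group` helper are all assumptions for illustration, not the real schema.

```python
import json
import sqlite3

# In-memory database for the sketch; a real copy would live in a file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one key column, everything else in a JSON text blob.
conn.execute(
    "CREATE TABLE release_group (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
)
# Hypothetical mapping table: release -> release group.
conn.execute(
    "CREATE TABLE release_map ("
    "release_id TEXT PRIMARY KEY, release_group_id TEXT NOT NULL)"
)


def upsert_release_group(conn: sqlite3.Connection, doc: dict) -> None:
    """Insert the document, or overwrite the stored blob if the id exists."""
    conn.execute(
        "INSERT OR REPLACE INTO release_group (id, data) VALUES (?, ?)",
        (doc["id"], json.dumps(doc)),
    )


# Made-up documents: the second call replaces the first row's blob.
upsert_release_group(conn, {"id": "rg-1", "title": "Old Title"})
upsert_release_group(conn, {"id": "rg-1", "title": "New Title"})
conn.execute(
    "INSERT INTO release_map (release_id, release_group_id) VALUES (?, ?)",
    ("rel-1", "rg-1"),
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM release_group").fetchone()[0]
stored = json.loads(
    conn.execute("SELECT data FROM release_group WHERE id = ?", ("rg-1",)).fetchone()[0]
)
```

Keeping only the keys as real columns means a re-import from a newer dump is just a series of upserts, at the cost of having to parse the blob whenever other fields are needed.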

Thanks for all your insights!

FWIW: I just tried ripgrep… what can I say: it’s so much faster than grep for pre-filtering those data files! One indicator is that the SSD now reports 2.5-3 GB/s of throughput, whereas it maxed out at around 500 MB/s with grep. Thanks for pointing me in that direction!
