Get replication data as a dump

Is there any way to get the replication packet data as a dump of some sort? I have a Musicbrainz mirror running locally which I’m querying using a python script which is storing a subset of data (artist data, official release-groups and the associated tracks) in a nosql (mongodb) database.

I want to be able to get the latest updates from MB periodically and then get them updated in my subset database. What I don’t want to have to do is re-import the whole musicbrainz db in to my mongodb subset every time there’s a new replication packet available so wondering if there’s any other way to get data about what has updated?

Edit: I’m just reading that the live data feed is MusicBrrainz primary source of income so I’m thinking that perhaps there won’t be any other way to get the data via than replication in to the postgresql database.

You could install a postgresql trigger on each table that records a history of what changes.
When you insert or update a table in postgres you can have code execute.

There is a trigger b_upd_last_updated_table on most tables that updates the last_updated columns as rows are inserted and updated.

You could also create your own trigger that fires when rows are inserted and updated.
One way of doing this would be to create a table that records the table and id of the row that was updated.
Your table may look like (id, table_name,insert/update flag, imported in mongo flag)
When a table you care about is updated you would have a AFTER UPDATE ON trigger that inserts a record. Your script would then be able to query this and know what records it needs to refresh.

2 Likes

Hi Dns Server,

Thanks for your reply!

That’s really useful info and I’m sure I’ll be able to get something working based off that.

If you get stuck let me know, I am a DBA and this kind of thing is sometimes my job.

1 Like

Cool will do. Thanks very much.