Finding release dates from database dump

release
postgresql
query
Tags: #<Tag:0x00007f2a03d3a5e8> #<Tag:0x00007f2a03d3a2f0> #<Tag:0x00007f2a03d39fd0>

#1

Just wondering, how would you go about SQL searching by release dates for individual releases or release groups? Specifically, I’m looking at release_country and release_unknown_country but those two just have separated year, month and date fields.

I hacked up a way to do it by creating a view that turns the year/month/day columns into an actual date value but when I do that, the queries take forever to run…is there a better way to do this?


#2

Can you paste the query / view definition you are trying to run?
It may be easier to use pastebin or a google doc with your query.


#3

Hi! First of all, thanks for taking the time to even look into this. I just found out most of the devs are actually in IRC rather than here.

Anyways, my goal is to create a sort of super table that lists the artists name, with any release_groups and the earliest release_dates attached to whatever album/single it finds since I don’t want it to list re-releases. To achieve this, I’m joining together the following tables:

  • artist
  • artist_credit
  • artist_credit_name
  • release
  • release_group
  • release_event_dates

release_event_dates is basically just a UNION between release_country and release_unknown_country that turns the day, month and year columns into dates. I’ve included the code for the two views at the bottom of my post. This is the query I try to run just to see what’s new this week:

SELECT * FROM release_groups_single_date WHERE released >= '2018-01-16' AND released =< '2018-02-16'

I’m sure someone more experience can already see the problem in that it takes forever to execute. I can post the output from EXPLAIN if you’d like, but the EXPLAIN ANALYZE says it takes 15 seconds to execute. If I run a similar search on the website it returns results instantly.

Things I’ve tried

  • Adding indexes where required
  • Removing the COALESCE and MIN functions
  • Using Materialized views

The last one basically solved it for me, and I just run a cron job where it updates the view once a day. Still though, I’m very curious as to how the website achieves such good performance so I’ve been tinkering with the virtual machine but can’t seem to get the search-server docker to run for some reason.

Once I get the virtual machine search-server up and running, I intend to log all queries and dig through the website’s source to see what it’s doing and try to mimic. I’m a bit new at this, so everything takes forever and I would appreciate it if anyone could point me in the right direction.

Thanks for reading! Let me know if there’s any info you think I should add.

/edit After looking more closely at the search help page, it looks like it uses a search engine called Lucene, so I’m guessing that means it doesn’t query the database directly. There may not be an answer to my question then :x

CREATE VIEW release_event_dates AS
 SELECT q.release,
    to_date(concat(COALESCE((q.date_year)::integer, 1900), '/', COALESCE((q.date_month)::integer, 1), '/', COALESCE((q.date_day)::integer, 1)), 'YYYY/MM/DD'::text) AS date,
    q.country
   FROM ( SELECT release_country.release,
            release_country.date_year,
            release_country.date_month,
            release_country.date_day,
            release_country.country
           FROM release_country
        UNION ALL
         SELECT release_unknown_country.release,
            release_unknown_country.date_year,
            release_unknown_country.date_month,
            release_unknown_country.date_day,
            NULL::integer AS int4
           FROM release_unknown_country) q;
SELECT artist.name,
    release.name AS rname,
    release_group.id,
    min(release_event_dates.date) AS released
   FROM release_group
     JOIN release ON release_group.id = release.release_group
     JOIN release_event_dates ON release.id = release_event_dates.release
     JOIN artist_credit ON artist_credit.id = release.artist_credit
     JOIN artist_credit_name ON artist_credit.id = artist_credit_name.artist_credit
     JOIN artist ON artist.id = artist_credit_name.artist
  GROUP BY release_group.id, artist.name, release.name;

#4

Your date calculation looks good but if you convert that to a function you may be able to use a function based index.
This allows you to create a function based index and this should hopefully be faster (needs testing)

create or replace function calc_date(date_year integer,date_month integer,date_day integer) returns date as $$
select to_date(concat(COALESCE((date_year)::integer, 1900), '/', COALESCE((date_month)::integer, 1), '/', COALESCE((date_day)::integer, 1)), 'YYYY/MM/DD'::text) $$
LANGUAGE SQL immutable;

create index release_unknown_country_over_date on release_unknown_country(calc_date(date_year,date_month,date_day));
create index release_country_over_date on release_country(calc_date(date_year,date_month,date_day));

When I look at the explain plan the difference is not that large.
As the releases table is huge most of the time is spent looking at that table.


#5

Did you find that release_groups have a record of the first release date, which might be something that is useful for you (and should remove your need for your view)


#6

I had no idea that such a table existed! I’m going to compare that to the min date and see if I get the same thing. Does this mean that the earliest release dates are duplicated in the database?

@dns_server

When I look at the explain plan the difference is not that large.

I’ve never used functions or function indexes so I might try it out just for fun, but are you saying that performance-wise it had no impact?


#7

@heyoni
You are joining a huge table with a small table so most of the time was reading the big table from disk.
The performance gains from calculating things one way or another way don’t make a huge difference in this case.