Count artists by country, Soleved


#1

Hi i have been struggling with the musicbrainz database, i just need to count the number of artists and the number of labels for each country, i think i have the counting labels sql query working but the count artists shows 5 artists for the first 50 results as i limited it to 50, and i assume that is wrong ?.

Any way i assume i have done something wrong with my join, can anyone please please show me what i have done wrong in the sql code below.

	/* Q2: COUNT the number of artists and the number of labels by country */

SELECT COUNT(artist.id) AS "artist_count" 
FROM artist
JOIN area ON artist.area = area.id 
GROUP BY area.name;  

SELECT area.name, COUNT(label.id) AS "label_count"      <-- WAI - this one works as intended i think
FROM area
JOIN label ON area.id = label.area
GROUP BY area.name;


the nested statement with both combined.
SELECT aCounties, a.Label_cnt, b.Artist_cnt FROM 
	(SELECT area.name AS Counties, COUNT(label.id) AS Label_cnt
	FROM area
	JOIN label ON area.id = label.area
	GROUP BY area.name) AS a,
		(SELECT COUNT(artist.id) AS Artist_cnt
		FROM artist
		JOIN area ON artist.area = area.id 
		GROUP BY area.name) AS b LIMIT 50;

#2

Few remarks, I didn’t look at your code closely:

  • areas are not only countries, they can be regions or cities too. Computing results by countries will probably not be trivial (the city-country relation is not stored directly in the database)
  • I would group over area.id, not area.name otherwise you are going to mix up different areas with the same name (for labels I have ~2366 results in the second case and ~2485 in the first one)
  • I don’t really understand why you assume your query is wrong (I don’t get what the 5/50 refer to). Can you paste your result?

Hope this helps


#3

i thought it was wrong just in the fact that the results where all 5 artists per area for the first 50 artists.
And i see, yes you’re right it does return all areas not just countries, what if i also joined to country_area would that do ? or would i also need to join to release_country too after country_area ?

can you help me with the sql, what would i have to do to get just the countries ?

heres the results in a csv export:

Country, Labels, Artists
Aachen,3,5
Aarau,1,5
Aarhus,5,5
Aartselaar,1,5
Aberdeen,2,5
Abstatt,1,5
Acerra,1,5
Adelaide,8,5
Aerdenhout,1,5
Ahmedabad,1,5
Alabama,1,5
Alagoas,1,5
Alameda,1,5
Alaska,1,5
Albania,4,5
Albany,6,5
Alberobello,1,5
Alberta,1,5
Albi,1,5
Albuquerque,4,5
Alexandria,1,5
Alger,1,5
Algeria,4,5
Alicante,1,5
Almería,1,5
Alsdorf,1,5
Altamonte Springs,1,5
Altdorf bei Nürnberg,1,5
Altenburg,1,5
Alès,1,5
Amami,1,5
Amberg,1,5
Amersfoort,1,5
Amherst,2,5
Amiens,3,5
Amityville,1,5
Amsterdam,36,5
América,1,5
Anacortes,1,5
Anaheim,1,5
Anchorage,1,5
Andalucía,1,5
Andhra Pradesh,1,5
Andorra,4,5
Andorra la Vella,2,5
Angers,3,5
Angoulême,1,5
Ankara,2,5
Ann Arbor,6,5
Annapolis,1,5
Annecy,1,5
Antananarivo,2,5
Antofagasta,1,5
Antwerp,8,5
Antwerpen,1,5
Araguari,1,5


#4

You’re not selecting area.name while grouping by it for the artists. I’ve never tried to do that in mysql but I wouldn’t be surprised if it made things a bit weird


#5

@reosarevok: looks like it

I ran the following query in postgresql and it seems to work:

SELECT a.areas, a.Label_cnt, b.Artist_cnt FROM 
  (SELECT area.name AS areas, COUNT(label.id) AS Label_cnt
   FROM area                        
   JOIN label ON area.id = label.area
   GROUP BY area.id) AS a,
  (SELECT area.name AS areas, COUNT(artist.id) AS Artist_cnt
   FROM artist
   JOIN area ON area.id = artist.area
   GROUP BY area.id) AS b
WHERE a.areas = b.areas 
LIMIT 10;

gives:

   areas    │ label_cnt │ artist_cnt 
────────────┼───────────┼────────────
 Albania    │         6 │         73
 Algeria    │         5 │        199
 Andorra    │         4 │         12
 Angola     │         3 │        107
 Antarctica │         1 │          2
 Argentina  │       329 │       3096
 Armenia    │         5 │        154
 Australia  │      1325 │       9450
 Austria    │       433 │       4347
 Azerbaijan │         1 │         67

I think you’re right, adding JOIN country_area ON country_area.area = area.id in the query should do the trick


#6

Thank you so much, i am very grateful for all the help.I add a join too country_area and it works perfectly :slight_smile: