Project summary:
Title: Modernize search storage format for the MusicBrainz database
Proposed mentor: @lucifer
Proposed Co-mentors: bitmap, reosarevok, yvanzo
Estimated Project Length: 350 hours
Difficulty: medium
Expected Outcomes:
- Upgrade the Solr schema version from 1.5 to 1.7
- Complete fields in configsets and indexer to store all the data to be returned
- Create two response writers to return data from fields to MB XML/MB JSON formats and add automated validation tests
Personal Information:
Name: Shaik Junaid
IRC Nickname: fettuccinae
Github: fettuccinae
Timezone: UTC +05:30
Introduction:
Hi! I’m Junaid (fettuccinae on Matrix), a third-year Computer Science student at Mahatma Gandhi Institute of Technology (MGIT), Hyderabad, India.
Last summer, I worked on creating a notification system for MetaBrainz projects as my GSoC project, and this year I’m excited to apply as a contributor to work on the Solr search engine.
Project proposal:
Description:
The MusicBrainz database has a Solr search engine used for both website search and search API.
MB Search architecture for reference.
After Solr was upgraded to version 9, the schemas in the configsets were not updated and still use version 1.5. They need to be upgraded to version 1.7.
Solr currently keeps data in index fields for search and also uses the _store field to hold the index field data plus other non-indexed data for the response writer, causing redundancy. To eliminate this redundancy, we need to store all the required data in the fields themselves.
The Solr response writer currently reads all the data from the _store field of a document and returns it in valid MB XML/MB JSON format.
After modifying the configsets and indexer to store data in the fields, we need to create response writers that read from these fields and return data in valid MB XML/MB JSON format, along with validation tests for these writers.
Since storing data in individual fields rather than a single _store blob will change performance characteristics, we will start with the artist core and benchmark it before rolling the change out to all remaining entities.
Implementation:
1. Migrate schema to version 1.7
The only change in schema version 1.6 concerns non-stored docValues fields, which we don't use.
The change in schema version 1.7 is that certain field types (Numeric, Date, Bool, String, Enum, UUID) that support docValues have docValues enabled by default. Since fields with docValues="true" perform poorly for retrieval queries compared to stored="true" fields, we need to explicitly set docValues="false" for these field types in fieldtypes.xml.
Note: storefieldmv already has docValues="true", so we won't be modifying it.
Current: fieldtypes.xml
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="false" />
<fieldType name="long" class="solr.LongPointField" positionIncrementGap="0" />
<fieldType name="mbid" class="solr.UUIDField" omitNorms="true" />
<fieldType name="storefield" class="solr.StrField" />
<fieldType name="storefieldmv" class="solr.StrField" docValues="true" />
<fieldType name="bool" class="solr.BoolField" />
<fieldType name="date" class="solr.DateRangeField" sortMissingLast="false" />
<fieldType name="int" class="solr.IntPointField" sortMissingLast="false" />
<fieldType name="float" class="solr.FloatPointField" />
....
</types>
After: fieldtypes.xml
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="false" docValues="false" />
<fieldType name="long" class="solr.LongPointField" positionIncrementGap="0" docValues="false" />
<fieldType name="mbid" class="solr.UUIDField" omitNorms="true" docValues="false" />
<fieldType name="storefield" class="solr.StrField" docValues="false" />
<fieldType name="storefieldmv" class="solr.StrField" docValues="true" />
<fieldType name="bool" class="solr.BoolField" docValues="false" />
<fieldType name="date" class="solr.DateRangeField" sortMissingLast="false" docValues="false" />
<fieldType name="int" class="solr.IntPointField" sortMissingLast="false" docValues="false" />
<fieldType name="float" class="solr.FloatPointField" docValues="false" />
.....
</types>
Then, update the schema version of every entity to 1.7.
Example:
_template/_conf/schema.xml
<?xml version="1.0"?>
<!-- This is a template for new cores. -->
<schema name="[new_entity]" version="1.7" xmlns:xi="http://www.w3.org/2001/XInclude">
....
</schema>
artist/conf/schema.xml
<?xml version="1.0"?>
<schema name="artist" version="1.7" xmlns:xi="http://www.w3.org/2001/XInclude">
....
</schema>
2. Complete fields in configsets and remove _store
Currently, the search fields are used only for indexing and the _store blob is being used to store all the data required by response writers.
We can eliminate this redundancy by storing data in the fields themselves and removing the _store blob.
To flatten simple nested fields, we can add their inner elements to the schema.
Example: the area element, which is of type def_area-element_inner, can be flattened as follows:
<field name="area-id" type="text" indexed="false" stored="true" />
<field name="area-name" type="text" indexed="true" stored="true" />
<field name="area-type" type="mbid" indexed="false" stored="true" />
<field name="area-type-id" type="text" indexed="false" stored="true" />
<field name="area-sort-name" type="text" indexed="false" stored="true" />
<field name="area-lifespan-begin_date" type="date" indexed="false" stored="true" />
<field name="area-lifespan-end_date" type="date" indexed="false" stored="true" />
<field name="area-lifespan-ended" type="bool" indexed="false" stored="true" />
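The naming convention above (inner elements joined with "-") can be sketched as a small helper. This is an illustration of the convention only, not code from SIR:

```python
def flatten(prefix, value):
    """Join nested keys with "-" to produce flat Solr field names.

    Illustrative sketch of the naming convention only; not SIR code.
    """
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, inner in value.items():
        name = "%s-%s" % (prefix, key) if prefix else key
        flat.update(flatten(name, inner))
    return flat

area = {
    "id": "71bbafaa-e825-3e15-8ca9-017dcad1748b",
    "name": "Canada",
    "lifespan": {"begin_date": "1867-07-01", "ended": "false"},
}
flat_area = flatten("area", area)
# flat_area has keys "area-id", "area-name",
# "area-lifespan-begin_date" and "area-lifespan-ended"
```

Complex nested fields (lists of aliases, tags) would not survive this flattening, which is why they stay serialized as XML strings.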
There are four types of fields:
- Flat fields which are to be stored and to be used for indexing.
- Flat fields which are used only for indexing.
- Flat fields which are used only for storing.
- Complex nested fields which can’t be flattened, so we store them as XML strings.
Example:
1. <field name="arid" type="mbid" indexed="true" stored="true" />
<field name="artist" type="text" indexed="true" stored="true" />
2. <field name="alias" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="area" type="text_mult" indexed="true" stored="false" multiValued="true" />
3. <field name="gender-id" type="text" indexed="false" stored="true" />
<field name="area-type" type="lowercase" indexed="false" stored="true" />
4. <field name="alias_list_store" type="storefield" indexed="false" stored="true" />
<field name="tag_store" type="storefield" indexed="false" stored="true" />
This approach removes our dependency on _store completely and the response writer can read all the required data from the fields themselves.
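For intuition, a simplified before/after sketch of the document SIR sends to Solr for an artist. Field names follow the proposed schema; the values and exact document layout are illustrative assumptions:

```python
# Simplified sketch of an artist document sent to Solr, before and
# after the change. Values are illustrative, not real SIR output.

doc_before = {
    "mbid": "9f9953f0-68bb-4ce3-aaf6-4a407daf0ce2",
    "artist": "Howard Shore",
    "area": ["Canada"],                       # indexed only
    "_store": "<artist id=...>...</artist>",  # everything again, as XML
}

doc_after = {
    "mbid": "9f9953f0-68bb-4ce3-aaf6-4a407daf0ce2",
    "artist": "Howard Shore",
    "area-name": ["Canada"],
    "area-id": "71bbafaa-e825-3e15-8ca9-017dcad1748b",
    "alias_list": "<alias-list>...</alias-list>",  # complex nested field
}

# The redundant _store blob is gone; every value the response writer
# needs now lives in exactly one field.
assert "_store" not in doc_after
```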
Example for the Artist entity schema after completing all the fields:
artist/conf/schema.xml
<schema name="artist" version="1.7" xmlns:xi="http://www.w3.org/2001/XInclude">
<!-- Search fields with stored="true" -->
<field name="artist" type="text" indexed="true" stored="true" />
<field name="sortname" type="text" indexed="true" stored="true" required="true" />
<field name="arid" type="mbid" indexed="true" stored="true" />
<field name="type-name" type="lowercase" indexed="true" stored="true" omitNorms="true" />
<field name="comment" type="text" indexed="true" stored="true" />
<field name="country" type="lowercase" indexed="true" stored="true" omitNorms="true" />
<field name="area-name" type="text_mult" indexed="true" stored="true" multiValued="true" />
<field name="life_span-begin_date" type="date" indexed="true" stored="false" />
<field name="begin_area-name" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="life_span-end_date" type="date" indexed="true" stored="false" />
<field name="end_area-name" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="life_span-ended" type="bool" indexed="true" stored="false" />
<field name="gender-name" type="lowercase" indexed="true" stored="false" omitNorms="true" />
<field name="ipis-ipi" type="strip_leading_zeroes_concat_mult" indexed="true" stored="false" multiValued="true" />
<field name="isnis-isni" type="strip_leading_zeroes_concat_mult" indexed="true" stored="false" multiValued="true" />
<field name="mbid" type="mbid" indexed="true" stored="true" required="true" />
<!-- Search fields with stored="false" -->
<field name="alias" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="primary_alias" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="tag" type="text_mult" indexed="true" stored="false" multiValued="true" />
<field name="ref_count" type="int" indexed="true" stored="false" />
<field name="ngram" type="ngram" indexed="true" stored="false" multiValued="true" />
<!-- Complex nested fields -->
<field name="alias_list" type="storefield" indexed="false" stored="true" />
<field name="tag_list" type="storefield" indexed="false" stored="true" />
<!-- Non Search fields with stored="true" -->
<field name="area-id" type="text" indexed="false" stored="true" />
<field name="area-type" type="mbid" indexed="false" stored="true" />
<field name="area-type-id" type="text" indexed="false" stored="true" />
<field name="area-sort-name" type="text" indexed="false" stored="true" />
<field name="area-lifespan-begin_date" type="date" indexed="false" stored="true" />
<field name="area-lifespan-end_date" type="date" indexed="false" stored="true" />
<field name="area-lifespan-ended" type="bool" indexed="false" stored="true" />
<field name="begin_area-id" type="text" indexed="false" stored="true" />
<field name="begin_area-type" type="mbid" indexed="false" stored="true" />
<field name="begin_area-type-id" type="text" indexed="false" stored="true" />
<field name="begin_area-sort-name" type="text" indexed="false" stored="true" />
<field name="begin_area-lifespan-begin_date" type="date" indexed="false" stored="true" />
<field name="begin_area-lifespan-end_date" type="date" indexed="false" stored="true" />
<field name="begin_area-lifespan-ended" type="bool" indexed="false" stored="true" />
<field name="end_area-id" type="text" indexed="false" stored="true" />
<field name="end_area-type" type="mbid" indexed="false" stored="true" />
<field name="end_area-type-id" type="text" indexed="false" stored="true" />
<field name="end_area-sort-name" type="text" indexed="false" stored="true" />
<field name="end_area-lifespan-begin_date" type="date" indexed="false" stored="true" />
<field name="end_area-lifespan-end_date" type="date" indexed="false" stored="true" />
<field name="end_area-lifespan-ended" type="bool" indexed="false" stored="true" />
<field name="lifespan-ended" type="bool" indexed="false" stored="true" />
<field name="type-id" type="text" indexed="false" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true" />
<copyField source="artist" dest="artistaccent" />
<copyField source="mbid" dest="arid" />
<copyField source="artist" dest="ngram" />
<copyField source="sortname" dest="ngram" />
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>mbid</uniqueKey>
</schema>
We also need to modify request-params.xml for each entity to return all stored fields by using * and removing _store from the field list (fl) parameter.
artist/conf/request-params.xml
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="fl">score, *</str>
<str name="qf">alias^1.75 primary_alias^2 artist^2 artistaccent^2.2 comment ngram^0.5 sortname^1.85</str>
<str name="pf">primary_alias^2 artist^2 artistaccent^2.2 alias^1.75 sortname^1.85 comment</str>
<str name="bf">log(sum(ref_count,150))^4</str>
</lst>
3. Complete fields in indexer
The Search Index Rebuilder (SIR) builds the index by sending the search fields and the _store field to Solr.
SIR initializes a SearchEntity object for a document, with the list of search fields as fields and the remaining paths as extrapaths. It builds an entity query and then converts the result to a dict, which is later sent to the Solr engine.
To complete the fields:
a. We need to rename the fields in Search<Entity> (in sir/schema/__init__.py) to match the configsets and move them from extrapaths to fields.
Example for Artist entity:
Current code:
SearchArtist = E(modelext.CustomArtist, [
F("mbid", "gid"),
F("artist", "name"),
F("sortname", "sort_name"),
F("alias", "aliases.name"),
# Does not require a trigger since this will get updated on an alias update
F("primary_alias", "primary_aliases", trigger=False),
F("begin", "begin_date", transformfunc=tfs.index_partialdate_to_string),
F("end", "end_date", transformfunc=tfs.index_partialdate_to_string),
F("ended", "ended", transformfunc=tfs.ended_to_string),
F("area", ["area.name", "area.aliases.name"]),
F("beginarea", ["begin_area.name", "begin_area.aliases.name"]),
F("country", "area.iso_3166_1_codes.code"),
F("endarea", ["end_area.name", "end_area.aliases.name"]),
F("ref_count", "artist_credit_names.artist_credit.ref_count",
transformfunc=sum, trigger=False),
F("comment", "comment"),
F("gender", "gender.name"),
F("ipi", "ipis.ipi"),
F("isni", "isnis.isni"),
F("tag", "tags.tag.name"),
F("type", "type.name")
],
1.5,
convert.convert_artist,
extrapaths=["tags.count",
"aliases.type.name", "aliases.type.id",
"aliases.type.gid", "aliases.sort_name",
"aliases.locale", "aliases.primary_for_locale",
"aliases.begin_date", "aliases.end_date",
"begin_area.gid", "area.gid", "end_area.gid",
"area.begin_date", "area.end_date", "area.ended",
"begin_area.begin_date", "begin_area.end_date",
"begin_area.ended", "end_area.begin_date",
"end_area.end_date", "end_area.ended",
"gender.gid", "area.type.gid", "area.type.name",
"begin_area.type.gid", "begin_area.type.name",
"end_area.type.gid", "end_area.type.name",
"type.gid"]
)
After the change:
SearchArtist = E(modelext.CustomArtist, [
F("mbid", "gid"),
F("artist", "name"),
F("sortname", "sort_name"),
F("alias", "aliases.name"),
# Does not require a trigger since this will get updated on an alias update
F("primary_alias", "primary_aliases", trigger=False),
F("life_span-begin_date", "begin_date", transformfunc=tfs.index_partialdate_to_string),
F("life_span-end_date", "end_date", transformfunc=tfs.index_partialdate_to_string),
F("life_span-ended", "ended", transformfunc=tfs.ended_to_string),
F("area-name", ["area.name", "area.aliases.name"]),
F("begin_area-name", ["begin_area.name", "begin_area.aliases.name"]),
F("country", "area.iso_3166_1_codes.code"),
F("end_area-name", ["end_area.name", "end_area.aliases.name"]),
F("ref_count", "artist_credit_names.artist_credit.ref_count",
transformfunc=sum, trigger=False),
F("comment", "comment"),
F("gender", "gender.name"),
F("ipis-ipi", "ipis.ipi"),
F("isnis-isni", "isnis.isni"),
F("tag", "tags.tag.name"),
F("type-name", "type.name"),
F("begin_area-lifespan-begin_date", "begin_area.begin_date"),
# Similar fields for `area`, `begin-area`, `end-area`
F("gender-id", "gender.gid"),
F("type-id", "type.gid")
],
1.7,
convert.convert_artist,
extrapaths=["tags.count",
"aliases.type.name", "aliases.type.id",
"aliases.type.gid", "aliases.sort_name",
"aliases.locale", "aliases.primary_for_locale",
"aliases.begin_date", "aliases.end_date"
]
)
b. Currently, each converter function builds the full entity object and returns it.
We need to modify the existing convert functions to also return a list of dictionaries representing the nested fields, alongside the entity object.
Current converter function for artist entity:
def convert_artist(obj):
"""
:type obj: :class:`sir.schema.modelext.CustomArtist`
"""
artist = models.artist(id=str(obj.gid), name=obj.name,
sort_name=obj.sort_name)
if obj.comment:
artist.set_disambiguation(obj.comment)
if obj.gender is not None:
artist.set_gender(convert_gender(obj.gender))
if obj.type is not None:
artist.set_type(obj.type.name)
artist.set_type_id(str(obj.type.gid))
if obj.begin_area is not None:
artist.set_begin_area(convert_area_inner(obj.begin_area))
if obj.area is not None:
artist.set_area(convert_area_inner(obj.area))
if len(obj.area.iso_3166_1_codes) > 0:
artist.set_country(
models.def_iso_3166_1_code(obj.area.iso_3166_1_codes[0].code)
)
if obj.end_area is not None:
artist.set_end_area(convert_area_inner(obj.end_area))
lifespan = convert_life_span(obj.begin_date, obj.end_date, obj.ended)
artist.set_life_span(lifespan)
if len(obj.aliases) > 0:
artist.set_alias_list(convert_alias_list(obj.aliases))
if len(obj.ipis) > 0:
artist.set_ipi_list(convert_ipi_list(obj.ipis))
if len(obj.isnis) > 0:
artist.set_isni_list(convert_isni_list(obj.isnis))
if len(obj.tags) > 0:
artist.set_tag_list(convert_tag_list(obj.tags))
return artist
After the change to return nested_field_list alongside:
sir/wscompat/convert.py
def convert_artist(obj):
"""
:type obj: :class:`sir.schema.modelext.CustomArtist`
"""
artist = models.artist(id=str(obj.gid), name=obj.name,
sort_name=obj.sort_name)
nested_field_list = []
if len(obj.tags) > 0:
artist.set_tag_list(convert_tag_list(obj.tags))
nested_field_list.append({"tag_list": artist.get_tag_list()})
if len(obj.aliases) > 0:
artist.set_alias_list(convert_alias_list(obj.aliases))
nested_field_list.append({"alias_list": artist.get_alias_list()})
if obj.comment:
artist.set_disambiguation(obj.comment)
if obj.gender is not None:
artist.set_gender(convert_gender(obj.gender))
if obj.type is not None:
artist.set_type(obj.type.name)
artist.set_type_id(str(obj.type.gid))
if obj.begin_area is not None:
artist.set_begin_area(convert_area_inner(obj.begin_area))
if obj.area is not None:
artist.set_area(convert_area_inner(obj.area))
if len(obj.area.iso_3166_1_codes) > 0:
artist.set_country(
models.def_iso_3166_1_code(obj.area.iso_3166_1_codes[0].code)
)
if obj.end_area is not None:
artist.set_end_area(convert_area_inner(obj.end_area))
lifespan = convert_life_span(obj.begin_date, obj.end_date, obj.ended)
artist.set_life_span(lifespan)
if len(obj.ipis) > 0:
artist.set_ipi_list(convert_ipi_list(obj.ipis))
if len(obj.isnis) > 0:
artist.set_isni_list(convert_isni_list(obj.isnis))
return artist, nested_field_list
c. Update query_result_to_dict in searchentities.py to add nested fields to the data dict and remove _store from it.
sir/schema/searchentities.py
def query_result_to_dict(self, obj):
"""
Converts the result of single ``query`` result into a dictionary via the
field specification of this entity.
:param obj: A :ref:`declarative <sqla:declarative_toplevel>` object.
:rtype: dict
"""
# Unchanged code.
if (config.CFG.getboolean("sir", "wscompat") and self.compatconverter is
not None):
# _store is not required anymore.
# data["_store"] = str(tostring(self.compatconverter(obj).to_etree(), encoding='us-ascii'), encoding='us-ascii')
_, nested_list = self.compatconverter(obj)
for n in nested_list:
for n_field, value in n.items():
data[n_field] = str(tostring(value.to_etree(), encoding='us-ascii'), encoding='us-ascii')
return data
d. Fix the test/test_searchentities.py and test/test_indexing_real_data.py tests, as they depend on the _store field to validate the output.
Expand test/test_wscompat_convert.py to cover the remaining convert_<entity> functions.
4. Create response writers for the completed fields, with validation tests
The MB-XML writer parses the Solr document by extracting the _store XML string, unmarshalling it, and writing it to the output. The MB JSON format is automatically generated from the MB XML format.
We need to create new writers that parse the Solr document, read the values from each field, and construct valid MB-XML/MB-JSON objects for output.
a. MB-XML Writer
We need a writer that unpacks all flat fields, unmarshals the nested fields, and combines them into an entity object, which is then written to the output.
We can reuse MBXMLWriter and modify both parseSolrResponse methods for this purpose.
We need to add entity-specific builders that create an entity object from its stored fields and return it.
Since there are two implementations of parseSolrResponse (for BasicResultContext and for SolrDocumentList), we need a FieldReader interface that can be passed to the buildEntityFromFields router.
The XML writer will look something like:
// FieldReader interface
private interface FieldReader {
String get(String fieldName);
}
// Helper function to unmarshal nested fields
private Object unmarshalFragment(Unmarshaller unmarshaller, String fragment){
return unmarshaller.unmarshal(new ByteArrayInputStream(fragment.getBytes()));
}
//Artist entity builder
private Artist buildArtist(FieldReader doc, Unmarshaller unmarshaller) {
Artist artist = new Artist();
artist.setId(doc.get("mbid"));
artist.setName(doc.get("artist"));
artist.setType(doc.get("type-name"));
artist.setTypeId(doc.get("type-id"));
artist.setDisambiguation(doc.get("comment"));
Gender gender = new Gender();
gender.setId(doc.get("gender-id"));
gender.setContent(doc.get("gender"));
artist.setGender(gender);
DefAreaElementInner area = new DefAreaElementInner();
area.setId(doc.get("area-id"));
area.setName(doc.get("area-name"));
LifeSpan areaLifeSpan = new LifeSpan();
areaLifeSpan.setBegin(doc.get("area-lifespan-begin_date"));
areaLifeSpan.setEnd(doc.get("area-lifespan-end_date"));
areaLifeSpan.setEnded(doc.get("area-lifespan-ended"));
area.setLifeSpan(areaLifeSpan);
artist.setArea(area);
// Similar logic for beginArea, endArea, ipiList, isniList
artist.setAliasList(unmarshalFragment(unmarshaller, doc.get("alias_list")));
artist.setTagList(unmarshalFragment(unmarshaller, doc.get("tag_list")));
return artist;
}
// Router function
private Object buildEntityFromFields(FieldReader doc, Unmarshaller unmarshaller) {
switch (entityType) {
case annotation: return buildAnnotation(doc, unmarshaller);
case area: return buildArea(doc, unmarshaller);
case artist: return buildArtist(doc, unmarshaller);
case cdstub: return buildCdstub(doc, unmarshaller);
case editor: return buildEditor(doc, unmarshaller);
case event: return buildEvent(doc, unmarshaller);
case instrument: return buildInstrument(doc, unmarshaller);
case label: return buildLabel(doc, unmarshaller);
case place: return buildPlace(doc, unmarshaller);
case recording: return buildRecording(doc, unmarshaller);
case release: return buildRelease(doc, unmarshaller);
case release_group: return buildReleaseGroup(doc, unmarshaller);
case series: return buildSeries(doc, unmarshaller);
case tag: return buildTag(doc, unmarshaller);
case work: return buildWork(doc, unmarshaller);
case url: return buildUrl(doc, unmarshaller);
default: throw new IllegalArgumentException("invalid entity type: " + entityType);
}
}
public void parseSolrResponse(ResultContext con,
MetadataListWrapper metadatalistwrapper,
SolrQueryRequest req)
throws IOException {
// unchanged code.
while (iter.hasNext()) {
int id = iter.nextDoc();
Document doc = req.getSearcher().doc(id);
FieldReader fieldReader = new FieldReader() {
@Override
public String get(String fieldName) {
return doc.getField(fieldName).stringValue();
}
};
Object entity = buildEntityFromFields(fieldReader, unmarshaller);
try {
adjustScore(maxScore, entity, iter.score());
} catch (NullPointerException e) {
throw new RuntimeException(SCORE_NOT_IN_FIELD_LIST);
}
xmlList.add(entity);
}
}
public void parseSolrResponse(SolrDocumentList doclist,
MetadataListWrapper metadatalistwrapper){
// unchanged code.
while (iter.hasNext()) {
SolrDocument doc = iter.next();
FieldReader fieldReader = new FieldReader() {
@Override
public String get(String fieldName) {
String field = doc.get(fieldName);
return field;
}
};
Object entity = buildEntityFromFields(fieldReader, unmarshaller);
try {
adjustScore(maxScore, entity, (float) doc.get("score"));
} catch (NullPointerException e) {
throw new RuntimeException(SCORE_NOT_IN_FIELD_LIST);
}
xmlList.add(entity);
}
}
b. MB-JSON Writer
MBJSONWriter works by converting the output object from MB-XML format to MB-JSON format. We can reuse this writer without any modification once the changes to MBXMLWriter are implemented.
c. Benchmarking
To validate that eliminating _store doesn’t regress query performance, we’ll record production queries using Solr’s request logging, then replay them against both the old and the new schema. Metrics to compare: query latency (QTime) and end-to-end response time (QTime plus the time taken by the response writer).
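As a sketch of the replay tooling, a helper that pulls the query parameters and QTime out of a request-log line. The log line shape shown here is an assumption based on Solr's default request logging and may need adjusting for the deployed format:

```python
import re

# Assumed shape of a Solr request-log line (default o.a.s.c.S.Request
# format); adjust the pattern if the deployed log format differs.
LOG_PATTERN = re.compile(
    r"path=(?P<path>\S+)\s+params=\{(?P<params>[^}]*)\}.*?QTime=(?P<qtime>\d+)"
)

def parse_request_log_line(line):
    """Return (path, params, QTime in ms) for one request-log line,
    or None if the line does not look like a request entry."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match.group("path"), match.group("params"), int(match.group("qtime"))

line = ('2025-03-01 12:00:00.000 INFO  (qtp-42) [artist] o.a.s.c.S.Request '
        'webapp=/solr path=/advanced params={q=artist:shore&wt=mbjson} '
        'hits=12 status=0 QTime=7')
# parse_request_log_line(line) -> ("/advanced", "q=artist:shore&wt=mbjson", 7)
```

Replaying each recorded (path, params) pair against both cores and comparing the collected QTime values gives the latency half of the benchmark; end-to-end time would be measured around the HTTP call itself.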
d. Validation Tests
Similar to the existing test strategy, test documents are added to Solr. The modified writers are then queried and their output is validated against the expected <entity>.xml and <entity>.json files.
Example test for artist entity:
Populate all the required fields in the getDoc() method
test/../AbstractMBWriterArtistTest.java
package org.musicbrainz.search.solrwriter;
import java.util.ArrayList;
import java.util.Arrays;
public abstract class AbstractMBWriterArtistTest extends AbstractMBWriterTest {
@Override
public ArrayList<String> getDoc() {
return new ArrayList<String>(Arrays.asList(new String[]{
"mbid", uuid,
"artist", "Howard Shore",
"sortname", "Shore, Howard",
"type-name", "Person",
"gender-name", "Male",
"country", "CA",
"area-id", "71bbafaa-e825-3e15-8ca9-017dcad1748b",
"area-name", "Canada",
"area-sort-name", "Canada",
"begin_area-id", "74b24e62-d2fe-42d2-9d96-31f2da756c77",
"begin_area-name", "Toronto",
"begin_area-sort-name", "Toronto",
"life_span-begin_date", "1946-10-18",
"lifespan-ended", "false",
"alias_list", "<alias-list><alias sort-name=\"Shore\">Shore</alias><alias sort-name=\"Howard Shaw\">Howard Shaw</alias><alias sort-name=\"H. Shore\">H. Shore</alias></alias-list>",
"tag_list", "<tag-list count=\"10\">" +
"<tag count=\"1\"><name>lord of the rings</name></tag>" +
"<tag count=\"2\"><name>classical</name></tag>" +
"<tag count=\"2\"><name>canadian</name></tag>" +
"<tag count=\"1\"><name>film composer</name></tag>" +
"<tag count=\"1\"><name>score</name></tag>" +
"<tag count=\"1\"><name>academy award winner</name></tag>" +
"<tag count=\"1\"><name>easy listening soundtracks and musicals</name></tag>" +
"<tag count=\"2\"><name>soundtrack</name></tag>" +
"<tag count=\"1\"><name>howard</name></tag>" +
"<tag count=\"1\"><name>shore</name></tag>" +
"</tag-list>" }));
}
}
Modify AbstractMBWriterTest to use all document values and then validate the response output against the expected file.
test/../AbstractMBWriterTest.java
@Test
public void performCoreTest() throws Exception {
ArrayList<String> docValues = new ArrayList<>(getDoc());
assertU(adoc(docValues.toArray(new String[0])));
assertU(commit());
String expectedFileName = String.format("%s-list.%s", getCorename(), getExpectedFileExtension());
String expectedFilePath = AbstractMBWriterTest.class.getResource(expectedFileName).getFile();
byte[] content = Files.readAllBytes(Paths.get(expectedFilePath));
String expected = new String(content);
String response = h.query(req("qt", "/advanced", "q", "*:*", "wt", getWritername()));
compare(expected, response);
}