Strip HTML tags from comment


I have some audiobooks with HTML tags in the comment/description field. Is there a way to strip these HTML tags?

example: <b>"For anyone who is a mother, or who has a mother, [<i>Mom Genes</i>] is an eye-opening tour through the biology and psychology of a role that is at once utterly ordinary and wondrously strange." &#8212;Annie Murphy Paul, author of <i>Origins</i></b><BR> <BR><b>From the <i>New York Times</i> bestselling author of <i>The Lion in the Living Room</i> comes a fascinating and provocative exploration of the biology of motherhood.</b><BR>Everyone knows how babies are made, but scientists are only just beginning to understand the making of a mother. Mom Genes reveals the hard science behind our tenderest maternal impulses, tackling questions such as whether a new mom's brain ever really bounces back, why mothers are destined to mimic their own moms (or not), and how maternal aggression makes females the world's most formidable creatures.<BR> <BR>Part scientific odyssey, part memoir, Mom Genes weaves the latest research with Abigail Tucker's personal experiences to create a delightful,...

A script like this should do:

$set(comment,$rreplace(%comment%,<[^>]+>,))

Add the script under Options > Scripting. If the script is active (checkbox checked), it runs automatically on data loaded in the right-hand pane. You can also run it manually on your local files via the context menu.

Please double-check how the comments get loaded into Picard for you. Instead of just comment, the tag could be comment:somedescription, with somedescription depending on your tags.


I was going to suggest something extremely similar using $rreplace.

The only thing I would add is that, IMO, you should replace each HTML tag with a space and then replace multiple spaces with a single one. Otherwise, e.g., “Line<br>Line” will end up as “LineLine”, not “Line Line”.
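
As a rough, untested sketch of that idea, assuming the text lives in the plain comment tag (adjust the name if yours is comment:somedescription): the first line swaps every tag for a single space (note the space before the closing parenthesis), and the second relies on $strip, which according to the Picard scripting docs collapses runs of whitespace into one space and trims both ends.

```
$set(comment,$rreplace(%comment%,<[^>]+>, ))
$set(comment,$strip(%comment%))
```

This only removes tags. Character entities such as &#8212; in the example above would be left as they are.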

1 Like