Remove HTML tags and their content from a string

Issue

How do you remove content wrapped in HTML tags (including the tags) from a string?

In PHP, I would like to remove all HTML content from a string: both the tags and the content within them.

Example:

Moby-Dick is an 1851 novel by American writer Herman Melville.
The book is the sailor Ishmael’s narrative of the
obsessive quest of Ahab, captain of the whaling ship
Pequod, for revenge against Moby Dick, the giant white sperm whale
that on the ship’s previous voyage bit off Ahab’s leg
at the knee.

Would end up as:

Moby-Dick is an novel by American writer Herman Melville. narrative
of the obsessive quest of , captain of the whaling ship Pequod, for
revenge against Moby Dick, the giant white sperm whale that on the
ship’s off Ahab’s leg at the knee.

You can assume the following:

  • No nested tags
  • Not limited to the HTML tags in the example

In a pinch, this solution comes close: I could use it and then call strip_tags to clear the leftover tags, but that extra step seems like it should be unnecessary.
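To see why strip_tags alone does not meet the requirement, here is a minimal sketch: it removes the markup but keeps the text between the tags. The <b> markup in the sample is an assumption, since the original example's tags did not survive.

```php
<?php
// strip_tags() deletes the tags themselves but keeps the enclosed text,
// so "1851" survives even though it was wrapped in <b>...</b>.
$html = 'Moby-Dick is an <b>1851</b> novel.';
$stripped = strip_tags($html);
echo $stripped, PHP_EOL; // Moby-Dick is an 1851 novel.
```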

This example also works, but I believe the regex could be adjusted to capture all tags, making the loop unnecessary:

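function get_without_enclosed_text( $paragraph ) {
    // Elements (tag plus content) to remove.
    $tags = array(
        'b', 'strong', 'em', 'i',
        'span', 'div',
        'a'
    );

    foreach ($tags as $tag) {
        // \b keeps 'b' from matching <br> and 'i' from matching <img>.
        // Flags: i = caseless, U = ungreedy (.* stops at the first
        // matching closing tag), s = dot also matches newlines.
        $regex = "/<$tag\b[^>]*>.*<\/$tag>/iUs";
        $paragraph = preg_replace($regex, "", $paragraph);
    }

    return $paragraph;
}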

Solution

It looks like this slight modification of an old answer fits your needs:

(?si)<(\w+)\b[^><]*>(?:.*?<\/\1>)?
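A minimal sketch of applying this pattern in PHP with preg_replace, assuming the goal is a single pass over the whole string. The helper name remove_elements and the sample markup are hypothetical. Each removed element leaves the spaces around it behind, so the result contains doubled spaces; collapse them afterwards if that matters.

```php
<?php
// Hypothetical helper: removes every element (tag plus content) in one pass.
function remove_elements(string $html): string
{
    // (?si)            s: dot matches newlines, i: caseless
    // <(\w+)\b[^><]*>  opening tag; the tag name is captured in group 1
    // (?:.*?<\/\1>)?   optionally, content up to the matching closing tag
    return preg_replace('/(?si)<(\w+)\b[^><]*>(?:.*?<\/\1>)?/', '', $html);
}

$html = 'Moby-Dick is an <b>1851</b> novel by <a href="#">American</a> writer.';
$result = remove_elements($html);
echo $result, PHP_EOL;
```

Because group 1 captures whatever tag name appears, no list of tags is needed, which is what removes the loop from the question's version.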

See the demo at regex101 (the explanation is in the right-hand panel).

The pattern uses the flags s (dotall, which makes the dot match newlines) and i (caseless, which catches mixed-case pairs such as <B>...</b>).
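The effect of the two flags can be checked with a small sketch that runs the same pattern with and without (?si) on a hypothetical element whose tags differ in case and whose content spans a newline. Without the flags, only the opening tag is removed, because the optional content group cannot cross the line break or match the lowercase closing tag.

```php
<?php
// Hypothetical input: <B> closed by </b>, with a newline inside the content.
$html = "keep <B>drop\nthis</b> keep";

// With (?si): the whole element, including both tags, is removed.
$withFlags = preg_replace('/(?si)<(\w+)\b[^><]*>(?:.*?<\/\1>)?/', '', $html);

// Without flags: the optional group fails, so only "<B>" is matched.
$withoutFlags = preg_replace('/<(\w+)\b[^><]*>(?:.*?<\/\1>)?/', '', $html);

var_dump($withFlags, $withoutFlags);
```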

The element's content and closing tag are optional in this pattern, so void elements such as <img...> are matched too. It strips out any element, whatever its tag name, together with its content. It is not suitable for parsing arbitrary or nested HTML.
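A short sketch of both behaviours on hypothetical inputs: a void <img> element loses only its tag, while nested elements with the same name show the limitation, because the lazy .*? stops at the first closing tag it finds.

```php
<?php
$pattern = '/(?si)<(\w+)\b[^><]*>(?:.*?<\/\1>)?/';

// Void element: there is no </img>, so the optional group is skipped
// and only the tag itself is removed.
$void = preg_replace($pattern, '', 'a <img src="x.png"> b');

// Nested <div>s: the match ends at the inner </div>, leaving the
// outer element's trailing text and closing tag behind.
$nested = preg_replace($pattern, '', '<div>x<div>y</div>z</div>!');

var_dump($void, $nested);
```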

Answered By – bobble bubble

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
