Get all HTML tags with Beautiful Soup

Issue

I am trying to get a list of all html tags from beautiful soup.

I see find all but I have to know the name of the tag before I search.

If there is text like

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""

How would I get a list like

list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with regex, but am trying to learn BS4

Solution

You don’t have to specify any arguments to find_all() – in this case, BeautifulSoup would find you every tag in the tree, recursively.

Sample:

from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")

print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']

print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']

Answered By – alecxe

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published