Python Selenium: get rid of hyphens and other special HTML characters


I scrape websites with Selenium and then put the content into pandas so it is easy to work with. My only problem is that when I read the .text property of a Selenium WebElement, all the special HTML characters are kept, and I cannot delete them because they are invisible. Is there a way to remove them all while scraping?

Thank you all!


I encountered a similar problem a while ago. Without any reproducible code or HTML it’s a bit hard to say, but the best way I found to remove special characters was to execute a JS script:

driver.execute_script("var element = document.getElementsByClassName('<class_name>');for (var i = element.length - 1; i >= 0; --i) {element[i].remove();}")

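Replace <class_name> with the class of the elements you would like to remove from the page. The script deletes every element with that class, so you can then grab the WebElement you need without worrying about the special characters those elements contained.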
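
The same idea can be sketched as a small Python helper. It assumes `driver` is a Selenium WebDriver; the names `remove_elements_by_class`, `strip_invisible` and `REMOVE_BY_CLASS_JS` are made up for this illustration. The second function is a different technique from the script above: it assumes the invisible characters are Unicode format characters (soft hyphens, zero-width spaces and the like) or no-break spaces, and filters them out of the string that .text returns.

```python
import unicodedata

# JavaScript that removes every element whose class is passed as arguments[0].
REMOVE_BY_CLASS_JS = """
var elements = document.getElementsByClassName(arguments[0]);
for (var i = elements.length - 1; i >= 0; --i) {
    elements[i].remove();
}
"""


def remove_elements_by_class(driver, class_name):
    """Delete all elements with the given class from the current page."""
    # execute_script forwards extra positional arguments to the script,
    # where they are available through the JavaScript `arguments` object.
    driver.execute_script(REMOVE_BY_CLASS_JS, class_name)


def strip_invisible(text):
    """Drop invisible format characters and turn no-break spaces into spaces."""
    # Assumption: no-break spaces (U+00A0) should become ordinary spaces.
    text = text.replace("\u00a0", " ")
    # Unicode category "Cf" includes the soft hyphen (U+00AD) and the
    # zero-width space (U+200B), among other invisible format characters.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

A typical use would be to call remove_elements_by_class(driver, "some-class") once after the page loads and then pass each element's .text through strip_invisible before storing it in pandas. Note that category "Cf" also covers zero-width joiners, so this filter may be too aggressive for text that relies on them (some emoji sequences and scripts).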

Answered By – Luke Hamilton

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
