Issue
I scrape websites with Selenium and then put the content into pandas so it is easy to work with. My only problem is that when I use the .text attribute on a Selenium WebElement, all the special HTML characters are kept, and because they are invisible I cannot delete them. Is there a way to remove them all while scraping?
Thank you all!
Solution
I encountered a similar problem a while ago. Without any reproducible code or HTML it is hard to say for sure, but the best way I found to remove the special characters was to execute a JavaScript snippet:
# Remove every element with the given class from the page before reading .text
driver.execute_script("var element = document.getElementsByClassName('<class_name>'); for (var i = element.length - 1; i >= 0; --i) { element[i].remove(); }")
Replace <class_name> with the class of the elements you would like to remove. After the script runs, you can grab the WebElement you need without worrying about special characters.
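In case it helps, here is a minimal sketch of how the whole flow could look; the URL, class name, and row selector below are placeholders I made up, not values from the original question:

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# Delete the unwanted elements before reading .text
driver.execute_script(
    "var element = document.getElementsByClassName('<class_name>');"
    "for (var i = element.length - 1; i >= 0; --i) { element[i].remove(); }"
)

# .text on the remaining elements no longer includes the removed content
rows = driver.find_elements(By.CSS_SELECTOR, "table tr")  # placeholder selector
df = pd.DataFrame([row.text for row in rows], columns=["row_text"])

driver.quit()

The key point is simply that the script runs before any .text call, so the removed elements never contribute to the extracted text.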
Answered By – Luke Hamilton
This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.