Removing parts of a string returned when scraping with Selenium

Issue

I have written Selenium code to scrape Accor’s booking website once the search parameters have been passed in the URL. With the code below I can scrape and print the names of all the hotels on the results page.

import time
from selenium import webdriver

url = 'https://all.accor.com/ssr/app/accor/hotels/london/index.en.shtml?dateIn=2021-08-20&nights=8&compositions=1&stayplus=false'
driver = webdriver.Chrome(executable_path='C:\\Users\\conor\\Desktop\\diss\\chromedriver.exe')
driver.get(url)
time.sleep(10)  # fixed pause to give the results time to load
# One element per hotel card on the results page
working = driver.find_elements_by_class_name('hotel__wrapper')
for work in working:
    name = work.find_element_by_class_name('title__link').text
    name = name.strip()
    print(name)

This returns all of the hotel names on the page as expected. However, each name comes with an extra line containing the hotel's star rating, which I don’t see in the HTML markup on the page. Here is the output:

Sofitel London St James
5 Star rating
The Savoy
5 Star rating
Mercure London Bloomsbury Hotel
4 Star rating
Novotel London Waterloo
4 Star rating
ibis London Blackfriars
3 Star rating
Novotel London Blackfriars
4 Star rating
Mercure London Bridge
4 Star rating
Novotel London Bridge
4 Star rating
ibis Styles London Southwark - near Borough Market
3 Star rating
Pullman London St Pancras
4 Star rating

Is there a way to remove the extra rating line that is returned with each hotel name? I only want the hotel names, because I am using them to compare prices across different sites. Any help is appreciated, thank you.

Solution

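The text you get back contains two lines, the hotel name and then the rating, so you can split the string on the newline and keep only the first part. The rating line most likely comes from an element nested inside the link, because .text returns the text of an element together with that of its children. Here is an example: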

for work in working:
    name_with_rating = work.find_element_by_class_name('title__link').text
    name = name_with_rating.split("\n")[0]
    print(name)
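
If you want to try the split without opening a browser, here is a minimal sketch that works on plain strings shaped like the output in the question. The helper name hotel_name_only and the sample list are made up for illustration; the helper also skips blank leading lines, which is an extra precaution and not something the page is known to produce.

```python
def hotel_name_only(raw_text: str) -> str:
    """Return the first non-empty line of raw_text, stripped of whitespace."""
    for line in raw_text.splitlines():
        line = line.strip()
        if line:
            return line
    # Nothing but whitespace (or an empty string) was passed in
    return ""


# Sample strings shaped like the text returned in the question
samples = [
    "Sofitel London St James\n5 Star rating",
    "The Savoy\n5 Star rating",
    "\n ibis London Blackfriars \n3 Star rating",
]

names = [hotel_name_only(s) for s in samples]
for name in names:
    print(name)
```

In the scraping loop you would call it as hotel_name_only(work.find_element_by_class_name('title__link').text), assuming the name is always the first visible line of the link.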

Answered By – theNishant

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
