Selenium cannot get all elements of a page

Issue

I am using Selenium to run a search on Agoda and scrape all the hotel names on the page, but the output only returns 2 names.

Then I tried adding a line to scroll to the bottom. Now the output gives me the first 2 names and the last 2 names (the first two from the top of the page, the last two from the bottom).

I don't understand what the problem is. I added time.sleep() after each step, so the whole page should have loaded completely. Does Selenium only scrape elements that are currently in view?

My code is below:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(30)

def scrape():
    r = requests.get(current_page)
    # The status check and scraping belong inside the function body
    if r.status_code == requests.codes.ok:
        print('start scraping!')
        hotel = driver.find_elements_by_class_name('hotel-name')

        hotels = []
        for h in hotel:
            if hotel:
                hotels.append(h.text)

        # Close the output file once the names are written
        with open("output.txt", 'a', encoding="utf-8") as f:
            print(hotels, file=f)

scrape()

Here is the page I want to scrape

Solution

Try the script below. It scrolls down to the last loaded result until no more results appear on the page, then scrapes all available names:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.maximize_window()

driver.get('https://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?asq=8wUBc629jr0%2B3O%2BxycijdcaVIGtokeWrEO7ShJumN8xsNvkFkEV9bUgNnbx6%2Bx22ncbzTLOPBjT84OgAAKXmu6quf8aEKRA%2FQH%2BGoyXgowLt%2BXyB8OpN1h2WP%2BnBM%2FwNPzD%2BpaeII93w%2Bs4dMWI4QPJNbZJ8DWvRiPsrPVVBJY7ilpMPlUermwV1UKIKfuyeis3BqRkJh9FzJOs0E98zXQ%3D%3D&city=9590&cid=-142&tick=636818018163&languageId=20&userId=3c2c4cb9-ba6d-4519-8ef4-c85dfd280b8f&sessionId=d4qzq2tgymjrwsf22lnadxpc&pageTypeId=1&origin=HK&locale=zh-TW&aid=130589&currencyCode=HKD&htmlLanguage=zh-tw&cultureInfoName=zh-TW&ckuid=3c2c4cb9-ba6d-4519-8ef4-c85dfd280b8f&prid=0&checkIn=2019-01-16&checkOut=2019-01-17&rooms=1&adults=2&children=0&priceCur=HKD&los=1&textToSearch=%E5%A4%A7%E9%98%AA&productType=-1&travellerType=1')
# Get initial list of names
hotels = wait(driver, 15).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'hotel-name')))

while True:
    # Scroll down to last name in list
    driver.execute_script('arguments[0].scrollIntoView();', hotels[-1])
    try:
        # Wait up to 15 seconds for more names to be loaded
        wait(driver, 15).until(lambda d: len(d.find_elements(By.CLASS_NAME, 'hotel-name')) > len(hotels))
        # Update names list
        hotels = driver.find_elements(By.CLASS_NAME, 'hotel-name')
    except TimeoutException:
        # Break the loop if no new names were loaded after scrolling down
        break

# Print names list
print([hotel.text for hotel in hotels])
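The symptom in the question (only the first two and last two names) would be consistent with a page that keeps only the rows near the viewport in the DOM, although that is an assumption about the site and not something confirmed here. If that is the case, counting elements may never grow, and it can be safer to accumulate names while scrolling. The sketch below shows that idea with plain Python. The names collect_while_scrolling, read_names, scroll_step and FakePage are hypothetical and only serve the illustration; FakePage stands in for the browser so the loop can be run without Selenium.

```python
from typing import Callable, List


def collect_while_scrolling(
    read_names: Callable[[], List[str]],
    scroll_step: Callable[[], None],
    max_idle_rounds: int = 3,
    max_rounds: int = 200,
) -> List[str]:
    """Accumulate unique names, in first-seen order, while scrolling.

    read_names:  returns the names currently present in the DOM.
    scroll_step: scrolls the page further (and lets it settle).
    Stops after max_idle_rounds consecutive rounds that add nothing new.
    """
    seen = set()
    ordered = []
    idle = 0
    for _ in range(max_rounds):
        added = False
        for name in read_names():
            # Skip empty strings and names already collected
            if name and name not in seen:
                seen.add(name)
                ordered.append(name)
                added = True
        idle = 0 if added else idle + 1
        if idle >= max_idle_rounds:
            break
        scroll_step()
    return ordered


class FakePage:
    """Hypothetical stand-in for a page that renders only a small window of names."""

    def __init__(self, names, window=2):
        self.names = names
        self.window = window
        self.pos = 0

    def visible(self):
        return self.names[self.pos:self.pos + self.window]

    def scroll(self):
        self.pos = min(self.pos + self.window, len(self.names))


page = FakePage(["A", "B", "C", "D", "E"])
result = collect_while_scrolling(page.visible, page.scroll)
print(result)

# With a real driver, the two callables could look like this (assumed wiring):
#   read_names  = lambda: [e.text for e in driver.find_elements(By.CLASS_NAME, 'hotel-name')]
#   scroll_step = lambda: (driver.execute_script('window.scrollBy(0, window.innerHeight);'),
#                          time.sleep(1))
```

Note that de-duplicating by text would merge two different hotels that share the same name. If that matters, a more stable key such as the link target of each result could be used instead, assuming the page exposes one.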

Answered By – Andersson

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
