Issue
I am using Selenium to run a search on Agoda and scrape all the hotel names on the results page, but the output only returns 2 names.
Then I added a line to scroll to the bottom of the page. Now the output gives me the first 2 names and the last 2 names (two from the top, two from the bottom).
I don't understand what the problem is. I added time.sleep() after each step, so the whole page should have loaded completely. Does Selenium limit scraping to the elements that are currently in view?
My code is below:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(30)
def scrape():
    r = requests.get(current_page)
    if r.status_code == requests.codes.ok:
        print('start scraping!')
        hotel = driver.find_elements_by_class_name('hotel-name')
        hotels = []
        for h in hotel:
            if hotel:
                hotels.append(h.text)
        print(hotels, file=open("output.txt", 'a', encoding="utf-8"))
scrape()
Here is the page I want to scrape
Solution
Try the script below. It scrolls the page down until no more results appear, then scrapes all available names:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?asq=8wUBc629jr0%2B3O%2BxycijdcaVIGtokeWrEO7ShJumN8xsNvkFkEV9bUgNnbx6%2Bx22ncbzTLOPBjT84OgAAKXmu6quf8aEKRA%2FQH%2BGoyXgowLt%2BXyB8OpN1h2WP%2BnBM%2FwNPzD%2BpaeII93w%2Bs4dMWI4QPJNbZJ8DWvRiPsrPVVBJY7ilpMPlUermwV1UKIKfuyeis3BqRkJh9FzJOs0E98zXQ%3D%3D&city=9590&cid=-142&tick=636818018163&languageId=20&userId=3c2c4cb9-ba6d-4519-8ef4-c85dfd280b8f&sessionId=d4qzq2tgymjrwsf22lnadxpc&pageTypeId=1&origin=HK&locale=zh-TW&aid=130589&currencyCode=HKD&htmlLanguage=zh-tw&cultureInfoName=zh-TW&ckuid=3c2c4cb9-ba6d-4519-8ef4-c85dfd280b8f&prid=0&checkIn=2019-01-16&checkOut=2019-01-17&rooms=1&adults=2&children=0&priceCur=HKD&los=1&textToSearch=%E5%A4%A7%E9%98%AA&productType=-1&travellerType=1')
# Get initial list of names
hotels = wait(driver, 15).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'hotel-name')))
while True:
    # Scroll down to the last name in the list
    driver.execute_script('arguments[0].scrollIntoView();', hotels[-1])
    try:
        # Wait until more names than before are present in the DOM
        wait(driver, 15).until(lambda d: len(d.find_elements(By.CLASS_NAME, 'hotel-name')) > len(hotels))
        # Update names list
        hotels = wait(driver, 15).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'hotel-name')))
    except Exception:
        # Stop when no new names load after scrolling (the wait raises TimeoutException)
        break
# Print names list
print([hotel.text for hotel in hotels])
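The scroll-and-wait loop above can also be pulled out into a small helper, which makes the stopping rule easier to see and lets you try it without a browser. This is only a sketch under some assumptions: the names scroll_until_stable, scrape_hotel_names and max_rounds are made up for illustration and are not part of Selenium, and the 'hotel-name' class is taken from the question and may not match the live page.

```python
from typing import Callable, List, Sequence


def scroll_until_stable(
    find_items: Callable[[], Sequence],
    scroll_to: Callable[[object], None],
    wait_for_more: Callable[[int], bool],
    max_rounds: int = 50,
) -> List:
    """Scroll to the last known item until no new items appear.

    find_items    -- returns the items currently present
    scroll_to     -- scrolls the given item into view
    wait_for_more -- returns True if more than the given number of items
                     showed up before its own timeout, else False
    max_rounds    -- hypothetical safety cap so the loop always terminates
    """
    items = list(find_items())
    for _ in range(max_rounds):
        if not items:
            break  # nothing to scroll to
        scroll_to(items[-1])
        if not wait_for_more(len(items)):
            break  # no new items after scrolling: the list is complete
        items = list(find_items())
    return items


def scrape_hotel_names(driver, timeout: int = 15) -> List[str]:
    """Wire the helper to Selenium (assumes the 'hotel-name' class from the question)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def find_items():
        return driver.find_elements(By.CLASS_NAME, 'hotel-name')

    def scroll_to(element):
        driver.execute_script('arguments[0].scrollIntoView();', element)

    def wait_for_more(previous_count):
        try:
            WebDriverWait(driver, timeout).until(
                lambda d: len(find_items()) > previous_count
            )
            return True
        except TimeoutException:
            return False

    elements = scroll_until_stable(find_items, scroll_to, wait_for_more)
    # Skip entries whose text is empty (e.g. placeholders that never rendered).
    return [e.text for e in elements if e.text.strip()]
```

With a driver that has already opened the results page, print(scrape_hotel_names(driver)) should behave much like the script above. Catching only TimeoutException means other errors (a wrong locator, a closed browser window) are not silently treated as "no more results".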
Answered By – Andersson
This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.