How to scrape a page that is dynamically loaded?


So here’s my problem. I wrote a program that can get all of the information I want from the first page I load. But when I click the nextPage button, it runs a script that loads the next batch of products without actually moving to another page.

So when I run the next loop iteration, all I get is the same content as the first one, even though the content shown in the browser I’m driving is different.

This is the code I run:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

soup = BeautifulSoup(driver.page_source, 'html.parser')  

# ... code to find total number of pages ...
currentPage = 0
button_NextPage = driver.find_element(By.ID, 'nextButton')

while currentPage != totalPages:
    # ... code to find the products ...
    currentPage += 1
    button_NextPage = driver.find_element(By.ID, 'nextButton')

Is there any way for me to scrape exactly what’s loaded on my browser?


The issue seems to be that you’re only fetching page 1, as shown in the following line:


But as you can see, there’s a query parameter called page in the URL that determines which page of HTML you are fetching. So every time you loop to a new page, you’ll have to fetch the new HTML content with the driver by changing the page query parameter. For example, in your loop it would be something like this:

driver.get("{page}&view=grid".format(page = currentPage))

And once you have fetched the new HTML, you’ll be able to access the new elements that are present on the different pages, as you require.
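
To make the loop concrete, here is a minimal sketch of the idea, not a drop-in solution. It assumes the pages are numbered from 1 and that the site accepts the page and view=grid query parameters seen above. The base URL (https://example.com/products) and the helper names build_page_url, scrape_all_pages and extract are placeholders I made up for illustration. The important part is that driver.page_source is read again on every iteration instead of parsing it once before the loop.

```python
from urllib.parse import urlencode


def build_page_url(base_url, page, view="grid"):
    """Return base_url with the page and view query parameters appended."""
    return base_url + "?" + urlencode({"page": page, "view": view})


def scrape_all_pages(driver, base_url, total_pages, extract):
    """Load every page in turn and collect whatever extract() returns.

    driver:  a Selenium WebDriver (anything with get() and page_source)
    extract: hypothetical callable that takes the HTML string of one page
             and returns a list of items found on it
    """
    results = []
    # Assumption: the site numbers its pages starting at 1.
    for page in range(1, total_pages + 1):
        driver.get(build_page_url(base_url, page))
        # Read page_source again on every iteration: a BeautifulSoup object
        # built before the loop is a snapshot and never sees later changes.
        html = driver.page_source
        results.extend(extract(html))
    return results
```

In your case extract would be a small function that builds BeautifulSoup(html, 'html.parser') and returns the products using the lookup code you already have. If the products are filled in by a script after the page loads, you may also need to wait for them to appear before reading page_source.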

Answered By – SaC-SeBaS

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
