Scraping a website that has a "Load more" button doesn't return info of newly loaded items with Beautiful Soup and Selenium

Issue

I’m using Selenium to click the "Load more" button, and everything loads properly. I then want to get the info for all the loaded products, but I only get the first 36 items, the ones shown before the first "Load more" click.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import json
import time
import requests
allinfo=[]
chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
url="https://zadaa.co/de-en/products/women/clothes-dresses/"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),chrome_options=chrome_options)
driver.get(url)
r=requests.get(url)
soup=BeautifulSoup(r.content,"html.parser")
wait = WebDriverWait(driver, 10)
closebutton=wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="content"]/div[5]/button')))
closebutton.click()
for x in range(9):
    button = wait.until(EC.element_to_be_clickable((By.ID, "load-more-products")))
    button.click()
content=soup.find_all('a',class_='product-list-item')
for properties in content:
    brand=properties.find("p",class_='product-list-item-title').text
    info={
        'name':brand,
    }
    allinfo.append(info)
df=pd.DataFrame(allinfo)
print(df.head())
df.to_csv('zadaa.csv')

This is the web page I’m trying to scrape:
https://zadaa.co/de-en/products/women/clothes-dresses/

Sorry for some weird English usage.

Solution

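Your soup is built from requests.get(url), which is a separate, static download of the page, so the clicks Selenium performs in the browser never reach it. That is why you only see the first 36 items; parsing driver.page_source after the clicks should pick up the extra ones. Alternatively, you can simulate the Ajax calls with the requests module and get the data directly, without Selenium (beware, there are 12k+ products):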

import requests
from bs4 import BeautifulSoup


url = "https://zadaa.co/de-en/products/women/clothes-dresses/"
api_url = "https://zadaa.co/wp-admin/admin-ajax.php"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

payload = {
    "action": "get_more_products",
    "lang": "de-en",
    "security": "05ef973f4c",
    "query_id": soup.select_one("[data-query-id]")["data-query-id"],
    "offset": 0,
}


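# Request successive batches until the endpoint stops reporting success.
while True:
    data = requests.post(api_url, data=payload).json()
    if not data["success"]:
        break

    # Each response carries an HTML fragment with the next batch of products.
    soup = BeautifulSoup(data["data"], "html.parser")

    for i in soup.select(".product-list-item"):
        print(i.select_one(".product-list-item-title").text)
        print(i["href"])
        print("-" * 80)

    # Move on to the next batch of 36 items.
    payload["offset"] += 36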

Prints:

...

CITY GIRL PARIS
3735824
--------------------------------------------------------------------------------
ZAFUL
3735781
--------------------------------------------------------------------------------
NKD
3735768
--------------------------------------------------------------------------------
GREAT RUMORS
3735762
--------------------------------------------------------------------------------

...and so on.
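
Since there are 12k+ products, it may be worth capping how many batches are requested while testing. The following is only a sketch of the same offset-based loop, pulled into a generator with an optional item limit. The names iter_pages, fetch_page, max_items and fake_fetch are hypothetical and not part of the site's API or of the answer above; fake_fetch is an offline stand-in so the loop can be run without touching the network.

```python
from typing import Callable, Dict, Iterator, Optional

PAGE_SIZE = 36  # items per "Load more" batch, as in the answer above


def iter_pages(
    fetch_page: Callable[[int], Dict],
    page_size: int = PAGE_SIZE,
    max_items: Optional[int] = None,
) -> Iterator[str]:
    """Yield the HTML fragment of each batch until the endpoint reports failure.

    fetch_page(offset) is a hypothetical callable returning the decoded JSON
    response, e.g. {"success": True, "data": "<a ...>...</a>"}.
    """
    offset = 0
    # Stop early once the (optional) item cap has been reached.
    while max_items is None or offset < max_items:
        response = fetch_page(offset)
        if not response.get("success"):
            break
        yield response["data"]
        offset += page_size


def fake_fetch(offset: int) -> Dict:
    """Offline stand-in for the real endpoint: three batches, then failure."""
    if offset >= 3 * PAGE_SIZE:
        return {"success": False}
    return {"success": True, "data": f"<batch offset={offset}>"}


if __name__ == "__main__":
    print(len(list(iter_pages(fake_fetch))))                # all batches
    print(len(list(iter_pages(fake_fetch, max_items=72))))  # capped run
```

Against the real site, fetch_page could presumably be a small wrapper around the requests.post call from the answer, passing a copy of payload with the given offset, and each yielded fragment would then be parsed with BeautifulSoup as before.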

Answered By – Andrej Kesely

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
