Code scrapes first webpage twice, but then scrapes the next six as it's meant to


I’m trying to scrape football scores from 8 pages online. For some reason, my code scrapes the results from the first page twice, then scrapes the next 6 pages as it should, and leaves out the final page.

Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

import time
import requests
import numpy as np

chrome_options = Options()

driver = webdriver.Chrome(options=chrome_options)
wait = WebDriverWait(driver, 10)

scores = []

for i in range(1,9,1):
    url = '' + str(i) + '/'
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    main_table = soup.find('table', class_ ='table-main')
    rows_of_interest = main_table.find_all('tr', class_ = ['odd deactivate', 'deactivate'])

    for row in rows_of_interest:
        score = row.find('td', class_ = 'center bold table-odds table-score').text
        scores.append(score)

    # the pause happens only after the page source has already been read
    time.sleep(5)

Help would be much appreciated


I fixed it by shifting the loop up by 1

for i in range(2,10,1):

I still have no idea why this works, because the page numbers are 1-8.


You should put a delay between driver.get(url) and soup = BeautifulSoup(driver.page_source, 'lxml') to let the new page load.
Without it, the first iteration reads the first page correctly, because
soup = BeautifulSoup(driver.page_source, 'lxml') waits for a page (any page) to be loaded before scraping its content. In the second iteration, however, you read the content of the first page again, because the second page has not loaded yet.
With time.sleep(5) in the wrong place, all the following pages are still scraped, but one iteration late, so the last page is never scraped.
With the delay in the correct place, each page is scraped exactly once.
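
The one-iteration lag described above can be reproduced without a browser. The sketch below is only a toy model: FakeDriver and its settle() method are hypothetical names invented for this illustration and are not part of Selenium. It assumes that the first navigation blocks until the page is shown and that later navigations return before the new content appears.

```python
# Toy model of the timing issue; FakeDriver is hypothetical, not a Selenium class.
class FakeDriver:
    """Mimics a browser whose visible page lags behind navigation."""

    def __init__(self):
        self._shown = None    # page currently rendered
        self._pending = None  # page requested but not rendered yet

    def get(self, page):
        if self._shown is None:
            # Assumption: the very first navigation blocks until the page is loaded.
            self._shown = page
        else:
            # Assumption: later navigations return before the new content appears.
            self._pending = page

    def settle(self):
        """Stand-in for time.sleep(5): lets the pending page finish rendering."""
        if self._pending is not None:
            self._shown = self._pending
            self._pending = None

    @property
    def page_source(self):
        return self._shown


def scrape(pause_before_read):
    """Visit pages 1 to 8 and record which page was actually read each time."""
    driver = FakeDriver()
    seen = []
    for i in range(1, 9):
        driver.get(i)
        if pause_before_read:
            driver.settle()  # pause between get() and reading the page
        seen.append(driver.page_source)
        if not pause_before_read:
            driver.settle()  # pause only after the page was already read
    return seen


late_pause = scrape(pause_before_read=False)
early_pause = scrape(pause_before_read=True)
print(late_pause)   # [1, 1, 2, 3, 4, 5, 6, 7]
print(early_pause)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With the pause after the read, the model gives the same symptom as in the question: page 1 twice, pages 2 to 7 once, and page 8 never.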

for i in range(1,9,1):
    url = '' + str(i) + '/'
    driver.get(url)
    time.sleep(5)
    soup = BeautifulSoup(driver.page_source, 'lxml')
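
A fixed five-second pause can be longer than needed on a fast connection and too short on a slow one. As a possible alternative, here is a minimal sketch of polling until the content changes. The helper wait_for_change and the SlowPage demo class are hypothetical names made up for this example; they come from neither Selenium nor the original post.

```python
import time


def wait_for_change(read_value, previous, timeout=10.0, poll=0.2):
    """Poll read_value() until it differs from previous, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        current = read_value()
        if current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not change within %.1f s" % timeout)
        time.sleep(poll)


# Self-contained demo: SlowPage stands in for the browser (hypothetical class).
class SlowPage:
    def __init__(self, old, new, reads_until_ready):
        self.old = old
        self.new = new
        self.reads_left = reads_until_ready

    def read(self):
        # Return the old content for the first few reads, then the new content.
        if self.reads_left > 0:
            self.reads_left -= 1
            return self.old
        return self.new


page = SlowPage("page 1", "page 2", reads_until_ready=3)
result = wait_for_change(page.read, previous="page 1", timeout=2.0, poll=0.01)
print(result)  # page 2
```

In the scraping loop, read_value could be something like lambda: driver.page_source, with previous set to the source of the page scraped in the last iteration. That is an assumption about the site: unrelated changes on the page (ads, for example) would also count as a change, so comparing only the text of the results table may be more robust. Selenium's own WebDriverWait, which the question already creates as wait, is built around the same polling idea through its until() method.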

Answered By – Prophet

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
