Scraping data from webpage with data download delay


I have looked at a couple of questions on this site about this problem, but I can’t get their solutions working. I am using Python and Selenium with a headless Chrome browser to scrape bond data from Vanguard. Vanguard loads the data on the page after a delay, and I can’t figure out how to read it properly.

I am trying to load data from this webpage, specifically the data from the fund facts table.

When I scrape the page the way I typically do, I get:

<iframe data-delayed-src=";src=844392;u7=vgmf;type=remar743;cat=mutua911;u1=prd;ord=1632433243910?" id="floodIframe" src=";src=844392;u7=vgmf;type=remar743;cat=mutua911;u1=prd;ord=1632433243910?"></iframe>

So I tried using this line of code to get the browser to wait until the data is loaded.

WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "data-ng-class")))

I am sure this is on the right track, but I don’t know how to tell which element I should be waiting for, or whether I am doing it correctly. Is there a way to wait until the iframe data-delayed-src element goes away before getting the data?

I have seen it used with By.ID, but none of the elements in the HTML I want have an id.

Here is the code I am using

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import os

dirname = os.path.dirname(__file__)
options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=options, executable_path=os.path.join(dirname, 'chromedriver'))
symbol = 'vbirx'
url_vanguard = '{}'
# load the fund page before reading page_source
browser.get(url_vanguard.format(symbol))
# WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "data-ng-class")))

html = browser.page_source
mySoup = BeautifulSoup(html, 'html.parser')
htmlData = mySoup.find('table',{'role':'presentation'})
table = htmlData.find('tbody')
print('table: \n',table)

The table prints without any of the data I want, like this:

<!-- ngRepeat: item in genericTableData.items -->


I used the XPath of the Fund facts table in the WebDriverWait statement to get it working.

Code snippet:

symbol = 'vbirx'
url_vanguard = '{}'
# load the fund page before waiting on the table
browser.get(url_vanguard.format(symbol))

#waiting for the fund facts table to load
WebDriverWait(browser, 15).until(EC.presence_of_element_located((By.XPATH,'//*[@class="summary-table historical-table col2Wide"]')))

html = browser.page_source
mySoup = BeautifulSoup(html, 'html.parser')
htmlData = mySoup.find('table',{'role':'presentation'})
table = htmlData.find('tbody')
rows = table.find_all('td')
for row in rows:
    # print the text of each cell in the fund facts table
    print(row.get_text(strip=True))

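A possible refinement, offered as a sketch rather than as part of the original answer: `presence_of_element_located` only confirms that the table element exists, and on an Angular page the rows may still be rendered a moment later (the `<!-- ngRepeat ... -->` comment in the question is what the empty state looks like). `WebDriverWait.until` accepts any callable that takes the driver, so you can wait until the cells actually contain text. The names `cells_have_text` and `wait_for_filled_cells` are hypothetical helpers, not Selenium APIs.

```python
def cells_have_text(locator, min_cells=1):
    """Build a wait condition that passes once enough located cells contain text.

    `locator` is a (strategy, value) tuple such as (By.CSS_SELECTOR, "td").
    """
    def _condition(driver):
        cells = driver.find_elements(*locator)
        filled = [cell for cell in cells if cell.text.strip()]
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return filled if len(filled) >= min_cells else False
    return _condition


def wait_for_filled_cells(browser, locator, timeout=15):
    """Block until the condition passes, then return the non-empty cells."""
    # Imported here so the condition above can be exercised without Selenium.
    from selenium.webdriver.support.ui import WebDriverWait
    return WebDriverWait(browser, timeout).until(cells_have_text(locator))
```

You could then call something like `wait_for_filled_cells(browser, (By.CSS_SELECTOR, 'table[role="presentation"] tbody td'))` before reading `browser.page_source`. That selector is an assumption based on the `role="presentation"` table used in the question, so adjust it to whatever the page actually contains.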
Answered By – Kamalesh S

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
