Python BeautifulSoup & Selenium not scraping full html


Beginner web-scraper here. My practice task is simple: Collect/count a player’s Pokemon usage over their last 50 games, on this page for example. To do this, I planned to use the image url of the Pokemon which contains the Pokemon’s name (in an <img> tag, encased by <span></span>). Inspecting from Chrome looks like this: <img alt="Played pokemon" srcset="/_next/image?url=%2FSprites%2Ft_Square_Snorlax.png&amp;w=96&amp;q=75 1x, /_next/image?url=%2FSprites%2Ft_Square_Snorlax.png&amp;w=256&amp;q=75 2x" ...

1) Using Beautiful Soup alone doesn’t get the html of the images that I need:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('')
soup = bs(r.content, 'html.parser')
images = soup.select('span img')

2) Using Selenium picks up some of what BeautifulSoup missed:

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = ""
options = Options()
driver = webdriver.Chrome(options=options)
driver.get(url)
page = driver.page_source

soup = bs(page, 'html.parser')
images = soup.select('span img')

But it gives me links that look like this: <img alt="Played pokemon" data-nimg="fixed" decoding="async" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

What am I misunderstanding here? The website I’m interested in does not have a public API, despite its name. Any help is much appreciated.


This is a common issue when scraping websites that load their content dynamically: you are reading the page before it has finished loading. What you have to do is wait for the page to fully load the images you need. You have two options: an implicit wait, or an explicit wait for the image elements to be loaded.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

url = r""
options = Options()
driver = webdriver.Chrome(executable_path='./chromedriver.exe', options=options)
driver.get(url)
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[alt="Played pokemon"]'))) # EXPLICIT WAIT
driver.implicitly_wait(10) # IMPLICIT WAIT

pokemons = driver.find_elements_by_css_selector('[alt="Played pokemon"]')
for element in pokemons:
    print(element.get_attribute('srcset'))

You have to choose one or the other, but it's better to explicitly wait for the element(s) to be rendered before you try to access their values.

pokemons = driver.find_elements_by_css_selector('[alt="Played pokemon"]')

Your approach wasn't working because a plain GET request returns the HTML in its initial state, before the page's JavaScript has rendered the DOM elements (and the real image URLs) you are after.
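Once the fully rendered HTML is in hand, the actual counting is plain string work on the `srcset` URLs. A minimal stdlib sketch, assuming the URLs below are stand-ins for the first `srcset` entry of each matched `<img>` (i.e. `element.get_attribute('srcset').split()[0]`); the sprite file name (`t_Square_Snorlax.png`) encodes the Pokemon's name:

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical sample values; in practice these come from the scraped <img> tags.
srcset_urls = [
    "/_next/image?url=%2FSprites%2Ft_Square_Snorlax.png&w=96&q=75",
    "/_next/image?url=%2FSprites%2Ft_Square_Pikachu.png&w=96&q=75",
    "/_next/image?url=%2FSprites%2Ft_Square_Snorlax.png&w=96&q=75",
]

def pokemon_name(url):
    # Decode the inner "url" query parameter (e.g. /Sprites/t_Square_Snorlax.png),
    # then keep the part after the last underscore and drop the file extension.
    sprite = parse_qs(urlparse(url).query)["url"][0]
    return sprite.rsplit("_", 1)[-1].rsplit(".", 1)[0]

usage = Counter(pokemon_name(u) for u in srcset_urls)
print(usage)  # Counter({'Snorlax': 2, 'Pikachu': 1})
```

Over the last 50 games the same tags repeat, so `Counter` gives the usage tally directly.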

Answered By – SaC-SeBaS

This Answer, collected from stackoverflow, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
