Issue
I am trying to scrape information about a product listed here, using Selenium in Google Colab. I am having trouble reading the text inside the b tag. Other attributes such as the name, seller, and price can be scraped without problems.
This is the relevant HTML snippet.
<div class="css-1le9c0d pad-bottom">
    <img src="https://assets.tokopedia.net/assets-tokopedia-lite/v2/zeus/kratos/3ac8f50c.svg" alt="">
    <div>Dikirim dari
        <b>Kota Depok</b>
    </div>
</div>
These are my driver settings.
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome('chromedriver', options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {
    "userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                 'AppleWebKit/537.36 (KHTML, like Gecko) '
                 'Chrome/85.0.4183.102 Safari/537.36'
})
This is the code that I have tried.
sample_link = 'https://www.tokopedia.com/naturashop27/bio-oil-original-penghilang-bekas-luka-strecth-mark-isi-125ml?whid=0'
driver.get(sample_link)
time.sleep(1.5)
try:
    product = driver.find_elements_by_tag_name('h1')[0].text
except:
    product = np.nan
try:
    shop_url = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='css-1n8curp']"))).get_attribute("href")
except:
    shop_url = np.nan
# ....
try:
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'pad-bottom')]//b")))
    loc = driver.find_element_by_xpath("//div[contains(@class,'pad-bottom')]//b").text
except:
    loc = np.nan
This is the output from the code above. As you can see, the text on the b tag is nan instead of Kota Depok.
Bio Oil Original Penghilang Bekas Luka & Strecth Mark isi 125ml
https://www.tokopedia.com/naturashop27
nan
Please see the solution below. The issues were the following:
- The element was not fully loaded before it was scraped.
- Calling driver.set_window_size(1124,850) makes the scrape work in Colab.
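One way to apply the window-size fix to a headless session is sketched below. It rests on the assumption that `start-maximized` has little or no effect in headless Chrome, which would explain why an explicit size helps. `headless_arguments` and `build_driver` are hypothetical helper names, not part of Selenium.

```python
def headless_arguments(width=1124, height=850):
    """Return Chrome command-line arguments for a headless session with a fixed window size."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--window-size={width},{height}",
    ]


def build_driver(width=1124, height=850):
    """Create a headless Chrome driver with an explicit window size.

    Needs Selenium and a chromedriver that Selenium can find.
    """
    # Imported here so headless_arguments can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_arguments(width, height):
        options.add_argument(argument)
    driver = webdriver.Chrome(options=options)
    # The call reported to work in Colab.
    driver.set_window_size(width, height)
    return driver
```

Calling `build_driver()` and then `driver.get(sample_link)` would replace the driver setup from the question; the anti-detection options from the question are left out here for brevity.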
Solution
You may want to try this:
The element is not in Selenium's viewport, so you need to scroll a bit before reading it.
try:
    driver.execute_script("window.scrollTo(0, 100)")
    loc = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'pad-bottom')]"))).text
    print(loc)
except:
    loc = np.nan
Output:
Dikirim dari Kota Depok
Process finished with exit code 0
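Instead of scrolling by a fixed 100 pixels, the browser can be asked to scroll to the element itself. The following is only a sketch of that variation, not the method used in this answer: `scroll_into_view` and `read_text_after_scroll` are hypothetical helper names, and it assumes the element is already present in the DOM before it becomes visible.

```python
SCROLL_SCRIPT = "arguments[0].scrollIntoView({block: 'center'});"


def scroll_into_view(driver, element):
    """Ask the browser to scroll until the given element is in the viewport."""
    driver.execute_script(SCROLL_SCRIPT, element)


def read_text_after_scroll(driver, xpath, timeout=20):
    """Wait for the element, scroll to it, then return its visible text."""
    # Imported lazily so scroll_into_view can be exercised without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Presence only requires the node to be in the DOM, not on screen.
    element = wait.until(EC.presence_of_element_located((By.XPATH, xpath)))
    scroll_into_view(driver, element)
    # visibility_of returns the element once it is displayed.
    return wait.until(EC.visibility_of(element)).text
```

With a live driver, `read_text_after_scroll(driver, "//div[contains(@class, 'pad-bottom')]")` would take the place of the `scrollTo` call and the wait shown above.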
I used the XPath //div[contains(@class, 'pad-bottom')], which prints Dikirim dari Kota Depok.
If you use //div[contains(@class,'pad-bottom')]//b instead, you will get only Kota Depok.
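The difference between the two XPaths can be checked offline against the HTML snippet from the question. This sketch uses Python's standard-library `xml.etree.ElementTree` as a stand-in for the browser, with two assumptions: the img tag is self-closed so the snippet parses as XML, and `normalized_text` is a hypothetical helper that only approximates how Selenium's `.text` collapses whitespace.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<div class="css-1le9c0d pad-bottom">
  <img src="https://assets.tokopedia.net/assets-tokopedia-lite/v2/zeus/kratos/3ac8f50c.svg" alt=""/>
  <div>Dikirim dari
    <b>Kota Depok</b>
  </div>
</div>
"""


def normalized_text(element):
    """Join the text of an element and its descendants, collapsing whitespace."""
    return " ".join("".join(element.itertext()).split())


# The root element is the outer div carrying the pad-bottom class.
root = ET.fromstring(SNIPPET)
outer_text = normalized_text(root)             # text of the whole div
inner_text = normalized_text(root.find(".//b"))  # text of the b tag only

print(outer_text)
print(inner_text)
```

The outer div includes the text of every descendant, so it yields the full sentence, while the b element yields only the city name.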
Update 1:
import numpy as np
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.tokopedia.com/naturashop27/bio-oil-original-penghilang-bekas-luka-strecth-mark-isi-125ml?whid=0")
try:
    product = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "h1"))).text
    print(product)
except:
    product = np.nan
try:
    shop_url = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@data-testid='llbPDPFooterShopName']"))).get_attribute("href")
    print(shop_url)
except:
    shop_url = np.nan
try:
    # Scroll down slightly so the shipping-origin block enters the viewport.
    driver.execute_script("window.scrollTo(0, 100)")
    loc = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'pad-bottom')]"))).text
    print(loc)
except:
    loc = np.nan
This gives me:
Bio Oil Original Penghilang Bekas Luka & Strecth Mark isi 125ml
https://www.tokopedia.com/naturashop27
Dikirim dari Kota Depok
Process finished with exit code 0
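The repeated try/except blocks above could be folded into one small helper. This is a minimal sketch, not part of the original answer: `value_or_nan` is a hypothetical name, and the lambdas are stand-ins so the example runs without a browser.

```python
import math


def value_or_nan(getter, exceptions=(Exception,)):
    """Call getter() and return its result, or NaN if one of the given exceptions is raised."""
    try:
        return getter()
    except exceptions:
        return float("nan")


# Stand-in getters: one succeeds, one fails the way an empty result list would.
found = value_or_nan(lambda: "Dikirim dari Kota Depok")
missing = value_or_nan(lambda: [][0], exceptions=(IndexError,))

print(found)
print(math.isnan(missing))
```

With Selenium, the getter would wrap a wait such as the ones above, and `exceptions` could be narrowed to `TimeoutException` and `NoSuchElementException` from `selenium.common.exceptions` so that unrelated errors are not silently turned into NaN.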
Answered By – cruisepandey
This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.