Issue
I'm trying to scrape this page, https://rarity.tools/thecryptodads, using Selenium in Python.
At the top right of each card there's the owner's name, which is a link; clicking it takes you to that owner's page.
When I inspect the element, I can clearly see the a tag with its href link.
However, when I try to scrape it, I get neither the text within the a tag nor the href.
I tried to get the div above it, which contains this a tag along with another div that holds the number shown at the top left of the card, but when I read that div's innerText, it only returns the text of the first div, i.e. the number (it prints 1 for the first card).
Here's the code I'm using to try to get the link:
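from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Raw string so the backslashes in the Windows path are not read as escapes
PATH = r"C:\Program Files (x86)\chromedriver"
driver = webdriver.Chrome(PATH)
driver.implicitly_wait(10)
driver.get("https://rarity.tools/thecryptodads")
try:
    click = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, "/html/body/div/div/div/div[2]/div[2]/div[8]/div[1]/div[1]/div/div[1]/a"))
    )
    print(click.text)
except:
    print()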
I tried locating the item by class name, CSS selector, XPath, and full XPath, but I still can't get the href.
But when I run it in debug mode and step through it line by line, I can see that this object holds the text I want, and it prints it at the end of the execution, which is so weird to me. I assume this text uses some sort of encryption that prevents me from scraping it!
Solution
The website takes time to load all the attribute values.
There are two ways to get that output.
1: Apply time.sleep(10) after driver.get(URL).
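import time
from selenium.webdriver.common.by import By

driver.get("https://rarity.tools/thecryptodads")
time.sleep(10)  # fixed pause so the page can finish filling in the links
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which recent Selenium 4 releases removed
ele = driver.find_element(By.XPATH, "//div[contains(@class,'flex-1')]/div[2]/div[8]/div[1]/div[1]/div/div[1]/a")
print(ele.get_attribute("href"))
print(ele.text)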
2: Apply an explicit wait until the href attribute has a value containing https.
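from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get("https://rarity.tools/thecryptodads")
wait = WebDriverWait(driver, 30)
# Block for up to 30 seconds until the link exists and its href contains "https"
wait.until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class,'flex-1')]/div[2]/div[8]/div[1]/div[1]/div/div[1]/a[contains(@href,'https')]")))
ele = driver.find_element(By.XPATH, "//div[contains(@class,'flex-1')]/div[2]/div[8]/div[1]/div[1]/div/div[1]/a")
print(ele.get_attribute("href"))
print(ele.text)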
Output for both 1 and 2:
https://opensea.io/accounts/0x8f612b1a1afcc4a55879bb02212454ae79cab04b?ref=0x5c5321ae45550685308a405827575e3d6b4a84aa
0x8f61
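To see why the element can exist while its href is still empty, here is a minimal sketch of what an explicit wait does under the hood: it keeps re-reading the attribute until the value shows up or a timeout expires. It does not use Selenium at all. FakeElement, wait_for_attribute and the example.com URL are hypothetical names invented for this illustration, and the "ready after 3 reads" behaviour only simulates a page that fills in its links late.

```python
import time


class FakeElement:
    """Hypothetical stand-in for a WebElement whose href is filled in late."""

    def __init__(self, ready_after_polls, href):
        self._polls_left = ready_after_polls
        self._href = href

    def get_attribute(self, name):
        # Simulate client-side rendering: the first few reads return an empty value.
        if name != "href":
            return None
        if self._polls_left > 0:
            self._polls_left -= 1
            return ""
        return self._href


def wait_for_attribute(element, name, prefix, timeout=5.0, poll=0.05):
    """Re-read element.get_attribute(name) until it starts with prefix, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        value = element.get_attribute(name)
        if value and value.startswith(prefix):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name!r} never started with {prefix!r}")
        time.sleep(poll)


link = FakeElement(ready_after_polls=3, href="https://example.com/owner")
first_read = link.get_attribute("href")           # immediate read, like the question's code
href = wait_for_attribute(link, "href", "https")  # keep polling until the value arrives
print(repr(first_read))
print(href)
```

The immediate read comes back empty, while the polled read returns the link. This is also why stepping through in the debugger appeared to work: each pause gave the page time to finish. With Selenium itself, the same check can be passed to WebDriverWait(...).until(...) as a callable, since until keeps calling it with the driver until it returns a truthy value.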
It's better to use a relative XPath than an absolute one, since an absolute path breaks as soon as the page layout shifts.
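As a rough illustration of the relative vs. absolute point, the sketch below runs both kinds of path against two versions of a tiny, made-up card layout. The markup, the class names card, rank, owner and wrapper, and the example.com URL are all assumptions for the demo, not the real rarity.tools structure. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so the "absolute" path here is a fixed chain of steps anchored at the root html element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one card; the real page is far more nested.
BEFORE = """
<html><body>
  <div><div class="card">
    <div class="rank">1</div>
    <a class="owner" href="https://example.com/owner">0xabc</a>
  </div></div>
</body></html>
"""

# The same card after an imagined redesign that adds one extra wrapper div.
AFTER = """
<html><body>
  <div class="wrapper"><div><div class="card">
    <div class="rank">1</div>
    <a class="owner" href="https://example.com/owner">0xabc</a>
  </div></div></div>
</body></html>
"""

# Fixed chain of steps from the root: depends on the exact nesting depth.
ABSOLUTE = "./body/div/div/a"
# Search anywhere for the card, then take its owner link: depends only on the classes.
RELATIVE = ".//div[@class='card']/a[@class='owner']"


def find_href(markup, path):
    """Return the href of the first node matching path, or None if nothing matches."""
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    return None if node is None else node.get("href")


for label, markup in (("before", BEFORE), ("after", AFTER)):
    print(label, "absolute:", find_href(markup, ABSOLUTE))
    print(label, "relative:", find_href(markup, RELATIVE))
```

In this toy layout the absolute path stops matching once the wrapper is added, while the relative path still finds the link. A relative path can of course break too if the attributes it relies on change.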
Answered By – pmadhu
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.