XPath parent::node() not working as expected

Issue

I'm trying to scrape multiple pages that each show several measurements, but the measurements don't appear in the same order on every page, so on each page I have to check which measurement is which.
I tried to get the parent node of the elements containing the text SO, NO and CO, to identify each measurement and put its value in the right place.
This is the HTML document:

<ul class="sc-hHftDr gOAyWd">
    <li>
       <p class="sc-bkzZxe card__subtitle">SO₂</p>
       <h1 class="sc-idOhPF ipvImd card__highlight-text">0.00</h1>
       <strong>ppb</strong><p>2022/06/13 07:00</p> 
    </li>
    <li>
       <p class="sc-bkzZxe card__subtitle">NO₂</p>
       <h1 class="sc-idOhPF ipvImd card__highlight-text">1.00</h1> 
       <strong>ppb</strong><p>2022/06/26 20:00</p>
   </li>
   <li>
     <p class="sc-bkzZxe card__subtitle">CO</p>
     <h1 class="sc-idOhPF ipvImd card__highlight-text">0.00</h1>
     <strong>ppb</strong>
     <p>2021/07/07 04:00</p>
   </li>
</ul>

I tried something like this:

elements_name = ['PM10','PM2.5',"PM1","CO","SO","O","NO"]
for element in elements_name:
    driver.find_element_by_xpath(f"//ul[@class='sc-hHftDr gOAyWd']//li//p[contains(., {element})]").find_element_by_xpath("parent::node()").find_element_by_css_selector('h1[class="sc-idOhPF ipvImd card__highlight-text"]').text.strip()

The problem is that parent::node() returns the 'SO' element for every name in elements_name; it never gets the parent of the right node.
I also tried ('..') and ('parent::li').

Solution

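I think the contains() function will return an unexpected result at least some of the time, because, for example, contains('SO₂', 'O') is true, and so is contains('PM10', 'PM1'). You should use the = operator instead of contains(), so that the whole subtitle has to match. With =, each name in elements_name has to be the complete subtitle text, for example 'SO₂' rather than 'SO'.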
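To see the difference without a browser, here is a small sketch in plain Python. The `in` operator stands in for XPath's contains() and `==` stands in for the = operator; the helper names substring_matches and exact_matches are made up for this illustration and are not part of Selenium or XPath.

```python
# Subtitle texts as they appear in the HTML from the question.
labels = ["SO₂", "NO₂", "CO"]
# Search terms from the question's elements_name list.
elements_name = ["PM10", "PM2.5", "PM1", "CO", "SO", "O", "NO"]


def substring_matches(term, candidates):
    """Stand-in for XPath contains(): keep candidates containing term anywhere."""
    return [c for c in candidates if term in c]


def exact_matches(term, candidates):
    """Stand-in for the XPath = operator: keep candidates equal to term."""
    return [c for c in candidates if c == term]


for term in elements_name:
    # Print both result lists side by side for each search term.
    print(term, substring_matches(term, labels), exact_matches(term, labels))
```

The substring test lets 'O' match all three subtitles, while the exact test only matches when the term is the full subtitle text.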

You should be able to use a single XPath expression. Something like this:

driver.find_element_by_xpath(
   f"//ul[@class='sc-hHftDr gOAyWd']"
   f"/li[p[@class='sc-bkzZxe card__subtitle']='{element}']"
   f"/h1[@class='sc-idOhPF ipvImd card__highlight-text']"
).text.strip()

This will:

  • search the entire document for the ul,
  • select the child li whose subtitle exactly matches (not contains!) the element parameter,
  • return the h1 child of that li.
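The same three steps can be tried offline against the HTML from the question. This is only a sketch: it uses Python's standard-library xml.etree.ElementTree instead of Selenium, and ElementTree supports only a subset of XPath, so the li predicate is simplified to li[p='...'] (an exact match on the text of a p child, without the class check). The surrounding div, the trimmed markup and the helper name read_measure are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, wrapped in a <div> so the <ul>
# is a descendant of the root element.
PAGE = """
<div>
  <ul class="sc-hHftDr gOAyWd">
    <li>
      <p class="sc-bkzZxe card__subtitle">SO₂</p>
      <h1 class="sc-idOhPF ipvImd card__highlight-text">0.00</h1>
    </li>
    <li>
      <p class="sc-bkzZxe card__subtitle">NO₂</p>
      <h1 class="sc-idOhPF ipvImd card__highlight-text">1.00</h1>
    </li>
    <li>
      <p class="sc-bkzZxe card__subtitle">CO</p>
      <h1 class="sc-idOhPF ipvImd card__highlight-text">0.00</h1>
    </li>
  </ul>
</div>
""".strip()


def read_measure(root, label):
    """Return the h1 text of the li whose p text equals label, else None."""
    h1 = root.find(f".//ul[@class='sc-hHftDr gOAyWd']/li[p='{label}']/h1")
    return h1.text.strip() if h1 is not None else None


root = ET.fromstring(PAGE)
for label in ["SO₂", "NO₂", "CO", "O"]:
    # "O" is included to show that a partial name no longer matches anything.
    print(label, read_measure(root, label))
```

Because the match is exact, the order of the li elements on the page no longer matters, and a partial name such as 'O' finds nothing instead of picking up the first subtitle that happens to contain it.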

Answered By – Conal Tuohy

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
