Issue
Using Selenium (Python) to avoid spoilers of a soccer game
I am trying to grab the URL for a video of a soccer match replay from a dynamically changing webpage. The webpage shows the score, so I’d rather get the link directly than visit a page that will almost certainly show me the score. There are other related videos of the match, like a 10-minute highlight reel, but I would like the full replay only.
There is a list of videos on the page to choose from, but the ‘h1’ heading indicating that a video is the full replay is nested inside the ‘a’ tag (see below). There are ~10 of these list items on the page, and they are distinguished only by the content of the ‘h1’, buried as a descendant. The text that I’m after is "Brentford v LFC : Full match". The "Full match" part is the giveaway.
My problem is: how do I get the link when the important information comes in a later child?
<li data-sidebar-video="0_5de4sioh" class="js-subscribe-entitlement">
<a class="" href="//video.liverpoolfc.com/player/0_5de4sioh/">
<article class="video-thumb video-thumb--fade-in js-thumb video-thumb--no-duration video-thumb--sidebar">
<figure class="video-thumb__img">
<div class="site-loader">
<ul>
<li></li>
<li></li>
<li></li>
</ul>
</div> <img class="video-thumb__img-container loaded" data-src="//open.http.mp.streamamg.com/p/101/thumbnail/entry_id/0_5de4sioh/width/150/height/90/type/3" alt="Brentford v LFC : Full match" onerror="PULSE.app.common.VideoThumbError(this)" onload="PULSE.app.common.VideoThumbLoaded(this)"
src="//open.http.mp.streamamg.com/p/101/thumbnail/entry_id/0_5de4sioh/width/150/height/90/type/3" data-image-initialised="true"> <span class="video-thumb__premium">Premium</span> <i class="video-thumb__play-btn"></i> <span class="video-thumb__time"> <i class="video-thumb__icon"></i> 1:45:07 </span> </figure>
<div class="video-thumb__txt-container"> <span class="video-thumb__tag js-video-tag">Match Action</span>
<h1 class="video-thumb__heading">Brentford v LFC : Full match</h1> <time class="video-thumb__date">25th Sep 2021</time> </div>
</article>
</a>
</li>
My code looks like this at the moment. It gives me a list of the links but I don’t know which one is which.
from selenium import webdriver
#------------------------Account login---------------------------#
#I have to login to my account first.
#----------------------------------------------------------------#
username = "<my username goes here>"
password = "<my password goes here>"
username_object_id = "login_form_username"
password_object_id = "login_form_password"
login_button_name = "submitBtn"
login_url = "https://video.liverpoolfc.com/mylfctvgo"
driver = webdriver.Chrome("/usr/local/bin/chromedriver")
driver.get(login_url)
driver.implicitly_wait(10)
driver.find_element_by_id(username_object_id).send_keys(username)
driver.find_element_by_id(password_object_id).send_keys(password)
driver.find_element_by_name(login_button_name).click()
#--------------Find most recent game played----------------#
#I have to go to the matches section of my account and click on the most recent game
#----------------------------------------------------------------#
matches_url = "https://video.liverpoolfc.com/matches"
driver.get(matches_url)
driver.implicitly_wait(10)
latest_game = driver.find_element_by_xpath("/html/body/div[2]/section/ul/li[1]/section/div/div[1]/a").get_attribute('href')
driver.get(latest_game)
driver.implicitly_wait(10)
#--------------Find the full replay video----------------#
#There are many videos to choose from but I only want the full replay.
#--------------------------------------------------#
#prints all the videos in the list. They all have the same "data-sidebar-video" attribute
web_element1 = driver.find_elements_by_css_selector('li[data-sidebar-video*=""] > a')
print(web_element1)
for i in web_element1:
    print(i.get_attribute('href'))
Solution
You can do this with a simple XPath locator since you are searching based on contained text.
//a[.//h1[contains(text(),'Full match')]]
  ^ an A tag
       ^ that has an H1 descendant
          ^ that contains the text "Full match"
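For reference, here is a minimal sketch of how that locator might be used from Python. The helper name full_match_href is hypothetical, and the locator strategy is passed as the plain string "xpath" (the value of Selenium's By.XPATH constant) so the snippet does not rely on a particular import. It assumes driver is a WebDriver that is already on the match page, as in the question's script.

```python
# XPath from the answer: an <a> that has an <h1> descendant whose text
# contains "Full match".
FULL_MATCH_XPATH = "//a[.//h1[contains(text(),'Full match')]]"


def full_match_href(driver):
    """Return the href of the first link whose <h1> mentions 'Full match'.

    `driver` is assumed to be a Selenium WebDriver already on the match page.
    (Hypothetical helper, not part of Selenium.)
    """
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    link = driver.find_element("xpath", FULL_MATCH_XPATH)
    return link.get_attribute("href")
```

Calling print(full_match_href(driver)) after driver.get(latest_game) would print only the link. If no full replay is listed yet, find_element raises NoSuchElementException once the implicit wait runs out.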
NOTE: You can’t just use the href from the A tag as it is written in the HTML, since it isn’t a complete URL, e.g. //video.liverpoolfc.com/player/0_5de4sioh/. I would suggest you just click on the link. If you want to write it to a file, you’ll have to prepend "https:" to these partial URLs to make them usable.
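If you do go the file route, one way to complete a partial URL is the standard library's urllib.parse.urljoin, which fills in the missing scheme from a base URL. This is only a small sketch: the helper name absolute_url is made up for illustration, and the base URL is the matches page from the question.

```python
from urllib.parse import urljoin

# Base page the links were scraped from (taken from the question).
BASE_URL = "https://video.liverpoolfc.com/matches"


def absolute_url(href, base=BASE_URL):
    """Turn a protocol-relative href such as '//host/path' into a full URL.

    Hrefs that are already absolute are returned unchanged.
    """
    return urljoin(base, href)


print(absolute_url("//video.liverpoolfc.com/player/0_5de4sioh/"))
# prints: https://video.liverpoolfc.com/player/0_5de4sioh/
```

This is equivalent to adding "https:" by hand, but it also leaves hrefs alone when they already carry a scheme.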
Answered By – JeffC
This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.