How do I extract text information from an Angular website?

Issue

I’m trying to extract certain text fields from this website, but I’m new to Angular. I am using Selenium to build the web scraper. I noticed that the exact text value is not stored in the HTML source. Can someone help or give me some tips on how to go about this? I tried using:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

But I made no progress. Thank you 🙂

This is one way I tried to extract the text:

def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1

But I get this error in the terminal:

Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.cssSelector("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
AttributeError: type object 'By' has no attribute 'cssSelector'
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']")))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable

P.S. I’m sorry, but I can’t share the website since it requires a private login.

<input class="edited_field ng-pristine ng-untouched ng-valid ng-not-empty" type="text" ng-model="tab.content.site.name" ng-disabled="!tab.content.updateBtnPermission" disabled="disabled">

Snippet of the text I want to extract, with the HTML and Angular code

This is the code I wrote based on QHarr's comment, followed by the error it produces:

def csc():
    alpah_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
    indexOfAlpha = 0
    indexOfSheet = 2
    for x in range(2,4):
        y = x + 2
        driver.implicitly_wait(20)
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div[2]/div/div['+ str(x) +']/div/div/div[6]/a').click()
        driver.implicitly_wait(20)
        worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
        ranSleep()
        driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div/ul/li[2]/a/span').click()
        ranSleep()
        indexOfSheet += 1
Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), str(driver.find_element(By.CSS_SELECTOR("input[class = 'edited_field ng-pristine ng-untouched ng-valid ng-not-empty'][ng-model = 'tab.content.site.name']"))))
TypeError: 'str' object is not callable
Shahans-MacBook-Pro:WebScraping Shahan$ python3 selTest.py 
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 469, in _write
    f = float(token)
TypeError: float() argument must be a string or a number, not 'WebElement'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "selTest.py", line 88, in <module>
    csc()
  File "selTest.py", line 44, in csc
    worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 67, in cell_wrapper
    return method(self, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 408, in write
    return self._write(row, col, *args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/xlsxwriter/worksheet.py", line 474, in _write
    raise TypeError("Unsupported type %s in write()" % type(token))
TypeError: Unsupported type <class 'selenium.webdriver.remote.webelement.WebElement'> in write()

Solution

The current error complains about compound class names. Try:

driver.find_element_by_css_selector('input.edited_field.ng-pristine.ng-untouched.ng-valid.ng-not-empty')
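Note that the selector only locates the element. `xlsxwriter` cannot store a `WebElement`, which is what the last traceback in the question reports when the element itself is passed to `worksheet.write()`. Below is a minimal sketch of reading the text instead, assuming the field is the `<input>` shown in the question: Angular fills it at runtime, so the text sits in the element's `value` property rather than in the HTML source. The names `read_input_value` and `SITE_NAME_SELECTOR` are hypothetical, and `find_element` with a locator string is used because newer Selenium releases no longer ship the `find_element_by_*` helpers.

```python
# Same string as selenium's By.CSS_SELECTOR constant. It is a plain string,
# which is why calling By.CSS_SELECTOR(...) raises "'str' object is not callable".
CSS_SELECTOR = "css selector"

# Hypothetical selector: it keys on ng-model instead of Angular's state
# classes (ng-pristine, ng-untouched, ...), which can change as the form is used.
SITE_NAME_SELECTOR = "input[ng-model='tab.content.site.name']"


def read_input_value(driver, selector=SITE_NAME_SELECTOR):
    """Return the current value of the matched <input> as a string."""
    # The locator strategy and the selector are two separate arguments.
    element = driver.find_element(CSS_SELECTOR, selector)
    # For <input> elements the displayed text is the "value" property;
    # element.text would be empty here.
    value = element.get_attribute("value")
    return value if value is not None else ""
```

With the variables from the question, this could be used as `worksheet.write(alpah_list[indexOfAlpha] + str(indexOfSheet), read_input_value(driver))`, so that a string rather than a `WebElement` reaches the worksheet.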

You may also need a wait condition, and you can probably shorten the selector to use fewer classes.
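One way to express such a wait, sketched in plain Python so the polling logic is visible: it keeps re-reading the field until Angular has filled in a non-empty value or a timeout passes. The helper name `wait_for_input_value` and its default timings are assumptions for illustration. Selenium's own `WebDriverWait(driver, timeout).until(...)` accepts a callable and can do the same job.

```python
import time


def wait_for_input_value(driver, selector, timeout=10.0, poll_interval=0.5):
    """Poll until the matched <input> has a non-empty value, then return it.

    Raises TimeoutError if no value shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "css selector" is the string behind selenium's By.CSS_SELECTOR.
            element = driver.find_element("css selector", selector)
            value = element.get_attribute("value")
        except Exception:
            # Assumption: any lookup failure means "not rendered yet". A real
            # script would catch selenium's NoSuchElementException and
            # StaleElementReferenceException specifically.
            value = None
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no value for {selector!r} after {timeout} seconds")
        time.sleep(poll_interval)
```

A shorter selector such as `input.edited_field` (the first class from the question's markup) may be enough here, provided it matches only one field on the page.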

Answered By – QHarr

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
