How to efficiently scrape data from dynamic websites using Selenium?


I want to scrape data from a website.
I am trying to open each blog post, follow its link, and scrape the content of that post's details page.

I tried to use BeautifulSoup but it returned no data, and I realized the data was loaded dynamically with JavaScript.
Then I tried Selenium, and this is the code I came up with:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service('/usr/bin/chromedrivers')
driver = webdriver.Remote(service.service_url)


Unfortunately, my code returns no results.

How can I improve it so that I get the desired results from the blog?


You don’t need Selenium for this. When a page loads its content dynamically, you can open the Network tab in your browser's developer tools and see which URLs the page requests. The following code will get you started: it returns a DataFrame with each blog's title and URL. You can then request those URLs to scrape the details pages. Let me know if you need further guidance.

The code is below:

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0',
           'accept': 'application/json'}

df_list = []
for x in range(1, 5):
    r = requests.get(f'{x}&offset=0&post_type=post&repeater=default&seo_start_page=1&preloaded=false&preloaded_amount=0&order=DESC&orderby=date&action=alm_get_posts&query_type=standard', headers=headers)
    soup = BeautifulSoup(r.json()['html'], 'html.parser')
    for y in soup.select(''):  # the CSS selector for each post is missing here
        df_list.append((y.select_one('h4').text.strip(), y.select_one('a.more-link').get('href')))

df = pd.DataFrame(df_list, columns = ['Title', 'URL'])

This returns:

Title   URL
0   Addressing the Youth Mental Health Crisis Requ...
1   Remote work: What does it mean for local offic...
2   Second Nature?
3   6 Benefits of Continuous Behavioral Health Mea...
4   A New Level of Measurement-Based Care
5   4 Ways Continuous Behavioral Health Measuremen...
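
As a side note, the long query string in the request above can be kept as a dictionary, which makes it easier to change the page number or any other parameter. This is only a sketch: the endpoint is not shown above, so BASE_URL is a placeholder, and the parameter name page for the loop counter is an assumption on my part. The remaining keys and values are copied from the query string in the code.

```python
from urllib.parse import urlencode

# Placeholder: the real endpoint is not shown in the answer above.
BASE_URL = "https://example.com/endpoint"


def build_query(page: int) -> dict:
    """Return the query parameters for one page of posts.

    The key "page" is a hypothetical name for the loop counter;
    every other key and value is taken from the original query string.
    """
    return {
        "page": page,
        "offset": 0,
        "post_type": "post",
        "repeater": "default",
        "seo_start_page": 1,
        "preloaded": "false",
        "preloaded_amount": 0,
        "order": "DESC",
        "orderby": "date",
        "action": "alm_get_posts",
        "query_type": "standard",
    }


def page_url(base_url: str, page: int) -> str:
    """Build the full URL for one page by URL-encoding the parameters."""
    return f"{base_url}?{urlencode(build_query(page))}"


if __name__ == "__main__":
    # Same page range as the loop above: 1, 2, 3, 4.
    for x in range(1, 5):
        print(page_url(BASE_URL, x))
```

With requests you can skip the manual encoding and pass the dictionary directly, for example requests.get(BASE_URL, params=build_query(x), headers=headers).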

Answered By – platipus_on_fire

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
