Python (Selenium/BeautifulSoup) Search Result Dynamic URL
Disclaimer: This is my first foray into web scraping
I have a list of URLs corresponding to search results, e.g.,
http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662
I'm trying to use Selenium to access the HTML of the result as follows:
for url in detail_urls:
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.prettify())
However, when I comb through the resulting prettified soup, I notice that the components I need are missing. Upon looking back at the page loading process, I see that the URL redirects a few times as follows:
http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662
https://www.vinelink.com/#/searchResults/id/offender/34003/33/2662
https://www.vinelink.com/#/searchResults/1
Does anyone have a tip on how to access the final search results data?
Update: After further exploration, this seems like it might have to do with the scripts being executed to retrieve the relevant data for display. There are many search-results-related scripts referenced in the page_source; is there a way to determine which one is relevant?
I am able to Inspect the information I need in the browser's developer tools.
python selenium selenium-webdriver beautifulsoup
What are the components that you are trying to access but finding unavailable? Selenium should load all JavaScript before returning the HTML object to be parsed by BeautifulSoup.
– Joseph Choi
Nov 23 at 4:30
Hi Joseph, I'm trying to access <search-result> tags from the final destination page. Per my question, if I enter one of the original URLs into my Chrome address bar, the page loads sequentially and I see the URL change twice until it lands on '/#/searchResults/1' (the same URL no matter which offender is searched). Any idea how to ensure Selenium does not pull data from the first URL in the series of redirects?
– OJT
Nov 23 at 5:32
When I try to connect to the link provided, I get redirected to an unauthorized page (vinelink.com/#/unauthorized). From my experience and testing, lines after driver.get(url) are only executed after the browser has finished loading; Selenium is designed to emulate the web browsing experience the same as a human would. Can you confirm that the HTML you receive from driver.page_source is different from what you get when browsing yourself?
– Joseph Choi
Nov 23 at 6:33
Try calling soup.find_all("search-result") to confirm that you are not getting the data you need
– Joseph Choi
Nov 23 at 6:35
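For reference, that check is a two-liner against the soup built in the question's loop (soup here is carried over from that code as an assumption):

results = soup.find_all('search-result')
print(len(results))  # 0 means the tag never made it into the HTML Selenium returned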
Hi Joseph, based on what you've written, I found that I could actually bypass the driver.page_source call, and instead insert driver.implicitly_wait(5) before beginning to scrape data (this allows sufficient time for the browsing emulation to reach the destination page). Thank you very much! I now have a new problem (reCAPTCHA prevents me from collecting data from more than a few of the URLs in my list), but I will create a separate question for this!
– OJT
Nov 23 at 7:12
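A minimal sketch of that fix, assuming Chrome and the detail_urls list from the question. One caveat worth hedging: implicitly_wait(5) only affects element lookups, so it is the subsequent find_element call that actually pauses the script until the redirect chain finishes rendering.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(5)  # element lookups now poll for up to 5 seconds

for url in detail_urls:  # assumed list of search URLs, as in the question
    driver.get(url)
    # This lookup blocks (up to 5 s) until the final page renders the tag,
    # after which page_source reflects the destination of the redirects.
    driver.find_element(By.TAG_NAME, 'search-result')
    html = driver.page_source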
asked Nov 22 at 17:53 – OJT
edited Nov 22 at 19:45 – JaSON
1 Answer
Once you have your soup variable with the HTML, follow the code below.
import json

data = soup.find('search-result')['data']  # the JSON payload is stored in the tag's data attribute
print(data)
Output:
{"offender_sid":154070373,"siteId":34003,"siteDesc":"NC_STATE","first_name":"WESLEY","last_name":"ADAMS","middle_initial":"CHURCHILL","alias_first_name":null,"alias_last_name":null,"alias_middle_initial":null,"oid":"2662","date_of_birth":"1965-11-21","agencyDesc":"Durham County Detention Center","age":53,"race":2,"raceDesc":"African American","gender":null,"genderDesc":null,"status_detail":"Durham County Detention Center","agency":33,"custody_status_cd":1,"custody_detail_cd":33,"custody_status_description":"In Custody","aliasFlag":false,"registerValid":true,"detailAgLink":false,"linkedCases":false,"registerMessage":"","juvenile_flg":0,"vineLinkInd":1,"vineLinkAgAccessCd":2,"links":[{"rel":"agency","href":"//www.vinelink.com/VineAppWebService/api/site/agency/34003/33"},{"rel":"self","href":"//www.vinelink.com/VineAppWebService/api/offender/?offSid=154070373&lang=en_US"}],"actions":[{"name":"register","template":"//www.vinelink.com/VineAppWebService/api/register/{json data}","method":"POST"}]}
Now treat each value like a dict.
Next:
info = json.loads(data)
print(info['first_name'], info['last_name'])
# This prints the first and last name; other fields work the same way via keys like 'date_of_birth' or 'siteId'. You can also assign them to variables.

answered Nov 23 at 4:18 – Kamikaze_goldfish
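Pulling the answer and the comment thread together, one plausible end-to-end loop. Hedged: detail_urls and the 10-second timeout are assumptions, and WebDriverWait is used here as a more targeted alternative to the implicit wait discussed in the comments above.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import json

driver = webdriver.Chrome()
records = []
for url in detail_urls:  # the question's list of search-result URLs
    driver.get(url)
    # Wait explicitly for the redirected page to render the target tag
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'search-result')))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    tag = soup.find('search-result')
    if tag is None:  # guards against the NoneType error reported in the comments below
        continue
    info = json.loads(tag['data'])
    records.append((info['first_name'], info['last_name']))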
Thanks for the suggestion! I think that, because of the dynamic URL creation process for the search results, I am not winding up with the correct HTML for the eventual destination, so the soup variable does not include the correct HTML and I get the following error: TypeError: 'NoneType' object is not subscriptable. Is there some way to get Selenium to walk through the URL change process and pull the correct page?
– OJT
Nov 23 at 5:30
So when you run your code, what output do you get?
– Kamikaze_goldfish
Nov 23 at 17:34
I have actually now solved things per Joseph Choi's suggestion in the comments; I merely inserted a driver.implicitly_wait(5) after loading the original URL, and then the further Selenium commands do not begin until the final destination of the redirect has been reached.
– OJT
Nov 23 at 17:41