Python (Selenium/BeautifulSoup) Search Result Dynamic URL












Disclaimer: This is my first foray into web scraping



I have a list of URLs corresponding to search results, e.g.,



http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662



I'm trying to use Selenium to access the HTML of the result as follows:



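    from bs4 import BeautifulSoup

    # driver is a Selenium WebDriver; detail_urls is the list of search URLs
    for url in detail_urls:
        driver.get(url)
        html = driver.page_source
        soup = BeautifulSoup(html, 'html.parser')
        print(soup.prettify())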


However, when I comb through the resulting prettified soup, I notice that the components I need are missing. Upon looking back at the page loading process, I see that the URL redirects a few times as follows:




  1. http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662


  2. https://www.vinelink.com/#/searchResults/id/offender/34003/33/2662


  3. https://www.vinelink.com/#/searchResults/1
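
One possible reading of the sequence above: steps 2 and 3 differ only in the part after the #, the URL fragment, which browsers do not send to the server. That would suggest the later steps are client-side route changes made by JavaScript in the page rather than server redirects. This is an assumption and has not been verified against the site. A minimal sketch using only the standard library splits the three URLs to make the difference visible; `describe` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# The three URLs observed while the page loads, copied from the list above.
urls = [
    "http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662",
    "https://www.vinelink.com/#/searchResults/id/offender/34003/33/2662",
    "https://www.vinelink.com/#/searchResults/1",
]


def describe(url):
    """Split a URL into the parts sent to the server and the fragment."""
    parts = urlsplit(url)
    return {"path": parts.path, "query": parts.query, "fragment": parts.fragment}


for url in urls:
    print(describe(url))
```

If that reading is right, the search data is fetched by scripts after the initial page load, so reading page_source immediately after driver.get(url) could capture the page before the results are rendered.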



Does anyone have a tip on how to access the final search results data?



Update: After further exploration this seems like it might have to do with the scripts being executed to retrieve the relevant data for display... there are many search results-related scripts referenced in the page_source; is there a way to determine which is relevant?



I am able to Inspect the information I need per this image:



[screenshot not reproduced]










  • Which components are you trying to access that are not available? Selenium should load all JavaScript before returning the HTML to be parsed by BeautifulSoup.
    – Joseph Choi
    Nov 23 at 4:30












  • Hi Joseph, I'm trying to access <search-result> tags from the final destination page. Per my question, if I enter one of the original URLs into my Chrome address bar, the page loads sequentially and I see the URL change twice until it lands on '/#/searchResults/1' (the same URL no matter which offender is being searched). Any idea how to ensure Selenium does not pull data from the first URL in the series of redirects?
    – OJT
    Nov 23 at 5:32












  • When I try to connect to the link provided, I get redirected to an unauthorized page, vinelink.com/#/unauthorized. From my experience and testing, lines after driver.get(url) are only executed after the browser has finished loading. Selenium is designed to emulate the web browsing experience as a human would have it. Can you confirm that the HTML you receive from driver.page_source is different from what you get when browsing yourself?
    – Joseph Choi
    Nov 23 at 6:33










  • Try calling soup.find_all("search-result") to confirm that you are not getting the data you need
    – Joseph Choi
    Nov 23 at 6:35










  • Hi Joseph, based on what you've written, I found that I could actually bypass the driver.page_source call, and instead insert driver.implicitly_wait(5) before beginning to scrape data (this allows sufficient time for the browsing emulation to reach the destination page). Thank you very much! I now have a new problem (reCAPTCHA prevents me from collecting data from more than a few of the URLs in my list), but I will create a separate question for this!
    – OJT
    Nov 23 at 7:12
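
A note on the fix described in the last comment: as far as I know, driver.implicitly_wait(5) sets how long element lookups such as find_element keep retrying before giving up. It does not pause the script on its own and has no effect on driver.page_source. Where the HTML is read from page_source, an explicit wait is one alternative. The sketch below polls page_source until a marker string shows up. `wait_for_markup` is a hypothetical helper, not part of Selenium, and it only assumes the driver object exposes a page_source attribute.

```python
import time


def wait_for_markup(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears, then return the HTML.

    Raises TimeoutError if the marker is still missing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} seconds")
        time.sleep(poll)
```

In the loop from the question this might be used as html = wait_for_markup(driver, "<search-result") right after driver.get(url). Selenium also ships its own explicit wait, WebDriverWait combined with expected_conditions.presence_of_element_located, which is probably the more idiomatic choice when working with elements rather than raw HTML.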
















python selenium selenium-webdriver beautifulsoup






asked Nov 22 at 17:53 by OJT
edited Nov 22 at 19:45 by JaSON












1 Answer

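Once you have the rendered HTML in your soup variable, read the data attribute of the <search-result> tag:

    import json

    # find() returns None if no <search-result> tag is in the HTML
    data = soup.find('search-result')['data']
    print(data)

Output (a JSON string, which can be treated like a dict once parsed):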



{"offender_sid":154070373,"siteId":34003,"siteDesc":"NC_STATE","first_name":"WESLEY","last_name":"ADAMS","middle_initial":"CHURCHILL","alias_first_name":null,"alias_last_name":null,"alias_middle_initial":null,"oid":"2662","date_of_birth":"1965-11-21","agencyDesc":"Durham County Detention Center","age":53,"race":2,"raceDesc":"African American","gender":null,"genderDesc":null,"status_detail":"Durham County Detention Center","agency":33,"custody_status_cd":1,"custody_detail_cd":33,"custody_status_description":"In Custody","aliasFlag":false,"registerValid":true,"detailAgLink":false,"linkedCases":false,"registerMessage":"","juvenile_flg":0,"vineLinkInd":1,"vineLinkAgAccessCd":2,"links":[{"rel":"agency","href":"//www.vinelink.com/VineAppWebService/api/site/agency/34003/33"},{"rel":"self","href":"//www.vinelink.com/VineAppWebService/api/offender/?offSid=154070373&lang=en_US"}],"actions":[{"name":"register","template":"//www.vinelink.com/VineAppWebService/api/register/{json data}","method":"POST"}]}


Next, parse the string so you can treat it like a dict:

    info = json.loads(data)

    # Prints the first and last name
    print(info['first_name'], info['last_name'])

Other values are available by key, such as 'date_of_birth' or 'siteId', and you can assign them to variables.
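
If the <search-result> tag has not been rendered yet, soup.find('search-result') returns None and the subscript fails with a TypeError. Here is a minimal sketch of a more defensive extraction. It swaps BeautifulSoup for the standard library's HTMLParser so that it runs without extra installs. `SearchResultParser`, `extract_records` and the sample markup are made up for illustration.

```python
import json
from html.parser import HTMLParser


class SearchResultParser(HTMLParser):
    """Collects the parsed `data` attribute of every <search-result> tag."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag != "search-result":
            return
        raw = dict(attrs).get("data")
        if raw:
            self.records.append(json.loads(raw))


def extract_records(html):
    """Return a list of dicts; the list is empty if no tag is present."""
    parser = SearchResultParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Made-up sample markup standing in for driver.page_source.
sample = (
    "<div><search-result data='{\"first_name\": \"JANE\", "
    "\"last_name\": \"DOE\"}'></search-result></div>"
)

for record in extract_records(sample):
    print(record["first_name"], record["last_name"])
```

An empty list then signals that the results were not in the HTML, which is easier to handle in a loop over many URLs than an exception from subscripting None.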





  • Thanks for the suggestion! I think because of something related to the dynamic URL for the search results creation process, I am not winding up with the correct HTML for the eventual destination, so the 'soup' variable does not include the correct HTML and I get the following error: 'TypeError: 'NoneType' object is not subscriptable'. Is there some way to get Selenium to walk through the URL change process and pull the correct page?
    – OJT
    Nov 23 at 5:30










  • So when you run your code, what output do you get?
    – Kamikaze_goldfish
    Nov 23 at 17:34










  • I have actually now solved things per Joseph Choi's suggestion in the comments; I merely inserted a driver.implicitly_wait(5) after loading the original URL, and then the further Selenium commands do not begin until the final destination of the redirect has been reached.
    – OJT
    Nov 23 at 17:41











answered Nov 23 at 4:18 by Kamikaze_goldfish











