How to handle InvalidSchema exception

I've written a Python script with two functions. The first, `get_links()`, fetches some links from a webpage and passes them to the second, `get_info()`. `get_info()` should then produce the shop name from each link, but instead it throws an error: `raise InvalidSchema("No connection adapters were found for '%s'" % url)`.

Here is my attempt:

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text,"lxml")
    elem = soup.select(".info h2 a[data-analytics]")
    return get_info(elem)

def get_info(url):
    response = requests.get(url)
    print(response.url)
    soup = BeautifulSoup(response.text,"lxml")
    return soup.select_one("#main-header .sales-info h1").get_text(strip=True)

if __name__ == '__main__':
    link = 'https://www.yellowpages.com/search?search_terms=%20Injury%20Law%20Attorneys&geo_location_terms=California&page=2'
    for review in get_links(link):
        print(urljoin(link,review.get("href")))
```


What I'm really trying to learn here is how `return get_info(elem)` is used in practice.

I created another thread about this `return get_info(elem)` construct. Link to that thread.

When I write it as follows instead, I get the expected results:

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text,"lxml")
    elem = soup.select(".info h2 a[data-analytics]")
    return elem

def get_info(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text,"lxml")
    return soup.select_one("#main-header .sales-info h1").get_text(strip=True)

if __name__ == '__main__':
    link = 'https://www.yellowpages.com/search?search_terms=%20Injury%20Law%20Attorneys&geo_location_terms=California&page=2'
    for review in get_links(link):
        print(get_info(urljoin(link,review.get("href"))))
```


My question: how can I get the same results while keeping the structure of my first script, i.e. with `get_links()` ending in `return get_info(elem)`?

  • Please take a 2nd look at the title of your question. It's 100% non-descriptive
    – planetmaker
    Nov 22 at 15:48

Tags: python, python-3.x, function, web-scraping, return

edited Nov 22 at 16:09 by Andersson
asked Nov 22 at 15:44 by robots.txt

1 Answer (accepted)

Inspect what each function returns. In your first script, `get_info()` can never succeed, because it expects a single URL string, while `get_info(elem)` passes it `elem`, the list of tags returned by `soup.select()`. Requests turns that list into a string that does not begin with a scheme such as `https://`, so no connection adapter matches it; that is the `InvalidSchema` error you see.
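
If you want this kind of mistake to surface with a clearer message than `InvalidSchema`, one option is to validate the argument before it ever reaches `requests.get()`. This is only a sketch: `check_url` is a hypothetical helper (it is not part of requests), and it relies on the standard library's `urllib.parse.urlparse` to require a string with an `http` or `https` scheme.

```python
from urllib.parse import urlparse


def check_url(url):
    """Hypothetical guard: fail fast unless `url` is an absolute http(s) URL string."""
    if not isinstance(url, str):
        # Catches the original bug: a list of tags passed where one URL was expected.
        raise TypeError(f"expected a URL string, got {type(url).__name__}")
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        # Relative hrefs like "/some/path" have no scheme and land here.
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return url


if __name__ == "__main__":
    print(check_url("https://www.yellowpages.com/"))
    try:
        check_url(["/some-relative-href"])  # roughly what get_info(elem) received
    except TypeError as exc:
        print("TypeError:", exc)
```

Inside `get_info()`, it would be used as `response = requests.get(check_url(url))`.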



Your second script already avoids passing the list: `get_links()` returns it, and your loop pulls the `href` out of each element before calling `get_info()`. So to use `get_info()` inside `get_links()`, as in your first script, apply it to each item rather than to the whole list. A list comprehension works well here:



```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_links(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    elem = soup.select(".info h2 a[data-analytics]")
    # Apply get_info() to each element's absolute URL, not to the whole list.
    # Relative hrefs are resolved against the page URL passed in as `url`.
    return [get_info(urljoin(url, e.get("href"))) for e in elem]

def get_info(url):
    # Expects a single URL string and returns the shop name on that page.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    return soup.select_one("#main-header .sales-info h1").get_text(strip=True)

if __name__ == '__main__':
    link = 'https://www.yellowpages.com/search?search_terms=%20Injury%20Law%20Attorneys&geo_location_terms=California&page=2'
    for review in get_links(link):
        print(review)
```


`get_links()` still returns a list, but now each element is the result of calling `get_info()` on one URL, which is exactly what `get_info()` is built for: it accepts a URL, not a list. Since `urljoin()` and `get_info()` are already applied inside `get_links()`, all that's left is to loop over the returned list and print each result.
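
The same per-item pattern can be tried offline, without requests or BeautifulSoup. This is a sketch under stated assumptions: `SAMPLE_HTML` and the `fake_get_info()` stub are made up for illustration, and the standard library's `html.parser` stands in for `soup.select(...)`. It only collects `href`s from `<a>` tags that carry a `data-analytics` attribute and ignores the `.info h2` ancestry the real selector checks.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical search-results markup; real pages are far more complex.
SAMPLE_HTML = """
<div class="info"><h2><a data-analytics="{}" href="/la/mip/acme-law-1">Acme</a></h2></div>
<div class="info"><h2><a data-analytics="{}" href="/sf/mip/best-law-2">Best</a></h2></div>
<a href="/ads/ignored">no data-analytics attribute</a>
"""


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that have a data-analytics attribute."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "data-analytics" in attrs and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def fake_get_info(url):
    """Hypothetical stub for get_info(): takes ONE URL string, returns a 'shop name'."""
    if not isinstance(url, str):
        raise TypeError("get_info() expects a single URL string")
    return url.rsplit("/", 1)[-1]


def get_links(page_url, html):
    parser = LinkCollector()
    parser.feed(html)
    # Same shape as the answer: call get_info() per item, never on the list.
    return [fake_get_info(urljoin(page_url, href)) for href in parser.hrefs]


if __name__ == "__main__":
    page = "https://www.yellowpages.com/search?search_terms=test&page=2"
    for name in get_links(page, SAMPLE_HTML):
        print(name)
```

Replacing the stub with the real `get_info()` and the parser with BeautifulSoup brings you back to the answer's code; the list comprehension is the part that matters.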

  • You are the one @BernardL. It worked perfectly.
    – robots.txt
    Nov 22 at 17:27

  • Hope it helped. Be patient and take your time to understand the basic data types and how they work in Python; it will give you a stronger foundation for better design in the future. Cheers.
    – BernardL
    Nov 22 at 17:41

edited Nov 22 at 17:53
answered Nov 22 at 17:18 by BernardL