find_all with multiple attributes
I want to find all the links on the page. This code only gets the links that start with http://, but most of the links start with https://. How can I edit the line below to find both?

    for link in soup.find_all('a', attrs={'href': re.compile("^http://")}):

The full code:

    import requests, bs4, re

    res = requests.get('https://www.nytimes.com/2018/11/21/nyregion/president-trump-immigration-law-firms.html?action=click&module=Top%20Stories&pgtype=Homepage')
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    x = []
    y = []
    z = []
    for link in soup.find_all('a', attrs={'href': re.compile("^http://")}):
        print(link.get('href'))
        x = link.get('href')

I know I can simply do this to get all the links:

    for i in soup.select('a'):
        print(i.get('href'))

python python-3.x beautifulsoup findall
edited Nov 22 at 4:40
Barmar
asked Nov 22 at 3:45
timmy
How about using this regexp: ^(http|https)://.* ?
– Enix
Nov 22 at 3:59
or use ^http*://[a-zA-z]
– b-fg
Nov 22 at 4:01
If you want to find all links, why are you filtering the attributes at all?
– Barmar
Nov 22 at 4:41
@Barmar the links come with their text and font formats and stuff like that
– timmy
Nov 22 at 7:18
@Enix your edit works, you can post it as an answer if you want
– timmy
Nov 22 at 7:19
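
To see how the patterns suggested in these comments differ, here is a small sketch using only the standard `re` module. The sample strings are made up for illustration, and `^https?://` is an extra, more compact spelling that does not appear in the thread.

```python
import re

# Two patterns quoted from the comments, plus a compact alternative
# (the third one is an assumption added for comparison, not from the thread).
patterns = {
    "alternation": re.compile(r"^(http|https)://.*"),
    "star": re.compile(r"^http*://[a-zA-z]"),
    "optional_s": re.compile(r"^https?://"),
}

# Hypothetical values an href attribute might hold.
samples = [
    "http://example.com",
    "https://example.com",
    "htt://example.com",  # misspelled scheme
    "/relative/path",
    "mailto:someone@example.com",
]


def matching_samples(pattern):
    """Return the samples that the compiled pattern matches."""
    return [s for s in samples if pattern.search(s)]


results = {name: matching_samples(p) for name, p in patterns.items()}
for name, matched in results.items():
    print(name, matched)
```

In `^http*://`, the `*` applies to the letter `p`, so the pattern accepts `htt://` and `http://` but not `https://`. The alternation and the optional `s` both accept exactly the two intended schemes here.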
2 Answers
accepted
You can use this regular expression to match http or https:

    ^(http|https)://.*

The regular expression (a|b) means: match pattern a or pattern b.

answered Nov 22 at 7:26
Enix
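
As a rough illustration of how such a pattern plugs into the `find_all` call from the question, the sketch below runs against a small inline HTML string instead of the live page. The HTML and the variable names are hypothetical, and it uses `^https?://`, which should match the same two schemes as the pattern above.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page.
html = """
<a href="http://example.com/a">plain</a>
<a href="https://example.com/b">secure</a>
<a href="/relative">relative</a>
<a name="no-href">anchor without href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# One pattern for both schemes: the "s" after "http" is optional.
absolute = re.compile(r"^https?://")

# Same call shape as in the question, with only the pattern changed.
links = [a.get("href") for a in soup.find_all("a", attrs={"href": absolute})]
print(links)
```

Anchors whose `href` is relative or missing are filtered out by the attribute match, so only the two absolute links remain.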
Do you want to categorize your links into http and https? You can do that with .startswith() or re.match():

    http = []
    https = []
    # href=True skips <a> tags that have no href, so url is never None
    for link in soup.find_all('a', href=True):
        url = link.get('href')
        if url.startswith('http://'):  # or: if re.match("^http://", url)
            http.append(url)
        else:
            # should be https://
            https.append(url)
    print(https)
    print(http)

answered Nov 22 at 6:51
ewwink
This wouldn't work all the time. I think you should use elif instead of else to rule out any possible error, right?
– timmy
Nov 22 at 15:39
It's just an example; you can improve it by adding elif.
– ewwink
Nov 22 at 20:08
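
The elif variant discussed in these comments might look like the sketch below. It works on a plain list of hypothetical href values (as `link.get('href')` could return them) so that it runs without downloading anything. The `other` bucket and the `None` check are assumptions added for illustration.

```python
# Hypothetical href values; None stands for an <a> tag without an href.
hrefs = [
    "http://example.com/a",
    "https://example.com/b",
    "/relative/path",
    "mailto:someone@example.com",
    None,
]

http, https, other = [], [], []
for url in hrefs:
    if url is None:
        continue  # nothing to classify
    if url.startswith("http://"):
        http.append(url)
    elif url.startswith("https://"):
        https.append(url)
    else:
        # relative links, mailto:, javascript:, and so on
        other.append(url)

print(http)
print(https)
print(other)
```

With an explicit elif, anything that is neither http:// nor https:// no longer ends up in the https list by default.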