Python 2: search for a phrase on multiple web sites taken from a list file


























I have the following list of links in a file called "output":



https://web.archive.org/web/20180101003616/http://onet.pl
https://web.archive.org/web/20180102000139/http://onet.pl
[...]


If you open the first link from the list and press Ctrl+F in Firefox, you can find the phrase "Katastrofa".



All I want is a script that can find a phrase ("Katastrofa" is only an example; I want to take it from an argv argument, but that's not important here), print a success message, and proceed to the next link.



I got stuck and can't figure out how to do it.
The test script I have does not "see" the word "Katastrofa", which is definitely on the first page.



Please help :)



Here is what I've done so far:



import requests
from bs4 import BeautifulSoup

f = open('output', 'r')
f2 = f.readlines()
for i in f2:
    r = requests.get(i)
    first_page = r.text
    soup = BeautifulSoup(first_page, 'html.parser')
    page_soup = soup
    fraza = "Katastrofa"
    boxes = page_soup.body.find_all(fraza)
    print(i)
    print(boxes)


Output:



https://web.archive.org/web/20180101003616/http://onet.pl


https://web.archive.org/web/20180102000139/http://onet.pl


https://web.archive.org/web/20180103002217/http://onet.pl































  • What is the output, or error, you are currently getting?
    – TeeKea
    Nov 22 at 22:30
















python beautifulsoup
















asked Nov 22 at 22:24









Irka Irenka

























1 Answer














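If you want to check whether the raw HTML string contains the text:



import re
import requests

for i in f2:
    r = requests.get(i.strip())  # drop the trailing newline left by readlines()
    fraza = "Katastrofa"
    # re.search scans the whole page (re.match would only test its very start); re.I ignores case
    if re.search(fraza, r.text, re.I):
        print(i)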
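
One detail worth checking in the substring approach: `re.match` only tests at the beginning of the string, so for a phrase sitting somewhere in the middle of a page, `re.search` is the call that finds it. A minimal offline sketch of the difference follows; `contains_phrase` and `sample_html` are hypothetical names used only for illustration, and the sample markup is adapted from the result shown in this answer.

```python
import re


def contains_phrase(html, phrase):
    """Return True if phrase occurs anywhere in html, ignoring case."""
    # re.escape treats the phrase as literal text, not as a regex pattern.
    return re.search(re.escape(phrase), html, re.IGNORECASE) is not None


sample_html = u'<span class="title"> Kostaryka: Katastrofa lotnicza </span>'

# The phrase is found even though it is not at the start and the case differs.
print(contains_phrase(sample_html, u"katastrofa"))

# re.match returns None here because the string starts with "<span", not the phrase.
print(re.match(u"Katastrofa", sample_html, re.I) is None)
```

Both lines print `True`. Escaping the phrase is a choice made in this sketch so that a phrase passed on the command line containing characters such as `.` or `?` is still matched literally.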


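If you want to find the HTML elements that contain the text:



import re
import requests
from bs4 import BeautifulSoup

for i in f2:
    r = requests.get(i.strip())  # drop the trailing newline left by readlines()
    soup = BeautifulSoup(r.text, 'html.parser')
    fraza = "Katastrofa"
    # True matches any tag; the regex is tested against each tag's text
    boxes = soup.find_all(True, text=re.compile(fraza, re.I))
    if boxes:
        print(i)
        print(boxes)


The result is a list of the innermost elements whose text matches: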



https://web.archive.org/web/20180101003616/http://onet.pl
[<span class="title"> Kostaryka: Katastrofa lotnicza. Media: są ofiary </span>,
<span class="title"> Australia: katastrofa samolotu, są ofiary śmiertelne </span>]
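
Putting the pieces of the question together (a list file, a phrase taken from argv, a message per hit), a rough sketch might look like the following. The helper names `load_urls`, `find_matching_urls`, and `fetch_with_requests` are made up for illustration, and passing the download function in as a parameter is a design choice of this sketch (it lets the search logic be tried without network access), not something from the answer above.

```python
import re
import sys


def load_urls(path):
    """Read one URL per line, dropping blank lines and trailing newlines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def find_matching_urls(urls, phrase, fetch):
    """Return the URLs whose fetched text contains phrase (case-insensitive)."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    matches = []
    for url in urls:
        if pattern.search(fetch(url)):
            print("Found %r at %s" % (phrase, url))
            matches.append(url)
    return matches


def fetch_with_requests(url):
    """Download a page; requests is imported here so the helpers above work without it."""
    import requests
    return requests.get(url, timeout=30).text


# Only runs when started as a script with exactly one argument: the phrase.
if __name__ == "__main__" and len(sys.argv) == 2:
    find_matching_urls(load_urls("output"), sys.argv[1], fetch_with_requests)
```

Saved as, say, `search.py` (a hypothetical file name), it would be run as `python search.py Katastrofa` from the directory that holds the `output` file. This version searches the raw HTML, so it can also report hits inside scripts or attributes that Ctrl+F in a browser would not show.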


























        answered Nov 23 at 1:17









        ewwink






























