Find newline with words starting with underscore with specific pattern

I need to find the following in C code using a Python regular expression, but somehow I could not write the pattern properly.



if(condition)
/*~T*/
{
/*~T*/
_getmethis = FALSE;
/*~T*/
}
..........
/*~T*/
_findmethis = FALSE;
......
/*~T*/
_findthat = True;


I need to find all variables that start with an underscore and follow a /*~T*/ marker, and write them to a new file. My code does not find them: I tried several regex patterns, but the output file is always empty.



import re
fh = open('filename.c', "r")
output = open("output.txt", "w")
pattern = re.compile(r'(/\*~T\*/)(\s*?\n\s*)(_[aA-zZ]*)')
for line in fh:
    for m in re.finditer(pattern, line):
        output.write(m.group(3))
        output.write("\n")

output.close()









  • [aA-zZ] does not only match letters, it also matches [, \, ], ^, _ and `. You must have meant [a-zA-Z]. All you need to do is remove for line in fh: and use re.finditer(pattern, fh.read())
    – Wiktor Stribiżew
    1 hour ago
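
To make the comment above concrete, here is a small sketch that lists which ASCII characters the class [aA-zZ] accepts beyond letters. The variable names are made up for illustration.

```python
import re
import string

# The class from the question: 'a', then the range A-z, then 'Z'.
loose = re.compile(r'[aA-zZ]')
# The class that was probably intended: ASCII letters only.
strict = re.compile(r'[a-zA-Z]')

# Printable ASCII characters accepted by the loose class but not by the strict one.
extras = [c for c in string.printable
          if loose.fullmatch(c) and not strict.fullmatch(c)]
print(extras)  # → ['[', '\\', ']', '^', '_', '`']
```

The range A-z covers every code point between "A" and "z", and the six punctuation characters above happen to sit between "Z" and "a" in ASCII.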

















regex python-3.x

asked 2 hours ago by fastlearner (edited 2 hours ago)












2 Answers

The reason you do not find anything is that your pattern crosses multiple lines but you are only looking at your file one line at a time.
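
A minimal sketch of that failure mode, assuming a two-line input and a pattern with escaped asterisks (both are made up for this illustration): the same regex finds nothing when fed one line at a time, but matches once it sees the whole text.

```python
import re

source = "/*~T*/\n_findmethis = FALSE;\n"
pattern = re.compile(r'/\*~T\*/\s*?\n\s*(_[a-zA-Z]*)')

# Line by line: the marker and the variable never appear in the same string.
per_line = [m.group(1)
            for line in source.splitlines(keepends=True)
            for m in pattern.finditer(line)]

# Whole text: the newline between marker and variable is visible to the regex.
whole = [m.group(1) for m in pattern.finditer(source)]

print(per_line)  # → []
print(whole)     # → ['_findmethis']
```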



Consider using this:



t = """
if(condition)
/*~-*/
{
/*~T*/
_getmethis = FALSE;
/*~-*/
}
..........
/*~T*/
_findmethis = FALSE;

/*~T*/
do_not_findme_this = FALSE;
"""

import re
# \s* (not \s+) so the name may start right at the beginning of the next line
pattern = re.compile(r'/\*~T\*/.*?\n\s*(_[aA-zZ]*)', re.MULTILINE|re.DOTALL)
for m in re.finditer(pattern, t): # use the whole file here - not line-wise
    print(m.group(1))


The pattern is compiled with two flags. re.DOTALL makes the dot . match newlines as well (by default it does not). re.MULTILINE only changes where ^ and $ match, so it has no effect on this pattern, which has no anchors. The non-greedy .*? keeps the gap between /*~T*/ and the captured group as small as possible.
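
To illustrate the flag and the non-greedy quantifier separately, here is a small sketch on a made-up three-line input. The patterns are simplified for the demonstration and are not the exact pattern from this answer.

```python
import re

text = "/*~T*/\n_a = 1;\n_b = 2;\n"

# Lazy gap with DOTALL: stops at the first name after the marker.
lazy = re.search(r'/\*~T\*/.*?(_\w+)', text, re.DOTALL)
# Greedy gap with DOTALL: runs as far as possible, so it ends at the last name.
greedy = re.search(r'/\*~T\*/.*(_\w+)', text, re.DOTALL)
# Without DOTALL the dot cannot cross the newline after the marker.
no_dotall = re.search(r'/\*~T\*/.*?(_\w+)', text)

print(lazy.group(1))    # → _a
print(greedy.group(1))  # → _b
print(no_dotall)        # → None
```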



Printout:



_getmethis
_findmethis


Documentation:




  • re.MULTILINE

  • re.DOTALL






  • Silly of me: I always check the regex but not the Python. I will try this.
    – fastlearner
    2 hours ago

  • But this also finds words where the underscore is in the middle of a variable name.
    – fastlearner
    50 mins ago

  • @fastlearner Then adjust the pattern, so that (_[aA-zZ]*) is only allowed after a newline and spaces. See the edit. If you want to play with regex, use regex101.com in Python mode: copy your text and pattern into it and modify the pattern until it fits. Your example text did not contain any pattern "to be excluded".
    – Patrick Artner
    31 mins ago
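
The earlier revision of the answer is not shown in this thread, so the "loose" pattern below is only a guess at the kind of pattern that picks up an underscore in the middle of a name. The sketch contrasts it with a pattern that allows nothing but whitespace between the marker and the name.

```python
import re

text = "/*~T*/\n_keep = 1;\n/*~T*/\ndo_not_keep = 0;\n"

# Gap of "anything": the lazy dot can skip over "do" and capture "_not_keep".
loose = re.findall(r'/\*~T\*/.*?(_\w*)', text, re.DOTALL)
# Only whitespace (including the newline) may sit between marker and name.
tight = re.findall(r'/\*~T\*/\s*(_\w*)', text)

print(loose)  # → ['_keep', '_not_keep']
print(tight)  # → ['_keep']
```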


















up vote
0
down vote













You need to read the file in as a whole with fh.read() and make sure you amend the pattern to only match letters since [aA-zZ] matches more than just letters.



The pattern I suggest is



(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)


See the regex demo. Note that I deliberately subtracted \n from the first \s* to make matching more efficient.
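
As a quick check of what "subtracting \n" means, this sketch tests a handful of whitespace characters against [^\S\n]. The sample string is made up for illustration.

```python
import re

# [^\S\n] reads as "not (non-whitespace or newline)", i.e. whitespace other than \n.
same_line_ws = re.compile(r'[^\S\n]')

matched = [c for c in " \t\r\n" if same_line_ws.fullmatch(c)]
print(matched)  # → [' ', '\t', '\r']
```

Because this class can never consume the newline itself, the regex engine has only one way to split the whitespace between the first quantifier and the literal \n that follows it.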



When reading files in, it is more convenient to use with so that you do not have to use .close():



import re
pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')

with open('filename.c', "r") as fh:
    contents = fh.read()
with open("output.txt", "w") as output:
    output.write("\n".join([x.group(3) for x in pattern.finditer(contents)]))
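
If the C identifiers may contain digits or further underscores, [a-zA-Z]* would cut them short. A possible variation, sketched here with a hypothetical helper name (extract_marked_names) and an in-memory sample instead of a file, uses \w* for the name:

```python
import re
from typing import List

# Hypothetical helper, not part of the answer above.
MARKED_NAME = re.compile(r'/\*~T\*/[^\S\n]*\n\s*(_\w*)')

def extract_marked_names(c_source: str) -> List[str]:
    """Return identifiers starting with '_' that directly follow a /*~T*/ line."""
    return MARKED_NAME.findall(c_source)

sample = (
    "if(condition)\n/*~T*/\n{\n/*~T*/\n_getmethis = FALSE;\n/*~T*/\n}\n"
    "/*~T*/\n_findmethis = FALSE;\n/*~T*/\n_find_that2 = True;\n"
)
print(extract_marked_names(sample))
# → ['_getmethis', '_findmethis', '_find_that2']
```

With [a-zA-Z]* in place of \w*, the last name would come back as _find.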





First answer: answered 2 hours ago by Patrick Artner (edited 33 mins ago)












Second answer: answered 11 mins ago by Wiktor Stribiżew






























             
