Unable to add text and link to new tag in BeautifulSoup












I scraped the following HTML to get link information, created a new tag, added the link to the new tag, and then tried to append that tag to another document, but all HTML formatting was lost:



from bs4 import BeautifulSoup, NavigableString

data = """
<div class="Answer">
1. BOUNDARIES - EPB &amp; APL&nbsp;<i>(inferior)</i>, EPL&nbsp;<i>(superior).&nbsp;</i><div>2. FLOOR (proximal to distal) - radial styloid =&gt; scaphoid =&gt; trapezium =&gt; 1st MC base.&nbsp;<br /><div>3. CONTENTS - cutaneous branches of radial nerve&nbsp;<i>(on the roof),</i>&nbsp;cephalic vein&nbsp;<i>(begins here),</i>&nbsp;&nbsp;radial artery&nbsp;<i>(on the floor).</i></div></div><div><br /></div><div><img src="paste-27a44c801f0776d91f5f6a16a963bff67f0e8ef3.jpg" /><br /></div><div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
image_link = soup.find('div').find('b').next.next
print(image_link)


I scraped the above data to get the following reference link (this is the format I require):



Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].



But adding the above reference link to a new tag loses all HTML formatting:



p_tag = soup.new_tag('p')
p_tag.append(soup.new_tag('br'))
p_tag.append(soup.new_tag('b'))
p_tag.b.append("Image: ")
p_tag.append(NavigableString(image_link))
print(p_tag)


Returns:



<p><br/><b>Image: </b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</p>


All HTML formatting is lost. What do I do?










      python beautifulsoup






      edited Nov 22 at 19:56

























      asked Nov 22 at 18:21









      Code Monkey

























          1 Answer
          Because image_link is a NavigableString (a plain string), BeautifulSoup escapes characters such as < to &lt; when the tag is serialized. To keep the link as markup, parse the string into a new soup so it becomes Tag objects, then append that:



          ....
          p_tag.b.append("Image: ")
          # Parse the string so the <a ...> text becomes real Tag objects
          image_tag = BeautifulSoup(image_link, 'html.parser')
          p_tag.append(image_tag)
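
The re-parsing approach can be sketched end to end as follows. This is a minimal example under assumptions, not the answerer's exact code: the markup is a shortened stand-in for the question's data, `fragment` and `b_tag` are names chosen here for illustration, and the parsed nodes are moved one at a time instead of appending the soup object itself.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup: the link is stored as
# escaped text (&lt;a ...&gt;), not as a real <a> element.
data = (
    '<div><b>Image:&nbsp;</b>Case courtesy of '
    '&lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;.</div>'
)
soup = BeautifulSoup(data, "html.parser")

# The text node right after <b> holds the link markup as plain characters.
image_link = soup.find("b").next_sibling

p_tag = soup.new_tag("p")
b_tag = soup.new_tag("b")
b_tag.string = "Image: "
p_tag.append(b_tag)

# Re-parse the string so "<a ...>" becomes a Tag, then move the parsed
# nodes into the new <p>. Copy the list first because append() detaches
# each node from the fragment while we iterate.
fragment = BeautifulSoup(str(image_link), "html.parser")
for node in list(fragment.contents):
    p_tag.append(node)

print(p_tag)
```

After this, `p_tag.find("a")` returns a real tag, so the link can be inspected or modified like any other element.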


          Or keep the NavigableString and unescape the serialized result:



          from html import unescape

          ....
          p_tag.append(NavigableString(image_link))
          # Serialize the tag first, then turn &lt; and &gt; back into < and >
          unescaped_p = unescape(str(p_tag))
          print(unescaped_p)
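
One caveat with the unescape approach: `html.unescape` acts on the whole serialized string, so any entity that was meant to stay escaped is decoded as well. A small stdlib-only sketch of that effect, where the sample caption text is made up for illustration:

```python
from html import escape, unescape

# Hypothetical serialized output: the link markup was escaped on output,
# and the caption also contains a literal "<" that should stay escaped.
link = '<a href="https://radiopaedia.org/">Radiopaedia.org</a>'
serialized = (
    "<p>"
    + escape("size < 5 mm, ", quote=False)
    + escape(link, quote=False)
    + "</p>"
)

# unescape() decodes every entity, not only the ones that form the link.
restored = unescape(serialized)
print(serialized)
print(restored)
```

If the text can contain such literal characters, re-parsing only the link string is probably the safer of the two options.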





          answered Nov 22 at 21:27









          ewwink













          • Dude! You da man! Are you from Malaysia? I am a hobbyist and do a lot of work with BeautifulSoup and Scrapy, and I'm thinking of making a Discord/Slack group. Would you be interested in joining?
            – Code Monkey
            Nov 22 at 22:45

















