Vader Sentiment with multiple PDF

I recently merged 20 PDFs into one PDF using Adobe, and I imported the merged PDF into Python with this code:

    from PyPDF2 import PdfFileReader, PdfFileWriter

    pdf_file = open('/Users/cj/Desktop/PEI.pdf', 'rb')
    newfile = open('rjtjj.txt', 'w')
    pdf_reader = PdfFileReader(pdf_file)
    pdf_writer = PdfFileWriter()
    print(pdf_reader.numPages)
    n = pdf_reader.getNumPages()
    for i in range(0, n-1):
        # pdf_writer.addPage(pdf_reader.getPage(i))
        gft = pdf_reader.getPage(i)
        newfile.write(gft.extractText())
    pdf_file.close()
    newfile.close()

I'm trying to use vaderSentiment to analyse the PDF. What I want is to analyse each of the 20 merged PDFs individually.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    with open('rjtjj.txt', 'r') as f:
        for line in f.read().split("\n"):
            vs = analyzer.polarity_scores(line)

I know my code is wrong, because it only gives me the first line of the entire PDF. I am new to this and would really appreciate your help.
Thank you

asked Nov 22 at 13:55

user10277070

python-3.x

1 Answer
Your problem really isn't about Vader sentiment analysis -- it is about correct extraction of text from a PDF.

PostScript's Forth-like interpreter is Turing-complete, so some PDF documents are "hard" to parse. You didn't post your PDF, so we can only guess at the issue. You might try poppler's pdftotext command-line utility instead. Ubuntu calls the package "poppler-utils"; on a Mac you would use brew install poppler. Running the file through pdf2ps and ps2ascii will sometimes give different, and helpful, results.
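If pdftotext is installed and on your PATH, a minimal sketch of calling it from Python might look like the following. The helper names build_pdftotext_command and pdf_to_text are made up for illustration; -f and -l are pdftotext's first-page and last-page options, which could help if you know where each of the 20 original documents starts and ends in the merged file.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, first_page=None, last_page=None):
    """Build the argument list for poppler's pdftotext CLI (hypothetical helper)."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert (inclusive)
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path, first_page=None, last_page=None):
    """Run pdftotext, then read back the text file it wrote."""
    cmd = build_pdftotext_command(pdf_path, txt_path, first_page, last_page)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pdftotext fails
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

For example, pdf_to_text("PEI.pdf", "PEI.txt") would convert the whole file, while pdf_to_text("PEI.pdf", "doc1.txt", 1, 12) would convert only pages 1 to 12; those page numbers are placeholders, not something known about your PDF.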

If you continue to find it difficult to retrieve proper text from the PDF, you may want to contact whoever produced the PDF and settle on supplying the same information in a revised format.

answered Nov 22 at 22:21

J_H


I have installed poppler with Homebrew. What code should I use? Can I use it from Python?
– user10277070
Nov 22 at 23:58

I was suggesting two things: (1) the PDF format can be "really hard" to extract ASCII text from, and (2) different PDF parsers come at it from different directions, so one might win in a certain situation, such as table-formatted pages, where another happens to lose. You didn't disclose the PDF of interest, how it was produced, or how it might be produced by alternate means, including output to .TXT or .CSV. If $ pdftotext PEI.pdf wins, and that is a big "if", then Python could simply consume the resulting PEI.txt ASCII text, without needing a PDF library at all.
– J_H
Nov 23 at 0:40
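Assuming pdftotext does produce usable text, one way Python might consume it per document is sketched below. By default pdftotext separates pages with form feed characters, so the text can be split back into pages and regrouped into the original documents, provided you know how many pages each one had. The function names and the page_counts list are hypothetical, and scoring a whole document in a single call is only one option: VADER was tuned on short texts, so averaging sentence-level scores may turn out to be more meaningful.

```python
def split_pages(text):
    """Split pdftotext output into pages on form feed characters."""
    pages = text.split("\f")
    # pdftotext also emits a form feed after the last page, leaving an empty tail.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def group_pages(pages, page_counts):
    """Join consecutive pages into documents, given each document's page count."""
    if sum(page_counts) != len(pages):
        raise ValueError("page_counts must add up to the number of pages")
    documents, start = [], 0
    for count in page_counts:
        documents.append("\n".join(pages[start:start + count]))
        start += count
    return documents


def score_documents(documents, score_fn):
    """Return one score per document, keeping every result instead of overwriting."""
    return [score_fn(doc) for doc in documents]


def make_vader_scorer():
    """Return VADER's polarity_scores method (requires the vaderSentiment package)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores
```

With the text of PEI.txt in a variable named text, score_documents(group_pages(split_pages(text), page_counts), make_vader_scorer()) would return a list with one dictionary of neg, neu, pos and compound scores per original document.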
