File watch in S3 and send the particular path to a program
I am new to S3 bucket processing.
I run my Hive scripts on an EC2 instance, and the results are saved as .csv files in their respective folders in S3, according to the script. I now need a file watch that detects whenever a new .csv file is written or overwritten in any of those S3 folders, sends the full path of that .csv to my Python program, and runs the program so that it saves output.csv in the same folder. It would be helpful if anyone could suggest some approaches that I could pick up and implement.
hadoop amazon-s3 amazon-ec2 hive
AWS Lambda is typically used for file watching.
– cricket_007
Nov 22 at 17:24
edited Nov 22 at 9:56
asked Nov 22 at 9:46
Vijaya Seetharaman
1 Answer
- You can use Spark Streaming to monitor a directory and kick off work when new entries are added. This requires you to run a Spark cluster all the time.
- You can set up S3 itself to send events, through S3 Event Notifications, to Amazon's queue service (SQS) or to AWS Lambda.
Option #2 is going to be the lowest cost and most reliable.
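As a rough illustration of option #2, here is a minimal sketch of the part that turns an S3 event notification into the full paths the question asks for. It assumes the standard notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with the key URL-encoded). The names `csv_paths_from_event`, `lambda_handler` and `OUTPUT_NAME` are hypothetical, and the actual processing step is left as a placeholder. It skips `output.csv` itself, since writing the result into the same folder would otherwise trigger the watcher again.

```python
import posixpath
from urllib.parse import unquote_plus

# Assumed output file name, taken from the question.
OUTPUT_NAME = "output.csv"


def csv_paths_from_event(event):
    """Return (input_path, output_path) pairs for each new .csv in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # not a CSV result file
        if posixpath.basename(key) == OUTPUT_NAME:
            continue  # our own output; skipping it avoids a trigger loop
        out_key = posixpath.join(posixpath.dirname(key), OUTPUT_NAME)
        pairs.append((f"s3://{bucket}/{key}", f"s3://{bucket}/{out_key}"))
    return pairs


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; replace the print with the real work."""
    pairs = csv_paths_from_event(event)
    for src, dst in pairs:
        print(f"would process {src} -> {dst}")
    return {"processed": len(pairs)}
```

If the notifications are sent to the queue service instead of Lambda, each message body carries the same JSON document, so a process on the EC2 instance can poll the queue, `json.loads` the body, and pass the result to the same function. Polling needs no web server on the instance.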
Is it possible with an Oozie Coordinator running on the EC2 instance? I am asking because my Python script resides on the EC2 instance, and the instance does not have a web server enabled for S3 Event Notifications. I am unable to proceed with only the Amazon documentation. I wanted a basic explanation rather than something too complicated.
– Vijaya Seetharaman
Nov 26 at 11:56
no idea, sorry.
– Steve Loughran
Nov 26 at 14:42
answered Nov 24 at 17:21
Steve Loughran