Hive - Parquet format - OR clause in where not working as expected












1














I am creating a table and inserting data like below



CREATE TABLE `mydb.mytable`(
`seq_num` decimal(18,0),
`ins_upd_flag` char(1)
)
stored as parquet;

INSERT OVERWRITE TABLE mydb.mytable
SELECT 10457 SEQ_NUM,'I' INS_UPD_FLAG UNION ALL
SELECT 10462,'I' UNION ALL
SELECT 10461,'I' UNION ALL
SELECT 10460,'I' UNION ALL
SELECT 10459,'I' UNION ALL
SELECT 10456,'I' UNION ALL
SELECT 10458,'I';


As you can see above, the second column value is just 'I' and it does not have any extra characters.



When I run the following query, it does not result anything



SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
FROM mydb.mytable T
WHERE SEQ_NUM <> 0 OR INS_UPD_FLAG <> 'I' ;


However, if I run the below query it returns all the inserted rows as expected



SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG)) -- both length =1
FROM mydb.mytable T
WHERE SEQ_NUM <> 0 OR TRIM(INS_UPD_FLAG) <> 'I' ;


Any explanation on why this is happening and how to resolve the same? I need the results without using trim



Please note that this issue is happening only with OR condition. If I run the below query, I am getting results



SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
FROM mydb.mytable T
WHERE SEQ_NUM <> 0 and INS_UPD_FLAG = 'I' ;


Also, if I create table with ORC or Text Format, I am getting results without trim.










share|improve this question



























    1














    I am creating a table and inserting data like below



    CREATE TABLE `mydb.mytable`(
    `seq_num` decimal(18,0),
    `ins_upd_flag` char(1)
    )
    stored as parquet;

    INSERT OVERWRITE TABLE mydb.mytable
    SELECT 10457 SEQ_NUM,'I' INS_UPD_FLAG UNION ALL
    SELECT 10462,'I' UNION ALL
    SELECT 10461,'I' UNION ALL
    SELECT 10460,'I' UNION ALL
    SELECT 10459,'I' UNION ALL
    SELECT 10456,'I' UNION ALL
    SELECT 10458,'I';


    As you can see above, the second column value is just 'I' and it does not have any extra characters.



    When I run the following query, it does not result anything



    SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
    FROM mydb.mytable T
    WHERE SEQ_NUM <> 0 OR INS_UPD_FLAG <> 'I' ;


    However, if I run the below query it returns all the inserted rows as expected



    SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG)) -- both length =1
    FROM mydb.mytable T
    WHERE SEQ_NUM <> 0 OR TRIM(INS_UPD_FLAG) <> 'I' ;


    Any explanation on why this is happening and how to resolve the same? I need the results without using trim



    Please note that this issue is happening only with OR condition. If I run the below query, I am getting results



    SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
    FROM mydb.mytable T
    WHERE SEQ_NUM <> 0 and INS_UPD_FLAG = 'I' ;


    Also, if I create table with ORC or Text Format, I am getting results without trim.










    share|improve this question

























      1












      1








      1







      I am creating a table and inserting data like below



      CREATE TABLE `mydb.mytable`(
      `seq_num` decimal(18,0),
      `ins_upd_flag` char(1)
      )
      stored as parquet;

      INSERT OVERWRITE TABLE mydb.mytable
      SELECT 10457 SEQ_NUM,'I' INS_UPD_FLAG UNION ALL
      SELECT 10462,'I' UNION ALL
      SELECT 10461,'I' UNION ALL
      SELECT 10460,'I' UNION ALL
      SELECT 10459,'I' UNION ALL
      SELECT 10456,'I' UNION ALL
      SELECT 10458,'I';


      As you can see above, the second column value is just 'I' and it does not have any extra characters.



      When I run the following query, it does not result anything



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 OR INS_UPD_FLAG <> 'I' ;


      However, if I run the below query it returns all the inserted rows as expected



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG)) -- both length =1
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 OR TRIM(INS_UPD_FLAG) <> 'I' ;


      Any explanation on why this is happening and how to resolve the same? I need the results without using trim



      Please note that this issue is happening only with OR condition. If I run the below query, I am getting results



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 and INS_UPD_FLAG = 'I' ;


      Also, if I create table with ORC or Text Format, I am getting results without trim.










      share|improve this question













      I am creating a table and inserting data like below



      CREATE TABLE `mydb.mytable`(
      `seq_num` decimal(18,0),
      `ins_upd_flag` char(1)
      )
      stored as parquet;

      INSERT OVERWRITE TABLE mydb.mytable
      SELECT 10457 SEQ_NUM,'I' INS_UPD_FLAG UNION ALL
      SELECT 10462,'I' UNION ALL
      SELECT 10461,'I' UNION ALL
      SELECT 10460,'I' UNION ALL
      SELECT 10459,'I' UNION ALL
      SELECT 10456,'I' UNION ALL
      SELECT 10458,'I';


      As you can see above, the second column value is just 'I' and it does not have any extra characters.



      When I run the following query, it does not result anything



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 OR INS_UPD_FLAG <> 'I' ;


      However, if I run the below query it returns all the inserted rows as expected



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG)) -- both length =1
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 OR TRIM(INS_UPD_FLAG) <> 'I' ;


      Any explanation on why this is happening and how to resolve the same? I need the results without using trim



      Please note that this issue is happening only with OR condition. If I run the below query, I am getting results



      SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
      FROM mydb.mytable T
      WHERE SEQ_NUM <> 0 and INS_UPD_FLAG = 'I' ;


      Also, if I create table with ORC or Text Format, I am getting results without trim.







      hive hiveql hadoop2






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 22 at 18:38









      Mohan

      3261210




      3261210
























          3 Answers
          3






          active

          oldest

          votes


















          1














          I managed to re-pro your issue and getting same results as you have explained.



          I ran a query to see what exactly your where condition is producing:



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T


          and results are as following:



          |seq_num|ins_upd_flag|_c1|_c2|_c3 |_c4 |_c5 |
          |10457 |I |1 |1 |true|true|true|
          |10462 |I |1 |1 |true|true|true|
          |10461 |I |1 |1 |true|true|true|
          |10460 |I |1 |1 |true|true|true|
          |10459 |I |1 |1 |true|true|true|
          |10456 |I |1 |1 |true|true|true|
          |10458 |I |1 |1 |true|true|true|


          As you can see it is returning true for your join condition.



          Out of curiosity, I wrapped the where condition around () and got the results without trim.



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T
          where (SEQ_NUM <> 0 or INS_UPD_FLAG = 'I')


          Results: Same as previous query.



          I am using Hive 1.2.1000 via Hue 2.6.1-227.



          I don't know what magic () is doing and why query is not working without it.






          share|improve this answer





















          • Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
            – Mohan
            Nov 24 at 17:55



















          0














          may i know reason why you dont want to use the trim function.As hadoop stores the data in files there is alot of chances that it can add the space while inserting.i faced this issue a lot of time while using parque format.it depends sometimes you face it some time you wont face it.
          whatever it is can you run you query only based on this condition
          SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
          FROM mydb.mytable T
          WHERE SEQ_NUM <> 0



          if it is not returning anything the way data inserted is wrong..please try it let me know





          share|improve this answer





















          • I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
            – Mohan
            Nov 23 at 6:56










          • *I am not sure trim(partitioned_column) will be used for partition scan
            – Mohan
            Nov 23 at 7:02



















          0














          The first thing is that I am not sure why you aren't seeing the expected results. I ran a very similar test, below, and received the expected results. What version of Hive are you on? I did the same tests on hive-1.1.0+cdh5.13.3+1307 (Cloudera distro w/v5.13):



          CREATE TABLE test_cond_parq
          (
          seq_num decimal(10,0),
          ind string
          )
          STORED AS PARQUET;

          INSERT INTO TABLE test_cond_parq
          SELECT 10457, 'I' UNION ALL
          SELECT 10458, 'I'
          ;


          Then I ran the following query, similar to yours above...



          SELECT *, LENGTH(ind) len_ind, LENGTH(TRIM(ind)) len_trim_ind
          FROM test_cond_parq
          WHERE seq_num <> 0 OR ind <> 'I';


          This returned the following results:



          10457 | I | 1 | 1
          10458 | I | 1 | 1





          share|improve this answer























          • I am using Hive 1.2.1000.2.6.5.0-292
            – Mohan
            Nov 23 at 5:39










          • Thank you for trying it out. I am surprised why it is not working in the version I am using.
            – Mohan
            Nov 23 at 5:39










          • NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
            – ajdams
            Nov 23 at 13:10










          • I ran my queries directly in Hive command line interface
            – Mohan
            Nov 23 at 14:01











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436581%2fhive-parquet-format-or-clause-in-where-not-working-as-expected%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          I managed to re-pro your issue and getting same results as you have explained.



          I ran a query to see what exactly your where condition is producing:



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T


          and results are as following:



          |seq_num|ins_upd_flag|_c1|_c2|_c3 |_c4 |_c5 |
          |10457 |I |1 |1 |true|true|true|
          |10462 |I |1 |1 |true|true|true|
          |10461 |I |1 |1 |true|true|true|
          |10460 |I |1 |1 |true|true|true|
          |10459 |I |1 |1 |true|true|true|
          |10456 |I |1 |1 |true|true|true|
          |10458 |I |1 |1 |true|true|true|


          As you can see it is returning true for your join condition.



          Out of curiosity, I wrapped the where condition around () and got the results without trim.



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T
          where (SEQ_NUM <> 0 or INS_UPD_FLAG = 'I')


          Results: Same as previous query.



          I am using Hive 1.2.1000 via Hue 2.6.1-227.



          I don't know what magic () is doing and why query is not working without it.






          share|improve this answer





















          • Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
            – Mohan
            Nov 24 at 17:55
















          1














          I managed to re-pro your issue and getting same results as you have explained.



          I ran a query to see what exactly your where condition is producing:



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T


          and results are as following:



          |seq_num|ins_upd_flag|_c1|_c2|_c3 |_c4 |_c5 |
          |10457 |I |1 |1 |true|true|true|
          |10462 |I |1 |1 |true|true|true|
          |10461 |I |1 |1 |true|true|true|
          |10460 |I |1 |1 |true|true|true|
          |10459 |I |1 |1 |true|true|true|
          |10456 |I |1 |1 |true|true|true|
          |10458 |I |1 |1 |true|true|true|


          As you can see it is returning true for your join condition.



          Out of curiosity, I wrapped the where condition around () and got the results without trim.



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T
          where (SEQ_NUM <> 0 or INS_UPD_FLAG = 'I')


          Results: Same as previous query.



          I am using Hive 1.2.1000 via Hue 2.6.1-227.



          I don't know what magic () is doing and why query is not working without it.






          share|improve this answer





















          • Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
            – Mohan
            Nov 24 at 17:55














          1












          1








          1






          I managed to re-pro your issue and getting same results as you have explained.



          I ran a query to see what exactly your where condition is producing:



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T


          and results are as following:



          |seq_num|ins_upd_flag|_c1|_c2|_c3 |_c4 |_c5 |
          |10457 |I |1 |1 |true|true|true|
          |10462 |I |1 |1 |true|true|true|
          |10461 |I |1 |1 |true|true|true|
          |10460 |I |1 |1 |true|true|true|
          |10459 |I |1 |1 |true|true|true|
          |10456 |I |1 |1 |true|true|true|
          |10458 |I |1 |1 |true|true|true|


          As you can see it is returning true for your join condition.



          Out of curiosity, I wrapped the where condition around () and got the results without trim.



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T
          where (SEQ_NUM <> 0 or INS_UPD_FLAG = 'I')


          Results: Same as previous query.



          I am using Hive 1.2.1000 via Hue 2.6.1-227.



          I don't know what magic () is doing and why query is not working without it.






          share|improve this answer












          I managed to re-pro your issue and getting same results as you have explained.



          I ran a query to see what exactly your where condition is producing:



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T


          and results are as following:



          |seq_num|ins_upd_flag|_c1|_c2|_c3 |_c4 |_c5 |
          |10457 |I |1 |1 |true|true|true|
          |10462 |I |1 |1 |true|true|true|
          |10461 |I |1 |1 |true|true|true|
          |10460 |I |1 |1 |true|true|true|
          |10459 |I |1 |1 |true|true|true|
          |10456 |I |1 |1 |true|true|true|
          |10458 |I |1 |1 |true|true|true|


          As you can see it is returning true for your join condition.



          Out of curiosity, I wrapped the where condition around () and got the results without trim.



          SELECT
          T.*,
          length(INS_UPD_FLAG),
          length(trim(INS_UPD_FLAG)),
          SEQ_NUM <> 0,
          INS_UPD_FLAG = 'I',
          SEQ_NUM <> 0 or INS_UPD_FLAG = 'I'
          FROM mydb.mytable T
          where (SEQ_NUM <> 0 or INS_UPD_FLAG = 'I')


          Results: Same as previous query.



          I am using Hive 1.2.1000 via Hue 2.6.1-227.



          I don't know what magic () is doing and why query is not working without it.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 24 at 5:35









          Ambrish

          2,71421631




          2,71421631












          • Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
            – Mohan
            Nov 24 at 17:55


















          • Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
            – Mohan
            Nov 24 at 17:55
















          Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
          – Mohan
          Nov 24 at 17:55




          Good finding. Unfortunately, the brackets '()' does not do anything in my version of hive (Hive 1.2.1000.2.6.5.0-292 on HDP). This looks like a bug with Hive 1.2
          – Mohan
          Nov 24 at 17:55













          0














          may i know reason why you dont want to use the trim function.As hadoop stores the data in files there is alot of chances that it can add the space while inserting.i faced this issue a lot of time while using parque format.it depends sometimes you face it some time you wont face it.
          whatever it is can you run you query only based on this condition
          SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
          FROM mydb.mytable T
          WHERE SEQ_NUM <> 0



          if it is not returning anything the way data inserted is wrong..please try it let me know





          share|improve this answer





















          • I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
            – Mohan
            Nov 23 at 6:56










          • *I am not sure trim(partitioned_column) will be used for partition scan
            – Mohan
            Nov 23 at 7:02
















          0














          may i know reason why you dont want to use the trim function.As hadoop stores the data in files there is alot of chances that it can add the space while inserting.i faced this issue a lot of time while using parque format.it depends sometimes you face it some time you wont face it.
          whatever it is can you run you query only based on this condition
          SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
          FROM mydb.mytable T
          WHERE SEQ_NUM <> 0



          if it is not returning anything the way data inserted is wrong..please try it let me know





          share|improve this answer





















          • I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
            – Mohan
            Nov 23 at 6:56










          • *I am not sure trim(partitioned_column) will be used for partition scan
            – Mohan
            Nov 23 at 7:02














          0












          0








          0






          may i know reason why you dont want to use the trim function.As hadoop stores the data in files there is alot of chances that it can add the space while inserting.i faced this issue a lot of time while using parque format.it depends sometimes you face it some time you wont face it.
          whatever it is can you run you query only based on this condition
          SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
          FROM mydb.mytable T
          WHERE SEQ_NUM <> 0



          if it is not returning anything the way data inserted is wrong..please try it let me know





          share|improve this answer












          may i know reason why you dont want to use the trim function.As hadoop stores the data in files there is alot of chances that it can add the space while inserting.i faced this issue a lot of time while using parque format.it depends sometimes you face it some time you wont face it.
          whatever it is can you run you query only based on this condition
          SELECT T.*, length(INS_UPD_FLAG), length(trim(INS_UPD_FLAG))
          FROM mydb.mytable T
          WHERE SEQ_NUM <> 0



          if it is not returning anything the way data inserted is wrong..please try it let me know






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 23 at 3:27









          sivaraj

          143




          143












          • I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
            – Mohan
            Nov 23 at 6:56










          • *I am not sure trim(partitioned_column) will be used for partition scan
            – Mohan
            Nov 23 at 7:02


















          • I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
            – Mohan
            Nov 23 at 6:56










          • *I am not sure trim(partitioned_column) will be used for partition scan
            – Mohan
            Nov 23 at 7:02
















          I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
          – Mohan
          Nov 23 at 6:56




          I do not want to use trim because I am having a similar char column as partitioned column and I am not sure if a condition with trim(partitioned_column) will be used for predicate push down
          – Mohan
          Nov 23 at 6:56












          *I am not sure trim(partitioned_column) will be used for partition scan
          – Mohan
          Nov 23 at 7:02




          *I am not sure trim(partitioned_column) will be used for partition scan
          – Mohan
          Nov 23 at 7:02











          0














          The first thing is that I am not sure why you aren't seeing the expected results. I ran a very similar test, below, and received the expected results. What version of Hive are you on? I did the same tests on hive-1.1.0+cdh5.13.3+1307 (Cloudera distro w/v5.13):



          CREATE TABLE test_cond_parq
          (
          seq_num decimal(10,0),
          ind string
          )
          STORED AS PARQUET;

          INSERT INTO TABLE test_cond_parq
          SELECT 10457, 'I' UNION ALL
          SELECT 10458, 'I'
          ;


          Then I ran the following query, similar to yours above...



          SELECT *, LENGTH(ind) len_ind, LENGTH(TRIM(ind)) len_trim_ind
          FROM test_cond_parq
          WHERE seq_num <> 0 OR ind <> 'I';


          This returned the following results:



          10457 | I | 1 | 1
          10458 | I | 1 | 1





          share|improve this answer























          • I am using Hive 1.2.1000.2.6.5.0-292
            – Mohan
            Nov 23 at 5:39










          • Thank you for trying it out. I am surprised why it is not working in the version I am using.
            – Mohan
            Nov 23 at 5:39










          • NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
            – ajdams
            Nov 23 at 13:10










          • I ran my queries directly in Hive command line interface
            – Mohan
            Nov 23 at 14:01
















          0














          The first thing is that I am not sure why you aren't seeing the expected results. I ran a very similar test, below, and received the expected results. What version of Hive are you on? I did the same tests on hive-1.1.0+cdh5.13.3+1307 (Cloudera distro w/v5.13):



          CREATE TABLE test_cond_parq
          (
          seq_num decimal(10,0),
          ind string
          )
          STORED AS PARQUET;

          INSERT INTO TABLE test_cond_parq
          SELECT 10457, 'I' UNION ALL
          SELECT 10458, 'I'
          ;


          Then I ran the following query, similar to yours above...



          SELECT *, LENGTH(ind) len_ind, LENGTH(TRIM(ind)) len_trim_ind
          FROM test_cond_parq
          WHERE seq_num <> 0 OR ind <> 'I';


          This returned the following results:



          10457 | I | 1 | 1
          10458 | I | 1 | 1





          share|improve this answer























          • I am using Hive 1.2.1000.2.6.5.0-292
            – Mohan
            Nov 23 at 5:39










          • Thank you for trying it out. I am surprised why it is not working in the version I am using.
            – Mohan
            Nov 23 at 5:39










          • NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
            – ajdams
            Nov 23 at 13:10










          • I ran my queries directly in Hive command line interface
            – Mohan
            Nov 23 at 14:01














          0












          0








          0






          The first thing is that I am not sure why you aren't seeing the expected results. I ran a very similar test, below, and received the expected results. What version of Hive are you on? I did the same tests on hive-1.1.0+cdh5.13.3+1307 (Cloudera distro w/v5.13):



          CREATE TABLE test_cond_parq
          (
          seq_num decimal(10,0),
          ind string
          )
          STORED AS PARQUET;

          INSERT INTO TABLE test_cond_parq
          SELECT 10457, 'I' UNION ALL
          SELECT 10458, 'I'
          ;


          Then I ran the following query, similar to yours above...



          SELECT *, LENGTH(ind) len_ind, LENGTH(TRIM(ind)) len_trim_ind
          FROM test_cond_parq
          WHERE seq_num <> 0 OR ind <> 'I';


          This returned the following results:



          10457 | I | 1 | 1
          10458 | I | 1 | 1





          share|improve this answer














          The first thing is that I am not sure why you aren't seeing the expected results. I ran a very similar test, below, and received the expected results. What version of Hive are you on? I did the same tests on hive-1.1.0+cdh5.13.3+1307 (Cloudera distro w/v5.13):



          CREATE TABLE test_cond_parq
          (
          seq_num decimal(10,0),
          ind string
          )
          STORED AS PARQUET;

          INSERT INTO TABLE test_cond_parq
          SELECT 10457, 'I' UNION ALL
          SELECT 10458, 'I'
          ;


          Then I ran the following query, similar to yours above...



          SELECT *, LENGTH(ind) len_ind, LENGTH(TRIM(ind)) len_trim_ind
          FROM test_cond_parq
          WHERE seq_num <> 0 OR ind <> 'I';


          This returned the following results:



          10457 | I | 1 | 1
          10458 | I | 1 | 1






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 24 at 5:18









          Ambrish

          2,71421631




          2,71421631










          answered Nov 22 at 23:01









          ajdams

          2,0121117




          2,0121117












          • I am using Hive 1.2.1000.2.6.5.0-292
            – Mohan
            Nov 23 at 5:39










          • Thank you for trying it out. I am surprised why it is not working in the version I am using.
            – Mohan
            Nov 23 at 5:39










          • NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
            – ajdams
            Nov 23 at 13:10










          • I ran my queries directly in Hive command line interface
            – Mohan
            Nov 23 at 14:01


















          • I am using Hive 1.2.1000.2.6.5.0-292
            – Mohan
            Nov 23 at 5:39










          • Thank you for trying it out. I am surprised why it is not working in the version I am using.
            – Mohan
            Nov 23 at 5:39










          • NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
            – ajdams
            Nov 23 at 13:10










          • I ran my queries directly in Hive command line interface
            – Mohan
            Nov 23 at 14:01
















          I am using Hive 1.2.1000.2.6.5.0-292
          – Mohan
          Nov 23 at 5:39




          I am using Hive 1.2.1000.2.6.5.0-292
          – Mohan
          Nov 23 at 5:39












          Thank you for trying it out. I am surprised why it is not working in the version I am using.
          – Mohan
          Nov 23 at 5:39




          Thank you for trying it out. I am surprised why it is not working in the version I am using.
          – Mohan
          Nov 23 at 5:39












          NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
          – ajdams
          Nov 23 at 13:10




          NP. What SQL editor are you running your test in? I did it in HUE so it could perhaps be related to that?
          – ajdams
          Nov 23 at 13:10












          I ran my queries directly in Hive command line interface
          – Mohan
          Nov 23 at 14:01




          I ran my queries directly in Hive command line interface
          – Mohan
          Nov 23 at 14:01


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53436581%2fhive-parquet-format-or-clause-in-where-not-working-as-expected%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          What visual should I use to simply compare current year value vs last year in Power BI desktop

          How to ignore python UserWarning in pytest?

          Alexandru Averescu