Storing multi-byte data in a BLOB for single-byte Oracle deployments
Based on our client requirements, we configure our Oracle (version 12c) deployments to support either single-byte or multi-byte data through the database character set. We need to cache third-party multi-byte data (JSON) for performance reasons. We found that we can encode the data as UTF-8, convert it to bytes, and persist it in a BLOB column of an Oracle table. This is a hack that lets us store multi-byte data in single-byte deployments. It comes with certain limitations:
- The data cannot be queried or updated through SQL code (stored procedures).
- Search operations, e.g. with LIKE, cannot be performed.
- There is marshaling and unmarshaling overhead for every operation at the application layer (Java); a sketch of this follows below.
Assuming we accept these limitations, are there any other drawbacks we should be aware of?
Thanks.
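For illustration, here is a minimal sketch of that marshaling at the Java layer (JDBC assumed; the json_cache table and its columns are made-up names for the example, not our real schema):

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JsonBlobCache {

    // Marshal: encode the JSON text as UTF-8 and store the raw bytes in the BLOB column.
    public void put(Connection con, String key, String json) throws SQLException {
        String sql = "INSERT INTO json_cache (cache_key, payload) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, key);
            ps.setBytes(2, json.getBytes(StandardCharsets.UTF_8));
            ps.executeUpdate();
        }
    }

    // Unmarshal: read the bytes back and decode them as UTF-8 again.
    public String get(Connection con, String key) throws SQLException {
        String sql = "SELECT payload FROM json_cache WHERE cache_key = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? new String(rs.getBytes(1), StandardCharsets.UTF_8) : null;
            }
        }
    }
}
```

Because the database never interprets the bytes as characters, no character-set conversion is applied to them, which is both why the data survives the single-byte setting and why it cannot be searched with LIKE.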
oracle
asked Nov 22 at 3:35 by Andy Dufresne
1) NVARCHAR, NCLOB and NCHAR columns should be able to store multi-byte data even on single-byte installations. Can't you just declare all columns that are expected to contain multi-byte data as Nxxxx columns?
– Carlo Sirna, Nov 22 at 6:17
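A sketch of what that might look like from JDBC (the cache_n table and its columns are made up for illustration; setNCharacterStream and getNClob are the standard JDBC calls that bind and read through the national character set rather than the database character set):

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.NClob;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NclobCache {

    // Store the JSON text in an NCLOB column; NCLOB data lives in the national
    // character set (typically a Unicode one), not the single-byte database character set.
    public void put(Connection con, String key, String json) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO cache_n (cache_key, payload) VALUES (?, ?)")) {
            ps.setString(1, key);
            ps.setNCharacterStream(2, new StringReader(json), json.length());
            ps.executeUpdate();
        }
    }

    public String get(Connection con, String key) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT payload FROM cache_n WHERE cache_key = ?")) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                NClob payload = rs.getNClob(1);
                return payload.getSubString(1, (int) payload.length());
            }
        }
    }
}
```

Unlike the BLOB workaround, data stored this way stays visible to SQL and PL/SQL.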
2) If the requirement is only for JSON data: it is entirely possible to store JSON as pure ASCII by escaping every character whose code is greater than 127. For example, the JSON string '{"UnicodeCharsTest":"ni\u00f1o"}' represents the very same object as '{"UnicodeCharsTest":"niño"}'. You could re-encode all JSON strings this way before storing them. And Oracle 12 has both the JSON_VALUE function and the "column IS JSON" constraint, which let you correctly query values stored in JSON documents (you don't have to decode the escape sequences yourself).
– Carlo Sirna, Nov 22 at 6:25
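To make the escaping step concrete, a small sketch in plain Java (no schema assumptions; many JSON libraries can also be configured to emit ASCII-only output during serialization):

```java
public final class JsonAscii {

    // Replace every character above 127 with a JSON Unicode escape
    // (backslash, 'u', four hex digits) so the output is pure ASCII.
    // Characters outside the BMP come out as surrogate-pair escapes, which JSON allows.
    public static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 127) {
                out.append(String.format("\\u%04x", c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The escaped text fits any single-byte character set, and because the escapes are part of standard JSON, the JSON_VALUE / IS JSON features mentioned above can still work with it.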
Nowadays the default is NLS_CHARACTERSET=AL32UTF8, i.e. UTF-8. Of course UTF-8 also covers single-byte characters. Why do you still want to use a single-byte character set in 2018?
– Wernfried Domscheit, Nov 22 at 9:55
@AndyDufresne NCLOB/NCHAR/NVARCHAR have always been the official types to use for multi-byte character strings. It has always worked this way. What I am suggesting is not a hack.
– Carlo Sirna, Nov 22 at 17:47
@AndyDufresne: let me elaborate: any installation of Oracle supports TWO character sets: the normal character set (which in old versions of Oracle defaulted to the single-byte character set matching the language chosen during installation, and which is used for all normal VARCHAR, CHAR and CLOB fields, and also for table names, column names, etc.) and the "national" character set used for storing strings with characters outside that set (Nxxx columns). Personally, I have never found an Oracle installation where the character set used for these columns isn't a Unicode charset.
– Carlo Sirna, Nov 22 at 17:58
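For anyone who wants to check what their own installation uses, both character sets can be read from the NLS_DATABASE_PARAMETERS data dictionary view. A minimal JDBC sketch:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class CharsetCheck {

    // Print the database character set (used by VARCHAR2/CHAR/CLOB) and the
    // national character set (used by NVARCHAR2/NCHAR/NCLOB).
    public static void printCharsets(Connection con) throws SQLException {
        String sql = "SELECT parameter, value FROM nls_database_parameters "
                + "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')";
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}
```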