What cost does a bloated object file carry?























While working on an embedded project, I came across a function that is called thousands of times over the application's lifetime, often in loops, dozens of times per second. I wondered whether I could reduce its cost, and I found that most of its arguments are known at compile time.



Let me illustrate it with an example.



Original hpp/cpp files can be approximated like this:



original.hpp:




void example(bool arg1, bool arg2, const char* data);


original.cpp:



#include "original.hpp"
#include <iostream>

void example(bool arg1, bool arg2, const char* data)
{
    if (arg1 && arg2)
    {
        std::cout << "Both true " << data << std::endl;
    }
    else if (!arg1 && arg2)
    {
        std::cout << "False and true " << data << std::endl;
    }
    else if (arg1 && !arg2)
    {
        std::cout << "True and false " << data << std::endl;
    }
    else
    {
        std::cout << "Both false " << data << std::endl;
    }
}


Let's assume that every single time the function is called, arg1 and arg2 are known at compile time. The data argument is not, and for a variety of reasons its processing cannot be moved into a header file.



However, all those if statements can be handled by the compiler with a little bit of template magic:



magic.hpp:



template<bool arg1, bool arg2>
void example(const char* data);


magic.cpp:



#include "magic.hpp"
#include <iostream>

// Primary template is only declared; each combination gets a full specialization.
template<bool arg1, bool arg2>
struct Processor;

template<>
struct Processor<true, true>
{
    static void process(const char* data)
    {
        std::cout << "Both true " << data << std::endl;
    }
};

template<>
struct Processor<false, true>
{
    static void process(const char* data)
    {
        std::cout << "False and true " << data << std::endl;
    }
};

template<>
struct Processor<true, false>
{
    static void process(const char* data)
    {
        std::cout << "True and false " << data << std::endl;
    }
};

template<>
struct Processor<false, false>
{
    static void process(const char* data)
    {
        std::cout << "Both false " << data << std::endl;
    }
};

template<bool arg1, bool arg2>
void example(const char* data)
{
    Processor<arg1, arg2>::process(data);
}

// Explicit instantiations, so the definition can stay out of the header.
template void example<true, true>(const char*);
template void example<false, true>(const char*);
template void example<true, false>(const char*);
template void example<false, false>(const char*);


As you can see, even in this tiny example the cpp file got significantly bigger than the original. But I did remove a few assembly instructions!
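For comparison, the same compile-time selection can be sketched more compactly with if constexpr. This is only a sketch and assumes a C++17 toolchain, which the question does not state. The names label, example_text and usage_if_constexpr are hypothetical, and the text is returned instead of printed so that the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the question): picks the prefix at compile time.
template <bool arg1, bool arg2>
constexpr const char* label()
{
    // if constexpr (C++17) discards the branches that are not taken,
    // so each instantiation keeps exactly one return statement.
    if constexpr (arg1 && arg2)
        return "Both true ";
    else if constexpr (!arg1 && arg2)
        return "False and true ";
    else if constexpr (arg1 && !arg2)
        return "True and false ";
    else
        return "Both false ";
}

// Hypothetical stand-in for example(): builds the text instead of printing it.
template <bool arg1, bool arg2>
std::string example_text(const char* data)
{
    return std::string(label<arg1, arg2>()) + data;
}

// Usage: the flags move from the argument list to the template argument list.
inline void usage_if_constexpr()
{
    assert((example_text<true, false>("payload") == "True and false payload"));
}
```

This removes the Processor specializations from the source, but it still produces one function per combination of arguments, so the object-size concern is unchanged.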



Now, my real-life case is a bit more complex, because instead of two bool arguments I have enums and structures. Long story short, there are about one thousand combinations in total, so I have that many lines of the form template void example<something>(const char*);



Of course I do not write them by hand but generate them with macros, yet the cpp file is still humongous compared to the original, and the object file is even worse.
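The macro-based generation described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the Mode enum, the variant_id function and the INSTANTIATE macros are hypothetical and not taken from the real project. It shows why the count grows quickly: one explicit instantiation is emitted per element of the Cartesian product of the parameter values (here 3 x 2 = 6).

```cpp
#include <cassert>

// Hypothetical enum standing in for the real project's parameters.
enum class Mode { Off, Slow, Fast };

// Declaration, as it would appear in the header.
template <Mode mode, bool flag>
int variant_id(const char* data);

// Definition, kept in the cpp file. It returns an index that identifies
// the combination, so each instantiation is easy to tell apart.
template <Mode mode, bool flag>
int variant_id(const char* data)
{
    (void)data;  // unused in this sketch
    return static_cast<int>(mode) * 2 + (flag ? 1 : 0);
}

// One explicit instantiation per (mode, flag) pair.
#define INSTANTIATE(mode, flag) \
    template int variant_id<mode, flag>(const char*);

// Expands one enum value into both bool variants.
#define INSTANTIATE_FLAGS(mode) \
    INSTANTIATE(mode, false)    \
    INSTANTIATE(mode, true)

INSTANTIATE_FLAGS(Mode::Off)
INSTANTIATE_FLAGS(Mode::Slow)
INSTANTIATE_FLAGS(Mode::Fast)

// Usage from a caller that only sees the declaration.
inline void usage_instantiations()
{
    assert((variant_id<Mode::Slow, true>("payload") == 3));
}
```

Each added parameter multiplies the number of emitted functions by the number of values it can take, which is how a handful of enums reaches about a thousand instantiations.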



All this in the name of removing several if statements and one switch statement.



My question is: is size the only problem with the template-magic approach? I wonder whether there is some hidden cost to having so many versions of the same function. Did I really save some resources, or just the opposite?


































  • Comments are not for extended discussion; this conversation has been moved to chat.
    – Samuel Liew
    Nov 21 at 2:13















c++ templates optimization embedded














edited Nov 22 at 14:00

























asked Nov 20 at 21:41









Darth Hunterix

























1 Answer




















accepted










The problem with increased binary size is almost never the storage of the file itself. The problem is that more code means a smaller fraction of the program's instructions is in cache at any given moment, which leads to cache misses. If you're calling the same instantiation in a tight loop, then having it do less work is great. But if you're constantly bouncing between different template instantiations, the cost of going to main memory to load instructions may be far higher than what you save by removing some instructions from inside the function.



This kind of thing can be VERY difficult to predict, though. The way to find the sweet spot in this (and any) type of optimization is to measure. It is also likely to change across platforms - especially in an embedded world.
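A measurement could start from something like the sketch below. It is only an illustration under several assumptions: std::chrono::steady_clock is assumed to be available (on a bare-metal target a hardware timer or cycle counter would likely take its place), and runtime_variant, template_variant and time_calls are hypothetical stand-ins, not the code from the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical runtime-branch version: flags are ordinary arguments.
inline std::uint32_t runtime_variant(bool arg1, bool arg2, std::uint32_t x)
{
    if (arg1 && arg2)
        return x + 1;
    else if (!arg1 && arg2)
        return x + 2;
    else if (arg1 && !arg2)
        return x + 3;
    else
        return x + 4;
}

// Hypothetical template version: flags are compile-time constants,
// so the whole selection reduces to a single constant per instantiation.
template <bool arg1, bool arg2>
std::uint32_t template_variant(std::uint32_t x)
{
    return x + (arg1 ? (arg2 ? 1u : 3u) : (arg2 ? 2u : 4u));
}

// Calls f(i) for i in [0, iterations) and returns the elapsed time.
template <typename F>
std::chrono::nanoseconds time_calls(F f, std::uint32_t iterations)
{
    using clock = std::chrono::steady_clock;
    std::uint32_t sink = 0;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i)
        sink += f(i);
    const auto stop = clock::now();
    // Store the result in a volatile so the loop is not optimized away.
    volatile std::uint32_t keep = sink;
    (void)keep;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Usage: time both versions with the same workload and compare the results.
inline void usage_measure()
{
    const auto a = time_calls(
        [](std::uint32_t x) { return runtime_variant(true, false, x); }, 100000);
    const auto b = time_calls(
        [](std::uint32_t x) { return template_variant<true, false>(x); }, 100000);
    assert(a.count() >= 0 && b.count() >= 0);
}
```

A loop like this exercises a single instantiation, which is the favorable case described above. To see the cache effect, the measured workload would need to cycle through many different instantiations, ideally in the same call pattern as the real application.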

























  • What makes you think the OP is using a system with data or instruction cache? He says he's using a microcontroller, which can be anything from an antique 8051 to Cortex A or PowerPC. I don't see how this answers the question.
    – Lundin
    Nov 21 at 11:58










  • I can now confirm that there is a bit of cache, small as it might be, so it may be a problem. The answer is also useful for other people with a similar question. In the end, the concept was abandoned.
    – Darth Hunterix
    Nov 21 at 21:55



















edited Nov 20 at 22:02

























answered Nov 20 at 21:57









xaxxon


























