What cost does a bloated object file carry?
While working on an embedded project, I came across a function that is called thousands of times over the application's lifetime, often in loops, dozens of times per second. I wondered whether I could reduce its cost, and I found that most of its parameters are known at compile time.
Let me illustrate it with an example.
The original hpp/cpp files can be approximated like this:
original.hpp:
void example(bool arg1, bool arg2, const char* data);
original.cpp:
#include "original.hpp"
#include <iostream>

// arg1 and arg2 are tested at run time on every call.
void example(bool arg1, bool arg2, const char* data)
{
    if (arg1 && arg2)
    {
        std::cout << "Both true " << data << std::endl;
    }
    else if (!arg1 && arg2)
    {
        std::cout << "False and true " << data << std::endl;
    }
    else if (arg1 && !arg2)
    {
        std::cout << "True and false " << data << std::endl;
    }
    else
    {
        std::cout << "Both false " << data << std::endl;
    }
}
Let's assume that every time the function is called, arg1 and arg2 are known at compile time. The argument data isn't, and for a variety of reasons its processing cannot be moved into the header file. However, all those if statements can be resolved by the compiler with a little template magic:
magic.hpp:
template<bool arg1, bool arg2>
void example(const char* data);
magic.cpp:
#include "magic.hpp"
#include <iostream>

// The primary template is only declared; each flag combination gets its own
// explicit specialization, so the branch is selected at compile time.
template<bool arg1, bool arg2>
struct Processor;

template<>
struct Processor<true, true>
{
    static void process(const char* data)
    {
        std::cout << "Both true " << data << std::endl;
    }
};

template<>
struct Processor<false, true>
{
    static void process(const char* data)
    {
        std::cout << "False and true " << data << std::endl;
    }
};

template<>
struct Processor<true, false>
{
    static void process(const char* data)
    {
        std::cout << "True and false " << data << std::endl;
    }
};

template<>
struct Processor<false, false>
{
    static void process(const char* data)
    {
        std::cout << "Both false " << data << std::endl;
    }
};

// Forwards to the matching specialization; no run-time test of arg1/arg2.
template<bool arg1, bool arg2>
void example(const char* data)
{
    Processor<arg1, arg2>::process(data);
}

// Explicit instantiations keep the definition in this .cpp file; each one
// emits a separate function into the object file.
template void example<true, true>(const char*);
template void example<false, true>(const char*);
template void example<true, false>(const char*);
template void example<false, false>(const char*);
As you can see, even in this tiny example the cpp file got significantly bigger than the original. But I did remove a few assembly instructions!
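If the toolchain supports C++17 (the question does not say), the four Processor specializations could be collapsed into a single function template using if constexpr. This is a sketch, not the author's code: it shrinks the source, but it still emits one function per explicit instantiation, so it does nothing about object-file size.

```cpp
#include <iostream>

// Sketch (assumes C++17): one function template replaces the four Processor
// specializations. `if constexpr` discards the branches that do not match the
// template arguments, so no run-time test of arg1/arg2 remains.
template <bool arg1, bool arg2>
void example(const char* data)
{
    if constexpr (arg1 && arg2) {
        std::cout << "Both true " << data << std::endl;
    } else if constexpr (!arg1 && arg2) {
        std::cout << "False and true " << data << std::endl;
    } else if constexpr (arg1 && !arg2) {
        std::cout << "True and false " << data << std::endl;
    } else {
        std::cout << "Both false " << data << std::endl;
    }
}

// Same explicit instantiations as magic.cpp: still one function per combination.
template void example<true, true>(const char*);
template void example<false, true>(const char*);
template void example<true, false>(const char*);
template void example<false, false>(const char*);
```

A call site looks exactly as it does with magic.hpp, e.g. example<true, false>("payload"); prints "True and false payload".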
Now, my real-life case is a bit more complex, because instead of two bool arguments I have enums and structures. Long story short, all the parameters together yield about one thousand combinations, so I have that many lines of the form template void example<something>(const char*);
Of course I don't write them by hand; they are generated with macros. Still, the cpp file gets humongous compared to the original, and the object file is even worse.
All this in the name of removing several if statements and one switch statement.
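The question does not show the macros used to generate the instantiations. One common pattern is an X-macro; the sketch below assumes two hypothetical enums, Mode and Channel, and hypothetical macro names. It also shows why the count explodes: the list is a Cartesian product, so every new enumerator multiplies the number of functions emitted.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical parameter enums; the real project uses its own enums/structs.
enum class Mode { A, B, C };
enum class Channel { Left, Right };

template <Mode m, Channel c>
void example(const char* data)
{
    // Stand-in body: real code would select its processing from m and c at
    // compile time (specializations or if constexpr).
    std::cout << static_cast<int>(m) << '/' << static_cast<int>(c) << ' '
              << data << std::endl;
}

// X-macro enumerating every (Mode, Channel) pair; F is applied to each pair.
#define FOR_EACH_MODE(F, c) F(Mode::A, c) F(Mode::B, c) F(Mode::C, c)
#define FOR_EACH_COMBO(F) \
    FOR_EACH_MODE(F, Channel::Left) FOR_EACH_MODE(F, Channel::Right)

// One explicit instantiation per pair: 3 x 2 = 6 functions in the object file.
#define INSTANTIATE(m, c) template void example<m, c>(const char*);
FOR_EACH_COMBO(INSTANTIATE)

// The same list counted at compile time.
#define COUNT_ONE(m, c) + 1
constexpr std::size_t kCombinationCount = 0 FOR_EACH_COMBO(COUNT_ONE);

#undef COUNT_ONE
#undef INSTANTIATE
```

With, say, three enums of ten values each, the same scheme would produce 10 x 10 x 10 = 1000 instantiations, matching the order of magnitude described above.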
My question is: is size the only problem with the template-magic approach? I wonder if there is some hidden cost to having so many versions of the same function. Did I really save any resources, or did I achieve the opposite?
c++ templates optimization embedded
edited Nov 22 at 14:00
asked Nov 20 at 21:41
Darth Hunterix
Comments are not for extended discussion; this conversation has been moved to chat.
– Samuel Liew♦
Nov 21 at 2:13
1 Answer
Accepted answer (score 4)
The problem with increased binary size is almost never the storage of the file itself. The problem is that more code means a smaller fraction of the program's instructions fits in the instruction cache at any given moment, which leads to cache misses. If you're calling the same instantiation in a tight loop, having it do less work is great. But if you're constantly bouncing between different template instantiations, the cost of fetching instructions from main memory may be far higher than what you save by removing a few instructions from inside the function.
This kind of thing can be very difficult to predict, though. The way to find the sweet spot in this (or any) kind of optimization is to measure. The result is also likely to change across platforms, especially in the embedded world.
edited Nov 20 at 22:02
answered Nov 20 at 21:57
xaxxon
What makes you think the OP is using a system with a data or instruction cache? He says he's using a microcontroller, which can be anything from an antique 8051 to a Cortex-A or PowerPC. I don't see how this answers the question.
– Lundin
Nov 21 at 11:58
I can now confirm that there is a bit of cache, small as it is, so it may be a problem. The answer is also useful for other people with a similar question. In the end, the concept was abandoned.
– Darth Hunterix
Nov 21 at 21:55