|
Written by Akiba
|
|
Saturday, 25 October 2008 |
|
Well, I'm just finishing up the conversion to using dynamically allocated memory using the Contiki managed memory library. It took a bit longer than I expected because the memory handling is part of the stack framework and is used in many places. Changing the memory handling while keeping the stack behavior the same was a bit of a challenge. However it's pretty much finished and the long file transfers are still working.
In my previous post, I don't think I did a good job of explaining why dynamic memory allocation is important. So let me try and re-explain it in terms of how the stack was.
The initial implementation of the stack had a fixed number of elements for each table. An example would be like allocating a max number of 10 entries or the routing table and 5 entries for the indirect queue. A problem occurs if a particular application requires only 8 routing entries but 6 indirect queue entries. In that case, you ran out of memory.
Its very difficult to recover from situations where you run out of table/queue entries because the behavior may be unpredictable. This means that to be on the safe side, you would need to over-specify the max number of entries for each table/queue/list in order to have some margin of safety. This ends up wasting a lot of RAM. Another problem with this is that the stack uses something like twenty different tables to keep track of everything (yes, I know its a lot...please complain to the Zigbee Alliance). The amount of effort to tune the stack for each application would be tremendous and the inefficiency of the RAM usage is no good for something that is supposed to be in small, embedded applications.
If a standard memory allocation was used such as malloc, it would result in memory fragmentation. You can see an example of memory fragmentation in the drawing. Memory fragmentation is also an inefficient use of memory because even though the total amount of free memory may be available, you also need the memory to be adjacent so that you can fit the block into it. This becomes a problem in situations where many different sized blocks are being allocated and freed. With fragmentation, it becomes tougher to fit the large blocks due to the need for adjacent memory.
The Contiki memory management library overcomes this problem by having a feature called memory compacting. Every time a memory block is freed, all the used blocks that come after it will be moved down. This leaves all of the free memory in one ajacent heap so that you won't run into the fragmentation issues I mentioned earlier.
The memory management comes with a price though. It accomplishes this by using an added level of indirection. This means that when you allocate memory, you don't get a pointer to the actual memory. You get a pointer to a pointer to the memory. When you allocate managed memory, it will create a memory block structure which contains the size of the block and the pointer to the memory. It will then return a pointer to the managed memory pointer which points to the underlying memory. The reason for this is that when memory is compacted, the address of the actual memory changes, but the changes are reflected in the managed memory pointer. So as long as you use the managed memory pointer, you will always be pointing to the correct address for the allocated memory. So to use the managed memory, all of the tables needed to be modified to use the managed memory pointer instead of actual memory. This added code size to the stack, however decreasing the RAM utilization takes priority over having a smaller code size for me. Its easier and cheaper to trade up to a microcontroller with a bigger flash, but it sure is tough to find one with a bigger RAM.
Here's an example of how the table handling changed. Before: rte_entry = list_head(rte_tbl); // grab the route entry next_hop = rte_entry->next_hop; // get the next hop in the route entry After: mem_ptr = list_head(rte_tbl); // grab the memory pointer to the route entry rte_entry = mem_ptr->mmem_ptr.mem_block_ptr; // de-reference the memory pointer to get to the data next_hop = rte_entry->next_hop; // now we can use the entry  There is one dangerous point about using the managed memory and I included a diagram so that it can be more easily seen. To actually access table elements, you have to de-reference the managed memory pointer and access the direct memory. However if in between de-referencing the pointer and actually using the table data, a block of memory gets freed somewhere, the de-reference pointer is useless and dangerous because it is now a dangling pointer. Since Contiki is not pre-emptive, you won't have to worry about a stray thread causing this situation, however it can occur if you have a function in between de-referencing and using the pointer. If the function frees memory somewhere, then it will be dangerous. So its best to use the pointer as soon as possible after de-referencing it. Here's an example of an error case: mem_ptr = list_head(rte_tbl); // grab the first mem pointer from the route table rte_entry = mem_ptr->mmem_ptr.mem_block_ptr; // de-reference the mem pointer some_random_func(); // a memory block gets freed somewhere in here next_hop = rte_entry->next_hop; // this will be incorrect. the entry now points to junk. Updated 2008-10-26: Just learned from Adam Dunkels, the author of the managed memory library (as well as Contiki and uIP), that the memory should only be dereferenced as its used. And the MMEM_PTR macro should be used to access the memory blocks only. That way, the error case won't be encountered. So looks like its time to re-modify the code to handle this. It should lead to a more robust design without needing to "play with fire" or at least run the risk of dangling pointers. According to Adam, the problematic case should be rewritten like this: mem_ptr = list_head(rte_tbl); // grab the first mem pointer from the route table /* rte_entry = mem_ptr->mmem_ptr.mem_block_ptr; never de-reference and store the mem pointer like this */ some_random_func(); // a memory block gets freed somewhere in here next_hop = ((struct tbl *)MMEM_PTR(mem_ptr))->next_hop; /* this is the way to access the memory pointed to by mem_ptr - this works even if a memory block was freed in some_random_func(). */
End update. The example error case will now only show what can happen if you abuse the memory management system ;)
I've tried to keep the managed memory pointer handling away from the main parts of the code and isolated to the table files. That way, from a user point of view, you just call functions like rte_table_get_next_hop() and it will return the next hop data without worrying about de-referencing, dangling pointers, etc... That way, stack modification should be much easier for everyone. In most places, you only need to worry about handling the managed memory pointers if you are modifying the behavior of a table. Updated 2008-10-26: I should also mention that the managed memory library will only be used for table and list elements. Those are alloc'd and free'd on a relatively leisurely basis. For frame buffers, that I'm assuming will be alloc'd and free'd frequently, I will be implementing two memory pools of 64-byte and 140-byte sizes. This is because there is a performance impact when the managed memory is free'd and then compacted. I'm hoping this will provide a good balance between performance and ease of use. All in all, the mods will take the stack from tweaking approx 20 table sizes and the frame buffer pool to tweaking the shared heap memory and the amount of frame buffers in each pool. Well, that's all for now. Have a nice weekend! |
|
|
Written by Akiba
|
|
Saturday, 18 October 2008 |
|
Just released v0.52 of the stack. Someone found that the simulator is almost unusable on Linux. I had imagined that if it worked on Cygwin, it would work on Linux however I seemed to have been dreaming this utopia. The reality is that there are significant behavioral differences between Cygwin and Linux, so the stack and sim needed to be modified to work on both systems. Thanks to Jedrzej Solecki who pointed it out and also did a lot of work to get the simulator running on Linux. He had the dubious task of making me get off my ass and install VMWare and Ubuntu so that I could see for myself how bad it was on Linux. I was horrified. The simulator and stack is now tested and working on both Cygwin and Linux. Most of the fixes were contributed by Jedrzej and I was pretty much the Tonto to his Lone Ranger. Thanks a lot! The updated code can be found at Sourceforge: http://www.sourceforge.net/projects/freakz |
|
|
Written by Akiba
|
|
Tuesday, 14 October 2008 |
|
Whew...what a title. When I first started this project, I had no idea how important it would be to have a non-fragmenting dynamic memory allocator. In most protocol stack cases, you can get away with just having a statically allocated buffer for TX and RX frames. Apart from that, you would only have a few cases where you need to manage tables and queues. For a USB devices stack, you normally only need to allocate a couple packet buffers and some fixed arrays to hold the descriptor data. However in Zigbee, there seem like an infinite amount of tables required to do anything. Indirect queues, routing, neighbor, broadcast, discovery, scan tables, etc... I don't even want to get into how many there are. The problem with statically allocating elements for each table/queue/map/list is that its a horrible waste of memory. In some cases, you may over specify the amount of elements needed for something like...say...a neighbor table. Then you might underspecify the amount of elements for a routing table. The problem here is that the stack will require tuning of multiple (ie: a fucking lot of) parameters in order to run properly. Underspecification of any one parameter could throw the stack out of whack. It's not a very good situation. Because of the danger of underspecifying table elements, most people (like me) would overspecify the amount of elements needed, which usually causes a large amount of waste. If those elements are being used, they're just sitting there doing nothing. Its much more efficient to have a pool of memory that is shared by all the tables/queues/maps/lists so that the utilization can be improved. It also helps on tuning the stack because instead of specifying how much memory is required for each element, you only need to specify the size of the statically allocated shared memory block (the heap). However its not such an easy task. There is this nasty thing called fragmentation in dynamic memory allocators. It's basically the reason why I haven't implemented a dynamic memory allocator yet. Fragmentation occurs based on the algorithm that the allocation algorithm uses. Here's an easy example: You use memory pools and divide them up into 2 sizes: 32 bytes and 64 bytes. If you need to allocate memory for a 28-byte table element, then the first choice is to use a block from the 32-byte memory pool. If no blocks are available, then you would use a block from the 64-byte memory pool instead. Some fragmentation just occurred because you're using up 64 bytes to hold 28 bytes of memory. Say that this happens multiple times and you use up the 64 byte pool. Some time later, you have multiple 32 byte blocks open up, but you have a request for a 33-byte table element. Unfortunately, you're memory starved because you can't accomodate the request even though you have memory available.
Here's another example: You use a generic memory allocator that has a large heap and uses a buddy-system algorithm. The buddy system iteratively halves the size of an available memory block until it can accomodate the requested memory allocation. For example say you request 56 bytes of memory for a table element and the heap size is 1024 bytes. For the buddy system, you would separate the heap into two 512 byte blocks. Since the two blocks are still large, you would separate one of the blocks into two 256 byte blocks. Since thats still too big, you would continue on until you got a 64 byte block. That would just accomodate the requested size while being less than double the size of it. However if you request 65 bytes of memory, then you would require a 128 byte block to accomodate it. If this happens multiple times, you can use up your 1kB heap while utilizing less than half of it. In fact, even nastier things can happen. Say you request 65 bytes 8 times. That would use up your 1kB heap (128 bytes * 8 blocks = 1024 bytes). Then say that four of those blocks are freed, but they're not adjacent (ie: blocks 1,3,5,and 7 get freed). You then get a request for 140 bytes of memory (say a 802.15.4 frame with a full payload + control flags). You've just become memory starved because you can't accomodate the request. Your 1kB heap was chopped into 128 byte segments initially. After the memory was freed, the blocks weren't adjacent so they couldn't be coalesced into larger blocks. You just ran out of memory while using 260 bytes out of your 1024, or slightly more than 25% utilization. Ouch!
As you can see, although it's nice to have a dynamic memory allocator, the issue of fragmenting is one that needs to be addressed. It's the reason why malloc is banned from safety-critical and high-reliability applications. Any embedded sytem that does a lot of alloc's and free's would need to confront this issue because the memory performance will be based on the malloc algorithm being used. It's one of those little gotchas in protocol stack design, however it also affects other architectures such as graphics, databases, etc.... Well, the main point of this post is that I've been looking for a usable dynamic memory allocator or algorithm on the side while trying to move forward with the stack design. Now that the initial release is over with, I've had a chance to revisit my search. After much searching on the internet, checking out other open source stacks, and reading documentation and appnotes from various manufacturers, I still couldn't find a satisfying memory allocator. Most of them used fixed size memory pools which is cool, but a bit wasteful. Otherwise, they require statically allocating table elements which is what I have now. That's also not very interesting. I happened to run across the managed memory library file in Contiki again. I had previously tested it in my sandbox to see if it could be used for my table and linked list elements. The memory allocator in Contiki is seldom used in the actual OS, but it is very elegant. You can request any size memory you want from a fixed size heap. When you request it, a structure is used to point to the block in memory. The pointer to the structure is returned from the allocator. To get to the memory, you need to deref the pointer to the structure, and then the structure's pointer to the memory block. Or effectively, it adds a level of indirection to the memory allocation. The reason this is done is because when a memory block is freed, the whole memory is compacted so that all the used blocks are adjacent to each other. This way, there is no fragmentation. Whew...why would I even bother to explain that? Because when I first tested the memory manager, I rejected it. I allocated the memory and used it in my linked lists. However I stored the actual pointer to the memory in my linked lists instead of the pointer to the structure that points to the memory (ugh...it hurts to even type that). Basically, my linked lists and tables got trashed once I freed an element because the memory would get compacted, which means that it would change position. Since my linked lists use pointers to the next memory block, if that block changes position, then I'm just pointing to junk. Well, all I can say is..."operator error". I decided what the hell and decided to retry testing the memory allocator in Contiki. This time, I did a more thorough study of what the code was doing. The first time around, I kind of breezed through the comments, and there wasn't any documentation on this lib file (that I could find). This time, I studied the code thoroughly and wrote a couple of small, controlled test programs to see what the behavior was. Once I caught on to the concept (which was basically to provide an abstraction layer to the memory), I was able to get it working. That means that I can have fragmentation free dynamic memory allocation. Yay!!! The two obstacles I have left on this stack is security and the fragmentation free dynamic memory allocation. Now that I have one of them out of the way, I just need to complete the security portion of the stack. If I can finish that, the rest of the stack development is just filling out the Zigbee features and making sure I comply with the Zigbee compliance checklist. The next step is to implement the Contiki memory allocator in the stack, and then port the stack to hardware. The memory allocator alone will probably save 1-2 kB of RAM which is why its important to do it before I go into hardware. In that case, it will be possible to run the stack on something like 2 kB of RAM and still be able to implement a decent sized network. Ahhh...the trials and tribulations of stack development... |
|
|
Written by Akiba
|
|
Friday, 03 October 2008 |
|
The guys at Atmel came through for me and hooked me up with 2 sets of their AT86RF231 platforms. The timing is perfect because now that the release is finished, the next objective is to port the stack to hardware and get it working. Since Atmel is the only major 802.15.4 chip supplier to hook me up, looks like they'll be the first platform that the stack gets ported to. The setup is pretty sweet. Two of the base boards have USB connectivity and you can even load the Daintree Basic Edition sniffer (free download from Atmel) on to it. That's good because I've been using a Microchip Zena sniffer (circa 2005) and the software on it is getting a bit aged. Especially since the Zigbee spec went through two revisions since I've had it. The other two boards are the same boards that Atmel gives out for their AVR LCD evaluations and development. It has a 128x64 RGB graphical LCD and you can download the software for the graphics widgets from their site. Looks like I'm gonna be havin' some fun. The radios are the AT86RF231 boards which aren't available with the low-cost Raven kits. These boards have the AES security engine and the antenna diversity feature so its gonna be interesting to check them out. Especially for the security, that will be the next portion of the stack that I'll be working on. Thanks, Ingolf! Let's go clubbing next time you're in Tokyo. Here are some pictures of the dev kit that I'll be working on for now:
|
|
|
Written by Akiba
|
|
Tuesday, 30 September 2008 |
|
Wow. That had to be one of the most difficult experiences of my life. It felt like I was passing a kidney stone. Painful, anxious, but afterwards gratification and relief. After arguing with myself last month, I'm glad that I finally got up the balls to do a release. That was the first time I've done a full release by myself and I have to say that release managers have it pretty tough. At least for the crunch time before a release goes out.
Well, its probably best to reflect a bit on the stack development. First of all, I grossly underestimated the amount of work that a full Zigbee stack would take. When I first started, I had thought that I would be able to finish it after about six months. Unfortunately, that's not the case. I'm going on my seventh month working on it, and there are still a lot of things that need to be implemented. One of the things I didn't take into account was that the learning curve for Zigbee was much steeper than I had thought. Although I had some minor experience with the spec, a full blown stack development requires intimate knowledge of every detail. I found out that it takes a long time to be intimate with a 600 page spec. I have to admit that a good portion of my time was also taken up re-writing and re-organizing my code and architecture. As I learned more about each layer, I found that my initial architectural assumptions were incorrect. That forced me to re-write a lot of code and revise the architecture many times, especially on the NWK and application layers. I think that the code right now is on the right track in terms of architecture. Now that I've been through each layer and experimented with different architectures, I'm pretty sure that the one I have right now is probably close to the one that I'm going to stick with. |
|
Read more...
|
|
|
Written by Akiba
|
|
Tuesday, 30 September 2008 |
|
Wow. I was finally able to kick that release to the curb. I'll post more details later. I'm pooped...Here is the download info, docs, and some pics... The release can be found here: http://sourceforge.net/projects/freakz/ Here's an online version of the documentation: Docs Link Here are some teaser pics:
|
|
|
Written by Akiba
|
|
Monday, 29 September 2008 |
|
I'm cutting it down to the wire. I have most of the documentation finished and just went through a last linting and testing of the code. I just did some simple tests to make sure that I didn't break anything. I'll be posting the documentation tomorrow and also the code, unless I somehow completely forgot all my Subversion commands. I've recently been mesmerized by all the stuff happening in the US. I'm still a proud American and hope that the US pulls through the whole financial mess. Good Luck US!!! |
|
|
Written by Akiba
|
|
Saturday, 27 September 2008 |
|
Well, I'm almost finished with documenting the stack. Thank God for Doxygen. I can take care of commenting the code and creating documentation at the same time. Tomorrow will probably be the last chance that I'll have to write documentation for the v0.5 release. I'm going to need to write some prose about how to build the code, operate the simulator, and some general words on the architecture. After that, I need to do a final lint, testing, possibly some last minute cleanup, and then I can upload it on to the sourceforge server. I'm planning on doing it on the last day of this month since I'll probably be cutting it pretty close. Already most of Monday is out for me since I have a meeting I need to attend for work. The upload will happen on Tuesday which means that I'm going to need to bust some ass this Saturday and Sunday to finish things up. *sigh* I'll probably be going into a quiet period for a couple of days after this is over as I try and catch up on the rest of my life. Other than that, gonna hit up some wine and drink myself to sleep. It is Friday night ya know... |
|
|
Written by Akiba
|
|
Tuesday, 23 September 2008 |
|
The blog posts are getting a bit sparse as I'm trying to prep
for the release. I lost almost a full week accompanying people from
the main office of my part time job on customer sales visits. Talk
about torture...an impending software deadline and I can only sit
and listen to PPT presentations. I have some colorful words I want
to say about sales calls, PPT presentations, mandatory face-to-face
meetings, and old fashioned companies, but right now, I'm
exhausted from traveling all around Tokyo hawking chips.
Regarding the release status, I've already started on the
documentation, but it will be a bit lean when the code is released.
I might have to beef it up later, since my drop dead release date
is the end of this month. On that day, the code goes out no matter
what condition it's in. I'll try to post another update before
the release... |
|
|
Written by Akiba
|
|
Monday, 15 September 2008 |
|
Yaa-Taa is a Japanese expression for excitement. You hear it a lot in anime, which is probably where I picked it up. Anyways, I just finished multiple 10kB file transfers over four Zigbee router hops in my simulator and the remote files were identical to the originals. It looks like my data transfers are pretty solid now. One step closer to getting this bad boy out... Now since its a Sunday, I'm going to have a glass or three of wine and watch some anime. Cheers... |
|
|