|
Written by Akiba
|
|
Wednesday, 06 August 2008 |
|
Looks like I made a mistaken assumption in taking the Zigbee application layer lightly. The application layer consists of the Application Framework (AF), the Zigbee Device Object (ZDO), and the Application Support Sublayer (APS). I had originally assumed I could crank out the app layer quickly by just implementing a couple of descriptors and some client/server functionality. However, it's significantly more complicated than that. The Zigbee spec is kind of a mess at the application layer. The information is dispersed over roughly 250 pages and the partitions between the areas of the app layer are kind of blurred. It's taking a significant amount of imagination to try and understand how the spec implementors envisioned the application layer. It takes a couple of passes through the app portion of the spec just to understand what they're talking about and how things fit together.
I started with a simple application framework (AF). It's just one file which handles registering each individual endpoint and the endpoint descriptor (simple descriptor in Zigbee terminology). So if you had a home automation profile, you would just register the descriptor and endpoint number with the AF. After that, all incoming messages that specify that endpoint will go there. Not too difficult.
The confusing part comes in implementing the ZDO. The ZDO basically manages the whole device and thus it is the main controller for network formation, discovery, joining, etc... It also has a bunch of descriptors and configuration info that need to be set on startup. These are all mandatory for protocol compliance. Some examples are the node descriptor, power descriptor, active endpoint list, simple descriptors, device polling rate, etc... It also functions as a client and server so certain mandatory functions need to be implemented on both the client and server sides. I had expected this and was able to crank out most of them without too much trouble.
Nay, the main complexity of the application layer is that it ties the whole stack together. It's like I've been developing individual parts of a car, and now I have to assemble the complete car. It's a bit nerve-wracking, since you find that some pieces don't fit well together and need modification. I'm also having problems imagining what the interface into the stack should look like from the user application. I think the scariest thing is that after the application layer is finished, the stack is supposed to work (sans security). It's like the Shawshank Redemption where you've been in prison so long that you're kind of scared to leave. I'm feeling a bit of a mental block in finishing the remaining portions of the stack because of this.
Anyways, I'm going to try and put a hard freeze on the code soon, depending on the progress moving through the app layer. That way, I can limit the number of small tweaks that I'm tempted to make and force myself to set a release point. After the code freeze, I gotta figure out a way to put together an automated testbench to test things. The old one doesn't work now that the simulator is multi-process, and there are so many variables and inter-relations in Zigbee that it's easy to break something by fixing something else. Well, I just thought I'd share some of my insecurities with you. |
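The endpoint registration described above for the application framework could look roughly like the sketch below. It is only an illustration under my own assumptions, not the actual file from the stack: the names `af_register`, `af_dispatch`, `simple_desc_t`, and the table size are all hypothetical, and the descriptor is cut down to three fields. The example endpoint uses the Home Automation profile ID (0x0104) with an On/Off Light device ID.

```c
#include <stddef.h>
#include <stdint.h>

#define AF_MAX_ENDPOINTS 8   /* hypothetical table size */

/* Cut-down simple descriptor; a real one also carries the cluster lists. */
typedef struct {
    uint8_t  endpoint;    /* application endpoints are 1..240, 0 is the ZDO */
    uint16_t profile_id;  /* e.g. 0x0104 for Home Automation */
    uint16_t device_id;
} simple_desc_t;

typedef void (*af_rx_handler_t)(const uint8_t *data, uint8_t len);

typedef struct {
    const simple_desc_t *desc;
    af_rx_handler_t      rx;
} af_entry_t;

static af_entry_t af_table[AF_MAX_ENDPOINTS];
static uint8_t    af_count;

/* Register an endpoint. Returns 0 on success, -1 if full or duplicate. */
int af_register(const simple_desc_t *desc, af_rx_handler_t rx)
{
    if (desc == NULL || rx == NULL || af_count >= AF_MAX_ENDPOINTS)
        return -1;
    for (uint8_t i = 0; i < af_count; i++)
        if (af_table[i].desc->endpoint == desc->endpoint)
            return -1;
    af_table[af_count].desc = desc;
    af_table[af_count].rx = rx;
    af_count++;
    return 0;
}

/* Hand an incoming payload to the endpoint it names.
 * Returns 0 if delivered, -1 if nobody registered that endpoint. */
int af_dispatch(uint8_t endpoint, const uint8_t *data, uint8_t len)
{
    for (uint8_t i = 0; i < af_count; i++) {
        if (af_table[i].desc->endpoint == endpoint) {
            af_table[i].rx(data, len);
            return 0;
        }
    }
    return -1;
}

/* Example endpoint: a Home Automation On/Off Light on endpoint 1. */
static const simple_desc_t ha_desc = { 1, 0x0104, 0x0100 };
static uint8_t ha_last_len;

static void ha_rx(const uint8_t *data, uint8_t len)
{
    (void)data;
    ha_last_len = len;   /* just remember the size of the last payload */
}
```

Calling `af_register(&ha_desc, ha_rx)` once at startup is all the profile would need to do; after that, `af_dispatch(1, ...)` reaches `ha_rx` and any other endpoint number is rejected.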
|
|
Written by Akiba
|
|
Wednesday, 23 July 2008 |
|
Now that I'm back from the US, I'm trying to re-establish my rhythm again. I think it's important to have a rhythm for any type of large project undertaking. My sister once told me that Haruki Murakami, a famous Japanese writer, writes continuously every day for four hours in the morning. Once he finishes his writing, he doesn't need to think about it any more for the rest of the day. That four hours might not seem like a lot, but built up over a long period, it's what enables him to finish his usual 500+ page books without burning himself out. I believe it's exactly the same for coding, dancing, or even just working out in the gym. I kind of pattern my work schedule after what my sister told me, although the coding time often wavers above and below the four hour mark.
However, if you think about four hours of unbroken coding time, it's something that is nearly impossible in a typical office environment. My experience at companies is that there are so many distractions inside a typical office workplace that even getting two hours of pure coding time is very challenging. When I was working in the cubicle farms in the US, I found that most of my time was spent just trying to seem busy, while I wasn't getting a lot of real work done. Glad I'm out of that life now.
Anyways, now that I'm back in Tokyo, I can settle back into my own schedule again. Being that it's 4 am right now, I figure that I haven't fully recovered from my jetlag. It gives me a good opportunity to restart writing in my blog again and research the part of the stack that I'm currently working on. As I mentioned before, I'm moving up to the Application Layer and the Zigbee Device Object (ZDO) implementation. The Application Layer of the Zigbee spec consists of three main parts:
- The Application Support Sublayer (APS): This is the workhorse that provides the data Tx services and routes the incoming data to the correct endpoint in the application layer. All endpoints (profiles) need to use the APS to communicate with other devices.
- The Application Framework (AF): This is kind of a nebulous portion of the application layer that isn't really well defined. The framework needs to define the endpoints and provide a means for each endpoint to register itself with the framework. Then, when a remote node does a service discovery on the local node, the application framework helps assemble the list of active endpoints, descriptors, and profiles.
- The Zigbee Device Object (ZDO): This is a required endpoint on all nodes and always resides in endpoint 0. The ZDO implements the Zigbee Device Profile and has the following responsibilities:
- Device and Service Discovery
- Security Management
- Network Management
- Remote Node Management (usually for provisioning)
- Binding Management
- Group Management
Once the application layer is implemented, then I should theoretically have a working Zigbee stack. However, the fun doesn't stop there. In order to implement the Zigbee standard profiles, you also need to implement the Zigbee Cluster Library (ZCL) as well as the actual profile implementation. If you have a good application framework, then I'm hoping that the profile implementation will be easy. In any case, I'm currently working on the ZDO and once that's finished, gonna move to the Application Framework. Since the APS is done, that should give me a full stack. At that point, I can start cleaning up the code and prepping it for an initial release. As I mentioned before, the initial release will probably start at v0.5. A full 1.0 release will need to be (relatively) stable and include security handling and an over-the-air bootloader. Anyways, that's the plan for now... |
|
|
Written by Akiba
|
|
Saturday, 12 July 2008 |
|
I'm still in California and enjoying a nice little break from development. I got to visit my brand new niece and eat some good ol' fashioned American food like tacos. I have been working on the stack here and there and was stuck at the binding and group tables implementation. Finally, I decided that I'm going to forgo implementing the binding and group tables in the Application Support Sublayer (APS) and go straight to the Application Layer to implement the Zigbee Device Object.
The binding tables got changed from the 2006 spec to the 2007 spec so there is no longer coordinator binding. All binding is done on the source node, which is the node that is originating the frame. However, there are still some discrepancies in the spec since they refer to certain binding fields that don't exist anymore. I also can't really see the actual usage scenario in my head so it's a bit hard to visualize the data flow.
The group tables for multicasting at the APS layer finally got unified with the group tables for multicasting at the NWK layer. Yes, they previously had redundant group tables and different methods of multicasting. APS multicasting would send an individual frame to all members of the group, while the NWK layer multicast would send a broadcast frame and only members with the correct group ID would decode it. Anyways, it was one of the quirks of the Zigbee spec before. I decided a while back when I was working on the NWK layer to hold off on multicasting. That's because I'd prefer to get some actual feedback on the stack from users, so releasing early is preferable to completeness, at least initially. Once I get the feedback, I can use it to tweak things and also implement features like multicasting, etc... So in that case, I decided to move on up to the ZDO which I believe is much more important. 
The Zigbee Device Object plays a large role in the actual device, initializing everything, interfacing to the NWK and APS layers, doing device discovery, and basically managing everything. When two devices communicate, it's the ZDO that initially sets up the connection between them. So in that sense, I believe that the ZDO is much more important. Finishing it will bring me much closer to being able to release the stack. The initial release might not include binding and grouping, but I figure that it should be okay. They can be implemented at a later time when I can actually see the use cases, instead of trying to guess at how they will be used by others. Until then, gonna enjoy some nice Bay Area salami (Molinari) and check out all the weirdos in Berkeley. Unfortunately, I seem to fit in quite well here... |
|
|
Written by Akiba
|
|
Sunday, 06 July 2008 |
|
Ahhh...the California sun. It's been about six months since the last time I was back and it's wonderful. I arrived last night on July 4th and ended up going to an Independence Day concert with the local symphony. I think I'm only able to appreciate Independence Day in the US now that I'm living outside the country. When I was living here, I always kind of took it for granted and was kind of apathetic about everything. Well, I didn't get much done before I came here. I was actually experimenting with trying to implement dynamic ram allocation inside the stack. Normally, it's a bit dangerous to use malloc and free on an embedded system on data structures that you use a lot due to the eventual fragmentation of the memory. However statically allocating all of the structures required by Zigbee is grossly inefficient because there are so many tables that are used and currently, I need to define the max table entries for each one. It would be much better to dynamically allocate the table entry from a pre-specified memory heap that can be shared among all the tables. That way, you can still control the amount of memory that you are using and you would have a much better utilization of that memory. I discovered the managed memory code library in the Contiki OS (mmem.c/h) and thought hmmm...maybe I can use this to do dynamic allocation. So I spent a day writing a dummy program and test routines that emulate my stack to see how the managed library would work. The managed memory library basically allocates memory from a pre-specified memory array, and after the memory is freed, will compact the whole memory downwards so that the freed memory won't leave a gap (fragment) in the memory array. In this way, it prevents memory fragmentation which is the main issue with malloc and free. However I discovered a problem with using the managed memory with linked lists. 
Most of my tables are currently implemented using linked lists because it makes insertion, deletion, and searching much easier. But when a table entry is freed, the managed memory will compact the memory which can change the addresses of the other table entries. But the linked list pointers don't get updated. This is a severe problem so unfortunately, I wasn't able to use the managed mem lib for the stack. I was pretty disappointed about that because I think I might have been able to decrease my RAM usage by half and still remain safe if I had a means to dynamically allocate memory. Anyways, I'll probably continue to work on the stack while I'm here. Now that I've gotten used to working on it all the time, it feels weird if I go more than a few days without touching it. In the meantime, I need to work on my tan... |
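To make the linked list problem above concrete, here is a toy compacting allocator. It is not Contiki's mmem code, just a minimal sketch of the same idea under my own assumptions: `pool_alloc`, `pool_free`, `pool_ptr`, and the sizes are hypothetical names and values. Freeing a block slides every later block down, so the allocator's own handles get fixed up, but any raw address saved elsewhere (such as a list node's `next` pointer) goes stale.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE   64   /* hypothetical heap size in bytes */
#define MAX_HANDLES 4

/* A handle records where its block currently lives inside the pool. */
typedef struct {
    uint8_t *ptr;    /* current address, rewritten on every compaction */
    uint8_t  size;   /* 0 means the handle is unused */
} handle_t;

static uint8_t  pool[POOL_SIZE];
static uint8_t  pool_used;
static handle_t handles[MAX_HANDLES];

/* Allocate at the top of the used region. Returns a handle index or -1. */
int pool_alloc(uint8_t size)
{
    if (size == 0 || (unsigned)pool_used + size > POOL_SIZE)
        return -1;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (handles[h].size == 0) {
            handles[h].ptr = &pool[pool_used];
            handles[h].size = size;
            pool_used += size;
            return h;
        }
    }
    return -1;
}

/* Free a block and slide everything above it down to close the gap. */
void pool_free(int h)
{
    if (h < 0 || h >= MAX_HANDLES || handles[h].size == 0)
        return;

    uint8_t *gap  = handles[h].ptr;
    uint8_t  size = handles[h].size;
    uint8_t *end  = &pool[pool_used];

    memmove(gap, gap + size, (size_t)(end - (gap + size)));

    /* The allocator can only repair the addresses it knows about. */
    for (int i = 0; i < MAX_HANDLES; i++)
        if (handles[i].size != 0 && handles[i].ptr > gap)
            handles[i].ptr -= size;

    handles[h].size = 0;
    handles[h].ptr = NULL;
    pool_used -= size;
}

/* Current address of a block; only valid until the next pool_free(). */
void *pool_ptr(int h)
{
    return handles[h].ptr;
}
```

With three 8-byte blocks a, b, and c allocated in that order, saving `pool_ptr(c)` and then calling `pool_free(b)` leaves the saved address pointing 8 bytes past where c now lives. That is exactly what happens to a `next` pointer stored inside a table entry. Linking entries by handle index instead of by address would sidestep it, though that is my speculation and not something I tried in the stack.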
|
|
Written by Akiba
|
|
Tuesday, 01 July 2008 |
|
Hit a major milestone over the weekend. After rewriting the mac, nwk, and aps data paths, and also rewriting most of the mesh networking code, I finally got everything to work. The reason behind the rewriting, besides cleaning things up is because I have a much clearer picture of how things should work and also the architecture that I want. So I wanted to make everything consistent with each other. This also includes the naming, the patterns (no I don't use C design patterns, but many functions share a common pattern of execution), and the file layout. I also simplified the mesh routing and improved the functionality. I can now handle multiple route requests which I think may be needed at network startup. When a network starts, all the nodes have no entries in their routing tables. So any node sending data to a remote node that isn't an exact neighbor will be doing route discovery which means broadcasting route requests. Previously, I could only handle one at a time which means that I would have to drop any other route requests. This might have been detrimental to finding good routes on network startup. Or I might just be overly cautious. Anyways, the routing code is smaller, cleaner, and can handle multiple requests so I'm satisfied. After all of this was implemented, then I started testing late last week. I verified the mesh routing and tree routing on Saturday which is a big event since those seem to be the most difficult parts of the stack to get right. Once I got those working, then I took Sunday off (my wife was complaining) and today, I finished the remaining services on the nwk layer. Things were much easier to implement now that the code was cleaned up. I also finished out the remaining services on the mac layer that were needed by Zigbee. I'm not implementing a full 802.15.4 mac because it's not required by Zigbee. The spec only uses probably about one-third of the available functionality of IEEE 802.15.4. 
The remaining things to do are to verify the rest of the NWK layer and finish off the APS layer. On the APS layer, I still need to implement the binding table, group table, discovery cache, and address map. I'm hoping those won't be too hard. I'm a pro at implementing tables and queues now...Ha ha ha...that's actually pretty sad... Also, I'm starting to look over the Zigbee 2007 protocol compliance documentation. My eventual goal is to get this stack certified as Zigbee compliant, although I'd have to cough up the ~$2k to join the Zigbee Alliance. Ugh...gotta figure out a way to make that happen. I'll be going to California this Friday for a two week stay over there. My part-time job is requesting all the FAEs (yes, I'm a part-time FAE...gotta pay the billz) to go there for training on the product line. Anyways, it'll give me a chance to visit my sister and parents. My sister is up in the Bay Area so I'll be in Berkeley next week, and then go down to Southern California the week after for the training. I'm hoping to claw my way through most of the remaining APS layer by the time I leave so I can have a clear conscience. Otherwise, it's gonna weigh down on me and I'm going to end up working on it while I'm over there. That's the problem with obsessive behavior. Other than that, nothing much else. My life seems pretty boring if you take the Zigbee part out of it. Hmmm...it seems pretty boring if you leave it in, too... |
|
|
Written by Akiba
|
|
Friday, 27 June 2008 |
|
Just did a sizing of the stack again. The major portions of the MAC, NWK, and APS layers are about 70% finished, although testing and debugging are still ongoing. I rewrote a lot of the stack to make it cleaner and more maintainable. My past experience is that once code gets out in the field, it can quickly turn to spaghetti due to patches and quick fixes. So if the code is more straightforward, hopefully this can be minimized. Now for the numbers, the code size is currently ~29k. That's not including the standard libraries, although I'm only using simple functions like memcpy and memcmp. I'm still using gcc for x86 so those numbers will probably not reflect the actual size. I'd figure the code will be quite a bit larger when I compile for a RISC architecture like the AVR. I'm pretty happy with that number, though. There's decent headroom there so that I think I can make my 60 kB target size. For the RAM size, I'm currently using 5.6k. That number is a bit high for me and reflects the fact that I've implemented a lot of tables and queues lately, i.e. routing, discovery, indirect, aps_retry, mac_retry, etc... And of course the biggest consumer of RAM is the buffer pool (10 buffers = 1.4k). The RAM size should pretty accurately reflect the actual size in the target since it usually is architecture independent. Once the stack is stabilized, I can start to optimize the RAM usage. Some of the ways to reduce the RAM would be dynamically allocating memory for some of the linked lists instead of statically allocating it, shrinking the structures, shrinking the buffer pool, and tweaking the number of entries for each table. Anyways, the numbers turned out better than I expected so it looks like there's hope to have a fairly tight stack. It's still gonna be a bit longer though since I need to implement some of the lesser used functions in each layer and I'm still cleaning things up. I'll try to keep everyone posted as a first release approaches. 
I hope it comes soon, cuz I'm getting tired... |
|
|
Written by Akiba
|
|
Thursday, 26 June 2008 |
|
Recently, I've been doing a lot of testing in my sim. That means that I've been crawling through the code looking for the causes of different types of bugs. Some are obvious, some are mysterious. However, as a result of looking at the code a lot recently, I found many areas that I'm dissatisfied with. One of the areas that I was unhappy with was my APS layer, or the Zigbee application support sub-layer. This was the first part of the stack that I worked on and it was written about three months ago. Now that I'm more familiar with the needs, behavior, and patterns of the Zigbee stack, I realized that there was a lot of code that didn't need to exist. In the APS layer, if you recall my old post on it, I implemented a data queue to queue up the data from the different endpoints, a state machine to handle retries and acks, a transmit process, and some miscellaneous logic functions. All in all, it was a complicated way to implement the APS data request service. A lot of the complication was due to handling reliable acknowledgement. To do this, you needed to keep track of a timer and also buffer the data for re-sending. Well, after a couple months of stack coding under my belt and a better understanding of when and how to use the Contiki services, I decided to rewrite the APS data request service. This is one of the key functions in the stack because all the endpoints including the Zigbee Device Object will be using this service heavily. The good thing about rewrites is that you usually have much more insight than you did when you originally wrote the code. Since I can now see from the application layer down to the radio driver, I knew exactly how the APS layer needed to behave and what the tx function needed to do. Here is a simple flowchart of how the new data request routine looks: 
I was able to get rid of the data queue because it wasn't really needed at the APS layer. The tx process, state machine, and temporary retry buffer got replaced with a retry queue and a callback timer. This greatly simplified things because the callback timer performed the equivalent of a state machine, process, and timer all in one. The retry queue is actually an enhancement. Previously, I could only buffer one frame at a time which meant that I could only transmit one reliable frame (ack requested) at a time. Then I had to wait for a timeout or ACK to send the next one. This is the main reason why I needed a data queue on the transmit side. A retry queue allowed me to set aside only those frames that needed to be sent reliably. When they got put in the retry queue, then they would also have a callback timer set. If the callback timer expires, then they would automatically be sent out again. The revised design simplified the APS layer and reduced the lines of code by about half. Getting rid of all the complication also made things easier to debug. I've already tested it out and it works in the simulator too. Now, I'm going to do the same for the MAC layer. The MAC layer has a similar behavioral pattern as the APS layer in which it sends out data and needs to wait for the ACK. I'm pretty sure I can use the APS layer as a guide to simplify the MAC layer too. That way, I can decrease the size of the stack and make it more maintainable, and even add functionality. Also, the APS and MAC data request functions will be similar which kind of gives the stack a nice symmetry...I hope. |
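The retry queue idea above can be sketched in a few dozen lines. This is not the real APS code, just an illustration under stated assumptions: in Contiki the per-frame timer would presumably be a callback timer such as `ctimer`, but to keep the example self-contained the timer is replaced by an expiry tick that `aps_retry_tick()` checks. All names, the queue depth, the retry count, and the timeout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RETRY_SLOTS   4    /* hypothetical queue depth */
#define MAX_RETRIES   3    /* hypothetical retry limit */
#define ACK_TIMEOUT   10   /* ticks to wait for an ACK (assumed value) */
#define MAX_FRAME_LEN 32

typedef struct {
    uint8_t  in_use;
    uint8_t  seq;            /* counter used to match the incoming ACK */
    uint8_t  retries_left;
    uint32_t expiry;         /* tick at which the frame is resent */
    uint8_t  len;
    uint8_t  frame[MAX_FRAME_LEN];
} retry_entry_t;

static retry_entry_t retry_q[RETRY_SLOTS];
static unsigned      tx_count;   /* stand-in for the NWK layer transmit */

static void lower_layer_tx(const uint8_t *frame, uint8_t len)
{
    (void)frame;
    (void)len;
    tx_count++;
}

/* Send a frame. Only ack-requested frames get parked in the retry queue.
 * Returns 0 on success, -1 if the frame is too long or the queue is full. */
int aps_data_req(const uint8_t *frame, uint8_t len, uint8_t seq,
                 int ack_req, uint32_t now)
{
    if (len > MAX_FRAME_LEN)
        return -1;
    if (ack_req) {
        retry_entry_t *e = NULL;
        for (int i = 0; i < RETRY_SLOTS; i++)
            if (!retry_q[i].in_use) { e = &retry_q[i]; break; }
        if (e == NULL)
            return -1;
        e->in_use = 1;
        e->seq = seq;
        e->retries_left = MAX_RETRIES;
        e->expiry = now + ACK_TIMEOUT;
        e->len = len;
        memcpy(e->frame, frame, len);
    }
    lower_layer_tx(frame, len);
    return 0;
}

/* ACK arrived: drop the matching entry so it is never resent. */
void aps_ack_received(uint8_t seq)
{
    for (int i = 0; i < RETRY_SLOTS; i++)
        if (retry_q[i].in_use && retry_q[i].seq == seq)
            retry_q[i].in_use = 0;
}

/* Plays the role of the callback timer firing: resend or give up. */
void aps_retry_tick(uint32_t now)
{
    for (int i = 0; i < RETRY_SLOTS; i++) {
        retry_entry_t *e = &retry_q[i];
        if (!e->in_use || (int32_t)(now - e->expiry) < 0)
            continue;
        if (e->retries_left == 0) {
            e->in_use = 0;          /* out of retries, drop the frame */
            continue;
        }
        e->retries_left--;
        e->expiry = now + ACK_TIMEOUT;
        lower_layer_tx(e->frame, e->len);
    }
}
```

Because each entry carries its own expiry, several reliable frames can be in flight at once, and frames that don't request an ACK never touch the queue at all.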
|
|
Written by Akiba
|
|
Saturday, 14 June 2008 |
|
I finally got the Zigbee broadcasts working. It's a slight deviation from my original plan to get the mesh routing working, but as I was going through the mesh code, I realized that it requires the use of broadcast transmissions. So I figured why not get broadcasts up first. I ran into numerous problems with broadcasts. Even though I tested it in my original test fixture, it was only capable of testing two devices which was almost a trivial case. When I tested it out in the simulator with a multi-node network, I ran into infinite broadcast loops that kept on crashing my stack. So many data transmissions were flying around that it would immediately exhaust the buffer pools in all the nodes and they would end up hanging. Hmmm...need to do something about that too. I should be able to recover gracefully if I exhaust my buffers. Anyways, I almost had to rewrite the broadcast handling code to get it to work. I didn't take into consideration a couple of different situations that I should have thought of. I also found a lot of repetitive code that didn't need to be there so I cut all of that out. Finally, I got things to work and tested it on a five node network with the topology I mentioned previously. Ahhh...now I can go home and drink a beer. I've been working out of a hamburger shop on a Saturday night. Not the best way to spend a weekend. I'm sick of the smell of burgers and fries. If you're ever at a Mos Burger in Japan, you should try out the rice burgers, though. They're pretty interesting. |
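For anyone wondering how broadcast loops like the ones above are normally kept in check: the Zigbee NWK layer keeps a broadcast transaction table that remembers the source address and sequence number of each broadcast it has seen, and a node drops any frame that matches an existing record instead of relaying it again. The sketch below shows that duplicate check. It is only an illustration and not necessarily how the stack's broadcast code ended up being fixed; the function names, table size, and record lifetime are my own assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define BTT_SIZE     8   /* hypothetical number of records */
#define BTR_LIFETIME 9   /* seconds a record is remembered (assumed value) */

/* One broadcast transaction record. */
typedef struct {
    uint8_t  valid;
    uint16_t src_addr;   /* originator of the broadcast */
    uint8_t  seq;        /* NWK sequence number of the broadcast */
    uint8_t  ttl;        /* seconds left before the record expires */
} btr_t;

static btr_t btt[BTT_SIZE];

/* Returns 1 if the broadcast is new and should be relayed, 0 if it is a
 * duplicate (or the table is full) and should be dropped. */
int brc_should_relay(uint16_t src_addr, uint8_t seq)
{
    btr_t *free_slot = NULL;

    for (int i = 0; i < BTT_SIZE; i++) {
        if (btt[i].valid) {
            if (btt[i].src_addr == src_addr && btt[i].seq == seq)
                return 0;               /* already seen: break the loop */
        } else if (free_slot == NULL) {
            free_slot = &btt[i];
        }
    }
    if (free_slot == NULL)
        return 0;                       /* no room to track it, so drop it */

    free_slot->valid = 1;
    free_slot->src_addr = src_addr;
    free_slot->seq = seq;
    free_slot->ttl = BTR_LIFETIME;
    return 1;
}

/* Call once per second to expire old records. */
void brc_age(void)
{
    for (int i = 0; i < BTT_SIZE; i++)
        if (btt[i].valid && --btt[i].ttl == 0)
            btt[i].valid = 0;
}
```

Dropping broadcasts when the table is full is a deliberate choice in this sketch: it caps the number of broadcasts a node is juggling at once, which is one way to avoid running the buffer pool dry.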
|
|
Written by Akiba
|
|
Thursday, 12 June 2008 |
This week has been pretty busy with the part-time job I'm doing for survival money. The part time job is nothing too difficult. I'm basically just doing customer support and technical sales for the company's product line which consists mostly of LAN, WAN, and some encryption chips. I'm contracted for two days a week which is just enough to pay rent and bills. That means that I can probably survive about two years working on this project at my current level of spending. Another way to look at it is that I'll probably die of mental fatigue before my savings gets fully depleted. Ha ha ha.
In between the customer visits this week, I managed to squeeze in some time to continue the test efforts. I finally was able to start testing the data transfers. I got the basic network management functions working over the weekend in my simulator so that I could form a network and also add nodes to it.
Today, I got the single hop unicast transfers working. That's a bit of a milestone for me because to transfer data, I need to send it out from the application layer. From there, it travels down to the application support sub-layer, networking, mac, and finally out the driver to the sim. From the sim, it travels to all the nodes, which check the destination address and drop the frame if it's not meant for them. At the target node, it will travel all the way up the stack (driver, mac, nwk, aps) until it reaches the application layer where it gets printed out to the simulation console. All the way up and down the stack, there are many things that could have gone wrong. However, it was surprisingly easy to get working. I guess that's because I did a lot of testing in my old test fixture for the data transfers. |
|
|
|
|
Written by Akiba
|
|
Sunday, 08 June 2008 |
|
I've fallen into radio silence mode again recently since I got the simulator working. The last few weeks have been pretty tough with all of my simulator issues, but for the past three days, it has been working well. After my big coding session on Thursday, I got things to the point where the sim runs stably which allowed me to move forward on actually testing the stack. As of today, I was able to get 10 nodes joined on to the same network which was quite a big accomplishment. That means that the first node was able to form the network, and the other nodes were able to communicate the handshake correctly to finish the join procedure. The addressing is also correct and follows the distributed addressing formula correctly in the spec. Actually, it wasn't easy getting to my current point. Once I got the stack working in the simulator, I found and cleaned up many bugs. They ranged from stupid ones like reversing the bit definitions in one of the header fields to agonizing such as a re-entrancy issue I had when my (simulator) thread called a non-reentrant stack function. Also, things got exponentially more complicated to debug as the size of the network grew. I ran into some problems that would only occur after my network grew to 8 nodes. That one turned out to be a buffer pool issue, but it was hard to track down because there were so many things going on simultaneously. Especially on the join procedure, there's a lot of broadcasts that occur so all of my node consoles were scrolling like crazy. |
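For reference, the distributed addressing formula mentioned above is built around a value the spec calls Cskip(d), the size of the address block a parent at depth d hands to each router child. With Cm = max children, Rm = max routers, and Lm = max depth, Cskip(d) = 1 + Cm(Lm - d - 1) when Rm is 1, and (1 + Cm - Rm - Cm * Rm^(Lm - d - 1)) / (1 - Rm) otherwise. The sketch below computes it along with the child addresses. The function and struct names are hypothetical, and it assumes the parameters are sane enough that every address fits in 16 bits.

```c
#include <stdint.h>

typedef struct {
    uint8_t cm;   /* max children per parent (nwkMaxChildren) */
    uint8_t rm;   /* max router children per parent (nwkMaxRouters) */
    uint8_t lm;   /* max network depth (nwkMaxDepth) */
} nwk_params_t;

/* Cskip(d): address block size given to each router child of a parent
 * sitting at depth d. A result of 0 means the parent accepts no children. */
uint16_t nwk_cskip(const nwk_params_t *p, uint8_t depth)
{
    if (depth >= p->lm)
        return 0;
    if (p->rm == 1)
        return (uint16_t)(1 + p->cm * (p->lm - depth - 1));

    /* pw = Rm^(Lm - d - 1) */
    int32_t pw = 1;
    for (uint8_t i = 0; i < p->lm - depth - 1; i++)
        pw *= p->rm;

    /* Numerator and denominator share a sign, and the division is exact. */
    int32_t num = 1 + p->cm - p->rm - p->cm * pw;
    int32_t den = 1 - p->rm;
    return (uint16_t)(num / den);
}

/* Address of the n-th router child (n starts at 1). */
uint16_t nwk_router_addr(const nwk_params_t *p, uint16_t parent_addr,
                         uint8_t parent_depth, uint8_t n)
{
    return (uint16_t)(parent_addr + nwk_cskip(p, parent_depth) * (n - 1) + 1);
}

/* Address of the n-th end device child (n starts at 1). */
uint16_t nwk_enddev_addr(const nwk_params_t *p, uint16_t parent_addr,
                         uint8_t parent_depth, uint8_t n)
{
    return (uint16_t)(parent_addr + nwk_cskip(p, parent_depth) * p->rm + n);
}
```

As a worked example, with Cm = 20, Rm = 6, and Lm = 5, Cskip comes out to 5181, 861, 141, 21, and 1 for depths 0 through 4. The coordinator at address 0 therefore gives its first two router children addresses 1 and 5182, and its first end device 31087.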
|
|
|
|