Build smarter, not harder

I spent the past six weeks roaming around Europe with a netbook, and used that time productively to get some work done on Firefox. Part of that involved building on Windows for the first time, and experiencing the joy of pymake. However, I found the extra characters required to fire off incremental builds with pymake pushed me just past the pain point required to get me to write some sort of automation. With that in mind, I introduce smartmake, a tool to allow you to specify as little information as possible to build incrementally while still ending up with a working finished build.

Here’s how it works: I’ve encoded some basic dependencies into a Python file (any changes to layout/build, netwerk/build, js/src, etc. require a rebuild of toolkit/library, any changes to layout/ or dom/ require rebuilding layout/build, etc). You pass the script a list of srcdir directories that have changed, and it prints out a list of build commands joined by &&, for example:

$ smartmake ipc/ipdl dom/src/geolocation
make -C objdir/ipc/ipdl && make -C objdir/dom/src/geolocation && make -C objdir/layout/build && make -C objdir/toolkit/library
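
To make the idea concrete, here is a minimal sketch of that kind of dependency expansion. It is not the actual smartmake.py: the RULES table only covers the examples mentioned above, and the names dependents, build_order and command_line are made up for illustration.

```python
# Hypothetical rule table: a change under the prefix forces a rebuild of the target.
RULES = [
    ("layout/", "layout/build"),
    ("dom/", "layout/build"),
    ("layout/build", "toolkit/library"),
    ("netwerk/build", "toolkit/library"),
    ("js/src", "toolkit/library"),
]


def dependents(directory):
    """Directories that must be rebuilt after `directory` changes."""
    return [target for prefix, target in RULES
            if directory.startswith(prefix) and target != directory]


def build_order(changed):
    """Every directory to build, with each directory ahead of its dependents."""
    seen = set()
    postorder = []

    def visit(directory):
        if directory in seen:
            return
        seen.add(directory)
        for target in dependents(directory):
            visit(target)
        postorder.append(directory)

    # Depth-first walk; reversing the post-order puts dependents last.
    for directory in reversed([d.rstrip("/") for d in changed]):
        visit(directory)
    return postorder[::-1]


def command_line(changed, objdir="objdir/", cmd="make -C"):
    """Join one build command per directory with &&."""
    return " && ".join(f"{cmd} {objdir}{d}" for d in build_order(changed))


print(command_line(["ipc/ipdl", "dom/src/geolocation"]))
```

With these assumed rules, the call at the bottom prints the same command line as the example above. Real edge cases (directories that match several rules, or none) would need more thought than this.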

It’s a pretty dumb tool at the moment, and there are certainly lots of edge cases that don’t actually work correctly. However, I found it useful enough in Europe that when I returned home to a different machine, I missed it. It’s hardcoded for my own setup right now, but I’ll try to make it more general if people are interested (e.g. the objdir and cmd variables could be read from a config file). Hit me up with any requests or if you have wonderful ideas for how to improve it.

Update: I’ve genericized it a small amount. Now, you’ll need to create .hg/.smartmake like so:
[smartmake]
objdir: objdir/
cmd: make -C

before the updated smartmake will allow you to continue. Notice that this is per tree, not global. Furthermore, smartmake.py is designed to be used as a tool that outputs a command line – I have a shell alias that pipes its output to sh.
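
For the curious, a file in that shape can be read with Python’s standard configparser module. This is only a sketch assuming Python 3, not the code in smartmake.py; parse_settings and load_settings are hypothetical names.

```python
import configparser
import os


def parse_settings(text):
    """Parse the [smartmake] section and return (objdir, cmd)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["smartmake"]
    return section["objdir"], section["cmd"]


def load_settings(tree_root="."):
    """Read .hg/.smartmake from one tree; the file is per tree, not global."""
    path = os.path.join(tree_root, ".hg", ".smartmake")
    with open(path) as handle:
        return parse_settings(handle.read())
```

A missing file or a missing key raises an exception here, which roughly matches the "won’t let you continue" behaviour described above.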

JS runnables: now with less boilerplate

Actually, this little trick has been possible for at least a year and a half since I fixed the enhancement request, but I don’t believe it’s common knowledge. Instead of writing something like
someEventTarget.dispatch({ run: function() { ... } }), you can simply use someEventTarget.dispatch(function() { ... }) and skip the object goop. It looks cleaner to my eyes, so I thought I’d try to get the message out.

Cancelling builds from the console, now easier than ever!

The self-serve tools, specifically cancel.py, have received some important usability upgrades at the urging of jst and ehsan. Now, simply running

python cancel.py

will be enough to get you going – you’ll be prompted for your username, password, branch and hg revision. The builds displayed now also show their state (running, pending or complete), so it’s easier to find what you’re looking for. Let me know if there are more changes you’d like made!
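
For a sense of what that prompting might look like, here is a minimal sketch. It is not the actual cancel.py code: collect_credentials and the prompt strings are made up for illustration. The one real detail is that getpass.getpass reads the password without echoing it, so it never has to appear on the command line.

```python
import getpass


def collect_credentials(ask=input, ask_secret=getpass.getpass):
    """Prompt for everything needed to look up the builds for one push.

    The prompt functions are parameters so they can be swapped out in tests.
    """
    return {
        "username": ask("Username: "),
        "password": ask_secret("Password: "),  # read without echoing
        "branch": ask("Branch: "),
        "revision": ask("Revision: "),
    }
```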

How to identify expected and unexpected crashes in tinderbox logs

I’ve seen this come up in several bugs recently, and it’s time to disseminate some knowledge. Here is what an unexpected crash usually looks like:

TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_seek.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_seek.html | application crashed (minidump found)

You’re looking at a test harness timeout because of a crash. Simple, easy to recognize. Even shows the crashing test for you!

PROCESS-CRASH | Main app process exited normally | application crashed (minidump found)

Here’s a sly one. This is a crash, but the lack of a test name and the “Main app process exited normally” actually mean that a subprocess crashed intentionally. We’ve got tests that run scripts that cause crashes in child processes so that we can test the recovery behaviour in the parent, but unfortunately we don’t display that information very well right now.

If you’re in doubt as to what kind of crash you’re seeing in the log, there’s another heuristic you can apply by looking at the crashing stack:

Crash reason: SIGSEGV
Crash address: 0x8

Thread 0 (crashed)
0 libxul.so!js::ctypes::ConvertToJS [typedefs.h:a538db9ab619 : 113 + 0x5]
rbx = 0x00000008 r12 = 0xa7833800 r13 = 0x00000000 r14 = 0x00000000
r15 = 0x00000000 rip = 0xb69142f5 rsp = 0xf25f7490 rbp = 0xa87a4690
Found by: given as instruction pointer in context
1 libxul.so!js::ctypes::PointerType::ContentsGetter [CTypes.cpp:a538db9ab619 : 3393 + 0x1b]
rbx = 0xa7833800 r12 = 0xa87a4690 r13 = 0xf25f74e8 r14 = 0xffffffff
r15 = 0xf25f7ae0 rip = 0xb69173cf rsp = 0xf25f74e0 rbp = 0xa877e750
Found by: call frame info

This is an intentional crash. We use jsctypes to dereference 0x8, an invalid address, and this is what it looks like every single time. If you don’t see this stack, you’re looking at a crash that should be filed.

So, to summarize: not every crash is unexpected. Keep your wits about you; know your crash stacks.
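
The two heuristics above are mechanical enough to sketch in code. This is a rough illustration assuming the log and stack formats shown in this post, not part of any existing tinderbox tooling; the function names are hypothetical.

```python
INTENTIONAL_MARKER = "Main app process exited normally"
INTENTIONAL_FRAME = "js::ctypes::ConvertToJS"


def classify_crash_line(line):
    """Return 'expected', 'unexpected', or None for non-crash log lines."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3 or fields[0] != "PROCESS-CRASH":
        return None
    # No test name plus the "exited normally" text points at a deliberate
    # child-process crash; anything else names the crashing test.
    return "expected" if fields[1] == INTENTIONAL_MARKER else "unexpected"


def stack_looks_intentional(stack_text):
    """True when the stack matches the jsctypes dereference of 0x8."""
    lines = [line.strip() for line in stack_text.splitlines()]
    # Frame 0 is the first line that starts with the frame number "0".
    top_frame = next((line for line in lines if line.startswith("0 ")), "")
    return "Crash address: 0x8" in lines and INTENTIONAL_FRAME in top_frame
```

Treat the result as a hint rather than a verdict. When in doubt, read the stack yourself.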

Self-serve, now in bulk

Update: the tool is now easier to use and doesn’t require adding your password as an argument. See this post for more details.

I’m a big fan of the self-serve tool that RelEng provided for people with LDAP access. When I can see a try build going bad, I can cancel all the remaining builds and free up resources, or retrigger completed builds if I want to get extra results. Unfortunately, the server is fairly slow to respond and the UI for performing these actions is clumsy. Luckily, there’s a really simple API that lets anyone with access use these tools through more traditional (read: non-graphical) means. Allow me to introduce a new repo I set up today to make working with the self-serve API easier – self-serve tools. selfserve.py contains simple wrappers for every API point exposed, and some basic documentation of the values returned by most of the calls. cancel.py is an example of a really simple tool that can be built on top of the wrappers to allow for bulk cancellation. Here’s what a session looks like:

[jdm@Phaethon self-serve-tools]$ python cancel.py -u "jmatthews@mozilla.com" -p my5ecureP4ssword123 -r 306838f27b33 -b try
1: Linux x86-64 tryserver leak test build
2: Linux tryserver build
3: OS X 10.6.2 tryserver build
4: WINNT 5.2 tryserver build
5: Maemo 5 QT tryserver build
6: OS X 10.6.2 tryserver leak test build
7: Maemo 5 GTK tryserver build
8: Android R7 tryserver build
9: Linux tryserver leak test build
10: WINNT 5.2 tryserver leak test build
11: OS X 10.5.2 tryserver leak test build
12: Linux QT tryserver build
13: Linux x86-64 tryserver build
14: all
Builds to cancel: 1 3 5
Cancelling Linux x86-64 tryserver leak test build
Cancelling OS X 10.6.2 tryserver build
Cancelling Maemo 5 QT tryserver build

This is just the first cut, but I’m excited not to have to use the web interface any more. Please feel free to add further documentation, or even new tools! I’m excited to see what other people can build with this.
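
If you want to build something similar, the menu handling in the session above boils down to mapping the typed numbers back to builds. Here is a minimal sketch, assuming the last menu entry always means "all". choose_builds is a made-up name, and this is not lifted from cancel.py.

```python
def choose_builds(builds, answer):
    """Map an answer like '1 3 5' (1-based menu numbers) to build names.

    The menu lists every build and then one extra entry meaning 'all'.
    """
    all_number = len(builds) + 1
    picked = []
    for token in answer.split():
        number = int(token)
        if number == all_number:
            return list(builds)
        if not 1 <= number <= len(builds):
            raise ValueError(f"no such menu entry: {number}")
        if builds[number - 1] not in picked:
            picked.append(builds[number - 1])
    return picked
```

Each selected name would then be handed to whatever wrapper performs the actual cancellation.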

nsITimer anti-pattern

I’ve filed bug 640629 to address an intermittent source of orange: incorrect nsITimer creation. I first ran across it while working on making httpd.js collect garbage more frequently, a task which quickly turned into orange whack-a-mole as more and more problematic test constructs popped out of the nether. Mounir Lamouri (volkmar) recently fixed another instance of the nsITimer problem, so I thought I’d address it in public and do some education.

When you see a construct like this, you should be wary:

function doSomething() {
  var timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
  timer.initWithCallback(callback, delay, timer.TYPE_ONE_SHOT);
}

There’s a common misconception that timers retain an extra reference that is released after they fire. This is false. If a timer is created and stored in a locally-scoped variable and the scope is exited, the timer is at risk of being garbage-collected before the timer fires. To combat this, store a reference to the timer elsewhere – a member of an object that outlives the current scope, a global variable, it doesn’t matter. Do your part – save a timer’s life today.

Knowledge++

Nine days ago, I made an off-hand remark in #content that I might be able to get the geolocation service working in Fennectrolysis by the end of the day if my plans worked out. I also remember referring to the process as “not a big deal.” Since that moment, I have put in a significant amount of work (at least several hours every day), and learned:

  • My estimating skills are severely underdeveloped
  • How to make use of the cycle collector
  • How weak references work
  • Best practices for XPCOM reference counting
  • There’s a confusing thing called nsIClassInfo which I should learn more about, but I know enough to force it to do my bidding for now
  • How non-modal prompts work
  • The meaning of obscure GCC linker errors like “undefined reference to vtable”
  • How to implement an XPCOM object in Javascript
  • Implementing XPCOM objects in Javascript frequently results in much more pleasant code than C++

Having said all that, yesterday I got the Fennec geolocation permission prompt to appear when triggered by a content page, and the proper callback was called when I allowed or canceled the request, so I’m confident that I can have a patch up for review by the end of the holiday weekend. Of course, given my track record, that means it might be up by the end of the week.

I’ve seen the future, brother: it is dynamic additions to the status bar that don’t block the main process.

You’re looking at a mind-bogglingly alpha Jetpack prototype running out of process. Yesterday was a black triangle moment for me, as I finally saw the culmination of 2.5 months of work to make the words “Gmail it” appear in the status bar.

In this implementation, when a Jetpack tries to do something that doesn’t really make sense in its own process (say, adding an element to the status bar), it proxies this operation to the chrome process and continues on its way. Theoretically this lets the main chrome process focus on important things like being responsive and not freezing, while the main work of running Jetpack scripts is delegated to another process.

There’s lots and lots more work from here (for example, clicking “Gmail it” does nothing for various reasons I need to explore), but this inauspicious screenshot demonstrates that the out of process future is alive and kicking!

Megazeux debugger on github

The official Megazeux repository recently moved to github, allowing me the opportunity to create my own fork and move my debugger work into a more public sphere. Accordingly, you can now visit my repo for all the most recent robotic debugging developments.

Getting involved with Mozilla

I realize that while I’ve been contributing to Mozilla since last July, I’m still quite new to a lot of the process and knowledge that more experienced developers take for granted. Therefore, I’m going to document the steps I’ve taken to increase my understanding and involvement in hopes that it generates some discussion on the best way to help new people get their bearings.

One of the most inviting aspects of the project right off the bat was that I had a point of contact. Benjamin Smedberg announced in a blog post that the Electrolysis project could always use more help, inviting interested people to get in touch with him. This was an immense help, as he pointed me towards a really good introductory bug that forced me to explore and learn about IPDL, IDL, XPCOM, Javascript, the build system, and more. This was perfect for me; I’m always looking to expand the horizons of what I know, and my work hooking up e10s with the typeaheadfind component allowed me to do just that.

I find that one of the largest hurdles for getting involved in any project for me is lack of knowledge. You’ve got a source checkout, and you’ve got a problem to solve, but no idea where to start. If you’re courageous, you can start dipping into random files and window shopping until you find something that looks promising. However, this approach is inefficient. As I said, I really like to expand the breadth of my understanding as quickly as possible, so here are my patented steps to getting a better handle on the source tree:

  1. Subscribe to an RSS feed of commits
  2. Hang out in IRC channels
  3. Watch interesting Bugzilla users

The trick here is to have lots of information available for consumption, and to sample a wide variety of it. I read commit logs every morning, pick out entries that catch my attention and skim the commit. If I’m still interested, I’ll visit the original bug and read through its history. Through these actions, I am now in possession of:

  • names of people involved in something I’m intrigued by
  • locations in the tree of code that I’m interested in
  • other information from the bug – components worth paying attention to, dependencies, blockers, etc

IRC channels are a great way to act sponge-like. Many diverse conversations on all sorts of interesting code-related topics occur in #developers, while more focused channels like #content and #static allow me to pick up new concepts and insights into the work that I’m currently doing. Furthermore, they’re points of contact for people who are usually happy to help out when I’ve got a question.

Finally, Bugzilla is a goldmine of fascinating activity and information. When I started working on electrolysis, I realized that all of the work I was interested in was clustered in the Core:IPC component, so I set up an email watch on the QA contact for that component. Eventually, however, I wanted to diversify, so I began to follow specific users. Watching the activity of the polyglots of the project, those who dip in and out of every component, is a great way to quickly become exposed to the wealth of work being done. There’s a downside to this: the more users you watch, the more intimidating your bugmail becomes. Today, I ended up receiving 270 emails over the course of one hour because roc decided to unassign himself from a crapload of bugs at the same time as a bunch of dependencies were added to some Jaegermonkey tracking bugs. However, I’ve become adept at quickly deciding whether a conversation thread is interesting to me or not, and these deluges are infrequent.

When it comes to learning about specific pieces of code that confuse me, I have another system. If it’s some fundamental concept that I need to grok (nsCOMPtr, ns*String, etc.), I turn to the faithful Google search: “mozilla X”, X being the unknown item, and 99% of the time the first result will be the relevant MDC page. If I’m more interested in quickly locating pieces of code, I pull out DXR and make use of its wonderful search limiters such as member: or derived:. If what I’m looking for is a piece of C or JS code, or simply isn’t indexed in DXR (m-c only), I haul out mxr and search there. If I do a few searches and can’t find what I’m looking for, it’s usually off to the friendly folk in #developers.

There’s one specific moment I remember from when I was starting out – my very first review. I’d submitted my first attempt at the typeaheadfind work, and to my horror, an email arrived with the subject “review denied.” I felt crushed. Reading through the review, I saw that many good points were made, but it was hard at first to shake the feeling that my code was simply not good enough. I’ve gotten better at accepting review- since then, but I feel that a simple change to the email subject (“review complete”) would go a long way to improving that user experience.

So that’s it, really. Through the application of these methods, I’ve gained enough knowledge to submit a bunch of patches, log some bugs, and start answering other people’s questions. It’s really just been a process of perseverance, asking the right questions, and making use of the correct tools.

Got a story? Please share! I’d love to hear how other people’s experiences differ.