How to Read Unfamiliar Code

Read it until it makes sense.

That's it. That's the job.

Reading lots of code is hard

Reading code as a skill

Not the same as reading syntax.

Not taught in classes.

Difficulty is based on experience and confidence.

same project <= same framework <= different framework <= different language

What can replace experience?

  • Contextual clues - what is the effect of this code?
  • Making connections with existing knowledge

What can replace confidence?

  • Confusion
  • Feeling adrift
  • Utter despair...

What can replace confidence?

  • Organized confusion!
  • Minimize sources of despair by establishing boundaries

Case study: LibreOffice

I promised to document my attempt to understand a brand new codebase, so I chose the LibreOffice source.

OpenHub.net says 7M lines of C++/Java/XML across 71,000 files.

Let's figure out how the charting feature works!

Our tools

  • Pen/paper
  • Favourite search tool (find/grep/ack)
  • Tricked-out text editor

Get ready to search all the things.

Traditionally...

  • Search for keywords
  • Read through each file top to bottom
  • Start with main(), read to end

We can do better than this.

Start with the UI

UI strings ("Insert Chart") are more likely to be unique than arbitrary keywords.

Even if the results aren't directly related, they give us context.

Using ack "Insert Chart" on LibreOffice brings up 10 results!
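If ack or grep isn't handy, the same idea can be sketched in a few lines of Python. This is a minimal sketch, not how ack works internally: search_tree is a hypothetical helper, and the demo directory, file names and file contents are invented for illustration.

```python
import os
import tempfile


def search_tree(root, needle):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep searching


# Demo on a throwaway directory (invented contents, not real LibreOffice files).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "ui"))
    with open(os.path.join(root, "ui", "strings.src"), "w", encoding="utf-8") as f:
        f.write('String STR_INSERT_CHART\n{\n    Text = "Insert Chart" ;\n};\n')
    with open(os.path.join(root, "main.cxx"), "w", encoding="utf-8") as f:
        f.write("int main() { return 0; }\n")

    hits = list(search_tree(root, "Insert Chart"))
    for path, lineno, line in hits:
        print(f"{os.path.relpath(path, root)}:{lineno}: {line.strip()}")
```

The search is a case-sensitive literal match, which is what makes a UI string such a precise starting point.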

What if there's no related UI?

  • Exploratory keyword searches for uncommon words
  • Aggressively ignore results based on filename
  • Goal: find files/directories that look interesting
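One way to ignore results aggressively is to filter the hit list by filename pattern, then count what survives per directory. A rough sketch, assuming the search tool has already produced a list of paths; the ignore patterns and every path below are hypothetical.

```python
import posixpath
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical ignore list: translations, help files, tests.
IGNORE_PATTERNS = ["*.po", "*.xml", "*/qa/*", "*/test/*"]


def keep(path, ignore_patterns=IGNORE_PATTERNS):
    """Return True if the path matches none of the ignore patterns."""
    return not any(fnmatchcase(path, pattern) for pattern in ignore_patterns)


def interesting_directories(paths):
    """Count surviving hits per directory, most hits first."""
    counts = Counter(posixpath.dirname(p) for p in paths if keep(p))
    return counts.most_common()


# Invented search results for illustration.
search_hits = [
    "translations/de/strings.po",
    "chart2/source/view/BarChart.cxx",
    "chart2/source/view/PieChart.cxx",
    "chart2/qa/test_charts.cxx",
    "sd/ui/viewoverlaymanager.cxx",
    "help/insert_chart.xml",
]

ranking = interesting_directories(search_hits)
for directory, count in ranking:
    print(f"{count:3d}  {directory}")
```

Directories that collect several hits after filtering are good candidates for the "looks interesting" list.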

The Search for Charts

I found a strings.src that looks plausible. It references a STR_INSERT_CHART name, so let's repeat the process for that.
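The "repeat the process" step is a two-hop search: find the UI string, pick up the identifier defined next to it, then search for that identifier everywhere else. A small sketch on an in-memory stand-in for a source tree; the paths and file contents are invented, and follow_string is a hypothetical helper.

```python
import re

# Invented miniature source tree: path -> contents.
FILES = {
    "res/strings.src": 'String STR_INSERT_CHART\n{\n    Text = "Insert Chart" ;\n};\n',
    "ui/viewoverlaymanager.cxx": "static int gButtonToolTips[] = { STR_INSERT_CHART };\n",
    "core/unrelated.cxx": "void f() {}\n",
}

# Heuristic: resource names tend to look like ALL_CAPS_WITH_UNDERSCORES.
IDENTIFIER = re.compile(r"\b[A-Z][A-Z0-9]*_[A-Z0-9_]+\b")


def follow_string(files, ui_string, window=3):
    """Map each identifier found near a ui_string hit to the other files using it."""
    next_hops = {}
    for path, text in files.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if ui_string not in line:
                continue
            # Look at the hit line and a few lines above it.
            nearby = "\n".join(lines[max(0, i - window): i + 1])
            for ident in IDENTIFIER.findall(nearby):
                others = sorted(p for p, t in files.items() if p != path and ident in t)
                if others:
                    next_hops[ident] = others
    return next_hops


hops = follow_string(FILES, "Insert Chart")
for ident, paths in hops.items():
    print(ident, "->", ", ".join(paths))
```

In practice I did this by hand with two searches; the sketch only shows why the second search is so much narrower than the first.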

This found a file called viewoverlaymanager.cxx in a ui/ directory, which sounds like something that could implement the button I'm looking for.

Making sense of a file

  • Avoid top to bottom
  • Editor: outlining tools, code folding, etc.
  • Avoid reading as much code as possible
  • Use version control history to your advantage
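If the editor has no outlining support, a crude outline can stand in for it. The sketch below is only a heuristic, not a C++ parser: it assumes method definitions start at column 0 and look like Class::method(. The sample source and all names in it are invented.

```python
import re

# Unindented line containing "Class::method(" or "Class::~Class(".
DEFINITION = re.compile(r"^(?=\S).*?\b(\w+::~?\w+)\s*\(")


def outline(source):
    """Return (line_number, qualified_name) for lines that look like method definitions."""
    result = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEFINITION.match(line)
        if match:
            result.append((lineno, match.group(1)))
    return result


# Invented C++-like source for illustration.
SAMPLE = """\
#include "ExampleWidget.hxx"

ExampleWidget::ExampleWidget()
{
    Helper::setup(this);
}

void ExampleWidget::onMouseEnter(const Event& rEvent)
{
    if (rEvent.isValid())
        Tooltip::show(rEvent);
}

bool ExampleWidget::MouseButtonDown(const Event& rEvent)
{
    return Dispatcher::Execute(rEvent);
}
"""

entries = outline(SAMPLE)
for lineno, name in entries:
    print(f"{lineno:4d}  {name}")
```

Three names and line numbers are a much smaller thing to read than the file itself, which is the point of avoiding top to bottom.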

Iterating

  • Keep two lists: "interesting" and "don't understand"
    i.e. things worth investigating, and things that don't make sense yet
  • Avoid getting sidetracked by items in the second list
  • If the current file doesn't pan out, move on to the next interesting item
  • Breadth-first vs. depth-first searching
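Pen and paper work fine for the two lists, but the bookkeeping can also be sketched as a tiny data structure. ReadingLog is a hypothetical name and the entries are examples only; the queue makes the exploration breadth-first.

```python
from collections import deque


class ReadingLog:
    """Two lists: items worth investigating, and items that don't make sense yet."""

    def __init__(self):
        self.interesting = deque()  # FIFO queue, so exploration is breadth-first
        self.dont_understand = []   # parked on purpose, not pursued right now
        self._seen = set()

    def note_interesting(self, item):
        """Queue an item once; repeats are ignored."""
        if item not in self._seen:
            self._seen.add(item)
            self.interesting.append(item)

    def park(self, item):
        """Record a confusing item without getting sidetracked by it."""
        self.dont_understand.append(item)

    def next_item(self):
        """Return the oldest unexplored item, or None when nothing is left."""
        return self.interesting.popleft() if self.interesting else None


log = ReadingLog()
log.note_interesting("viewoverlaymanager.cxx")
log.note_interesting("STR_INSERT_CHART")
log.note_interesting("viewoverlaymanager.cxx")  # duplicate, ignored
log.park("what is a slot?")                      # invented example question

first = log.next_item()
print("next:", first)
print("still queued:", list(log.interesting))
print("parked:", log.dont_understand)
```

Swapping popleft() for pop() would turn this into depth-first exploration: always chase the newest lead first.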

Broadening understanding

  • Identify relationships between code
  • More searching, but using language-specific tricks:
    ("public Foo", "new Foo", "import Foo", "#include Foo.h")
  • Set breakpoints, add print statements
  • Exploratory refactoring
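The language-specific tricks above can be written down as regular expressions, so one pass over the search results says how each line relates to a name. A sketch under assumptions: the pattern set covers only the four examples from the list, Foo is a placeholder name, and the sample lines are invented.

```python
import re


def relationship_patterns(name):
    """Build one regex per kind of relationship for the given type name."""
    n = re.escape(name)
    return {
        "public": re.compile(rf"\bpublic\s+{n}\b"),
        "new": re.compile(rf"\bnew\s+{n}\b"),
        "import": re.compile(rf"\bimport\s+[\w.]*\b{n}\b"),
        # Accepts <...> or "..." includes, an optional directory, and .h/.hxx/.hpp.
        "include": re.compile(rf'#\s*include\s*[<"][^>"]*\b{n}\.h(?:xx|pp)?[>"]'),
    }


def classify(line, name):
    """Return the kinds of relationship to name that appear on this line."""
    return [kind for kind, pattern in relationship_patterns(name).items()
            if pattern.search(line)]


# Invented lines for illustration.
sample_lines = [
    "class Bar : public Foo {",
    "auto* p = new Foo(args);",
    "import org.example.Foo;",
    '#include "view/Foo.hxx"',
    "int FooBar = 0;",
]

labels = [classify(line, "Foo") for line in sample_lines]
for line, kinds in zip(sample_lines, labels):
    print(f"{', '.join(kinds) or '-':8s} {line}")
```

The word boundaries matter: without them a search for Foo also drags in FooBar and every other name that merely starts the same way.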

The Search for Charts

We have found gButtonToolTips and gButtonSlots, which are used in onMouseEnter and MouseButtonDown methods. These seem to use an abstract command pattern by calling an Execute method. We can try to trace that by following SID_INSERT_DIAGRAM.

This takes us to a DoExecute method in ui/func/fuinsert.cxx. There's a lot that doesn't make sense here, but there's a chart2 module used which I recognize from an earlier search.

The Search for Charts

The chart2 module turns out to contain a model/view/controller design, which makes a lot of sense. I learn things like:

  • ChartModel.cxx is 1500 lines long and none of them mean anything to me
  • BarChart.cxx has a method for creating the shapes for a bar chart. This makes sense!
  • PieChart.cxx has more complicated math than expected
  • The code in controller/ wires up the model and the view, including the sidebar

Now what?

Either we've accomplished our task, or it's time to start over.

In both cases, we are now more familiar with patterns in the code!

Further resources

Lindsey Kuper on refactoring to understand

Aria Stewart's thoughts on how to read code

Mike Conley's "bugnotes" from solving Firefox bugs

Thanks

Red panda (Firefox)
Slides: joshmatthews.net/cusec16/
Photo by Yortw