Appendix - An `sbdump` Parser

Introduction

This is a parser for the files produced by the sbdump command.

<sbd-parser.icn>= [D->]

link options

procedure main(args)

  opts := options(args, "-nl: -nf:", badopt)

  who_calls_whom(args, \opts["nl"] | "pp", wordify(\opts["nf"]) | [])

  end

<sbd-parser.icn>+= [<-D->]

procedure badopt(e)

  write(&errout, e, ".  Known options are")
  write(&errout, "  -nl - set the node labels.")
  write(&errout, "  -nf - set the node filter.")

  exit(1)

  end # badopt

<sbd-parser.icn>+= [<-D->]
record context(sp, spf, dp, dpf)

<sbd-parser.icn>+= [<-D->]
procedure eval(pgm, cntxt)

  if *pgm = 0 then return

  if pgm[2] == "=" then
    if equal(get_cntxt(pgm[1], cntxt), get_cntxt(pgm[3], cntxt))
    then return eval(pgm[4:0], cntxt)
    else fail

  if pgm[2] == "!=" then
    if not equal(get_cntxt(pgm[1], cntxt), get_cntxt(pgm[3], cntxt))
    then return eval(pgm[4:0], cntxt)
    else fail

  stop
  end

Given tag and a context cntxt, get_context() returns the associated context component in cntxt if tag is a context tag or it returns tag otherwise.

<sbd-parser.icn>+= [<-D->]
procedure get_cntxt(tag, cntxt)

  return      if tag == "sp" then cntxt.sp
         else if tag == "spf" then cntxt.spf
         else if tag == "dp" then cntxt.dp
         else if tag == "dpf" then cntxt.dpf
         else tag

  end

Given the two entities s1 and s2, equal() succeeds if they're equal and fails otherwise.

<sbd-parser.icn>+= [<-D->]
procedure equal(s1, s2)

  static syscalls
  initial syscalls := set(["sprintf", "printf", "fprintf", "free", "time",
                           "fopen", "fclose", "fread", "fwrite", "memset", 
                           "realloc", "strcmp", "strlen", "malloc"])

  (s2 == "syscall") & return member(syscalls, s1)

  return s1 == s2

  end

Given the string str, wordify() returns a list of the words in str.

<sbd-parser.icn>+= [<-D->]
procedure wordify(str)

  wds := []

  str || " " ? {
    repeat {
      tab(many(' ') | &pos)
      (not pos(0)) | return wds
      put(wds, tab(upto(' ')))
      }
    }

  return wds

  end

Reading `sbdump` Files

get_file_info(fname) accepts the name of an sbdump file fname and returns the information gleaned from the file. The information includes the sbdump data itself (sbd_info), the set of procedures defined in the source file associated with fname (defined_procs), and a table giving the set of procedures called by a procedure defined in the source file associated with fname (called_procs, where called_procs[p] is the set of procedures called by p, and p is in defined_procs).

<sbd-parser.icn>+= [<-D->]

record file_info(sbd_info, defined_procs, called_procs)

procedure get_file_info(fname)

  local in, f, fi, p

  in := open(fname) | stop("Can't open " || fname || ".")
  fi := file_info()

  fi.sbd_info := read_sbd_file(in)
  fi.defined_procs := defined_procedures(fi.sbd_info)
  fi.called_procs := table()
  every p := !fi.defined_procs do
    fi.called_procs[p] := called_procedures(fi.sbd_info, p)

  close(in)

  return fi;

  end

An sbdump file is a sequence of sections.

<sbd-parser.icn>+= [<-D->]

record sbdump_file(sections)

procedure read_sbd_file(in)

  local sbdf

  sbdf := sbdump_file()
  sbdf.sections := []
  while put(sbdf.sections, read_section(in))

  return sbdf

  end

Each sbdump-file section has

A name: text indicating the information contained in the section; ends with the word section.
An identifier: an integer identifying the information contained in the section. The name and the identifier are apparently related (for example, the Focus Unit Table section has identifier 10 and the section with identifier 10 is the Focus Unit Table section).
A start location: evidently a file position within the .bd file from which the dump was generated; the units are unclear, but probably bytes.
A length: evidently an offset from the start location; the units are unclear, but probably bytes.
Lines of text: the information, with each line corresponding to a piece of information.

<sbd-parser.icn>+= [<-D->]
record sbdump_section(name, id, start, length, text)

Read and return the next section in the input file in.

<sbd-parser.icn>+= [<-D->]
procedure read_section(in)

  local s

  s := read_section_header(in) | fail

  s.text := []
  repeat {
    l := read_line(in) | break
    if l ? ="++++" then {
      pushback_line(l)
      break
      }
    put(s.text, l)
    }

  return s

  end

Read and return the section header, which should be the next line from in; fail if in is empty (die if the next line from in is not a section header).

<sbd-parser.icn>+= [<-D->]
procedure read_section_header(in)

  local s, l

  s := sbdump_section()
  l := read_line(in) | fail

  l ? {
    ="++++ " &
    s.name := tab(upto('(')) & 
    ="(id=" &
    s.id := tab(many('0123456789')) &
    =", start=" &
    s.start := tab(many('0123456789')) &
    =", length=" &
    s.length := tab(many('0123456789')) &
    =")" &
    pos(0)
    } | stop("Unrecognizable section header line\n  \"" || l || "\".")

  return s

  end

Read and return the input line. If there's a pushed back line, return that; otherwise read from in.

<sbd-parser.icn>+= [<-D->]
global pushedback_line

procedure read_line(in) 

  return 1(.\pushedback_line, pushedback_line := &null) | read(in)

  end

Push back line l so that the next input line read will be l; die if there's already a line pushed back (that is, the push-back buffer is one line long).

<sbd-parser.icn>+= [<-D->]
procedure pushback_line(l)

  (/pushedback_line := l) | stop("trying to push back more than one line")
  
  end

Extracting Defined Procedure Names

Information about procedures defined in a source file is scattered throughout the sbdump file, but seems to be concentrated in the Focus Unit Table and in the Semantic Table. The Focus Unit Table is easier to parse than is the Semantic Table, so it will be the basis for the following code. An sbdump file may contain at least four types of the Focus Unit Table: one related to the full directory path leading to the source file, one related to the name of the source file (including the full directory path), one related to the name of the source language in which the source file is written, and one relating to the procedure names referenced in the source file. It is the last, referenced-procedure type of the Focus Unit Table that will be of interest in what follows. A Focus Unit Table's type can be determined by examining the text in the associated section: each line in the section begins with a word giving the type. In particular, each line in the reference-procedure Focus Unit Table section begins with Function. Having identified the Focus Unit Table of interest, in the rest of this section the phrase ``Focus Unit Table'' will be taken to mean ``the referenced-procedure Focus Unit Table'' unless otherwise noted. Each line in the Focus Unit Table section has the form

x-number: Function name: pname, Focus range: {f,l} =  semtag: y-number

where

x-number: A number of unknown purpose.
pname: The name of the defined procedure.
f, l: These are, presumably, the source-file line numbers of the first (f) and last (l) lines of the procedure definition.
y-number: A number of unknown purpose.

A procedure reference in the Focus Unit Table may be the result of being defined in the source file or by being called in the source file (if a procedure is both defined and called in the source file, it appears in the Focus Unit Table as a definition). The numbers fand ldistinguish the two cases: if a procedure is defined in the source file, both fand lare non-zero (the first line in the source file is line 1); if a procedure is referenced but not defined in the source file, both fand lare zero. defined_procedures(sbdf) accepts the sbdump-file information sbdf and returns a set containing the names of the procedures defined in the source file associated with sbdf.

<sbd-parser.icn>+= [<-D->]
procedure defined_procedures(sbdf)

  local defp, s

  defp := set()

  every s := !sbdf.sections do
    if (s.id = 10) & (s.text[1] ? (tab(many(' 0123456789')) & =": Function"))
    then defp ++:= defprocs(s.text)

  return defp

  end

defprocs(text) returns a set containing the names of the procedures defined in text, the text from a referenced-procedure Focus Unit Table section.

<sbd-parser.icn>+= [<-D->]
procedure defprocs(text)

  local defp, l, f, dp

  defp := set()

  every l := !text do {
    l ? {
      tab(many(' 0123456789')) &
      =": Function name: " &
      dp := tab(upto(',')) &
      =", focus range: {" &
      f := tab(many('0123456789'))
      } | stop("Unrecognizable reference-procedure Focus Unit Table " ||
               "line\n  \"" || l || "\".")
    if f ~= 0 then insert(defp, dp)
    }

  return defp

  end

Generating Who-Calls-Whom Information

It is helpful to know three things when generating who-calls-whom information:

The names of the defined procedures.
The names of the procedures called by a procedure (called_procs).
The name of the files in which procedures are defined (filename).

Only the second item is necessary; the first and third items are useful in customizing the generated information.

<sbd-parser.icn>+= [<-D->]

record proc_info(filename, called_procs)

Given a list of source files files, who_calls_whom() generates a call graph.

<sbd-parser.icn>+= [<-D->]

procedure who_calls_whom(files, node_types, filter)

  local f, p, in, sbdinfo, procinfo

  procinfo := table()
  every f := !files do {
    in := open(f) | stop("Can't open " || fname || ".")
    sbdinfo := read_sbd_file(in)
    close(in)

    every p := !defined_procedures(sbdinfo) do
      procinfo[p] := proc_info(f, called_procedures(sbdinfo, p))
    }

  make_wcw_graph(procinfo, node_types, filter)

  end

Given a table of procedure information procinfo, make_wcw_graph() generates a call-graph in dot input format.

<sbd-parser.icn>+= [<-D->]

procedure make_wcw_graph(procinfo, node_types, filter)

  <generate the graph>
  <draw the graph>

  end

<generate the graph>= (<-U)
g := table()
every pi := !sort(procinfo) do
  every p := !(pi[2].called_procs) do {
    pname := (\procinfo[p]).filename | "<unknown>"
    if eval(filter, context(pi[1], pi[2].filename, p, pname))
    then {
      fromp := (node_types[1] == "p" & pi[1]) | pi[2].filename
      /g[fromp] := set([])
      insert(g[fromp], (node_types[2] == "p" & p) | pname)
      }
    }

Drawing the graph is done in two steps. The first part gets around some dot syntax and the second part generates dot code. The default in dot is to use the node name given in the dot source as the label of the node in the generated graph. This is o.k. as long as the the syntax of the graph label matches the syntax of dot node names, which it does when the graph labels are procedure names. File names are (usually) not valid node names because they contain illegal characters (dot and slash, for example). It's easy enough to get around this problem: create synthetic node names acceptable to dot and define the proper graph label for each synthetic node. If each graph label is (invertibly) mapped to a natural number, then the graph label l is represented by the synthetic node name ni, where iis the natural number to which l is mapped. Create the mapping, represented by the table nodes such that the graph label l is mapped to the natural number nodes[l].

<draw the graph>= (<-U) [D->]

nodes := table(&null)
node_cnt := 0
every n := !sort(g) do {
  /nodes[n[1]] := (node_cnt +:= 1)
  every n2 := !(n[2]) do
    /nodes[n2] := (node_cnt +:= 1)
  }

Generating the dot code takes two steps: defining the synthetic nodes,

<draw the graph>+= (<-U) [<-D->]

write("digraph G {")
write("  size=\"8.0,10.5\"")
write("  ratio=fill")

write("  { set = minrank; node[shape = plaintext, fontsize = 16];")
write("    nodelabels [label = \"", node_types, "\"]")

ftr := ""
every ftr ||:= (!filter || " ")
write("    filter [label = \"", ftr, "\"]")
write("  }")

every n := !sort(nodes) do
  write("  n", n[2], " [label = \"", n[1], "\"]")

and creating the edges between the nodes.

<draw the graph>+= (<-U) [<-D]

every n := !sort(g) do
  every n2 := !(n[2]) do
    write("  n", nodes[n[1]], " -> n", nodes[n2], ";")

write("  }")

The default filter: accept anything.

<sbd-parser.icn>+= [<-D->]
procedure null_filter(clr, clr_file, cld, cld_file)
  return
  end

The non-sys filter: reject calls to system functions.

<sbd-parser.icn>+= [<-D->]
procedure nosys_filter(clr, clr_file, cld, cld_file)
  static syscalls
  initial syscalls := set(["sprintf", "printf", "fprintf", "free", "time",
                           "fopen", "fclose", "memset", "realloc", "strcmp",
                           "strlen", "malloc"])
  member(syscalls, cld) & fail
  return
  end

Extracting Who-Calls-Whom Information

Who-calls-whom information lives in the Edge Table section, identifier 11. Each line in the Edge Table section indicates one call, and has the format

x-number: Call From: caller To: called @ l-number semtag: y-number

where

x-number: A number of unknown purpose.
caller: The name of the calling procedure.
called: The name of the called procedure.
l-number: The source-file line number in which the call takes place.
y-number: A number of unknown purpose.

The function called_procedures(sbdf, p) accepts a procedure name p and returns the set of all procedures called by p in the sbdump file represented by sbdf.

<sbd-parser.icn>+= [<-D->]
procedure called_procedures(sbdf, p) 

  local called, s

  called := set()

  every s := !sbdf.sections do
    if s.id = 11 then called ++:= called_p(s.text, p)

  return called

  end

called_p(text, p) searches through the Edge Table section text text and returns a set containing the names of the procedures called by the procedure named p.

<sbd-parser.icn>+= [<-D->]
 
procedure called_p(text, p)

  local called

  called := set()

  every l := !text do {
    l ? {
      tab(many('0123456789')) &
      =": Call From: " &
      clr := tab(upto(' ')) &
      =" To: " &
      cld := tab(upto(' '))
      } | stop("Unrecognizable Edge Table line\n  \"" || l || "\".")
    if clr == p then insert(called, cld)
    }

  return called

  end

Generating Struct Information

At the end of the day, we want a list of structures defined in the code, where each structure is identified by its tag (or typedef name) and the list of types used to define its fields.

<sbd-parser.icn>+= [<-D->]
record struct_info(name, types)

Given file, the name of a source file, get_struct_information() returns a list of structs defined in file.

<sbd-parser.icn>+= [<-D->]

procedure get_struct_information(file)

  local in, sbdinfo, si, structinfo

  in := open(file) | stop("Can't open " || file || ".")
  sbdinfo := read_sbd_file(in)
  close(in)

  return extract_structinfo(sbdinfo)

  end

Extracting Struct Information

The sbdump file information does not keep detailed information on the structure of data types. What information it does contain is concentrated in the Semantic Table section (identifier 5). The Semantic Table apparently keeps track of symbol occurrences in the associated source file. Each occurrence is also classified as to type; in particular, where the occurrence is a definition or a reference. Symbol occurrences in the Semantic Table are grouped into records. Each line in a record has the form

x-number: Symbol on line l-number: 'sname' class = y-number

where

x-number: A number of unknown purpose.
l-number: The number of the source-file line containing the symbol.
sname: The symbol.
class: The symbol's classification.
y-number: A number of unknown purpose.

The end of a record is marked with a line containing End record. class is rather awkwardly encoded, so for the moment it's enough to note that class indicates, among other things, the entity associated with the symbol (e.g., a type or a struct field) and whether the symbol's occurrence is a definition or a reference to an already defined entity. Rummaging through the Semantic Table turns up three kinds of symbol information useful in identifying type structure:

Structure tags. A structure tag is an identifier naming a structure, for example, it is the x in struct x {...}. Structure tags are identified directly by the class part of a Semantic Table line.
Structure field declarations. These are also directly identified by the class part. Associating a field declaration with a particular structure is done indirectly by line number. A field declaration belongs the the closest previous structure tag definition; that is, a field declaration on source line x belongs to the structure defined line y, where y is the largest such line number less than x.
Field types. Here's where things get a bit sticky. Structure-field type declarations are not explicitly identified as such; they're just mentioned as references to types (which brings up a related stickiness: finding all the ways class may identify a type reference). A type reference is related to a structure through the field definitions gathered in step 2. A type reference on line x is related to a structure s if there also exists a field definition on line x and that field definition is related to s.

Before identifying how type structure might be inferred from this Semantic Table information, it might be useful to consider the ways in which this information fall short. First, it considers only structures; unions and arrays are not considered. Second, structure tag definitions are only recognized if they appear after the struct keyword, as in typedef struct x {...}. Alternative forms are classified in other ways; for example in typedef struct {...} x, x is classified as a typedef definition with no explicit mention of the structure being defined (it also, in this case, appears after all the structure field definitions). This distinction raise problems for the present analysis: of the sixteen files in the Mosaic source directory that use typedef to define structures, six of them do not define a structure tag. Anonymous structures, as in the variable declaration struct {...} s, also cause problems by not having anything to delimit the structure definition. Only one of the sixteen Mosaic source files defining structures use an anonymous structure. This problem may not be too serious, however, because, in the case of tagless structure definitions, sbdump appears to enter the keyword struct in the Semantic Table. It appears, one way or another, it is possible to mark the start of a structure definition from Semantic Table information. Third, the sbdump file contains incomplete type information about structure field declarations. In particular, it does not distinguish between a field of type t and a field of type t*. Depending on the type of analysis to which this data will be put, this problem may distort but not invalidate any results obtained. Fourth, the sbdump files are only generated from the information contained in the associated .c files. Somehow, sbdump manages to ignore information contained in .h files, which is a spectacularly inappropriate thing to do when trying to perform type hierarchy analysis. The first thing to do is extract the three types of information described above: the structure tags (structs), the field declarations (fields), and the field types (types). structs is a table indexed by structure tag; structs[[st]] is the source-text line number at which the definition for the structure with tag st starts. fields is the set of source-text line numbers at which structure fields are defined. types is a table indexed by line number; types[l] is the type used on source-text line l.

<sbd-parser.icn>+= [<-D->]
record structure_info(structs, fields, types)

Return the structure information found in sbdump file associated with sbdf.

<sbd-parser.icn>+= [<-D->]
procedure extract_structinfo(sbdf) 

  local s

  every s := !sbdf.sections do
    if s.id = 5 then return extractstructinfo(s.text)

  end

extractstructinfo(text) returns a list information about the structures defined in the source file associated with the sbdump file from which came the Semantic Table information text.

<sbd-parser.icn>+= [<-D->]
 
procedure extractstructinfo(text)

  local si, linfo, l, structinfo, struct_def_fields, struct_def_start

  si := structure_info()
  si.structs := table()
  si.fields := set()
  si.types := table()

  every l := !text do
    <classify the Semantic Table line l>

  <group structure field definitions>

  <gather field types>

  return structinfo

  end

Line classification is based on the class field described at the start of this section. This classification scheme isn't too intelligent, but it's the best we can do at the moment because Semantic Table doesn't contain more complete information (e.g., it doesn't identify when a type reference occurs in a structure field declaration) and the information is stored in some unknown order (actually, it's ordered by x-number, whatever that is). Note there's a small problem with anonymous structure definitions: there's no struct name to serve as the index for the struct table. This problem is fixed by fabricating the name l, where lis the line number at which the anonymous struct declaration starts.

<classify the Semantic Table line l>= (U->)

if linfo := parse_stl(l) then {
  if match("cb_c_def_struct_field", linfo[3]) then
    insert(si.fields, linfo[1])

  else if match("cb_c_ref_builtin_type_decl", linfo[3]) then {
    # write("si.types[", linfo[1], "] := ", linfo[2])
    si.types[linfo[1]] := linfo[2]
    }
  else if match("cb_c_ref_typedef_decl", linfo[3]) then {
    # write("si.types[", linfo[1], "] := ", linfo[2])
    si.types[linfo[1]] := linfo[2]
    }
  else if match("cb_c_ref_struct_tag_decl", linfo[3]) then {
    # write("si.types[", linfo[1], "] := ", linfo[2])
    si.types[linfo[1]] := linfo[2]
    }

  else if "cb_c_def_struct_tag_anonymous" == linfo[3] then
    si.structs["<anonymous>" || linfo[1]] := linfo[1]
  else if "cb_c_def_struct_tag" == linfo[3] then
    si.structs[linfo[2]] := linfo[1]
  }

Grouping structure field definitions involves associating a field definition with a structure definition. Grouping is done by line numbers: if a field definition occurs on line x, it is associated with the structure definition on line y, where y is the largest line number less than x. Grouping uses two auxiliary data data structures, both lists. struct_def_start gives the starting line number of structure definitions in increasing order; that is, struct_def_start[i] is the line number at which the definition of the ith structure starts, and struct_def_start[i] < struct_def_start[i+1] (in theory, there could be two structure definitions on the same line, but we'll discount such a possibility). The second auxiliary data structure is struct_def_fields, which gives the line numbers for field definitions. In particular, struct_def_fields[i] is the set of line numbers for the field definitions associated with the structure definition starting at line struct_def_start[i]. It will also be handy to know, given l, a line number at which a structure definition starts, the value of i for which struct_def_start[i] = l. This mapping is kept in the table struct_def_invert_start; that is struct_def_start[struct_def_invert_start[l]] = l.

<group structure field definitions>= (U->) [D->]
        
struct_def_start := []
struct_def_fields := []
struct_def_invert_start := table()

every l := !sort(si.structs, 2) do {
  put(struct_def_start, l[2])
  struct_def_invert_start[l[2]] := *struct_def_start
  put(struct_def_fields, set())
  }

For each structure field definition on line l find the i such that struct_def_start[i] <= l < struct_def_start[i+1] and store l in struct_def_fields[i].

<group structure field definitions>+= (U->) [<-D]

every l := !si.fields do {
  if l < struct_def_start[1]
  then stop("Struct field definition on line " || l || " before first " ||
            "struct definition on line " || struct_def_start[1] || ".")
  i := 1
  while (i <= *struct_def_start) do
    if (i = *struct_def_start) | (struct_def_start[i + 1] > l) then {
      insert(struct_def_fields[i], l)
      break;
      }
    else
      i +:= 1
  }

<gather field types>= (U->)

structinfo := []
every s := !sort(si.structs) do {
  sti := struct_info(s[1], [])
  put(structinfo, sti)
  every l := !struct_def_fields[struct_def_invert_start[s[2]]] do {
    put(sti.types, \si.types[l] | stop("Missing type at line " || l || "."))
    # write("si.types[", l, "] = '", si.types[l], "'")
    }
  }

Given l, a line from the Semantic Table, parse the line and return a list giving the line number, symbol name, and classification (in that order) found on the line. Fail if the line doesn't contain this information.

<sbd-parser.icn>+= [<-D]
procedure parse_stl(l)

  local lno, sname, class

  l ? {
    tab(many('0123456789')) &
    =": Symbol on line " &
    lno := integer(tab(many('0123456789'))) &
    =": '" &
    sname := tab(upto("'")) &
    ="' " &
    class := tab(upto(' '))
    } & return [lno, sname, class]

  end

<classify the Semantic Table line l>: U1, D2
<draw the graph>: U1, D2, D3, D4
<gather field types>: U1, D2
<generate the graph>: U1, D2
<group structure field definitions>: U1, D2, D3
<sbd-parser.icn>: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29

Appendix - An sbdump Parser

Introduction

Reading sbdump Files

Extracting Defined Procedure Names

Generating Who-Calls-Whom Information

Extracting Who-Calls-Whom Information

Generating Struct Information

Extracting Struct Information

Appendix - An `sbdump` Parser

Reading `sbdump` Files