sbdump
Parser
sbdump
command.
<sbd-parser.icn
>= [D->]
link options
procedure main(args)
opts := options(args, "-nl: -nf:", badopt)
who_calls_whom(args, \opts["nl"] | "pp", wordify(\opts["nf"]) | [])
end
<sbd-parser.icn
>+= [<-D->]
procedure badopt(e)
write(&errout, e, ". Known options are")
write(&errout, " -nl - set the node labels.")
write(&errout, " -nf - set the node filter.")
exit(1)
end # badopt
<sbd-parser.icn
>+= [<-D->]
record context(sp, spf, dp, dpf)
<sbd-parser.icn
>+= [<-D->]
procedure eval(pgm, cntxt)
if *pgm = 0 then return
if pgm[2] == "=" then
if equal(get_cntxt(pgm[1], cntxt), get_cntxt(pgm[3], cntxt))
then return eval(pgm[4:0], cntxt)
else fail
if pgm[2] == "!=" then
if not equal(get_cntxt(pgm[1], cntxt), get_cntxt(pgm[3], cntxt))
then return eval(pgm[4:0], cntxt)
else fail
stop
end
Given tag
and a context cntxt
, get_context()
returns the associated
context component in cntxt
if tag
is a context tag or it returns
tag
otherwise.
<sbd-parser.icn
>+= [<-D->]
procedure get_cntxt(tag, cntxt)
return if tag == "sp" then cntxt.sp
else if tag == "spf" then cntxt.spf
else if tag == "dp" then cntxt.dp
else if tag == "dpf" then cntxt.dpf
else tag
end
Given the two entities s1
and s2
, equal()
succeeds if they're equal
and fails otherwise.
<sbd-parser.icn
>+= [<-D->]
procedure equal(s1, s2)
static syscalls
initial syscalls := set(["sprintf", "printf", "fprintf", "free", "time",
"fopen", "fclose", "fread", "fwrite", "memset",
"realloc", "strcmp", "strlen", "malloc"])
(s2 == "syscall") & return member(syscalls, s1)
return s1 == s2
end
Given the string str
, wordify()
returns a list of the words in
str
.
<sbd-parser.icn
>+= [<-D->]
procedure wordify(str)
wds := []
str || " " ? {
repeat {
tab(many(' ') | &pos)
(not pos(0)) | return wds
put(wds, tab(upto(' ')))
}
}
return wds
end
sbdump
Filesget_file_info(fname)
accepts the name of an sbdump
file fname
and
returns the information gleaned from the file. The information includes the
sbdump
data itself (sbd_info
), the set of procedures defined in the
source file associated with fname
(defined_procs
), and a table giving
the set of procedures called by a procedure defined in the source file
associated with fname
(called_procs
, where called_procs[p]
is the
set of procedures called by p
, and p
is in defined_procs
).
<sbd-parser.icn
>+= [<-D->]
record file_info(sbd_info, defined_procs, called_procs)
procedure get_file_info(fname)
local in, f, fi, p
in := open(fname) | stop("Can't open " || fname || ".")
fi := file_info()
fi.sbd_info := read_sbd_file(in)
fi.defined_procs := defined_procedures(fi.sbd_info)
fi.called_procs := table()
every p := !fi.defined_procs do
fi.called_procs[p] := called_procedures(fi.sbd_info, p)
close(in)
return fi;
end
An sbdump
file is a sequence of sections.
<sbd-parser.icn
>+= [<-D->]
record sbdump_file(sections)
procedure read_sbd_file(in)
local sbdf
sbdf := sbdump_file()
sbdf.sections := []
while put(sbdf.sections, read_section(in))
return sbdf
end
Each sbdump
-file section has
section
.
.bd
file from
which the dump was generated; the units are unclear, but probably bytes.
<sbd-parser.icn
>+= [<-D->]
record sbdump_section(name, id, start, length, text)
Read and return the next section in the input file in
.
<sbd-parser.icn
>+= [<-D->]
procedure read_section(in)
local s
s := read_section_header(in) | fail
s.text := []
repeat {
l := read_line(in) | break
if l ? ="++++" then {
pushback_line(l)
break
}
put(s.text, l)
}
return s
end
Read and return the section header, which should be the next line from in
;
fail if in
is empty (die if the next line from in
is not a section
header).
<sbd-parser.icn
>+= [<-D->]
procedure read_section_header(in)
local s, l
s := sbdump_section()
l := read_line(in) | fail
l ? {
="++++ " &
s.name := tab(upto('(')) &
="(id=" &
s.id := tab(many('0123456789')) &
=", start=" &
s.start := tab(many('0123456789')) &
=", length=" &
s.length := tab(many('0123456789')) &
=")" &
pos(0)
} | stop("Unrecognizable section header line\n \"" || l || "\".")
return s
end
Read and return the input line. If there's a pushed back line, return that;
otherwise read from in
.
<sbd-parser.icn
>+= [<-D->]
global pushedback_line
procedure read_line(in)
return 1(.\pushedback_line, pushedback_line := &null) | read(in)
end
Push back line l
so that the next input line read will be l
; die if
there's already a line pushed back (that is, the push-back buffer is one line
long).
<sbd-parser.icn
>+= [<-D->]
procedure pushback_line(l)
(/pushedback_line := l) | stop("trying to push back more than one line")
end
sbdump
file, but seems to be concentrated in the Focus Unit Table and
in the Semantic Table. The Focus Unit Table is easier to parse than is the
Semantic Table, so it will be the basis for the following code.
An sbdump
file may contain at least four types of the Focus Unit Table: one
related to the full directory path leading to the source file, one related to
the name of the source file (including the full directory path), one related to
the name of the source language in which the source file is written, and one
relating to the procedure names referenced in the source file. It is the last,
referenced-procedure type of the Focus Unit Table that will be of interest in
what follows.
A Focus Unit Table's type can be determined by examining the text in the
associated section: each line in the section begins with a word giving the
type. In particular, each line in the reference-procedure Focus Unit Table
section begins with Function
. Having identified the Focus Unit Table of
interest, in the rest of this section the phrase ``Focus Unit Table'' will be
taken to mean ``the referenced-procedure Focus Unit Table'' unless otherwise
noted.
Each line in the Focus Unit Table section has the form
x-number: Function name: pname, Focus range: {f,l} = semtag: y-numberwhere
defined_procedures(sbdf)
accepts the sbdump
-file information sbdf
and returns a set containing the names of the procedures defined in the source
file associated with sbdf
.
<sbd-parser.icn
>+= [<-D->]
procedure defined_procedures(sbdf)
local defp, s
defp := set()
every s := !sbdf.sections do
if (s.id = 10) & (s.text[1] ? (tab(many(' 0123456789')) & =": Function"))
then defp ++:= defprocs(s.text)
return defp
end
defprocs(text)
returns a set containing the names of the procedures defined
in text
, the text from a referenced-procedure Focus Unit Table section.
<sbd-parser.icn
>+= [<-D->]
procedure defprocs(text)
local defp, l, f, dp
defp := set()
every l := !text do {
l ? {
tab(many(' 0123456789')) &
=": Function name: " &
dp := tab(upto(',')) &
=", focus range: {" &
f := tab(many('0123456789'))
} | stop("Unrecognizable reference-procedure Focus Unit Table " ||
"line\n \"" || l || "\".")
if f ~= 0 then insert(defp, dp)
}
return defp
end
called_procs
).
filename
).
<sbd-parser.icn
>+= [<-D->]
record proc_info(filename, called_procs)
Given a list of source files files
, who_calls_whom()
generates a
call graph.
<sbd-parser.icn
>+= [<-D->]
procedure who_calls_whom(files, node_types, filter)
local f, p, in, sbdinfo, procinfo
procinfo := table()
every f := !files do {
in := open(f) | stop("Can't open " || fname || ".")
sbdinfo := read_sbd_file(in)
close(in)
every p := !defined_procedures(sbdinfo) do
procinfo[p] := proc_info(f, called_procedures(sbdinfo, p))
}
make_wcw_graph(procinfo, node_types, filter)
end
Given a table of procedure information procinfo
, make_wcw_graph()
generates a call-graph in dot input format.
<sbd-parser.icn
>+= [<-D->]
procedure make_wcw_graph(procinfo, node_types, filter)
<generate the graph>
<draw the graph>
end
<generate the graph>= (<-U) g := table() every pi := !sort(procinfo) do every p := !(pi[2].called_procs) do { pname := (\procinfo[p]).filename | "<unknown>" if eval(filter, context(pi[1], pi[2].filename, p, pname)) then { fromp := (node_types[1] == "p" & pi[1]) | pi[2].filename /g[fromp] := set([]) insert(g[fromp], (node_types[2] == "p" & p) | pname) } }
Drawing the graph is done in two steps. The first part gets around some
dot syntax and the second part generates dot code.
The default in dot is to use the node name given in the dot source as the label
of the node in the generated graph. This is o.k. as long as the the syntax of
the graph label matches the syntax of dot node names, which it does when the
graph labels are procedure names. File names are (usually) not valid node
names because they contain illegal characters (dot and slash, for example).
It's easy enough to get around this problem: create synthetic node names
acceptable to dot and define the proper graph label for each synthetic node.
If each graph label is (invertibly) mapped to a natural number, then the graph
label l
is represented by the synthetic node name ni, where
iis the natural number to which l
is mapped.
Create the mapping, represented by the table nodes
such that the graph
label l
is mapped to the natural number nodes[l]
.
<draw the graph>= (<-U) [D->] nodes := table(&null) node_cnt := 0 every n := !sort(g) do { /nodes[n[1]] := (node_cnt +:= 1) every n2 := !(n[2]) do /nodes[n2] := (node_cnt +:= 1) }
Generating the dot code takes two steps: defining the synthetic nodes,
<draw the graph>+= (<-U) [<-D->] write("digraph G {") write(" size=\"8.0,10.5\"") write(" ratio=fill") write(" { set = minrank; node[shape = plaintext, fontsize = 16];") write(" nodelabels [label = \"", node_types, "\"]") ftr := "" every ftr ||:= (!filter || " ") write(" filter [label = \"", ftr, "\"]") write(" }") every n := !sort(nodes) do write(" n", n[2], " [label = \"", n[1], "\"]")
and creating the edges between the nodes.
<draw the graph>+= (<-U) [<-D] every n := !sort(g) do every n2 := !(n[2]) do write(" n", nodes[n[1]], " -> n", nodes[n2], ";") write(" }")
The default filter: accept anything.
<sbd-parser.icn
>+= [<-D->]
procedure null_filter(clr, clr_file, cld, cld_file)
return
end
The non-sys filter: reject calls to system functions.
<sbd-parser.icn
>+= [<-D->]
procedure nosys_filter(clr, clr_file, cld, cld_file)
static syscalls
initial syscalls := set(["sprintf", "printf", "fprintf", "free", "time",
"fopen", "fclose", "memset", "realloc", "strcmp",
"strlen", "malloc"])
member(syscalls, cld) & fail
return
end
Who-calls-whom information lives in the Edge Table section, identifier 11. Each line in the Edge Table section indicates one call, and has the format
x-number: Call From: caller To: called @ l-number semtag: y-numberwhere
called_procedures(sbdf, p)
accepts a procedure name p
and
returns the set of all procedures called by p
in the sbdump
file
represented by sbdf
.
<sbd-parser.icn
>+= [<-D->]
procedure called_procedures(sbdf, p)
local called, s
called := set()
every s := !sbdf.sections do
if s.id = 11 then called ++:= called_p(s.text, p)
return called
end
called_p(text, p)
searches through the Edge Table section text text
and
returns a set containing the names of the procedures called by the procedure
named p
.
<sbd-parser.icn
>+= [<-D->]
procedure called_p(text, p)
local called
called := set()
every l := !text do {
l ? {
tab(many('0123456789')) &
=": Call From: " &
clr := tab(upto(' ')) &
=" To: " &
cld := tab(upto(' '))
} | stop("Unrecognizable Edge Table line\n \"" || l || "\".")
if clr == p then insert(called, cld)
}
return called
end
<sbd-parser.icn
>+= [<-D->]
record struct_info(name, types)
Given file
, the name of a source file, get_struct_information()
returns a list of structs defined in file
.
<sbd-parser.icn
>+= [<-D->]
procedure get_struct_information(file)
local in, sbdinfo, si, structinfo
in := open(file) | stop("Can't open " || file || ".")
sbdinfo := read_sbd_file(in)
close(in)
return extract_structinfo(sbdinfo)
end
sbdump
file information does not keep detailed information on the
structure of data types. What information it does contain is concentrated in
the Semantic Table section (identifier 5).
The Semantic Table apparently keeps track of symbol occurrences in the
associated source file. Each occurrence is also classified as to type; in
particular, where the occurrence is a definition or a reference.
Symbol occurrences in the Semantic Table are grouped into records.
Each line in a record has the form
x-number: Symbol on line l-number: 'sname' class = y-numberwhere
End record
.
class is rather awkwardly encoded, so for the moment it's enough to note
that class indicates, among other things, the entity associated with the
symbol (e.g., a type or a struct field) and whether the symbol's occurrence is
a definition or a reference to an already defined entity.
Rummaging through the Semantic Table turns up three kinds of symbol information
useful in identifying type structure:
x
in struct x {...}
. Structure tags
are identified directly by the class part of a Semantic Table line.
s
if there also exists a field
definition on line x and that field definition is related to s
.
struct
keyword, as in typedef struct x {...}
. Alternative forms are
classified in other ways; for example in typedef struct {...} x
, x
is
classified as a typedef definition with no explicit mention of the structure
being defined (it also, in this case, appears after all the structure field
definitions). This distinction raise problems for the present analysis: of the
sixteen files in the Mosaic source directory that use typedef to define
structures, six of them do not define a structure tag.
Anonymous structures, as in the variable declaration struct {...} s
, also
cause problems by not having anything to delimit the structure definition.
Only one of the sixteen Mosaic source files defining structures use an
anonymous structure.
This problem may not be too serious, however, because, in the case of tagless
structure definitions, sbdump appears to enter the keyword struct
in the
Semantic Table. It appears, one way or another, it is possible to mark the
start of a structure definition from Semantic Table information.
Third, the sbdump file contains incomplete type information about structure
field declarations. In particular, it does not distinguish between a field of
type t
and a field of type t*
. Depending on the type of analysis to
which this data will be put, this problem may distort but not invalidate any
results obtained.
Fourth, the sbdump files are only generated from the information contained in
the associated .c
files. Somehow, sbdump
manages to ignore information
contained in .h
files, which is a spectacularly inappropriate thing to do
when trying to perform type hierarchy analysis.
The first thing to do is extract the three types of information described
above: the structure tags (structs
), the field declarations (fields
),
and the field types (types
). structs
is a table indexed by structure
tag; structs[[st]]
is the source-text line number at which the definition
for the structure with tag st
starts. fields
is the set of source-text
line numbers at which structure fields are defined. types
is a table
indexed by line number; types[l]
is the type used on source-text line
l
.
<sbd-parser.icn
>+= [<-D->]
record structure_info(structs, fields, types)
Return the structure information found in sbdump file associated with
sbdf
.
<sbd-parser.icn
>+= [<-D->]
procedure extract_structinfo(sbdf)
local s
every s := !sbdf.sections do
if s.id = 5 then return extractstructinfo(s.text)
end
extractstructinfo(text)
returns a list information about the structures
defined in the source file associated with the sbdump file from which came the
Semantic Table information text
.
<sbd-parser.icn
>+= [<-D->] procedure extractstructinfo(text) local si, linfo, l, structinfo, struct_def_fields, struct_def_start si := structure_info() si.structs := table() si.fields := set() si.types := table() every l := !text do <classify the Semantic Table linel
> <group structure field definitions> <gather field types> return structinfo end
Line classification is based on the class field described at the
start of this section. This classification scheme isn't too intelligent, but
it's the best we can do at the moment because Semantic Table doesn't contain
more complete information (e.g., it doesn't identify when a type reference
occurs in a structure field declaration) and the information is stored in some
unknown order (actually, it's ordered by x-number, whatever that is).
Note there's a small problem with anonymous structure definitions: there's no
struct name to serve as the index for the struct
table. This problem is
fixed by fabricating the name
<classify the Semantic Table line l
>= (U->)
if linfo := parse_stl(l) then {
if match("cb_c_def_struct_field", linfo[3]) then
insert(si.fields, linfo[1])
else if match("cb_c_ref_builtin_type_decl", linfo[3]) then {
# write("si.types[", linfo[1], "] := ", linfo[2])
si.types[linfo[1]] := linfo[2]
}
else if match("cb_c_ref_typedef_decl", linfo[3]) then {
# write("si.types[", linfo[1], "] := ", linfo[2])
si.types[linfo[1]] := linfo[2]
}
else if match("cb_c_ref_struct_tag_decl", linfo[3]) then {
# write("si.types[", linfo[1], "] := ", linfo[2])
si.types[linfo[1]] := linfo[2]
}
else if "cb_c_def_struct_tag_anonymous" == linfo[3] then
si.structs["<anonymous>" || linfo[1]] := linfo[1]
else if "cb_c_def_struct_tag" == linfo[3] then
si.structs[linfo[2]] := linfo[1]
}
Grouping structure field definitions involves associating a field definition
with a structure definition. Grouping is done by line numbers: if a field
definition occurs on line x, it is associated with the structure definition on
line y, where y is the largest line number less than x.
Grouping uses two auxiliary data data structures, both lists.
struct_def_start
gives the starting line number of structure definitions
in increasing order; that is, struct_def_start[i]
is the line number at
which the definition of the i
th structure starts, and
struct_def_start[i]
< struct_def_start[i+1]
(in theory, there could
be two structure definitions on the same line, but we'll discount such a
possibility).
The second auxiliary data structure is struct_def_fields
, which gives the
line numbers for field definitions. In particular, struct_def_fields[i]
is
the set of line numbers for the field definitions associated with the structure
definition starting at line struct_def_start[i]
.
It will also be handy to know, given l
, a line number at which a structure
definition starts, the value of i
for which struct_def_start[i]
=
l
. This mapping is kept in the table struct_def_invert_start
; that is
struct_def_start[struct_def_invert_start[l]]
= l
.
<group structure field definitions>= (U->) [D->] struct_def_start := [] struct_def_fields := [] struct_def_invert_start := table() every l := !sort(si.structs, 2) do { put(struct_def_start, l[2]) struct_def_invert_start[l[2]] := *struct_def_start put(struct_def_fields, set()) }
For each structure field definition on line l
find the i
such that
struct_def_start[i]
<= l
< struct_def_start[i+1]
and store l
in struct_def_fields[i]
.
<group structure field definitions>+= (U->) [<-D] every l := !si.fields do { if l < struct_def_start[1] then stop("Struct field definition on line " || l || " before first " || "struct definition on line " || struct_def_start[1] || ".") i := 1 while (i <= *struct_def_start) do if (i = *struct_def_start) | (struct_def_start[i + 1] > l) then { insert(struct_def_fields[i], l) break; } else i +:= 1 }
<gather field types>= (U->) structinfo := [] every s := !sort(si.structs) do { sti := struct_info(s[1], []) put(structinfo, sti) every l := !struct_def_fields[struct_def_invert_start[s[2]]] do { put(sti.types, \si.types[l] | stop("Missing type at line " || l || ".")) # write("si.types[", l, "] = '", si.types[l], "'") } }
Given l
, a line from the Semantic Table, parse the line and return a list
giving the line number, symbol name, and classification (in that order) found
on the line. Fail if the line doesn't contain this information.
<sbd-parser.icn
>+= [<-D]
procedure parse_stl(l)
local lno, sname, class
l ? {
tab(many('0123456789')) &
=": Symbol on line " &
lno := integer(tab(many('0123456789'))) &
=": '" &
sname := tab(upto("'")) &
="' " &
class := tab(upto(' '))
} & return [lno, sname, class]
end
l
>: U1, D2
sbd-parser.icn
>: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, D27, D28, D29