Ported by LouiSe
more info and other AMIGA ports at:
http://louise.amiga.hu
-----------------------------------------
ncc 0.5
WHATIS
======
Basically, ncc is a tool for hackers designed to provide program analysis
data of C source code, that is, program flow and usage of variables.
Some big programs out there are by default obfuscated, whether due to extreme
size, programming style, hacks upon hacks or other craziness. In order to
do program analysis correctly, there has to be compilation of expressions,
and thus ncc is really a compiler (supporting zero architectures).
At the same time, ncc is small and easy to understand so you can hack it and
add custom features and extensions at any stage of the compilation, to match
what you expect and consider useful as output. Most common GNU extensions
are supported and there has been an effort to be practically useful in the
GNU system (which is not easy because the GNU system is very gcc-friendly).
The goal is to be able to use 'ncc' in place of gcc in Makefiles and work
with the big open source projects.
INSTALL
=======
Run 'make'
and copy the file doc/nognu to /usr/include.
This file is used to fix some madness of libc header files and remove some
GNU extensions which violate the C grammar and can be removed without
problems. If you don't want to copy it to /usr/include, edit config.h and
recompile.
USAGE
=====
ncc uses gcc for preprocessing because the standard library headers
eventually include other architecture-specific headers, which live in
places that only gcc knows about. Any options starting with -D and -I will
be passed to gcc for preprocessing. Generally, because ncc should be able to
stand in for gcc in makefiles, options that do not start with '-nc' never
produce an error (and may even be passed to gcc in a special mode).
Files compiled with ncc will have the __GCC__ macro defined, because many
programs are written for gcc and take some gcc extensions for granted.
ncc additionally defines the __NCC__ macro.
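A minimal sketch of how a source file might tell the two tools apart,
assuming only what is stated above (ncc defines __NCC__). The function name
seen_by_ncc is made up for this illustration:

```c
/* Hypothetical helper, not part of ncc: reports which tool is
   processing this translation unit. The README states that ncc
   defines __NCC__; a real compiler such as gcc does not. */
int seen_by_ncc(void)
{
#ifdef __NCC__
    return 1;   /* analysis pass: ncc is reading this file */
#else
    return 0;   /* ordinary build with a real compiler */
#endif
}
```

The same #ifdef can wrap a construct that ncc cannot digest, leaving a plain
C fallback for the analysis pass.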
The default output (on stdout) is a report of function calls, uses of global
variables and uses of members of structures.
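As a rough illustration of the three kinds of events in that report, the
hypothetical file below marks each of them in a comment. The names point,
counter, scale and move_point are invented for the example, and the exact
text ncc prints is not shown here:

```c
struct point { int x, y; };

int counter;                    /* a global variable */

static int scale(int v)
{
    return v * 2;
}

int move_point(struct point *p)
{
    counter++;                  /* use of a global variable */
    p->x = scale(p->x);         /* function call, plus use of member x */
    return p->x + p->y;         /* use of members x and y */
}
```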
With "-ncmv", each use of a global variable or member of a structure is
reported every time it occurs. This is a way to better understand how the
code works, by looking at the use of variables between function calls.
With "-nc2dm", the output is suitable for the 2dmap viewer and includes only
the function calls.
With "-nchelp", help is displayed on the -nc options.
With "-ncoo", the output goes to a file sourcefile.c.nccout.
HACKING mpg123
==============
This one is easy because it's done "the right way". (Programs are
exponential: the number of tasks a program can do is N^2 if N is the number
of lines of code. Thus any program of more than 50000 lines probably has
design flaws, unless it's device drivers.)
Anyway, to view the calls of mpg123, the command is:
for a in *.c; do ncc $a -ncoo -nc2dm; done
cat *.nccout > code.map
2dmap
HACKING LARGER PROGRAMS
=======================
The obvious way is to use make with ncc, so that the required -D and -I
options are passed and only the right files are compiled (if there
are dependencies). Normally, changing "CC=gcc" to "CC=ncc -ncoo" would be
enough. But alas! Often it isn't.
So now you have to devise ways to hack the Makefiles or think of other
tricks to get the job done.
Sometimes the make procedure expects object files, which ncc does not
produce, and it may fail. Other programs even compile and run helpers during
the make procedure. If all else fails, the last resort that always works is
using the "-ncgcc" option.
With "-ncgcc", ncc will also run gcc in parallel with all its options except
the -nc ones. So nobody will notice that ncc was even run and the
makefiles will be happy. It takes 1000% more time, but computers do get
faster every day. In this case, it is generally a good idea to remove any
'-O2 -g' options.
BYTECODE
========
With "-nccc", the output is some bytecode for the expressions.
In this mode ncc does full syntax and semantics tests, unlike the other
modes, which had better be used only with sources known to be correct.
The output is definitely incomplete and of little use, but it's fun to look at.
A tip is that variables are taken one level down: the '&' operator
disappears and an extra '*' goes in front of variables.
The enlightening example is:
---------------------------------
int **pp, *pa [10], a [10][10];
pp[1][1];
pa[1][1];
a[1][1];
---------------------------------
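For reference, here is what the three subscript expressions mean in plain C.
This is only a sketch to compare against the bytecode, not ncc output, and
the function name subscripts_agree is made up:

```c
/* Hypothetical check: all three objects are wired to the same
   storage, so the three subscript expressions read the same int. */
int subscripts_agree(void)
{
    int a[10][10] = {{0}};
    int *pa[10];
    int **pp;
    int i;

    for (i = 0; i < 10; i++)
        pa[i] = a[i];           /* each pa[i] points at row i of a */
    pp = pa;                    /* pp points at the first element of pa */
    a[1][1] = 42;

    /* x[1][1] is defined as *(*(x + 1) + 1) in every case, but:
       pp: load pp, load pp[1], then load the int
       pa: pa is an array, so only pa[1] and the int are loaded
       a:  pure address arithmetic, only the int is loaded */
    return pp[1][1] == *(*(pp + 1) + 1)
        && pa[1][1] == *(*(pa + 1) + 1)
        && a[1][1]  == *(*(a + 1) + 1)
        && pp[1][1] == 42
        && pa[1][1] == 42;
}
```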
BTW, since in C: &a == a == &a[0] (as addresses)
for an array 'a', ncc supports &(&(&(&a))) == a,
which is mathematically and logically correct, as the address-of operator
may have no effect and still be valid (for pointer operands that are not
lvalues).
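In standard C the three expressions compare equal as addresses but have
different types, which a small sketch can show. The function names are
invented for this example. Note that a real C compiler rejects &(&a) because
&a is not an lvalue, so the nested form is specific to ncc:

```c
#include <stddef.h>

/* Returns 1 if &a, a and &a[0] all refer to the same address. */
int same_address(void)
{
    int a[10];
    const void *whole   = &a;       /* type int (*)[10] */
    const void *decayed = a;        /* type int *, via array decay */
    const void *first   = &a[0];    /* type int * */
    return whole == decayed && decayed == first;
}

/* The types differ: stepping &a skips the whole array... */
size_t step_of_whole(void)
{
    int a[10];
    return (size_t)((char *)(&a + 1) - (char *)&a);    /* sizeof a */
}

/* ...while stepping a skips a single element. */
size_t step_of_element(void)
{
    int a[10];
    return (size_t)((char *)(a + 1) - (char *)a);      /* sizeof (int) */
}
```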
TROUBLESHOOTING
===============
As this is the first release of ncc, braindead bugs are probably still in here.
However, thanks to open source, there are infinite test cases.
ncc has been tested with:
linux kernel (partial according to depend), ImageMagick, gcc,
xanim, mpg123, bladeenc, bzip2, gtk, gnu-fileutils,
less, mpeg_play, nasm, ncftp, vim, sox, bind, gdb
However, all of these programs are correct, so ncc lacks testing on finding
errors in incorrect programs.
Also read the file doc/TROUBLES
TODO
====
- GNU statements in expressions are parsed, but the result type is not saved
and is assumed to be int. That of course is wrong, but since it caused no
problems during testing, it stays for the moment. Will be fixed.
- The bytecode can be: optimized, turned into architecture assembly,
run with an interpreter, etc. Bytecode for the statements will be added
if there is interest in any of the above.
- It is easy to implement parsing structures lazily, when lookup for a
member is done for the first time. That will save both space and time, as
most structures declared in header files are not used. But there is no
reason to get paranoid with optimization. The major slowdown factor is
having to use -ncgcc after all.
- Maybe get into C++.
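To make the first item above concrete: the value of a GNU statement
expression has the type of its last expression, which need not be int. The
sketch below relies on that gcc extension (it is not standard C), and the
function name half is invented for the example:

```c
/* GNU extension: ({ ... }) yields the value of its last
   expression statement, here a double rather than an int.
   A tool that assumes int for the result would lose the
   fractional part when reasoning about this expression. */
double half(double x)
{
    return ({ double h = x / 2.0; h; });
}
```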
THEREST
=======
Program written by Stelios Xanthakis.
e-mail: sxanth@ceid.upatras.gr
ncc latest download: http://students.ceid.upatras.gr/~sxanth/ncc/
Check out: http://students.ceid.upatras.gr/~sxanth/PP/
for the solution to symmetrical cryptography.