dcc - Direct C Compiler

A C99-compliant C compiler with additions implementing many extensions and features, as well as arbitrary-precision integer arithmetic.

The main feature that differentiates this compiler from others is its ability to directly read, preprocess, tokenize, parse, assemble and link C source code, all at the same time, in a way that allows you to execute C code in an environment similar to that of an interactive command line. If you are interested in how this is achieved, take a look at /include/drt/drt.h

Currently, DCC is only able to target i386 and above; support for x86-64 is planned and already partially implemented.

Supported output formats are ELF and Windows PE, as well as direct execution of generated code.

DCC supports AT&T inline assembly syntax, emulating GCC's __asm__ statement and the GNU assembler, and can also directly parse assembly source files.
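A minimal sketch of the GCC-style extended __asm__ statement that DCC emulates, adding two integers with an AT&T-syntax addl instruction. It assumes an x86 target (i386 or x86-64) and falls back to plain C elsewhere; asm_add is a hypothetical name used only for illustration, and the same code should also build with GCC or Clang.

```c
/* Add two ints using GCC-style extended inline assembly (AT&T syntax).
 * Assumes an x86 target; other targets use the plain C fallback. */
static int asm_add(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": a is read and written in a register; "r": b is read from a
     * register. AT&T operand order is source, destination, so a += b.
     * "cc" tells the compiler that the flags register is clobbered. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

Because the operands are constrained to registers, the compiler chooses which registers to use and moves a and b in and out around the instruction.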

Using TPP as its fully featured preprocessor, DCC implements many GCC extensions such as __asm__, __builtin_constant_p, many __attribute__-s, __typeof__, __auto_type, and many more, including my own twist on awesome C extensions.
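To illustrate how some of these GCC extensions combine, the sketch below builds a type-generic MAX macro from a statement-expression and __typeof__, so that each argument is evaluated exactly once. MAX, tick and calls are hypothetical names, and the code assumes a compiler that accepts these GNU extensions (DCC, GCC or Clang).

```c
/* Type-generic maximum that evaluates each argument exactly once,
 * combining a GCC statement-expression ({ ... }) with __typeof__. */
#define MAX(a, b) ({              \
        __typeof__(a) _a = (a);   \
        __typeof__(b) _b = (b);   \
        _a > _b ? _a : _b;        \
    })

/* Counts how often tick() runs, to show single evaluation. */
static int calls = 0;

static int tick(void)
{
    return ++calls;
}
```

Unlike the naive ((a) > (b) ? (a) : (b)), MAX(tick(), 0) calls tick() only once, because its value is stored in _a before the comparison.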

Development on DCC started on 17.04.2017, and ever since then it has been the usual one-person project.

Current state:

Note that DCC is still fairly early in its development, meaning that anything can still change and that more features will be added eventually.

  • Link against Windows PE binaries/libraries (*.dll).
  • Statically link against PE binaries (as in: clone everything from a *.dll)
  • Dynamically/Statically link against ELF binaries/libraries/object files (*, *.so, *.o)
  • Output Windows PE binary/library (*.exe, *.dll).
  • Output Linux ELF binary/library (*, *.so).
  • Output ELF relocatable object files (*.o)
  • Process and merge (link) multiple source-/object files/static libraries.
  • Compiling DCC is mainly tested and working on Windows using Visual C or DCC itself. GCC and Linux support is present, but may occasionally be broken.
  • Full STD-C compliance up to C99.
  • Full AT&T assembly support with many GNU assembler extensions (see below).
  • Full ELF binary target support.
  • Fully working live execution of C source code.
  • DCC can fully compile itself (and the result can compile itself again!)

Planned features:

  • Support for X86-64/AMD64 CPU architectures.
  • Compiling DCC on Linux (most of the work's already there, but nothing's tested yet).
  • Compiling DCC with DCC (because every C compiler must be able to do that!).
  • Generation of debug information (recognizable by gdb).
  • Finish many partially implemented features (see below).
  • Support for true thread-local storage (i.e. segment-based)

Features (Compiler):

  • DCC as host compiler can easily be detected with defined(__DCC_VERSION__).
  • Since TPP is used as the preprocessor, every existing preprocessor extension is supported, as well as all extensions exclusive to TPP.
  • Live-compilation-mode directly generates assembly.
  • C-conforming symbol forward/backward declaration.
  • K&R-C compatible
  • Full STD-C89/90 compliance
  • Full STD-C95 compliance
  • Full STD-C99 compliance
  • Supports all C standard types.
  • Supports 64-bit long long integrals (using double-register storage).
  • Supports all C control statements.
  • Supports C11 _Generic.
  • Supports C11 _Atomic (Not fully implemented).
  • Supports C99 _Bool.
  • Supports C99 __func__ builtin identifier.
  • Supports variable declarations in if-expressions and for-initializers.
  • Supports nested function declaration, as well as access to variables from surrounding scopes.
  • Supports C++ lvalue types (int y = 10; int &x = y;).
  • Supports C structure bitfields
  • Support for GCC statement-expressions: int x = ({ int z = 10; z+20; }); // x == 30.
  • Support for __FUNCTION__ and __PRETTY_FUNCTION__, including use by concat with other strings: char *s = "Function " __FUNCTION__ " was called"; printf("%s\n",s);.
  • Support for GCC __sync_* builtin functions (__sync_val_compare_and_swap(&x,10,20)).
  • Supports all compiler spellings of alignof: _Alignof, __alignof, __alignof__ and __builtin_alignof.
  • Support for compile-time type deduction from expressions: typeof, __typeof, __typeof__.
  • Support for GCC scoped labels: __label__.
  • Support for GCC-style inline assembly: __asm__("ret").
  • Support for MSVC fixed-length integer types: __int(8|16|32|64).
  • Support for GCC __auto_type, as well as a special interpretation of auto when it is not used as a storage class (in auto int x = 42;, auto is the storage class; in auto y = 10;, auto denotes automatic type deduction).
  • Support for C99 variable-length arrays: int x = 10; int y[x*2]; assert(sizeof(y) == 80);.
  • Support for old (pre-STDC: K&R-C) function declarations/implementations.
  • Support for new (post-STDC: C90+) function declarations/implementations.
  • Support for floating-point types (the assembly generator is not implemented yet).
  • Support for GCC x86 segment address space (__seg_fs/__seg_gs)
  • Debugging aids: local variables are pre-initialized with 0xCC bytes, and memory allocated using alloca with 0xAC bytes.
  • Inherited from assembly: Named register identifiers.
    • int x = %eax; (CPU-specific, on i386 compiles to mov %eax, x).
    • int x = *(int *)%fs:0x18; (Can also be used to access segment registers; on i386 compiles to movl %fs:(0x18), x).
  • Inherited from assembly: Get current text address.
    • void *p = .; (Evaluates to the current text address with void * typing).
  • Use label names in expressions:
    • void *p = &&my_label; my_label: printf("p = %p\n",p);
  • Support for new & old GCC structure/array initializer:
    • dot-field: struct { int x,y; } p = { .x = 10, .y = 20 };
    • field-colon: struct point { int x,y; } p = { x: 10, y: 20 };
    • array-subscript: int alpha[256] = { ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1 };
  • Support for runtime brace-initializers: struct point p = { .x = get_x(), .y = get_y() };
  • Separate namespaces for struct/union/enum tags, declarations and labels: foo: struct foo foo; // Valid code with 3 different 'foo's
  • Support for unnamed struct/union inlining:
    • union foo { __int32 x; struct { __int16 a,b; }; };
      • offsetof(union foo,x) == 0, offsetof(union foo,a) == 0, offsetof(union foo,b) == 2
  • Support for builtin functions offering special compile-time optimizations, or functionality (Every builtin can be queried with __has_builtin(...)):
    • char const (&__builtin_typestr(type_or_expr t))[];
      • Accepts arguments just like 'sizeof' and returns a human-readable representation of the [expression's] type as a compile-time array of characters allocated in the '.string' section.
    • _Bool __builtin_constant_p(expr x);
    • expr __builtin_choose_expr(constexpr _Bool c, expr tt, expr ff);
    • _Bool __builtin_types_compatible_p(type t1, type t2);
    • void __builtin_unreachable(void) __attribute__((noreturn));
    • void __builtin_trap(void) __attribute__((noreturn));
    • void __builtin_breakpoint(void);
      • Emit a CPU-specific instruction to break into a debugging environment, or do nothing if the target CPU doesn't allow for such an instruction
    • void *__builtin_alloca(size_t s);
    • void *__builtin_alloca_with_align(size_t s, size_t a);
    • void __builtin_assume(expr x),__assume(expr x);
    • long __builtin_expect(long x, long e);
    • const char (&__builtin_FILE(void))[];
    • int __builtin_LINE(void);
    • const char (&__builtin_FUNCTION(void))[];
    • void *__builtin_assume_aligned(void *p, size_t align, ...);
    • size_t __builtin_offsetof(typename T, members...);
    • T (__builtin_bitfield(T expr, constexpr int const_index, constexpr int const_size)) : const_size;
      • Access a given sub-range of bits of any integral expression, the same way access is performed for structure bit-fields.
    • typedef ... __builtin_va_list;
    • void __builtin_va_start(__builtin_va_list &ap, T &start);
    • void __builtin_va_end(__builtin_va_list &ap);
    • void __builtin_va_copy(__builtin_va_list &dstap, __builtin_va_list &srcap);
    • T __builtin_va_arg(__builtin_va_list &ap, typename T);
      • Compiler-provided var-args helpers for generating smallest-possible code
    • int __builtin_setjmp(T &buf);
    • void __builtin_longjmp(T &buf, int sig) __attribute__((noreturn));
      • Requires: sizeof(T) == __SIZEOF_JMP_BUF__
      • Compile-time best-result code generation for register save to 'buf'
      • Optimizations for 'sig' known to never be '0'
    • void *__builtin_malloc(size_t s);
    • void *__builtin_calloc(size_t c, size_t s);
    • void *__builtin_realloc(void *p, size_t c, size_t s);
    • void __builtin_free(void *p);
    • void __builtin_cfree(void *p);
    • void *__builtin_return_address(unsigned int level);
    • void *__builtin_frame_address(unsigned int level);
    • void *__builtin_extract_return_addr(void *p);
    • void *__builtin_frob_return_address(void *p);
    • void *__builtin_isxxx(void *p);
      • ctype-style builtin functions
    • void *__builtin_memchr(void *p, int c, size_t s);
    • void *__builtin_memrchr(void *p, int c, size_t s);
      • Additional functions are available for mem(r)len/mem(r)end/rawmem(r)chr/rawmem(r)len
    • T __builtin_min(T args...);
    • T __builtin_max(T args...);
    • void __builtin_cpu_init(void);
    • int __builtin_cpu_is(char const *cpuname);
    • int __builtin_cpu_supports(char const *feature);
    • char (&__builtin_cpu_vendor(char *buf = __builtin_alloca(sizeof(__builtin_cpu_vendor()))))[?];
    • char (&__builtin_cpu_brand(char *buf = __builtin_alloca(sizeof(__builtin_cpu_brand()))))[?];
      • Returns a target-specific '\0'-terminated string describing the brand/vendor name of the host CPU. The length of the returned string is always constant and known at compile-time.
      • __builtin_cpu_init must be called first, and if the string cannot be determined at runtime, the returned string is filled with all '\0'-characters.
    • uint16_t __builtin_bswap16(uint16_t x);
    • uint32_t __builtin_bswap32(uint32_t x);
    • uint64_t __builtin_bswap64(uint64_t x);
    • int __builtin_ffs(int x);
    • int __builtin_ffsl(long x);
    • int __builtin_ffsll(long long x);
    • int __builtin_clz(int x);
    • int __builtin_clzl(long x);
    • int __builtin_clzll(long long x);
      • Generate inline code with per-case optimizations for best results
    • T __builtin_bswapcc(T x, size_t s = sizeof(T));
    • int __builtin_ffscc(T x, size_t s = sizeof(T));
    • int __builtin_clzcc(T x, size_t s = sizeof(T));
      • General-purpose functions that work for any size
    • void *__builtin_memcpy(void *dst, void const *src, size_t s);
      • Replace with inlined code for sizes known at compile-time
      • Warn about dst/src known to overlap
    • void *__builtin_memmove(void *dst, void const *src, size_t s);
      • Optimize away dst == src cases
      • Hint about dst/src never overlapping
    • void *__builtin_memset(void *dst, int byte, size_t s);
      • Replace with inlined code for sizes known at compile-time
    • int __builtin_memcmp(void const *a, void const *b, size_t s);
      • Replace with a compile-time constant for constant arguments
      • Replace with inline code for sizes known at compile-time
    • size_t __builtin_strlen(char const *s);
      • Resolve length of static strings at compile-time
  • Split between declaration and assembly name (i.e. an __asm__("foo") suffix in declarations)
  • Arbitrary size arithmetic operations (The sky's the limit; as well as your binary size bloated with hundreds of add-instructions for one line of source code).
  • Support for deemon's 'pack' keyword (now called __pack):
    • Can be used to omit parentheses almost everywhere (except in the preprocessor, or when calling macros)
  • Explicit alignment of code, data, or entire sections in-source
  • Support for #pragma comment(lib,"foo") to link against a given library "foo"
  • Support for #pragma pack(...)
  • Supports GCC builtin macros for fixed-length integral constants (__(U)INT(8|16|32|64|MAX)_C(...)).
  • GCC-compatible predefined CPU macros, such as __i386__ or __LP64__.
  • Support for GCC builtin macros, such as __SIZEOF_POINTER__, __SIZE_TYPE__, etc.
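The bit-manipulation builtins listed above (__builtin_bswap32, __builtin_clz, __builtin_ffs) have straightforward portable equivalents. The following sketch spells out what they compute, assuming a 32-bit unsigned int; the ref_* helpers are hypothetical and not part of DCC.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (what __builtin_bswap32 does). */
static uint32_t ref_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Count leading zero bits (what __builtin_clz does for a 32-bit int).
 * __builtin_clz(0) is undefined; this reference version returns 32. */
static int ref_clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* 1-based index of the least significant set bit, or 0 if x == 0
 * (what __builtin_ffs does). */
static int ref_ffs32(uint32_t x)
{
    int i = 1;
    if (x == 0)
        return 0;
    while (!(x & 1u)) {
        x >>= 1;
        ++i;
    }
    return i;
}
```

A compiler is free to replace the builtins with a single instruction where the target has one, which is the main reason to prefer them over loops like these.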

Features (Attributes):

  • Every attribute can be written in one of three ways:
    • GCC attribute syntax (e.g.: __attribute__((noreturn)))
    • C++11 attribute syntax (e.g.: [[noreturn]])
    • MSVC declspec syntax (e.g.: __declspec(noreturn))
  • The name of an attribute (noreturn in the above examples) can be written with any number of leading or trailing underscores to prevent ambiguity with user-defined macros:
    • __attribute__((____noreturn_)) is the same as __attribute__((noreturn))
  • The following attributes (as supported by other compilers) are recognized:
    • __attribute__((noreturn*))
    • __attribute__((warn_unused_result*))
    • __attribute__((weak*))
    • __attribute__((dllexport*))
    • __attribute__((dllimport*))
    • __attribute__((visibility("default")))
    • __attribute__((alias("my_alias")))
    • __attribute__((weakref("my_alias")))
    • __attribute__((used*))
    • __attribute__((unused*))
    • __attribute__((cdecl*))
    • __attribute__((stdcall*))
    • __attribute__((thiscall*))
    • __attribute__((fastcall*))
    • __attribute__((section(".text")))
    • __attribute__((regparm(x)))
    • __attribute__((naked*))
    • __attribute__((deprecated))
    • __attribute__((deprecated(msg)))
    • __attribute__((aligned(x)))
    • __attribute__((packed*))
    • __attribute__((transparent_union*))
    • __attribute__((mode(x))) (Underscores surrounding x are ignored)
    • All attribute names marked with '*' accept an optional suffix that makes the attribute conditional on a compile-time expression (e.g.: __attribute__((noreturn(sizeof(int) == 4))) marks the function as noreturn only if int is 4 bytes wide)
  • Attributes not currently implemented (But planned to be):
    • __attribute__((constructor))
    • __attribute__((constructor(priority)))
    • __attribute__((destructor))
    • __attribute__((destructor(priority)))
    • __attribute__((ms_struct))
    • __attribute__((gcc_struct))
  • Attributes ignored without warning:
    • __attribute__((noinline...))
    • __attribute__((returns_twice...))
    • __attribute__((force_align_arg_pointer...))
    • __attribute__((cold...))
    • __attribute__((hot...))
    • __attribute__((pure...))
    • __attribute__((nothrow...))
    • __attribute__((noclone...))
    • __attribute__((nonnull...))
    • __attribute__((malloc...))
    • __attribute__((leaf...))
    • __attribute__((format_arg...))
    • __attribute__((format...))
    • __attribute__((externally_visible...))
    • __attribute__((alloc_size...))
    • __attribute__((always_inline...))
    • __attribute__((gnu_inline...))
    • __attribute__((artificial...))
  • New attributes added by DCC:
    • __attribute__((lib("foo")))
      • Most effective for PE targets: 'foo' is the name of the DLL file that the associated declaration should be linked against.
      • Using this attribute, one can link against DLL files that don't exist at compile-time, or create artificial dependencies on ELF targets.
    • __attribute__((arithmetic*))
      • Used on struct types of arbitrary size to enable arithmetic operations with said structure. Using this attribute, you could easily create e.g. a 512-bit integer type.
        • Most operators are implemented through inline-code, but some (mul,div,mod,shl,shr,sar) generate calls to external symbols.
      • When this attribute is present, the associated structure type can be modified with 'signed'/'unsigned' to control the sign-behavior.
  • In addition, the following keywords can be used anywhere attributes are allowed.
    • {_}_cdecl: Same as __attribute__((cdecl))
    • {_}_stdcall: Same as __attribute__((stdcall))
    • {_}_fastcall: Same as __attribute__((fastcall))
    • __thiscall: Same as __attribute__((thiscall))
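A short sketch of the attribute spellings described above, restricted to forms that GCC and Clang also accept (the plain name and the __name__ form). The arbitrary underscore counts such as ____packed_ are assumed to be DCC-specific and are left out; the struct names are hypothetical.

```c
#include <stddef.h>

/* packed: no padding is inserted between members. */
struct __attribute__((packed)) packed_a {
    char tag;
    int  value;   /* starts right after tag, at offset 1 */
};

/* The same attribute written with surrounding underscores, which
 * protects it from a user-defined macro named "packed". */
struct __attribute__((__packed__)) packed_b {
    char tag;
    int  value;
};

/* aligned(x) raises the alignment requirement of the type. */
struct __attribute__((aligned(16))) vec4 {
    float v[4];
};
```

Packing trades alignment for size, so on targets that penalize unaligned loads, accessing value in packed_a may be slower than in an unpacked struct.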

Features (Warnings):

  • DCC features an enormous number of warnings covering everything from code quality, to value truncation, to syntax errors, to unresolved references during linkage, etc...
  • Any warning can be configured as
    • Disabled: Compilation continues, but depending on severity, the generated assembly/binary may be wrong
    • Enabled: Emit a warning, but continue compilation as if it was disabled
    • Error: Emit an error message and halt compilation at the next convenient location
    • Suppress: Works recursively: the warning is handled as Disabled for as many occurrences as it was suppressed, after which it reverts to its previous state.
  • Warnings are sorted into named groups that can be disabled as a whole. The main group of a warning is always displayed when it is emitted (e.g.: W1401("-WSyntax"): Expected ']', but got ...)
  • The global warning state can be pushed/popped from usercode:
    • Push:
      • #pragma warning(push)
      • #pragma GCC diagnostic push
    • Pop:
      • #pragma warning(pop)
      • #pragma GCC diagnostic pop
  • Individual warnings/warning group states can be explicitly defined from usercode:
    • Disabled:
      • #pragma warning("[-][W]no-<name>")
      • #pragma warning(disable: <IDS>)
      • #pragma warning(disable: "[-][W]<name>")
      • #pragma GCC diagnostic ignored "[-][W]<name>"
    • Enabled:
      • #pragma warning(enable: <IDS>)
      • #pragma warning(enable: "[-][W]<name>")
      • #pragma GCC diagnostic warning "[-][W]<name>"
    • Error:
      • #pragma warning(error: <IDS>)
      • #pragma warning(error: "[-][W]<name>")
      • #pragma GCC diagnostic error "[-][W]<name>"
    • Suppress (once for every time a warning/group is listed):
      • #pragma warning(suppress: <IDS>)
      • #pragma warning(suppress: "[-][W]<name>")
      • #pragma warning("[-][W]sup-<name>")
      • #pragma warning("[-][W]suppress-<name>")
    • Revert to default state:
      • #pragma warning(default: <IDS>)
      • #pragma warning(default: "[-][W]<name>")
      • #pragma warning("[-][W]def-<name>")
    • IDS is a space-separated list of individual warning IDs, given as integral constants
      • Besides belonging to any number of groups, each warning also has an ID
      • Avoid using these IDs, as they may change at any time
    • Similar to the extension pragma, #pragma warning(...) accepts a comma-separated list of commands.
      • #pragma warning(push,disable: "-Wsyntax")
  • All warnings can be enabled/disabled on-the-fly using pragmas:
    • #pragma warning(push|pop) Push/pop the current warning state
    • #pragma warning("-W<name>") Enable warning 'name'
    • #pragma warning("-Wno-<name>") Disable warning 'name'
  • #pragma GCC system_header treats the current input file as though all warnings were disabled
    • Mainly meant for headers in /fixinclude which may re-define type declarations, but are not meant to cause any problems
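The GCC-style push/ignore/pop sequence listed above can silence a warning for a single definition and then restore the previous state. This sketch uses GCC's -Wunused-variable name; DCC's own warning and group names may differ, and compute is a hypothetical function.

```c
/* Save the current warning state, then ignore one warning. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"

static int compute(void)
{
    int scratch = 0;   /* unused: would normally trigger -Wunused-variable */
    return 42;
}

/* Restore the warning state saved by the matching push. */
#pragma GCC diagnostic pop
```

Pairing every push with a pop keeps the change local, so code after the pop is checked with the original settings.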

Features (Extensions):

  • Extensions are implemented in two different ways:
    • Extensions that are always enabled, but emit a warning when used.
      • The warning can either be disabled individually (e.g.: #pragma warning("-Wno-declaration-in-if")).
      • Or all extension warnings can be disabled using #pragma warning("-Wno-extensions").
      • Don't let yourself be fooled: writing "-Wno-extensions" disables warnings about extensions, not the extensions themselves!
      • Some warnings are also emitted for deprecated or newer language features.
      • "constant-case-expressions": Emitted for old-style function declarations.
      • "old-function-decl": Emitted for old-style function declarations.
    • Extensions that may change semantics and can therefore be disabled.
      • All of these extensions can be enabled/disabled on-the-fly using pragmas:
        • As a comma-separated list in #pragma extension(...)
          • push: Push currently enabled extensions (e.g.: #pragma extension(push))
          • pop: Pop previously enabled extensions (e.g.: #pragma extension(pop))
          • "[-][f]<name>": Enable extension name (e.g.: #pragma extension("-fmacro-recursion"))
          • "[-][f]no-<name>": Disable extension name (e.g.: #pragma extension("-fno-macro-recursion"))
      • "expression-statements": Recognize GCC statement-expressions.
      • "label-expressions": Allow use of labels in expression (prefixed by &&).
      • "local-labels": Allow labels to be scoped (using GCC's __label__ syntax).
      • "gcc-attributes": Recognize GCC __attribute__((...)) syntax.
      • "msvc-attributes": Recognize MSVC __declspec(...) syntax.
      • "cxx-11-attributes": Recognize c++11 [[...]] syntax.
      • "attribute-conditions": Allow optional conditional expression to follow a switch-attribute.
      • "calling-convention-attributes": Recognize MSVC stand-alone calling convention attributes (e.g.: __cdecl).
      • "fixed-length-integer-types": Recognize fixed-length integer types (__int(8|16|32|64)).
      • "asm-registers-in-expressions": Allow assembly registers to be used in expressions (e.g.: int x = %eax;).
      • "asm-address-in-expressions": Allow the current text address to be used in expressions (e.g.: void *p = .;).
      • "void-arithmetic": sizeof(void) == (__has_extension("void-arithmetic") ? 1 : 0).
      • "struct-compatible": When enabled, same-layout structures are compatible, when disabled, only same-declaration structs are.
      • "auto-in-type-expressions": Allow auto to be used either as a storage class, or as an alias for __auto_type.
      • "variable-length-arrays": Allow declaration of C99 VLA variables.
      • "function-string-literals": Treat __FUNCTION__ and __PRETTY_FUNCTION__ as language-level string literals.
      • "if-else-optional-true": Recognize GCC's conditional operator with an omitted middle operand: int x = (p ?: other_p)->x; // Same as '(p ? p : other_p)->x'.
      • "fixed-length-integrals": Recognize MSVC fixed-length integer suffix: __int32 x = 42i32;.
      • "macro-recursion": Enable/Disable TPP recursive macro declaration.
      • Many more extensions are provided by TPP to control preprocessor syntax, such as #include_next directives. Their list is too long to be documented here.
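Two of the semantics-changing extensions above, "label-expressions" and "if-else-optional-true", appear below in a tiny interpreter with computed-goto dispatch and a ?: default helper. This is a sketch for GCC-compatible compilers; the opcode numbering and the run and or_default names are hypothetical.

```c
#include <stddef.h>

/* Tiny bytecode interpreter using GCC "labels as values" (&&label) and
 * computed goto (goto *ptr). Opcodes: 0 = halt, 1 = increment, 2 = double. */
static int run(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
    int acc = 0;

    goto *dispatch[*code];
op_inc:
    ++acc;
    goto *dispatch[*++code];
op_dbl:
    acc *= 2;
    goto *dispatch[*++code];
op_halt:
    return acc;
}

/* GCC "?:" with an omitted middle operand: yields name if it is non-NULL,
 * otherwise fallback, evaluating name only once. */
static const char *or_default(const char *name, const char *fallback)
{
    return name ?: fallback;
}
```

For example, the program {1, 1, 2, 0} increments twice, doubles, then halts, returning 4.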

Features (Optimization):

  • Dead code elimination
    • Correct deduction when merging branches, such as an if-statement with two dead branches
    • Re-enable control flow when encountering a label
    • Correct interpretation of __builtin_unreachable()
    • Correct interpretation of __{builtin_}assume(0)
  • Automatic constant propagation
    • Even capable of handling generic offsetof: (size_t)&((struct foo *)0)->bar
  • Automatic removal of unused symbols/data
    • Recursively delete unused functions/data symbols from generated binary
    • Can be suppressed for any symbol using __attribute__((used))
  • Automatic merging of data in sections marked with M (merge) (not fully implemented due to a missing re-use counter; the rest already works)
    • Using the same string (or sub-string) more than once will only allocate a single data segment:
      • printf("foobar\n"); printf("bar\n"); re-uses "bar\n\0" as a sub-string of "foobar\n\0"
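The generic offsetof expression mentioned above is a classic case for constant propagation: the member address is computed relative to a null pointer and folds to a constant. A minimal sketch with hypothetical names; note that this idiom is formally undefined in ISO C, so portable code should use offsetof from <stddef.h> instead.

```c
#include <stddef.h>

struct foo {
    char tag;
    int  bar;
};

/* The classic "generic offsetof" idiom: the address of a member of a
 * struct placed at address 0 equals the member's offset. A compiler with
 * constant propagation folds this to a compile-time constant. */
#define GENERIC_OFFSETOF(type, member) ((size_t)&((type *)0)->member)

static size_t bar_offset(void)
{
    return GENERIC_OFFSETOF(struct foo, bar);
}
```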

Features (Assembler):

  • Full AT&T Assembly support
  • Extension for fixed-length
  • Supported assembly directives are:
    • .align <N> [, <FILL>]
    • .skip <N> [, <FILL>]
    • .space <N> [, <FILL>]
    • .quad <I>
    • .short <I>
    • .byte <I>
    • .word <I>
    • .hword <I>
    • .octa <I>
    • .long <I>
    • .int <I>
    • .fill <REPEAT> [, <SIZE> [, <FILL>]]
    • . = <ORG>
    • .org <ORG>
    • .extern <SYM>
    • .global <SYM>
    • .globl <SYM>
    • .protected <SYM>
    • .hidden <SYM>
    • .internal <SYM>
    • .weak <SYM>
    • .local <SYM>
    • .used <SYM>
    • .unused <SYM>
    • .size <SYM>, <SIZE>
    • .string <STR>
    • .ascii <STR>
    • .asciz <STR>
    • .text
    • .data
    • .bss
    • .section
    • .previous
    • .set <SYM>, <VAL>
    • .include <NAME>
    • .incbin <NAME> [, <SKIP> [, <MAX>]]
  • CPU-specific, recognized directives:
    • I386+
      • .code16
      • .code32
    • X86-64
      • .code64
  • Directives ignored without warning:
    • .file ...
    • .ident ...
    • .type ...
    • .lflags ...
    • .line ...
    • .ln ...
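Several of the listed directives (.section, .globl, .asciz, .previous) can be combined in a file-scope __asm__ block to define data that C code then references. This sketch assumes an ELF target where C symbols have no leading underscore (it will not assemble as-is for Mach-O); dcc_banner and banner_length are hypothetical names.

```c
#include <string.h>

/* File-scope assembly: switch to a read-only data section, export a
 * symbol, emit a NUL-terminated string, then restore the previous section. */
__asm__(
    ".section .rodata\n"
    ".globl dcc_banner\n"
    "dcc_banner:\n"
    "    .asciz \"Direct C Compiler\"\n"
    ".previous\n"
);

/* Declare the assembly-defined symbol so that C code can use it. */
extern const char dcc_banner[];

static size_t banner_length(void)
{
    return strlen(dcc_banner);
}
```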

Features (Linker):

  • Integrated linker allows for direct (and very fast) creation of executables
  • Merge multiple source files into a single compilation unit
  • ELF-style visibility control/attributes (__attribute__((visibility(...))))
  • Directly link against already-generated PE binaries
  • Add new library dependencies from source code (#pragma comment(lib,...))
  • Output to PE binary (*.exe/*.dll)
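The ELF-style symbol attributes handled by the linker can be sketched as follows: a weak default definition that a strong definition in another object file would override, and a hidden-visibility helper that is not exported from a shared library. This assumes an ELF target; log_level and internal_helper are hypothetical names.

```c
/* Weak default definition: if another object file linked into the same
 * program defines log_level() without the weak attribute, the linker
 * picks that strong definition instead. */
__attribute__((weak)) int log_level(void)
{
    return 1;  /* default when no override is linked in */
}

/* Hidden visibility: callable from anywhere inside the same binary or
 * shared library, but not exported in its dynamic symbol table. */
__attribute__((visibility("hidden"))) int internal_helper(int x)
{
    return x * 2;
}
```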