GitHub - fkie-cad/Codescanner

Codescanner (with Python bindings)

Codescanner detects machine code in files and identifies its CPU architecture, endianness, and bitness. It can also be run against data files (PDFs, JPEGs, unknown binary files).

Version: 1.3.1 Last changed: 02. Nov 2024

What this contains

This repository contains the Python 2/3 analysis framework and the Codescanner core, both as a standalone binary and as a library with C/C++ headers. The C_lib directory contains the C/C++ backend and the C headers.

Author and copyright information

Please read the included LICENSE. This program is free for academic use and research.
If you want to use it in a commercial project, please contact us by email.

Authors and maintainers:

Viviane Zwanger (Codescanner core, old Python bindings) (viviane.zwanger@fkie.fraunhofer.de)
Henning Braun (Maintainer of the modern Python bindings) (henning.braun@fkie.fraunhofer.de)

Requirements

Python 2 >= 2.7 or Python 3 (Warning: Python 2 reached end of life in January 2020; Python 3 is recommended.)
matplotlib
numpy

Installation

sudo pip install .

The installation also works without sudo, in which case the package is installed for the current user only.

Uninstallation

sudo pip uninstall codescanner_analysis

Omit sudo if codescanner_analysis was installed without it, i.e., for the current user only.

Usage

General

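from codescanner_analysis import CodescannerAnalysisData as CAD

filenamepath = "/path/to/file"           # file to analyze
# CAD(filenamepath[, start_offset[, end_offset]]); the offsets are optional
cad = CAD(filenamepath)                  # analyze the whole file
cad = CAD(filenamepath, 0x100, 0x2000)   # analyze only offsets 0x100 to 0x2000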
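Offsets are often passed in from the command line as hex strings. The following is a minimal sketch of such a wrapper; `parse_args` and `cad_args` are hypothetical helpers, not part of codescanner_analysis. It assumes, as the optional-argument notation above suggests, that CAD also accepts a start offset without an end offset.

```python
import argparse


def parse_args(argv=None):
    """Parse a file path plus optional start/end offsets (hex or decimal)."""
    parser = argparse.ArgumentParser(description="Run Codescanner on a file or byte range.")
    parser.add_argument("path", help="file to analyze")
    # int(s, 0) accepts "0x100" as well as "256"
    parser.add_argument("start", nargs="?", type=lambda s: int(s, 0), default=None)
    parser.add_argument("end", nargs="?", type=lambda s: int(s, 0), default=None)
    args = parser.parse_args(argv)
    return args.path, args.start, args.end


def cad_args(path, start=None, end=None):
    """Build the positional arguments for CAD, passing offsets only if given."""
    args = [path]
    if start is not None:
        args.append(start)
        if end is not None:
            args.append(end)
    return tuple(args)
```

Usage: `cad = CAD(*cad_args(*parse_args()))`, e.g. invoked as `script.py sample.bin 0x100 0x2000`.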

Print regions (if any)

cad.regions.get("Code")
cad.regions.get("Ascii")
cad.regions.get("Data")
cad.regions.get("HighEntropy")
# Fall back to an empty list so the loop also works if no code regions exist
for coderegion in cad.regions.get("Code") or []:
    print("Coderegion: 0x%x - 0x%x (%s)" % (coderegion[0], coderegion[1], coderegion[2]))
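To find out which kind of region a given offset belongs to, the region lists can be searched directly. This is a sketch under assumptions: `region_type_at` is a hypothetical helper, cad.regions is assumed to be a dict mapping region names to lists, each region's first two entries are taken to be its start and end offsets (as in the loop above), and the end offset is treated as exclusive, which the documentation does not confirm.

```python
def region_type_at(regions, offset):
    """Return (region_type, region) for the first region containing offset, or None.

    Assumes regions maps names to lists of sequences whose first two entries
    are start and end offsets; the end offset is treated as exclusive.
    """
    for region_type, entries in regions.items():
        for region in entries or []:   # a region type may map to None or an empty list
            if region[0] <= offset < region[1]:
                return region_type, region
    return None
```

Usage: `region_type_at(cad.regions, 0x1234)`.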

Print sizes of regions (if any)

cad.sizes.get("Code")
cad.sizes.get("Ascii")
cad.sizes.get("Data")
cad.sizes.get("HighEntropy")
cad.sizes.get("FileSize")

for s in cad.sizes: 
    print("%s : %i" % (s, cad.sizes[s]))
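The sizes can be turned into percentages of the file, e.g. to see at a glance how much of a file is code. A minimal sketch; `region_shares` is a hypothetical helper and assumes that the sizes dict maps region names to byte counts and contains "FileSize", like cad.sizes above.

```python
def region_shares(sizes):
    """Return each region size as a percentage of "FileSize".

    Assumes sizes maps region names to byte counts and includes "FileSize".
    """
    total = sizes.get("FileSize") or 0
    if total == 0:
        return {}
    return {name: 100.0 * size / total
            for name, size in sizes.items()
            if name != "FileSize" and size is not None}
```

Usage: `region_shares(cad.sizes).get("Code")` gives the code share in percent.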

Get the CPU architecture dictionary (empty if no code was found)

cad.architecture
cad.architecture.get("Full")      # Full Codescanner CPU architecture string
cad.architecture.get("ISA")       # ISA only (e.g., Intel, Arm, etc)
cad.architecture.get("Bitness")   # If relevant.
cad.architecture.get("Endianess") # If relevant.
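Because the dictionary is empty when no code was found, and bitness and endianness are only present where relevant, a report should handle missing keys. A minimal sketch; `describe_architecture` is a hypothetical helper that uses only the keys listed above, and the exact value formats are assumptions.

```python
def describe_architecture(arch):
    """Summarize the architecture dictionary in one line."""
    if not arch:                      # empty dictionary: no code was found
        return "no code found"
    parts = [arch.get("ISA") or arch.get("Full") or "unknown ISA"]
    if arch.get("Bitness"):
        parts.append("bitness %s" % arch["Bitness"])
    if arch.get("Endianess"):         # key spelled as in the example above
        parts.append("endianness %s" % arch["Endianess"])
    return ", ".join(parts)
```

Usage: `print(describe_architecture(cad.architecture))`.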

Plot an image to file

There are two types of plots: byte plots, which plot each individual byte (cad.BYTE_PLOT, i.e., 1), and color maps (cad.COLOR_MAP, i.e., 2). Byte plots are generally preferable. For large files, however, color maps become increasingly useful, because matplotlib limits how many points (bytes) can be plotted on a single canvas. A typical Codescanner plot of a benign executable is shown below.

dpi = 100  # recommended: dpi=75, 100, 150.
plot_type = cad.BYTE_PLOT  # (1) or cad.COLOR_MAP (2) 
cad.plot_to_file('img/file/name', dpi, plot_type)
cad.plot_to_file('/tmp/a.png', dpi)

# Dynamic-size plots are possible with:

width = 1600
height = 1000
cad.plot_to_dynamic_size_file('/tmp/a.png', dpi, width, height, plot_type)
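To plot many files in one go, the two calls shown above (constructing CAD with a path and calling plot_to_file with a file name and dpi) can be wrapped in a loop. A minimal sketch; `plot_files` is a hypothetical helper, the analyzer is passed in as a parameter (e.g. the CAD class) so it can be swapped out, and out_dir is assumed to already exist.

```python
import os


def plot_files(paths, out_dir, analyze, dpi=100):
    """Analyze each file and write one plot per file into out_dir.

    analyze is a callable taking a path and returning an object with a
    plot_to_file(name, dpi) method, e.g. the CAD class imported above.
    out_dir must already exist.
    """
    written = []
    for path in paths:
        out = os.path.join(out_dir, os.path.basename(path) + ".png")
        analyze(path).plot_to_file(out, dpi)
        written.append(out)
    return written
```

Usage: `plot_files(["/tmp/a.bin", "/tmp/b.pdf"], "/tmp/plots", CAD, dpi=75)`.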

Plot an image to buffer

plot_type = cad.BYTE_PLOT  #  (1) or cad.COLOR_MAP (2) 
buffer = cad.plot_to_buffer(dpi, plot_type)
buffer = cad.plot_to_buffer(100)

The buffer can then be used elsewhere; for example, it can be base64-encoded and embedded as an image in an HTML page.
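A minimal sketch of the base64 embedding, assuming the buffer holds PNG image data and is either raw bytes or a file-like object with getvalue() (such as io.BytesIO); the exact buffer type is not documented here, and `buffer_to_img_tag` is a hypothetical helper.

```python
import base64


def buffer_to_img_tag(buffer, mime="image/png"):
    """Embed a plot buffer in an HTML <img> tag as a base64 data URI."""
    # Accept raw bytes or a file-like object such as io.BytesIO
    data = buffer.getvalue() if hasattr(buffer, "getvalue") else bytes(buffer)
    encoded = base64.b64encode(data).decode("ascii")
    return '<img src="data:%s;base64,%s"/>' % (mime, encoded)
```

Usage: `html = "<html><body>%s</body></html>" % buffer_to_img_tag(cad.plot_to_buffer(100))`.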

Use of a COLOR_MAP plot

The color map plot is useful if the input file is very large and would exceed the plotting capabilities of matplotlib or the user's RAM.
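One way to act on this is to pick the plot type from the file size. A minimal sketch; the 10 MiB cut-off is a hypothetical value, not a documented limit, and `choose_plot_type` is a hypothetical helper that only reuses the plot type values 1 and 2 given above.

```python
BYTE_PLOT = 1   # same values as cad.BYTE_PLOT / cad.COLOR_MAP above
COLOR_MAP = 2

# Hypothetical cut-off; tune it to your matplotlib setup and available RAM.
MAX_BYTE_PLOT_SIZE = 10 * 1024 * 1024


def choose_plot_type(file_size, max_byte_plot_size=MAX_BYTE_PLOT_SIZE):
    """Use a byte plot for small inputs and a color map for large ones."""
    return BYTE_PLOT if file_size <= max_byte_plot_size else COLOR_MAP
```

Usage: `cad.plot_to_file('/tmp/a.png', 100, choose_plot_type(cad.sizes.get("FileSize")))`.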

Standalone usage of ColorMap and ImagePlot

The ColorMap and BytePlot classes may be used independently.

Using the extended comparative analysis (COMA)

You can cross-check the code regions found by Codescanner by visually comparing them with the regions flagged as executable in the ELF/PE header. (By default, the header is parsed with Headerparser, with objdump as a fallback.) This helps reveal whether a binary has an unusual layout. Examples of potential interest are packed files or droppers, ROM files, and other manipulations.

from codescanner_analysis.comparison_analysis import ComparisonAnalysis as COMA

filepathname = "/path/to/binary"     # ELF or PE file to analyze
coma = COMA(filepathname)

# This will (try to) overlay the code regions from the header with the code regions found by Codescanner.
coma.plot_to_file("/tmp/coma.png", dpi=100)  # common dpi values: 75 or 100

# Check whether the code regions from the header lie inside the file (e.g., not true for ROM files or memory dumps).
print(coma.are_code_regions_in_file())

# Code regions found by Codescanner ("Alien" regions are code regions found only by Codescanner, not by parsing the header.)
for r in coma.cs_regions:
    print("%s : %s" % (r, coma.cs_regions[r]))

# Code regions as found by parsing the header.
for r in coma.x_regions:
    print("%s : %s" % (r, coma.x_regions[r]))
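The idea behind "Alien" regions, i.e., code that Codescanner finds but the header does not declare, can be illustrated as an interval difference. This is a sketch of the concept only, not how COMA computes its regions: `uncovered_parts` is a hypothetical helper, both inputs are assumed to be plain lists of (start, end) pairs with exclusive end offsets, and the entries of coma.cs_regions and coma.x_regions would first have to be converted into that form.

```python
def uncovered_parts(cs_regions, header_regions):
    """Return the parts of Codescanner regions not covered by any header region.

    Both arguments are lists of (start, end) pairs; end is treated as exclusive.
    """
    result = []
    for start, end in cs_regions:
        pieces = [(start, end)]
        for h_start, h_end in header_regions:
            next_pieces = []
            for p_start, p_end in pieces:
                # keep whatever lies left and right of the header region
                if h_start > p_start:
                    next_pieces.append((p_start, min(p_end, h_start)))
                if h_end < p_end:
                    next_pieces.append((max(p_start, h_end), p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result
```

A non-empty result means some detected code lies outside every executable region declared in the header, which is the kind of layout the comparison is meant to surface.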

Plotting pafish.exe in cad and coma

pafish.exe (normal 'cad' plot)

pafish.exe ('coma' plot)

The code region declared in the PE header matches the code region found by Codescanner exactly (red overlay), which is the normal, expected result. (For malware or droppers, this can look different.)
