Documentation for scanoss-py

Introduction

In order to complete a Software Composition Analysis of your project, you will need to scan the fingerprints of the source code against a knowledge base (for example, the Open Source Software Knowledge Base).

Note that we mention fingerprints, not the source code itself. Protecting the privacy of your information is the most important rule we follow, and it is what sets us apart from our competitors. To achieve this, the SCANOSS Platform calculates file and snippet fingerprints (32-bit identifiers calculated with the winnowing algorithm).

The fingerprints of each file or snippet are then sent to the SCANOSS API. This means you are scanning against the knowledge base, and not the other way around.
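To make the idea of a fingerprint more concrete, here is a minimal sketch of a simplified winnowing scheme. It is not the SCANOSS implementation: the normalisation step, the gram and window sizes, the function name, and the use of CRC32 as the 32-bit hash are all assumptions chosen for illustration.

```python
import zlib


def winnow(text: str, gram: int = 5, window: int = 4) -> list[int]:
    """Return selected 32-bit hashes for ``text`` (illustrative only)."""
    # Normalise: keep alphanumeric characters only, lower-cased (assumed rule)
    data = "".join(ch.lower() for ch in text if ch.isalnum()).encode()
    # Hash every overlapping gram of the normalised text with CRC32
    hashes = [zlib.crc32(data[i:i + gram]) for i in range(len(data) - gram + 1)]
    selected = []
    last = None
    # Slide a window over the hashes and keep the minimum of each window,
    # recording it only when it differs from the previously recorded value
    for i in range(len(hashes) - window + 1):
        smallest = min(hashes[i:i + window])
        if smallest != last:
            selected.append(smallest)
            last = smallest
    return selected


print(winnow("int main() { return 0; }"))
```

Because only the selected hashes leave the machine, the source text itself cannot be read back from what is sent.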

One way to query the SCANOSS Platform is through our Python package: scanoss-py.

Note

All SCANOSS software is open source and free to use; explore our GitHub Organization page. You can also contribute to this tool. For more information, check the contribution guidelines for this project.

Features

  • The package can be run from the command line, or consumed from another Python script

  • Scan your source code fingerprints against a knowledge base

  • Dependency detection

  • Decoration services for cryptographic algorithms, vulnerabilities, semgrep issues/findings and component version detection

  • Generate an SBOM (software bill of materials) in SPDX-Lite and CycloneDX

Installation

To install (from pypi.org), run: pip3 install scanoss.

Requirements

Python 3.9 or higher.

The dependencies can be found in the requirements.txt and requirements-dev.txt files.

To install the dependencies, run: pip3 install -r requirements.txt and pip3 install -r requirements-dev.txt.

To enable dependency scanning, an extra tool is required: scancode-toolkit.

To install it run: pip3 install -r requirements-scancode.txt

Settings File

Warning

Deprecated — This documentation is no longer maintained here. The settings schema and its documentation have moved to the scanoss/schema repository. Please refer to the interactive docs or the canonical JSON Schema for the latest version.

SCANOSS provides a settings file to customize the scanning process. The settings file is a JSON file that contains project information and BOM (Bill of Materials) rules. It allows you to include, remove, or replace components in the BOM before and after scanning.

The schema is available to download here

The settings file consists of three main sections: self, settings, and bom.

The self section contains basic information about your project:

{
    "self": {
        "name": "my-project",
        "license": "MIT",
        "description": "Project description"
    }
}

Settings

The settings object allows you to configure various aspects of the scanning process. Currently, it provides control over which files should be skipped during scanning through the skip property.

The skip object lets you define rules for excluding files from being scanned or fingerprinted. This can be useful for improving scan performance and avoiding unnecessary processing of certain files.

A list of patterns that determine which files should be skipped during scanning. The patterns follow the same format as .gitignore files. For more information, see the gitignore patterns documentation.

Type: Array of strings

Required: No

Example:
{
    "settings": {
        "skip": {
            "patterns": {
                "scanning": [
                    "*.log",
                    "!important.log",
                    "temp/",
                    "debug[0-9]*.txt",
                    "src/client/specific-file.js",
                    "src/nested/folder/"
                ]
            }
        }
    }
}

A list of patterns that determine which files should be skipped during fingerprinting. The patterns follow the same format as .gitignore files. For more information, see the gitignore patterns documentation.

Type: Array of strings

Required: No

Example:
{
    "settings": {
        "skip": {
            "patterns": {
                "fingerprinting": [
                    "*.log",
                    "!important.log",
                    "temp/",
                    "debug[0-9]*.txt",
                    "src/client/specific-file.js",
                    "src/nested/folder/"
                ]
            }
        }
    }
}

Rules for skipping files based on their size during scanning.

Type: Object

Required: No

Properties:
  • patterns (array of strings): List of glob patterns to apply the min/max size rule

  • min (integer): Minimum file size in bytes

  • max (integer): Maximum file size in bytes (Required)

Example:
{
    "settings": {
        "skip": {
            "sizes": {
                "scanning": [
                    {
                        "patterns": [
                            "*.log",
                            "!important.log",
                            "temp/",
                            "debug[0-9]*.txt",
                            "src/client/specific-file.js",
                            "src/nested/folder/"
                        ],
                        "min": 100,
                        "max": 1000000
                    }
                ]
            }
        }
    }
}

Rules for skipping files based on their size during fingerprinting.

Type: Object

Required: No

Properties:
  • patterns (array of strings): List of glob patterns to apply the min/max size rule

  • min (integer): Minimum file size in bytes

  • max (integer): Maximum file size in bytes (Required)

Example:
{
    "settings": {
        "skip": {
            "sizes": {
                "fingerprinting": [
                    {
                        "patterns": [
                            "*.log",
                            "!important.log",
                            "temp/",
                            "debug[0-9]*.txt",
                            "src/client/specific-file.js",
                            "src/nested/folder/"
                        ],
                        "min": 100,
                        "max": 1000000
                    }
                ]
            }
        }
    }
}
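As a rough illustration of how a size rule could be evaluated, the sketch below assumes that a file is skipped when its size falls outside the range from min to max (inclusive). That interpretation, the default of 0 for a missing min, and the function name are assumptions, not documented behaviour. Selecting which rule applies to a file through its patterns is left out.

```python
def skipped_by_size(size: int, rule: dict) -> bool:
    """Return True when ``size`` (in bytes) falls outside the rule's range."""
    minimum = rule.get("min", 0)  # "min" is optional; 0 is an assumed default
    maximum = rule["max"]         # "max" is marked as required in the schema
    # Assumption: sizes outside [min, max] are skipped
    return not (minimum <= size <= maximum)


rule = {"patterns": ["*.log"], "min": 100, "max": 1000000}
for size in (50, 100, 1000001):
    print(size, skipped_by_size(size, rule))
```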
  • Patterns are matched relative to the scan root directory

  • A trailing slash indicates a directory (e.g., path/ matches only directories)

  • An asterisk * matches anything except a slash

  • Two asterisks ** match zero or more directories (e.g., path/**/folder matches path/folder, path/to/folder, and path/a/b/folder)

  • Range notations like [0-9] match any character in the range

  • Question mark ? matches any single character except a slash

# Match all .txt files
*.txt

# Match all .log files except important.log
*.log
!important.log

# Match all files in the build directory
build/

# Match all .pdf files in docs directory and its subdirectories
docs/**/*.pdf

# Match files like test1.js, test2.js, etc.
test[0-9].js
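The ordering and negation rules above can be sketched with the standard library. This is a simplified subset of gitignore matching written for illustration, not the matcher used by scanoss-py. The function name is hypothetical, and because fnmatch lets * cross slashes, patterns containing ** behave differently here than in real gitignore matching.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_skipped(path: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last matching pattern decides."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    skipped = False
    for pattern in patterns:
        if not pattern or pattern.startswith("#"):
            continue  # blank entries and comments are ignored
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern.endswith("/"):
            directory = pattern.rstrip("/")
            if "/" in directory:
                # Directory given with a path, such as src/nested/folder/
                matched = path.startswith(directory + "/")
            else:
                # Bare directory name: compare against every parent directory
                matched = any(fnmatchcase(p, directory) for p in parts[:-1])
        elif "/" in pattern:
            # Simplification: fnmatch lets * cross slashes, unlike gitignore
            matched = fnmatchcase(path, pattern)
        else:
            # No slash: compare against the file name only
            matched = fnmatchcase(parts[-1], pattern)
        if matched:
            skipped = not negate
    return skipped


patterns = ["*.log", "!important.log", "temp/", "debug[0-9]*.txt"]
for candidate in ("app.log", "important.log", "temp/cache.bin", "src/main.py"):
    print(candidate, is_skipped(candidate, patterns))
```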

The SCANOSS scan engine supports tuning parameters for snippet matching. These parameters allow you to fine-tune how the scanner identifies code snippets in your repository.

| Parameter | Type | Default | Description |
|---|---|---|---|
| min_snippet_hits | integer | 0 | Minimum snippet hits required. 0 defers to server configuration. |
| min_snippet_lines | integer | 0 | Minimum snippet lines required. 0 defers to server configuration. |
| ranking_enabled | boolean or null | null | Enable/disable result ranking. null defers to server configuration. |
| ranking_threshold | integer or null | 0 | Ranking threshold value (-1 to 10). -1 defers to server configuration. |
| honour_file_exts | boolean or null | true | Honour file extensions during matching. null defers to server configuration. |

Add the file_snippet section to your scanoss.json file:

{
    "settings": {
        "file_snippet": {
            "min_snippet_hits": 3,
            "min_snippet_lines": 5,
            "ranking_enabled": true,
            "ranking_threshold": 5,
            "honour_file_exts": true
        }
    }
}
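Before writing these values to scanoss.json, it can help to check them against the types and ranges listed in the parameter table. The helper below is a hypothetical sketch, not part of scanoss-py. Treating the two minimum values as non-negative is an assumption; the table only states that 0 defers to the server.

```python
def check_file_snippet(cfg: dict) -> list[str]:
    """Return a list of problems found in a file_snippet section."""
    errors = []
    for key in ("min_snippet_hits", "min_snippet_lines"):
        value = cfg.get(key, 0)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(f"{key} must be a non-negative integer")
    threshold = cfg.get("ranking_threshold", 0)
    if threshold is not None:
        is_int = isinstance(threshold, int) and not isinstance(threshold, bool)
        if not is_int or not -1 <= threshold <= 10:
            errors.append("ranking_threshold must be null or an integer from -1 to 10")
    for key in ("ranking_enabled", "honour_file_exts"):
        value = cfg.get(key)
        if value is not None and not isinstance(value, bool):
            errors.append(f"{key} must be a boolean or null")
    return errors


print(check_file_snippet({"min_snippet_hits": 3, "ranking_threshold": 11}))
```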

Here’s a comprehensive example combining pattern and size-based skipping:

{
  "settings": {
    "skip": {
      "patterns": {
        "scanning": [
          "# Node.js dependencies",
          "node_modules/",

          "# Build outputs",
          "dist/",
          "build/"
        ],
        "fingerprinting": [
          "# Logs except important ones",
          "*.log",
          "!important.log",

          "# Temporary files",
          "temp/",
          "*.tmp",

          "# Debug files with numbers",
          "debug[0-9]*.txt",

          "# All test files in any directory",
          "**/*test.js"
        ]
      },
      "sizes": {
        "scanning": [
          {
            "patterns": [
              "*.log",
              "!important.log"
            ],
            "min": 512,
            "max": 5242880
          }
        ],
        "fingerprinting": [
          {
            "patterns": [
              "temp/",
              "*.tmp",
              "debug[0-9]*.txt",
              "src/client/specific-file.js",
              "src/nested/folder/"
            ],
            "min": 512,
            "max": 5242880
          }
        ]
      }
    }
  }
}

The bom section defines rules for modifying the BOM before and after scanning. It contains three main operations: include, remove, and replace.

Rules for adding context when scanning. These rules are sent to the SCANOSS API, so the listed components are more likely to be considered in the resulting scan.

{
    "bom": {
        "include": [
            {
                "path": "/path/to/file",
                "purl": "pkg:npm/vue@2.6.12",
                "comment": "Optional comment"
            }
        ]
    }
}

Rules for removing files from results after scanning. These rules will be applied to the results file after scanning. The post processing happens on the client side.

{
    "bom": {
        "remove": [
            {
                "path": "/path/to/file",
                "purl": "pkg:npm/vue@2.6.12",
                "comment": "Optional comment"
            }
        ]
    }
}

Rules for replacing components after scanning. These rules will be applied to the results file after scanning. The post processing happens on the client side.

{
    "bom": {
        "replace": [
            {
                "path": "/path/to/file",
                "purl": "pkg:npm/vue@2.6.12",
                "replace_with": "pkg:npm/vue@2.6.14",
                "license": "MIT",
                "comment": "Optional comment"
            }
        ]
    }
}
  1. Full Match: Requires both PATH and PURL to match. The rule is applied ONLY to the specific file with the matching PURL and PATH.

  2. Partial Match: Matches based on either:
    • File path only (PURL is optional). The rule is applied to all files with the matching path.

    • PURL only (PATH is optional). The rule is applied to all files with the matching PURL.
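The two matching modes can be sketched as a single predicate. The function name and the shape of its arguments are hypothetical, and comparing paths by exact equality is an assumption made to keep the example short.

```python
def rule_applies(rule: dict, file_path: str, file_purls: list[str]) -> bool:
    """Decide whether a bom rule applies to one scanned file."""
    rule_path = rule.get("path")
    rule_purl = rule.get("purl")
    if rule_path is None and rule_purl is None:
        return False  # a rule with neither field cannot match anything
    # A missing field places no constraint (partial match);
    # when both are present, both must match (full match)
    path_ok = rule_path is None or rule_path == file_path
    purl_ok = rule_purl is None or rule_purl in file_purls
    return path_ok and purl_ok


full = {"path": "src/utils/helper.js", "purl": "pkg:npm/old-lib@1.0.0"}
purl_only = {"purl": "pkg:npm/old-lib@1.0.0"}
print(rule_applies(full, "src/other.js", ["pkg:npm/old-lib@1.0.0"]))
print(rule_applies(purl_only, "src/other.js", ["pkg:npm/old-lib@1.0.0"]))
```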

  • All paths should be specified relative to the scanned directory

  • Use forward slashes (/) as path separators

Given the following example directory structure:

project/
├── src/
│   └── component.js
└── lib/
    └── utils.py
  • If the scanned directory is /project/src, then:
    • component.js is a valid path

    • lib/utils.py is an invalid path and will not match any files

  • If the scanned directory is /project, then:
    • src/component.js is a valid path

    • lib/utils.py is a valid path
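The same cases can be checked with pathlib. The helper name is hypothetical; it derives the path to use in a rule from an absolute file path and returns None when the file lies outside the scanned directory.

```python
from pathlib import PurePosixPath
from typing import Optional


def rule_path(scan_root: str, file_path: str) -> Optional[str]:
    """Return file_path relative to scan_root, or None if it is outside."""
    try:
        return PurePosixPath(file_path).relative_to(scan_root).as_posix()
    except ValueError:
        # relative_to raises ValueError when file_path is not under scan_root
        return None


print(rule_path("/project/src", "/project/src/component.js"))
print(rule_path("/project/src", "/project/lib/utils.py"))
print(rule_path("/project", "/project/lib/utils.py"))
```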

PURLs must follow the Package URL specification:

  • Format: pkg:<type>/<namespace>/<name>@<version>

  • Examples:
    • pkg:npm/vue@2.6.12

    • pkg:golang/github.com/golang/go@1.17.3

  • Must be valid and include all required components

  • Version is strongly recommended but optional
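To show how the format breaks down, here is a minimal sketch of a PURL splitter. It is a hypothetical helper that covers only the type, namespace, name and version fields shown above. Qualifiers and subpaths are discarded and percent-encoding is not decoded, so it is not a full implementation of the Package URL specification.

```python
def parse_purl(purl: str) -> dict:
    """Split a PURL into type, namespace, name and version."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    body = purl[len("pkg:"):]
    # Drop the optional subpath (#...) and qualifiers (?...)
    body = body.split("#", 1)[0].split("?", 1)[0]
    body, _, version = body.partition("@")
    purl_type, _, rest = body.partition("/")
    # Everything before the last slash is the namespace, the remainder the name
    namespace, _, name = rest.rpartition("/")
    if not purl_type or not name:
        raise ValueError(f"missing type or name: {purl!r}")
    return {
        "type": purl_type,
        "namespace": namespace or None,
        "name": name,
        "version": version or None,
    }


print(parse_purl("pkg:golang/github.com/golang/go@1.17.3"))
```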

Here’s a complete example showing all sections:

{
    "self": {
        "name": "example-project",
        "license": "Apache-2.0",
        "description": "Example project configuration"
    },
    "settings": {
        "skip": {
            "patterns": {
                "scanning": [
                    "node_modules/",
                    "dist/",
                    "build/"
                ],
                "fingerprinting": [
                    "*.log",
                    "!important.log",
                    "temp/",
                    "*.tmp",
                    "debug[0-9]*.txt",
                    "**/*test.js"
                ]
            },
            "sizes": {
                "scanning": [
                    {
                        "patterns": [
                            "*.log",
                            "!important.log"
                        ],
                        "min": 512,
                        "max": 5242880
                    }
                ],
                "fingerprinting": [
                    {
                        "patterns": [
                            "temp/",
                            "debug[0-9]*.txt",
                            "src/client/specific-file.js",
                            "src/nested/folder/"
                        ],
                        "min": 512,
                        "max": 5242880
                    }
                ]
            }
        },
        "file_snippet": {
            "min_snippet_hits": 3,
            "min_snippet_lines": 5,
            "ranking_enabled": true,
            "ranking_threshold": 5,
            "honour_file_exts": true
        }
    },
    "bom": {
        "include": [
            {
                "path": "src/lib/component.js",
                "purl": "pkg:npm/lodash@4.17.21",
                "comment": "Include lodash dependency"
            }
        ],
        "remove": [
            {
                "purl": "pkg:npm/deprecated-pkg@1.0.0",
                "comment": "Remove deprecated package"
            }
        ],
        "replace": [
            {
                "path": "src/utils/helper.js",
                "purl": "pkg:npm/old-lib@1.0.0",
                "replace_with": "pkg:npm/new-lib@2.0.0",
                "license": "MIT",
                "comment": "Upgrade to newer version"
            }
        ]
    }
}

You can pass the settings file path as an argument to the CLI:

$ scanoss-py scan . --settings /path/to/settings.json

If no settings file is provided, the default settings file will be used. The default location for the settings file is scanoss.json in the current working directory. If this file does not exist, settings will be omitted.

You can also skip the default settings file:

$ scanoss-py scan . --skip-settings-file
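The lookup order described above can be summarised in a short sketch. The function and its parameters are hypothetical and only mirror the documented behaviour. Rejecting the combination of an explicit path with the skip option is an assumption.

```python
from pathlib import Path
from typing import Optional

DEFAULT_SETTINGS_NAME = "scanoss.json"


def resolve_settings(explicit: Optional[str] = None,
                     skip_default: bool = False,
                     cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the settings file to use, or None when settings are omitted."""
    if explicit and skip_default:
        # Assumption: the two options contradict each other
        raise ValueError("choose either an explicit settings file or skip")
    if explicit:
        return Path(explicit)  # corresponds to --settings
    if skip_default:
        return None            # corresponds to --skip-settings-file
    # Fall back to scanoss.json in the current working directory
    candidate = (cwd or Path.cwd()) / DEFAULT_SETTINGS_NAME
    return candidate if candidate.is_file() else None


print(resolve_settings("/path/to/settings.json"))
```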

Commands and arguments

Scanning: scan, sc

Scans a directory or file (source code or .wfp fingerprint file) and prints the results to STDOUT by default. This command is highly customizable: everything from the output format to the matching selection logic (using an SBOM file) can be set to your preference.

scanoss-py scan <file or directory>

| Argument | Description |
|---|---|
| --wfp <wfp file>, -w <wfp file> | Scan a WFP (winnowing fingerprint) file instead of a directory |
| --dep <dependency file>, -p <dependency file> | Use a dependency file instead of a directory |
| --identify <SBOM file>, -i <SBOM file> | Scan and identify components in SBOM file (an API key is required for this feature) |
| --ignore <SBOM file>, -n <SBOM file> | Ignore components specified in the IGNORE SBOM file (an API key is required for this feature) |
| --format <format>, -f <format> | Indicates the result output format: {plain, cyclonedx, spdxlite, csv} (optional - default plain) |
| --flags <FLAGS>, -F <FLAGS> | Sends scanning flags (or definitions) |
| --threads <THREADS>, -T <THREADS> | Number of threads to use while scanning (optional - default 10 - max 30) |
| --skip-snippets, -S | Skip the generation of snippets |
| --post-size <POST_SIZE>, -P <POST_SIZE> | Number of kilobytes to limit the post to while scanning (optional - default 64) |
| --timeout <TIMEOUT>, -M <TIMEOUT> | Timeout (in seconds) for API communication (optional - default 120) |
| --all-folders | Scan all folders |
| --all-extensions | Scan all file extensions |
| --all-hidden | Scan all hidden files/folders |
| --obfuscate | Obfuscate fingerprints |
| --dependencies, -D | Add dependency scanning |
| --dependencies-only | Run dependency scanning only |
| --sc-command <SC_COMMAND> | Scancode command and path if required (optional - default scancode) |
| --sc-timeout <SC_TIMEOUT> | Timeout (in seconds) for Scancode to complete (optional - default 600) |
| --apiurl <API_URL> | SCANOSS API base URL (optional - default https://api.osskb.org) |
| --ignore-cert-errors | Ignore certificate errors |
| --key <KEY>, -k <KEY> | SCANOSS API Key token (optional - not required for default API_URL) |
| --proxy <PROXY> | Proxy URL to use for connections; the environment variable HTTPS_PROXY can also be used (optional) |
| --pac <PAC> | Proxy auto configuration (optional) |
| --ca-cert <CA_CERT> | Alternative certificate PEM file; the environment variables REQUESTS_CA_BUNDLE and GRPC_DEFAULT_SSL_ROOTS_FILE_PATH can also be used (optional) |
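When the scan command is driven from another script, it can be convenient to assemble the argument list in one place. The helper below is a hypothetical sketch that only builds the command line from flags documented for the scan command; it does not run anything. The lower bound of 1 for the thread count is an assumption.

```python
import shlex
from typing import Optional

FORMATS = {"plain", "cyclonedx", "spdxlite", "csv"}


def build_scan_command(target: str, output_format: str = "plain",
                       threads: int = 10, key: Optional[str] = None,
                       dependencies: bool = False) -> list[str]:
    """Assemble a scanoss-py scan command as an argument list."""
    if output_format not in FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    if not 1 <= threads <= 30:  # documented maximum is 30; minimum is assumed
        raise ValueError("threads must be between 1 and 30")
    cmd = ["scanoss-py", "scan", target,
           "--format", output_format, "--threads", str(threads)]
    if dependencies:
        cmd.append("--dependencies")
    if key:
        cmd += ["--key", key]
    return cmd


print(shlex.join(build_scan_command("src", output_format="spdxlite", threads=5)))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.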

Generate fingerprints: fingerprint, fp, wfp

Calculates hashes for a directory or file and prints them to STDOUT.

scanoss-py fingerprint <file or directory>

| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --obfuscate | Obfuscate fingerprints |
| --skip-snippets, -S | Skip the generation of snippets |
| --all-extensions | Fingerprint all file extensions |
| --all-folders | Fingerprint all folders |
| --all-hidden | Fingerprint all hidden files/folders |

Detect dependencies: dependencies, dp, dep

Scan source code for dependencies, but do not decorate them.

scanoss-py dependencies <>

| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --container <image_name:tag> | Analyze dependencies from a Docker container image instead of a directory |
| --sc-command <SC_COMMAND> | Scancode command and path if required (optional - default scancode) |
| --sc-timeout <SC_TIMEOUT> | Timeout (in seconds) for scancode to complete (optional - default 600) |

Note

Remember that in order to enable dependency scanning, an extra tool is required: scancode-toolkit. To install it, run: pip3 install -r requirements-scancode.txt.

File count: file_count, fc

Search the source tree and produce a file type summary.

scanoss-py file_count <directory>

| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --all-hidden | Scan all hidden files/directories |
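Conceptually, a file type summary is a count of files per extension. The sketch below is a hypothetical illustration of that idea using the standard library; it does not reproduce the output format of the file_count command.

```python
from collections import Counter
from pathlib import Path


def count_file_types(root: str, include_hidden: bool = False) -> Counter:
    """Count files under ``root`` grouped by lower-cased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        relative = path.relative_to(root)
        # Skip anything inside or named as a dot-prefixed entry unless asked
        if not include_hidden and any(p.startswith(".") for p in relative.parts):
            continue
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


for extension, total in count_file_types(".").most_common():
    print(extension, total)
```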

Format conversion: convert, cv, cnv, cvrt

Convert file format to plain, SPDX-Lite, CycloneDX or csv.

scanoss-py convert -i <input file> --format <example, spdxlite> -o <output file>

| Argument | Description |
|---|---|
| --input <file>, -i <file> | Input file name |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Indicates the result output format: {plain, cyclonedx, spdxlite, csv} (optional - default plain) |

Folder Scanning: folder-scan, fs

Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.

scanoss-py folder-scan <directory>

| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Output format: {json, cyclonedx} (optional - default json) |
| --timeout <seconds>, -M <seconds> | Timeout in seconds for API communication (optional - default 600) |
| --rank-threshold <number> | Filter results to only show those with rank value at or below this threshold (e.g., --rank-threshold 3 returns results with rank 1, 2, or 3). Lower rank values indicate higher quality matches. |
| --settings <file>, -st <file> | Settings file to use for scanning (optional - default scanoss.json) |
| --skip-settings-file, -stf | Skip default settings file (scanoss.json) if it exists |
| --key <token>, -k <token> | SCANOSS API Key token (optional - not required for default OSSKB URL) |
| --apiurl <API_URL> | SCANOSS API base URL (optional - default https://api.osskb.org) |
| --proxy <url> | Proxy URL to use for connections |
| --pac <file/url> | Proxy auto configuration. Specify a file, http url or "auto" |
| --ca-cert <file> | Alternative certificate PEM file |
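The effect of the rank threshold can be pictured as a simple filter. In this sketch the result records and their "name" and "rank" fields are hypothetical; the real folder-scan output may be structured differently.

```python
def filter_by_rank(results: list[dict], threshold: int) -> list[dict]:
    """Keep results whose rank is at or below ``threshold``."""
    return [
        result for result in results
        if result.get("rank") is not None and result["rank"] <= threshold
    ]


matches = [
    {"name": "component-a", "rank": 1},
    {"name": "component-b", "rank": 3},
    {"name": "component-c", "rank": 7},
]
print(filter_by_rank(matches, 3))
```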

Folder Hashing: folder-hash, fh

Generates cryptographic hashes for files in a given directory and its subdirectories.

scanoss-py folder-hash <directory>

| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Output format: {json} (optional - default json) |
| --settings <file>, -st <file> | Settings file to use for scanning (optional - default scanoss.json) |
| --skip-settings-file, -stf | Skip default settings file (scanoss.json) if it exists |

Both commands also support these general options:
  • --debug, -d: Enable debug messages

  • --trace, -t: Enable trace messages

  • --quiet, -q: Enable quiet mode

Container Scanning: container-scan, cs

Scans Docker container images for dependencies, extracting and analyzing components within containerized applications.

scanoss-py container-scan -i <image_name:tag>

| Argument | Description |
|---|---|
| --image <image_name:tag>, -i <image_name:tag> | Docker image name and tag to scan (required) |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --include-base-image | Include base image dependencies in the scan results |
| --format <format>, -f <format> | Output format: {json} (optional - default json) |
| --timeout <seconds>, -M <seconds> | Timeout in seconds for API communication (optional - default 600) |
| --key <token>, -k <token> | SCANOSS API Key token (optional - not required for default OSSKB URL) |
| --proxy <url> | Proxy URL to use for connections |
| --ca-cert <file> | Alternative certificate PEM file |

Crypto: crypto, cr

Provides subcommands to retrieve cryptographic information for components.

scanoss-py crypto <subcommand>

Subcommands:

algorithms (alg)

Retrieve cryptographic algorithms for the given components.

scanoss-py crypto algorithms --purl <purl_string>

| Argument | Description |
|---|---|
| --with-range | Returns the list of versions in the specified range that contain cryptographic algorithms. (Replaces the previous --range option) |

hints

Retrieve encryption hints for the given components.

scanoss-py crypto hints --purl <purl_string>

| Argument | Description |
|---|---|
| --with-range | Returns the list of versions in the specified range that contain encryption hints. |

versions-in-range (vr)

Given a list of PURLs and version ranges, get a list of versions that do/don’t contain crypto algorithms.

scanoss-py crypto versions-in-range --purl <purl_string_with_range>

Common Crypto Arguments:

The following arguments are common to the algorithms, hints, and versions-in-range subcommands:

| Argument | Description |
|---|---|
| --purl <PURL>, -p <PURL> | Package URL (PURL) to process. Can be specified multiple times. |
| --input <file>, -i <file> | Input file name containing PURLs. |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT). |
| --timeout <seconds>, -M <seconds> | Timeout (in seconds) for API communication (optional - default 600). |
| --key <KEY>, -k <KEY> | SCANOSS API Key token (optional - not required for default OSSKB URL). |
| --ca-cert <CA_CERT> | Alternative certificate PEM file. |
| --debug, -d | Enable debug messages. |
| --trace, -t | Enable trace messages, including API posts. |
| --quiet, -q | Enable quiet mode. |

Component:

To be done

Utilities: utilities, ut

scanoss-py utilities

| Argument | Description |
|---|---|
| fast | SCANOSS fast winnowing (requires the SCANOSS Winnowing Python Package) |
| certloc, cl | Display the location of Python CA certificates |
| cert-download, cdl, cert-dl | Download the specified server’s SSL PEM certificate |
| pac-proxy, pac | Use Proxy Auto-Config to determine proxy configuration |

General Arguments

| Argument | Description |
|---|---|
| --debug, -d | Enable debug messages |
| --trace, -t | Enable trace messages, including API posts |
| --quiet, -q | Enable quiet mode |

Package consumption

This package can be run from the command line, or consumed from another Python script. A good example of how to consume it can be found in this file.

In general the easiest way is to import the required module as follows:

from scanoss.scanner import Scanner


def main():
    # Create a scanner with its default configuration
    scanner = Scanner()
    # Scan the current directory
    scanner.scan_folder('.')


if __name__ == "__main__":
    main()

Alternatively, there is a docker image of the compiled package, which can be found in this repository. Details on how to run it can be found in this file.

Integrations

At SCANOSS we want to provide easy recipes for practical solutions, which is why we are constantly building integrations for our software. There is no need to adapt your existing systems to work with our software; we adapt our software to your needs.

Our integrations range from CI/CD integrations with Jenkins and GitHub Actions to our SonarQube plugin and our most recent VSCode extension. We are always working on making our software as easy to access, consume and integrate as possible.

The full list of existing integrations is shown below:

| Integration | Description |
|---|---|
| Jenkins | Integrate scanoss-py into your pipelines |
| GitHub Actions | Enhance your software development process with the SCANOSS Code Scan Action |
| SonarQube | Scan your code with the SCANOSS plugin for SonarQube |
| Visual Studio Code | Software Composition Analysis as you code |

Best practices


Choose the tool based on your use case, and not the other way around

SCANOSS offers many tools and software in the field of Software Composition Analysis, and many have similar features.

For example, you can perform scans and generate a software bill of materials (SBOM) with both scanoss-py and the SBOM Workbench, but that doesn’t mean these tools are interchangeable. The SBOM Workbench’s GUI can be an advantage for auditors, but may be a complication for developers who need to integrate an SCA solution into an existing workflow.

Language preference is another consideration: we also offer a JavaScript package and a Java SDK, so you are free to consume the SCANOSS API however you want.


Get the most accurate results

License

The SCANOSS open source scanoss-py package is released under the MIT license.