Documentation for scanoss-py
Introduction
In order to complete a Software Composition Analysis of your project, you will need to scan the fingerprints of the source code against a knowledge base (for example, the Open Source Software Knowledge Base).
Notice we mention fingerprints, and not the source code itself. Protecting the privacy of your information is the most important rule we follow, and it is what sets us apart from our competitors. To achieve this, the SCANOSS Platform calculates file and snippet fingerprints (32-bit identifiers calculated with the winnowing algorithm).
The fingerprints of each file or snippet are then sent to the SCANOSS API. This means you are scanning against the knowledge base and not the other way around.
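To make the idea of winnowing fingerprints more concrete, here is a toy sketch of the general technique: hash every k-gram of the input, slide a window over the hashes, and keep the minimum of each window. It is not the SCANOSS implementation; the k-gram size, window size, and the use of CRC32 as the 32-bit hash are assumptions chosen only for illustration.

```python
import zlib


def kgram_hashes(text: str, k: int = 5) -> list:
    """Return a 32-bit CRC32 hash for every k-byte substring of the text."""
    data = text.encode("utf-8")
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]


def winnow(text: str, k: int = 5, window: int = 4) -> list:
    """Select (position, hash) fingerprints using a simplified winnowing pass.

    For each window of consecutive k-gram hashes, the rightmost minimum is
    chosen. A position is recorded only once, even if it is the minimum of
    several overlapping windows.
    """
    hashes = kgram_hashes(text, k)
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # Index of the rightmost occurrence of the minimum inside the window
        offset = window - 1 - win[::-1].index(smallest)
        pos = start + offset
        if pos != last_pos:
            fingerprints.append((pos, smallest))
            last_pos = pos
    return fingerprints


if __name__ == "__main__":
    for position, value in winnow("the quick brown fox jumps over the lazy dog"):
        print(position, f"{value:08x}")
```

Only the selected hashes would leave the machine in such a scheme, which is why the source text itself never needs to be transmitted.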
One way to query the SCANOSS Platform is through our Python package: scanoss-py.
Note
All SCANOSS software is open source and free to use; explore our GitHub Organization page. You can contribute to this tool. For more information, check the contribution guidelines for this project.
Features
The package can be run from the command line, or consumed from another Python script
Scan your source code fingerprints against a knowledge base
Dependency detection
Decoration services for cryptographic algorithms, vulnerabilities, semgrep issues/findings and component version detection
Generate an SBOM (software bill of materials) in SPDX-Lite and CycloneDX
Installation
To install (from pypi.org), run: pip3 install scanoss.
Requirements
Python 3.9 or higher.
The dependencies can be found in the requirements.txt and requirements-dev.txt files.
To install dependencies, run: pip3 install -r requirements.txt and pip3 install -r requirements-dev.txt.
To enable dependency scanning, an extra tool is required: scancode-toolkit.
To install it run: pip3 install -r requirements-scancode.txt
Settings File
Warning
Deprecated — This documentation is no longer maintained here. The settings schema and its documentation have moved to the scanoss/schema repository. Please refer to the interactive docs or the canonical JSON Schema for the latest version.
SCANOSS provides a settings file to customize the scanning process. The settings file is a JSON file that contains project information and BOM (Bill of Materials) rules. It allows you to include, remove, or replace components in the BOM before and after scanning.
The schema is available to download here
The settings file consists of three main sections: self, settings, and bom.
The self section contains basic information about your project:
{
"self": {
"name": "my-project",
"license": "MIT",
"description": "Project description"
}
}
Settings
The settings object allows you to configure various aspects of the scanning process. Currently, it provides control over which files should be skipped during scanning through the skip property.
The skip object lets you define rules for excluding files from being scanned or fingerprinted. This can be useful for improving scan performance and avoiding unnecessary processing of certain files.
A list of patterns (settings.skip.patterns.scanning) that determine which files should be skipped during scanning. The patterns follow the same format as .gitignore files. For more information, see the gitignore patterns documentation.
- Type: Array of strings
- Required: No
- Example:
{ "settings": { "skip": { "patterns": { "scanning": [ "*.log", "!important.log", "temp/", "debug[0-9]*.txt", "src/client/specific-file.js", "src/nested/folder/" ] } } } }
A list of patterns (settings.skip.patterns.fingerprinting) that determine which files should be skipped during fingerprinting. The patterns follow the same format as .gitignore files. For more information, see the gitignore patterns documentation.
- Type: Array of strings
- Required: No
- Example:
{ "settings": { "skip": { "patterns": { "fingerprinting": [ "*.log", "!important.log", "temp/", "debug[0-9]*.txt", "src/client/specific-file.js", "src/nested/folder/" ] } } } }
Rules for skipping files based on their size during scanning (settings.skip.sizes.scanning).
- Type: Object
- Required: No
- Properties:
  - patterns (array of strings): List of glob patterns to apply the min/max size rule
  - min (integer): Minimum file size in bytes
  - max (integer): Maximum file size in bytes (Required)
- Example:
{ "settings": { "skip": { "sizes": { "scanning": [ { "patterns": [ "*.log", "!important.log", "temp/", "debug[0-9]*.txt", "src/client/specific-file.js", "src/nested/folder/" ], "min": 100, "max": 1000000 } ] } } } }
Rules for skipping files based on their size during fingerprinting (settings.skip.sizes.fingerprinting).
- Type: Object
- Required: No
- Properties:
  - patterns (array of strings): List of glob patterns to apply the min/max size rule
  - min (integer): Minimum file size in bytes
  - max (integer): Maximum file size in bytes (Required)
- Example:
{ "settings": { "skip": { "sizes": { "fingerprinting": [ { "patterns": [ "*.log", "!important.log", "temp/", "debug[0-9]*.txt", "src/client/specific-file.js", "src/nested/folder/" ], "min": 100, "max": 1000000 } ] } } } }
- Patterns are matched relative to the scan root directory
- A trailing slash indicates a directory (e.g., path/ matches only directories)
- An asterisk * matches anything except a slash
- Two asterisks ** match zero or more directories (e.g., path/**/folder matches path/folder, path/to/folder, path/a/b/folder)
- Range notations like [0-9] match any character in the range
- Question mark ? matches any single character except a slash
# Match all .txt files
*.txt
# Match all .log files except important.log
*.log
!important.log
# Match all files in the build directory
build/
# Match all .pdf files in docs directory and its subdirectories
docs/**/*.pdf
# Match files like test1.js, test2.js, etc.
test[0-9].js
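The pattern rules above can be illustrated with a small, simplified matcher. This is only a sketch of gitignore-style semantics (last matching pattern wins, ! negates), written with the standard library; it is not the matcher scanoss-py uses, it does not cover every gitignore edge case, and the helper names pattern_to_regex and is_skipped are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one gitignore-style pattern into a regex (simplified)."""
    dir_only = pattern.endswith("/")
    core = pattern.rstrip("/")
    # A slash inside the pattern anchors it to the scan root
    anchored = "/" in core
    core = core.lstrip("/")
    out = []
    i = 0
    while i < len(core):
        if core.startswith("**/", i):
            out.append("(?:.*/)?")      # zero or more directories
            i += 3
        elif core[i] == "*":
            out.append("[^/]*")         # anything except a slash
            i += 1
        elif core[i] == "?":
            out.append("[^/]")          # one character except a slash
            i += 1
        elif core[i] == "[" and core.find("]", i) != -1:
            end = core.find("]", i)
            cls = core[i + 1:end]
            if cls.startswith("!"):
                cls = "^" + cls[1:]     # gitignore negated range
            out.append("[" + cls + "]")
            i = end + 1
        else:
            out.append(re.escape(core[i]))
            i += 1
    prefix = "^" if anchored else "^(?:.*/)?"
    suffix = "/.*$" if dir_only else "(?:/.*)?$"
    return re.compile(prefix + "".join(out) + suffix)


def is_skipped(path: str, patterns: list) -> bool:
    """Return True if the relative path is excluded; the last match wins."""
    skipped = False
    for raw in patterns:
        if not raw or raw.startswith("#"):
            continue                    # blank lines and comments
        negated = raw.startswith("!")
        if pattern_to_regex(raw[1:] if negated else raw).match(path):
            skipped = not negated
    return skipped


if __name__ == "__main__":
    rules = ["*.log", "!important.log", "build/", "docs/**/*.pdf", "test[0-9].js"]
    for candidate in ["app.log", "important.log", "build/out.o", "docs/a/b.pdf"]:
        print(candidate, is_skipped(candidate, rules))
```

Paths are expected to be relative to the scan root and to use forward slashes, matching the first rule in the list above.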
The SCANOSS scan engine supports tuning parameters for snippet matching. These parameters allow you to fine-tune how the scanner identifies code snippets in your repository.
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_snippet_hits | | | Minimum snippet hits required. |
| min_snippet_lines | | | Minimum snippet lines required. |
| ranking_enabled | | | Enable/disable result ranking. |
| ranking_threshold | | | Ranking threshold value ( |
| honour_file_exts | | | Honour file extensions during matching. |
Add the file_snippet section to your scanoss.json file:
{
"settings": {
"file_snippet": {
"min_snippet_hits": 3,
"min_snippet_lines": 5,
"ranking_enabled": true,
"ranking_threshold": 5,
"honour_file_exts": true
}
}
}
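If you prefer to manage these values from a script, the following sketch updates the file_snippet section of a settings file while leaving the other sections untouched. It only manipulates JSON with the standard library; the helper name set_file_snippet is hypothetical and is not part of scanoss-py.

```python
import json
from pathlib import Path


def set_file_snippet(settings_path: str, **params) -> dict:
    """Merge snippet-matching parameters into settings.file_snippet.

    Existing keys in the file are preserved; the file is created if missing.
    """
    path = Path(settings_path)
    data = json.loads(path.read_text()) if path.exists() else {}
    section = data.setdefault("settings", {}).setdefault("file_snippet", {})
    section.update(params)
    path.write_text(json.dumps(data, indent=2) + "\n")
    return data


if __name__ == "__main__":
    updated = set_file_snippet(
        "scanoss.json",
        min_snippet_hits=3,
        min_snippet_lines=5,
        ranking_enabled=True,
    )
    print(json.dumps(updated["settings"]["file_snippet"], indent=2))
```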
Here’s a comprehensive example combining pattern and size-based skipping:
{
"settings": {
"skip": {
"patterns": {
"scanning": [
"# Node.js dependencies",
"node_modules/",
"# Build outputs",
"dist/",
"build/"
],
"fingerprinting": [
"# Logs except important ones",
"*.log",
"!important.log",
"# Temporary files",
"temp/",
"*.tmp",
"# Debug files with numbers",
"debug[0-9]*.txt",
"# All test files in any directory",
"**/*test.js"
]
},
"sizes": {
"scanning": [
{
"patterns": [
"*.log",
"!important.log"
],
"min": 512,
"max": 5242880
}
],
"fingerprinting": [
{
"patterns": [
"temp/",
"*.tmp",
"debug[0-9]*.txt",
"src/client/specific-file.js",
"src/nested/folder/"
],
"min": 512,
"max": 5242880
}
]
}
}
}
}
The bom section defines rules for modifying the BOM before and after scanning. It contains three main operations:
Rules for adding context when scanning. These rules are sent to the SCANOSS API, which means they have a better chance of being considered part of the resulting scan.
{
"bom": {
"include": [
{
"path": "/path/to/file",
"purl": "pkg:npm/vue@2.6.12",
"comment": "Optional comment"
}
]
}
}
Rules for removing files from results after scanning. These rules are applied to the results file after scanning. The post-processing happens on the client side.
{
"bom": {
"remove": [
{
"path": "/path/to/file",
"purl": "pkg:npm/vue@2.6.12",
"comment": "Optional comment"
}
]
}
}
Rules for replacing components after scanning. These rules are applied to the results file after scanning. The post-processing happens on the client side.
{
"bom": {
"replace": [
{
"path": "/path/to/file",
"purl": "pkg:npm/vue@2.6.12",
"replace_with": "pkg:npm/vue@2.6.14",
"license": "MIT",
"comment": "Optional comment"
}
]
}
}
Full Match: Requires both PATH and PURL to match. It means the rule will be applied ONLY to the specific file with the matching PURL and PATH.
Partial Match: Matches based on either:
- File path only (PURL is optional). It means the rule will be applied to all files with the matching path.
- PURL only (PATH is optional). It means the rule will be applied to all files with the matching PURL.
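The full and partial matching behaviour can be sketched as a single predicate. This is an illustration of the rules as described here, not the scanoss-py implementation; in particular, comparing paths by exact equality and passing the file's PURLs as a list are assumptions, and rule_applies is a hypothetical name.

```python
def rule_applies(rule: dict, file_path: str, file_purls: list) -> bool:
    """Decide whether a BOM rule applies to one scanned file.

    - Both path and purl set: both must match (full match).
    - Only one of them set: that one must match (partial match).
    - Neither set: the rule never applies.
    """
    rule_path = rule.get("path")
    rule_purl = rule.get("purl")
    if not rule_path and not rule_purl:
        return False
    # Assumption: a path matches only when it is identical to the rule path
    path_ok = not rule_path or file_path == rule_path
    purl_ok = not rule_purl or rule_purl in file_purls
    return path_ok and purl_ok


if __name__ == "__main__":
    rule = {"path": "src/utils/helper.js", "purl": "pkg:npm/old-lib@1.0.0"}
    print(rule_applies(rule, "src/utils/helper.js", ["pkg:npm/old-lib@1.0.0"]))
    print(rule_applies(rule, "src/other.js", ["pkg:npm/old-lib@1.0.0"]))
```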
- All paths should be specified relative to the scanned directory
- Use forward slashes (/) as path separators
Given the following example directory structure:
project/
├── src/
│   └── component.js
└── lib/
    └── utils.py
- If the scanned directory is /project/src, then:
  - component.js is a valid path
  - lib/utils.py is an invalid path and will not match any files
- If the scanned directory is /project, then:
  - src/component.js is a valid path
  - lib/utils.py is a valid path
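A quick way to derive a valid rule path is to compute it relative to the scanned directory. The sketch below uses pathlib with POSIX-style paths so the result always uses forward slashes; the function name to_rule_path is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def to_rule_path(file_path: str, scan_root: str) -> Optional[str]:
    """Return the path relative to the scan root, or None if it lies outside."""
    try:
        relative = PurePosixPath(file_path).relative_to(PurePosixPath(scan_root))
    except ValueError:
        # The file is not inside the scanned directory, so no rule can match it
        return None
    return relative.as_posix()


if __name__ == "__main__":
    print(to_rule_path("/project/src/component.js", "/project/src"))
    print(to_rule_path("/project/lib/utils.py", "/project/src"))
    print(to_rule_path("/project/lib/utils.py", "/project"))
```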
PURLs must follow the Package URL specification:
- Format: pkg:<type>/<namespace>/<name>@<version>
- Examples:
  - pkg:npm/vue@2.6.12
  - pkg:golang/github.com/golang/go@1.17.3
- Must be valid and include all required components
- Version is strongly recommended but optional
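To show how the format breaks down, here is a minimal parser for the pkg:<type>/<namespace>/<name>@<version> shape. It is a simplified sketch: qualifiers (after ?) and subpaths (after #) are discarded, percent-encoding is not decoded, and the function name parse_purl is hypothetical. For production use, a dedicated Package URL library is likely a better choice.

```python
def parse_purl(purl: str) -> dict:
    """Split a PURL into type, namespace, name and version (simplified)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the optional subpath and qualifiers; this sketch ignores them
    remainder = purl[4:].split("#", 1)[0].split("?", 1)[0]
    ptype, sep, rest = remainder.partition("/")
    if not sep or not ptype or not rest:
        raise ValueError(f"missing type or name: {purl!r}")
    path, at, version = rest.rpartition("@")
    if not at:
        # No version present: the whole remainder is namespace/name
        path, version = rest, ""
    namespace, _, name = path.rpartition("/")
    if not name:
        raise ValueError(f"missing name: {purl!r}")
    return {
        "type": ptype,
        "namespace": namespace or None,
        "name": name,
        "version": version or None,
    }


if __name__ == "__main__":
    print(parse_purl("pkg:npm/vue@2.6.12"))
    print(parse_purl("pkg:golang/github.com/golang/go@1.17.3"))
```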
Here’s a complete example showing all sections:
{
"self": {
"name": "example-project",
"license": "Apache-2.0",
"description": "Example project configuration"
},
"settings": {
"skip": {
"patterns": {
"scanning": [
"node_modules/",
"dist/",
"build/"
],
"fingerprinting": [
"*.log",
"!important.log",
"temp/",
"*.tmp",
"debug[0-9]*.txt",
"**/*test.js"
]
},
"sizes": {
"scanning": [
{
"patterns": [
"*.log",
"!important.log"
],
"min": 512,
"max": 5242880
}
],
"fingerprinting": [
{
"patterns": [
"temp/",
"debug[0-9]*.txt",
"src/client/specific-file.js",
"src/nested/folder/"
],
"min": 512,
"max": 5242880
}
]
}
},
"file_snippet": {
"min_snippet_hits": 3,
"min_snippet_lines": 5,
"ranking_enabled": true,
"ranking_threshold": 5,
"honour_file_exts": true
}
},
"bom": {
"include": [
{
"path": "src/lib/component.js",
"purl": "pkg:npm/lodash@4.17.21",
"comment": "Include lodash dependency"
}
],
"remove": [
{
"purl": "pkg:npm/deprecated-pkg@1.0.0",
"comment": "Remove deprecated package"
}
],
"replace": [
{
"path": "src/utils/helper.js",
"purl": "pkg:npm/old-lib@1.0.0",
"replace_with": "pkg:npm/new-lib@2.0.0",
"license": "MIT",
"comment": "Upgrade to newer version"
}
]
}
}
You can pass the settings file path as an argument to the CLI:
$ scanoss-py scan . --settings /path/to/settings.json
If no settings file is provided, the default settings file will be used.
The default location for the settings file is scanoss.json in the current working directory.
If this file does not exist, settings will be omitted.
You can also skip the default settings file:
$ scanoss-py scan . --skip-settings-file
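The lookup order described above can be summarised in a few lines. This sketch mirrors the documented behaviour only; it is not taken from the scanoss-py source, the name resolve_settings_file is hypothetical, and letting an explicit path take precedence over the skip flag is an assumption.

```python
from pathlib import Path
from typing import Optional

DEFAULT_SETTINGS_NAME = "scanoss.json"


def resolve_settings_file(
    explicit: Optional[str] = None,
    skip_default: bool = False,
    cwd: str = ".",
) -> Optional[Path]:
    """Pick the settings file to use, or None when settings are omitted."""
    if explicit:
        # Equivalent to passing --settings /path/to/settings.json
        return Path(explicit)
    if skip_default:
        # Equivalent to passing --skip-settings-file
        return None
    default = Path(cwd) / DEFAULT_SETTINGS_NAME
    # Fall back to scanoss.json in the working directory, if it exists
    return default if default.is_file() else None


if __name__ == "__main__":
    print(resolve_settings_file())
```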
Commands and arguments
Scanning: scan, sc
Scans a directory or file (source code or .wfp fingerprint file) and prints the results to STDOUT by default. This command is highly customizable: everything from the output format to the matching selection logic (using an SBOM file) can be set to your preference.
scanoss-py scan <file or directory>
| Argument | Description |
|---|---|
| --wfp <wfp file>, -w <wfp file> | Scan a WFP (winnowing fingerprint) file instead of a directory |
| --dep <dependency file>, -p <dependency file> | Use a dependency file instead of a directory |
| --identify <SBOM file>, -i <SBOM file> | Scan and identify components in SBOM file (an API key is required for this feature) |
| --ignore <SBOM file>, -n <SBOM file> | Ignore components specified in the IGNORE SBOM file (an API key is required for this feature) |
| --format <format>, -f <format> | Indicates the result output format: {plain, cyclonedx, spdxlite, csv} (optional - default plain) |
| --flags <FLAGS>, -F <FLAGS> | Sends scanning flags (or definitions) |
| --threads <THREADS>, -T <THREADS> | Number of threads to use while scanning (optional - default 10 - max 30) |
| --skip-snippets, -S | Skip the generation of snippets |
| --post-size <POST_SIZE>, -P <POST_SIZE> | Number of kilobytes to limit the post to while scanning (optional - default 64) |
| --timeout <TIMEOUT>, -M <TIMEOUT> | Timeout (in seconds) for API communication (optional - default 120) |
| --all-folders | Scan all folders |
| --all-extensions | Scan all file extensions |
| --all-hidden | Scan all hidden files/folders |
| --obfuscate | Obfuscate fingerprints |
| --dependencies, -D | Add dependency scanning |
| --dependencies-only | Run dependency scanning only |
| --sc-command <SC_COMMAND> | Scancode command and path if required (optional - default scancode) |
| --sc-timeout <SC_TIMEOUT> | Timeout (in seconds) for Scancode to complete (optional - default 600) |
| --apiurl <API_URL> | SCANOSS API base URL (optional - default https://api.osskb.org) |
| --ignore-cert-errors | Ignore certificate errors |
| --key <KEY>, -k <KEY> | SCANOSS API Key token (optional - not required for default API_URL) |
| --proxy <PROXY> | Proxy URL to use for connections, can also use the environment variable |
| --pac <PAC> | Proxy auto configuration (optional). |
| --ca-cert <CA_CERT> | Alternative certificate PEM file, can also use the environment variables |
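When driving the scan command from another script, it can help to assemble the argument list in one place and validate the documented limits first. The sketch below uses only flags listed in the table above; the function name build_scan_command is hypothetical, and treating out-of-range values as errors is a design choice of this example rather than documented CLI behaviour.

```python
import shlex
from typing import List, Optional

FORMATS = ("plain", "cyclonedx", "spdxlite", "csv")
MAX_THREADS = 30


def build_scan_command(
    target: str,
    output_format: str = "plain",
    threads: int = 10,
    timeout: int = 120,
    api_key: Optional[str] = None,
) -> List[str]:
    """Build the argv list for a scanoss-py scan invocation."""
    if output_format not in FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError(f"threads must be between 1 and {MAX_THREADS}")
    cmd = [
        "scanoss-py", "scan", target,
        "--format", output_format,
        "--threads", str(threads),
        "--timeout", str(timeout),
    ]
    if api_key:
        cmd += ["--key", api_key]
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it
    print(shlex.join(build_scan_command(".", output_format="cyclonedx")))
```

Passing the list (rather than a single string) to subprocess.run avoids shell quoting problems with paths that contain spaces.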
Generate fingerprints: fingerprint, fp, wfp
Calculates hashes for a directory or file and prints them to STDOUT.
scanoss-py fingerprint <file or directory>
| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --obfuscate | Obfuscate fingerprints |
| --skip-snippets, -S | Skip the generation of snippets |
| --all-extensions | Fingerprint all file extensions |
| --all-folders | Fingerprint all folders |
| --all-hidden | Fingerprint all hidden files/folders |
Detect dependencies: dependencies, dp, dep
Scan source code for dependencies, but do not decorate them.
scanoss-py dependencies <>
| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --container <image_name:tag> | Analyze dependencies from a Docker container image instead of a directory |
| --sc-command SC_COMMAND | Scancode command and path if required (optional - default scancode) |
| --sc-timeout SC_TIMEOUT | Timeout (in seconds) for scancode to complete (optional - default 600) |
Note
Remember that in order to enable dependency scanning, an extra tool is required: scancode-toolkit. To install it, run: pip3 install -r requirements-scancode.txt.
File count: file_count, fc
Search the source tree and produce a file type summary.
scanoss-py file_count <directory>
| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --all-hidden | Scan all hidden files/directories |
Format conversion: convert, cv, cnv, cvrt
Convert file format to plain, SPDX-Lite, CycloneDX or csv.
scanoss-py convert -i <input file> --format <example, spdxlite> -o <output file>
| Argument | Description |
|---|---|
| --input <file>, -i <file> | Input file name. |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Indicates the result output format: {plain, cyclonedx, spdxlite, csv}. (optional - default plain) |
Folder Scanning: folder-scan, fs
Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.
scanoss-py folder-scan <directory>
| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Output format: {json, cyclonedx} (optional - default json) |
| --timeout <seconds>, -M <seconds> | Timeout in seconds for API communication (optional - default 600) |
| --rank-threshold <number> | Filter results to only show those with rank value at or below this threshold (e.g., --rank-threshold 3 returns results with rank 1, 2, or 3). Lower rank values indicate higher quality matches. |
| --settings <file>, -st <file> | Settings file to use for scanning (optional - default scanoss.json) |
| --skip-settings-file, -stf | Skip default settings file (scanoss.json) if it exists |
| --key <token>, -k <token> | SCANOSS API Key token (optional - not required for default OSSKB URL) |
| --apiurl <API_URL> | SCANOSS API base URL (optional - default https://api.osskb.org) |
| --proxy <url> | Proxy URL to use for connections |
| --pac <file/url> | Proxy auto configuration. Specify a file, http url or “auto” |
| --ca-cert <file> | Alternative certificate PEM file |
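The effect of --rank-threshold can be pictured as a simple filter that keeps results whose rank is at or below the threshold. The sketch below assumes each result is a dictionary with an integer rank key; that structure, the sample component names, and the function name filter_by_rank are hypothetical and do not describe the actual folder-scan output schema.

```python
from typing import Dict, List


def filter_by_rank(results: List[Dict], threshold: int) -> List[Dict]:
    """Keep results whose rank is at or below the threshold.

    Lower rank values indicate higher quality matches, so a threshold of 3
    keeps ranks 1, 2 and 3. Results without a rank are dropped.
    """
    return [
        item for item in results
        if item.get("rank") is not None and item["rank"] <= threshold
    ]


if __name__ == "__main__":
    sample = [
        {"component": "example-a", "rank": 1},
        {"component": "example-b", "rank": 3},
        {"component": "example-c", "rank": 7},
    ]
    for match in filter_by_rank(sample, 3):
        print(match["component"], match["rank"])
```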
Folder Hashing: folder-hash, fh
Generates cryptographic hashes for files in a given directory and its subdirectories.
scanoss-py folder-hash <directory>
| Argument | Description |
|---|---|
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --format <format>, -f <format> | Output format: {json} (optional - default json) |
| --settings <file>, -st <file> | Settings file to use for scanning (optional - default scanoss.json) |
| --skip-settings-file, -stf | Skip default settings file (scanoss.json) if it exists |

Both commands also support these general options:
- --debug, -d: Enable debug messages
- --trace, -t: Enable trace messages
- --quiet, -q: Enable quiet mode
Container Scanning: container-scan, cs
Scans Docker container images for dependencies, extracting and analyzing components within containerized applications.
scanoss-py container-scan -i <image_name:tag>
| Argument | Description |
|---|---|
| --image <image_name:tag>, -i <image_name:tag> | Docker image name and tag to scan (required) |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT) |
| --include-base-image | Include base image dependencies in the scan results |
| --format <format>, -f <format> | Output format: {json} (optional - default json) |
| --timeout <seconds>, -M <seconds> | Timeout in seconds for API communication (optional - default 600) |
| --key <token>, -k <token> | SCANOSS API Key token (optional - not required for default OSSKB URL) |
| --proxy <url> | Proxy URL to use for connections |
| --ca-cert <file> | Alternative certificate PEM file |
Crypto: crypto, cr
Provides subcommands to retrieve cryptographic information for components.
scanoss-py crypto <subcommand>
Subcommands:
- algorithms (alg)
Retrieve cryptographic algorithms for the given components.
scanoss-py crypto algorithms --purl <purl_string>
| Argument | Description |
|---|---|
| --with-range | Returns the list of versions in the specified range that contain cryptographic algorithms. (Replaces the previous --range option) |
- hints
Retrieve encryption hints for the given components.
scanoss-py crypto hints --purl <purl_string>
| Argument | Description |
|---|---|
| --with-range | Returns the list of versions in the specified range that contain encryption hints. |
- versions-in-range (vr)
Given a list of PURLs and version ranges, get a list of versions that do/don’t contain crypto algorithms.
scanoss-py crypto versions-in-range --purl <purl_string_with_range>
Common Crypto Arguments:
The following arguments are common to the algorithms, hints, and versions-in-range subcommands:
| Argument | Description |
|---|---|
| --purl <PURL>, -p <PURL> | Package URL (PURL) to process. Can be specified multiple times. |
| --input <file>, -i <file> | Input file name containing PURLs. |
| --output <file name>, -o <file name> | Output result file name (optional - default STDOUT). |
| --timeout <seconds>, -M <seconds> | Timeout (in seconds) for API communication (optional - default 600). |
| --key <KEY>, -k <KEY> | SCANOSS API Key token (optional - not required for default OSSKB URL). |
| --ca-cert <CA_CERT> | Alternative certificate PEM file. |
| --debug, -d | Enable debug messages. |
| --trace, -t | Enable trace messages, including API posts. |
| --quiet, -q | Enable quiet mode. |
Component:
To be done
Utilities: utilities, ut
scanoss-py utilities
| Argument | Description |
|---|---|
| fast | SCANOSS fast winnowing (requires the SCANOSS Winnowing Python Package) |
| certloc, cl | Display the location of Python CA certificates |
| cert-download, cdl, cert-dl | Download the specified server’s SSL PEM certificate |
| pac-proxy, pac | Use Proxy Auto-Config to determine proxy configuration |
General Arguments
| Argument | Description |
|---|---|
| --debug, -d | Enable debug messages |
| --trace, -t | Enable trace messages, including API posts |
| --quiet, -q | Enable quiet mode |
Package consumption
This package can be run from the command line, or consumed from another Python script. A good example of how to consume it can be found in this file.
In general the easiest way is to import the required module as follows:
from scanoss.scanner import Scanner


def main():
    # Create a scanner with default options and scan the current directory
    scanner = Scanner()
    scanner.scan_folder('.')


if __name__ == "__main__":
    main()
Alternatively, there is a docker image of the compiled package, which can be found in this repository. Details on how to run it can be found in this file.
Integrations
At SCANOSS we want to provide easy recipes for practical solutions, which is why we are constantly building integrations for our software. You do not need to adapt your existing systems to work with our software; we adapt our software to your needs.
Our integrations range from CI/CD integrations with Jenkins and GitHub Actions to our SonarQube plugin and our most recent VSCode extension. We are always working on making our software as easy to access, consume and integrate as possible.
The full list of existing integrations is shown below:
| Integration | Description |
|---|---|
| Jenkins | Integrate scanoss-py into your pipelines |
| GitHub Actions | Enhance your software development process with the SCANOSS Code Scan Action |
| SonarQube | Scan your code with the SCANOSS plugin for SonarQube |
| VSCode extension | Software Composition Analysis as you code |
Best practices
Choose the tool based on your use case, and not the other way around
SCANOSS offers many tools and software in the field of Software Composition Analysis, and many have similar features.
For example, you can perform scans and generate a software bill of materials (SBOM) with both scanoss-py and the SBOM Workbench, but that doesn’t mean these tools are interchangeable. The SBOM Workbench’s GUI can be an advantage for auditors, but may be a complication for developers who need to integrate an SCA solution into an existing workflow.
There is also the matter of language preference: we also offer a JavaScript package and a Java SDK, so you are free to consume the SCANOSS API however you want.
Get the most accurate results
License
The SCANOSS open source scanoss-py package is released under the MIT license.