LinuxCommandLibrary

phpcpd

Detect copied and pasted PHP code

TLDR

Analyze duplicated code for a specific file or directory

$ phpcpd [path/to/file_or_directory]

Analyze using fuzzy matching for variable names
$ phpcpd --fuzzy [path/to/file_or_directory]

Specify a minimum number of identical lines (defaults to 5)
$ phpcpd --min-lines [number_of_lines] [path/to/file_or_directory]

Specify a minimum number of identical tokens (defaults to 70)
$ phpcpd --min-tokens [number_of_tokens] [path/to/file_or_directory]

Exclude a directory from analysis (must be relative to the source)
$ phpcpd --exclude [path/to/excluded_directory] [path/to/file_or_directory]

Write the results to an XML file in PMD-CPD format
$ phpcpd --log-pmd [path/to/log_file] [path/to/file_or_directory]

SYNOPSIS

phpcpd [options] <directories_or_files...>

PARAMETERS

--min-lines
    Specifies the minimum number of identical lines to consider a block as a copy. The default value is 5.

--min-tokens
    Specifies the minimum number of identical tokens to consider a block as a copy. The default value is 70.

--suffix
    Only process files with the specified suffix(es). Multiple suffixes can be provided comma-separated (e.g., .php,.inc). The default suffix is .php.

--exclude
    Excludes a directory from the scan. This option can be specified multiple times to exclude several directories (e.g., --exclude vendor --exclude cache).

--fuzzy
    Enables fuzzy matching of variable names, so that blocks which differ only in how their variables are named are still reported as duplicates.

--log-xml
    Writes the copy-paste detection results to an XML file in a custom format.

--log-pmd
    Writes the copy-paste detection results to an XML file in PMD-CPD format (the report format of PMD's Copy/Paste Detector), which can be consumed by tools like Jenkins or SonarQube.

--progress
    Displays a progress bar during the scan operation.

--verbose
    Enables verbose output, providing more detailed information during the scan.

--help
    Displays a help message with available options and usage.

--version
    Displays the version information of phpcpd.

<directories_or_files...>
    One or more directories or specific files to be scanned for duplicated code.
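
The options above can be combined in a single invocation. The following is a minimal sketch of a wrapper function, not part of phpcpd itself: the function name run_phpcpd, the PHPCPD_BIN override, and the threshold values 10 and 100 are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: wraps phpcpd with fixed thresholds and one --exclude per
# extra argument. PHPCPD_BIN is a hypothetical override for the
# executable name; phpcpd itself does not read it.

run_phpcpd() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_phpcpd <source_dir> [excluded_dir...]" >&2
        return 2
    fi
    src_dir=$1
    shift

    # Rewrite the remaining arguments in place so that each directory
    # name becomes "--exclude <dir>"; names containing spaces survive.
    remaining=$#
    while [ "$remaining" -gt 0 ]; do
        set -- "$@" --exclude "$1"
        shift
        remaining=$((remaining - 1))
    done

    "${PHPCPD_BIN:-phpcpd}" --min-lines 10 --min-tokens 100 "$@" "$src_dir"
}
```

Calling run_phpcpd src vendor cache would then run phpcpd on src with both thresholds raised and the vendor and cache directories excluded. Excluded directories are given relative to the source directory.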

DESCRIPTION

phpcpd (PHP Copy/Paste Detector) is a command-line tool that scans PHP source code for duplicated blocks, helping developers identify and reduce code duplication. It finds exact duplicates, and with --fuzzy also near-exact ones that differ only in variable names, both across multiple files and within a single file. A block is reported only if it meets both the --min-lines and --min-tokens thresholds.

Duplicated code hurts maintainability: a change or bug fix has to be applied to every copy, and a copy that is missed becomes a bug. phpcpd outputs a report listing the duplicated blocks, their file locations, and the number of lines involved, which gives a concrete starting point for refactoring.

CAVEATS

Very small --min-lines or --min-tokens values tend to produce false positives, because short, common constructs start to match each other. phpcpd also detects syntactic duplication only, not semantic duplication: it will not notice two differently written pieces of code that achieve the same logical outcome.

Scanning very large codebases can be resource-intensive in terms of CPU and memory. Ensure your PHP installation has the tokenizer extension enabled, as it's required for phpcpd to parse PHP code.
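
Whether the tokenizer extension is available can be checked before running a scan. The sketch below relies on php -m, which lists the loaded modules one per line; the function name has_tokenizer and the PHP_BIN override are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: succeeds if the PHP CLI reports the tokenizer extension.
# PHP_BIN is a hypothetical override for the php executable name.

has_tokenizer() {
    # -q: no output, -i: ignore case, -x: match the whole line only.
    "${PHP_BIN:-php}" -m 2>/dev/null | grep -qix 'tokenizer'
}
```

For example, has_tokenizer || echo "tokenizer extension missing" prints a warning when the extension is not loaded or when no php executable is found.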

INTEGRATION WITH CI/CD

phpcpd is commonly integrated into Continuous Integration/Continuous Deployment (CI/CD) pipelines. This ensures that new code duplication is automatically detected and flagged before it gets merged into the main codebase, enforcing a high standard of code quality and maintainability.
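
A pipeline step of this kind can be sketched as a small shell function. This is an illustration under stated assumptions rather than an official recipe: it assumes phpcpd exits with a non-zero status when it finds duplication (worth confirming for the installed version), and the function name cpd_gate, the PHPCPD_BIN override, and the default report path build/phpcpd.xml are hypothetical.

```shell
#!/bin/sh
# Sketch only: fail a CI step when phpcpd reports duplication.
# Assumes a non-zero exit status from phpcpd means "clones found"
# (or that the scan itself failed). PHPCPD_BIN and the default
# report path are hypothetical names for this example.

cpd_gate() {
    src_dir=$1
    report=${2:-build/phpcpd.xml}

    # Make sure the directory for the PMD-CPD report exists.
    mkdir -p "$(dirname "$report")"

    if "${PHPCPD_BIN:-phpcpd}" --log-pmd "$report" "$src_dir"; then
        echo "phpcpd: no duplication reported"
        return 0
    fi

    echo "phpcpd: duplication reported or scan failed, see $report" >&2
    return 1
}
```

A CI job could then call cpd_gate src and let the non-zero return value fail the build, while keeping the XML report as a build artifact.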

REFACTORING AID

The reports generated by phpcpd point at specific duplicated blocks, so they double as a refactoring to-do list. Each reported block is a candidate for being abstracted into a shared function or class, encapsulated, or removed, leading to more modular, readable, and maintainable software.
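
To decide where to start, a report written with --log-pmd can be summarized from the shell. The sketch below assumes the PMD-CPD layout in which every clone is a <duplication lines="..." tokens="..."> element; the function names are hypothetical, and a real XML parser would be more robust than grep and sed.

```shell
#!/bin/sh
# Sketch only: summarize a PMD-CPD XML report with standard text tools.
# Assumes each clone appears as <duplication lines="N" tokens="M">.

# Print the number of <duplication> elements in the report.
count_clones() {
    grep -o '<duplication[ >]' "$1" | wc -l | tr -d ' '
}

# Print the size in lines of every clone, largest first.
list_clone_sizes() {
    grep -o '<duplication[^>]*lines="[0-9]*"' "$1" |
        sed 's/.*lines="\([0-9]*\)"/\1/' |
        sort -rn
}
```

Running list_clone_sizes on a report and reading it from the top gives the largest duplicated blocks first, which are often the most rewarding to refactor.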

HISTORY

phpcpd is one of the PHP ecosystem's static analysis tools, developed and maintained by Sebastian Bergmann, who is also the creator of PHPUnit. It emerged as a specialized tool within the broader set of PHP quality assurance utilities, providing a dedicated solution for identifying code duplication. The upstream repository has since been archived, so the tool no longer receives updates.

SEE ALSO

phpcs(1), phpmd(1), diff(1), grep(1)
