LinuxCommandLibrary

csv2tsv

Convert CSV files to TSV files

TLDR

Convert from CSV to TSV

$ csv2tsv [path/to/input_csv1 path/to/input_csv2 ...] > [path/to/output_tsv]

Convert CSV that uses a custom field delimiter to TSV
$ csv2tsv -c'[field_delimiter]' [path/to/input_csv]

Convert semicolon separated CSV to TSV
$ csv2tsv -c';' [path/to/input_csv]

SYNOPSIS

csv2tsv [OPTIONS] [FILE...]

PARAMETERS

--help, -h
    Displays usage information for the command.

--version, -V
    Outputs the version of the csv2tsv utility.

-c [field_delimiter]
    Sets the field delimiter expected in the CSV input (for example, -c';' for semicolon-separated files). The default is a comma.

DESCRIPTION

The csv2tsv command converts data from Comma Separated Values (CSV) format to Tab Separated Values (TSV) format. CSV is a common plain-text format in which fields are delimited by commas, while TSV uses tab characters as delimiters. The conversion matters for interoperability, because many applications and tools accept only one of the two formats.

csv2tsv typically reads CSV data from standard input or from the specified files and writes the corresponding TSV data to standard output. A key feature is its handling of quoted fields. In CSV, a field that contains a comma or a newline is usually enclosed in double quotes (e.g., "field, with, comma"). The command parses such fields so that embedded commas are not treated as delimiters and the enclosing quotes are removed in the output. Likewise, a double quote inside a quoted field is usually written as two consecutive double quotes (e.g., "field with ""quote"" here"), which csv2tsv should interpret and emit as a single double-quote character in the TSV output.
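The quoting rules described above can be illustrated with Python's built-in csv module. This is a minimal sketch of the parsing behavior, not the csv2tsv implementation; the helper name csv_line_to_tsv is hypothetical, and the sketch does not deal with tabs or newlines embedded in a field.

```python
import csv

def csv_line_to_tsv(line: str) -> str:
    """Parse one CSV record and join its fields with tabs."""
    # csv.reader removes the enclosing quotes and turns "" into a single ".
    fields = next(csv.reader([line]))
    return "\t".join(fields)

# Embedded commas stay inside their field; the enclosing quotes are dropped.
print(csv_line_to_tsv('id,"field, with, comma",x'))
# A doubled quote inside a quoted field becomes one double-quote character.
print(csv_line_to_tsv('id,"field with ""quote"" here",x'))
```

Both records come out with three tab-separated fields, which is the behavior the description attributes to csv2tsv.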

This utility is particularly useful in data processing pipelines, shell scripting, and scenarios where data needs to be reshaped for database imports, spreadsheet programs, or analytical tools that have specific delimiter requirements.

CAVEATS

The behavior of csv2tsv can vary significantly between implementations. Basic versions might not handle malformed CSV (e.g., unbalanced quotes, missing delimiters) gracefully, which can lead to incorrect output or errors. CSV variations such as different quoting styles, custom escape characters, or multi-line records that are not properly quoted can also cause problems. Simple csv2tsv scripts typically do not handle encoding differences (e.g., UTF-8 vs. Latin-1) either, so characters can be corrupted unless the input is already in the expected encoding. For advanced CSV manipulation or robust error handling, more sophisticated tools such as Miller (mlr) or dedicated CSV parsing libraries in scripting languages are often preferred.
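Two of these caveats, malformed quoting and input encoding, can be handled explicitly in a script. The sketch below uses Python's csv module, assuming the caller knows the input encoding; the function name strict_csv_to_tsv is hypothetical, and nothing here reflects how any particular csv2tsv behaves.

```python
import csv
import io

def strict_csv_to_tsv(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode CSV bytes with a stated encoding, parse strictly, return TSV text."""
    # Decoding explicitly avoids silently misreading e.g. Latin-1 bytes as UTF-8.
    text = raw.decode(encoding)
    # strict=True makes the reader raise csv.Error on bad input,
    # such as a quoted field that is never closed.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    return "".join("\t".join(fields) + "\n" for fields in reader)

# Latin-1 input is decoded correctly when the encoding is stated.
print(strict_csv_to_tsv(b"caf\xe9,1\n", encoding="latin-1"), end="")

# An unbalanced quote is reported instead of producing questionable output.
try:
    strict_csv_to_tsv(b'a,"b\n')
except csv.Error as err:
    print("malformed CSV:", err)
```

Failing loudly on malformed input is a design choice: in a pipeline it is usually easier to fix the source file than to track down rows that were silently misparsed.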

COMMON IMPLEMENTATIONS

Many versions of csv2tsv are custom scripts. Common implementations use awk to parse quoted fields and delimiters, or Python with its built-in csv module for more robust parsing. Such scripts offer a lightweight way to convert data without a larger data processing framework.
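A script of the Python kind mentioned above might look like the following sketch. The function name csv_to_tsv and its parameters are hypothetical. Replacing tabs and line breaks inside a field with a space is an assumption made here because TSV has no quoting mechanism; other implementations may choose differently.

```python
import csv
import io
import re

def csv_to_tsv(src, dst, delimiter=",", replacement=" "):
    """Stream CSV records from src to dst, writing one TSV line per record."""
    for fields in csv.reader(src, delimiter=delimiter):
        # Tabs and line breaks inside a field would break the TSV layout,
        # so they are replaced (assumption: a single space by default).
        cleaned = [re.sub(r"\r\n|[\t\r\n]", replacement, f) for f in fields]
        dst.write("\t".join(cleaned) + "\n")

# Semicolon-separated input with an embedded tab and an embedded newline.
source = io.StringIO('a;"b\tc";"d\ne"\n', newline="")
output = io.StringIO()
csv_to_tsv(source, output, delimiter=";")
print(repr(output.getvalue()))
```

Calling csv_to_tsv(sys.stdin, sys.stdout, delimiter=";") after importing sys would roughly correspond to the semicolon example in the TLDR section, under the same assumptions.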

HISTORY

The name csv2tsv does not typically refer to a single, universally distributed program. It more often denotes a common task, carried out by a specific script or small utility within a given Linux environment or data processing toolkit. Both CSV and TSV are widely used as simple, human-readable data interchange formats, and the need for quick, reliable conversion between them led to numerous ad-hoc scripts (often using awk, sed, or Python) as well as dedicated utilities. Although csv2tsv is not part of the core GNU utilities, its functionality is frequently included in larger data analysis suites or distributed as standalone scripts for common data wrangling tasks.

SEE ALSO

awk(1), sed(1), cut(1), tr(1), paste(1), csvtool(1), mlr(1)
