in2csv
Convert various file formats to CSV
TLDR
Convert an XLS file to CSV
Convert a DBF file to a CSV file
Convert a specific sheet from an XLSX file to CSV
Pipe a JSON file to in2csv
SYNOPSIS
in2csv [options] [INPUT_FILE]
Examples:
in2csv data.json > data.csv
in2csv --format xlsx report.xlsx > report.csv
cat api_response.json | in2csv --format json --key items > items.csv
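As a minimal end-to-end sketch of the synopsis, the following creates a small JSON file and converts it through standard input, where `--format` has to be given explicitly. The file name `people.json` and its contents are made up for illustration, and csvkit is assumed to be installed.

```sh
# Create a small sample input (hypothetical data).
cat > people.json <<'EOF'
[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
EOF

# Reading from stdin gives in2csv no file extension to inspect,
# so the input format is stated explicitly.
cat people.json | in2csv --format json > people.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

# Show the result: a header row followed by one row per JSON object.
cat people.csv
```

When the input is a named file, the same conversion can be written as `in2csv people.json > people.csv`, since the format is then inferred from the extension.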
PARAMETERS
--format FORMAT, -f FORMAT
The format of the input data. Required when reading from standard input; otherwise inferred from the file extension. Supported values: csv, dbf, fixed, geojson, json, ndjson, xls, xlsx.
--sheet SHEET
The name of the Excel sheet to convert. Only one sheet is converted per run. Use `--names` to list the available sheets, and `--write-sheets` to write several sheets to separate files.
--names, -n
Print the names of all sheets in an Excel file and exit.
--no-header-row, -H
Treat the input as having no header row. in2csv then generates default column names (a, b, c, ...) for the output.
--skip-lines N, -K N
Skip the first N lines of the input before the header row. Useful for files that start with comments, copyright notices, or blank lines.
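A short sketch of `--skip-lines` on a CSV export that starts with two comment lines. The file names and data are hypothetical, and csvkit is assumed to be installed.

```sh
# Sample export with two comment lines above the header (hypothetical data).
cat > export.csv <<'EOF'
# exported by a reporting tool
# do not edit
id,city
1,Paris
2,Lima
EOF

# Skip the two leading lines so that "id,city" is read as the header row.
in2csv --skip-lines 2 export.csv > clean.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

cat clean.csv
```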
--encoding ENCODING, -e ENCODING
Specify the encoding of the input file (e.g., utf-8, latin1).
--no-inference, -I
Disable automatic type inference. Values are written as text, as they appear in the input, instead of being cast to numbers, dates, or booleans.
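The effect of `--no-inference` is easiest to see on identifiers that look like numbers. The sketch below uses a made-up `zips.csv` and assumes csvkit is installed. Whether the default run drops the leading zero depends on the csvkit version and its options, so only the `--no-inference` result is relied on here.

```sh
# ZIP codes are identifiers, not quantities (hypothetical data).
cat > zips.csv <<'EOF'
zip,city
02134,Boston
10001,New York
EOF

# Default behaviour: column types are inferred, so "02134" may be
# treated as a number and lose its leading zero.
in2csv zips.csv > zips_inferred.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

# With --no-inference every value is passed through as text.
in2csv --no-inference zips.csv > zips_text.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

cat zips_text.csv
```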
--snifflimit N, -y N
Limit CSV dialect sniffing to the first N bytes of input (default: 1024). Use 0 to disable sniffing entirely and -1 to sniff the whole file.
--zero
Use zero-based column numbering instead of the default one-based numbering when interpreting or displaying column numbers.
--locale LOCALE, -L LOCALE
Specify the locale of formatted numbers in the input (e.g., en_US, fr_FR).
--key KEY, -k KEY
For JSON input, the top-level key that holds the list of objects to convert.
--schema SCHEMA, -s SCHEMA
For fixed-width input, a CSV-formatted schema file describing each column (name, start position, length).
--write-sheets SHEETS
For Excel input, the names of the sheets to write to separate CSV files, or "-" to write all sheets.
--use-sheet-names
With `--write-sheets`, name the output files after the sheets.
--encoding-xls ENCODING
Specify the encoding of an XLS input file.
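JSON APIs often wrap the list of records in an envelope object. in2csv's `--key` option (short form `-k`) names the top-level key that holds the list. The sketch below uses a made-up `response.json` and assumes csvkit is installed.

```sh
# An API-style response: the records sit under "results" (hypothetical data).
cat > response.json <<'EOF'
{"count": 2, "results": [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lima"}]}
EOF

# Convert only the list stored under the "results" key.
in2csv --key results response.json > results.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

cat results.csv
```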
DESCRIPTION
in2csv is a command-line utility from the csvkit suite that converts tabular data from other formats into Comma Separated Values (CSV). Supported inputs are CSV, JSON, newline-delimited JSON, GeoJSON, Excel spreadsheets (.xls, .xlsx), DBF, and fixed-width files. It is aimed at data analysts, developers, and anyone who needs to bring data from disparate sources into one format for further processing. in2csv infers column types where possible, and `--no-inference` turns this off. Because the output is plain CSV, it can be filtered, joined, and analyzed with the other csvkit tools. Input comes from a file or from standard input, so in2csv fits into scripts and shell pipelines.
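A small sketch of in2csv feeding another csvkit tool, here `csvcut`, which selects columns by name. The file names and data are hypothetical, and csvkit is assumed to be installed.

```sh
# Sample records (hypothetical data).
cat > staff.json <<'EOF'
[{"name": "Ada", "team": "core", "age": 36}, {"name": "Grace", "team": "tools", "age": 45}]
EOF

# Convert to CSV on stdout, then keep only the name and team columns.
in2csv staff.json | csvcut -c name,team > staff_teams.csv \
  || echo "pipeline failed; is csvkit installed?" >&2

cat staff_teams.csv
```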
CAVEATS
in2csv relies on third-party libraries for some input formats (e.g., xlrd for XLS, openpyxl for XLSX, dbfread for DBF). A missing library causes errors for the corresponding format.
Type inference is not always right, particularly with ambiguous data such as identifiers with leading zeros. In such cases, use `--no-inference` or post-process the output with other csvkit tools.
Processing very large JSON files can be memory-intensive.
in2csv does not read Google Sheets, HTML, or XML directly. Export or convert such data to a supported format first.
SUPPORTED INPUT FORMATS
in2csv reads the following input formats:
- CSV: Comma Separated Values (native support).
- JSON: JavaScript Object Notation, as a list of objects (use `--key` when the list sits under a top-level key).
- NDJSON: Newline-delimited JSON, one object per line.
- GeoJSON: Geographic features encoded as JSON.
- XLS/XLSX: Microsoft Excel spreadsheets (older and newer formats).
- DBF: dBase database files.
- Fixed-Width: Data with fixed-length fields (requires a schema file passed with `--schema`).
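Fixed-width input is the one format that needs a separate schema file, passed with `--schema`. As a minimal sketch, the schema is itself a CSV file with the columns `column`, `start`, and `length`. The data file, the field layout, and the file names below are hypothetical, and csvkit is assumed to be installed.

```sh
# Fixed-width data: name fills characters 0-5, age fills characters 6-7.
cat > people.txt <<'EOF'
Ada   36
Grace 45
EOF

# Schema: one row per field with its name, start offset, and length.
# A first start value of 0 means offsets are zero-based.
cat > people_schema.csv <<'EOF'
column,start,length
name,0,6
age,6,2
EOF

# The .txt extension does not identify the format, so it is given explicitly.
in2csv --format fixed --schema people_schema.csv people.txt > people_fixed.csv \
  || echo "in2csv failed; is csvkit installed?" >&2

cat people_fixed.csv
```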
GOOGLE SHEETS
in2csv has no Google Sheets input format and no authentication options. To convert a Google Sheet, first download it from Google Sheets as an .xlsx or .csv file, then run in2csv on the downloaded file. For .xlsx downloads, `--names` lists the sheets and `--sheet` selects one.
HISTORY
in2csv is part of csvkit, an open-source project started by Christopher Groskopf in 2011. The project set out to provide robust command-line tools for working with CSV data, as a more scriptable and scalable alternative to spreadsheet software. Within the suite, in2csv converts other tabular formats into CSV so that the rest of the csvkit toolchain, and Unix text-processing utilities in general, can work with the data. Its development has focused on supporting additional input formats and improving type inference.