mongoimport

Import data into MongoDB

TLDR

Import a JSON file into a specific collection

$ mongoimport --file [path/to/file.json] --uri [mongodb_uri] [[-c|--collection]] [collection_name]

Import a CSV file, using the first line of the file to determine field names

$ mongoimport --type [csv] --headerline --file [path/to/file.csv] [[-d|--db]] [database_name] [[-c|--collection]] [collection_name]

Import a JSON array, using each element as a separate document

$ mongoimport --jsonArray --file [path/to/file.json]

Import a JSON file using a specific mode and a query to match existing documents

$ mongoimport --file [path/to/file.json] --mode [delete|merge|upsert] --upsertFields "[field1,field2,...]"

Import a CSV file, reading field names from a separate file (one field name per line) and ignoring fields with empty values

$ mongoimport --type [csv] --file [path/to/file.csv] --fieldFile [path/to/field_file.csv] --ignoreBlanks

Display help

$ mongoimport --help

SYNOPSIS

mongoimport --file filename --db database --collection collection [options]
mongoimport --uri connection-string --file filename [options]
cat filename | mongoimport --db database --collection collection [options]
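
The piped form above can be sketched as follows. This is a minimal illustration, not a recommended setup: the sample documents, the inventory collection, and the IMPORT_URI variable (assumed to hold a connection string that includes the database name) are all made up for the example, and the import only runs when IMPORT_URI is set and mongoimport is on the PATH.

```shell
# A small JSON array held in a shell variable (illustrative sample data).
docs='[{"sku": "a1", "qty": 5}, {"sku": "b2", "qty": 0}]'

# IMPORT_URI is a hypothetical variable; the import is skipped when it is unset.
if [ -n "${IMPORT_URI:-}" ] && command -v mongoimport >/dev/null 2>&1; then
  # No --file is given, so mongoimport reads the array from standard input.
  printf '%s\n' "$docs" |
    mongoimport --uri "$IMPORT_URI" --collection inventory --jsonArray
fi
```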

PARAMETERS

--help
    Displays help information about the command.

--version
    Displays the mongoimport version number.

--host
    Specifies the MongoDB server hostname to connect to.

--port
    Specifies the port number to connect to the MongoDB server.

--username
    Specifies the username for authentication.

--password
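    Specifies the password for authentication. A password given on the command line is visible in process listings; pass --username without --password to be prompted for it instead.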

--authenticationDatabase
    Specifies the database where the user is defined for authentication.

--db
    Specifies the database to import data into.

--collection
    Specifies the collection to import data into.

--file
    Specifies the path to the input data file. If omitted, mongoimport reads from standard input.

--type
    Specifies the input file format: json, csv, or tsv. Defaults to json.

--headerline
    For CSV or TSV files, uses the first line as field names rather than importing it as data.

--drop
    Drops the collection before importing data, removing all existing documents.

--upsert
    Deprecated; equivalent to --mode upsert. Replaces an existing document that matches the imported one on the fields given by --upsertFields (by default _id); otherwise, inserts a new document.

--upsertFields
    Specifies a comma-separated list of fields used to match existing documents when --mode is upsert, merge, or delete. Defaults to _id.

--jsonArray
    Accepts a single JSON array as the input file, rather than one JSON object per line.

--ignoreBlanks
    For CSV/TSV input, ignores any blank fields in the input.

--uri
    Specifies a full MongoDB connection URI string to connect to the database.

--mode
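    Defines how imported documents are applied when they match existing documents (matched on --upsertFields, by default _id). Applies to all input types. Defaults to insert.
    insert: Inserts the documents; a document whose _id already exists is logged as a duplicate key error and skipped.
    upsert: Replaces a matching document with the imported one, or inserts it if there is no match.
    merge: Merges the imported fields into a matching document, or inserts the document if there is no match.
    delete: Deletes matching documents; input documents without a match are ignored.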

--stopOnError
    Stops the import operation if an error occurs during processing.

--numInsertionWorkers
    Specifies the number of insertion workers to run concurrently (default 1). More workers can improve throughput for large imports.
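
Several of the options above are commonly combined for CSV input. The following is a minimal sketch under stated assumptions: the CSV contents, the people collection, and the IMPORT_URI variable (assumed to hold a connection string that includes the database name) are invented for illustration, and the import is skipped unless IMPORT_URI is set and mongoimport is installed.

```shell
# Sample CSV: the header row names the fields; Alan's "city" value is blank.
workdir=$(mktemp -d)
printf '%s\n' 'name,city,age' 'Ada,London,36' 'Alan,,41' > "$workdir/people.csv"

# IMPORT_URI is a hypothetical variable; the import is skipped when it is unset.
if [ -n "${IMPORT_URI:-}" ] && command -v mongoimport >/dev/null 2>&1; then
  # --headerline takes field names from the first row; --ignoreBlanks leaves
  # the empty "city" field out of Alan's document.
  mongoimport --uri "$IMPORT_URI" --collection people \
    --type csv --headerline --ignoreBlanks \
    --numInsertionWorkers 2 --file "$workdir/people.csv"
fi
```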

DESCRIPTION

mongoimport is a command-line utility for importing data from various file formats into a MongoDB collection. It supports JSON, CSV, and TSV (Tab-Separated Values) formats. This tool is part of the MongoDB Database Tools suite and is essential for populating databases with existing datasets, migrating data, or integrating with other systems.

It can create new collections or append to existing ones, and offers options for specifying database credentials, handling duplicates through upsert operations, and controlling import behavior, such as dropping the target collection first or treating the first line of a CSV/TSV file as field names. mongoimport is widely used for initial data loading and batch processing operations.

CAVEATS

Performance for Large Data: For extremely large datasets, mongoimport can be slower than direct data loading or custom scripts. While --numInsertionWorkers helps, consider bulk operations or client-side drivers for maximum throughput.
Idempotency and Data Loss: Using --drop permanently deletes all existing data in the target collection. Without --mode upsert or --mode merge, re-importing documents that carry no _id creates duplicates, while documents whose _id already exists fail with duplicate key errors.
Schema Flexibility vs. Strictness: MongoDB is schema-less, but mongoimport can still infer data types. Be mindful of potential parsing issues, especially with mixed-type columns in CSV/TSV, which might require pre-processing or explicit type handling.
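Security: Providing passwords directly on the command line is insecure as they can be visible in process lists. Use a configuration file (via --config) or the interactive password prompt instead. mongoimport does not read credentials from environment variables itself, and a variable expanded by the shell still ends up on the command line.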
Resource Usage: Large imports can consume significant system resources (CPU, memory, disk I/O) on both the client machine and the MongoDB server.
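
The idempotency caveat can be illustrated with a repeatable import. This is a sketch only: the sample documents, the users collection, the choice of email as the match field, and the IMPORT_URI variable (assumed to hold a connection string that includes the database name) are assumptions made for the example.

```shell
# Two sample documents without _id, one JSON object per line.
workdir=$(mktemp -d)
cat > "$workdir/users.json" <<'EOF'
{"email": "ada@example.com", "name": "Ada"}
{"email": "alan@example.com", "name": "Alan"}
EOF

# IMPORT_URI is a hypothetical variable; the import is skipped when it is unset.
if [ -n "${IMPORT_URI:-}" ] && command -v mongoimport >/dev/null 2>&1; then
  # Running the same import twice: matching on "email" means the second run
  # replaces the two documents rather than adding two more.
  for run in 1 2; do
    mongoimport --uri "$IMPORT_URI" --collection users \
      --mode upsert --upsertFields email --file "$workdir/users.json"
  done
fi
```

With the default insert mode, the same loop would leave four documents in the collection, because the input carries no _id to collide on.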

INPUT DATA FORMATS

mongoimport treats its input as JSON unless --type says otherwise; it does not detect the format from the file. For CSV and TSV, field names come from --headerline, --fields, or --fieldFile, and values are type-inferred, which can be problematic for columns with mixed data types. For more control, use --columnsHaveTypes, which lets each field name given through --fields, --fieldFile, or --headerline carry an explicit type.
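
As a rough sketch of explicit typing, the following writes a small headerless CSV and declares each column's type with --fields plus --columnsHaveTypes. The file contents, the people collection, and the IMPORT_URI variable (assumed to hold a connection string that includes the database name) are illustrative assumptions; the import only runs if IMPORT_URI is set and mongoimport is on the PATH.

```shell
# A headerless CSV with a string, an integer, and a date column (made-up data).
workdir=$(mktemp -d)
printf '%s\n' 'Ada,36,1815-12-10' 'Alan,41,1912-06-23' > "$workdir/people.csv"

# Each field is written as name.type(); date() takes a Go-style reference layout.
fields='name.string(),age.int32(),born.date(2006-01-02)'

# IMPORT_URI is a hypothetical variable; the import is skipped when it is unset.
if [ -n "${IMPORT_URI:-}" ] && command -v mongoimport >/dev/null 2>&1; then
  mongoimport --uri "$IMPORT_URI" --collection people \
    --type csv --file "$workdir/people.csv" \
    --fields "$fields" --columnsHaveTypes
fi
```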

ERROR HANDLING

By default, mongoimport continues processing even after encountering non-critical errors (e.g., duplicate key errors). The --stopOnError option forces it to halt the import process immediately upon the first error, which can be useful for debugging or strict data integrity requirements.
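
The difference between the two behaviours can be tried with an input that contains a deliberate duplicate. This is a minimal sketch: the sample documents, the items collection, and the IMPORT_URI variable (assumed to hold a connection string that includes the database name) are assumptions for illustration. Note that --drop empties the items collection before each run.

```shell
# The second line reuses _id 1, so inserting it raises a duplicate key error.
workdir=$(mktemp -d)
cat > "$workdir/items.json" <<'EOF'
{"_id": 1, "name": "first"}
{"_id": 1, "name": "duplicate of first"}
{"_id": 2, "name": "second"}
EOF

# IMPORT_URI is a hypothetical variable; the import is skipped when it is unset.
if [ -n "${IMPORT_URI:-}" ] && command -v mongoimport >/dev/null 2>&1; then
  # Default: the duplicate is reported and the import carries on.
  mongoimport --uri "$IMPORT_URI" --collection items --drop \
    --file "$workdir/items.json"
  # With --stopOnError the import halts at the first failed insert.
  mongoimport --uri "$IMPORT_URI" --collection items --drop --stopOnError \
    --file "$workdir/items.json" || echo "mongoimport exited with an error"
fi
```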

SECURITY BEST PRACTICES

To avoid exposing sensitive credentials on the command line, use a --config file containing the password or connection URI, or pass --username without --password so that mongoimport prompts for the password. Environment variables such as MONGODB_URI are not read by mongoimport itself; expanding them in the shell places the value back on the command line.
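
A config file approach might look like the sketch below, assuming a Database Tools release that supports --config (100.3.0 or later). The user name, password placeholder, host, database, people collection, and the RUN_IMPORT switch are all hypothetical values chosen for the example.

```shell
workdir=$(mktemp -d)
config="$workdir/mongoimport.yaml"

# Write the YAML config with owner-only permissions; the URI is a placeholder.
( umask 077
  printf 'uri: %s\n' 'mongodb://appuser:CHANGE_ME@localhost:27017/test' > "$config" )

# A one-document input file so the example is self-contained.
printf '%s\n' '{"name": "Ada"}' > "$workdir/people.json"

# RUN_IMPORT is a hypothetical opt-in switch; nothing is imported by default.
if [ "${RUN_IMPORT:-0}" = 1 ] && command -v mongoimport >/dev/null 2>&1; then
  # The connection string is read from the file, so it never appears in argv.
  mongoimport --config "$config" --collection people --file "$workdir/people.json"
fi
```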

HISTORY

mongoimport is part of the MongoDB Database Tools suite, which was historically bundled with the MongoDB server distribution. Starting with MongoDB 4.4, the Database Tools (including mongoimport, mongoexport, mongodump, mongorestore, etc.) are released separately from the server and versioned independently, beginning with version 100.0.0. Its core functionality for importing structured data has remained consistent since its inception.