LinuxCommandLibrary

ogrmerge.py

Merge multiple vector datasets into one

TLDR

Create a GeoPackage with a layer for each input Shapefile

$ ogrmerge.py -f [GPKG] -o [path/to/output.gpkg] [path/to/input1.shp path/to/input2.shp ...]

Create a virtual datasource (VRT) with a layer for each input GeoJSON
$ ogrmerge.py -f [VRT] -o [path/to/output.vrt] [path/to/input1.geojson path/to/input2.geojson ...]

Concatenate two vector datasets and store source name of dataset in attribute 'source_name'
$ ogrmerge.py -single -f [GeoJSON] -o [path/to/output.geojson] -src_layer_field_name [source_name] [path/to/input1.shp path/to/input2.shp ...]

SYNOPSIS

ogrmerge.py -o output_file [-f format] [-single] [-nln layer_name] [-update | -overwrite_ds] [-append | -overwrite_layer] [-src_geom_type geom_type_name[,geom_type_name...]] [-dsco NAME=VALUE]... [-lco NAME=VALUE]... [-s_srs srs_def] [-t_srs srs_def | -a_srs srs_def] [-progress] [-skipfailures] [-field_strategy {FirstLayer|Union|Intersection}] [-src_layer_field_name name] [-src_layer_field_content template] src_datasource...

PARAMETERS

-single
    Merges all layers in the source dataset(s) into a single layer in the output file. The default behavior is to create a new layer in the output file for each layer in the source datasets.

-f format
    Specifies the output format as a GDAL driver short name, for example 'ESRI Shapefile', 'GeoJSON', 'GPKG' (GeoPackage) or 'PostgreSQL'. If not specified, the format is guessed from the output file extension.

-o output_file
    Specifies the name of the output dataset. This option is required.

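-src_layer_field_name name
    Only used with -single. Adds a field with this name to the output layer that records which source each feature came from.

-src_layer_field_content template
    Only used with -single. Template for the content of that source field, using the same variables as -nln (default {AUTO_NAME}). If -src_layer_field_name is not given, the field is named 'source_ds_lyr'.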

-overwrite_ds
    Overwrites the output dataset if it already exists. Without this option or -update, the script fails if the output dataset exists.

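-nln layer_name
    With -single, the name of the merged output layer (default 'merged'). Otherwise, a template for naming each output layer, built from the variables {DS_NAME}, {DS_BASENAME}, {DS_INDEX}, {LAYER_NAME} and {LAYER_INDEX}; the default is {AUTO_NAME}.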

-s_srs srs_def
    Specifies the source SRS (Spatial Reference System). Overrides the SRS information found in the source file(s).

-t_srs srs_def
    Specifies the target SRS. Reprojects the data to the specified SRS during the merge process.

-a_srs srs_def
    Assigns an SRS to the output layer(s) without reprojecting the coordinates. Mutually exclusive with -t_srs.

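-src_geom_type geom_type_name[,geom_type_name...]
    Only processes input layers whose geometry type matches one of the listed types, such as POINT, LINESTRING or POLYGON.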

-skipfailures
    Continue operation even if some source layers fail to process.

-dsco NAME=VALUE
    Dataset creation option (format specific). Can be repeated.

-lco NAME=VALUE
    Layer creation option (format specific). Can be repeated.

-update
    Open existing output datasource in update mode rather than trying to create a new one.

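-append
    Opens an existing output dataset in update mode and appends features to output layers that already exist, rather than creating new ones.

-overwrite_layer
    Opens an existing output dataset in update mode and replaces the content of output layers that already exist.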

-field_strategy {FirstLayer|Union|Intersection}
    Only used with -single. Strategy for building the output layer's fields from the input layers. Defaults to Union.

-progress
    Display a progress bar on the terminal.

src_datasource...
    One or more source datasources (files, directories, databases). Wildcards are supported.
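
Several of the options above combine naturally in small wrapper functions. This sketch assumes a POSIX shell with ogrmerge.py on the PATH; the function names, the layer name 'merged' and the target SRS EPSG:4326 are illustrative choices, not defaults of the tool. The first function uses -nln as a layer name template (which it accepts without -single), prefixing each output layer with the index of its source dataset so identically named layers from different inputs do not collide.

```shell
#!/bin/sh
# Hypothetical wrappers around ogrmerge.py; names and values are illustrative.

# One output layer per input layer, named "<source index>_<layer name>"
# through the {DS_INDEX} and {LAYER_NAME} template variables of -nln.
merge_per_source() {
  out=$1; shift
  ogrmerge.py -f GPKG -o "$out" -nln '{DS_INDEX}_{LAYER_NAME}' "$@"
}

# Merge all inputs into a single layer, reprojecting them to WGS 84.
merge_reprojected() {
  out=$1; shift
  ogrmerge.py -single -f GPKG -o "$out" -nln merged -t_srs EPSG:4326 "$@"
}

# Add features from further inputs to the "merged" layer of an existing
# dataset; -append opens it in update mode instead of creating it.
append_to_merged() {
  out=$1; shift
  ogrmerge.py -single -append -o "$out" -nln merged "$@"
}
```

For example: merge_reprojected path/to/output.gpkg path/to/input1.shp path/to/input2.shp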

DESCRIPTION

ogrmerge.py is a Python script that merges multiple vector datasets (OGR data sources) into a single output dataset. It is particularly useful when geospatial data spread across several files, formats, or layers must be combined for analysis or distribution. Along the way it can convert between formats, reproject the data, and reconcile differing attribute schemas through the -field_strategy option. It is usually bundled with the GDAL/OGR package.

It uses the OGR library for data access, so it supports every vector format for which the local GDAL build has a driver, including Shapefile, GeoJSON, GeoPackage, and PostGIS. Sources can be given as wildcard patterns or whole directories, and the output can be reprojected to a target CRS with -t_srs.
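
A sketch of the two ways to pass many files at once; the directory and file names are placeholders and the helper names are hypothetical. Quoting a wildcard pattern hands it to ogrmerge.py unexpanded, relying on the script's own wildcard support noted under src_datasource. Passing a directory relies on the OGR Shapefile driver opening a folder of shapefiles as one datasource with one layer per file.

```shell
#!/bin/sh
# Hypothetical helpers; $1 is the output file, $2 an input directory.

# Quoted pattern: the shell passes "dir/*.shp" literally and
# ogrmerge.py expands it itself.
merge_tiles_glob() {
  ogrmerge.py -single -f GPKG -o "$1" -nln tiles "$2/*.shp"
}

# Directory as a single source datasource (one layer per shapefile).
merge_tiles_dir() {
  ogrmerge.py -single -f GPKG -o "$1" -nln tiles "$2"
}
```

Both variants produce one layer named 'tiles'; the quoted-pattern form avoids handing a very long file list to the shell.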

CAVEATS

ogrmerge.py relies on the OGR library, so the supported file formats depend on the GDAL/OGR installation and its enabled drivers. The script may not handle very large datasets efficiently, especially when reprojecting. Data can be lost if the target format does not support all the attributes present in the source data; for example, Shapefile truncates field names to 10 characters.
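
Because the available formats depend on the local GDAL build, it can be worth checking for a driver before running a merge. This sketch uses ogrinfo --formats, which lists the drivers GDAL was built with. The helper name has_ogr_driver is hypothetical, and the grep pattern assumes each driver appears on its own line as its short name followed by " -" and its type, as in "  GPKG -raster,vector- (rw+vs): GeoPackage".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named OGR driver is available.
# Assumes `ogrinfo --formats` prints one "  NAME -type- (flags): description"
# line per driver.
has_ogr_driver() {
  ogrinfo --formats | grep -q "^ *$1 -"
}

# Example: only merge when the GeoPackage driver is present.
# has_ogr_driver GPKG && ogrmerge.py -f GPKG -o out.gpkg in1.shp in2.shp
```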

FIELD STRATEGY CONSIDERATIONS

The `-field_strategy` option, which is only used with `-single`, determines how the fields of the output layer are built from the input layers:

FirstLayer
    Use the fields of the first layer found.

Union
    Use all fields from all source layers. This is the default.

Intersection
    Use only the fields common to all source layers.
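
For inputs whose attribute tables differ, the strategy is passed together with -single. A minimal sketch keeping only the fields shared by every input; the helper name and the layer name 'merged' are hypothetical, and the file names in the test are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: merge all inputs into one layer that keeps only
# the fields common to every input (-field_strategy requires -single).
merge_common_fields() {
  out=$1; shift
  ogrmerge.py -single -field_strategy Intersection -f GPKG -o "$out" -nln merged "$@"
}
```

Switching Intersection to Union keeps every field instead, at the cost of columns that are empty for some inputs.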

HISTORY

ogrmerge.py was added to GDAL/OGR in version 2.2 to provide a convenient way to combine multiple vector datasets into a single file. Since then it has gained features and format support and handles more of the edge cases encountered when merging geospatial data, driven by the need to simplify data integration in geospatial workflows.

SEE ALSO
