Spatial Join#

The spatial_join module provides functions for performing spatial joins and aggregations on geospatial data.

class libadalina_core.spatial_operators.AggregationFunction(column: str, aggregation_type: AggregationType, alias: str | None = None, proportional: str | None = None)[source]#

Dataclass representing an aggregation function to be applied to a column in a DataFrame.

aggregation_type: AggregationType#

The function to use for aggregation.

alias: str | None = None#

Optional alias for the aggregated column.

column: str#

The name of the column to aggregate.

proportional: str | None = None#

Optional column to use for proportional aggregation. If specified, the aggregation will be weighted by the area of the intersection with this column’s geometry.
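To illustrate what proportional (area-weighted) aggregation means, here is a hand-rolled sketch using plain shapely; this is not the library's implementation, just the weighting idea it describes:

```python
from shapely.geometry import box

# A feature's value contributes in proportion to how much of its
# geometry falls inside the aggregation zone.
zone = box(0, 0, 1, 1)     # aggregation zone, area 1
feature = box(0, 0, 2, 1)  # source feature, area 2
value = 10.0               # value to aggregate (e.g. population)

# Half the feature lies inside the zone, so half the value counts.
weight = zone.intersection(feature).area / feature.area  # 0.5
weighted_value = value * weight
print(weighted_value)  # 5.0
```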

class libadalina_core.spatial_operators.AggregationType(value)[source]#

Functions that can be used to aggregate data in a DataFrame.

AVG = 'avg'#

Calculate the average of the values in a column for each group.

COUNT = 'count'#

Count the number of rows in a group.

MAX = 'max'#

Find the maximum value in a column for each group.

MIN = 'min'#

Find the minimum value in a column for each group.

SUM = 'sum'#

Sum the values in a column for each group.

class libadalina_core.spatial_operators.JoinType(value)[source]#

Enumerate the types of joins that can be performed on two DataFrames.

FULL = 'full'#

Full join returns all records from both tables, matching where possible.

INNER = 'inner'#

Inner join returns only the matching records from both tables.

LEFT = 'left'#

Left join returns all records from the left table and the matching records from the right table.

RIGHT = 'right'#

Right join returns all records from the right table and the matching records from the left table.

libadalina_core.spatial_operators.area(df: DataFrame | GeoDataFrame | DataFrame) DataFrame[source]#

Calculate the area of the geometries in column ‘geometry’ of the DataFrame.

Parameters:

df (pandas.DataFrame or geopandas.GeoDataFrame or pyspark.sql.DataFrame)

Returns:

A Spark DataFrame with an additional column ‘area’ containing the area of each geometry.

Return type:

pyspark.sql.DataFrame
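The ‘area’ column holds the planar area of each geometry in the units of the coordinate reference system. For a single geometry this corresponds to shapely's area property:

```python
from shapely import wkt

# Planar area of a 2x2 square; units depend on the geometries' CRS.
geom = wkt.loads('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')
print(geom.area)  # 4.0
```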

libadalina_core.spatial_operators.bounding_box(df: DataFrame | GeoDataFrame | DataFrame) Polygon[source]#

Calculate the bounding box of the geometries in a DataFrame.

Parameters:

df (DataFrame) – Either a pandas DataFrame, a GeoPandas GeoDataFrame, or a Spark DataFrame containing geometries.

Returns:

A Polygon representing the bounding box of the geometries in the DataFrame.

Return type:

shapely.geometry.Polygon
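The bounding box is the smallest axis-aligned rectangle covering every geometry in the DataFrame. A minimal sketch of that computation with plain shapely:

```python
from shapely.geometry import Point, box

# Union of the individual extents gives the overall bounding box.
points = [Point(0, 0), Point(2, 1), Point(1, 3)]
minx = min(p.bounds[0] for p in points)
miny = min(p.bounds[1] for p in points)
maxx = max(p.bounds[2] for p in points)
maxy = max(p.bounds[3] for p in points)
bbox = box(minx, miny, maxx, maxy)
print(bbox.bounds)  # (0.0, 0.0, 2.0, 3.0)
```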

libadalina_core.spatial_operators.cut_features(df: DataFrame | GeoDataFrame | DataFrame, cut_geometry: Polygon) DataFrame[source]#

Clip the features of a DataFrame to a given cut geometry.

This function will cut the geometries in the DataFrame by the provided cut geometry, returning only the parts of the geometries that intersect with the cut geometry.

Parameters:
  • df (DataFrame) – Either a pandas DataFrame, a GeoPandas GeoDataFrame, or a Spark DataFrame containing geometries.

  • cut_geometry (Polygon) – The geometry to use for cutting the features.

Returns:

A Spark DataFrame with only the geometries that intersect with the cut geometry.

Return type:

pyspark.sql.DataFrame
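Per geometry, the clipping step amounts to a shapely intersection: only the part of each feature that lies inside the cut geometry survives. A standalone sketch:

```python
from shapely.geometry import LineString, box

# A line of length 3 clipped by the unit square keeps only the
# segment inside the square.
cut_geometry = box(0, 0, 1, 1)
feature = LineString([(0, 0.5), (3, 0.5)])
clipped = feature.intersection(cut_geometry)
print(clipped.length)  # 1.0
```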

libadalina_core.spatial_operators.explode_multi_geometry(df: DataFrame | GeoDataFrame | DataFrame) DataFrame[source]#

Explode multi-geometry features into individual geometries, so that each element of a multi-geometry is represented as a separate row in the DataFrame.

Parameters:

df (DataFrame) – Either a pandas DataFrame, a GeoPandas GeoDataFrame, or a Spark DataFrame containing geometries.

Returns:

The input DataFrame with an additional column that contains the exploded geometries.

Return type:

pyspark.sql.DataFrame
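The row-per-element behaviour mirrors iterating over a shapely multi-geometry's members:

```python
from shapely.geometry import MultiPoint

# Each element of the multi-geometry becomes its own entry.
mp = MultiPoint([(0, 0), (1, 1), (2, 2)])
parts = list(mp.geoms)
print(len(parts))  # 3
```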

libadalina_core.spatial_operators.perimeter(df: DataFrame | GeoDataFrame | DataFrame) DataFrame[source]#

Calculate the perimeter of the geometries in column ‘geometry’ of the DataFrame.

Parameters:

df (pandas.DataFrame or geopandas.GeoDataFrame or pyspark.sql.DataFrame)

Returns:

A Spark DataFrame with an additional column ‘perimeter’ containing the perimeter of each geometry.

Return type:

pyspark.sql.DataFrame
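For a single polygon the perimeter corresponds to shapely's length property (the total boundary length, in CRS units):

```python
from shapely import wkt

# Perimeter of a unit square.
geom = wkt.loads('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
print(geom.length)  # 4.0
```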

libadalina_core.spatial_operators.polygonize(df: DataFrame | GeoDataFrame | DataFrame, radius_meters: float) DataFrame[source]#

Transform lines and points into polygons by buffering them with a given radius.

Each line (or multi-line) is transformed into a polygon by buffering it on both sides, while points are buffered to create a circular area around them.

Geometries are implicitly converted to DEFAULT_EPSG.

Parameters:
  • df (DataFrame) – The input DataFrame containing geometries.

  • radius_meters (float) – The radius in meters to use for buffering points and lines.

Returns:

A Spark DataFrame with a new column ‘polygonized_geometry’ containing the buffered geometries.

Return type:

pyspark.sql.DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'geometry': ['POINT(1 1)', 'LINESTRING(0 0, 1 1, 2 2)']})
>>> polygonized_df = polygonize(df, radius_meters=10)
libadalina_core.spatial_operators.spatial_aggregation(table: DataFrame | GeoDataFrame | DataFrame, aggregate_functions: list[AggregationFunction], group_by_column: str = 'geometry') DataFrame[source]#

Perform spatial aggregation on a DataFrame based on the specified aggregation functions. Rows are grouped by the geometry column (or by the column given in group_by_column), and each aggregation function is applied to its target column within each group.

Parameters:
  • table (DataFrame) – Either a pandas DataFrame, a GeoPandas GeoDataFrame, or a Spark DataFrame containing geometries.

  • aggregate_functions (list[AggregationFunction]) – List of aggregation functions to apply to the DataFrame grouped columns.

  • group_by_column (str, optional) – The name of the column to group by. Default is ‘geometry’.

Returns:

A DataFrame with aggregated results based on the specified aggregation functions. All columns that are not specified in the aggregation functions will be aggregated using the first value found in each group.

Return type:

pyspark.sql.DataFrame
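The grouping behaviour, including the fallback to the first value for columns not named in any aggregation function, can be sketched with a plain pandas groupby; the column names here are made up for illustration:

```python
import pandas as pd

# Two rows share the same geometry key and are collapsed into one group.
df = pd.DataFrame({
    'geometry': ['POLYGON A', 'POLYGON A', 'POLYGON B'],
    'population': [100, 200, 50],
    'region': ['north', 'north', 'south'],
})
result = df.groupby('geometry', as_index=False).agg(
    population=('population', 'sum'),  # like AggregationType.SUM
    region=('region', 'first'),        # unspecified columns keep the first value
)
print(result)
```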

libadalina_core.spatial_operators.spatial_join(left_table: DataFrame | GeoDataFrame | DataFrame, right_table: DataFrame | GeoDataFrame | DataFrame, join_type: JoinType = JoinType.INNER) DataFrame[source]#

Perform a spatial join between two DataFrames based on the intersection of their geometries.

Parameters:
  • left_table (DataFrame) – DataFrame containing the left table of the join

  • right_table (DataFrame) – DataFrame containing the right table of the join

  • join_type (JoinType) – Type of the join to perform

Returns:

A Spark DataFrame containing the result of the spatial join.

Return type:

pyspark.sql.DataFrame
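The join predicate is geometric intersection. An inner join can be sketched as keeping the pairs of rows whose geometries intersect, shown here with plain shapely rather than the library's Spark implementation:

```python
from shapely.geometry import box

# Toy "tables": (id, geometry) pairs.
left = [('a', box(0, 0, 2, 2)), ('b', box(5, 5, 6, 6))]
right = [('x', box(1, 1, 3, 3)), ('y', box(10, 10, 11, 11))]

# INNER join: keep only the pairs whose geometries intersect.
matches = [
    (lid, rid)
    for lid, lgeom in left
    for rid, rgeom in right
    if lgeom.intersects(rgeom)
]
print(matches)  # [('a', 'x')]
```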