Reference

TulipaClustering.ProfilesTableLayout - Type
ProfilesTableLayout(;key = value, ...)
ProfilesTableLayout(path; ...)

Structure that holds the layout of the profiles input data table. Every column name in the layout has a default value.

If path is passed, it is expected to be a string pointing to a TOML file with a key = value list of parameters. Explicit keyword arguments take precedence.

Parameters

  • value::Symbol = :value: The column name with the profile values.
  • timestep::Symbol = :timestep: The column name with the time steps in the profile.
  • period::Symbol = :period: The column name with the period number in the profile.
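
As a rough illustration of the precedence rule, the sketch below parses a small TOML snippet with Julia's TOML standard library and layers explicit overrides on top of it. The file contents, the string-valued column names, and the `defaults`, `explicit`, and `resolved` dictionaries are assumptions made for illustration; the constructor's actual parsing is not shown in this reference.

```julia
using TOML

# Hypothetical contents of a layout file: key = value pairs named after the parameters above
toml_text = """
value = "val"
timestep = "ts"
"""

defaults = Dict("value" => "value", "timestep" => "timestep", "period" => "period")
from_file = TOML.parse(toml_text)            # Dict{String, Any} read from the TOML text
explicit = Dict("timestep" => "time_step")   # stands in for explicit keyword arguments

# Later arguments to `merge` win: explicit values override the file, which overrides defaults
resolved = merge(defaults, from_file, explicit)
```

Here `resolved` maps `value` to `"val"` (from the file), `timestep` to `"time_step"` (explicit override), and `period` to `"period"` (default).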

TulipaClustering.append_period_from_source_df_as_rp! - Method

append_period_from_source_df_as_rp!(df; source_df, period, rp, key_columns, layout = ProfilesTableLayout())

Extracts a period with index period from source_df and appends it as a representative period with index rp to df, using key_columns as keys. Respects custom column names via layout.

Examples

Default layout:

julia> source_df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "b", :value => 5:8])
4×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  b           5
   2 │      1          2  b           6
   3 │      2          1  b           7
   4 │      2          2  b           8

julia> df = DataFrame([:rep_period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Int64
─────┼──────────────────────────────────────
   1 │          1          1  a           1
   2 │          1          2  a           2
   3 │          2          1  a           3
   4 │          2          2  a           4

julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df, period = 2, rp = 3, key_columns = [:timestep, :a])
6×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Int64
─────┼──────────────────────────────────────
   1 │          1          1  a           1
   2 │          1          2  a           2
   3 │          2          1  a           3
   4 │          2          2  a           4
   5 │          3          1  b           7
   6 │          3          2  b           8

Custom layout:

julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> src = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :a .=> "b", :val => 5:8])
julia> df = DataFrame([:rep_period => [1,1], :ts => [1,2], :a .=> "a", :val => [1,2]])
julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df = src, period = 2, rp = 3, key_columns = [:ts, :a], layout)
4×4 DataFrame
 Row │ rep_period  ts     a       val
     │ Int64       Int64  String  Int64
─────┼──────────────────────────────────
   1 │          1      1  a           1
   2 │          1      2  a           2
   3 │          3      1  b           7
   4 │          3      2  b           8

TulipaClustering.cluster! - Method
cluster!(
    connection,
    period_duration,
    num_rps;
    input_database_schema = "",
    input_profile_table_name = "profiles",
    database_schema = "",
    drop_incomplete_last_period::Bool = false,
    method::Symbol = :k_means,
    distance::SemiMetric = SqEuclidean(),
    initial_representatives::AbstractDataFrame = DataFrame(),
    layout::ProfilesTableLayout = ProfilesTableLayout(),
    weight_type::Symbol = :convex,
    tol::Float64 = 1e-2,
    clustering_kwargs = Dict(),
    weight_fitting_kwargs = Dict(),
)

Convenience function to cluster the table named in input_profile_table_name using period_duration and num_rps. The resulting tables profiles_rep_periods, rep_periods_mapping, and rep_periods_data are loaded into connection in the database_schema, if given, and enriched with year information.

This function extracts the table (expecting columns profile_name, timestep, value), then calls split_into_periods!, find_representative_periods, fit_rep_period_weights!, and finally write_clustering_result_to_tables.

Arguments

Required

  • connection: DuckDB connection
  • period_duration: Duration of each period, i.e., number of timesteps.
  • num_rps: Number of representative periods to find (see find_representative_periods)

Keyword arguments

  • input_database_schema (default ""): Schema of the input tables
  • input_profile_table_name (default "profiles"): Name of the profiles table inside the above schema
  • database_schema (default ""): Schema of the output tables
  • drop_incomplete_last_period (default false): controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for num_rps - 1 periods, and the last period is added as a special shorter representative period
  • method (default :k_means): clustering method to use, e.g., :k_means or :k_medoids; find_representative_periods lists all supported methods
  • distance (default Distances.SqEuclidean()): semimetric used to measure distance between data points.
  • initial_representatives (default DataFrame()): initial representatives that should be included in the clustering. The period column in the initial representatives should be 1-indexed and the key columns should be the same as in the clustering data. For the hull methods they are added before clustering; for :k_means and :k_medoids they are added after clustering.
  • layout (default ProfilesTableLayout()): describes the column names for period, timestep, and value in in-memory DataFrames. It does not change the SQL input table schema, which must contain profile_name, timestep, and value. Weight fitting operates on matrices and does not use layout.
  • weight_type (default :convex): the type of weights to find; possible values are:
    • :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
    • :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
    • :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
  • tol (default 1e-2): algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
  • clustering_kwargs (default Dict()): Extra keyword arguments passed to find_representative_periods
  • weight_fitting_kwargs (default Dict()): Extra keyword arguments passed to fit_rep_period_weights! (e.g., niters, learning_rate, adaptive_grad).

TulipaClustering.combine_periods! - Method
combine_periods!(df; layout = ProfilesTableLayout())

Combine per-period time steps into a single global timestep column in-place.

Given a long-format dataframe df with (at least) a per-period timestep column and, optionally, a period column (names provided by layout), this function rewrites the timestep column so that time becomes a single global, monotonically increasing index across all periods, then removes the original period column.

Period length inference:

  • The (nominal) period duration L is inferred as the maximum value found in the per-period time-step column across the whole dataframe (NOT per period).
  • Each row's global timestep is computed as (period - 1) * L + timestep.
  • If the final period is shorter than L, the resulting global time index will simply end earlier; missing intermediate global timesteps are not created.
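
The inference rule and the formula above can be traced in plain Julia, without the package. This is a minimal sketch of the arithmetic only; the vector names are illustrative and it does not touch a dataframe.

```julia
# Per-period data: period 2 is shorter than period 1
period = [1, 1, 2]
timestep = [1, 2, 1]

# Nominal period length: maximum per-period timestep over the whole table
L = maximum(timestep)

# Global timestep of each row: (period - 1) * L + timestep
global_timestep = @. (period - 1) * L + timestep
```

With these inputs `L` is 2 and `global_timestep` is `[1, 2, 3]`, which matches the default-layout example for this function.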

Arguments:

  • df::AbstractDataFrame (mutated): Source data in long format.
  • layout::ProfilesTableLayout: Describes the column names for period and timestep (defaults to standard names). Pass a custom layout if your dataframe uses different symbols.

Behavior & edge cases:

  • If the timestep column (as specified by layout) is missing, a DomainError is thrown.
  • If the period column is absent, the function is a no-op (returns immediately).
  • Non-1-based or non-consecutive per-period timesteps are not validated; unusual values may result in non-contiguous or non-strictly increasing global indices.
  • Works in-place; the modified dataframe (without period) is also returned for convenience.

Examples

Basic usage with default layout:

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      2          1      3

julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
 Row │ timestep  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      1
   2 │         2      2
   3 │         3      3

Custom column names via a layout:

julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts)
julia> df = DataFrame([:p => [1,1,2], :ts => [1,2,1], :value => 10:12])
3×3 DataFrame
 Row │ p      ts   value
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1     1     10
   2 │     1     2     11
   3 │     2     1     12

julia> TulipaClustering.combine_periods!(df; layout)
3×2 DataFrame
 Row │ ts    value
     │ Int64  Int64
─────┼──────────────
   1 │    1     10
   2 │    2     11
   3 │    3     12

No period column (no-op):

julia> df = DataFrame([:timestep => 1:3, :value => 4:6])
julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
 Row │ timestep  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      4
   2 │         2      5
   3 │         3      6

TulipaClustering.df_to_matrix_and_keys - Method

df_to_matrix_and_keys(df, key_columns; layout = ProfilesTableLayout())

Converts a long-format dataframe df to a matrix, using the value/period columns from layout. Columns listed in key_columns are kept as keys.

Returns (matrix::Matrix{Float64}, keys::DataFrame).
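
The shape of the returned matrix can be pictured with a plain-Julia sketch: one row per key combination and one column per period. The code below assumes the values are already sorted by period and then by key; it only illustrates the layout and is not the package's implementation.

```julia
# Values in long format, sorted by period and then by key (two keys per period)
vals = [1, 2, 3, 4]
n_keys = 2

# Each column holds one period; each row holds one key combination
m = reshape(Float64.(vals), n_keys, :)

# Flattening column by column recovers the long-format ordering
flat = vec(m)
```

Here `m` equals `[1.0 3.0; 2.0 4.0]`, the same matrix as in the examples of this function.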

Examples

Default layout:

julia> df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  a           1
   2 │      1          2  a           2
   3 │      2          1  a           3
   4 │      2          2  a           4

julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:timestep, :a]); m
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia> k
2×2 DataFrame
 Row │ timestep  a
     │ Int64      String
─────┼───────────────────
   1 │         1  a
   2 │         2  a

Custom layout:

julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :a .=> "a", :val => 1:4])
julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:ts, :a]; layout); m
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia> k
2×2 DataFrame
 Row │ ts    a
     │ Int64  String
─────┼────────────────
   1 │    1  a
   2 │    2  a

TulipaClustering.dummy_cluster! - Method
dummy_cluster!(connection)

Convenience function to create the necessary columns and tables when clustering is not required.

This essentially creates a single representative period that spans the whole profile. See cluster! for more details on what is created.

TulipaClustering.find_auxiliary_data - Method
find_auxiliary_data(clustering_data; layout = ProfilesTableLayout())

Calculates auxiliary data associated with clustering_data, considering custom column names via layout.

Returns AuxiliaryClusteringData with:

  • key_columns: key columns in the dataframe
  • period_duration: nominal duration of periods (max timestep across data)
  • last_period_duration: duration of the last period
  • n_periods: total number of periods
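
One way to read these fields is sketched below in plain Julia, using the period and timestep columns as vectors. How the package actually computes them is an assumption here; the sketch simply reproduces the numbers of the custom-layout example for this function.

```julia
period   = [1, 1, 2, 2]
timestep = [1, 2, 1, 1]

# Nominal duration: maximum timestep across all data
period_duration = maximum(timestep)

# Total number of periods (periods are assumed to be numbered 1, 2, ...)
n_periods = maximum(period)

# Duration of the last period: maximum timestep among its rows
last_period_duration = maximum(timestep[period .== n_periods])
```

This gives `period_duration = 2`, `last_period_duration = 1`, and `n_periods = 2`.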

Example

julia> df = DataFrame([:period => [1,1,2,2], :timestep => [1,2,1,2], :a => "x", :value => 10:13])
julia> aux = TulipaClustering.find_auxiliary_data(df)
AuxiliaryClusteringData([:timestep, :a], 2, 2, 2, nothing)

julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df2 = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,1], :a => "x", :val => 10:13])
julia> TulipaClustering.find_auxiliary_data(df2; layout)
AuxiliaryClusteringData([:ts, :a], 2, 1, 2, nothing)

TulipaClustering.find_period_weights - Method
find_period_weights(period_duration, last_period_duration, n_periods, drop_incomplete_periods)

Finds weights of two different types of periods in the clustering data:

  • complete periods: these are all of the periods with length equal to period_duration.
  • incomplete last period: if last period duration is less than period_duration, it is incomplete.

TulipaClustering.find_representative_periods - Method

find_representative_periods(
    clustering_data,
    n_rp;
    drop_incomplete_last_period = false,
    method = :k_means,
    distance = SqEuclidean(),
    initial_representatives = DataFrame(),
    layout = ProfilesTableLayout(),
    kwargs...,
)

Finds representative periods via data clustering. Honors custom column names via layout (defaults to (:period, :timestep, :value)).

Arguments

  • clustering_data: long-format data to cluster.
  • n_rp: number of representative periods to find.
  • drop_incomplete_last_period: controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for n_rp - 1 periods, and the last period is added as a special shorter representative period.
  • method: clustering method to use: :k_means, :k_medoids, :convex_hull, :convex_hull_with_null, or :conical_hull.
  • distance: semimetric used to measure distance between data points.
  • initial_representatives: dataframe of initial RPs. It must use the same key columns and follow the same layout as clustering_data. For hull methods the RPs are prepended before clustering; for :k_means/:k_medoids they are appended after clustering.
  • layout: ProfilesTableLayout describing the column names.
  • other named arguments are forwarded to the clustering method.

Returns

Returns a ClusteringResult with:

  • profiles::DataFrame: Long-format representative profiles with columns :rep_period, layout.timestep, all key columns (auxiliary_data.key_columns), and layout.value.
  • weight_matrix::SparseMatrixCSC{Float64,Int} (or dense Matrix{Float64}): rows correspond to source periods and columns to representative periods; entry (p, r) is the weight of period p assigned to representative r. If the last period is incomplete and drop_incomplete_last_period is false, it maps to its own representative column with its specific weight; if dropped, it is excluded from the rows.
  • clustering_matrix::Matrix{Float64}: The feature-by-period matrix used for clustering (features are derived from layout.timestep crossed with key columns).
  • rp_matrix::Matrix{Float64}: The representative profiles in matrix form (same feature layout as clustering_matrix).
  • auxiliary_data::AuxiliaryClusteringData: Auxiliary metadata such as key_columns, period_duration, last_period_duration, n_periods, and (for applicable methods) medoids indices.

Examples

Finding two representatives using default values:

julia> df = DataFrame(
           period = kron(1:4, ones(Int, 2)),
           timestep = repeat(1:2, 4),
           profile = "A",
           value = 1:8,
         )

julia> res = TulipaClustering.find_representative_periods(df, 2)

Finding two representatives using k-medoids and a custom layout:

julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts, value = :val)

julia> df = DataFrame(
           p = kron(1:4, ones(Int, 2)),
           ts = repeat(1:2, 4),
           profile = "A",
           val = 1:8,
         )

julia> res = TulipaClustering.find_representative_periods(df, 2; method = :k_medoids, layout)

TulipaClustering.fit_rep_period_weights! - Method

fit_rep_period_weights!(clustering_result; weight_type, tol, args...)

Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.

The arguments:

  • clustering_result: the result of running TulipaClustering.find_representative_periods
  • weight_type: the type of weights to find; possible values are:
    • :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
    • :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
    • :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
  • tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
  • other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.

TulipaClustering.fit_rep_period_weights! - Method

fit_rep_period_weights!(weight_matrix, clustering_matrix, rp_matrix; weight_type, tol, args...)

Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.

The arguments:

  • weight_matrix: the initial guess for weights; the weights are adjusted using a projected subgradient descent method
  • clustering_matrix: the matrix of raw clustering data
  • rp_matrix: the matrix of raw representative period data
  • weight_type: the type of weights to find; possible values are:
    • :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
    • :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
    • :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
  • tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
  • show_progress: if true, a progress bar will be displayed.
  • other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.
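
The three weight types differ only in the constraints placed on a period's weight vector. The plain-Julia predicates below are a sketch of those constraints; the function names and the tolerance `atol` are hypothetical and not part of the package. The last lines assume that each column of `rp_matrix` is one representative period.

```julia
# Nonnegative weights
is_conical(w) = all(>=(0), w)

# Nonnegative weights whose total does not exceed one
is_conical_bounded(w; atol = 1e-8) = is_conical(w) && sum(w) <= 1 + atol

# Nonnegative weights that add up to one
is_convex(w; atol = 1e-8) = is_conical(w) && isapprox(sum(w), 1; atol = atol)

# A period approximated as a weighted sum of two representative periods
rp_matrix = [1.0 3.0; 2.0 4.0]
w = [0.25, 0.75]
approximation = rp_matrix * w
```

For example, `[0.25, 0.75]` satisfies all three predicates, `[0.2, 0.3]` is conical and bounded but not convex, and `[2.0, 1.0]` is only conical.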

TulipaClustering.greedy_convex_hull - Method
greedy_convex_hull(matrix; n_points, distance, initial_indices, mean_vector)

Greedy method for finding n_points points in a hull of the dataset. Points are added iteratively: at each step, the point furthest from the hull of the current set of points is found and added to the hull.

  • matrix: the clustering matrix
  • n_points: number of hull points to find
  • distance: distance semimetric
  • initial_indices: initial points which must be added to the hull, can be nothing
  • mean_vector: when adding the first point (if initial_indices is not given), it is chosen as the point furthest from mean_vector; this can be nothing, in which case the first step adds the point furthest from the centroid (the mean) of the dataset
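
The first greedy step (no initial_indices and no mean_vector) can be sketched in plain Julia. The sketch assumes that each column of the matrix is one data point and uses the squared Euclidean distance in place of the distance argument; later steps, which measure the distance to the current hull, are not covered.

```julia
using Statistics

# Three data points stored as columns
matrix = [0.0 1.0 5.0;
          0.0 1.0 5.0]

# Centroid (mean) of the dataset
centroid = vec(mean(matrix; dims = 2))

# Squared Euclidean distance of every point to the centroid
sq_dists = [sum(abs2, col .- centroid) for col in eachcol(matrix)]

# The first hull point is the one furthest from the centroid
first_index = argmax(sq_dists)
```

Here the centroid is `[2.0, 2.0]` and the third column is selected.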

TulipaClustering.matrix_and_keys_to_df - Method
matrix_and_keys_to_df(matrix, keys; layout = ProfilesTableLayout())

Converts a matrix matrix to a long-format dataframe with columns (:rep_period, layout.timestep, keys..., layout.value).

Examples

Default layout:

julia> m = [1.0 3.0; 2.0 4.0]
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia> k = DataFrame([:timestep => 1:2, :a .=> "a"])
2×2 DataFrame
 Row │ timestep  a
     │ Int64      String
─────┼───────────────────
   1 │         1  a
   2 │         2  a

julia> TulipaClustering.matrix_and_keys_to_df(m, k)
4×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Float64
─────┼────────────────────────────────────────
   1 │          1          1  a           1.0
   2 │          1          2  a           2.0
   3 │          2          1  a           3.0
   4 │          2          2  a           4.0

Custom layout:

julia> layout = ProfilesTableLayout(; timestep=:ts, value=:val)
julia> k = DataFrame([:ts => 1:2, :a .=> "a"])
julia> TulipaClustering.matrix_and_keys_to_df(m, k; layout)
4×4 DataFrame
 Row │ rep_period  ts     a       val
     │ Int64       Int64  String  Float64
─────┼────────────────────────────────────
   1 │          1      1  a           1.0
   2 │          1      2  a           2.0
   3 │          2      1  a           3.0
   4 │          2      2  a           4.0

TulipaClustering.project_onto_standard_basis - Method

project_onto_standard_basis(vector)

Projects vector onto the standard basis. This projection is trivial: replace all components of the vector with zeros, except for the largest one, which is replaced with one.
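
The projection described above takes only a few lines of plain Julia. The function name below is hypothetical, and breaking ties by taking the first largest component is an assumption of this sketch.

```julia
# Replace every component with zero, except the largest one, which becomes one
function project_onto_standard_basis_sketch(v::AbstractVector{<:Real})
    projected = zeros(Float64, length(v))
    projected[argmax(v)] = 1.0   # argmax returns the first index of the maximum
    return projected
end

p = project_onto_standard_basis_sketch([0.2, 0.7, 0.1])
```

For the input `[0.2, 0.7, 0.1]` the result is `[0.0, 1.0, 0.0]`.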

TulipaClustering.projected_subgradient_descent! - Method

projected_subgradient_descent!(x; gradient, projection, niters, rtol, learning_rate, adaptive_grad)

Fits x using the projected subgradient descent scheme.
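
The general scheme alternates a (sub)gradient step with a projection back onto the feasible set. The sketch below is a generic, non-mutating version written in plain Julia; it borrows the keyword names gradient, projection, niters, rtol, and learning_rate from the signature above, but the default values, the stopping rule, and the omission of adaptive_grad are assumptions, so it should not be read as the package's implementation.

```julia
using LinearAlgebra

function projected_descent_sketch(x0; gradient, projection, niters = 200, rtol = 1e-6, learning_rate = 0.1)
    x = projection(float.(x0))
    for _ in 1:niters
        # Step against the (sub)gradient, then project back onto the feasible set
        x_new = projection(x .- learning_rate .* gradient(x))
        # Assumed stopping rule: relative change of the iterate below rtol
        converged = norm(x_new .- x) <= rtol * max(norm(x), 1.0)
        x = x_new
        converged && break
    end
    return x
end

# Example: minimize ||A * w - b||^2 subject to w >= 0 (conical weights)
A = [1.0 0.0; 0.0 1.0]
b = [0.5, -1.0]
grad(w) = 2 .* (A' * (A * w .- b))
nonnegative(w) = max.(w, 0.0)

w = projected_descent_sketch([1.0, 1.0]; gradient = grad, projection = nonnegative)
```

The constrained minimizer of this small problem is `[0.5, 0.0]`, and the sketch converges to approximately that point.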

The arguments:

TulipaClustering.split_into_periods! - Method
split_into_periods!(df; period_duration=nothing, layout=ProfilesTableLayout())

Modifies a dataframe df by separating the time column into periods of length period_duration, respecting custom column names provided by layout.

The new data is written into two columns defined by the layout:

  • layout.period: the period ID
  • layout.timestep: the time step within the current period

If period_duration is nothing, then all time steps are in a single period (ID 1).
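
The arithmetic behind the split can be sketched with `fldmod1` from Base, which performs 1-based floor division and remainder. This illustrates the mapping from a global time step to a (period, timestep) pair; whether the package uses this exact call is an assumption.

```julia
period_duration = 2
global_timestep = 1:5

# fldmod1(t, d) returns (period ID, time step within the period), both 1-based
parts = [fldmod1(t, period_duration) for t in global_timestep]
period = first.(parts)
timestep = last.(parts)
```

With five time steps and a duration of 2, `period` is `[1, 1, 2, 2, 3]` and `timestep` is `[1, 2, 1, 2, 1]`, so the last period is incomplete.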

Examples

julia> df = DataFrame([:timestep => 1:4, :value => 5:8])
4×2 DataFrame
 Row │ timestep  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      5
   2 │         2      6
   3 │         3      7
   4 │         4      8

julia> TulipaClustering.split_into_periods!(df; period_duration=2)
4×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      5
   2 │      1          2      6
   3 │      2          1      7
   4 │      2          2      8

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      2          1      3

julia> TulipaClustering.split_into_periods!(df; period_duration=1)
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      2          1      2
   3 │      3          1      3

julia> TulipaClustering.split_into_periods!(df)
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      1          3      3

Custom column names via a layout:

julia> layout = ProfilesTableLayout(; timestep = :time_step, period = :periods)
julia> df = DataFrame([:time_step => 1:4, :value => 5:8])
4×2 DataFrame
 Row │ time_step  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      5
   2 │         2      6
   3 │         3      7
   4 │         4      8

julia> TulipaClustering.split_into_periods!(df; period_duration=2, layout)
4×3 DataFrame
 Row │ periods  time_step  value
     │ Int64    Int64      Int64
─────┼───────────────────────────
   1 │       1          1      5
   2 │       1          2      6
   3 │       2          1      7
   4 │       2          2      8

TulipaClustering.transform_wide_to_long! - Method
transform_wide_to_long!(
    connection,
    wide_table_name,
    long_table_name;
    exclude_columns = ["year", "timestep"],
    name_column = "profile_name",
    value_column = "value",
)

Convenience function to convert a table in wide format to long format using DuckDB. Originally aimed at converting a profile table like the following:

| year | timestep | name1 | name2 | ⋯ | nameN |
| ---- | -------- | ----- | ----- | - | ----- |
| 2030 |        1 |   1.0 |   2.5 | ⋯ |   0.0 |
| 2030 |        2 |   1.5 |   2.6 | ⋯ |   0.0 |
| 2030 |        3 |   2.0 |   2.6 | ⋯ |   0.0 |

To a table like the following:

| year | timestep | profile_name | value |
| ---- | -------- | ------------ | ----- |
| 2030 |        1 | name1        |   1.0 |
| 2030 |        2 | name1        |   1.5 |
| 2030 |        3 | name1        |   2.0 |
| 2030 |        1 | name2        |   2.5 |
| 2030 |        2 | name2        |   2.6 |
| 2030 |        3 | name2        |   2.6 |
| 2030 |        1 | nameN        |   0.0 |
| 2030 |        2 | nameN        |   0.0 |
| 2030 |        3 | nameN        |   0.0 |

This conversion is done using the UNPIVOT SQL command from DuckDB.

Keyword arguments

  • exclude_columns = ["year", "timestep"]: Columns to exclude from the conversion. To exclude additional columns of the wide table, e.g., scenario, add them to this list, e.g., ["scenario", "year", "timestep"].
  • name_column = "profile_name": Name of the new column that contains the names of the old columns
  • value_column = "value": Name of the new column that holds the values from the old columns
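
To show how the keyword arguments could map onto DuckDB's UNPIVOT syntax, the sketch below only builds a statement string; it does not open a connection or run anything. The helper name is hypothetical, and the exact query issued by the package (including how the result is stored in long_table_name) is not shown in this reference.

```julia
# Hypothetical helper: builds an UNPIVOT statement from the documented keyword arguments
function unpivot_statement_sketch(
    wide_table_name;
    exclude_columns = ["year", "timestep"],
    name_column = "profile_name",
    value_column = "value",
)
    excluded = join(exclude_columns, ", ")
    return "UNPIVOT $wide_table_name " *
           "ON COLUMNS(* EXCLUDE ($excluded)) " *
           "INTO NAME $name_column VALUE $value_column"
end

statement = unpivot_statement_sketch("profiles_wide")
println(statement)
```

With the defaults, every column except year and timestep is unpivoted into the profile_name and value columns.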

TulipaClustering.validate_data! - Method
validate_data!(connection)

Validate that the required data in connection exists and is correct. Throws a DataValidationException if any error is found.

TulipaClustering.validate_df_and_find_key_columns - Method

validate_df_and_find_key_columns(df; layout = ProfilesTableLayout())

Checks that dataframe df contains the necessary columns (as described by layout) and returns a list of columns that act as keys (i.e., unique data identifiers within different periods). Keys are all columns except layout.period and layout.value.

Examples

Default column names:

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :a .=> "a", :value => 1:3])
3×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  a           1
   2 │      1          2  a           2
   3 │      2          1  a           3

julia> TulipaClustering.validate_df_and_find_key_columns(df)
2-element Vector{Symbol}:
 :timestep
 :a

Custom column names via a layout:

julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts, value = :val)
julia> df = DataFrame(p = [1, 1, 2], ts = [1, 2, 1], a = "a", val = 1:3)
3×4 DataFrame
 Row │ p      ts   a       val
     │ Int64  Int64  String  Int64
─────┼─────────────────────────────
   1 │     1     1  a           1
   2 │     1     2  a           2
   3 │     2     1  a           3

julia> TulipaClustering.validate_df_and_find_key_columns(df; layout)
2-element Vector{Symbol}:
 :ts
 :a

Missing columns error references layout-provided names:

julia> df = DataFrame([:value => 1])
julia> TulipaClustering.validate_df_and_find_key_columns(df)
ERROR: DomainError: DataFrame must contain columns `timestep` and `value`

TulipaClustering.validate_initial_representatives - Function
validate_initial_representatives(
  initial_representatives,
  clustering_data,
  aux_clustering,
  last_period_excluded,
  n_rp;
  layout = ProfilesTableLayout()
)

Validates that initial_representatives is compatible with clustering_data for use in find_representative_periods, considering custom column names via layout. Checks include:

  1. Key columns match between initial representatives and clustering data.
  2. Initial representatives do not contain an incomplete last period.
  3. Both dataframes have the same set of keys (no extra/missing keys).
  4. The number of periods in initial_representatives does not exceed n_rp (adjusted for last_period_excluded).

Examples

julia> df = DataFrame([:period => [1,1,2,2], :timestep => [1,2,1,2], :zone .=> "A", :value => 10:13])
julia> aux = TulipaClustering.find_auxiliary_data(df)
julia> init = DataFrame([:period => [1,1], :timestep => [1,2], :zone .=> "A", :value => [10, 11]])
julia> TulipaClustering.validate_initial_representatives(init, df, aux, false, 2)

Custom layout:

julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df2 = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :zone .=> "A", :val => 10:13])
julia> aux2 = TulipaClustering.find_auxiliary_data(df2; layout)
julia> init2 = DataFrame([:p => [1,1], :ts => [1,2], :zone .=> "A", :val => [10, 11]])
julia> TulipaClustering.validate_initial_representatives(init2, df2, aux2, false, 2; layout)

TulipaClustering.weight_matrix_to_df - Method
weight_matrix_to_df(weights)

Converts a weight matrix from a (sparse) matrix, which is more convenient for internal computations, to a dataframe, which is better for saving into a file. Zero weights are dropped to avoid cluttering the dataframe.
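
The idea of keeping only the nonzero weights can be sketched with the SparseArrays standard library. The field names period, rep_period, and weight are assumptions for illustration, and the rows follow the column-major order returned by `findnz`; the sketch also assumes, as stated for find_representative_periods, that rows are periods and columns are representative periods.

```julia
using SparseArrays

# Rows are periods, columns are representative periods
weights = sparse([1.0 0.0;
                  0.0 1.0;
                  0.5 0.5])

# findnz returns the row indices, column indices, and values of the stored entries
periods, rep_periods, vals = findnz(weights)

# One record per nonzero weight
rows = [(period = p, rep_period = r, weight = w) for (p, r, w) in zip(periods, rep_periods, vals)]
```

The six matrix entries reduce to four records, because the two zero weights are not stored.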

TulipaClustering.write_clustering_result_to_tables - Method
write_clustering_result_to_tables(connection, clustering_result; database_schema="", layout=ProfilesTableLayout())

Writes a TulipaClustering.ClusteringResult into DuckDB tables in connection.

Column naming:

  • The profiles_rep_periods table preserves the column names provided by layout for the time and value axes. Resulting columns are: profile_name, rep_period, <layout.timestep>, <layout.value>.
  • Other tables (rep_periods_data, rep_periods_mapping, timeframe_data) are not affected by the layout and keep their original schema.