Reference

TulipaClustering.AuxiliaryClusteringData
TulipaClustering.ClusteringResult
TulipaClustering.DataValidationException
TulipaClustering.append_period_from_source_df_as_rp!
TulipaClustering.cluster!
TulipaClustering.combine_periods!
TulipaClustering.df_to_matrix_and_keys
TulipaClustering.dummy_cluster!
TulipaClustering.find_auxiliary_data
TulipaClustering.find_period_weights
TulipaClustering.find_representative_periods
TulipaClustering.fit_rep_period_weights!
TulipaClustering.fit_rep_period_weights!
TulipaClustering.greedy_convex_hull
TulipaClustering.matrix_and_keys_to_df
TulipaClustering.project_onto_nonnegative_orthant
TulipaClustering.project_onto_simplex
TulipaClustering.project_onto_standard_basis
TulipaClustering.projected_subgradient_descent!
TulipaClustering.split_into_periods!
TulipaClustering.transform_wide_to_long!
TulipaClustering.validate_data!
TulipaClustering.validate_df_and_find_key_columns
TulipaClustering.weight_matrix_to_df
TulipaClustering.write_clustering_result_to_tables

TulipaClustering.AuxiliaryClusteringData — Type

Structure to hold the time series used in clustering together with some summary statistics on the data.

TulipaClustering.ClusteringResult — Type

Structure to hold the clustering result.

TulipaClustering.DataValidationException — Type

DataValidationException

Exception related to data validation of the Tulipa Energy Model input data.

source

TulipaClustering.append_period_from_source_df_as_rp! — Method

append_period_from_source_df_as_rp!(df; source_df, period, rp, key_columns)

Extracts a period with index period from source_df and appends it as a representative period with index rp to df, using key_columns as keys.

Examples

julia> source_df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "b", :value => 5:8])
4×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  b           5
   2 │      1          2  b           6
   3 │      2          1  b           7
   4 │      2          2  b           8

julia> df = DataFrame([:rep_period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Int64
─────┼──────────────────────────────────────
   1 │          1          1  a           1
   2 │          1          2  a           2
   3 │          2          1  a           3
   4 │          2          2  a           4

julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df, period = 2, rp = 3, key_columns = [:timestep, :a])
6×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Int64
─────┼──────────────────────────────────────
   1 │          1          1  a           1
   2 │          1          2  a           2
   3 │          2          1  a           3
   4 │          2          2  a           4
   5 │          3          1  b           7
   6 │          3          2  b           8

source

TulipaClustering.cluster! — Method

cluster!(
    connection,
    period_duration,
    num_rps;
    input_database_schema = "",
    input_profile_table_name = "profiles",
    database_schema = "",
    drop_incomplete_last_period::Bool = false,
    method::Symbol = :k_means,
    distance::SemiMetric = SqEuclidean(),
    initial_representatives::AbstractDataFrame = DataFrame(),
    weight_type::Symbol = :convex,
    tol::Float64 = 1e-2,
    clustering_kwargs = Dict(),
    weight_fitting_kwargs = Dict(),
    niters::Int = 100,
    learning_rate::Float64 = 0.001,
    adaptive_grad::Bool = false,
)

Convenience function to cluster the table named in input_profile_table_name using period_duration and num_rps. The resulting tables profiles_rep_periods, rep_periods_mapping, and rep_periods_data are loaded into connection in the database_schema, if given, and enriched with year information.

This function extracts the table, then calls split_into_periods!, find_representative_periods, fit_rep_period_weights!, and finally write_clustering_result_to_tables.

Arguments

Required

connection: DuckDB connection
period_duration: Duration of each period, i.e., number of timesteps.
num_rps: Number of representative periods to find (see find_representative_periods)

Keyword arguments

input_database_schema (default ""): Schema of the input tables
input_profile_table_name (default "profiles"): Default name of the profiles table inside the above schema
database_schema (default ""): Schema of the output tables
drop_incomplete_last_period (default false): controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for num_rps - 1 periods, and the last period is added as a special shorter representative period
method (default :k_means): clustering method to use, either :k_means or :k_medoids
distance (default Distances.SqEuclidean()): semimetric used to measure distance between data points.
initial_representatives (default DataFrame()): initial representatives that should be included in the clustering. The period column in the initial representatives should be 1-indexed and the key columns should be the same as in the clustering data. For the hull methods they are added before clustering; for :k_means and :k_medoids they are added after clustering.
weight_type (default :convex): the type of weights to find; possible values are:
- :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights summing to one)
- :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
- :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
tol (default 1e-2): algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
clustering_kwargs (default Dict()): Extra keyword arguments passed to find_representative_periods
weight_fitting_kwargs (default Dict()): Extra keyword arguments passed to fit_rep_period_weights!

source

TulipaClustering.combine_periods! — Method

combine_periods!(df)

Modifies a dataframe df by combining the columns timestep and period into a single column timestep of global time steps. The period duration is inferred automatically from the maximum time step value, assuming that periods start with time step 1.
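The index arithmetic behind this can be sketched in plain Julia. This is only an illustration under the stated assumption that periods start at time step 1; the helper name global_timesteps is hypothetical and is not part of TulipaClustering.

```julia
# Hypothetical helper (not part of TulipaClustering): maps (period, timestep)
# pairs to global time steps.
function global_timesteps(periods::Vector{Int}, timesteps::Vector{Int})
    # The period duration is inferred as the largest within-period time step.
    period_duration = maximum(timesteps)
    # Period p starts right after (p - 1) full periods.
    return (periods .- 1) .* period_duration .+ timesteps
end

combined = global_timesteps([1, 1, 2], [1, 2, 1])
```

For the three rows used in the example that follows, the inferred duration is 2 and the result is [1, 2, 3].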

Examples

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      2          1      3

julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
 Row │ timestep  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      1
   2 │         2      2
   3 │         3      3

source

TulipaClustering.df_to_matrix_and_keys — Method

df_to_matrix_and_keys(df, key_columns)

Converts a dataframe df (in a long format) to a matrix, ignoring the columns specified as key_columns. The key columns are converted from long to wide format and returned alongside the matrix.

Examples

julia> df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  a           1
   2 │      1          2  a           2
   3 │      2          1  a           3
   4 │      2          2  a           4

julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:timestep, :a]); m
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia> k
2×2 DataFrame
 Row │ timestep  a
     │ Int64      String
─────┼───────────────────
   1 │         1  a
   2 │         2  a

source

TulipaClustering.dummy_cluster! — Method

dummy_cluster!(connection)

Convenience function to create the necessary columns and tables when clustering is not required.

This is essentially creating a single representative period with the size of the whole profile. See cluster! for more details of what is created.

source

TulipaClustering.find_auxiliary_data — Method

find_auxiliary_data(clustering_data)

Calculates auxiliary data associated with the clustering_data. These include:

key_columns_demand: key columns in the demand dataframe
key_columns_generation_availability: key columns in the generation availability dataframe
period_duration: duration of time periods (in time steps)
last_period_duration: duration of the last period
n_periods: total number of periods

source

TulipaClustering.find_period_weights — Method

find_period_weights(period_duration, last_period_duration, n_periods, drop_incomplete_periods)

Finds weights of two different types of periods in the clustering data:

complete periods: these are all of the periods with length equal to period_duration.
incomplete last period: if last period duration is less than period_duration, it is incomplete.

source

TulipaClustering.find_representative_periods — Method

find_representative_periods(
    clustering_data;
    n_rp = 10,
    rescale_demand_data = true,
    drop_incomplete_last_period = false,
    method = :k_means,
    distance = SqEuclidean(),
    initial_representatives = DataFrame(),
    args...,
)

Finds representative periods via data clustering.

clustering_data: the data to perform clustering on.
n_rp: number of representative periods to find.
rescale_demand_data: if true, demands are first divided by the maximum demand value, so that they are between zero and one like the generation availability data
drop_incomplete_last_period: controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for n_rp - 1 periods, and the last period is added as a special shorter representative period
method: clustering method to use, either :k_means or :k_medoids
distance: semimetric used to measure distance between data points.
initial_representatives: initial representatives that should be included in the clustering. The period column in the initial representatives should be 1-indexed and the key columns should be the same as in the clustering data. For the hull methods they are added before clustering; for :k_means and :k_medoids they are added after clustering.
other named arguments can be provided; they are passed to the clustering method.

source

TulipaClustering.fit_rep_period_weights! — Method

fit_rep_period_weights!(clustering_result; weight_type, tol, args...)

Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.

The arguments:

clustering_result: the result of running TulipaClustering.find_representative_periods
weight_type: the type of weights to find; possible values are:
- :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights summing to one)
- :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
- :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.

source

TulipaClustering.fit_rep_period_weights! — Method

fit_rep_period_weights!(weight_matrix, clustering_matrix, rp_matrix; weight_type, tol, args...)

Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.

The arguments:

weight_matrix: the initial guess for weights; the weights are adjusted using a projected subgradient descent method
clustering_matrix: the matrix of raw clustering data
rp_matrix: the matrix of raw representative period data
weight_type: the type of weights to find; possible values are:
- :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights summing to one)
- :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
- :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
show_progress: if true, a progress bar will be displayed.
other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.
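The three weight types differ only in the constraint placed on each period's weight vector. A minimal sketch of those constraints as a checker follows; the function name satisfies_weight_type and the tolerance atol are assumptions made for illustration and are not part of the package.

```julia
# Hypothetical checker (not part of TulipaClustering): tests whether one
# period's weight vector satisfies the constraint implied by weight_type.
function satisfies_weight_type(weights::AbstractVector{<:Real}, weight_type::Symbol; atol = 1e-8)
    # All three types require nonnegative weights.
    all(w -> w >= -atol, weights) || return false
    total = sum(weights)
    if weight_type == :convex
        return isapprox(total, 1; atol = atol)   # weights sum to one
    elseif weight_type == :conical
        return true                              # no bound on the total
    elseif weight_type == :conical_bounded
        return total <= 1 + atol                 # total is at most one
    else
        throw(ArgumentError("unknown weight_type: $weight_type"))
    end
end

is_convex = satisfies_weight_type([0.25, 0.75], :convex)
```

Every :convex weight vector also passes the :conical_bounded and :conical checks, since the constraints are nested.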

source

TulipaClustering.greedy_convex_hull — Method

greedy_convex_hull(matrix; n_points, distance, initial_indices, mean_vector)

Greedy method for finding n_points points in a hull of the dataset. Points are added iteratively: at each step, the point furthest from the hull of the current set of points is found and added to the hull.

matrix: the clustering matrix
n_points: number of hull points to find
distance: distance semimetric
initial_indices: initial points which must be added to the hull, can be nothing
mean_vector: when adding the first point (if initial_indices is not given), it will be chosen as the point furthest away from the mean_vector; this can be nothing, in which case the first step will add the point furthest away from the centroid (the mean) of the dataset
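The first step described for mean_vector, in the case where it is nothing, can be sketched as below. The sketch assumes that data points are the columns of the matrix and that the distance is squared Euclidean; the function name is hypothetical, and the later steps (distance to the hull of the points chosen so far) are not shown.

```julia
# Hypothetical sketch (not the package's code): index of the column that is
# furthest from the centroid of all columns, using squared Euclidean distance.
function furthest_from_centroid(matrix::AbstractMatrix{<:Real})
    n_points = size(matrix, 2)
    # Centroid = mean of the columns.
    centroid = vec(sum(matrix; dims = 2)) ./ n_points
    squared_distances = [sum(abs2, column .- centroid) for column in eachcol(matrix)]
    return argmax(squared_distances)
end

first_index = furthest_from_centroid([0.0 1.0 10.0; 0.0 1.0 10.0])
```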

source

TulipaClustering.matrix_and_keys_to_df — Method

matrix_and_keys_to_df(matrix, keys)

Converts a matrix matrix to a dataframe, appending the key columns given by keys.

Examples

julia> m = [1.0 3.0; 2.0 4.0]
2×2 Matrix{Float64}:
 1.0  3.0
 2.0  4.0

julia> k = DataFrame([:timestep => 1:2, :a .=> "a"])
2×2 DataFrame
 Row │ timestep  a
     │ Int64      String
─────┼───────────────────
   1 │         1  a
   2 │         2  a

julia> TulipaClustering.matrix_and_keys_to_df(m, k)
4×4 DataFrame
 Row │ rep_period  timestep  a       value
     │ Int64       Int64      String  Float64
─────┼────────────────────────────────────────
   1 │          1          1  a           1.0
   2 │          1          2  a           2.0
   3 │          2          1  a           3.0
   4 │          2          2  a           4.0

source

TulipaClustering.project_onto_nonnegative_orthant — Method

project_onto_nonnegative_orthant(vector)

Projects vector onto the nonnegative orthant. This projection is trivial: replace negative components of the vector with zeros.
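A one-line sketch of this projection, written independently of the package (the name nonnegative_part is hypothetical):

```julia
# Hypothetical stand-in for the projection described above: clamp negative
# components to zero and leave the others unchanged.
nonnegative_part(v::AbstractVector{<:Real}) = max.(v, zero(eltype(v)))

projected = nonnegative_part([1.5, -2.0, 0.0, 3.0])
```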

source

TulipaClustering.project_onto_simplex — Method

project_onto_simplex(vector)

Projects vector onto a unit simplex using Michelot's algorithm in Condat's accelerated implementation (2016). See Figure 2 of Condat, L. Fast projection onto the simplex and the ℓ1 ball. Math. Program. 158, 575–585 (2016). For details on the meanings of v, ṽ, ρ and other variables, see the original paper.
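To make the operation concrete, here is a sketch of the simpler sort-based simplex projection. It is not the Michelot/Condat variant referenced above, and it is not the package's code; it only computes the same Euclidean projection in O(n log n). The function name is hypothetical.

```julia
# Hypothetical sort-based projection onto the unit simplex
# {w : w .>= 0, sum(w) == 1}. Not the accelerated algorithm cited above.
function simplex_projection_by_sorting(v::AbstractVector{<:Real})
    u = sort(v; rev = true)
    cumulative = cumsum(u)
    # Largest k such that the k largest entries stay positive after shifting.
    # The condition always holds for k = 1, so k is never `nothing`.
    k = findlast(i -> u[i] - (cumulative[i] - 1) / i > 0, eachindex(u))
    threshold = (cumulative[k] - 1) / k
    # Shift every component by the threshold and clamp at zero.
    return max.(v .- threshold, 0)
end

w = simplex_projection_by_sorting([2.0, 0.0])
```

A vector that already lies on the simplex is returned unchanged, because the computed threshold is then zero.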

source

TulipaClustering.project_onto_standard_basis — Method

project_onto_standard_basis(vector)

Projects vector onto the standard basis. This projection is trivial: replace all components of the vector with zeros, except for the largest one, which is replaced with one.
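A minimal sketch of this one-hot projection follows. The function name is hypothetical, and breaking ties by taking the first maximal component is an assumption of the sketch, not documented behavior of the package.

```julia
# Hypothetical sketch: one-hot vector with a one at the position of the
# largest component. Ties go to the first maximum (an assumption).
function one_hot_at_maximum(v::Vector{<:Real})
    result = zeros(Float64, length(v))
    result[argmax(v)] = 1.0
    return result
end

basis_vector = one_hot_at_maximum([0.2, 0.7, 0.1])
```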

source

TulipaClustering.projected_subgradient_descent! — Method

projected_subgradient_descent!(x; subgradient, projection, niters, tol, learning_rate, adaptive_grad)

Fits x using the projected subgradient descent scheme.

The arguments:

x: the value to fit
subgradient: the subgradient operator, that is, a function that takes vectors of the same shape as x as inputs and returns a subgradient of the loss at that point; the fitting is done to minimize the corresponding implicit loss
projection: the projection operator, that is, a function that, given a vector x, finds the point closest to x within some feasible set
niters: maximum number of projected gradient descent iterations
tol: tolerance; when no components of x improve by more than tol, the algorithm stops
learning_rate: learning rate of the algorithm
adaptive_grad: if true, the learning rate is adjusted using the adaptive gradient method, see John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 12 (2011), 2121–2159.
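The loop that these arguments describe can be sketched as follows. This is a simplified illustration with a fixed learning rate (the adaptive_grad option is left out), the function name is hypothetical, and the default values shown are placeholders rather than the package's defaults.

```julia
# Hypothetical sketch of a projected subgradient loop with a fixed step size.
function sketch_projected_descent!(x::Vector{Float64};
    subgradient, projection, niters = 100, tol = 1e-3, learning_rate = 0.001)
    for _ in 1:niters
        # Step against the subgradient, then project back onto the feasible set.
        x_new = projection(x .- learning_rate .* subgradient(x))
        # Stop when no component moved by more than tol.
        converged = maximum(abs.(x_new .- x)) <= tol
        x .= x_new
        converged && break
    end
    return x
end

# Minimise 0.5 * ||x - target||^2 subject to x >= 0.
target = [2.0, -1.0]
x = zeros(2)
sketch_projected_descent!(x;
    subgradient = v -> v .- target,
    projection = v -> max.(v, 0.0),
    niters = 500, tol = 1e-8, learning_rate = 0.1)
```

In this toy problem the unconstrained minimiser has a negative second component, so the projection holds it at zero while the first component approaches 2.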

source

TulipaClustering.split_into_periods! — Method

split_into_periods!(df; period_duration=nothing)

Modifies a dataframe df by separating the column timestep into periods of length period_duration. The new data is written into two columns:

period: the period ID;
timestep: the time step within the current period.

If period_duration is nothing, then all of the time steps are within the same period with index 1.
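The split amounts to integer division of the zero-based time step by the period duration. A sketch on plain vectors, with a hypothetical helper name and ignoring the period_duration = nothing case:

```julia
# Hypothetical helper (not part of TulipaClustering): splits global time steps
# into (period, timestep-within-period) pairs, both 1-based.
function split_timesteps(timesteps::Vector{Int}, period_duration::Int)
    periods = fld.(timesteps .- 1, period_duration) .+ 1
    within_period = mod.(timesteps .- 1, period_duration) .+ 1
    return periods, within_period
end

periods, within_period = split_timesteps([1, 2, 3, 4], 2)
```

With period_duration = 2, time steps 1:4 map to periods [1, 1, 2, 2] and within-period time steps [1, 2, 1, 2].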

Examples

julia> df = DataFrame([:timestep => 1:4, :value => 5:8])
4×2 DataFrame
 Row │ timestep  value
     │ Int64      Int64
─────┼──────────────────
   1 │         1      5
   2 │         2      6
   3 │         3      7
   4 │         4      8

julia> TulipaClustering.split_into_periods!(df; period_duration=2)
4×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      5
   2 │      1          2      6
   3 │      2          1      7
   4 │      2          2      8

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      2          1      3

julia> TulipaClustering.split_into_periods!(df; period_duration=1)
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      2          1      2
   3 │      3          1      3

julia> TulipaClustering.split_into_periods!(df)
3×3 DataFrame
 Row │ period  timestep  value
     │ Int64   Int64      Int64
─────┼──────────────────────────
   1 │      1          1      1
   2 │      1          2      2
   3 │      1          3      3

source

TulipaClustering.transform_wide_to_long! — Method

transform_wide_to_long!(
    connection,
    wide_table_name,
    long_table_name;
)

Convenience function to convert a table in wide format to long format using DuckDB. Originally aimed at converting a profile table like the following:

year	timestep	name1	name2	⋯	name3
2030	1	1.0	2.5	⋯	0.0
2030	2	1.5	2.6	⋯	0.0
2030	3	2.0	2.6	⋯	0.0

To a table like the following:

year	timestep	profile_name	value
2030	1	name1	1.0
2030	2	name1	1.5
2030	3	name1	2.0
2030	1	name2	2.5
2030	2	name2	2.6
2030	3	name2	2.6
⋮	⋮	⋮	⋮
2030	1	name3	0.0
2030	2	name3	0.0
2030	3	name3	0.0

This conversion is done using the UNPIVOT SQL command from DuckDB.

Keyword arguments

exclude_columns = ["year", "timestep"]: Which columns to exclude from the conversion
name_column = "profile_name": Name of the new column that contains the names of the old columns
value_column = "value": Name of the new column that holds the values from the old columns
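As an illustration of how these keyword arguments could map onto a DuckDB UNPIVOT statement, the sketch below only builds the SQL text. The exact statement issued by the package is not shown in this reference, so the query shape, the subquery alias, and the helper name unpivot_sql are assumptions.

```julia
# Hypothetical helper: builds an UNPIVOT statement as a string.
# It does not connect to a database or execute anything.
function unpivot_sql(wide_table_name, long_table_name;
    exclude_columns = ["year", "timestep"],
    name_column = "profile_name",
    value_column = "value")
    excluded = join(exclude_columns, ", ")
    return """
    CREATE OR REPLACE TABLE $long_table_name AS
    SELECT * FROM (
        UNPIVOT $wide_table_name
        ON COLUMNS(* EXCLUDE ($excluded))
        INTO NAME $name_column VALUE $value_column
    ) unpivoted
    """
end

sql = unpivot_sql("profiles_wide", "profiles")
```

The table names profiles_wide and profiles are placeholders. Table and column names are interpolated without quoting, so this sketch is only suitable for trusted identifiers.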

source

TulipaClustering.validate_data! — Method

validate_data!(connection)

Validate that the required data in connection exists and is correct. Throws a DataValidationException if any error is found.

source

TulipaClustering.validate_df_and_find_key_columns — Method

validate_df_and_find_key_columns(df)

Checks that dataframe df contains the necessary columns and returns a list of columns that act as keys (i.e., unique data identifiers within different periods).

Examples

julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :a .=> "a", :value => 1:3])
3×4 DataFrame
 Row │ period  timestep  a       value
     │ Int64   Int64      String  Int64
─────┼──────────────────────────────────
   1 │      1          1  a           1
   2 │      1          2  a           2
   3 │      2          1  a           3

julia> TulipaClustering.validate_df_and_find_key_columns(df)
2-element Vector{Symbol}:
 :timestep
 :a

julia> df = DataFrame([:value => 1])
1×1 DataFrame
 Row │ value
     │ Int64
─────┼───────
   1 │     1

julia> TulipaClustering.validate_df_and_find_key_columns(df)
ERROR: DomainError with 1×1 DataFrame
 Row │ value
     │ Int64
─────┼───────
   1 │     1:
DataFrame must contain columns `timestep` and `value`

source

TulipaClustering.weight_matrix_to_df — Method

weight_matrix_to_df(weights)

Converts a weight matrix from a (sparse) matrix, which is more convenient for internal computations, to a dataframe, which is better for saving into a file. Zero weights are dropped to avoid cluttering the dataframe.
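The conversion can be sketched with the SparseArrays standard library alone. The sketch returns named tuples instead of a dataframe; it assumes that rows index periods and columns index representative periods, and the field names period, rep_period and weight, like the function name, are illustrative rather than taken from the package.

```julia
using SparseArrays

# Hypothetical sketch: long-format rows for the nonzero weights only.
function weights_to_rows(weights::SparseMatrixCSC)
    # dropzeros removes explicitly stored zeros so that only true nonzeros remain.
    periods, rep_periods, values = findnz(dropzeros(weights))
    return [(period = p, rep_period = r, weight = w)
            for (p, r, w) in zip(periods, rep_periods, values)]
end

weights = sparse([1, 2, 2], [1, 1, 2], [1.0, 0.4, 0.6])
rows = weights_to_rows(weights)
```

Here period 1 is fully assigned to representative period 1, while period 2 is split 0.4 / 0.6 between the two representative periods, giving three rows.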

source

TulipaClustering.write_clustering_result_to_tables — Method

write_clustering_result_to_tables(connection, clustering_result)

Writes a TulipaClustering.ClusteringResult to tables in connection.

source