Reference
- TulipaClustering.AuxiliaryClusteringData
- TulipaClustering.ClusteringResult
- TulipaClustering.DataValidationException
- TulipaClustering.ProfilesTableLayout
- TulipaClustering._combine_group_profiles
- TulipaClustering._combine_rep_periods_data
- TulipaClustering._combine_timeframe_data
- TulipaClustering._combine_weight_matrices
- TulipaClustering._get_initial_representatives_for_group
- TulipaClustering._rep_period_offset
- TulipaClustering._update_period_numbers_using_crossby_cols!
- TulipaClustering.append_period_from_source_df_as_rp!
- TulipaClustering.cluster!
- TulipaClustering.combine_periods!
- TulipaClustering.df_to_matrix_and_keys
- TulipaClustering.dummy_cluster!
- TulipaClustering.find_auxiliary_data
- TulipaClustering.find_period_weights
- TulipaClustering.find_representative_periods
- TulipaClustering.fit_rep_period_weights!
- TulipaClustering.fit_rep_period_weights!
- TulipaClustering.greedy_convex_hull
- TulipaClustering.matrix_and_keys_to_df
- TulipaClustering.project_onto_nonnegative_orthant
- TulipaClustering.project_onto_simplex
- TulipaClustering.project_onto_standard_basis
- TulipaClustering.projected_subgradient_descent!
- TulipaClustering.split_into_periods!
- TulipaClustering.transform_wide_to_long!
- TulipaClustering.validate_data!
- TulipaClustering.validate_df_and_find_key_columns
- TulipaClustering.validate_initial_representatives
- TulipaClustering.weight_matrix_to_df
- TulipaClustering.write_clustering_result_to_tables
- TulipaClustering.write_clustering_result_to_tables
TulipaClustering.AuxiliaryClusteringData — Type
Structure to hold the time series used in clustering together with some summary statistics on the data.
TulipaClustering.ClusteringResult — Type
Structure to hold the clustering result.
TulipaClustering.DataValidationException — Type
DataValidationException
Exception related to data validation of the Tulipa Energy Model input data.
TulipaClustering.ProfilesTableLayout — Type
ProfilesTableLayout(;key = value, ...)
ProfilesTableLayout(path; ...)
Structure to hold the profiles input data table layout. Column names in the layout are defined by default.
If path is passed, it is expected to be a string pointing to a TOML file with a key = value list of parameters. Explicit keyword arguments take precedence.
Parameters
- value::Symbol = :value: The column name with the profile values.
- timestep::Symbol = :timestep: The column name with the time steps in the profile.
- period::Symbol = :period: The column name with the period number in the profile.
- year::Symbol = :year: The column name with the year of the profile.
- scenario::Symbol = :scenario: The column name with the scenario of the profile.
- cols_to_groupby::Vector{Symbol} = [:year]: The column names to group by when performing clustering on groups of profiles separately. If empty, no grouping is done.
- cols_to_crossby::Vector{Symbol} = []: The column names to cross by when performing clustering on profiles. If empty, no crossing is done.
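The precedence rule described above (file values override defaults, explicit keywords override the file) can be sketched without the package. This is only an illustration under assumptions: the TOML content is hypothetical, the values are assumed to be stored as strings, and the merge below is not the package's own code.

```julia
using TOML

# Hypothetical TOML content; the key names mirror the documented parameters.
toml_text = """
value = "val"
timestep = "ts"
period = "p"
"""

# Documented defaults for three of the layout fields.
defaults = Dict(:value => :value, :timestep => :timestep, :period => :period)

# Parse the text and convert keys and values to symbols (assumed convention).
from_file = Dict(Symbol(k) => Symbol(v) for (k, v) in TOML.parse(toml_text))

# Explicit keyword arguments are merged last, so they win over the file.
explicit = Dict(:value => :amount)
resolved = merge(defaults, from_file, explicit)
```

Here `resolved[:value]` comes from the explicit argument, while `resolved[:timestep]` and `resolved[:period]` come from the file.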
TulipaClustering._combine_group_profiles — Method
A function to offset representative period indices so that groups have disjoint rep_period ranges. For group index g, new_rep_period = old_rep_period + offset, where the offset is given by _rep_period_offset(n_rp, g).
TulipaClustering._combine_rep_periods_data — Method
A function to combine rep_periods_data from different groups. For group index g, new_rep_period = old_rep_period + offset, where the offset is given by _rep_period_offset(n_rp, g). In addition, the group key columns are added to the resulting dataframe.
TulipaClustering._combine_timeframe_data — Method
A function to combine timeframe_data from different groups. Creates timeframe data with period information for each group. The year column (specified by layout.year) is extracted from either group keys or cross column metadata. For each unique year value, a row is created for each period with its duration information.
TulipaClustering._combine_weight_matrices — Method
A function to combine weight matrices from different groups. For group index g, new_rep_period = old_rep_period + offset, where the offset is given by _rep_period_offset(n_rp, g). In addition, the group key columns and cross column values are added to the resulting dataframe. The period column is updated to reflect the original period within each cross group.
TulipaClustering._get_initial_representatives_for_group — Method
_get_initial_representatives_for_group(
initial_representatives::AbstractDataFrame,
group_key::DataFrames.GroupKey{GroupedDataFrame{DataFrame}},
)
Get the initial representatives for a specific group from a grouped DataFrame.
Arguments
- initial_representatives::AbstractDataFrame: A DataFrame containing the initial representative data points.
- group_key::DataFrames.GroupKey{GroupedDataFrame{DataFrame}}: A key identifying a specific group within a grouped DataFrame.
Returns
The subset of initial_representatives that corresponds to the specified group_key.
Description
This is an internal helper function that extracts the initial representative data points for a particular group identified by its group key. It is typically used during the initialization phase of clustering operations on grouped data.
TulipaClustering._rep_period_offset — Method
A helper function to compute the rep_period offset for group indexing.
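The offset rule itself is not spelled out on this page. As a rough illustration only, the sketch below assumes the offset for group g is n_rp * (g - 1); the function name `assumed_offset` is hypothetical and is not the package's helper. Under that assumption, local indices 1:n_rp in each group map to disjoint global ranges.

```julia
# ASSUMPTION: group g (1-based) starts after the n_rp indices of every earlier group.
assumed_offset(n_rp, g) = n_rp * (g - 1)

n_rp = 3
# Local representative period indices 1:n_rp in each of three groups,
# shifted into a single global numbering.
global_ids = [r + assumed_offset(n_rp, g) for g in 1:3 for r in 1:n_rp]
```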
TulipaClustering._update_period_numbers_using_crossby_cols! — Method
_update_period_numbers_using_crossby_cols!(grouped_profiles_data, layout)
Update period numbers in grouped profile data by creating sequential periods across groups defined by cols_to_crossby.
Arguments
- grouped_profiles_data::GroupedDataFrame: A grouped DataFrame containing profile data
- layout::ProfilesTableLayout: Layout specification containing:
  - period: Column name for period numbers
  - cols_to_crossby: Column names used to cross/combine groups
Returns
- grouped_profiles_data: Modified GroupedDataFrame with updated period numbers and crossby columns removed
- metadata_per_group: Dictionary containing metadata for each group with keys:
  - group_values: Values identifying the group
  - num_periods: Maximum period number in the group
  - cross_values_list: List of NamedTuples containing crossby column values for each cross group
Details
For each group in the grouped data, this function:
- Creates subgroups based on cols_to_crossby columns
- Adjusts period numbers so that each subgroup has sequential, non-overlapping periods
- Period numbers for subgroup i are offset by num_periods * (i - 1)
- Removes the cols_to_crossby columns from the final result
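The offset rule in the list above can be checked with plain arithmetic. The subgroup names below are hypothetical cross-by values, and the snippet only illustrates the numbering; it does not touch a GroupedDataFrame.

```julia
num_periods = 3                              # periods in each cross subgroup
subgroups = ["scenario_a", "scenario_b"]     # hypothetical cross-by values
local_periods = 1:num_periods

# Subgroup i keeps its local order but is shifted by num_periods * (i - 1).
new_periods = Dict(
    name => [p + num_periods * (i - 1) for p in local_periods]
    for (i, name) in enumerate(subgroups)
)
```

The first subgroup keeps periods 1 to 3 and the second receives periods 4 to 6, so the ranges do not overlap.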
Modifications in Place
Modifies grouped_profiles_data in place by updating period numbers and removing crossby columns.
TulipaClustering.append_period_from_source_df_as_rp! — Method
append_period_from_source_df_as_rp!(df; source_df, period, rp, key_columns, layout = ProfilesTableLayout())
Extracts a period with index period from source_df and appends it as a representative period with index rp to df, using key_columns as keys. Respects custom column names via layout.
Examples
Default layout:
julia> source_df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "b", :value => 5:8])
4×4 DataFrame
Row │ period timestep a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 b 5
2 │ 1 2 b 6
3 │ 2 1 b 7
4 │ 2 2 b 8
julia> df = DataFrame([:rep_period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
Row │ rep_period timestep a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df, period = 2, rp = 3, key_columns = [:timestep, :a])
6×4 DataFrame
Row │ rep_period timestep a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
5 │ 3 1 b 7
6 │ 3 2 b 8
Custom layout:
julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> src = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :a .=> "b", :val => 5:8])
julia> df = DataFrame([:rep_period => [1,1], :ts => [1,2], :a .=> "a", :val => [1,2]])
julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df = src, period = 2, rp = 3, key_columns = [:ts, :a], layout)
4×4 DataFrame
Row │ rep_period ts a val
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 3 1 b 7
4 │ 3 2 b 8
TulipaClustering.cluster! — Method
cluster!(
connection,
period_duration,
num_rps;
input_database_schema = "",
input_profile_table_name = "profiles",
database_schema = "",
drop_incomplete_last_period::Bool = false,
method::Symbol = :k_means,
distance::SemiMetric = SqEuclidean(),
initial_representatives::AbstractDataFrame = DataFrame(),
layout::ProfilesTableLayout = ProfilesTableLayout(),
weight_type::Symbol = :convex,
tol::Float64 = 1e-2,
clustering_kwargs = Dict(),
weight_fitting_kwargs = Dict(),
)
Convenience function to cluster the table named in input_profile_table_name using period_duration and num_rps. The resulting tables profiles_rep_periods, rep_periods_mapping, and rep_periods_data are loaded into connection in the database_schema, if given, and enriched with year information.
This function extracts the table (expecting columns year, profile_name, timestep, value), then calls split_into_periods!, find_representative_periods, fit_rep_period_weights!, and finally write_clustering_result_to_tables.
Arguments
Required
- connection: DuckDB connection
- period_duration: Duration of each period, i.e., number of timesteps.
- num_rps: Number of representative periods to find (see find_representative_periods)
Keyword arguments
- input_database_schema (default ""): Schema of the input tables
- input_profile_table_name (default "profiles"): Default name of the profiles table inside the above schema
- database_schema (default ""): Schema of the output tables
- drop_incomplete_last_period (default false): controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for n_rp - 1 periods, and the last period is added as a special shorter representative period
- method (default :k_medoids): clustering method to use, one of :k_means, :k_medoids, :convex_hull, :convex_hull_with_null, or :conical_hull.
- distance (default Distances.Euclidean()): semimetric used to measure distance between data points.
- initial_representatives: initial representatives that should be included in the clustering. The period column in the initial representatives should be 1-indexed and the key columns should be the same as in the clustering data. For the hull methods they are added before clustering; for :k_means and :k_medoids they are added after clustering.
- layout (default ProfilesTableLayout()): describes the column names for period, timestep, and value in in-memory DataFrames. It does not change the SQL input table schema, which must contain profile_name, timestep, and value. Weight fitting operates on matrices and does not use layout.
- weight_type (default :dirac): the type of weights to find; possible values are:
  - :dirac: each period is represented by exactly one representative period (a one unit weight and the rest are zeros)
  - :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
  - :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
  - :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
- tol (default 1e-2): algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
- clustering_kwargs (default Dict()): Extra keyword arguments passed to find_representative_periods
- weight_fitting_kwargs (default Dict()): Extra keyword arguments passed to fit_rep_period_weights! (e.g., niters, learning_rate, adaptive_grad).
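To make the four weight types concrete, the sketch below checks whether one row of weights (the weights of a single period over the representative periods) satisfies each definition. The function name `satisfies` and the tolerance `atol` are illustrative choices, not part of TulipaClustering.

```julia
# Return true when a row of weights satisfies the constraints of `weight_type`.
function satisfies(weights::AbstractVector{<:Real}, weight_type::Symbol; atol = 1e-8)
    nonneg = all(w -> w >= -atol, weights)
    total = sum(weights)
    if weight_type == :dirac
        # Exactly one unit weight, all other weights zero.
        return count(w -> isapprox(w, 1; atol), weights) == 1 &&
               count(w -> isapprox(w, 0; atol), weights) == length(weights) - 1
    elseif weight_type == :convex
        return nonneg && isapprox(total, 1; atol)
    elseif weight_type == :conical
        return nonneg
    elseif weight_type == :conical_bounded
        return nonneg && total <= 1 + atol
    else
        throw(ArgumentError("unknown weight type $weight_type"))
    end
end
```

For instance, `[0.3, 0.7]` is a valid convex row but not a Dirac one, and `[0.5, 0.9]` is conical but exceeds the bound of the bounded conical type.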
TulipaClustering.combine_periods! — Method
combine_periods!(df; layout = ProfilesTableLayout())
Combine per-period time steps into a single global timestep column in-place.
Given a long-format dataframe df with (at least) a per-period timestep column and, optionally, a period column (names provided by layout), this function rewrites the timestep column so that time becomes a single global, monotonically increasing index across all periods, then removes the original period column.
Period length inference:
- The (nominal) period duration L is inferred as the maximum value found in the per-period timestep column across the whole dataframe (NOT per period).
- Each row's global timestep is computed as (period - 1) * L + timestep.
- If the final period is shorter than L, the resulting global time index will simply end earlier; missing intermediate global timesteps are not created.
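The inference rule above reduces to a single broadcast expression. The vectors below are made-up data standing in for the period and timestep columns; the final period is deliberately shorter.

```julia
periods   = [1, 1, 2, 2, 3]
timesteps = [1, 2, 1, 2, 1]   # the final period has a single timestep

L = maximum(timesteps)        # inferred over the whole column, not per period
global_timesteps = (periods .- 1) .* L .+ timesteps
```

With L equal to 2, the global index runs from 1 to 5 and simply stops after the short final period.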
Arguments:
- df::AbstractDataFrame (mutated): Source data in long format.
- layout::ProfilesTableLayout: Describes the column names for period and timestep (defaults to standard names). Pass a custom layout if your dataframe uses different symbols.
Behavior & edge cases:
- If the timestep column (as specified by layout) is missing, a DomainError is thrown.
- If the period column is absent, the function is a no-op (returns immediately).
- Non-1-based or non-consecutive per-period timesteps are not validated; unusual values may result in non-contiguous or non-strictly increasing global indices.
- Works in-place; the modified dataframe (without period) is also returned for convenience.
Examples
Basic usage with default layout:
julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
Row │ period timestep value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 2 1 3
julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
Row │ timestep value
│ Int64 Int64
─────┼──────────────────
1 │ 1 1
2 │ 2 2
3 │ 3 3
Custom column names via a layout:
julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts)
julia> df = DataFrame([:p => [1,1,2], :ts => [1,2,1], :value => 10:12])
3×3 DataFrame
Row │ p ts value
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 1 10
2 │ 1 2 11
3 │ 2 1 12
julia> TulipaClustering.combine_periods!(df; layout)
3×2 DataFrame
Row │ ts value
│ Int64 Int64
─────┼──────────────
1 │ 1 10
2 │ 2 11
3 │ 3 12
No period column (no-op):
julia> df = DataFrame([:timestep => 1:3, :value => 4:6])
julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
Row │ timestep value
│ Int64 Int64
─────┼──────────────────
1 │ 1 4
2 │ 2 5
3 │ 3 6
TulipaClustering.df_to_matrix_and_keys — Method
df_to_matrix_and_keys(df, key_columns; layout = ProfilesTableLayout())
Converts a long-format dataframe df to a matrix, using the value/period columns from layout. Columns listed in key_columns are kept as keys.
Returns (matrix::Matrix{Float64}, keys::DataFrame).
Examples
Default layout:
julia> df = DataFrame([:period => [1, 1, 2, 2], :timestep => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
Row │ period timestep a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:timestep, :a]); m
2×2 Matrix{Float64}:
1.0 3.0
2.0 4.0
julia> k
2×2 DataFrame
Row │ timestep a
│ Int64 String
─────┼───────────────────
1 │ 1 a
2 │ 2 a
Custom layout:
julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :a .=> "a", :val => 1:4])
julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:ts, :a]; layout); m
2×2 Matrix{Float64}:
1.0 3.0
2.0 4.0
julia> k
2×2 DataFrame
Row │ ts a
│ Int64 String
─────┼────────────────
1 │ 1 a
2 │ 2 a
TulipaClustering.dummy_cluster! — Method
dummy_cluster!(connection)
Convenience function to create the necessary columns and tables when clustering is not required.
This is essentially creating a single representative period with the size of the whole profile. See cluster! for more details of what is created.
TulipaClustering.find_auxiliary_data — Method
find_auxiliary_data(clustering_data; layout = ProfilesTableLayout())
Calculates auxiliary data associated with clustering_data, considering custom column names via layout.
Returns AuxiliaryClusteringData with:
- key_columns: key columns in the dataframe
- period_duration: nominal duration of periods (max timestep across data)
- last_period_duration: duration of the last period
- n_periods: total number of periods
Example
julia> df = DataFrame([:period => [1,1,2,2], :timestep => [1,2,1,2], :a => "x", :value => 10:13])
julia> aux = TulipaClustering.find_auxiliary_data(df)
AuxiliaryClusteringData([:timestep, :a], 2, 2, 2, nothing)
julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df2 = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,1], :a => "x", :val => 10:13])
julia> TulipaClustering.find_auxiliary_data(df2; layout)
AuxiliaryClusteringData([:ts, :a], 2, 1, 2, nothing)
TulipaClustering.find_period_weights — Method
find_period_weights(period_duration, last_period_duration, n_periods, drop_incomplete_periods)
Finds weights of two different types of periods in the clustering data:
- complete periods: these are all of the periods with length equal to period_duration.
- incomplete last period: if last period duration is less than period_duration, it is incomplete.
TulipaClustering.find_representative_periods — Method
find_representative_periods(
clustering_data,
n_rp;
drop_incomplete_last_period = false,
method = :k_means,
distance = SqEuclidean(),
initial_representatives = DataFrame(),
layout = ProfilesTableLayout(),
kwargs...,
)
Finds representative periods via data clustering. Honors custom column names via layout (defaults to (:period, :timestep, :value)).
Arguments
- clustering_data: long-format data to cluster.
- n_rp: number of representative periods to find.
- drop_incomplete_last_period: controls how the last period is treated if it is not complete: if this parameter is set to true, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for n_rp - 1 periods, and the last period is added as a special shorter representative period.
- method: clustering method to use, one of :k_means, :k_medoids, :convex_hull, :convex_hull_with_null, or :conical_hull.
- distance: semimetric used to measure distance between data points.
- initial_representatives: dataframe of initial RPs. It must use the same key columns and follow the same layout as clustering_data. For hull methods the RPs are prepended before clustering; for :k_means/:k_medoids they are appended after clustering.
- layout: ProfilesTableLayout describing the column names.
- other named arguments are forwarded to the clustering method.
Returns
Returns a ClusteringResult with:
- profiles::DataFrame: Long-format representative profiles with columns: rep_period, layout.timestep, all key columns (auxiliary_data.key_columns), and layout.value.
- weight_matrix::SparseMatrixCSC{Float64,Int} (or dense Matrix{Float64}): rows correspond to source periods and columns to representative periods; entry (p, r) is the weight of period p assigned to representative r. If the last period is incomplete and drop_incomplete_last_period is false, it maps to its own representative column with its specific weight; if dropped, it is excluded from the rows.
- clustering_matrix::Matrix{Float64}: The feature-by-period matrix used for clustering (features are derived from layout.timestep crossed with key columns).
- rp_matrix::Matrix{Float64}: The representative profiles in matrix form (same feature layout as clustering_matrix).
- auxiliary_data::AuxiliaryClusteringData: Auxiliary metadata such as key_columns, period_duration, last_period_duration, n_periods, and (for applicable methods) medoids indices.
Examples
Finding two representatives using default values:
julia> df = DataFrame(
period = kron(1:4, ones(Int, 2)),
timestep = repeat(1:2, 4),
profile = "A",
value = 1:8,
)
julia> res = TulipaClustering.find_representative_periods(df, 2)
Finding two representatives using k-medoids and a custom layout:
julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts, value = :val)
julia> df = DataFrame(
p = kron(1:4, ones(Int, 2)),
ts = repeat(1:2, 4),
profile = "A",
val = 1:8,
)
julia> res = TulipaClustering.find_representative_periods(df, 2; method = :k_medoids, layout)
TulipaClustering.fit_rep_period_weights! — Method
fit_rep_period_weights!(weight_matrix, clustering_matrix, rp_matrix; weight_type, tol, args...)
Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.
The arguments:
- clustering_result: the result of running TulipaClustering.find_representative_periods
- weight_type: the type of weights to find; possible values are:
  - :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
  - :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
  - :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
- tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
- other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.
TulipaClustering.fit_rep_period_weights! — Method
fit_rep_period_weights!(weight_matrix, clustering_matrix, rp_matrix; weight_type, tol, args...)
Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.
The arguments:
- weight_matrix: the initial guess for weights; the weights are adjusted using a projected subgradient descent method
- clustering_matrix: the matrix of raw clustering data
- rp_matrix: the matrix of raw representative period data
- weight_type: the type of weights to find; possible values are:
  - :dirac: each period is represented by exactly one representative period (a one unit weight and the rest are zeros)
  - :convex: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
  - :conical: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
  - :conical_bounded: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one.
- tol: algorithm's tolerance; when the weights are adjusted by a value less than or equal to tol, they stop being fitted further.
- show_progress: if true, a progress bar will be displayed.
- other arguments control the projected subgradient method; they are passed through to TulipaClustering.projected_subgradient_descent!.
TulipaClustering.greedy_convex_hull — Method
greedy_convex_hull(matrix; n_points, distance, initial_indices, mean_vector)
Greedy method for finding n_points points in a hull of the dataset. The points are added iteratively: at each step, the point that is furthest away from the hull of the current set of points is found and added to the hull.
- matrix: the clustering matrix
- n_points: number of hull points to find
- distance: distance semimetric
- initial_indices: initial points which must be added to the hull, can be nothing
- mean_vector: when adding the first point (if initial_indices is not given), it will be chosen as the point furthest away from the mean_vector; this can be nothing, in which case the first step will add a point furthest away from the centroid (the mean) of the dataset
TulipaClustering.matrix_and_keys_to_df — Method
matrix_and_keys_to_df(matrix, keys; layout = ProfilesTableLayout())
Converts a matrix matrix to a long-format dataframe with columns (:rep_period, layout.timestep, keys..., layout.value).
Examples
Default layout:
julia> m = [1.0 3.0; 2.0 4.0]
2×2 Matrix{Float64}:
1.0 3.0
2.0 4.0
julia> k = DataFrame([:timestep => 1:2, :a .=> "a"])
2×2 DataFrame
Row │ timestep a
│ Int64 String
─────┼───────────────────
1 │ 1 a
2 │ 2 a
julia> TulipaClustering.matrix_and_keys_to_df(m, k)
4×4 DataFrame
Row │ rep_period timestep a value
│ Int64 Int64 String Float64
─────┼────────────────────────────────────────
1 │ 1 1 a 1.0
2 │ 1 2 a 2.0
3 │ 2 1 a 3.0
4 │ 2 2 a 4.0
Custom layout:
julia> layout = ProfilesTableLayout(; timestep=:ts, value=:val)
julia> k = DataFrame([:ts => 1:2, :a .=> "a"])
julia> TulipaClustering.matrix_and_keys_to_df(m, k; layout)
4×4 DataFrame
Row │ rep_period ts a val
│ Int64 Int64 String Float64
─────┼────────────────────────────────────
1 │ 1 1 a 1.0
2 │ 1 2 a 2.0
3 │ 2 1 a 3.0
4 │ 2 2 a 4.0
TulipaClustering.project_onto_nonnegative_orthant — Method
project_onto_nonnegative_orthant(vector)
Projects vector onto the nonnegative orthant. This projection is trivial: replace negative components of the vector with zeros.
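The projection described above is a one-line broadcast. The name `project_nonneg` is a stand-in for illustration and is not the package function.

```julia
# Replace every negative component with zero; other components are unchanged.
project_nonneg(v) = max.(v, zero(eltype(v)))

projected = project_nonneg([-1.5, 0.0, 2.0])
```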
TulipaClustering.project_onto_simplex — Method
project_onto_simplex(vector)
Projects vector onto a unit simplex using Michelot's algorithm in Condat's accelerated implementation (2016). See Figure 2 of Condat, L. Fast projection onto the simplex and the ball. Math. Program. 158, 575–585 (2016). For the details on the meanings of v, ṽ, ρ and other variables, see the original paper.
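For intuition about what the projection computes, here is a simpler sort-based projection onto the unit simplex. It is not Condat's accelerated algorithm and not the package's implementation; it is the classic O(n log n) variant, and `project_simplex_sorted` is a hypothetical name.

```julia
# Euclidean projection onto {x : x .>= 0, sum(x) == 1} via sorting.
function project_simplex_sorted(v::AbstractVector{<:Real})
    u = sort(v; rev = true)          # components in decreasing order
    css = cumsum(u)
    # Largest k for which the k largest components stay positive after shifting.
    k = findlast(i -> u[i] - (css[i] - 1) / i > 0, 1:length(u))
    τ = (css[k] - 1) / k             # common shift applied to all components
    return max.(v .- τ, 0)
end
```

A vector already on the simplex, such as `[0.5, 0.5]`, is returned unchanged, while `[2.0, 0.0]` is mapped to `[1.0, 0.0]`.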
TulipaClustering.project_onto_standard_basis — Method
project_onto_standard_basis(vector)
Projects vector onto the standard basis. This projection is trivial: replace all components of the vector with zeros, except for the largest one, which is replaced with one.
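A minimal sketch of this projection follows. The name `project_standard_basis` is illustrative, and ties are resolved by taking the first largest component, which is an assumption about tie handling rather than documented behavior.

```julia
# One-hot vector with a one at the position of the largest component.
function project_standard_basis(v::AbstractVector{<:Real})
    e = zeros(Float64, length(v))
    e[argmax(v)] = 1.0               # argmax returns the first maximal index
    return e
end

one_hot = project_standard_basis([0.2, 0.7, 0.1])
```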
TulipaClustering.projected_subgradient_descent! — Method
projected_subgradient_descent!(x; gradient, projection, niters, rtol, learning_rate, adaptive_grad)
Fits x using the projected gradient descent scheme.
The arguments:
- x: the value to fit
- subgradient: the subgradient operator, that is, a function that takes vectors of the same shape as x as inputs and returns a subgradient of the loss at that point; the fitting is done to minimize the corresponding implicit loss
- projection: the projection operator, that is, a function that, given a vector x, finds a point within some subspace that is closest to x
- niters: maximum number of projected gradient descent iterations
- tol: tolerance; when no components of x improve by more than tol, the algorithm stops
- learning_rate: learning rate of the algorithm
- adaptive_grad: if true, the learning rate is adjusted using the adaptive gradient method, see John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 12 (2011), 2121–2159 (https://dl.acm.org/doi/10.5555/1953048.2021068).
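The loop described by these arguments can be sketched in a few lines. This is a simplified illustration with a fixed learning rate and no adaptive gradient step; `projected_descent!` and its defaults are hypothetical and do not reproduce the package's implementation.

```julia
# Step along the negative subgradient, project back, and stop on small updates.
function projected_descent!(x; subgradient, projection, niters = 100, tol = 1e-6, learning_rate = 0.1)
    for _ in 1:niters
        x_new = projection(x .- learning_rate .* subgradient(x))
        converged = maximum(abs.(x_new .- x)) <= tol
        x .= x_new                   # update in place, as the `!` suggests
        converged && break
    end
    return x
end

# Example: minimize the squared distance to `target` over the nonnegative orthant.
target = [1.0, -2.0]
x = zeros(2)
projected_descent!(x;
    subgradient = x -> 2 .* (x .- target),
    projection = x -> max.(x, 0.0),
    niters = 200,
)
```

The first component approaches 1, while the second is held at 0 by the projection because the unconstrained minimizer is negative there.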
TulipaClustering.split_into_periods! — Method
split_into_periods!(df; period_duration=nothing, layout=ProfilesTableLayout())
Modifies a dataframe df by separating the time column into periods of length period_duration, respecting custom column names provided by layout.
The new data is written into two columns defined by the layout:
- layout.period: the period ID
- layout.timestep: the time step within the current period
If period_duration is nothing, then all time steps are in a single period (ID 1).
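The split can be pictured as integer division of a 1-based global timestep. The arithmetic below is an assumption that is consistent with the examples on this page; it is not taken from the package source.

```julia
timesteps = 1:5
period_duration = 2

# Period ID and position within the period for each 1-based global timestep.
period   = [div(t - 1, period_duration) + 1 for t in timesteps]
timestep = [mod(t - 1, period_duration) + 1 for t in timesteps]
```

With five timesteps and a duration of 2, the third period contains a single timestep, which is the incomplete last period discussed elsewhere in this reference.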
Examples
julia> df = DataFrame([:timestep => 1:4, :value => 5:8])
4×2 DataFrame
Row │ timestep value
│ Int64 Int64
─────┼──────────────────
1 │ 1 5
2 │ 2 6
3 │ 3 7
4 │ 4 8
julia> TulipaClustering.split_into_periods!(df; period_duration=2)
4×3 DataFrame
Row │ period timestep value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 5
2 │ 1 2 6
3 │ 2 1 7
4 │ 2 2 8
julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :value => 1:3])
3×3 DataFrame
Row │ period timestep value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 2 1 3
julia> TulipaClustering.split_into_periods!(df; period_duration=1)
3×3 DataFrame
Row │ period timestep value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 2 1 2
3 │ 3 1 3
julia> TulipaClustering.split_into_periods!(df)
3×3 DataFrame
Row │ period timestep value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 1 3 3
Custom column names via a layout:
julia> layout = ProfilesTableLayout(; timestep = :time_step, period = :periods)
julia> df = DataFrame([:time_step => 1:4, :value => 5:8])
4×2 DataFrame
Row │ time_step value
│ Int64 Int64
─────┼──────────────────
1 │ 1 5
2 │ 2 6
3 │ 3 7
4 │ 4 8
julia> TulipaClustering.split_into_periods!(df; period_duration=2, layout)
4×3 DataFrame
Row │ periods time_step value
│ Int64 Int64 Int64
─────┼───────────────────────────
1 │ 1 1 5
2 │ 1 2 6
3 │ 2 1 7
4 │ 2 2 8
TulipaClustering.transform_wide_to_long! — Method
transform_wide_to_long!(
connection,
wide_table_name,
long_table_name;
)
Convenience function to convert a table in wide format to long format using DuckDB. Originally aimed at converting a profile table like the following:
| year | timestep | name1 | name2 | ⋯ | nameN |
|---|---|---|---|---|---|
| 2030 | 1 | 1.0 | 2.5 | ⋯ | 0.0 |
| 2030 | 2 | 1.5 | 2.6 | ⋯ | 0.0 |
| 2030 | 3 | 2.0 | 2.6 | ⋯ | 0.0 |
To a table like the following:
| year | timestep | profile_name | value |
|---|---|---|---|
| 2030 | 1 | name1 | 1.0 |
| 2030 | 2 | name1 | 1.5 |
| 2030 | 3 | name1 | 2.0 |
| 2030 | 1 | name2 | 2.5 |
| 2030 | 2 | name2 | 2.6 |
| 2030 | 3 | name2 | 2.6 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 2030 | 1 | nameN | 0.0 |
| 2030 | 2 | nameN | 0.0 |
| 2030 | 3 | nameN | 0.0 |
This conversion is done using the UNPIVOT SQL command from DuckDB.
Keyword arguments
- exclude_columns = ["year", "timestep"]: Which columns to exclude from the conversion. Note that if you have more columns that you want to exclude from the wide table, e.g., scenario, you can add them to this list, e.g., ["scenario", "year", "timestep"].
- name_column = "profile_name": Name of the new column that contains the names of the old columns
- value_column = "value": Name of the new column that holds the values from the old columns
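To show how the keyword arguments could map onto DuckDB's UNPIVOT syntax, the sketch below only builds a query string. The exact SQL issued by transform_wide_to_long! is not shown on this page, so the statement and the helper name `unpivot_query` are assumptions.

```julia
# Build a DuckDB UNPIVOT statement as a string (illustrative only).
function unpivot_query(wide, long; exclude_columns = ["year", "timestep"],
                       name_column = "profile_name", value_column = "value")
    excluded = join(exclude_columns, ", ")
    return """
    CREATE OR REPLACE TABLE $long AS
    SELECT * FROM (
        UNPIVOT $wide
        ON COLUMNS(* EXCLUDE ($excluded))
        INTO NAME $name_column VALUE $value_column
    ) unpivot_alias
    """
end

query = unpivot_query("profiles_wide", "profiles";
    exclude_columns = ["scenario", "year", "timestep"])
```

The table names `profiles_wide` and `profiles` are placeholders; the resulting string could then be passed to a DuckDB connection.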
TulipaClustering.validate_data! — Method
validate_data!(connection)
Validate that the required data in connection exists and is correct. Throws a DataValidationException if any error is found.
TulipaClustering.validate_df_and_find_key_columns — Method
validate_df_and_find_key_columns(df; layout = ProfilesTableLayout())
Checks that dataframe df contains the necessary columns (as described by layout) and returns a list of columns that act as keys (i.e., unique data identifiers within different periods). Keys are all columns except layout.period and layout.value.
Examples
Default column names:
julia> df = DataFrame([:period => [1, 1, 2], :timestep => [1, 2, 1], :a .=> "a", :value => 1:3])
3×4 DataFrame
Row │ period timestep a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
julia> TulipaClustering.validate_df_and_find_key_columns(df)
2-element Vector{Symbol}:
:timestep
:a
Custom column names via a layout:
julia> layout = ProfilesTableLayout(; period = :p, timestep = :ts, value = :val)
julia> df = DataFrame(p = [1, 1, 2], ts = [1, 2, 1], a = "a", val = 1:3)
3×4 DataFrame
Row │ p ts a val
│ Int64 Int64 String Int64
─────┼─────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
julia> TulipaClustering.validate_df_and_find_key_columns(df; layout)
2-element Vector{Symbol}:
:ts
:a
Missing columns error references layout-provided names:
julia> df = DataFrame([:value => 1])
julia> TulipaClustering.validate_df_and_find_key_columns(df)
ERROR: DomainError: DataFrame must contain columns `timestep` and `value`
TulipaClustering.validate_initial_representatives — Function
validate_initial_representatives(
initial_representatives,
clustering_data,
aux_clustering,
last_period_excluded,
n_rp;
layout = ProfilesTableLayout()
)
Validates that initial_representatives is compatible with clustering_data for use in find_representative_periods, considering custom column names via layout. Checks include:
- Key columns match between initial representatives and clustering data.
- Initial representatives do not contain an incomplete last period.
- Both dataframes have the same set of keys (no extra/missing keys).
- The number of periods in initial_representatives does not exceed n_rp (adjusted for last_period_excluded).
Examples
julia> df = DataFrame([:period => [1,1,2,2], :timestep => [1,2,1,2], :zone .=> "A", :value => 10:13])
julia> aux = TulipaClustering.find_auxiliary_data(df)
julia> init = DataFrame([:period => [1,1], :timestep => [1,2], :zone .=> "A", :value => [10, 11]])
julia> TulipaClustering.validate_initial_representatives(init, df, aux, false, 2)
Custom layout:
julia> layout = ProfilesTableLayout(; period=:p, timestep=:ts, value=:val)
julia> df2 = DataFrame([:p => [1,1,2,2], :ts => [1,2,1,2], :zone .=> "A", :val => 10:13])
julia> aux2 = TulipaClustering.find_auxiliary_data(df2; layout)
julia> init2 = DataFrame([:p => [1,1], :ts => [1,2], :zone .=> "A", :val => [10, 11]])
julia> TulipaClustering.validate_initial_representatives(init2, df2, aux2, false, 2; layout)
TulipaClustering.weight_matrix_to_df — Method
weight_matrix_to_df(weights)
Converts a weight matrix from a (sparse) matrix, which is more convenient for internal computations, to a dataframe, which is better for saving into a file. Zero weights are dropped to avoid cluttering the dataframe.
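The idea of dropping zero weights can be illustrated with the SparseArrays standard library alone. The field names `period`, `rep_period`, and `weight` are assumptions about the output columns, and the result here is a vector of NamedTuples rather than a DataFrame.

```julia
using SparseArrays

# Two periods, two representative periods: period 1 maps fully to RP 1,
# period 2 is split between RP 1 and RP 2.
weights = sparse([1, 2, 2], [1, 1, 2], [1.0, 0.4, 0.6], 2, 2)

# findnz returns only the stored (nonzero) entries as row, column, value vectors.
periods, rep_periods, vals = findnz(weights)
rows = [(period = p, rep_period = r, weight = w)
        for (p, r, w) in zip(periods, rep_periods, vals)]
```

Only three of the four matrix entries appear in `rows`, since the zero weight of period 1 on RP 2 is not stored.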
TulipaClustering.write_clustering_result_to_tables — Method
write_clustering_result_to_tables(
connection,
results_per_group::Dict{
DataFrames.GroupKey{GroupedDataFrame{DataFrame}},
TulipaClustering.ClusteringResult,
},
metadata_per_group::Dict,
n_rp::Int;
database_schema = "",
layout::ProfilesTableLayout = ProfilesTableLayout(),
)
Writes clustering results from different groups into DuckDB tables in connection. The results from different groups are combined into single tables, adjusting the representative period indices to ensure uniqueness across groups.
TulipaClustering.write_clustering_result_to_tables — Method
write_clustering_result_to_tables(
connection,
clustering_result;
database_schema="",
layout=ProfilesTableLayout()
)
Writes a TulipaClustering.ClusteringResult into DuckDB tables in connection.