Reference
Contents
Index
TulipaClustering.AssetProfiles
TulipaClustering.AuxiliaryClusteringData
TulipaClustering.ClusteringResult
TulipaClustering.append_period_from_source_df_as_rp!
TulipaClustering.combine_periods!
TulipaClustering.df_to_matrix_and_keys
TulipaClustering.find_auxiliary_data
TulipaClustering.find_period_weights
TulipaClustering.find_representative_periods
TulipaClustering.fit_rep_period_weights!
TulipaClustering.fit_rep_period_weights!
TulipaClustering.matrix_and_keys_to_df
TulipaClustering.project_onto_nonnegative_orthant
TulipaClustering.project_onto_simplex
TulipaClustering.projected_subgradient_descent!
TulipaClustering.read_clustering_data_from_csv_folder
TulipaClustering.read_csv_with_schema
TulipaClustering.split_into_periods!
TulipaClustering.validate_df_and_find_key_columns
TulipaClustering.weight_matrix_to_df
TulipaClustering.write_clustering_result_to_csv_folder
TulipaClustering.write_csv_with_prefixes
TulipaClustering.AssetProfiles
— Type
Schema for the input `assets-profiles.csv` file.
TulipaClustering.AuxiliaryClusteringData
— Type
Structure to hold the time series used in clustering, together with some summary statistics of the data.
TulipaClustering.ClusteringResult
— Type
Structure to hold the clustering result.
TulipaClustering.append_period_from_source_df_as_rp!
— Method
append_period_from_source_df_as_rp!(df; source_df, period, rp, key_columns)
Extracts the period with index `period` from `source_df` and appends it to `df` as a representative period with index `rp`, using `key_columns` as keys.
Examples
julia> source_df = DataFrame([:period => [1, 1, 2, 2], :time_step => [1, 2, 1, 2], :a .=> "b", :value => 5:8])
4×4 DataFrame
Row │ period time_step a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 b 5
2 │ 1 2 b 6
3 │ 2 1 b 7
4 │ 2 2 b 8
julia> df = DataFrame([:rep_period => [1, 1, 2, 2], :time_step => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
Row │ rep_period time_step a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
julia> TulipaClustering.append_period_from_source_df_as_rp!(df; source_df, period = 2, rp = 3, key_columns = [:time_step, :a])
6×4 DataFrame
Row │ rep_period time_step a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
5 │ 3 1 b 7
6 │ 3 2 b 8
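The row selection and relabelling that `append_period_from_source_df_as_rp!` performs can be sketched in plain Julia. This is a minimal illustration, not the package's implementation: rows are stored as NamedTuples instead of DataFrame rows, the helper name `append_period_as_rp!` is hypothetical, and `key_columns` is assumed to list every column other than the period index and `value`.

```julia
# Hypothetical helper; rows are NamedTuples rather than DataFrame rows.
# Assumes `key_columns` lists every field except the period index and `value`.
function append_period_as_rp!(rp_rows, source_rows; period, rp, key_columns)
    field_names = (:rep_period, key_columns..., :value)
    for row in source_rows
        row.period == period || continue  # keep only rows of the requested period
        # The source period index is replaced by the representative-period index `rp`.
        values_in_order = (rp, (row[k] for k in key_columns)..., row.value)
        push!(rp_rows, NamedTuple{field_names}(values_in_order))
    end
    return rp_rows
end

source_rows = [(period = p, time_step = t, a = "b", value = v)
               for (p, t, v) in [(1, 1, 5), (1, 2, 6), (2, 1, 7), (2, 2, 8)]]
rp_rows = [(rep_period = r, time_step = t, a = "a", value = v)
           for (r, t, v) in [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4)]]
append_period_as_rp!(rp_rows, source_rows; period = 2, rp = 3, key_columns = (:time_step, :a))
```

The last two rows of `rp_rows` then mirror rows 5 and 6 of the example above.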
TulipaClustering.combine_periods!
— Method
combine_periods!(df)
Modifies the dataframe `df` by combining the columns `time_step` and `period` into a single column `time_step` of global time steps. The period duration is inferred automatically from the maximum time step value, assuming that periods start with time step 1.
Examples
julia> df = DataFrame([:period => [1, 1, 2], :time_step => [1, 2, 1], :value => 1:3])
3×3 DataFrame
Row │ period time_step value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 2 1 3
julia> TulipaClustering.combine_periods!(df)
3×2 DataFrame
Row │ time_step value
│ Int64 Int64
─────┼──────────────────
1 │ 1 1
2 │ 2 2
3 │ 3 3
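The arithmetic behind `combine_periods!` can be sketched on plain vectors. Assuming, as stated above, that periods start at time step 1 and that the period duration equals the largest time step, the global time step is `(period - 1) * period_duration + time_step`. The function name `combine_periods` is hypothetical.

```julia
# Hypothetical helper: maps (period, time_step) pairs to global time steps.
# Assumes periods start at time step 1 and the period duration equals the
# largest time step, as in the docstring above.
function combine_periods(periods::AbstractVector{<:Integer}, time_steps::AbstractVector{<:Integer})
    period_duration = maximum(time_steps)
    return (periods .- 1) .* period_duration .+ time_steps
end

combine_periods([1, 1, 2], [1, 2, 1])  # [1, 2, 3], as in the example above
```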
TulipaClustering.df_to_matrix_and_keys
— Method
df_to_matrix_and_keys(df, key_columns)
Converts a dataframe `df` in long format to a matrix, ignoring the columns specified as `key_columns`. Each column of the matrix holds the values of one period, and each row corresponds to one combination of key values. The key columns are converted from long to wide format and returned alongside the matrix.
Examples
julia> df = DataFrame([:period => [1, 1, 2, 2], :time_step => [1, 2, 1, 2], :a .=> "a", :value => 1:4])
4×4 DataFrame
Row │ period time_step a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
4 │ 2 2 a 4
julia> m, k = TulipaClustering.df_to_matrix_and_keys(df, [:time_step, :a]); m
2×2 Matrix{Float64}:
1.0 3.0
2.0 4.0
julia> k
2×2 DataFrame
Row │ time_step a
│ Int64 String
─────┼───────────────────
1 │ 1 a
2 │ 2 a
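A minimal sketch of the long-to-wide pivot on plain vectors, assuming a single key column (`time_step`). The helper `long_to_matrix` is hypothetical; it leaves `NaN` in any (time step, period) cell that is absent from the input.

```julia
# Hypothetical helper: pivots long data (period, time_step, value) into a
# matrix with one row per time step and one column per period.
function long_to_matrix(periods, time_steps, vals)
    period_ids = sort(unique(periods))
    step_ids = sort(unique(time_steps))
    matrix = fill(NaN, length(step_ids), length(period_ids))  # NaN marks absent entries
    for (p, t, v) in zip(periods, time_steps, vals)
        row = searchsortedfirst(step_ids, t)    # row index of this key value
        col = searchsortedfirst(period_ids, p)  # column index of this period
        matrix[row, col] = v
    end
    return matrix, step_ids
end

m, key_values = long_to_matrix([1, 1, 2, 2], [1, 2, 1, 2], [1, 2, 3, 4])
# m == [1.0 3.0; 2.0 4.0], key_values == [1, 2]
```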
TulipaClustering.find_auxiliary_data
— Method
find_auxiliary_data(clustering_data)
Calculates auxiliary data associated with the `clustering_data`. These include:
- `key_columns_demand`: key columns in the demand dataframe
- `key_columns_generation_availability`: key columns in the generation availability dataframe
- `period_duration`: duration of time periods (in time steps)
- `last_period_duration`: duration of the last period
- `n_periods`: total number of periods
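The relation between the last three quantities can be illustrated with integer arithmetic. The sketch below assumes that global time steps run from 1 to `n_time_steps` without gaps and that `period_duration` is already known; the helper `period_bookkeeping` is hypothetical and does not reproduce how `find_auxiliary_data` derives these values from the dataframes.

```julia
# Hypothetical helper: period counts for `n_time_steps` consecutive time steps.
function period_bookkeeping(n_time_steps::Integer, period_duration::Integer)
    n_periods = cld(n_time_steps, period_duration)  # ceiling division
    # The last period holds whatever is left after the complete periods.
    last_period_duration = n_time_steps - (n_periods - 1) * period_duration
    return (; period_duration, n_periods, last_period_duration)
end

period_bookkeeping(10, 4)  # (period_duration = 4, n_periods = 3, last_period_duration = 2)
```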
TulipaClustering.find_period_weights
— Method
find_period_weights(period_duration, last_period_duration, n_periods, drop_incomplete_periods)
Finds the weights of two different types of periods in the clustering data:
- complete periods: all of the periods whose length equals `period_duration`;
- incomplete last period: if the last period is shorter than `period_duration`, it is incomplete.
TulipaClustering.find_representative_periods
— Method
find_representative_periods(clustering_data; n_rp = 10, rescale_demand_data = true, drop_incomplete_last_period = false, method = :k_means, distance = SqEuclidean(), args...)
Finds representative periods via data clustering.
- `clustering_data`: the data to perform clustering on.
- `n_rp`: number of representative periods to find.
- `rescale_demand_data`: if `true`, demands are first divided by the maximum demand value, so that they lie between zero and one like the generation availability data.
- `drop_incomplete_last_period`: controls how the last period is treated if it is not complete: if this parameter is set to `true`, the incomplete period is dropped and the weights are rescaled accordingly; otherwise, clustering is done for `n_rp - 1` periods, and the last period is added as a special shorter representative period.
- `method`: clustering method to use, either `:k_means` or `:k_medoids`.
- `distance`: semimetric used to measure the distance between data points.
- other named arguments can be provided; they are passed to the clustering method.
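To illustrate what clustering periods means, the sketch below runs a minimal Lloyd's k-means on the columns of a matrix (one column per period), using the squared Euclidean distance that corresponds to the default `SqEuclidean()`. It is a stand-in under stated assumptions, not the clustering code the package calls: initialisation simply takes the first `k` columns, the helper name `kmeans_periods` is hypothetical, and the weights are plain cluster sizes. With `:k_medoids`, the representatives would instead be actual periods.

```julia
# Hypothetical helper: minimal Lloyd's k-means on the columns of X
# (one column per period) with squared Euclidean distance.
function kmeans_periods(X::AbstractMatrix{<:Real}, k::Integer; n_iters::Integer = 100)
    n_periods = size(X, 2)
    centers = float.(X[:, 1:k])  # simplistic initialisation: first k periods
    assignments = zeros(Int, n_periods)
    for _ in 1:n_iters
        # Assignment step: each period goes to its nearest center.
        new_assignments = [argmin([sum(abs2, X[:, p] .- centers[:, c]) for c in 1:k])
                           for p in 1:n_periods]
        new_assignments == assignments && break  # converged
        assignments = new_assignments
        # Update step: each center becomes the mean of its assigned periods.
        for c in 1:k
            members = findall(==(c), assignments)
            if !isempty(members)
                centers[:, c] .= vec(sum(X[:, members]; dims = 2)) ./ length(members)
            end
        end
    end
    # Weight of a representative period = number of periods it represents.
    weights = [count(==(c), assignments) for c in 1:k]
    return centers, assignments, weights
end

# Two time steps, four periods forming two obvious groups.
X = [0.0 0.1 5.0 5.1;
     0.0 0.1 5.0 5.1]
centers, assignments, weights = kmeans_periods(X, 2)
```

Each column of `centers` plays the role of a representative period; here the first two periods map to the first center and the last two to the second.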
TulipaClustering.fit_rep_period_weights!
— Method
fit_rep_period_weights!(clustering_result; weight_type, tol, args...)
Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.
The arguments:
- `clustering_result`: the result of running `TulipaClustering.find_representative_periods`
- `weight_type`: the type of weights to find; possible values are:
  - `:convex`: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
  - `:conical`: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
  - `:conical_bounded`: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one
- `tol`: the algorithm's tolerance; when the weights are adjusted by a value less than or equal to `tol`, they stop being fitted further
- other arguments control the projected subgradient method; they are passed through to `TulipaClustering.projected_subgradient_descent!`.
TulipaClustering.fit_rep_period_weights!
— Method
fit_rep_period_weights!(weight_matrix, clustering_matrix, rp_matrix; weight_type, tol, args...)
Given the initial weight guesses, finds better weights for convex or conical combinations of representative periods. For conical weights, it is possible to bound the total weight by one.
The arguments:
- `weight_matrix`: the initial guess for weights; the weights are adjusted using a projected subgradient descent method
- `clustering_matrix`: the matrix of raw clustering data
- `rp_matrix`: the matrix of raw representative period data
- `weight_type`: the type of weights to find; possible values are:
  - `:convex`: each period is represented as a convex sum of the representative periods (a sum with nonnegative weights adding up to one)
  - `:conical`: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights)
  - `:conical_bounded`: each period is represented as a conical sum of the representative periods (a sum with nonnegative weights) with the total weight bounded from above by one
- `tol`: the algorithm's tolerance; when the weights are adjusted by a value less than or equal to `tol`, they stop being fitted further
- `show_progress`: if `true`, a progress bar is displayed
- other arguments control the projected subgradient method; they are passed through to `TulipaClustering.projected_subgradient_descent!`.
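The weight fitting can be illustrated for a single period and the `:conical` case. The sketch assumes a squared Euclidean reconstruction loss `sum(abs2, R * w - x)`, where the columns of `R` are representative periods and `x` is one period; each iteration takes a gradient step and projects onto `w ≥ 0`. The helper `fit_conical_weights`, its zero initial guess, and its defaults are hypothetical.

```julia
# Hypothetical helper: conical weights for one period `x`, given representative
# periods as the columns of `R`, by projected gradient descent on sum(abs2, R*w - x).
function fit_conical_weights(R::AbstractMatrix, x::AbstractVector;
                             learning_rate = 0.1, n_iters = 1000, tol = 1e-10)
    w = zeros(size(R, 2))                            # initial guess
    for _ in 1:n_iters
        gradient = 2 .* (R' * (R * w .- x))          # gradient of the squared error
        w_new = max.(w .- learning_rate .* gradient, 0.0)  # step, then project onto w ≥ 0
        converged = maximum(abs.(w_new .- w)) <= tol
        w = w_new
        converged && break
    end
    return w
end

R = [1.0 0.0; 0.0 1.0; 1.0 1.0]  # two representative periods, three time steps
x = [2.0, 3.0, 5.0]              # equals 2 * R[:, 1] + 3 * R[:, 2]
w = fit_conical_weights(R, x)    # approximately [2.0, 3.0]
```

The fixed learning rate only works if it is small relative to the largest eigenvalue of `R' * R`; the adaptive option of `projected_subgradient_descent!` is one way around hand-tuning it.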
TulipaClustering.matrix_and_keys_to_df
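— Method
matrix_and_keys_to_df(matrix, keys)
Converts the matrix `matrix` to a dataframe, appending the key columns given by `keys`. Each column of the matrix becomes one representative period (`rep_period`), and its entries are stored in the `value` column.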
Examples
julia> m = [1.0 3.0; 2.0 4.0]
2×2 Matrix{Float64}:
1.0 3.0
2.0 4.0
julia> k = DataFrame([:time_step => 1:2, :a .=> "a"])
2×2 DataFrame
Row │ time_step a
│ Int64 String
─────┼───────────────────
1 │ 1 a
2 │ 2 a
julia> TulipaClustering.matrix_and_keys_to_df(m, k)
4×4 DataFrame
Row │ rep_period time_step a value
│ Int64 Int64 String Float64
─────┼────────────────────────────────────────
1 │ 1 1 a 1.0
2 │ 1 2 a 2.0
3 │ 2 1 a 3.0
4 │ 2 2 a 4.0
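The wide-to-long conversion is the inverse of the pivot done by `df_to_matrix_and_keys`. This sketch, with the hypothetical helper `matrix_to_rows`, returns NamedTuple rows instead of a dataframe and assumes a single key column `time_step`.

```julia
# Hypothetical helper: wide-to-long conversion; column `rp` of `matrix`
# becomes representative period `rp`.
function matrix_to_rows(matrix::AbstractMatrix, time_steps::AbstractVector)
    return [(rep_period = rp, time_step = time_steps[i], value = matrix[i, rp])
            for rp in axes(matrix, 2) for i in axes(matrix, 1)]
end

rows = matrix_to_rows([1.0 3.0; 2.0 4.0], [1, 2])
# rows[3] == (rep_period = 2, time_step = 1, value = 3.0), matching row 3 of the example above
```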
TulipaClustering.project_onto_nonnegative_orthant
— Method
project_onto_nonnegative_orthant(vector)
Projects `vector` onto the nonnegative orthant. This projection is trivial: negative components of the vector are replaced with zeros.
TulipaClustering.project_onto_simplex
— Method
project_onto_simplex(vector)
Projects `vector` onto the unit simplex using Michelot's algorithm in Condat's accelerated implementation. See Figure 2 of Condat, L. Fast projection onto the simplex and the ℓ1 ball. Math. Program. 158, 575–585 (2016). For details on the meanings of `v`, `ṽ`, `ρ` and other variables, see the original paper.
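For reference, the basic Michelot iteration (without Condat's accelerations) fits in a few lines: compute the threshold `ρ = (sum(v) - 1) / length(v)`, discard the components of `v` that do not exceed it, and repeat until nothing is discarded; the projection is then `max.(y .- ρ, 0)`. The function below is an illustrative sketch with a hypothetical name, not the package's `project_onto_simplex`.

```julia
# Illustrative sketch of Michelot's algorithm (no Condat accelerations):
# Euclidean projection of `y` onto {x : x ≥ 0, sum(x) = radius}.
function project_onto_simplex_michelot(y::AbstractVector{<:Real}; radius::Real = 1.0)
    v = float.(collect(y))
    ρ = (sum(v) - radius) / length(v)  # candidate threshold
    while true
        kept = filter(>(ρ), v)         # components ≤ ρ would be clipped to zero
        length(kept) == length(v) && break
        v = kept
        ρ = (sum(v) - radius) / length(v)
    end
    return max.(y .- ρ, 0.0)
end

project_onto_simplex_michelot([2.0, 0.0])  # [1.0, 0.0]
project_onto_simplex_michelot([0.2, 0.1])  # ≈ [0.55, 0.45]
```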
TulipaClustering.projected_subgradient_descent!
— Method
projected_subgradient_descent!(x; subgradient, projection, niters, tol, learning_rate, adaptive_grad)
Fits `x` using the projected subgradient descent scheme.
The arguments:
- `x`: the value to fit
- `subgradient`: the subgradient operator, that is, a function that takes vectors of the same shape as `x` as inputs and returns a subgradient of the loss at that point; the fitting is done to minimize the corresponding implicit loss
- `projection`: the projection operator, that is, a function that, given a vector `x`, finds the point of the feasible set that is closest to `x`
- `niters`: maximum number of projected subgradient descent iterations
- `tol`: tolerance; when no component of `x` improves by more than `tol`, the algorithm stops
- `learning_rate`: learning rate of the algorithm
- `adaptive_grad`: if `true`, the learning rate is adjusted using the adaptive gradient method; see John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 12, 2121–2159.
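A generic projected subgradient descent loop with an optional AdaGrad-style step can be sketched as follows. The keyword names mirror the list above, but the function `psd!`, its defaults, and the small constant added to the AdaGrad denominator are assumptions of this sketch; it stops when no component of `x` moves by more than `tol`.

```julia
# Illustrative projected subgradient descent; the keyword names mirror the
# docstring above, but the function `psd!` and its defaults are assumptions.
function psd!(x::Vector{Float64}; subgradient, projection, niters = 1000,
              tol = 1e-8, learning_rate = 0.1, adaptive_grad = false)
    grad_sq_sum = zeros(length(x))  # AdaGrad accumulator of squared subgradients
    for _ in 1:niters
        g = subgradient(x)
        if adaptive_grad
            grad_sq_sum .+= g .^ 2
            delta = learning_rate .* g ./ (sqrt.(grad_sq_sum) .+ 1e-12)  # per-coordinate step
        else
            delta = learning_rate .* g
        end
        x_new = projection(x .- delta)
        converged = all(abs.(x_new .- x) .<= tol)  # no component moved by more than tol
        x .= x_new
        converged && break
    end
    return x
end

c = [0.7, 0.5, -0.2]
# Minimise sum(abs2, x - c) over the nonnegative orthant; the solution is max.(c, 0).
x = psd!(zeros(3); subgradient = z -> 2 .* (z .- c), projection = z -> max.(z, 0.0))
```

With `adaptive_grad = true`, each coordinate's step is divided by the square root of its accumulated squared subgradients, so the step shrinks over time even for nonsmooth losses such as `sum(abs, x - c)`, whose subgradient `sign.(x .- c)` never vanishes.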
TulipaClustering.read_clustering_data_from_csv_folder
— Method
read_clustering_data_from_csv_folder(input_folder)
Returns the dataframe with all of the needed data from the `input_folder`. The file `assets-profiles.csv` should exist in the directory, following the `TulipaClustering.AssetProfiles` specification.
TulipaClustering.read_csv_with_schema
— Method
read_csv_with_schema(file_path, schema)
Reads the CSV file at `file_path`, validating the data using `schema`. It is assumed that the file's header is in the second row; the first row of the file contains metadata that is not used.
TulipaClustering.split_into_periods!
— Method
split_into_periods!(df; period_duration=nothing)
Modifies the dataframe `df` by separating the column `time_step` into periods of length `period_duration`. The new data is written into two columns:
- `period`: the period ID;
- `time_step`: the time step within the current period.
If `period_duration` is `nothing`, then all of the time steps are within the same period with index 1. As the examples below show, if `df` already has a `period` column, its periods are first combined into global time steps and then split again.
Examples
julia> df = DataFrame([:time_step => 1:4, :value => 5:8])
4×2 DataFrame
Row │ time_step value
│ Int64 Int64
─────┼──────────────────
1 │ 1 5
2 │ 2 6
3 │ 3 7
4 │ 4 8
julia> TulipaClustering.split_into_periods!(df; period_duration=2)
4×3 DataFrame
Row │ period time_step value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 5
2 │ 1 2 6
3 │ 2 1 7
4 │ 2 2 8
julia> df = DataFrame([:period => [1, 1, 2], :time_step => [1, 2, 1], :value => 1:3])
3×3 DataFrame
Row │ period time_step value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 2 1 3
julia> TulipaClustering.split_into_periods!(df; period_duration=1)
3×3 DataFrame
Row │ period time_step value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 2 1 2
3 │ 3 1 3
julia> TulipaClustering.split_into_periods!(df)
3×3 DataFrame
Row │ period time_step value
│ Int64 Int64 Int64
─────┼──────────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 1 3 3
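The splitting is the arithmetic inverse of combining periods: a global time step `t` falls into period `div(t - 1, period_duration) + 1` at local time step `rem(t - 1, period_duration) + 1`. The hypothetical helper below applies this to plain vectors and reproduces the `period` and `time_step` columns of the first two examples above.

```julia
# Hypothetical helper: splits global time steps into (period, local time step).
# Assumes global time steps start at 1.
function split_global_time_steps(global_steps::AbstractVector{<:Integer}, period_duration::Integer)
    periods = (global_steps .- 1) .÷ period_duration .+ 1
    local_steps = (global_steps .- 1) .% period_duration .+ 1
    return periods, local_steps
end

split_global_time_steps(1:4, 2)  # ([1, 1, 2, 2], [1, 2, 1, 2])
```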
TulipaClustering.validate_df_and_find_key_columns
— Method
validate_df_and_find_key_columns(df)
Checks that the dataframe `df` contains the necessary columns and returns a list of columns that act as keys (i.e., unique data identifiers within different periods).
Examples
julia> df = DataFrame([:period => [1, 1, 2], :time_step => [1, 2, 1], :a .=> "a", :value => 1:3])
3×4 DataFrame
Row │ period time_step a value
│ Int64 Int64 String Int64
─────┼──────────────────────────────────
1 │ 1 1 a 1
2 │ 1 2 a 2
3 │ 2 1 a 3
julia> TulipaClustering.validate_df_and_find_key_columns(df)
2-element Vector{Symbol}:
:time_step
:a
julia> df = DataFrame([:value => 1])
1×1 DataFrame
Row │ value
│ Int64
─────┼───────
1 │ 1
julia> TulipaClustering.validate_df_and_find_key_columns(df)
ERROR: DomainError with 1×1 DataFrame
Row │ value
│ Int64
─────┼───────
1 │ 1:
DataFrame must contain columns `time_step` and `value`
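The key-column rule visible in the examples (every column except `period` and `value` is a key, and `time_step` and `value` are required) can be sketched on a vector of column names. The helper `find_key_columns` is hypothetical; like the example above it throws a `DomainError`, though with a different message.

```julia
# Hypothetical helper operating on column names instead of a dataframe.
function find_key_columns(column_names::AbstractVector{Symbol})
    # Both `time_step` and `value` must be present.
    if !(:time_step in column_names && :value in column_names)
        throw(DomainError(column_names, "columns `time_step` and `value` are required"))
    end
    # Everything except the period index and the data column is a key.
    return [c for c in column_names if c != :period && c != :value]
end

find_key_columns([:period, :time_step, :a, :value])  # [:time_step, :a]
```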
TulipaClustering.weight_matrix_to_df
— Method
weight_matrix_to_df(weights)
Converts a weight matrix from a (sparse) matrix, which is more convenient for internal computations, to a dataframe, which is better suited for saving to a file. Zero weights are dropped to avoid cluttering the dataframe.
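A sketch of this conversion using the `SparseArrays` standard library. It assumes that rows of the weight matrix index periods and columns index representative periods, and the output field names `period`, `rep_period`, and `weight` are hypothetical. `findnz` returns only stored entries; the extra `iszero` check also removes explicitly stored zeros.

```julia
using SparseArrays

# Hypothetical helper: converts a sparse weight matrix (rows = periods,
# columns = representative periods, an assumption) into a list of rows.
function weight_matrix_to_rows(weights::SparseMatrixCSC)
    period_idx, rp_idx, vals = findnz(weights)  # stored entries, column by column
    return [(period = p, rep_period = r, weight = w)
            for (p, r, w) in zip(period_idx, rp_idx, vals) if !iszero(w)]
end

W = sparse([1.0 0.0; 0.3 0.7; 0.0 1.0])
weight_matrix_to_rows(W)
# [(period = 1, rep_period = 1, weight = 1.0), (period = 2, rep_period = 1, weight = 0.3),
#  (period = 2, rep_period = 2, weight = 0.7), (period = 3, rep_period = 2, weight = 1.0)]
```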
TulipaClustering.write_clustering_result_to_csv_folder
— Method
write_clustering_result_to_csv_folder(output_folder, clustering_result)
Writes a `TulipaClustering.ClusteringResult` to CSV files in the `output_folder`.
TulipaClustering.write_csv_with_prefixes
— Method
write_csv_with_prefixes(file_path, df; prefixes)
Writes the dataframe `df` into a CSV file at `file_path`. If `prefixes` are provided, they are written above the column names; for example, they can contain metadata describing the columns.
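The two-row header convention shared by `write_csv_with_prefixes` and `read_csv_with_schema` (a prefix row, then the column names) can be sketched with plain `IO`. Both helper names are hypothetical, and the sketch does no quoting or escaping, so it is only valid for fields without commas or newlines.

```julia
# Hypothetical helpers illustrating a prefix row above the column names.
# No quoting or escaping: fields must not contain commas or newlines.
function write_with_prefixes(io::IO, header, rows; prefixes = nothing)
    prefixes === nothing || println(io, join(prefixes, ","))  # metadata row
    println(io, join(header, ","))                             # column names
    for row in rows
        println(io, join(row, ","))
    end
    return io
end

function read_skipping_metadata(io::IO)
    lines = readlines(io)
    header = split(lines[2], ",")  # column names are on the second row
    data = [split(line, ",") for line in lines[3:end]]
    return header, data
end

buf = IOBuffer()
write_with_prefixes(buf, ["name", "value"], [("a", 1), ("b", 2)]; prefixes = ["", "MW"])
seekstart(buf)
header, data = read_skipping_metadata(buf)
# header == ["name", "value"], data == [["a", "1"], ["b", "2"]]
```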