Representative Periods with Tulipa Clustering
Introduction
Using representative periods is a simplification method to reduce the size of the problem. Instead of solving for every time period, the model solves for a few chosen representatives of the data. The original data is then reconstructed or approximated by blending the representatives.
Tulipa uses the package TulipaClustering.jl to choose representatives and cluster input data.
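For intuition, "blending the representatives" is just a weighted sum: each base period's profile is approximated as a weighted combination of the representative profiles. A minimal sketch with toy numbers (not Tulipa's actual data layout):

```julia
# Two representative-period (RP) profiles, 4 timesteps each (toy numbers)
rp_profiles = [1.0 5.0;   # timestep 1: RP1, RP2
               2.0 6.0;
               3.0 7.0;
               4.0 8.0]

# Weights of each RP in one base period. A "Dirac" assignment uses exactly
# one RP per base period; blended weights mix several.
weights_dirac   = [1.0, 0.0]     # base period reproduced by RP1 alone
weights_blended = [0.25, 0.75]   # base period = 25% RP1 + 75% RP2

# Reconstructed profile for the base period: a weighted sum of RP profiles
approx = rp_profiles * weights_blended
```

Here `rp_profiles` and the weights are made-up values purely for illustration; TulipaClustering produces the analogous data for you.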
Set up the environment
Add the new packages:
using Pkg: Pkg
Pkg.activate(".")
Pkg.add(name="TulipaClustering", version="0.4.0")
Pkg.add("Distances")
Import packages:
import TulipaIO as TIO
import TulipaEnergyModel as TEM
import TulipaClustering as TC
using DuckDB
using DataFrames
using Plots
using Distances
Question: Do you remember how to install the two new libraries into your environment?
Set up the data
Let's now go to the repository and download the new files in my-awesome-energy-system-lesson-4 from the following link: case studies github repo
Then load the data:
connection = DBInterface.connect(DuckDB.DB)
input_dir = "my-awesome-energy-system-lesson-4"
output_dir = "my-awesome-energy-system-results"
TIO.read_csv_folder(connection, input_dir)
Try to run the problem as usual:
TEM.populate_with_defaults!(connection)
energy_problem = TEM.run_scenario(connection; output_folder=output_dir)
Uh oh! It doesn't work. Why not?
ERROR: DataValidationException: The following issues were found in the data:
- Table 'rep_periods_data' expected but not found
- Table 'rep_periods_mapping' expected but not found
- Table 'timeframe_data' expected but not found
- Column 'is_milestone' is missing from table 'year_data'
Because we need the tables from the clustering!
Adding TulipaClustering
We need to produce representative period data from the base period data.
Splitting the Profile Data into Periods
Let's say we want to split the year into days, i.e., periods of length 24. TulipaClustering provides two methods that can help: combine_periods! combines existing periods into consecutive timesteps, and split_into_periods! splits the data back into periods of the desired length:
period_duration = 24 # group data into days
profiles_df = TIO.get_table(connection, "profiles_periods")
TC.combine_periods!(profiles_df)
TC.split_into_periods!(profiles_df; period_duration)
Clustering the Data
We use find_representative_periods to reduce the base periods to RPs. The method has two mandatory positional arguments:
- the profile dataframe,
- the number of representative periods you want to obtain.
You can also change a few optional keyword arguments (after the semicolon):
- drop_incomplete_last_period tells the algorithm how to treat the last period if it has fewer timesteps than the others (defaults to false),
- method is the clustering method (defaults to :k_means),
- distance is the metric used to measure how different the data points are (defaults to SqEuclidean()).
num_rep_periods = 7
method = :k_medoids # :k_means, :convex_hull, :convex_hull_with_null, :conical_hull
distance = Euclidean() # CosineDist()
clustering_result = TC.find_representative_periods(profiles_df, num_rep_periods; method, distance)
The clustering_result contains some useful information:
- profiles is a dataframe with the profiles of the RPs,
- weight_matrix is a matrix of weights of RPs in blended periods,
- clustering_matrix and rp_matrix are matrices of profile data for each base and representative period (useful to keep for the next step, but you should not need them unless you want to do some extra math),
- auxiliary_data contains extra data generated during the clustering process; it is generally only interesting if you plan to interact with the clustering method at a very low level.
Weight Fitting
After the clustering is done, each period is assigned to one representative period. We call this a "Dirac assignment" after the Dirac measure: a measure that is concentrated on one item (i.e., one base period is mapped into exactly one representative period).
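To picture what this means for the weight matrix (rows are base periods, columns are RPs), here is a toy example with made-up numbers:

```julia
# 5 base periods mapped to 2 representative periods.
# Under a Dirac assignment, each row has a single 1:
# every base period is represented by exactly one RP.
W_dirac = [1.0 0.0;
           1.0 0.0;
           0.0 1.0;
           1.0 0.0;
           0.0 1.0]

# Blended (convex) weights relax this: each row is still nonnegative
# and sums to one, but may mix several RPs.
W_convex = [0.9 0.1;
            0.8 0.2;
            0.3 0.7;
            1.0 0.0;
            0.2 0.8]

rows_sum_to_one = all(sum(W_convex, dims = 2) .≈ 1.0)
```

Both matrices are illustrative only; TulipaClustering computes the actual weight_matrix for your data.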
TulipaClustering supports blended weights for representative periods. To produce these, we use projected gradient descent. You don't need to know all the math behind it, but it has a few parameters that are useful to understand:
- weight_type can be :conical (weights are positive), :conical_bounded (weights are positive and sum to at most one), :convex (weights are positive and sum to one), or :dirac (one weight is one and the rest are zeros). The order here is from least to most restrictive.
- tol is the algorithm's tolerance. A tolerance of 1e-2 means that weights are estimated up to two decimal places (e.g., something like 0.15).
- niters and learning_rate tell for how many iterations to run the descent and by how much to adjust the weights in each iteration. More iterations make the method slower but produce better results. A larger learning rate makes the method converge faster but less stably (i.e., the weights might jump up and down a lot from iteration to iteration). Sometimes you need to find the right balance yourself. In general, if the weights produced by the method look strange, try decreasing the learning rate and/or increasing the number of iterations.
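To make niters and learning_rate concrete, here is a toy projected-gradient sketch for :convex weights. fit_weights_sketch is a made-up helper for illustration, not TulipaClustering's API, and the projection step is deliberately simplified:

```julia
# Minimal sketch of projected gradient descent for :convex weights
# (illustrative only; TulipaClustering's actual implementation differs).
# Fit w ≥ 0 with sum(w) = 1 so that rp_matrix * w ≈ target.
function fit_weights_sketch(rp_matrix, target; niters = 100, learning_rate = 0.001)
    n = size(rp_matrix, 2)
    w = fill(1.0 / n, n)                               # start from uniform weights
    for _ in 1:niters
        grad = rp_matrix' * (rp_matrix * w .- target)  # gradient of ½‖Aw − b‖²
        w .-= learning_rate .* grad                    # gradient step
        w .= max.(w, 0.0)                              # crude projection: clip negatives...
        w ./= sum(w)                                   # ...and renormalize to sum to one
    end
    return w
end

# More iterations tighten the fit but cost time; a larger learning rate
# converges faster but can oscillate.
w = fit_weights_sketch([1.0 0.0; 0.0 1.0], [0.3, 0.7]; niters = 2000, learning_rate = 0.1)
```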
Now fit the weights:
weight_type = :dirac # :convex, :conical, :conical_bounded
tol = 1e-2
niters = 100
learning_rate = 0.001
TC.fit_rep_period_weights!(
clustering_result;
weight_type,
tol,
niters,
learning_rate,
)
Running the Model
To run the model, write the clustering results to the database with TulipaClustering and then run it as usual:
TC.write_clustering_result_to_tables(connection, clustering_result)
TEM.populate_with_defaults!(connection)
energy_problem = TEM.run_scenario(connection; output_folder=output_dir)
Interpreting the Results
To plot the results, first read the data with TulipaIO and filter what's needed (and rename time_block_start to timestep while you're at it):
flows = TIO.get_table(connection, "var_flow")
select!(
flows,
:from_asset,
:to_asset,
:year,
:rep_period,
:time_block_start => :timestep,
:solution
)
from_asset = "ccgt"
to_asset = "e_demand"
year = 2030
filtered_flow = filter(
row ->
row.from_asset == from_asset &&
row.to_asset == to_asset &&
row.year == year,
flows,
)
To reinterpret the RP data as base periods data, first create a new dataframe that contains both by using the inner join operation:
rep_periods_mapping = TIO.get_table(connection, "rep_periods_mapping")
df = innerjoin(filtered_flow, rep_periods_mapping, on=[:year, :rep_period])
Next, use Julia's Split-Apply-Combine approach to group the dataframe into smaller ones. Each grouped dataframe contains a single data point for one base period and all RPs it maps to. Then multiply the results by weights and add them up.
gdf = groupby(df, [:from_asset, :to_asset, :year, :period, :timestep])
result_df = combine(gdf, [:weight, :solution] => ((w, s) -> sum(w .* s)) => :solution)
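The weighted-sum step can be checked on toy numbers. This sketch mirrors the combine call above with one (period, timestep) datapoint blended from two RPs; all values are made up:

```julia
using DataFrames

# One base-period datapoint blended from two RPs with weights 0.25 and 0.75
toy = DataFrame(
    period   = [1, 1],
    timestep = [1, 1],
    weight   = [0.25, 0.75],
    solution = [10.0, 20.0],
)

# Split: group rows belonging to the same base-period datapoint
toy_gdf = groupby(toy, [:period, :timestep])

# Apply and combine: weighted sum of the RP solutions per group
toy_result = combine(toy_gdf, [:weight, :solution] => ((w, s) -> sum(w .* s)) => :solution)
```

The single resulting row holds 0.25 * 10.0 + 0.75 * 20.0, exactly the blending described in the introduction.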
Now you can plot the results. Remove the period data since you don't need it anymore, and re-sort the data to make sure it is in the right order.
TC.combine_periods!(result_df)
sort!(result_df, :timestep)
plot(
result_df.timestep,
result_df.solution;
label=string(from_asset, " -> ", to_asset),
xlabel="Hour",
ylabel="[MWh]",
marker=:circle,
markersize=2,
xlims=(1, 168),
dpi=600,
)
This concludes this tutorial! Play around with different parameters to see how the results change. For example, when you use :dirac vs :convex weights, do you see a difference? How does the solution change as you increase the number of RPs?
Troubleshooting
If you can't run up to the end, then check that you have the following information in the Project.toml
file in your repository, and then in your Julia REPL activate and instantiate the project.
Project.toml
:
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
DuckDB = "d2f5444f-75bc-4fdf-ac35-56f514c445e1"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
TulipaClustering = "314fac8b-c762-4aa3-9d12-851379729163"
TulipaEnergyModel = "5d7bd171-d18e-45a5-9111-f1f11ac5d04d"
TulipaIO = "7b3808b7-0819-42d4-885c-978ba173db11"
[compat]
TulipaEnergyModel = "0.15.0"
Close your current REPL, open a new one, and type:
using Pkg: Pkg
Pkg.activate(".")
Pkg.resolve()
Pkg.instantiate()
The Script as a Whole
using Pkg
Pkg.activate(".")
# Pkg.add("TulipaEnergyModel")
# Pkg.add("TulipaIO")
Pkg.add("TulipaClustering") # NB: this is new; ask students if they remember how to install a package
# Pkg.add("DuckDB")
# Pkg.add("DataFrames")
# Pkg.add("Plots")
Pkg.add("Distances")
Pkg.instantiate()
import TulipaIO as TIO
import TulipaEnergyModel as TEM
import TulipaClustering as TC # NB: this is new
using DuckDB
using DataFrames
using Plots
using Distances # NB: this is new
connection = DBInterface.connect(DuckDB.DB)
input_dir = "my-awesome-energy-system-lesson-4"
output_dir = "my-awesome-energy-system-results"
TIO.read_csv_folder(connection, input_dir)
period_duration = 24
profiles_df = TIO.get_table(connection, "profiles_periods")
TC.combine_periods!(profiles_df)
TC.split_into_periods!(profiles_df; period_duration)
num_rep_periods = 7
method = :k_medoids # :k_means, :convex_hull, :convex_hull_with_null, :conical_hull
distance = Euclidean() # CosineDist()
clustering_result = TC.find_representative_periods(profiles_df, num_rep_periods; method, distance)
weight_type = :dirac # :convex, :conical, :conical_bounded
tol = 1e-2
niters = 100
learning_rate = 0.001
TC.fit_rep_period_weights!(
clustering_result;
weight_type,
tol,
niters,
learning_rate,
)
TC.write_clustering_result_to_tables(connection, clustering_result)
TEM.populate_with_defaults!(connection)
energy_problem = TEM.run_scenario(connection; output_folder=output_dir)
flows = TIO.get_table(connection, "var_flow")
select!(
flows,
:from_asset,
:to_asset,
:year,
:rep_period,
:time_block_start => :timestep,
:solution
)
from_asset = "ccgt"
to_asset = "e_demand"
year = 2030
filtered_flow = filter(
row ->
row.from_asset == from_asset &&
row.to_asset == to_asset &&
row.year == year,
flows,
)
rep_periods_mapping = TIO.get_table(connection, "rep_periods_mapping")
df = innerjoin(filtered_flow, rep_periods_mapping, on=[:year, :rep_period])
gdf = groupby(df, [:from_asset, :to_asset, :year, :period, :timestep])
result_df = combine(gdf, [:weight, :solution] => ((w, s) -> sum(w .* s)) => :solution)
TC.combine_periods!(result_df)
sort!(result_df, :timestep)
plot(
result_df.timestep,
result_df.solution;
label=string(from_asset, " -> ", to_asset),
xlabel="Hour",
ylabel="[MWh]",
marker=:circle,
markersize=2,
xlims=(1, 168),
dpi=600,
)
Working with the New Tables Created by TulipaClustering
You can check the new tables with TulipaIO, for example:
TIO.get_table(connection, "rep_periods_mapping")
If you want to save the intermediary tables created by the clustering, you can do this with DuckDB:
DuckDB.execute(
connection,
"COPY 'profiles_rep_periods' TO 'profiles-rep-periods.csv' (HEADER, DELIMITER ',')",
)
The new tables are:
- profiles_rep_periods
- rep_periods_data
- rep_periods_mapping
- timeframe_data
This is useful when you don't want to rerun the clustering every time.